Green Fern

Janus-Pro-7B

Janus-Pro 7B (≈ 7 B params, MIT + DeepSeek Model License)

Unified transformer that both understands images and generates them.

  • Spec sheet. 30 decoder layers · 4,096-dim embeddings · 32 attention heads · 4,096-token context window; SigLIP-L vision encoder (384 × 384) for “look”, a VQ tokenizer with 16× downsampling for “draw” (see the token-budget sketch after this list).

  • Dual super-powers. Scores 79 on MMBench for visual Q&A and 0.80 on GenEval for text-to-image, edging past DALL-E 3 and Stable Diffusion 3 Medium on public leaderboards.

  • Hardware reality. Full precision needs ~24 GB VRAM; 8-/4-bit quantization cuts that to 12–14 GB, so a single 3090/4090 or A10G card is plenty for production inference (see the 4-bit loading sketch after this list).

  • Any-to-Any I/O. Accepts text ↔ image ↔ mixed sequences in one stream: no separate diffusion server, no cross-model routing (see the mixed-sequence sketch after this list).

  • Dev-ready. Works with transformers >= 4.51, Diffusers JanusProPipeline, vLLM, llama.cpp GGUF, Ollama, and a one-liner CLI (pip install janus-pro && janus-pro "<prompt>").

  • License heads-up. Code is MIT; weight use follows DeepSeek’s model license—commercial OK, but mind the usual “no illegal/abusive content” clauses.
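
For intuition on the spec sheet, here is a back-of-envelope token-budget sketch. The numbers come straight from the bullet above; the assumption that one image occupies a flat 24 × 24 token grid is ours:

# Token-budget arithmetic (values from the spec sheet above; the flat
# 24 x 24 grid per image is an assumption, not a confirmed layout).
IMAGE_SIZE = 384        # SigLIP-L / VQ canvas resolution, px
DOWNSAMPLE = 16         # VQ tokenizer downsampling factor
CONTEXT = 4096          # context window, tokens

tokens_per_side = IMAGE_SIZE // DOWNSAMPLE   # 384 // 16 = 24
image_tokens = tokens_per_side ** 2          # 24 * 24 = 576 tokens per image
print(f"{image_tokens} image tokens, {CONTEXT - image_tokens} left for text")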
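
For the quantized path, a minimal 4-bit loading sketch, assuming bitsandbytes is installed and that transformers can resolve the hub checkpoint via trust_remote_code (exact auto-classes may differ across transformers versions):

# 4-bit load sketch: assumes transformers >= 4.51 and bitsandbytes;
# should land in the 12-14 GB VRAM range quoted above.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/Janus-Pro-7B",   # Hugging Face hub id for the 7B checkpoint
    quantization_config=quant,
    device_map="auto",
    trust_remote_code=True,       # the repo ships custom multimodal modeling code
)
processor = AutoProcessor.from_pretrained(
    "deepseek-ai/Janus-Pro-7B", trust_remote_code=True
)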
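
And one way to picture a mixed text-and-image turn, borrowing the content-list convention that multimodal chat templates commonly use (the exact schema Janus-Pro or Norman expects is not confirmed here):

# Sketch of a mixed text-and-image turn in the common multimodal chat
# format (the precise schema this stack expects may differ).
mixed_messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/fern.jpg"},
            {"type": "text", "text": "Describe this plant, then draw it in watercolor style."},
        ],
    }
]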

Why pick it for Norman AI?

Janus-Pro 7B lets us ship multimodal chat and on-the-fly image generation from a single 24 GB GPU: no diffusion backend, no model juggling, and permissive licensing (MIT code, DeepSeek model license on the weights). Perfect for a unified “creative assistant” tier or edge deployments where every watt and GPU slot counts.
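
A minimal chat call through Norman's invoke API follows. The janus-pro-7b model identifier is an assumption on our part; substitute whatever id the deployment registers for this checkpoint.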

import asyncio

import norman  # Norman AI client (assumed importable as a module)

# Multi-turn chat history in OpenAI-style message format.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving the equation 2x + 3 = 7?"},
]

async def main():
    response = await norman.invoke(
        {
            "model_name": "janus-pro-7b",  # assumed model id (see note above)
            "inputs": [
                {
                    "display_title": "Prompt",
                    "data": messages,
                }
            ],
        }
    )
    print(response)

asyncio.run(main())
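
Since the same checkpoint also draws, a text-to-image request can reuse the identical invoke shape; the payload below is hypothetical and only the prompt changes:

# Hypothetical text-to-image call reusing the same invoke shape
# (same assumed model id; only the user prompt differs).
async def generate_image():
    return await norman.invoke(
        {
            "model_name": "janus-pro-7b",
            "inputs": [
                {
                    "display_title": "Prompt",
                    "data": [{"role": "user",
                              "content": "Draw a green fern in soft morning light."}],
                }
            ],
        }
    )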