Green Fern

stable-diffusion-2-base

Image

Stable Diffusion v2-base (≈ 1 B params, CreativeML Open RAIL++-M)

Latent-diffusion upgrade that swaps the old CLIP ViT-L text encoder for a beefier OpenCLIP-ViT/H and retrains on cleaner data.

  • Sharper prompt understanding. The new OpenCLIP-H text embedding follows prompts more literally and renders better edge detail than v1.5. Just note that old prompt hacks won't map one-to-one.

  • 512 × 512 by default, scales up. The base checkpoint is trained at 512×512; a 768×768 “v-pred” sibling is finetuned from the same weights, so it adds no extra parameters.

  • Same lightweight U-Net. The U-Net stays around 860 M parameters, roughly matching v1.5, so hardware needs don't spike.

  • Runs on everyday GPUs. FP16 inference fits in about 6 GB of VRAM; attention slicing or CPU offload can push that under 3 GB (see the sketch after this list).

  • Cleaned-up dataset. Trained from scratch on a LAION-5B subset after heavy NSFW and aesthetic filtering, so fewer unsafe or low-quality results show up out of the box.

  • Plug-and-play tooling. A one-liner with diffusers (sketched below), and fully supported in Automatic1111, ComfyUI, SD.Next, etc. Same workflow as any SD-1.x model.
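
For the diffusers route, here is a minimal sketch. The model id (stabilityai/stable-diffusion-2-base) is the stock Hugging Face checkpoint; torch, diffusers, and (for CPU offload) accelerate are assumed installed, and the memory-saver calls are the standard diffusers knobs behind the VRAM numbers above.

import torch
from diffusers import StableDiffusionPipeline

# 512x512 base checkpoint; swap in "stabilityai/stable-diffusion-2"
# for the 768x768 v-pred sibling (then pass height=768, width=768).
model_id = "stabilityai/stable-diffusion-2-base"

# fp16 roughly halves VRAM versus fp32.
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Optional memory savers for small GPUs:
pipe.enable_attention_slicing()     # less VRAM at a small speed cost
# pipe.enable_model_cpu_offload()   # needs accelerate; skip .to("cuda") if used

image = pipe("A cat playing with a ball on mars").images[0]
image.save("cat_on_mars.png")

Attention slicing alone is usually enough to stay inside the ~6 GB figure; full CPU offload is what gets you toward the sub-3 GB end, trading latency for memory. The 768×768 sibling needs no code changes beyond the model id and resolution, since diffusers reads the v-prediction setting from the checkpoint's scheduler config.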

Why pick it for Norman AI?

SD v2-base gives us cleaner images, better prompt control, and “fits-on-a-laptop” VRAM, all without touching our existing diffusion pipeline. Use it for quick hero art, social banners, or on-device generative features without re-architecting the stack.
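
Invoking it through the Norman SDK is a single async call. The snippet below is a minimal sketch, assuming the client is importable as norman and that you are not already inside an event loop: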

import asyncio

import norman  # assumed import path for the Norman SDK client

async def main():
    # invoke() is async: pass the registered model name plus a list of inputs
    response = await norman.invoke(
        {
            "model_name": "stable-diffusion-2-base",
            "inputs": [
                {
                    "display_title": "Prompt",  # human-readable label for the input
                    "data": "A cat playing with a ball on mars"
                }
            ]
        }
    )
    return response

asyncio.run(main())