NVIDIA-Nemotron-3-Nano-4B (GGUF)

Text

Ultra-compact, high-efficiency text model optimized for local deployment and edge devices.

Lightweight powerhouse. A 4-billion-parameter model designed to deliver high-quality reasoning and text generation with a minimal hardware footprint.
GGUF Optimized. Provided in the GGUF format, so it runs efficiently on CPUs and consumer GPUs through llama.cpp and other GGUF-compatible local runners.
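Mamba-Transformer Hybrid. Combines Transformer attention layers, which give precise recall over the context, with Mamba State Space Model (SSM) layers, whose cost grows linearly with sequence length. The design aims for higher throughput than a pure-Transformer model of similar size, especially on long inputs.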
Tool-calling ready. Despite its small size, it is specifically fine-tuned for structured tasks like JSON extraction, basic function calling, and following complex instructions.
Low latency. Built for fast responses. It is ideal for real-time applications where time-to-first-token must be as low as possible.
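
As a minimal sketch of the JSON extraction use case above: a small model can wrap its answer in a code fence or omit a key, so it helps to validate the reply before using it. The helper names `build_extraction_messages` and `parse_json_reply` are hypothetical and not part of any Norman or NVIDIA API, and `sample_reply` is a hand-written string, not real model output.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker, built this way to keep the example readable


def build_extraction_messages(text, fields):
    """Build chat messages asking the model to reply with a single JSON object."""
    field_list = ", ".join(fields)
    system = (
        "Extract the requested fields from the user's text. "
        f"Reply with one JSON object containing exactly these keys: {field_list}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]


def parse_json_reply(reply, fields):
    """Parse a reply into a dict, tolerating a code fence around the JSON."""
    cleaned = reply.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (for example a "json" language tag) ...
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # ... and everything from the closing fence onward.
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    data = json.loads(cleaned)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in fields if name not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# Hand-written sample reply (an assumption, not real model output).
sample_reply = FENCE + 'json\n{"name": "Ada", "age": 36}\n' + FENCE
parsed = parse_json_reply(sample_reply, ["name", "age"])
```

The messages from `build_extraction_messages` can be passed as the `data` value of a request; how the reply text is read out of the response depends on the response format, which is not shown here.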

Why pick it for Norman AI?

Nemotron-3-Nano-4B is a strong choice for on-device or low-cost intelligence. Pick it if you need a model that can run locally on a laptop or a small server for simple automation, chat assistance, or data formatting, without a large GPU cluster.

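# Multi-turn conversation: system prompt, one earlier exchange, then the new user question.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving the equation 2x + 3 = 7?"},
]

# `norman` is assumed to be an already-initialized client; `await` must run inside an async function.
# model_name selects the model to run; use the identifier listed for the model you want.
response = await norman.invoke(
    {
        "model_name": "granite-4.0-micro",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)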
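
The call above uses `await`, so it has to run inside a coroutine. Below is a minimal sketch of one way to wrap it, assuming the client exposes an async `invoke` method that takes the request dict shown above. `StubClient`, `build_request`, and `ask` are hypothetical names introduced for illustration, the stub's return value is made up rather than the real response format, and `"your-model-name"` is a placeholder.

```python
import asyncio


def build_request(model_name, messages):
    """Assemble a request dict in the same shape as the example above."""
    return {
        "model_name": model_name,
        "inputs": [{"display_title": "Prompt", "data": messages}],
    }


class StubClient:
    """Hypothetical stand-in for the real client; it echoes the request back."""

    async def invoke(self, request):
        return {"echo": request}


async def ask(client, model_name, messages):
    # Works with any object that has an async invoke(request) method.
    return await client.invoke(build_request(model_name, messages))


messages = [{"role": "user", "content": "What about solving the equation 2x + 3 = 7?"}]

# asyncio.run starts an event loop, runs the coroutine, and returns its result.
result = asyncio.run(ask(StubClient(), "your-model-name", messages))
```

Swapping `StubClient()` for a real, configured client keeps the rest of the sketch unchanged, provided the client's `invoke` accepts the same request dict.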
