NVIDIA-Nemotron-3-Nano-4B (GGUF)

Text

Ultra-compact, high-efficiency text model optimized for local deployment and edge devices.

  • Lightweight powerhouse. A 4-billion-parameter model designed to deliver high-quality reasoning and text generation with a minimal hardware footprint.

  • GGUF Optimized. Provided in the GGUF format for universal compatibility; runs efficiently on CPUs and consumer GPUs via frameworks like llama.cpp or local AI runners.

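  • Mamba-Transformer Hybrid. Uses a hybrid architecture that interleaves Transformer attention layers with Mamba State Space Model (SSM) layers. Attention handles precise recall across the context, while the SSM layers process sequences in linear time, which can make it faster than pure-Transformer 4B models, especially on long inputs.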

  • Tool-calling ready. Despite its small size, it is specifically fine-tuned for structured tasks like JSON extraction, basic function calling, and following complex instructions.

  • Low latency. Built for fast responses. It is well suited to real-time applications where time-to-first-token must be kept as low as possible.
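
To make the tool-calling point concrete, here is a minimal sketch of how an application might handle a function call emitted by the model. It assumes the model replies with a JSON object of the form {"name": ..., "arguments": {...}}; that shape, the `get_weather` tool, and the `dispatch_tool_call` helper are all hypothetical and are not taken from the Nemotron or Norman documentation.

```python
import json
from typing import Any, Callable


# Hypothetical tool; the name and signature are illustrative only.
def get_weather(city: str) -> str:
    return f"Weather lookup requested for {city}"


# Registry mapping tool names the model may emit to local Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(raw_reply: str) -> Any:
    """Parse an assumed {"name": ..., "arguments": {...}} reply and run the tool."""
    try:
        call = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc

    if not isinstance(call, dict):
        raise ValueError("Tool call must be a JSON object")

    name = call.get("name")
    args = call.get("arguments", {})

    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("'arguments' must be a JSON object")

    # Only registered tools are ever executed, with keyword arguments
    # taken from the model's reply.
    return TOOLS[name](**args)
```

Validating the reply before dispatching matters more with small models, which are more likely to produce malformed JSON or a tool name that does not exist.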

Why pick it for Norman AI?

Nemotron-3-Nano 4B is a strong choice for on-device or low-cost intelligence. Pick this if you need a model that can run locally on a laptop or a small server for simple automation, chat assistance, or data formatting, without requiring a large GPU cluster.

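# Assumes `norman` is an already-initialized Norman client and that this
# snippet runs inside an async function (a bare `await` is not valid at
# the top level of a regular Python script).

# Conversation history: system prompt, one prior exchange, and the new user turn.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving the equation 2x + 3 = 7?"},
]

response = await norman.invoke(
    {
        # Replace with the identifier Norman AI lists for the model you want to call.
        "model_name": "granite-4.0-micro",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)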