Llama 4 Scout 💬 general · Meta · 109B (17B active MoE) · 10M ctx · Llama 4 Community. Largest open context window to date (10M tokens). MoE routing keeps only 17B params active, so inference is fast despite 109B total. Use cases: massive context (10M tokens), document analysis, multi-turn reasoning. Quantization: Q4_K_M 58GB (needs 24GB+) · Q8 109GB (needs 24GB+) · FP16 218GB (needs 24GB+). Recommended hardware: Mac Studio M4 Max 192GB (FP16) or RTX 4090 (Q4). Speed (M3 Pro 32GB): Q4 8 tok/s; MoE, 17B active. Install with Ollama: ollama pull llama4-scout. HuggingFace: meta-llama/Llama-4-Scout-109B-Instruct
Llama 4 Maverick 💬 general · Meta · 400B (17B active, 128 experts) · 1M ctx · Llama 4 Community. 128-expert MoE with only 17B active params; competes with GPT-4-class models. Requires multi-GPU or cloud. Use cases: top-tier reasoning, complex analysis, research. Quantization: Q4_K_M 200GB (needs 24GB+) · Q8 400GB (needs 24GB+) · FP16 800GB (needs 24GB+). Recommended hardware: multi-GPU cluster (not practical for edge). Speed (M3 Pro 32GB): Q4 3 tok/s; requires 200GB+. Install with Ollama: ollama pull llama4-maverick. HuggingFace: meta-llama/Llama-4-Maverick-400B-Instruct
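The MoE numbers on both Llama 4 cards translate directly into compute: a common rule of thumb is ~2 FLOPs per active parameter per generated token, so Scout decodes roughly like a 17B dense model despite its 109B memory footprint. A minimal sketch; the 2·N rule and the dense-70B comparison point are illustrative approximations, not benchmark figures:

```python
def decode_flops_per_token(active_params: float) -> float:
    # Rule of thumb: ~2 FLOPs per *active* parameter per generated token
    # (matrix-vector multiplies dominate autoregressive decoding).
    return 2 * active_params

scout = decode_flops_per_token(17e9)    # Llama 4 Scout: 17B active of 109B total
dense70 = decode_flops_per_token(70e9)  # hypothetical dense 70B for comparison
print(f"Scout per-token compute: {scout:.1e} FLOPs")
print(f"A dense 70B costs {dense70 / scout:.1f}x more compute per token")
```

This is why the cards quote usable tok/s for Scout on a single machine while still listing 58GB+ of weights to hold in memory.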
Gemma 3 4B 👁️ vision · Google · 4B · 128K ctx · Gemma License. Smallest multimodal model available: only 3GB at Q4, fits a Jetson Orin Nano. Well suited to vision tasks on minimal hardware. Use cases: lightweight multimodal, image understanding, edge deployment, 128K context. Quantization: Q4_K_M 3GB (needs 8GB) · Q8 4GB (needs 8GB) · FP16 8GB (needs 8GB). Recommended hardware: Jetson Orin Nano 8GB or Mac Mini M4 16GB. Speed (M3 Pro 32GB): Q4 65 tok/s; FP16 40 tok/s. Install with Ollama: ollama pull gemma3:4b. HuggingFace: google/gemma-3-4b-it
Gemma 3 27B 👁️ vision · Google · 27B (also 1B/4B/12B) · 128K ctx · Gemma License. First multimodal open model that fits on a Mac Mini; the 4B variant needs only 3GB. SigLIP vision encoder; 140+ languages. Use cases: multimodal (text + images), document scanning, invoice processing, 128K context window. Quantization: Q4_K_M 16GB (needs 16GB) · Q8 27GB (needs 24GB+) · FP16 54GB (needs 24GB+). Recommended hardware: 4B: any 8GB+ device; 27B: Mac Mini M4 Pro 32GB+. Speed (M3 Pro 32GB): Q4 14 tok/s; FP16 7 tok/s. Install with Ollama: ollama pull gemma3:27b. HuggingFace: google/gemma-3-27b-it
Gemma 4 27B 💬 general · Google · 27B · 128K ctx · Apache 2.0. Apache 2.0 licensed (fully open). Best quality per parameter in the 27B class; runs on a Mac Mini M4 24GB at Q4. Use cases: high-quality chat, instruction following, creative writing. Quantization: Q4_K_M 16GB (needs 16GB) · Q8 27GB (needs 24GB+) · FP16 54GB (needs 24GB+). Recommended hardware: Mac Mini M4 24GB (Q4) or RTX 4060 Ti 16GB (Q4). Speed (M3 Pro 32GB): Q4 14 tok/s; FP16 7 tok/s. Install with Ollama: ollama pull gemma4:27b. HuggingFace: google/gemma-4-27b
Gemma 4 E4B 💬 general · Google · 4B · 128K ctx · Apache 2.0. Tiny but punches above its weight; runs on a Jetson Orin Nano. Use cases: fast inference, edge deployment, mobile/IoT. Quantization: Q4_K_M 2.5GB (needs 8GB) · Q8 4GB (needs 8GB) · FP16 8GB (needs 8GB). Recommended hardware: Jetson Orin Nano 8GB (FP16) or any device. Speed (M3 Pro 32GB): Q4 58 tok/s; FP16 35 tok/s. Install with Ollama: ollama pull gemma4:e4b. HuggingFace: google/gemma-4-4b
Qwen 3 32B 💬 general · Alibaba · 32B · 128K ctx · Apache 2.0. Thinking mode (deep reasoning) plus a non-thinking mode. Among the best multilingual open models. Use cases: multilingual (29 languages), reasoning, tool use. Quantization: Q4_K_M 18GB (needs 24GB) · Q8 32GB (needs 24GB+) · FP16 64GB (needs 24GB+). Recommended hardware: Mac Mini M4 Pro 48GB (Q8) or RTX 4060 Ti (Q4). Speed (M3 Pro 32GB): Q4 12 tok/s; FP16 6 tok/s. Install with Ollama: ollama pull qwen3:32b. HuggingFace: Qwen/Qwen3-32B
Qwen 3 8B 💬 general · Alibaba · 8B · 128K ctx · Apache 2.0. Sweet spot for edge: strong quality at 8B, dual thinking modes, solid Spanish. Fits 8GB devices. Use cases: general purpose, Spanish language, fast reasoning. Quantization: Q4_K_M 5GB (needs 8GB) · Q8 8GB (needs 8GB) · FP16 16GB (needs 16GB). Recommended hardware: Jetson Orin Nano 8GB (Q8) or Mac Mini M4 (Q4). Speed (M3 Pro 32GB): Q4 45 tok/s; FP16 22 tok/s. Install with Ollama: ollama pull qwen3:8b. HuggingFace: Qwen/Qwen3-8B
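Every "Install with Ollama" command on these cards exposes the model through Ollama's local REST API (default port 11434, endpoint /api/generate). A minimal sketch of calling it from Python with only the standard library; the request is built but not sent here, since actually sending it assumes `ollama serve` is running and qwen3:8b has been pulled:

```python
import json
from urllib import request

def build_generate_request(model: str, prompt: str) -> request.Request:
    # Ollama's local server listens on http://localhost:11434 by default;
    # /api/generate takes a JSON body with model, prompt, and a stream flag.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("qwen3:8b", "Summarize the tradeoffs of edge AI.")
# To actually send it (requires a running Ollama server):
#   with request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
```

The same request shape works for any model in this catalog that Ollama serves; only the model tag changes.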
Phi-4 💬 general · Microsoft · 14B · 16K ctx · MIT. Best in class for STEM at 14B; MIT license; fits 8GB devices at Q4. Use cases: STEM reasoning, math, code generation. Quantization: Q4_K_M 8GB (needs 8GB) · Q8 14GB (needs 16GB) · FP16 28GB (needs 24GB+). Recommended hardware: Jetson Orin Nano 8GB (Q4) or Mac Mini M4 (Q8). Speed (M3 Pro 32GB): Q4 28 tok/s; FP16 14 tok/s. Install with Ollama: ollama pull phi4. HuggingFace: microsoft/phi-4
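The Q4_K_M / Q8 / FP16 sizes on these cards follow a simple pattern: roughly 2 bytes per parameter at FP16, 1 at Q8, and ~0.6 at Q4_K_M (about 4.8 bits per weight). A quick estimator, using Phi-4's 14B as the check; the factors are approximations, and real GGUF files vary slightly:

```python
# Approximate bytes per parameter for common GGUF quantizations (rough averages).
BYTES_PER_PARAM = {"fp16": 2.0, "q8_0": 1.0, "q4_k_m": 0.6}

def model_file_gb(params_billions: float, quant: str) -> float:
    # File size in GB ~= parameter count (billions) * bytes per parameter.
    return params_billions * BYTES_PER_PARAM[quant]

# Phi-4 (14B): the card's 28GB / 14GB / 8GB numbers line up with these factors.
print(model_file_gb(14, "fp16"))    # ~28 GB
print(model_file_gb(14, "q8_0"))    # ~14 GB
print(model_file_gb(14, "q4_k_m"))  # ~8.4 GB (card lists 8GB)
```

Add a couple of GB for KV cache and runtime overhead when checking whether a model fits your memory, which is why the minimum-memory column is always above the file size.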
DeepSeek V3 💬 general · DeepSeek · 671B (37B active MoE) · 128K ctx · DeepSeek License. Only 37B params active via MoE. Cloud-only for most users, but the API is very cheap ($0.27/1M input). Use cases: GPT-4-class reasoning, complex tasks, research. Quantization: Q4_K_M 336GB (needs 24GB+) · Q8 671GB (needs 24GB+) · FP16 1342GB (needs 24GB+). Recommended hardware: cloud API recommended ($0.27/$1.10 per 1M tokens). Speed (M3 Pro 32GB): Q4 5 tok/s; 671B MoE, needs multi-GPU. Install with Ollama: ollama pull deepseek-v3. HuggingFace: deepseek-ai/DeepSeek-V3
DeepSeek R1 💬 general · DeepSeek · 671B (MoE) / 7B-32B distills · 128K ctx · MIT. Best open-source reasoning model; the 14B distill fits a Mac Mini M4 16GB. Shows its thinking process explicitly. Use cases: chain-of-thought reasoning, math (97.3% MATH-500), code debugging, legal/financial analysis. Quantization (14B distill): Q4_K_M 10GB (needs 16GB) · Q8 14GB (needs 16GB) · FP16 28GB (needs 24GB+). Recommended hardware: 14B: Mac Mini M4 16GB; 32B: 32GB+ unified memory. Speed (M3 Pro 32GB): Q4 4 tok/s; the full 671B MoE needs multi-GPU. Install with Ollama: ollama pull deepseek-r1:14b. HuggingFace: deepseek-ai/DeepSeek-R1
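DeepSeek R1's "explicit thinking process" arrives as a `<think>...</think>` block emitted before the final answer, which you usually want to strip for end users and keep for debugging. A minimal parser; the sample output string is invented for illustration:

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    # DeepSeek R1 wraps its chain of thought in <think>...</think>
    # before the final answer; separate the two parts.
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if not match:
        return "", output.strip()
    thinking = match.group(1).strip()
    answer = output[match.end():].strip()
    return thinking, answer

raw = "<think>2+2: add the units digits.</think>The answer is 4."
thinking, answer = split_reasoning(raw)
```

Showing the `thinking` part in a collapsible panel while displaying only `answer` is a common UI pattern for reasoning models.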
Qwen3-Coder-Next 💻 coding · Alibaba · 80B (3B active MoE) · 256K ctx · Apache 2.0. Only 3B active params, so it is fast for its size; 256K context for huge codebases; >70% on SWE-Bench. Use cases: code generation (370 languages), SWE-Bench >70%, large repos. Quantization: Q4_K_M 42GB (needs 24GB+) · Q8 80GB (needs 24GB+) · FP16 160GB (needs 24GB+). Recommended hardware: Mac Mini M4 Pro 48GB (Q4) or RTX 4090 (Q4). Speed (M3 Pro 32GB): Q4 5 tok/s; needs 48GB+. Install with Ollama: ollama pull qwen3-coder-next. HuggingFace: Qwen/Qwen3-Coder-Next-80B
Qwen2.5-Coder 32B 💻 coding · Alibaba · 32B · 128K ctx · Apache 2.0. Best open coding model in the 32B class; strong at Python, JS, and Go. Use cases: code completion, debugging, refactoring. Quantization: Q4_K_M 18GB (needs 24GB) · Q8 32GB (needs 24GB+) · FP16 64GB (needs 24GB+). Recommended hardware: Mac Mini M4 Pro 48GB (Q8) or RTX 4060 Ti (Q4). Speed (M3 Pro 32GB): Q4 12 tok/s; FP16 6 tok/s. Install with Ollama: ollama pull qwen2.5-coder:32b. HuggingFace: Qwen/Qwen2.5-Coder-32B-Instruct
Qwen2.5-Coder 7B 💻 coding · Alibaba · 7B · 128K ctx · Apache 2.0. Best small coding model; fits on any device and is fast enough for real-time autocomplete. Use cases: fast code assist, autocomplete, edge coding. Quantization: Q4_K_M 4.5GB (needs 8GB) · Q8 7GB (needs 8GB) · FP16 14GB (needs 16GB). Recommended hardware: Jetson Orin Nano 8GB (Q8) or any device. Speed (M3 Pro 32GB): Q4 48 tok/s; FP16 27 tok/s. Install with Ollama: ollama pull qwen2.5-coder:7b. HuggingFace: Qwen/Qwen2.5-Coder-7B-Instruct
DeepSeek-Coder V2 💻 coding · DeepSeek · 236B (21B active MoE) · 128K ctx · DeepSeek License. MoE with 21B active params; near GPT-4 coding quality. Best used via API ($0.14/$0.28 per 1M tokens). Use cases: complex code tasks, multi-file edits, architecture. Quantization: Q4_K_M 120GB (needs 24GB+) · Q8 236GB (needs 24GB+) · FP16 472GB (needs 24GB+). Recommended hardware: cloud API recommended. Speed (M3 Pro 32GB): Q4 10 tok/s; 236B MoE. Install with Ollama: ollama pull deepseek-coder-v2. HuggingFace: deepseek-ai/DeepSeek-Coder-V2-Instruct
StarCoder2 15B 💻 coding · BigCode · 15B · 16K ctx · BigCode OpenRAIL-M. Trained on The Stack v2; great for autocomplete; supports 600+ programming languages. Use cases: code completion, fill-in-the-middle, 600+ languages. Quantization: Q4_K_M 9GB (needs 16GB) · Q8 15GB (needs 16GB) · FP16 30GB (needs 24GB+). Recommended hardware: Mac Mini M4 24GB (Q8) or Jetson 8GB (Q4). Speed (M3 Pro 32GB): Q4 22 tok/s; FP16 11 tok/s. Install with Ollama: ollama pull starcoder2:15b. HuggingFace: bigcode/starcoder2-15b
CodeLlama 34B 💻 coding · Meta · 34B · 16K ctx · Llama 2 Community. Proven and stable; strong at Python, C++, and Java. A good choice for production use where stability matters. Use cases: infill generation, Python specialist, large-codebase navigation. Quantization: Q4_K_M 19GB (needs 24GB) · Q8 34GB (needs 24GB+) · FP16 68GB (needs 24GB+). Recommended hardware: Mac Mini M4 Pro 48GB (Q8) or RTX 4060 Ti (Q4). Speed (M3 Pro 32GB): Q4 10 tok/s. Install with Ollama: ollama pull codellama:34b. HuggingFace: codellama/CodeLlama-34b-Instruct-hf
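The tok/s figures across these coding cards map directly to perceived latency: tokens requested divided by throughput. A quick sketch using the Q4 speeds listed above; 500 tokens is an invented stand-in for a typical code answer:

```python
def response_seconds(tokens: int, tok_per_s: float) -> float:
    # Wall-clock time to stream a full answer, ignoring prompt processing.
    return tokens / tok_per_s

# A 500-token completion at the Q4 speeds listed above (M3 Pro 32GB):
for name, speed in [("qwen2.5-coder:7b", 48),
                    ("qwen2.5-coder:32b", 12),
                    ("codellama:34b", 10)]:
    print(f"{name}: {response_seconds(500, speed):.0f}s")
```

This is the practical argument for the 7B coder in autocomplete loops: at 48 tok/s a full answer streams in about 10 seconds, versus nearly a minute at 10 tok/s.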
LLaVA 1.6 34B 👁️ vision · LLaVA Team · 34B · 4K ctx · Apache 2.0. Strong open vision model for image understanding: describes images, answers questions about photos, and reads documents. Use cases: image understanding, visual QA, document OCR. Quantization: Q4_K_M 19GB (needs 24GB) · Q8 34GB (needs 24GB+) · FP16 68GB (needs 24GB+). Recommended hardware: Mac Mini M4 Pro 48GB (Q8) or RTX 4090 (Q4). Speed (M3 Pro 32GB): Q4 10 tok/s. Install with Ollama: ollama pull llava:34b. HuggingFace: llava-hf/llava-v1.6-34b-hf
Gemma 3n E4B 👁️ vision · Google · 4B (edge-optimized) · 32K ctx · Apache 2.0. Edge-optimized multimodal; runs on phones and Jetson. Audio, video, image, and text, all in 4B params. Use cases: on-device vision; audio + video + image + text; mobile AI. Quantization: Q4_K_M 2.5GB (needs 8GB) · Q8 4GB (needs 8GB) · FP16 8GB (needs 8GB). Recommended hardware: Jetson Orin Nano 8GB or a phone. Speed (M3 Pro 32GB): Q4 55 tok/s; FP16 32 tok/s (Nano variant). Install with Ollama: ollama pull gemma3n:e4b. HuggingFace: google/gemma-3n-e4b
Qwen2.5-VL 72B 👁️ vision · Alibaba · 72B · 32K ctx · Apache 2.0. Understands video, multiple images, and documents; agent-ready with tool use. Use cases: video understanding, multi-image analysis, document parsing. Quantization: Q4_K_M 40GB (needs 24GB+) · Q8 72GB (needs 24GB+) · FP16 144GB (needs 24GB+). Recommended hardware: Mac Mini M4 Pro 48GB (Q4) or cloud. Speed (M3 Pro 32GB): Q4 5 tok/s; needs 48GB+. Install with Ollama: ollama pull qwen2.5-vl:72b. HuggingFace: Qwen/Qwen2.5-VL-72B-Instruct
InternVL 2.5 78B 👁️ vision · OpenGVLab · 78B · 32K ctx · MIT. Top-tier on chart- and diagram-understanding benchmarks; strong OCR. Use cases: chart/diagram understanding, OCR, visual reasoning. Quantization: Q4_K_M 42GB (needs 24GB+) · Q8 78GB (needs 24GB+) · FP16 156GB (needs 24GB+). Recommended hardware: cloud or multi-GPU. Speed (M3 Pro 32GB): Q4 4 tok/s; needs 48GB+. Install with Ollama: ollama pull internvl2.5:78b. HuggingFace: OpenGVLab/InternVL2.5-78B
Florence 2 Large 👁️ vision · Microsoft · 0.77B · 4K ctx · MIT. Tiny (0.77B) but excellent at computer-vision tasks; runs on any device. Well suited to edge vision. Use cases: object detection, image captioning, OCR, segmentation. Quantization: Q4_K_M 512MB (needs 8GB) · Q8 819MB (needs 8GB) · FP16 1.5GB (needs 8GB). Recommended hardware: any device (even a Raspberry Pi). Speed (M3 Pro 32GB): FP16 40 tok/s; light vision model. Install with Ollama: ollama pull florence2. HuggingFace: microsoft/Florence-2-large
Moondream 2 👁️ vision · Moondream · 1.8B · 2K ctx · Apache 2.0. Smallest practical vision model; runs on Jetson, Raspberry Pi, and phones. Great for IoT vision. Use cases: edge vision, image QA, lightweight multimodal. Quantization: Q4_K_M 1.1GB (needs 8GB) · Q8 1.8GB (needs 8GB) · FP16 3.6GB (needs 8GB). Recommended hardware: Jetson Orin Nano, RPi 5, any device. Speed (M3 Pro 32GB): FP16 65 tok/s; ultra-light vision. Install with Ollama: ollama pull moondream. HuggingFace: moondream/moondream2
all-MiniLM-L6-v2 🔗 embedding · Sentence Transformers · 22M · 512-token ctx · Apache 2.0. Industry standard for semantic search; 384 dimensions; very fast. We use this in the Vorlux AI RAG system. Use cases: semantic search, RAG retrieval, document similarity. Quantization: Q4_K_M 31MB (needs 8GB) · Q8 51MB (needs 8GB) · FP16 92MB (needs 8GB). Recommended hardware: any device (22M params = negligible VRAM). Speed (M3 Pro 32GB): FP16 500 tok/s; embedding model. Install with Ollama: ollama pull all-minilm:l6-v2. HuggingFace: sentence-transformers/all-MiniLM-L6-v2
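Semantic search with an embedding model reduces to cosine similarity between vectors. A toy sketch of the retrieval step; the 4-dimensional vectors and document names are invented stand-ins for real 384-dimensional MiniLM embeddings:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy 4-dim vectors standing in for real 384-dim MiniLM embeddings.
docs = {
    "invoice_2024.pdf": [0.9, 0.1, 0.0, 0.1],
    "holiday_photos":   [0.0, 0.8, 0.6, 0.0],
}
query = [0.8, 0.2, 0.0, 0.2]  # embedding of e.g. "find my billing documents"
best = max(docs, key=lambda d: cosine(query, docs[d]))
```

In production you would embed query and documents with the same model and use a vector index instead of a linear scan, but the ranking rule is exactly this.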
Nomic Embed Text v1.5 🔗 embedding · Nomic AI · 137M · 8K ctx · Apache 2.0. 8K context (16x MiniLM), so better for long documents; 768 dimensions. Use cases: long-document embedding, 8K-context chunks, search. Quantization: Q4_K_M 82MB (needs 8GB) · Q8 143MB (needs 8GB) · FP16 276MB (needs 8GB). Recommended hardware: any device. Speed (M3 Pro 32GB): FP16 350 tok/s; embedding model. Install with Ollama: ollama pull nomic-embed-text. HuggingFace: nomic-ai/nomic-embed-text-v1.5
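Even with an 8K-context embedder, documents still need to be split into chunks, typically with some overlap so sentences at a boundary land in both neighbors. A minimal word-based chunker; the word counts and overlap are illustrative defaults, and token-aware splitting would be more precise:

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 20):
    # Split text into overlapping word-count chunks. ~0.75 words/token means
    # 200 words stays well below even MiniLM's 512-token limit; an 8K-context
    # model like nomic-embed-text allows far larger chunks.
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]
    step = max_words - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + max_words]))
        if i + max_words >= len(words):
            break
    return chunks

demo = chunk_words(" ".join(str(i) for i in range(500)))
```

Larger chunks mean fewer vectors to store and search, at the cost of coarser retrieval granularity; that tradeoff is the main reason to pick an 8K-context embedder over a 512-token one.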
BGE Large v1.5 🔗 embedding · BAAI · 335M · 512-token ctx · MIT. Top MTEB benchmark scores; 1024 dimensions; best quality per compute for English. Use cases: high-quality retrieval, reranking, English focus. Quantization: Q4_K_M 184MB (needs 8GB) · Q8 348MB (needs 8GB) · FP16 686MB (needs 8GB). Recommended hardware: any device. Speed (M3 Pro 32GB): FP16 320 tok/s; embedding model. Install with Ollama: ollama pull bge-large. HuggingFace: BAAI/bge-large-en-v1.5
GTE Base 🔗 embedding · Alibaba DAMO · 109M · 512-token ctx · MIT. Good balance of speed and quality; 768 dimensions; well suited to production RAG. Use cases: balanced speed/quality, multilingual embedding, production RAG. Quantization: Q4_K_M 61MB (needs 8GB) · Q8 113MB (needs 8GB) · FP16 225MB (needs 8GB). Recommended hardware: any device. Speed (M3 Pro 32GB): FP16 400 tok/s; embedding model. Install with Ollama: ollama pull gte-base. HuggingFace: thenlper/gte-base
Arctic Embed L v2 🔗 embedding · Snowflake · 568M · 8K ctx · Apache 2.0. 8K context, 1024 dimensions, strong multilingual support; top MTEB scores. Use cases: enterprise search, long-context retrieval, multilingual. Quantization: Q4_K_M 307MB (needs 8GB) · Q8 584MB (needs 8GB) · FP16 1.1GB (needs 8GB). Recommended hardware: any device. Speed (M3 Pro 32GB): FP16 300 tok/s; embedding model. Install with Ollama: ollama pull snowflake-arctic-embed:l. HuggingFace: Snowflake/snowflake-arctic-embed-l-v2.0
FLUX.1 Dev 🎨 image gen · Black Forest Labs · 12B · FLUX.1-dev Non-Commercial. Best open image model; exceptional text rendering and prompt adherence. The NVFP4 variant runs on 8GB GPUs. Use cases: high-quality image generation, text rendering, prompt following. Quantization: Q4_K_M 7GB (needs 8GB) · Q8 12GB (needs 16GB) · FP16 24GB (needs 24GB). Recommended hardware: RTX 4060 Ti 16GB (FP8) or RTX 4090 (FP16). Speed: ~2 images/min on an RTX 3080. Use with ComfyUI (see workflows). HuggingFace: black-forest-labs/FLUX.1-dev
SDXL 1.0 🎨 image gen · Stability AI · 6.6B (2.6B UNet) · Stability AI Community. Massive LoRA/ControlNet ecosystem; fast on consumer GPUs; still the most flexible image model. Use cases: fast image generation, LoRA ecosystem, ControlNet. Quantization: Q4_K_M 2GB (needs 8GB) · Q8 3.5GB (needs 8GB) · FP16 6.5GB (needs 8GB). Recommended hardware: RTX 4060 Ti (FP16) or any 8GB+ GPU. Speed: ~6 images/min on an RTX 3080. Use with ComfyUI (see workflows). HuggingFace: stabilityai/stable-diffusion-xl-base-1.0
SDXL Turbo 🎨 image gen · Stability AI · 6.6B · Stability AI Community. Single-step generation for near-real-time images; great for interactive apps and demos. Use cases: real-time generation, 1-step inference, interactive. Quantization: Q4_K_M 2GB (needs 8GB) · Q8 3.5GB (needs 8GB) · FP16 6.5GB (needs 8GB). Recommended hardware: any 8GB+ GPU. Speed: ~15 images/min (RTX 3080), ~3/min (M3 Pro). Use with ComfyUI (see workflows). HuggingFace: stabilityai/sdxl-turbo
SD 3.5 Large 🎨 image gen · Stability AI · 8B · Stability AI Community. MMDiT architecture; better prompt understanding than SDXL; good text rendering. Use cases: photorealism, complex compositions, high resolution. Quantization: Q4_K_M 5GB (needs 8GB) · Q8 8GB (needs 8GB) · FP16 16GB (needs 16GB). Recommended hardware: RTX 4060 Ti 16GB or Mac Mini M4 24GB. Speed: ~4 images/min (RTX 3080), ~1/min (M3 Pro). Use with ComfyUI (see workflows). HuggingFace: stabilityai/stable-diffusion-3.5-large
Stable Cascade 🎨 image gen · Stability AI · 5.1B (3-stage) · Stability AI Non-Commercial. The 3-stage pipeline is more VRAM-efficient than SDXL; good quality at lower compute cost. Use cases: efficient generation, lower VRAM, fast training. Quantization: Q4_K_M 3GB (needs 8GB) · Q8 5GB (needs 8GB) · FP16 10GB (needs 16GB). Recommended hardware: any 6GB+ GPU. Speed: ~3 images/min (RTX 3080). Use with ComfyUI (see workflows). HuggingFace: stabilityai/stable-cascade
FLUX.1 Schnell 🎨 image gen · Black Forest Labs · 12B · Apache 2.0. Apache 2.0 licensed FLUX variant; 4-step generation (vs 50 for dev); commercial use OK. Use cases: fast FLUX generation, 4-step inference, commercial use. Quantization: Q4_K_M 7GB (needs 8GB) · Q8 12GB (needs 16GB) · FP16 24GB (needs 24GB). Recommended hardware: RTX 4060 Ti 16GB (FP8) or RTX 4090. Speed: ~8 images/min (RTX 3080), ~2/min (M3 Pro). Use with ComfyUI (see workflows). HuggingFace: black-forest-labs/FLUX.1-schnell
LTX Video 🎬 video gen · Lightricks · 2B · Apache 2.0. Fastest open video model; generates 5-second clips from text or images; NVFP4-optimized. Use cases: text-to-video, image-to-video, fast generation. Quantization: Q4_K_M 1.2GB (needs 8GB) · Q8 2GB (needs 8GB) · FP16 4GB (needs 8GB). Recommended hardware: RTX 4060 Ti (FP16) or any 8GB+ GPU. Speed: ~30s per 4s clip (RTX 3080). Use with ComfyUI (see workflows). HuggingFace: Lightricks/LTX-Video
Hunyuan Video 🎬 video gen · Tencent · 13B · Tencent Hunyuan Community. Best open video quality; longer clips than LTX; strong motion consistency. Needs more VRAM. Use cases: high-quality video, motion consistency, long clips. Quantization: Q4_K_M 8GB (needs 8GB) · Q8 13GB (needs 16GB) · FP16 26GB (needs 24GB+). Recommended hardware: RTX 4090 24GB or A100. Speed: ~60s per 4s clip (RTX 3080). Use with ComfyUI (see workflows). HuggingFace: tencent/HunyuanVideo
Cosmos 1.0 Diffusion 7B 🎬 video gen · NVIDIA · 7B · NVIDIA Open Model License. NVIDIA's world-generation model; understands physics; great for game development and simulation. Use cases: world generation, physical simulation, game content. Quantization: Q4_K_M 4.5GB (needs 8GB) · Q8 7GB (needs 8GB) · FP16 14GB (needs 16GB). Recommended hardware: RTX 4060 Ti 16GB or RTX 4090. Speed: ~45s per 4s clip (RTX 3080). Use with ComfyUI (see workflows). HuggingFace: NVIDIA/Cosmos-1.0-Diffusion-7B-Text2World
WAN 2.1 T2V 14B 🎬 video gen · WAN AI · 14B · Apache 2.0. Strong bilingual (Chinese-English) video generation; good motion and detail. Use cases: text-to-video, Chinese + English, high resolution. Quantization: Q4_K_M 8GB (needs 8GB) · Q8 14GB (needs 16GB) · FP16 28GB (needs 24GB+). Recommended hardware: RTX 4090 or A100. Speed: ~90s per 4s clip (RTX 3080). Use with ComfyUI (see workflows). HuggingFace: Wan-AI/Wan2.1-T2V-14B
Mochi 1 🎬 video gen · Genmo · 10B · Apache 2.0. Exceptional motion quality; natural human movement and physics. Use cases: high-fidelity motion, cinematic quality, natural movement. Quantization: Q4_K_M 6GB (needs 8GB) · Q8 10GB (needs 16GB) · FP16 20GB (needs 24GB). Recommended hardware: RTX 4090 or cloud. Speed: ~120s per 4s clip (A100). Use with ComfyUI (see workflows). HuggingFace: genmo/mochi-1-preview
Whisper Large v3 Turbo 🎵 audio · OpenAI · 809M · 30-second audio window · MIT. Industry standard for transcription; 99 languages; runs on any device. Use cases: speech-to-text, transcription, 99 languages. Quantization: Q4_K_M 512MB (needs 8GB) · Q8 819MB (needs 8GB) · FP16 1.6GB (needs 8GB). Recommended hardware: any device (809M params). Speed: ~30x real-time (M3 Pro), ~50x (RTX 3080). Install with Ollama: ollama pull whisper-large-v3-turbo. HuggingFace: openai/whisper-large-v3-turbo
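A "30x real-time" factor means 30 minutes of audio transcribed per minute of compute, so estimating batch transcription time is a single division:

```python
def transcription_minutes(audio_minutes: float, realtime_factor: float) -> float:
    # "30x real-time" = 30 minutes of audio per minute of compute.
    return audio_minutes / realtime_factor

hour_on_m3 = transcription_minutes(60, 30)    # ~2 minutes on an M3 Pro
hour_on_3080 = transcription_minutes(60, 50)  # ~1.2 minutes on an RTX 3080
```

At these factors, transcribing an archive of hundreds of hours is an overnight job on a single consumer machine rather than a cloud project.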
Bark 🎵 audio · Suno · 1.3B · MIT. Generates realistic speech, music, and sound effects from text; multilingual. Use cases: text-to-speech, voice cloning, sound effects. Quantization: Q4_K_M 819MB (needs 8GB) · Q8 1.3GB (needs 8GB) · FP16 2.6GB (needs 8GB). Recommended hardware: any device with 4GB+ RAM. Speed: ~0.5x real-time (RTX 3080); slow TTS. Install with Ollama: ollama pull bark. HuggingFace: suno/bark
MusicGen Large 🎵 audio · Meta · 3.3B · CC BY-NC 4.0. Generates music from text descriptions in multiple styles; great for content creators. Use cases: music generation, background music, audio branding. Quantization: Q4_K_M 2GB (needs 8GB) · Q8 3.3GB (needs 8GB) · FP16 6.6GB (needs 8GB). Recommended hardware: any device with 8GB+ RAM. Speed: ~2x real-time (RTX 3080). Install with Ollama: ollama pull musicgen-large. HuggingFace: facebook/musicgen-large
XTTS v2 🎵 audio · Coqui · 0.5B · Coqui Public License. Clones any voice from a 6-second sample; 17 languages including Spanish; real-time on CPU. Use cases: voice cloning, 17 languages, real-time TTS. Quantization: Q4_K_M 307MB (needs 8GB) · Q8 512MB (needs 8GB) · FP16 1GB (needs 8GB). Recommended hardware: any device (runs on CPU). Speed: ~3x real-time (RTX 3080); voice-cloning TTS. Install with Ollama: ollama pull xtts-v2. HuggingFace: coqui/XTTS-v2
Parler TTS Large 🎵 audio · Parler TTS · 2.3B · Apache 2.0. Controls the voice via a text description ("a warm female voice, clear, podcast style"). Use cases: controllable TTS, style description, podcast generation. Quantization: Q4_K_M 1.4GB (needs 8GB) · Q8 2.3GB (needs 8GB) · FP16 4.6GB (needs 8GB). Recommended hardware: any device with 4GB+ RAM. Speed: ~5x real-time (RTX 3080); natural TTS. Install with Ollama: ollama pull parler-tts. HuggingFace: parler-tts/parler-tts-large-v1
BioMistral 7B 🔬 specialized · BioMistral · 7B · 32K ctx · Apache 2.0. Fine-tuned on PubMed; medical QA and clinical text analysis. EU AI Act: high-risk category. Use cases: medical QA, clinical NLP, biomedical research. Quantization: Q4_K_M 4.5GB (needs 8GB) · Q8 7GB (needs 8GB) · FP16 14GB (needs 16GB). Recommended hardware: Jetson Orin Nano 8GB (Q4) or Mac Mini M4. Speed (M3 Pro 32GB): Q4 42 tok/s; medical-domain fine-tune. Install with Ollama: ollama pull biomistral. HuggingFace: BioMistral/BioMistral-7B
SaulLM 7B 🔬 specialized · Equall · 7B · 8K ctx · MIT. Trained on legal corpora; understands legal concepts, statutes, and case law. Use cases: legal analysis, contract review, case-law research. Quantization: Q4_K_M 4.5GB (needs 8GB) · Q8 7GB (needs 8GB) · FP16 14GB (needs 16GB). Recommended hardware: Jetson Orin Nano 8GB (Q4) or Mac Mini M4. Speed (M3 Pro 32GB): Q4 42 tok/s; legal-domain fine-tune. Install with Ollama: ollama pull saullm. HuggingFace: Equall/Saul-7B-Instruct-v1
DeepSeek-Math 7B 🔬 specialized · DeepSeek · 7B · 4K ctx · DeepSeek License. State-of-the-art math reasoning at 7B; solves complex problems step by step. Use cases: mathematical reasoning, theorem proving, calculations. Quantization: Q4_K_M 4.5GB (needs 8GB) · Q8 7GB (needs 8GB) · FP16 14GB (needs 16GB). Recommended hardware: Jetson Orin Nano 8GB (Q4) or any device. Speed (M3 Pro 32GB): Q4 45 tok/s; math-specialized. Install with Ollama: ollama pull deepseek-math. HuggingFace: deepseek-ai/DeepSeek-Math-7B-Instruct
Mistral 7B v0.3 🔬 specialized · Mistral AI · 7B · 32K ctx · Apache 2.0. Best 7B for structured tasks: function calling, JSON output, tool use. Fast and reliable. Use cases: function calling, structured output, RAG. Quantization: Q4_K_M 4.5GB (needs 8GB) · Q8 7GB (needs 8GB) · FP16 14GB (needs 16GB). Recommended hardware: Jetson Orin Nano 8GB (Q4) or any device. Speed (M3 Pro 32GB): Q4 44 tok/s; FP16 22 tok/s. Install with Ollama: ollama pull mistral:7b-instruct-v0.3. HuggingFace: mistralai/Mistral-7B-Instruct-v0.3
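Function calling with a local 7B is only as reliable as your JSON handling: models sometimes wrap the object in prose or code fences. A defensive parser sketch; the sample model output and the get_weather tool name are invented for illustration:

```python
import json

def parse_tool_call(model_output: str):
    # Models sometimes wrap JSON in prose or code fences; try to recover
    # the outermost {...} object before giving up.
    try:
        return json.loads(model_output)
    except json.JSONDecodeError:
        start, end = model_output.find("{"), model_output.rfind("}")
        if start != -1 and end > start:
            return json.loads(model_output[start:end + 1])
        raise

call = parse_tool_call('Sure! {"name": "get_weather", "arguments": {"city": "Madrid"}}')
```

A common follow-up is to re-prompt the model with the parse error when this still fails, rather than crashing the agent loop.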
Hermes 3 8B 🔬 specialized · Nous Research · 8B · 128K ctx · Llama 3.1 Community. Strong community fine-tune for agentic use; excels at following complex system prompts and using tools. Use cases: agentic tasks, tool use, system prompts. Quantization: Q4_K_M 5GB (needs 8GB) · Q8 8GB (needs 8GB) · FP16 16GB (needs 16GB). Recommended hardware: Jetson Orin Nano 8GB (Q8) or Mac Mini M4. Speed (M3 Pro 32GB): Q4 42 tok/s; FP16 20 tok/s. Install with Ollama: ollama pull hermes3:8b. HuggingFace: NousResearch/Hermes-3-Llama-3.1-8B
Finance-LLM 13B 🔬 specialized · TheBloke (quantized) · 13B · 4K ctx · Llama 2 Community. Fine-tuned on financial data; understands financial terminology, ratios, and market concepts. Use cases: financial analysis, report generation, market research. Quantization: Q4_K_M 7.5GB (needs 8GB) · Q8 13GB (needs 16GB) · FP16 26GB (needs 24GB+). Recommended hardware: Mac Mini M4 24GB (Q8) or RTX 4060 Ti (Q4). Speed (M3 Pro 32GB): Q4 22 tok/s; financial domain. Install with Ollama: ollama pull finance-llm:13b. HuggingFace: TheBloke/finance-LLM-13B-GGUF
Dolphin 2.9 8B 🔬 specialized · Cognitive Computations · 8B · 8K ctx · Llama 3 Community. Uncensored fine-tune with no alignment filtering; useful for creative tasks, red-teaming, and research. Use cases: uncensored assistant, creative writing, roleplay. Quantization: Q4_K_M 5GB (needs 8GB) · Q8 8GB (needs 8GB) · FP16 16GB (needs 16GB). Recommended hardware: Jetson Orin Nano 8GB (Q8) or any device. Speed (M3 Pro 32GB): Q4 42 tok/s; FP16 20 tok/s. Install with Ollama: ollama pull dolphin-llama3:8b. HuggingFace: cognitivecomputations/dolphin-2.9-llama3-8b
Llama Guard 3 🔬 specialized · Meta · 8B · 128K ctx · Llama 3.1 Community. Meta's safety classifier; run it alongside any model to filter harmful content. Essential for production deployments. Use cases: content moderation, safety classification, guardrails. Quantization: Q4_K_M 5GB (needs 8GB) · Q8 8GB (needs 8GB) · FP16 16GB (needs 16GB). Recommended hardware: Jetson Orin Nano 8GB (Q4) or any device. Speed (M3 Pro 32GB): Q4 40 tok/s; safety classifier. Install with Ollama: ollama pull llama-guard3:8b. HuggingFace: meta-llama/Llama-Guard-3-8B
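Llama Guard's reply is itself text you have to parse: the first line is "safe" or "unsafe", and unsafe verdicts are typically followed by a line of hazard-category codes (e.g. S1). A fail-closed parser sketch, treating anything unexpected as unsafe; the exact output format should be checked against the model card for the version you deploy:

```python
def parse_guard_verdict(output: str):
    # Llama Guard replies "safe", or "unsafe" followed by a line of
    # category codes (e.g. "S1"); treat anything unexpected as unsafe.
    lines = output.strip().splitlines()
    if not lines or lines[0].strip().lower() != "safe":
        categories = lines[1].strip().split(",") if len(lines) > 1 else []
        return False, categories
    return True, []

ok, cats = parse_guard_verdict("unsafe\nS1,S10")
```

Failing closed matters here: a malformed classifier response should block the message, not let it through.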
OCRonos Vintage 🔬 specialized · PleIAs · 7B · 8K ctx · Apache 2.0. Specialized in correcting OCR errors in historical documents; well suited to digitization projects. Use cases: historical OCR correction, document digitization, archive processing. Quantization: Q4_K_M 4.5GB (needs 8GB) · Q8 7GB (needs 8GB) · FP16 14GB (needs 16GB). Recommended hardware: Jetson Orin Nano 8GB (Q4) or any device. Speed: ~5 pages/min (M3 Pro); OCR model. Install with Ollama: ollama pull ocronos. HuggingFace: PleIAs/OCRonos-Vintage