AI Models

Our selection of the best models

53 open-source models, curated and tested for local deployment. Each with its use case, VRAM requirements, and installation guide.

Source: Hugging Face, Ollama. Updated April 2026.

Provider-agnostic: models from Anthropic, OpenAI, Google, Meta, Mistral, and the open-source community.

53 models

Llama 4 Scout

💬 general

Meta · 109B (17B active MoE) · 10M ctx · Llama 4 Community

Largest open context window ever (10M tokens). MoE = only 17B params active, so inference is fast despite 109B total.

Massive context (10M tokens) · Document analysis · Multi-turn reasoning · ⚡ 8 tok/s
Q4_K_M: 58GB (24GB+) · Q8: 109GB (24GB+) · FP16: 218GB (24GB+)
Recommended hardware: Mac Studio M4 Max 192GB (FP16) or RTX 4090 (Q4)
Speed (M3 Pro 32GB): Q4: 8 tok/s · MoE: 17B active
Install with Ollama: ollama pull llama4-scout
meta-llama/Llama-4-Scout-109B-Instruct

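The MoE trade-off in the card above (all 109B parameters must sit in memory, but only 17B are active per token) can be made concrete with a rough back-of-the-envelope sketch; the numbers are illustrative sizing estimates, not benchmarks:

```python
def moe_footprint(total_params_b: float, active_params_b: float, bits: int = 4):
    """Rough MoE sizing: memory scales with TOTAL parameters,
    while per-token compute scales with ACTIVE parameters."""
    mem_gb = total_params_b * bits / 8           # weights only, ignores KV cache
    compute_ratio = active_params_b / total_params_b
    return mem_gb, compute_ratio

# Llama 4 Scout at ~4 bits/weight (real Q4_K_M averages slightly more).
mem, ratio = moe_footprint(109, 17, bits=4)
print(f"~{mem:.0f}GB of weights, but each token activates only {ratio:.0%} of them")
```

This is why a 109B MoE can decode faster than a dense 70B model while still needing far more memory than one.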
Llama 4 Maverick

💬 general

Meta · 400B (17B active, 128 experts) · 1M ctx · Llama 4 Community

128-expert MoE with only 17B active. Competes with GPT-4 class. Requires multi-GPU or cloud.

Top-tier reasoning · Complex analysis · Research · ⚡ 3 tok/s
Q4_K_M: 200GB (24GB+) · Q8: 400GB (24GB+) · FP16: 800GB (24GB+)
Recommended hardware: Multi-GPU cluster (not practical for edge)
Speed (M3 Pro 32GB): Q4: 3 tok/s · Requires 200GB+
Install with Ollama: ollama pull llama4-maverick
meta-llama/Llama-4-Maverick-400B-Instruct

Gemma 3 4B

👁️ vision

Google · 4B · 128K ctx · Gemma License

Smallest multimodal model available. Only 3GB at Q4. Fits Jetson Orin Nano. Perfect for vision tasks on minimal hardware.

Lightweight multimodal · Image understanding · Edge deployment · 128K context · ⚡ 65 tok/s
Q4_K_M: 3GB (8GB) · Q8: 4GB (8GB) · FP16: 8GB (8GB)
Recommended hardware: Jetson Orin Nano 8GB | Mac Mini M4 16GB
Speed (M3 Pro 32GB): Q4: 65 tok/s · FP16: 40 tok/s
Install with Ollama: ollama pull gemma3:4b
google/gemma-3-4b-it

Gemma 3 27B

👁️ vision

Google · 27B (also 1B/4B/12B) · 128K ctx · Gemma License

First multimodal open model that fits on a Mac Mini. 4B variant needs only 3GB. SigLIP vision encoder. 140+ languages.

Multimodal (text + images) · Document scanning · Invoice processing · 128K context window · ⚡ 14 tok/s
Q4_K_M: 16GB (16GB) · Q8: 27GB (24GB+) · FP16: 54GB (24GB+)
Recommended hardware: 4B: any 8GB+ device | 27B: Mac Mini M4 Pro 32GB+
Speed (M3 Pro 32GB): Q4: 14 tok/s · FP16: 7 tok/s
Install with Ollama: ollama pull gemma3:27b
google/gemma-3-27b-it

Gemma 4 27B

💬 general

Google · 27B · 128K ctx · Apache 2.0

Apache 2.0 license (fully open). Best quality-per-param in the 27B class. Runs on Mac Mini M4 24GB at Q4.

High-quality chat · Instruction following · Creative writing · ⚡ 14 tok/s
Q4_K_M: 16GB (16GB) · Q8: 27GB (24GB+) · FP16: 54GB (24GB+)
Recommended hardware: Mac Mini M4 24GB (Q4) or RTX 4060 Ti 16GB (Q4)
Speed (M3 Pro 32GB): Q4: 14 tok/s · FP16: 7 tok/s
Install with Ollama: ollama pull gemma4:27b
google/gemma-4-27b

Gemma 4 E4B

💬 general

Google · 4B · 128K ctx · Apache 2.0

Tiny but punches above its weight. Runs on Jetson Orin Nano. Apache 2.0.

Fast inference · Edge deployment · Mobile/IoT · ⚡ 58 tok/s
Q4_K_M: 2.5GB (8GB) · Q8: 4GB (8GB) · FP16: 8GB (8GB)
Recommended hardware: Jetson Orin Nano 8GB (FP16) or any device
Speed (M3 Pro 32GB): Q4: 58 tok/s · FP16: 35 tok/s
Install with Ollama: ollama pull gemma4:e4b
google/gemma-4-4b

Qwen 3 32B

💬 general

Alibaba · 32B · 128K ctx · Apache 2.0

Thinking mode (deep reasoning) + non-thinking mode. Best multilingual open model. Apache 2.0.

Multilingual (29 languages) · Reasoning · Tool use · ⚡ 12 tok/s
Q4_K_M: 18GB (24GB) · Q8: 32GB (24GB+) · FP16: 64GB (24GB+)
Recommended hardware: Mac Mini M4 Pro 48GB (Q8) or RTX 4060 Ti (Q4)
Speed (M3 Pro 32GB): Q4: 12 tok/s · FP16: 6 tok/s
Install with Ollama: ollama pull qwen3:32b
Qwen/Qwen3-32B

Qwen 3 8B

💬 general

Alibaba · 8B · 128K ctx · Apache 2.0

Sweet spot for edge: great quality at 8B, dual thinking modes, strong Spanish. Fits on 8GB devices.

General purpose · Spanish language · Fast reasoning · ⚡ 45 tok/s
Q4_K_M: 5GB (8GB) · Q8: 8GB (8GB) · FP16: 16GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q8) or Mac Mini M4 (Q4)
Speed (M3 Pro 32GB): Q4: 45 tok/s · FP16: 22 tok/s
Install with Ollama: ollama pull qwen3:8b
Qwen/Qwen3-8B

Phi-4

💬 general

Microsoft · 14B · 16K ctx · MIT

Best-in-class for STEM at 14B. MIT license. Fits on 8GB devices at Q4.

STEM reasoning · Math · Code generation · ⚡ 28 tok/s
Q4_K_M: 8GB (8GB) · Q8: 14GB (16GB) · FP16: 28GB (24GB+)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or Mac Mini M4 (Q8)
Speed (M3 Pro 32GB): Q4: 28 tok/s · FP16: 14 tok/s
Install with Ollama: ollama pull phi4
microsoft/phi-4

DeepSeek V3

💬 general

DeepSeek · 671B (37B active MoE) · 128K ctx · DeepSeek License

Only 37B active params via MoE. Cloud-only for most users but API is very cheap ($0.27/1M input).

GPT-4 class reasoning · Complex tasks · Research · ⚡ 5 tok/s
Q4_K_M: 336GB (24GB+) · Q8: 671GB (24GB+) · FP16: 1342GB (24GB+)
Recommended hardware: Cloud API recommended ($0.27/$1.10 per 1M tokens)
Speed (M3 Pro 32GB): Q4: 5 tok/s · 671B MoE, needs multi-GPU
Install with Ollama: ollama pull deepseek-v3
deepseek-ai/DeepSeek-V3

DeepSeek R1

💬 general

DeepSeek · 671B (MoE) / 7B-32B distilled · 128K ctx · MIT

Best open-source reasoning model. 14B distill fits Mac Mini M4 16GB. Shows explicit thinking process. MIT license.

Chain-of-thought reasoning · Math (97.3% MATH-500) · Code debugging · Legal/financial analysis · ⚡ 4 tok/s
Q4_K_M: 10GB (16GB) · Q8: 14GB (16GB) · FP16: 28GB (24GB+)
Recommended hardware: 14B: Mac Mini M4 16GB | 32B: 32GB+ unified memory
Speed (M3 Pro 32GB): Q4: 4 tok/s · full 671B MoE needs multi-GPU
Install with Ollama: ollama pull deepseek-r1:14b
deepseek-ai/DeepSeek-R1

Qwen3-Coder-Next

💻 coding

Alibaba · 80B (3B active MoE) · 256K ctx · Apache 2.0

Only 3B active params — blazing fast. 256K context for huge codebases. >70% SWE-Bench. Apache 2.0.

Code generation (370 languages) · SWE-Bench >70% · Large repos · ⚡ 5 tok/s
Q4_K_M: 42GB (24GB+) · Q8: 80GB (24GB+) · FP16: 160GB (24GB+)
Recommended hardware: Mac Mini M4 Pro 48GB (Q4) or RTX 4090 (Q4)
Speed (M3 Pro 32GB): Q4: 5 tok/s · Needs 48GB+
Install with Ollama: ollama pull qwen3-coder-next
Qwen/Qwen3-Coder-Next-80B

Qwen2.5-Coder 32B

💻 coding

Alibaba · 32B · 128K ctx · Apache 2.0

Best open coding model at 32B class. Strong at Python, JS, Go. Apache 2.0.

Code completion · Debugging · Refactoring · ⚡ 12 tok/s
Q4_K_M: 18GB (24GB) · Q8: 32GB (24GB+) · FP16: 64GB (24GB+)
Recommended hardware: Mac Mini M4 Pro 48GB (Q8) or RTX 4060 Ti (Q4)
Speed (M3 Pro 32GB): Q4: 12 tok/s · FP16: 6 tok/s
Install with Ollama: ollama pull qwen2.5-coder:32b
Qwen/Qwen2.5-Coder-32B-Instruct

Qwen2.5-Coder 7B

💻 coding

Alibaba · 7B · 128K ctx · Apache 2.0

Best small coding model. Fits on any device. Fast enough for real-time autocomplete.

Fast code assist · Autocomplete · Edge coding · ⚡ 48 tok/s
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q8) or any device
Speed (M3 Pro 32GB): Q4: 48 tok/s · FP16: 27 tok/s
Install with Ollama: ollama pull qwen2.5-coder:7b
Qwen/Qwen2.5-Coder-7B-Instruct

DeepSeek-Coder V2

💻 coding

DeepSeek · 236B (21B active MoE) · 128K ctx · DeepSeek License

MoE with 21B active. Near GPT-4 coding quality. Best via API ($0.14/$0.28 per 1M).

Complex code tasks · Multi-file edits · Architecture · ⚡ 10 tok/s
Q4_K_M: 120GB (24GB+) · Q8: 236GB (24GB+) · FP16: 472GB (24GB+)
Recommended hardware: Cloud API recommended
Speed (M3 Pro 32GB): Q4: 10 tok/s · 236B MoE
Install with Ollama: ollama pull deepseek-coder-v2
deepseek-ai/DeepSeek-Coder-V2-Instruct

StarCoder2 15B

💻 coding

BigCode · 15B · 16K ctx · BigCode OpenRAIL-M

Trained on The Stack v2. Great for autocomplete. Supports 600+ programming languages.

Code completion · Fill-in-the-middle · 600+ languages · ⚡ 22 tok/s
Q4_K_M: 9GB (16GB) · Q8: 15GB (16GB) · FP16: 30GB (24GB+)
Recommended hardware: Mac Mini M4 24GB (Q8) or Jetson 8GB (Q4)
Speed (M3 Pro 32GB): Q4: 22 tok/s · FP16: 11 tok/s
Install with Ollama: ollama pull starcoder2:15b
bigcode/starcoder2-15b

CodeLlama 34B

💻 coding

Meta · 34B · 16K ctx · Llama 2 Community

Proven and stable. Strong at Python, C++, Java. Good for production use where stability matters.

Infill generation · Python specialist · Large codebase nav · ⚡ 10 tok/s
Q4_K_M: 19GB (24GB) · Q8: 34GB (24GB+) · FP16: 68GB (24GB+)
Recommended hardware: Mac Mini M4 Pro 48GB (Q8) or RTX 4060 Ti (Q4)
Speed (M3 Pro 32GB): Q4: 10 tok/s
Install with Ollama: ollama pull codellama:34b
codellama/CodeLlama-34b-Instruct-hf

LLaVA 1.6 34B

👁️ vision

LLaVA Team · 34B · 4K ctx · Apache 2.0

Best open vision model for image understanding. Can describe images, answer questions about photos, read documents.

Image understanding · Visual QA · Document OCR · ⚡ 10 tok/s
Q4_K_M: 19GB (24GB) · Q8: 34GB (24GB+) · FP16: 68GB (24GB+)
Recommended hardware: Mac Mini M4 Pro 48GB (Q8) or RTX 4090 (Q4)
Speed (M3 Pro 32GB): Q4: 10 tok/s
Install with Ollama: ollama pull llava:34b
llava-hf/llava-v1.6-34b-hf

Gemma 3n E4B

👁️ vision

Google · 4B (edge-optimized) · 32K ctx · Apache 2.0

Edge-optimized multimodal. Runs on phones and Jetson. Audio, video, image, text — all in 4B params.

On-device vision · Audio+Video+Image+Text · Mobile AI · ⚡ 55 tok/s
Q4_K_M: 2.5GB (8GB) · Q8: 4GB (8GB) · FP16: 8GB (8GB)
Recommended hardware: Jetson Orin Nano 8GB or phone
Speed (M3 Pro 32GB): Q4: 55 tok/s · FP16: 32 tok/s (nano variant)
Install with Ollama: ollama pull gemma3n:e4b
google/gemma-3n-e4b

Qwen2.5-VL 72B

👁️ vision

Alibaba · 72B · 32K ctx · Apache 2.0

Understands video, multiple images, and documents. Agent-ready with tool use. Apache 2.0.

Video understanding · Multi-image analysis · Document parsing · ⚡ 5 tok/s
Q4_K_M: 40GB (24GB+) · Q8: 72GB (24GB+) · FP16: 144GB (24GB+)
Recommended hardware: Mac Mini M4 Pro 48GB (Q4) or cloud
Speed (M3 Pro 32GB): Q4: 5 tok/s · Needs 48GB+
Install with Ollama: ollama pull qwen2.5-vl:72b
Qwen/Qwen2.5-VL-72B-Instruct

InternVL 2.5 78B

👁️ vision

OpenGVLab · 78B · 32K ctx · MIT

Top-tier on chart/diagram understanding benchmarks. MIT license. Strong OCR.

Chart/diagram understanding · OCR · Visual reasoning · ⚡ 4 tok/s
Q4_K_M: 42GB (24GB+) · Q8: 78GB (24GB+) · FP16: 156GB (24GB+)
Recommended hardware: Cloud or multi-GPU
Speed (M3 Pro 32GB): Q4: 4 tok/s · Needs 48GB+
Install with Ollama: ollama pull internvl2.5:78b
OpenGVLab/InternVL2.5-78B

Florence 2 Large

👁️ vision

Microsoft · 0.77B · 4K ctx · MIT

Tiny (0.77B) but excellent at CV tasks. Runs on any device. MIT license. Perfect for edge vision.

Object detection · Image captioning · OCR · Segmentation
Q4_K_M: 512MB (8GB) · Q8: 819MB (8GB) · FP16: 1.5GB (8GB)
Recommended hardware: Any device (even Raspberry Pi)
Speed (M3 Pro 32GB): FP16: 40 tok/s · Light vision model
Install with Ollama: ollama pull florence2
microsoft/Florence-2-large

Moondream 2

👁️ vision

Moondream · 1.8B · 2K ctx · Apache 2.0

Smallest practical vision model. Runs on Jetson, RPi, phones. Apache 2.0. Great for IoT vision.

Edge vision · Image QA · Lightweight multimodal
Q4_K_M: 1.1GB (8GB) · Q8: 1.8GB (8GB) · FP16: 3.6GB (8GB)
Recommended hardware: Jetson Orin Nano, RPi 5, any device
Speed (M3 Pro 32GB): FP16: 65 tok/s · Ultra-light vision
Install with Ollama: ollama pull moondream
moondream/moondream2

all-MiniLM-L6-v2

🔗 embedding

Sentence Transformers · 22M · 512-token ctx · Apache 2.0

Industry standard for semantic search. 384 dimensions. Blazing fast. We use this in Vorlux AI RAG system.

Semantic search · RAG retrieval · Document similarity
Q4_K_M: 31MB (8GB) · Q8: 51MB (8GB) · FP16: 92MB (8GB)
Recommended hardware: Any device (22M params = negligible VRAM)
Speed (M3 Pro 32GB): FP16: 500 tok/s · Embedding
Install with Ollama: ollama pull all-minilm:l6-v2
sentence-transformers/all-MiniLM-L6-v2

Nomic Embed Text v1.5

🔗 embedding

Nomic AI · 137M · 8K ctx · Apache 2.0

8K context (16x MiniLM). Better for long documents. 768 dimensions. Apache 2.0.

Long document embedding · 8K context chunks · Search
Q4_K_M: 82MB (8GB) · Q8: 143MB (8GB) · FP16: 276MB (8GB)
Recommended hardware: Any device
Speed (M3 Pro 32GB): FP16: 350 tok/s · Embedding
Install with Ollama: ollama pull nomic-embed-text
nomic-ai/nomic-embed-text-v1.5

BGE Large v1.5

🔗 embedding

BAAI · 335M · 512-token ctx · MIT

Top MTEB benchmark scores. 1024 dimensions. MIT license. Best quality per compute for English.

High-quality retrieval · Reranking · English focus
Q4_K_M: 184MB (8GB) · Q8: 348MB (8GB) · FP16: 686MB (8GB)
Recommended hardware: Any device
Speed (M3 Pro 32GB): FP16: 320 tok/s · Embedding
Install with Ollama: ollama pull bge-large
BAAI/bge-large-en-v1.5

GTE Base

🔗 embedding

Alibaba DAMO · 109M · 512-token ctx · MIT

Great balance of speed and quality. 768 dimensions. MIT license. Good for production RAG.

Balanced speed/quality · Multilingual embedding · Production RAG
Q4_K_M: 61MB (8GB) · Q8: 113MB (8GB) · FP16: 225MB (8GB)
Recommended hardware: Any device
Speed (M3 Pro 32GB): FP16: 400 tok/s · Embedding
Install with Ollama: ollama pull gte-base
thenlper/gte-base

Arctic Embed L v2

🔗 embedding

Snowflake · 568M · 8K ctx · Apache 2.0

8K context, 1024 dimensions, strong multilingual. Top MTEB scores. Apache 2.0.

Enterprise search · Long-context retrieval · Multilingual
Q4_K_M: 307MB (8GB) · Q8: 584MB (8GB) · FP16: 1.1GB (8GB)
Recommended hardware: Any device
Speed (M3 Pro 32GB): FP16: 300 tok/s · Embedding
Install with Ollama: ollama pull snowflake-arctic-embed:l
Snowflake/snowflake-arctic-embed-l-v2.0

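All five embedding models above emit fixed-length vectors that are compared the same way, typically with cosine similarity. A minimal pure-Python sketch; the 3-dimensional vectors here are toy values, whereas real models like all-MiniLM-L6-v2 emit 384 to 1024 floats:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": a query and two candidate documents.
query = [0.1, 0.9, 0.2]
doc_a = [0.1, 0.8, 0.3]   # semantically close
doc_b = [0.9, 0.1, 0.0]   # unrelated
print(cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b))  # True
```

In a RAG pipeline, every chunk is embedded once at index time and the query vector is compared against all chunk vectors at retrieval time; the top-k scores decide which chunks reach the LLM.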
FLUX.1 Dev

🎨 image gen

Black Forest Labs · 12B · — ctx · FLUX.1-dev Non-Commercial

Best open image model. Exceptional text rendering and prompt adherence. NVFP4 variant runs on 8GB GPUs.

High-quality image generation · Text rendering · Prompt following
Q4_K_M: 7GB (8GB) · Q8: 12GB (16GB) · FP16: 24GB (24GB)
Recommended hardware: RTX 4060 Ti 16GB (FP8) or RTX 4090 (FP16)
Speed: ~2 images/min on RTX 3080
Use with ComfyUI → See workflows
black-forest-labs/FLUX.1-dev

SDXL 1.0

🎨 image gen

Stability AI · 6.6B (2.6B UNet) · — ctx · Stability AI Community

Massive LoRA/ControlNet ecosystem. Fast on consumer GPUs. Still the most flexible image model.

Fast image gen · LoRA ecosystem · ControlNet
Q4_K_M: 2GB (8GB) · Q8: 3.5GB (8GB) · FP16: 6.5GB (8GB)
Recommended hardware: RTX 4060 Ti (FP16) or any 8GB+ GPU
Speed: ~6 images/min on RTX 3080
Use with ComfyUI → See workflows
stabilityai/stable-diffusion-xl-base-1.0

SDXL Turbo

🎨 image gen

Stability AI · 6.6B · — ctx · Stability AI Community

Single-step generation — near real-time images. Great for interactive apps and demos.

Real-time generation · 1-step inference · Interactive
Q4_K_M: 2GB (8GB) · Q8: 3.5GB (8GB) · FP16: 6.5GB (8GB)
Recommended hardware: Any 8GB+ GPU
Speed: ~15 images/min (RTX 3080), ~3/min (M3 Pro)
Use with ComfyUI → See workflows
stabilityai/sdxl-turbo

SD 3.5 Large

🎨 image gen

Stability AI · 8B · — ctx · Stability AI Community

MMDiT architecture. Better prompt understanding than SDXL. Good text rendering.

Photorealism · Complex compositions · High resolution
Q4_K_M: 5GB (8GB) · Q8: 8GB (8GB) · FP16: 16GB (16GB)
Recommended hardware: RTX 4060 Ti 16GB or Mac Mini M4 24GB
Speed: ~4 images/min (RTX 3080), ~1/min (M3 Pro)
Use with ComfyUI → See workflows
stabilityai/stable-diffusion-3.5-large

Stable Cascade

🎨 image gen

Stability AI · 5.1B (3-stage) · — ctx · Stability AI Non-Commercial

3-stage pipeline is more VRAM efficient than SDXL. Good quality at lower compute cost.

Efficient generation · Lower VRAM · Fast training
Q4_K_M: 3GB (8GB) · Q8: 5GB (8GB) · FP16: 10GB (16GB)
Recommended hardware: Any 6GB+ GPU
Speed: ~3 images/min (RTX 3080)
Use with ComfyUI → See workflows
stabilityai/stable-cascade

FLUX.1 Schnell

🎨 image gen

Black Forest Labs · 12B · — ctx · Apache 2.0

Apache 2.0 licensed FLUX variant. 4-step generation (vs 50 for dev). Commercial use OK.

Fast FLUX generation · 4-step inference · Commercial use
Q4_K_M: 7GB (8GB) · Q8: 12GB (16GB) · FP16: 24GB (24GB)
Recommended hardware: RTX 4060 Ti 16GB (FP8) or RTX 4090
Speed: ~8 images/min (RTX 3080), ~2/min (M3 Pro)
Use with ComfyUI → See workflows
black-forest-labs/FLUX.1-schnell

LTX Video

🎬 video gen

Lightricks · 2B · — ctx · Apache 2.0

Fastest open video model. Generates 5-second clips from text or images. NVFP4 optimized. Apache 2.0.

Text-to-video · Image-to-video · Fast generation
Q4_K_M: 1.2GB (8GB) · Q8: 2GB (8GB) · FP16: 4GB (8GB)
Recommended hardware: RTX 4060 Ti (FP16) or any 8GB+ GPU
Speed: ~30s per 4s clip (RTX 3080)
Use with ComfyUI → See workflows
Lightricks/LTX-Video

Hunyuan Video

🎬 video gen

Tencent · 13B · — ctx · Tencent Hunyuan Community

Best open video quality. Longer clips than LTX. Strong motion consistency. Needs more VRAM.

High-quality video · Motion consistency · Long clips
Q4_K_M: 8GB (8GB) · Q8: 13GB (16GB) · FP16: 26GB (24GB+)
Recommended hardware: RTX 4090 24GB or A100
Speed: ~60s per 4s clip (RTX 3080)
Use with ComfyUI → See workflows
tencent/HunyuanVideo

Cosmos 1.0 Diffusion 7B

🎬 video gen

NVIDIA · 7B · — ctx · NVIDIA Open Model License

NVIDIA's world generation model. Understands physics. Great for game dev and simulation.

World generation · Physical simulation · Game content
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: RTX 4060 Ti 16GB or RTX 4090
Speed: ~45s per 4s clip (RTX 3080)
Use with ComfyUI → See workflows
NVIDIA/Cosmos-1.0-Diffusion-7B-Text2World

WAN 2.1 T2V 14B

🎬 video gen

WAN AI · 14B · — ctx · Apache 2.0

Strong Chinese-English bilingual video generation. Apache 2.0. Good motion and detail.

Text-to-video · Chinese + English · High resolution
Q4_K_M: 8GB (8GB) · Q8: 14GB (16GB) · FP16: 28GB (24GB+)
Recommended hardware: RTX 4090 or A100
Speed: ~90s per 4s clip (RTX 3080)
Use with ComfyUI → See workflows
Wan-AI/Wan2.1-T2V-14B

Mochi 1

🎬 video gen

Genmo · 10B · — ctx · Apache 2.0

Exceptional motion quality. Apache 2.0. Natural human movement and physics.

High-fidelity motion · Cinematic quality · Natural movement
Q4_K_M: 6GB (8GB) · Q8: 10GB (16GB) · FP16: 20GB (24GB)
Recommended hardware: RTX 4090 or cloud
Speed: ~120s per 4s clip (A100)
Use with ComfyUI → See workflows
genmo/mochi-1-preview

Whisper Large v3 Turbo

🎵 audio

OpenAI · 809M · 30s audio windows · MIT

Industry standard for transcription. 99 languages. MIT license. Runs on any device.

Speech-to-text · Transcription · 99 languages
Q4_K_M: 512MB (8GB) · Q8: 819MB (8GB) · FP16: 1.6GB (8GB)
Recommended hardware: Any device (809M params)
Speed: ~30x real-time (M3 Pro), ~50x (RTX 3080)
Install with Ollama: ollama pull whisper-large-v3-turbo
openai/whisper-large-v3-turbo

Bark

🎵 audio

Suno · 1.3B · — ctx · MIT

Generates realistic speech, music, and sound effects from text. MIT license. Multilingual.

Text-to-speech · Voice cloning · Sound effects
Q4_K_M: 819MB (8GB) · Q8: 1.3GB (8GB) · FP16: 2.6GB (8GB)
Recommended hardware: Any device with 4GB+ RAM
Speed: ~0.5x real-time (RTX 3080), slow for TTS
Install with Ollama: ollama pull bark
suno/bark

MusicGen Large

🎵 audio

Meta · 3.3B · — ctx · CC BY-NC 4.0

Generate music from text descriptions. Great for content creators. Multiple styles.

Music generation · Background music · Audio branding
Q4_K_M: 2GB (8GB) · Q8: 3.3GB (8GB) · FP16: 6.6GB (8GB)
Recommended hardware: Any device with 8GB+ RAM
Speed: ~2x real-time (RTX 3080)
Install with Ollama: ollama pull musicgen-large
facebook/musicgen-large

XTTS v2

🎵 audio

Coqui · 0.5B · — ctx · Coqui Public License

Clone any voice from 6-second sample. 17 languages including Spanish. Real-time on CPU.

Voice cloning · 17 languages · Real-time TTS
Q4_K_M: 307MB (8GB) · Q8: 512MB (8GB) · FP16: 1GB (8GB)
Recommended hardware: Any device (runs on CPU)
Speed: ~3x real-time (RTX 3080)
Install with Ollama: ollama pull xtts-v2
coqui/XTTS-v2

Parler TTS Large

🎵 audio

Parler TTS · 2.3B · — ctx · Apache 2.0

Control voice via text description ('A warm female voice, clear, podcast style'). Apache 2.0.

Controllable TTS · Style description · Podcast generation
Q4_K_M: 1.4GB (8GB) · Q8: 2.3GB (8GB) · FP16: 4.6GB (8GB)
Recommended hardware: Any device with 4GB+ RAM
Speed: ~5x real-time (RTX 3080)
Install with Ollama: ollama pull parler-tts
parler-tts/parler-tts-large-v1

BioMistral 7B

🔬 specialized

BioMistral · 7B · 32K ctx · Apache 2.0

Fine-tuned on PubMed. Medical QA, clinical text analysis. Apache 2.0. EU AI Act: high-risk category.

Medical QA · Clinical NLP · Biomedical research · ⚡ 42 tok/s
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or Mac Mini M4
Speed (M3 Pro 32GB): Q4: 42 tok/s · Medical domain fine-tune
Install with Ollama: ollama pull biomistral
BioMistral/BioMistral-7B

SaulLM 7B

🔬 specialized

Equall · 7B · 8K ctx · MIT

Trained on legal corpora. Understanding of legal concepts, statutes, case law. MIT license.

Legal analysis · Contract review · Case law research · ⚡ 42 tok/s
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or Mac Mini M4
Speed (M3 Pro 32GB): Q4: 42 tok/s · Legal domain fine-tune
Install with Ollama: ollama pull saullm
Equall/Saul-7B-Instruct-v1

DeepSeek-Math 7B

🔬 specialized

DeepSeek · 7B · 4K ctx · DeepSeek License

State-of-the-art math reasoning at 7B. Solves complex problems step by step.

Mathematical reasoning · Theorem proving · Calculations · ⚡ 45 tok/s
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or any device
Speed (M3 Pro 32GB): Q4: 45 tok/s · Math specialized
Install with Ollama: ollama pull deepseek-math
deepseek-ai/DeepSeek-Math-7B-Instruct

Mistral 7B v0.3

🔬 specialized

Mistral AI · 7B · 32K ctx · Apache 2.0

Best 7B for structured tasks: function calling, JSON output, tool use. Apache 2.0. Fast and reliable.

Function calling · Structured output · RAG · ⚡ 44 tok/s
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or any device
Speed (M3 Pro 32GB): Q4: 44 tok/s · FP16: 22 tok/s
Install with Ollama: ollama pull mistral:7b-instruct-v0.3
mistralai/Mistral-7B-Instruct-v0.3

Hermes 3 8B

🔬 specialized

Nous Research · 8B · 128K ctx · Llama 3.1 Community

Best community fine-tune for agentic use. Excellent at following complex system prompts and using tools.

Agentic tasks · Tool use · System prompts · ⚡ 42 tok/s
Q4_K_M: 5GB (8GB) · Q8: 8GB (8GB) · FP16: 16GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q8) or Mac Mini M4
Speed (M3 Pro 32GB): Q4: 42 tok/s · FP16: 20 tok/s
Install with Ollama: ollama pull hermes3:8b
NousResearch/Hermes-3-Llama-3.1-8B

Finance-LLM 13B

🔬 specialized

TheBloke (quantized) · 13B · 4K ctx · Llama 2 Community

Fine-tuned on financial data. Understands financial terminology, ratios, and market concepts.

Financial analysis · Report generation · Market research · ⚡ 22 tok/s
Q4_K_M: 7.5GB (8GB) · Q8: 13GB (16GB) · FP16: 26GB (24GB+)
Recommended hardware: Mac Mini M4 24GB (Q8) or RTX 4060 Ti (Q4)
Speed (M3 Pro 32GB): Q4: 22 tok/s · Financial domain
Install with Ollama: ollama pull finance-llm:13b
TheBloke/finance-LLM-13B-GGUF

Dolphin 2.9 8B

🔬 specialized

Cognitive Computations · 8B · 8K ctx · Llama 3 Community

Uncensored fine-tune — no alignment filtering. Useful for creative tasks, red-teaming, research.

Uncensored assistant · Creative writing · Roleplay · ⚡ 42 tok/s
Q4_K_M: 5GB (8GB) · Q8: 8GB (8GB) · FP16: 16GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q8) or any device
Speed (M3 Pro 32GB): Q4: 42 tok/s · FP16: 20 tok/s
Install with Ollama: ollama pull dolphin-llama3:8b
cognitivecomputations/dolphin-2.9-llama3-8b

Llama Guard 3

🔬 specialized

Meta · 8B · 128K ctx · Llama 3.1 Community

Meta's safety classifier. Run alongside any model to filter harmful content. Essential for production deployments.

Content moderation · Safety classification · Guardrails · ⚡ 40 tok/s
Q4_K_M: 5GB (8GB) · Q8: 8GB (8GB) · FP16: 16GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or any device
Speed (M3 Pro 32GB): Q4: 40 tok/s · Safety classifier
Install with Ollama: ollama pull llama-guard3:8b
meta-llama/Llama-Guard-3-8B

OCRonos Vintage

🔬 specialized

PleIAs · 7B · 8K ctx · Apache 2.0

Specialized in correcting OCR errors from historical documents. Perfect for digitization projects. Apache 2.0.

Historical OCR correction · Document digitization · Archive processing
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or any device
Speed: ~5 pages/min (M3 Pro)
Install with Ollama: ollama pull ocronos
PleIAs/OCRonos-Vintage

Cloud / API Models

24 models evaluated with Artificial Analysis benchmarks

These models are accessed via API. They are compared by Intelligence Index (II), MMLU-Pro, and GPQA. Prices are $/1M tokens.

GPT-4o (OpenAI): II 86 · MMLU-Pro 88.7% · GPQA 53.6% · 142 tok/s · In $5.00 / Out $15.00 · 2024-05-13
GPT-4o-mini (OpenAI): II 82 · MMLU-Pro 86.2% · GPQA 48.6% · 215 tok/s · In $0.15 / Out $0.60 · 2024-07-18
o1-preview (OpenAI): II 91 · MMLU-Pro 87.7% · GPQA 75% · 38 tok/s · In $15.00 / Out $60.00 · 2024-09-12
o1-mini (OpenAI): II 87 · MMLU-Pro 85.2% · GPQA 64.4% · 95 tok/s · In $3.00 / Out $12.00 · 2024-09-12
Claude 3.5 Sonnet (Anthropic): II 88 · MMLU-Pro 88.3% · GPQA 55.2% · 118 tok/s · In $3.00 / Out $15.00 · 2024-06-20
Claude 3.5 Sonnet v2 (Anthropic): II 89 · MMLU-Pro 89.1% · GPQA 57.8% · 125 tok/s · In $3.00 / Out $15.00 · 2024-10-22
Claude 3.5 Opus (Anthropic): II 92 · MMLU-Pro 90.8% · GPQA 62.1% · 48 tok/s · In $15.00 / Out $75.00 · 2024-07-22
Claude 3 Haiku (Anthropic): II 75 · MMLU-Pro 77.2% · GPQA 38.5% · 280 tok/s · In $0.25 / Out $1.25 · 2024-03-07
Gemini 1.5 Pro (Google): II 87 · MMLU-Pro 88.6% · GPQA 54.8% · 98 tok/s · In $3.50 / Out $10.50 · 2024-05-14
Gemini 1.5 Flash (Google): II 83 · MMLU-Pro 85.3% · GPQA 46.2% · 245 tok/s · In $0.07 / Out $0.30 · 2024-05-14
Gemini 2.0 Flash (Google): II 85 · MMLU-Pro 86.7% · GPQA 49.8% · 310 tok/s · In $0.00 / Out $0.00 · 2024-12-11
Gemini 2.0 Flash Thinking (Google): II 88 · MMLU-Pro 88.4% · GPQA 55.6% · 78 tok/s · In $0.00 / Out $0.00 · 2024-12-19
Llama 3.1 405B (Meta): II 84 · MMLU-Pro 86.1% · GPQA 46.8% · 42 tok/s · In $3.50 / Out $14.00 · 2024-07-23
Llama 3.1 70B (Meta): II 82 · MMLU-Pro 84.2% · GPQA 43.5% · 88 tok/s · In $0.65 / Out $2.75 · 2024-07-23
Llama 3.1 8B (Meta): II 77 · MMLU-Pro 79.4% · GPQA 36.2% · 195 tok/s · In $0.20 / Out $0.80 · 2024-07-23
Llama 3.2 1B (Meta): II 72 · MMLU-Pro 74.8% · GPQA 31.5% · 380 tok/s · In $0.10 / Out $0.40 · 2024-09-25
Llama 3.2 3B (Meta): II 76 · MMLU-Pro 77.9% · GPQA 35.8% · 320 tok/s · In $0.10 / Out $0.40 · 2024-09-25
Mistral Large (Mistral AI): II 85 · MMLU-Pro 86.9% · GPQA 48.2% · 72 tok/s · In $2.00 / Out $6.00 · 2024-02-29
Mistral Nemo (Mistral AI): II 78 · MMLU-Pro 79.8% · GPQA 38.4% · 185 tok/s · In $0.15 / Out $0.15 · 2024-07-22
Mixtral 8x7B (Mistral AI): II 79 · MMLU-Pro 80.6% · GPQA 39.2% · 95 tok/s · In $0.70 / Out $2.40 · 2023-12-11
Mistral Small (Mistral AI): II 80 · MMLU-Pro 82.1% · GPQA 41.3% · 155 tok/s · In $0.20 / Out $0.60 · 2024-09-17
Qwen 2.5 72B (Qwen): II 83 · MMLU-Pro 85.4% · GPQA 44.8% · 82 tok/s · In $0.90 / Out $3.60 · 2024-09-19
Qwen 2.5 32B (Qwen): II 80 · MMLU-Pro 82.8% · GPQA 40.5% · 145 tok/s · In $0.50 / Out $2.00 · 2024-09-19
Qwen 2.5 7B (Qwen): II 76 · MMLU-Pro 78.6% · GPQA 35.2% · 210 tok/s · In $0.20 / Out $0.80 · 2024-09-19
QwQ 32B (Qwen): II 84 · MMLU-Pro 86.2% · GPQA 47.8% · 78 tok/s · In $0.50 / Out $2.00 · 2024-11-28

Interactive VRAM Explorer

Select a model and a quantization to see which hardware supports it.

Which model fits your hardware?
Jetson Orin Nano Super 8GB • €230
RPi 5 + AI HAT+ 2 16GB • €200
Beelink SER8 32GB • €550
ASUS NUC 14 Pro 64GB • €650
Minisforum UM890 Pro 96GB • €750
Mac Mini M4 (24GB) 24GB • €920
Mac Mini M4 Pro (48GB) 48GB • €1840
Mac Studio M4 Max (48GB) 48GB • €2300
Jetson AGX Orin 64GB 64GB • €1840
RTX 4060 Ti 16GB Build 16GB • €1000
RTX 4090 24GB Build 24GB • €2500
RTX 3090 24GB (Used) Build 24GB • €1200
AMD W7900 48GB Build 48GB • €4000
Jetson AGX Thor (128GB) 128GB • €3499
Jetson T4000 (64GB) 64GB • €1999
Cloud A100 80GB (RunPod) 80GB • from €1.29/h
Cloud H100 80GB (RunPod) 80GB • from €2.49/h
MacBook Pro M4 Pro (24GB) 24GB • €2199
Lenovo ThinkPad P16s Gen 3 (RTX A500) 32GB • €1650
Framework 16 (Radeon RX 7700S) 32GB • €1900
Supermicro 1U Server (L40S 48GB) 48GB • €12000
Dell PowerEdge R760xa (2x RTX A6000) 96GB • €18500

Tip: for models over 13B parameters, consider the Mac Mini M4 Pro (48GB) or higher.
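The explorer's fit check reduces to a simple rule of thumb: weight size ≈ parameters × bits per weight / 8, plus headroom for the KV cache and runtime buffers. A sketch of that rule; the 1.2 overhead factor is an assumption for illustration, not a measured constant:

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: raw weight bytes plus ~20% for KV cache and buffers."""
    return params_b * bits_per_weight / 8 * overhead

def fits(params_b: float, bits_per_weight: float, device_vram_gb: float) -> bool:
    """Does the model plausibly fit on a device with the given VRAM?"""
    return estimate_vram_gb(params_b, bits_per_weight) <= device_vram_gb

# Gemma 3 27B at Q4 (~4.5 bits/weight) on a 24GB Mac Mini M4:
print(fits(27, 4.5, 24))   # True, ~18GB estimated
# The same model at FP16 (16 bits/weight) does not fit:
print(fits(27, 16, 24))    # False, ~65GB estimated
```

Longer context windows grow the KV cache well beyond 20%, so treat a result near the device limit as "tight" rather than "fits".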


Cloud Providers

When you need the cloud

For tasks that require models larger than what fits on local hardware, these providers offer API access at competitive prices.

OpenAI · $5 free credit · Highest quality reasoning, multimodal, image generation · Models: GPT-4o, GPT-4o-mini, o1, +3 more
Anthropic · $5 free credit · Best coding model, longest context (200K), safest · Models: Claude Opus 4, Claude Sonnet 4, Claude Haiku 3.5
Google · Free tier (15 RPM) · Cheapest flash model, 1M+ context, multimodal · Models: Gemini 2.0 Flash, Gemini 1.5 Pro, Imagen 3, +1 more
Mistral AI · €5 free credit · EU-based provider, strong coding, function calling · Models: Mistral Large, Mistral Small, Codestral, +1 more
Together AI · $5 free credit · Run any open model via API, cheapest open model hosting · Models: Llama 4 Scout, Qwen 3, FLUX.1, +1 more
Groq · Free tier (30 RPM) · Fastest inference (LPU hardware), low latency · Models: Llama 4 Scout, Gemma 4, Mistral (ultra-fast LPU inference)
Replicate · Free predictions · Per-second GPU billing, image/video models, easy deployment · Models: FLUX.1, Stable Diffusion, Whisper, +1 more
Model Provider Input $/1M Output $/1M Context
GPT-4o OpenAI $2.50 $10.00 128K
GPT-4o Mini OpenAI $0.15 $0.60 128K
GPT-4 Turbo OpenAI $10.00 $30.00 128K
Claude 3.5 Sonnet Anthropic $3.00 $15.00 200K
Claude 3 Haiku Anthropic $0.25 $1.25 200K
Claude 3 Opus Anthropic $15.00 $75.00 200K
Gemini 1.5 Pro Google $1.25 $5.00 2000K
Gemini 1.5 Flash Google $0.07 $0.30 1000K
Gemini 2.0 Flash Google $0.10 $0.40 1000K
Mistral Large Mistral $2.00 $6.00 128K
Mistral Small Mistral $0.10 $0.30 128K
Llama 3.1 70B Meta/Together $0.88 $0.88 128K
Llama 3.1 8B Meta/Together $0.18 $0.18 128K
DeepSeek V3 DeepSeek $0.27 $1.10 128K
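Since every price in the table is per million tokens, a monthly API bill is a straight multiplication. A quick sketch using the GPT-4o Mini row ($0.15 in / $0.60 out); the traffic volumes are hypothetical:

```python
def monthly_api_cost(in_tokens: int, out_tokens: int,
                     in_price_per_m: float, out_price_per_m: float) -> float:
    """Monthly cost in dollars given token volume and $/1M token prices."""
    return in_tokens / 1e6 * in_price_per_m + out_tokens / 1e6 * out_price_per_m

# Hypothetical workload: 20M input + 5M output tokens/month on GPT-4o Mini.
cost = monthly_api_cost(20_000_000, 5_000_000, 0.15, 0.60)
print(f"${cost:.2f}/month")   # $6.00/month
```

Swapping in the Claude 3 Opus row ($15.00 / $75.00) makes the same workload 112x more expensive, which is why model choice dominates cloud budgeting.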

Cloud or Local?

Cloud API

  • + Access to the largest models
  • + No hardware to maintain
  • + Instant scaling
  • - Data leaves your network
  • - Costs grow with usage
  • - Vendor dependency

Local AI (Edge)

  • + Data never leaves your network
  • + Fixed cost (hardware bought once)
  • + GDPR by design
  • + No network latency
  • ~ Smaller models
  • ~ Upfront investment
Ask about local deployment →
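The fixed-cost vs usage-cost trade-off in the two lists above can be reduced to a break-even point: hardware price divided by the monthly API bill it would replace. A sketch with hypothetical numbers (a €920 Mac Mini M4 against an assumed €75/month API spend):

```python
def breakeven_months(hardware_eur: float, monthly_api_eur: float) -> float:
    """Months until a one-time hardware cost matches the cumulative API bill."""
    return hardware_eur / monthly_api_eur

# Hypothetical: Mac Mini M4 24GB (€920) vs €75/month of equivalent API usage.
months = breakeven_months(920, 75)
print(f"Local hardware pays for itself after ~{months:.1f} months")
```

Electricity and maintenance shift the result somewhat, but for steady workloads the break-even typically lands within the hardware's useful life, which is the core argument for edge deployment.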
Newsletter

Get access to exclusive resources

Subscribe to unlock 230+ workflows, 43 agents, and 26 professional templates. Weekly insights, no spam.

Bonus: free EU AI Act checklist when you subscribe
Once a week · No spam · Cancel anytime
EU AI Act: 99 days until the deadline

15 minutes to evaluate your case

Initial consultation with no commitment. We analyze your infrastructure and recommend the optimal hybrid architecture.

No commitment · 15 minutes · Personalized proposal

136 pages of free resources · 26 compliance templates · 22 certified devices