AI Models

Our selection of the best models

53 open-source models, curated and tested for local deployment. Each with its use case, VRAM requirements, and installation guide.

Source: Hugging Face, Ollama. Updated April 2026.

Provider-agnostic: models from Anthropic, OpenAI, Google, Meta, Mistral, and the open-source community.

53 models

Llama 4 Scout

💬 general

Meta · 109B (17B active MoE) · 10M ctx · Llama 4 Community

Largest open context window ever (10M tokens). MoE = only 17B params active, so inference is fast despite 109B total.

Massive context (10M tokens) · Document analysis · Multi-turn reasoning · ⚡ 8 tok/s
Q4_K_M: 58GB (24GB+) · Q8: 109GB (24GB+) · FP16: 218GB (24GB+)
Recommended hardware: Mac Studio M4 Max 192GB (FP16) or RTX 4090 (Q4)
Speed (M3 Pro 32GB): Q4: 8 tok/s · MoE: 17B active
Install with Ollama: ollama pull llama4-scout
meta-llama/Llama-4-Scout-109B-Instruct

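The MoE trade-off in the card above (all 109B parameters must sit in memory, but only 17B are active per token) can be made concrete with a rough back-of-the-envelope sketch; the numbers are illustrative sizing estimates, not benchmarks:

```python
def moe_footprint(total_params_b: float, active_params_b: float, bits: int = 4):
    """Rough MoE sizing: memory scales with TOTAL parameters,
    while per-token compute scales with ACTIVE parameters."""
    mem_gb = total_params_b * bits / 8           # weights only, ignores KV cache
    compute_ratio = active_params_b / total_params_b
    return mem_gb, compute_ratio

# Llama 4 Scout at ~4 bits/weight (real Q4_K_M averages slightly more).
mem, ratio = moe_footprint(109, 17, bits=4)
print(f"~{mem:.0f}GB of weights, but each token activates only {ratio:.0%} of them")
```

This is why a 109B MoE can decode faster than a dense 70B model while still needing far more memory than one.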
Llama 4 Maverick

💬 general

Meta · 400B (17B active, 128 experts) · 1M ctx · Llama 4 Community

128-expert MoE with only 17B active. Competes with GPT-4 class. Requires multi-GPU or cloud.

Top-tier reasoning · Complex analysis · Research · ⚡ 3 tok/s
Q4_K_M: 200GB (24GB+) · Q8: 400GB (24GB+) · FP16: 800GB (24GB+)
Recommended hardware: Multi-GPU cluster (not practical for edge)
Speed (M3 Pro 32GB): Q4: 3 tok/s · Requires 200GB+
Install with Ollama: ollama pull llama4-maverick
meta-llama/Llama-4-Maverick-400B-Instruct

Gemma 3 4B

👁️ vision

Google · 4B · 128K ctx · Gemma License

Smallest multimodal model available. Only 3GB at Q4. Fits Jetson Orin Nano. Perfect for vision tasks on minimal hardware.

Lightweight multimodal · Image understanding · Edge deployment · 128K context · ⚡ 65 tok/s
Q4_K_M: 3GB (8GB) · Q8: 4GB (8GB) · FP16: 8GB (8GB)
Recommended hardware: Jetson Orin Nano 8GB | Mac Mini M4 16GB
Speed (M3 Pro 32GB): Q4: 65 tok/s · FP16: 40 tok/s
Install with Ollama: ollama pull gemma3:4b
google/gemma-3-4b-it

Gemma 3 27B

👁️ vision

Google · 27B (also 1B/4B/12B) · 128K ctx · Gemma License

First multimodal open model that fits on a Mac Mini. 4B variant needs only 3GB. SigLIP vision encoder. 140+ languages.

Multimodal (text + images) · Document scanning · Invoice processing · 128K context window · ⚡ 14 tok/s
Q4_K_M: 16GB (16GB) · Q8: 27GB (24GB+) · FP16: 54GB (24GB+)
Recommended hardware: 4B: any 8GB+ device | 27B: Mac Mini M4 Pro 32GB+
Speed (M3 Pro 32GB): Q4: 14 tok/s · FP16: 7 tok/s
Install with Ollama: ollama pull gemma3:27b
google/gemma-3-27b-it

Gemma 4 27B

💬 general

Google · 27B · 128K ctx · Apache 2.0

Apache 2.0 license (fully open). Best quality-per-param in the 27B class. Runs on Mac Mini M4 24GB at Q4.

High-quality chat · Instruction following · Creative writing · ⚡ 14 tok/s
Q4_K_M: 16GB (16GB) · Q8: 27GB (24GB+) · FP16: 54GB (24GB+)
Recommended hardware: Mac Mini M4 24GB (Q4) or RTX 4060 Ti 16GB (Q4)
Speed (M3 Pro 32GB): Q4: 14 tok/s · FP16: 7 tok/s
Install with Ollama: ollama pull gemma4:27b
google/gemma-4-27b

Gemma 4 E4B

💬 general

Google · 4B · 128K ctx · Apache 2.0

Tiny but punches above its weight. Runs on Jetson Orin Nano. Apache 2.0.

Fast inference · Edge deployment · Mobile/IoT · ⚡ 58 tok/s
Q4_K_M: 2.5GB (8GB) · Q8: 4GB (8GB) · FP16: 8GB (8GB)
Recommended hardware: Jetson Orin Nano 8GB (FP16) or any device
Speed (M3 Pro 32GB): Q4: 58 tok/s · FP16: 35 tok/s
Install with Ollama: ollama pull gemma4:e4b
google/gemma-4-4b

Qwen 3 32B

💬 general

Alibaba · 32B · 128K ctx · Apache 2.0

Thinking mode (deep reasoning) + non-thinking mode. Best multilingual open model. Apache 2.0.

Multilingual (29 languages) · Reasoning · Tool use · ⚡ 12 tok/s
Q4_K_M: 18GB (24GB) · Q8: 32GB (24GB+) · FP16: 64GB (24GB+)
Recommended hardware: Mac Mini M4 Pro 48GB (Q8) or RTX 4060 Ti (Q4)
Speed (M3 Pro 32GB): Q4: 12 tok/s · FP16: 6 tok/s
Install with Ollama: ollama pull qwen3:32b
Qwen/Qwen3-32B

Qwen 3 8B

💬 general

Alibaba · 8B · 128K ctx · Apache 2.0

Sweet spot for edge: great quality at 8B, dual thinking modes, strong Spanish. Fits on 8GB devices.

General purpose · Spanish language · Fast reasoning · ⚡ 45 tok/s
Q4_K_M: 5GB (8GB) · Q8: 8GB (8GB) · FP16: 16GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q8) or Mac Mini M4 (Q4)
Speed (M3 Pro 32GB): Q4: 45 tok/s · FP16: 22 tok/s
Install with Ollama: ollama pull qwen3:8b
Qwen/Qwen3-8B

Phi-4

💬 general

Microsoft · 14B · 16K ctx · MIT

Best-in-class for STEM at 14B. MIT license. Fits on 8GB devices at Q4.

STEM reasoning · Math · Code generation · ⚡ 28 tok/s
Q4_K_M: 8GB (8GB) · Q8: 14GB (16GB) · FP16: 28GB (24GB+)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or Mac Mini M4 (Q8)
Speed (M3 Pro 32GB): Q4: 28 tok/s · FP16: 14 tok/s
Install with Ollama: ollama pull phi4
microsoft/phi-4

DeepSeek V3

💬 general

DeepSeek · 671B (37B active MoE) · 128K ctx · DeepSeek License

Only 37B active params via MoE. Cloud-only for most users but API is very cheap ($0.27/1M input).

GPT-4 class reasoning · Complex tasks · Research · ⚡ 5 tok/s
Q4_K_M: 336GB (24GB+) · Q8: 671GB (24GB+) · FP16: 1342GB (24GB+)
Recommended hardware: Cloud API recommended ($0.27/$1.10 per 1M tokens)
Speed (M3 Pro 32GB): Q4: 5 tok/s · 671B MoE, needs multi-GPU
Install with Ollama: ollama pull deepseek-v3
deepseek-ai/DeepSeek-V3

DeepSeek R1

💬 general

DeepSeek · 671B (MoE) / 7B-32B distilled · 128K ctx · MIT

Best open-source reasoning model. 14B distill fits Mac Mini M4 16GB. Shows explicit thinking process. MIT license.

Chain-of-thought reasoning · Math (97.3% MATH-500) · Code debugging · Legal/financial analysis · ⚡ 4 tok/s
Q4_K_M: 10GB (16GB) · Q8: 14GB (16GB) · FP16: 28GB (24GB+)
Recommended hardware: 14B: Mac Mini M4 16GB | 32B: 32GB+ unified memory
Speed (M3 Pro 32GB): Q4: 4 tok/s · full 671B MoE needs multi-GPU
Install with Ollama: ollama pull deepseek-r1:14b
deepseek-ai/DeepSeek-R1

Qwen3-Coder-Next

💻 coding

Alibaba · 80B (3B active MoE) · 256K ctx · Apache 2.0

Only 3B active params — blazing fast. 256K context for huge codebases. >70% SWE-Bench. Apache 2.0.

Code generation (370 languages) · SWE-Bench >70% · Large repos · ⚡ 5 tok/s
Q4_K_M: 42GB (24GB+) · Q8: 80GB (24GB+) · FP16: 160GB (24GB+)
Recommended hardware: Mac Mini M4 Pro 48GB (Q4) or RTX 4090 (Q4)
Speed (M3 Pro 32GB): Q4: 5 tok/s · Needs 48GB+
Install with Ollama: ollama pull qwen3-coder-next
Qwen/Qwen3-Coder-Next-80B

Qwen2.5-Coder 32B

💻 coding

Alibaba · 32B · 128K ctx · Apache 2.0

Best open coding model at 32B class. Strong at Python, JS, Go. Apache 2.0.

Code completion · Debugging · Refactoring · ⚡ 12 tok/s
Q4_K_M: 18GB (24GB) · Q8: 32GB (24GB+) · FP16: 64GB (24GB+)
Recommended hardware: Mac Mini M4 Pro 48GB (Q8) or RTX 4060 Ti (Q4)
Speed (M3 Pro 32GB): Q4: 12 tok/s · FP16: 6 tok/s
Install with Ollama: ollama pull qwen2.5-coder:32b
Qwen/Qwen2.5-Coder-32B-Instruct

Qwen2.5-Coder 7B

💻 coding

Alibaba · 7B · 128K ctx · Apache 2.0

Best small coding model. Fits on any device. Fast enough for real-time autocomplete.

Fast code assist · Autocomplete · Edge coding · ⚡ 48 tok/s
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q8) or any device
Speed (M3 Pro 32GB): Q4: 48 tok/s · FP16: 27 tok/s
Install with Ollama: ollama pull qwen2.5-coder:7b
Qwen/Qwen2.5-Coder-7B-Instruct

DeepSeek-Coder V2

💻 coding

DeepSeek · 236B (21B active MoE) · 128K ctx · DeepSeek License

MoE with 21B active. Near GPT-4 coding quality. Best via API ($0.14/$0.28 per 1M).

Complex code tasks · Multi-file edits · Architecture · ⚡ 10 tok/s
Q4_K_M: 120GB (24GB+) · Q8: 236GB (24GB+) · FP16: 472GB (24GB+)
Recommended hardware: Cloud API recommended
Speed (M3 Pro 32GB): Q4: 10 tok/s · 236B MoE
Install with Ollama: ollama pull deepseek-coder-v2
deepseek-ai/DeepSeek-Coder-V2-Instruct

StarCoder2 15B

💻 coding

BigCode · 15B · 16K ctx · BigCode OpenRAIL-M

Trained on The Stack v2. Great for autocomplete. Supports 600+ programming languages.

Code completion · Fill-in-the-middle · 600+ languages · ⚡ 22 tok/s
Q4_K_M: 9GB (16GB) · Q8: 15GB (16GB) · FP16: 30GB (24GB+)
Recommended hardware: Mac Mini M4 24GB (Q8) or Jetson 8GB (Q4)
Speed (M3 Pro 32GB): Q4: 22 tok/s · FP16: 11 tok/s
Install with Ollama: ollama pull starcoder2:15b
bigcode/starcoder2-15b

CodeLlama 34B

💻 coding

Meta · 34B · 16K ctx · Llama 2 Community

Proven and stable. Strong at Python, C++, Java. Good for production use where stability matters.

Infill generation · Python specialist · Large codebase nav · ⚡ 10 tok/s
Q4_K_M: 19GB (24GB) · Q8: 34GB (24GB+) · FP16: 68GB (24GB+)
Recommended hardware: Mac Mini M4 Pro 48GB (Q8) or RTX 4060 Ti (Q4)
Speed (M3 Pro 32GB): Q4: 10 tok/s
Install with Ollama: ollama pull codellama:34b
codellama/CodeLlama-34b-Instruct-hf

LLaVA 1.6 34B

👁️ vision

LLaVA Team · 34B · 4K ctx · Apache 2.0

Best open vision model for image understanding. Can describe images, answer questions about photos, read documents.

Image understanding · Visual QA · Document OCR · ⚡ 10 tok/s
Q4_K_M: 19GB (24GB) · Q8: 34GB (24GB+) · FP16: 68GB (24GB+)
Recommended hardware: Mac Mini M4 Pro 48GB (Q8) or RTX 4090 (Q4)
Speed (M3 Pro 32GB): Q4: 10 tok/s
Install with Ollama: ollama pull llava:34b
llava-hf/llava-v1.6-34b-hf

Gemma 3n E4B

👁️ vision

Google · 4B (edge-optimized) · 32K ctx · Apache 2.0

Edge-optimized multimodal. Runs on phones and Jetson. Audio, video, image, text — all in 4B params.

On-device vision · Audio+Video+Image+Text · Mobile AI · ⚡ 55 tok/s
Q4_K_M: 2.5GB (8GB) · Q8: 4GB (8GB) · FP16: 8GB (8GB)
Recommended hardware: Jetson Orin Nano 8GB or phone
Speed (M3 Pro 32GB): Q4: 55 tok/s · FP16: 32 tok/s (nano variant)
Install with Ollama: ollama pull gemma3n:e4b
google/gemma-3n-e4b

Qwen2.5-VL 72B

👁️ vision

Alibaba · 72B · 32K ctx · Apache 2.0

Understands video, multiple images, and documents. Agent-ready with tool use. Apache 2.0.

Video understanding · Multi-image analysis · Document parsing · ⚡ 5 tok/s
Q4_K_M: 40GB (24GB+) · Q8: 72GB (24GB+) · FP16: 144GB (24GB+)
Recommended hardware: Mac Mini M4 Pro 48GB (Q4) or cloud
Speed (M3 Pro 32GB): Q4: 5 tok/s · Needs 48GB+
Install with Ollama: ollama pull qwen2.5-vl:72b
Qwen/Qwen2.5-VL-72B-Instruct

InternVL 2.5 78B

👁️ vision

OpenGVLab · 78B · 32K ctx · MIT

Top-tier on chart/diagram understanding benchmarks. MIT license. Strong OCR.

Chart/diagram understanding · OCR · Visual reasoning · ⚡ 4 tok/s
Q4_K_M: 42GB (24GB+) · Q8: 78GB (24GB+) · FP16: 156GB (24GB+)
Recommended hardware: Cloud or multi-GPU
Speed (M3 Pro 32GB): Q4: 4 tok/s · Needs 48GB+
Install with Ollama: ollama pull internvl2.5:78b
OpenGVLab/InternVL2.5-78B

Florence 2 Large

👁️ vision

Microsoft · 0.77B · 4K ctx · MIT

Tiny (0.77B) but excellent at CV tasks. Runs on any device. MIT license. Perfect for edge vision.

Object detection · Image captioning · OCR · Segmentation
Q4_K_M: 512MB (8GB) · Q8: 819MB (8GB) · FP16: 1.5GB (8GB)
Recommended hardware: Any device (even Raspberry Pi)
Speed (M3 Pro 32GB): FP16: 40 tok/s · Light vision model
Install with Ollama: ollama pull florence2
microsoft/Florence-2-large

Moondream 2

👁️ vision

Moondream · 1.8B · 2K ctx · Apache 2.0

Smallest practical vision model. Runs on Jetson, RPi, phones. Apache 2.0. Great for IoT vision.

Edge vision · Image QA · Lightweight multimodal
Q4_K_M: 1.1GB (8GB) · Q8: 1.8GB (8GB) · FP16: 3.6GB (8GB)
Recommended hardware: Jetson Orin Nano, RPi 5, any device
Speed (M3 Pro 32GB): FP16: 65 tok/s · Ultra-light vision
Install with Ollama: ollama pull moondream
moondream/moondream2

all-MiniLM-L6-v2

🔗 embedding

Sentence Transformers · 22M · 512-token ctx · Apache 2.0

Industry standard for semantic search. 384 dimensions. Blazing fast. We use this in Vorlux AI RAG system.

Semantic search · RAG retrieval · Document similarity
Q4_K_M: 31MB (8GB) · Q8: 51MB (8GB) · FP16: 92MB (8GB)
Recommended hardware: Any device (22M params = negligible VRAM)
Speed (M3 Pro 32GB): FP16: 500 tok/s · Embedding
Install with Ollama: ollama pull all-minilm:l6-v2
sentence-transformers/all-MiniLM-L6-v2

Nomic Embed Text v1.5

🔗 embedding

Nomic AI · 137M · 8K ctx · Apache 2.0

8K context (16x MiniLM). Better for long documents. 768 dimensions. Apache 2.0.

Long document embedding · 8K context chunks · Search
Q4_K_M: 82MB (8GB) · Q8: 143MB (8GB) · FP16: 276MB (8GB)
Recommended hardware: Any device
Speed (M3 Pro 32GB): FP16: 350 tok/s · Embedding
Install with Ollama: ollama pull nomic-embed-text
nomic-ai/nomic-embed-text-v1.5

BGE Large v1.5

🔗 embedding

BAAI · 335M · 512-token ctx · MIT

Top MTEB benchmark scores. 1024 dimensions. MIT license. Best quality per compute for English.

High-quality retrieval · Reranking · English focus
Q4_K_M: 184MB (8GB) · Q8: 348MB (8GB) · FP16: 686MB (8GB)
Recommended hardware: Any device
Speed (M3 Pro 32GB): FP16: 320 tok/s · Embedding
Install with Ollama: ollama pull bge-large
BAAI/bge-large-en-v1.5

GTE Base

🔗 embedding

Alibaba DAMO · 109M · 512-token ctx · MIT

Great balance of speed and quality. 768 dimensions. MIT license. Good for production RAG.

Balanced speed/quality · Multilingual embedding · Production RAG
Q4_K_M: 61MB (8GB) · Q8: 113MB (8GB) · FP16: 225MB (8GB)
Recommended hardware: Any device
Speed (M3 Pro 32GB): FP16: 400 tok/s · Embedding
Install with Ollama: ollama pull gte-base
thenlper/gte-base

Arctic Embed L v2

🔗 embedding

Snowflake · 568M · 8K ctx · Apache 2.0

8K context, 1024 dimensions, strong multilingual. Top MTEB scores. Apache 2.0.

Enterprise search · Long-context retrieval · Multilingual
Q4_K_M: 307MB (8GB) · Q8: 584MB (8GB) · FP16: 1.1GB (8GB)
Recommended hardware: Any device
Speed (M3 Pro 32GB): FP16: 300 tok/s · Embedding
Install with Ollama: ollama pull snowflake-arctic-embed:l
Snowflake/snowflake-arctic-embed-l-v2.0

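All five embedding models above emit fixed-length vectors that are compared the same way, typically with cosine similarity. A minimal pure-Python sketch; the 3-dimensional vectors here are toy values, whereas real models like all-MiniLM-L6-v2 emit 384 to 1024 floats:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": a query and two candidate documents.
query = [0.1, 0.9, 0.2]
doc_a = [0.1, 0.8, 0.3]   # semantically close
doc_b = [0.9, 0.1, 0.0]   # unrelated
print(cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b))  # True
```

In a RAG pipeline, every chunk is embedded once at index time and the query vector is compared against all chunk vectors at retrieval time; the top-k scores decide which chunks reach the LLM.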
FLUX.1 Dev

🎨 image gen

Black Forest Labs · 12B · — ctx · FLUX.1-dev Non-Commercial

Best open image model. Exceptional text rendering and prompt adherence. NVFP4 variant runs on 8GB GPUs.

High-quality image generation · Text rendering · Prompt following
Q4_K_M: 7GB (8GB) · Q8: 12GB (16GB) · FP16: 24GB (24GB)
Recommended hardware: RTX 4060 Ti 16GB (FP8) or RTX 4090 (FP16)
Speed: ~2 images/min on RTX 3080
Use with ComfyUI → See workflows
black-forest-labs/FLUX.1-dev

SDXL 1.0

🎨 image gen

Stability AI · 6.6B (2.6B UNet) · — ctx · Stability AI Community

Massive LoRA/ControlNet ecosystem. Fast on consumer GPUs. Still the most flexible image model.

Fast image gen · LoRA ecosystem · ControlNet
Q4_K_M: 2GB (8GB) · Q8: 3.5GB (8GB) · FP16: 6.5GB (8GB)
Recommended hardware: RTX 4060 Ti (FP16) or any 8GB+ GPU
Speed: ~6 images/min on RTX 3080
Use with ComfyUI → See workflows
stabilityai/stable-diffusion-xl-base-1.0

SDXL Turbo

🎨 image gen

Stability AI · 6.6B · — ctx · Stability AI Community

Single-step generation — near real-time images. Great for interactive apps and demos.

Real-time generation · 1-step inference · Interactive
Q4_K_M: 2GB (8GB) · Q8: 3.5GB (8GB) · FP16: 6.5GB (8GB)
Recommended hardware: Any 8GB+ GPU
Speed: ~15 images/min (RTX 3080), ~3/min (M3 Pro)
Use with ComfyUI → See workflows
stabilityai/sdxl-turbo

SD 3.5 Large

🎨 image gen

Stability AI · 8B · — ctx · Stability AI Community

MMDiT architecture. Better prompt understanding than SDXL. Good text rendering.

Photorealism · Complex compositions · High resolution
Q4_K_M: 5GB (8GB) · Q8: 8GB (8GB) · FP16: 16GB (16GB)
Recommended hardware: RTX 4060 Ti 16GB or Mac Mini M4 24GB
Speed: ~4 images/min (RTX 3080), ~1/min (M3 Pro)
Use with ComfyUI → See workflows
stabilityai/stable-diffusion-3.5-large

Stable Cascade

🎨 image gen

Stability AI · 5.1B (3-stage) · — ctx · Stability AI Non-Commercial

3-stage pipeline is more VRAM efficient than SDXL. Good quality at lower compute cost.

Efficient generation · Lower VRAM · Fast training
Q4_K_M: 3GB (8GB) · Q8: 5GB (8GB) · FP16: 10GB (16GB)
Recommended hardware: Any 6GB+ GPU
Speed: ~3 images/min (RTX 3080)
Use with ComfyUI → See workflows
stabilityai/stable-cascade

FLUX.1 Schnell

🎨 image gen

Black Forest Labs · 12B · — ctx · Apache 2.0

Apache 2.0 licensed FLUX variant. 4-step generation (vs 50 for dev). Commercial use OK.

Fast FLUX generation · 4-step inference · Commercial use
Q4_K_M: 7GB (8GB) · Q8: 12GB (16GB) · FP16: 24GB (24GB)
Recommended hardware: RTX 4060 Ti 16GB (FP8) or RTX 4090
Speed: ~8 images/min (RTX 3080), ~2/min (M3 Pro)
Use with ComfyUI → See workflows
black-forest-labs/FLUX.1-schnell

LTX Video

🎬 video gen

Lightricks · 2B · — ctx · Apache 2.0

Fastest open video model. Generates 5-second clips from text or images. NVFP4 optimized. Apache 2.0.

Text-to-video · Image-to-video · Fast generation
Q4_K_M: 1.2GB (8GB) · Q8: 2GB (8GB) · FP16: 4GB (8GB)
Recommended hardware: RTX 4060 Ti (FP16) or any 8GB+ GPU
Speed: ~30s per 4s clip (RTX 3080)
Use with ComfyUI → See workflows
Lightricks/LTX-Video

Hunyuan Video

🎬 video gen

Tencent · 13B · — ctx · Tencent Hunyuan Community

Best open video quality. Longer clips than LTX. Strong motion consistency. Needs more VRAM.

High-quality video · Motion consistency · Long clips
Q4_K_M: 8GB (8GB) · Q8: 13GB (16GB) · FP16: 26GB (24GB+)
Recommended hardware: RTX 4090 24GB or A100
Speed: ~60s per 4s clip (RTX 3080)
Use with ComfyUI → See workflows
tencent/HunyuanVideo

Cosmos 1.0 Diffusion 7B

🎬 video gen

NVIDIA · 7B · — ctx · NVIDIA Open Model License

NVIDIA's world generation model. Understands physics. Great for game dev and simulation.

World generation · Physical simulation · Game content
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: RTX 4060 Ti 16GB or RTX 4090
Speed: ~45s per 4s clip (RTX 3080)
Use with ComfyUI → See workflows
NVIDIA/Cosmos-1.0-Diffusion-7B-Text2World

WAN 2.1 T2V 14B

🎬 video gen

WAN AI · 14B · — ctx · Apache 2.0

Strong Chinese-English bilingual video generation. Apache 2.0. Good motion and detail.

Text-to-video · Chinese + English · High resolution
Q4_K_M: 8GB (8GB) · Q8: 14GB (16GB) · FP16: 28GB (24GB+)
Recommended hardware: RTX 4090 or A100
Speed: ~90s per 4s clip (RTX 3080)
Use with ComfyUI → See workflows
Wan-AI/Wan2.1-T2V-14B

Mochi 1

🎬 video gen

Genmo · 10B · — ctx · Apache 2.0

Exceptional motion quality. Apache 2.0. Natural human movement and physics.

High-fidelity motion · Cinematic quality · Natural movement
Q4_K_M: 6GB (8GB) · Q8: 10GB (16GB) · FP16: 20GB (24GB)
Recommended hardware: RTX 4090 or cloud
Speed: ~120s per 4s clip (A100)
Use with ComfyUI → See workflows
genmo/mochi-1-preview

Whisper Large v3 Turbo

🎵 audio

OpenAI · 809M · 30s audio windows · MIT

Industry standard for transcription. 99 languages. MIT license. Runs on any device.

Speech-to-text · Transcription · 99 languages
Q4_K_M: 512MB (8GB) · Q8: 819MB (8GB) · FP16: 1.6GB (8GB)
Recommended hardware: Any device (809M params)
Speed: ~30x real-time (M3 Pro), ~50x (RTX 3080)
Install with Ollama: ollama pull whisper-large-v3-turbo
openai/whisper-large-v3-turbo

Bark

🎵 audio

Suno · 1.3B · — ctx · MIT

Generates realistic speech, music, and sound effects from text. MIT license. Multilingual.

Text-to-speech · Voice cloning · Sound effects
Q4_K_M: 819MB (8GB) · Q8: 1.3GB (8GB) · FP16: 2.6GB (8GB)
Recommended hardware: Any device with 4GB+ RAM
Speed: ~0.5x real-time (RTX 3080), slow for TTS
Install with Ollama: ollama pull bark
suno/bark

MusicGen Large

🎵 audio

Meta · 3.3B · — ctx · CC BY-NC 4.0

Generate music from text descriptions. Great for content creators. Multiple styles.

Music generation · Background music · Audio branding
Q4_K_M: 2GB (8GB) · Q8: 3.3GB (8GB) · FP16: 6.6GB (8GB)
Recommended hardware: Any device with 8GB+ RAM
Speed: ~2x real-time (RTX 3080)
Install with Ollama: ollama pull musicgen-large
facebook/musicgen-large

XTTS v2

🎵 audio

Coqui · 0.5B · — ctx · Coqui Public License

Clone any voice from 6-second sample. 17 languages including Spanish. Real-time on CPU.

Voice cloning · 17 languages · Real-time TTS
Q4_K_M: 307MB (8GB) · Q8: 512MB (8GB) · FP16: 1GB (8GB)
Recommended hardware: Any device (runs on CPU)
Speed: ~3x real-time (RTX 3080)
Install with Ollama: ollama pull xtts-v2
coqui/XTTS-v2

Parler TTS Large

🎵 audio

Parler TTS · 2.3B · — ctx · Apache 2.0

Control voice via text description ('A warm female voice, clear, podcast style'). Apache 2.0.

Controllable TTS · Style description · Podcast generation
Q4_K_M: 1.4GB (8GB) · Q8: 2.3GB (8GB) · FP16: 4.6GB (8GB)
Recommended hardware: Any device with 4GB+ RAM
Speed: ~5x real-time (RTX 3080)
Install with Ollama: ollama pull parler-tts
parler-tts/parler-tts-large-v1

BioMistral 7B

🔬 specialized

BioMistral · 7B · 32K ctx · Apache 2.0

Fine-tuned on PubMed. Medical QA, clinical text analysis. Apache 2.0. EU AI Act: high-risk category.

Medical QA · Clinical NLP · Biomedical research · ⚡ 42 tok/s
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or Mac Mini M4
Speed (M3 Pro 32GB): Q4: 42 tok/s · Medical domain fine-tune
Install with Ollama: ollama pull biomistral
BioMistral/BioMistral-7B

SaulLM 7B

🔬 specialized

Equall · 7B · 8K ctx · MIT

Trained on legal corpora. Understanding of legal concepts, statutes, case law. MIT license.

Legal analysis · Contract review · Case law research · ⚡ 42 tok/s
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or Mac Mini M4
Speed (M3 Pro 32GB): Q4: 42 tok/s · Legal domain fine-tune
Install with Ollama: ollama pull saullm
Equall/Saul-7B-Instruct-v1

DeepSeek-Math 7B

🔬 specialized

DeepSeek · 7B · 4K ctx · DeepSeek License

State-of-the-art math reasoning at 7B. Solves complex problems step by step.

Mathematical reasoning · Theorem proving · Calculations · ⚡ 45 tok/s
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or any device
Speed (M3 Pro 32GB): Q4: 45 tok/s · Math specialized
Install with Ollama: ollama pull deepseek-math
deepseek-ai/DeepSeek-Math-7B-Instruct

Mistral 7B v0.3

🔬 specialized

Mistral AI · 7B · 32K ctx · Apache 2.0

Best 7B for structured tasks: function calling, JSON output, tool use. Apache 2.0. Fast and reliable.

Function calling · Structured output · RAG · ⚡ 44 tok/s
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or any device
Speed (M3 Pro 32GB): Q4: 44 tok/s · FP16: 22 tok/s
Install with Ollama: ollama pull mistral:7b-instruct-v0.3
mistralai/Mistral-7B-Instruct-v0.3

Hermes 3 8B

🔬 specialized

Nous Research · 8B · 128K ctx · Llama 3.1 Community

Best community fine-tune for agentic use. Excellent at following complex system prompts and using tools.

Agentic tasks · Tool use · System prompts · ⚡ 42 tok/s
Q4_K_M: 5GB (8GB) · Q8: 8GB (8GB) · FP16: 16GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q8) or Mac Mini M4
Speed (M3 Pro 32GB): Q4: 42 tok/s · FP16: 20 tok/s
Install with Ollama: ollama pull hermes3:8b
NousResearch/Hermes-3-Llama-3.1-8B

Finance-LLM 13B

🔬 specialized

TheBloke (quantized) · 13B · 4K ctx · Llama 2 Community

Fine-tuned on financial data. Understands financial terminology, ratios, and market concepts.

Financial analysis · Report generation · Market research · ⚡ 22 tok/s
Q4_K_M: 7.5GB (8GB) · Q8: 13GB (16GB) · FP16: 26GB (24GB+)
Recommended hardware: Mac Mini M4 24GB (Q8) or RTX 4060 Ti (Q4)
Speed (M3 Pro 32GB): Q4: 22 tok/s · Financial domain
Install with Ollama: ollama pull finance-llm:13b
TheBloke/finance-LLM-13B-GGUF

Dolphin 2.9 8B

🔬 specialized

Cognitive Computations · 8B · 8K ctx · Llama 3 Community

Uncensored fine-tune — no alignment filtering. Useful for creative tasks, red-teaming, research.

Uncensored assistant · Creative writing · Roleplay · ⚡ 42 tok/s
Q4_K_M: 5GB (8GB) · Q8: 8GB (8GB) · FP16: 16GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q8) or any device
Speed (M3 Pro 32GB): Q4: 42 tok/s · FP16: 20 tok/s
Install with Ollama: ollama pull dolphin-llama3:8b
cognitivecomputations/dolphin-2.9-llama3-8b

Llama Guard 3

🔬 specialized

Meta · 8B · 128K ctx · Llama 3.1 Community

Meta's safety classifier. Run alongside any model to filter harmful content. Essential for production deployments.

Content moderation · Safety classification · Guardrails · ⚡ 40 tok/s
Q4_K_M: 5GB (8GB) · Q8: 8GB (8GB) · FP16: 16GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or any device
Speed (M3 Pro 32GB): Q4: 40 tok/s · Safety classifier
Install with Ollama: ollama pull llama-guard3:8b
meta-llama/Llama-Guard-3-8B

OCRonos Vintage

🔬 specialized

PleIAs · 7B · 8K ctx · Apache 2.0

Specialized in correcting OCR errors from historical documents. Perfect for digitization projects. Apache 2.0.

Historical OCR correction · Document digitization · Archive processing
Q4_K_M: 4.5GB (8GB) · Q8: 7GB (8GB) · FP16: 14GB (16GB)
Recommended hardware: Jetson Orin Nano 8GB (Q4) or any device
Speed: ~5 pages/min (M3 Pro)
Install with Ollama: ollama pull ocronos
PleIAs/OCRonos-Vintage

Cloud / API Models

24 models evaluated with Artificial Analysis benchmarks

These models are accessed via API. They are compared by Intelligence Index (II), MMLU-Pro, and GPQA. Prices are $/1M tokens.

GPT-4o (OpenAI): II 86 · MMLU-Pro 88.7% · GPQA 53.6% · 142 tok/s · In $5.00 / Out $15.00 · 2024-05-13
GPT-4o-mini (OpenAI): II 82 · MMLU-Pro 86.2% · GPQA 48.6% · 215 tok/s · In $0.15 / Out $0.60 · 2024-07-18
o1-preview (OpenAI): II 91 · MMLU-Pro 87.7% · GPQA 75% · 38 tok/s · In $15.00 / Out $60.00 · 2024-09-12
o1-mini (OpenAI): II 87 · MMLU-Pro 85.2% · GPQA 64.4% · 95 tok/s · In $3.00 / Out $12.00 · 2024-09-12
Claude 3.5 Sonnet (Anthropic): II 88 · MMLU-Pro 88.3% · GPQA 55.2% · 118 tok/s · In $3.00 / Out $15.00 · 2024-06-20
Claude 3.5 Sonnet v2 (Anthropic): II 89 · MMLU-Pro 89.1% · GPQA 57.8% · 125 tok/s · In $3.00 / Out $15.00 · 2024-10-22
Claude 3.5 Opus (Anthropic): II 92 · MMLU-Pro 90.8% · GPQA 62.1% · 48 tok/s · In $15.00 / Out $75.00 · 2024-07-22
Claude 3 Haiku (Anthropic): II 75 · MMLU-Pro 77.2% · GPQA 38.5% · 280 tok/s · In $0.25 / Out $1.25 · 2024-03-07
Gemini 1.5 Pro (Google): II 87 · MMLU-Pro 88.6% · GPQA 54.8% · 98 tok/s · In $3.50 / Out $10.50 · 2024-05-14
Gemini 1.5 Flash (Google): II 83 · MMLU-Pro 85.3% · GPQA 46.2% · 245 tok/s · In $0.07 / Out $0.30 · 2024-05-14
Gemini 2.0 Flash (Google): II 85 · MMLU-Pro 86.7% · GPQA 49.8% · 310 tok/s · In $0.00 / Out $0.00 · 2024-12-11
Gemini 2.0 Flash Thinking (Google): II 88 · MMLU-Pro 88.4% · GPQA 55.6% · 78 tok/s · In $0.00 / Out $0.00 · 2024-12-19
Llama 3.1 405B (Meta): II 84 · MMLU-Pro 86.1% · GPQA 46.8% · 42 tok/s · In $3.50 / Out $14.00 · 2024-07-23
Llama 3.1 70B (Meta): II 82 · MMLU-Pro 84.2% · GPQA 43.5% · 88 tok/s · In $0.65 / Out $2.75 · 2024-07-23
Llama 3.1 8B (Meta): II 77 · MMLU-Pro 79.4% · GPQA 36.2% · 195 tok/s · In $0.20 / Out $0.80 · 2024-07-23
Llama 3.2 1B (Meta): II 72 · MMLU-Pro 74.8% · GPQA 31.5% · 380 tok/s · In $0.10 / Out $0.40 · 2024-09-25
Llama 3.2 3B (Meta): II 76 · MMLU-Pro 77.9% · GPQA 35.8% · 320 tok/s · In $0.10 / Out $0.40 · 2024-09-25
Mistral Large (Mistral AI): II 85 · MMLU-Pro 86.9% · GPQA 48.2% · 72 tok/s · In $2.00 / Out $6.00 · 2024-02-29
Mistral Nemo (Mistral AI): II 78 · MMLU-Pro 79.8% · GPQA 38.4% · 185 tok/s · In $0.15 / Out $0.15 · 2024-07-22
Mixtral 8x7B (Mistral AI): II 79 · MMLU-Pro 80.6% · GPQA 39.2% · 95 tok/s · In $0.70 / Out $2.40 · 2023-12-11
Mistral Small (Mistral AI): II 80 · MMLU-Pro 82.1% · GPQA 41.3% · 155 tok/s · In $0.20 / Out $0.60 · 2024-09-17
Qwen 2.5 72B (Qwen): II 83 · MMLU-Pro 85.4% · GPQA 44.8% · 82 tok/s · In $0.90 / Out $3.60 · 2024-09-19
Qwen 2.5 32B (Qwen): II 80 · MMLU-Pro 82.8% · GPQA 40.5% · 145 tok/s · In $0.50 / Out $2.00 · 2024-09-19
Qwen 2.5 7B (Qwen): II 76 · MMLU-Pro 78.6% · GPQA 35.2% · 210 tok/s · In $0.20 / Out $0.80 · 2024-09-19
QwQ 32B (Qwen): II 84 · MMLU-Pro 86.2% · GPQA 47.8% · 78 tok/s · In $0.50 / Out $2.00 · 2024-11-28

Interactive VRAM Explorer

Select a model and a quantization to see which hardware supports it.

Which model fits your hardware?
Jetson Orin Nano Super 8GB • €230
RPi 5 + AI HAT+ 2 16GB • €200
Beelink SER8 32GB • €550
ASUS NUC 14 Pro 64GB • €650
Minisforum UM890 Pro 96GB • €750
Mac Mini M4 (24GB) 24GB • €920
Mac Mini M4 Pro (48GB) 48GB • €1840
Mac Studio M4 Max (48GB) 48GB • €2300
Jetson AGX Orin 64GB 64GB • €1840
RTX 4060 Ti 16GB Build 16GB • €1000
RTX 4090 24GB Build 24GB • €2500
RTX 3090 24GB (Used) Build 24GB • €1200
AMD W7900 48GB Build 48GB • €4000
Jetson AGX Thor (128GB) 128GB • €3499
Jetson T4000 (64GB) 64GB • €1999
Cloud A100 80GB (RunPod) 80GB • from €1.29/h
Cloud H100 80GB (RunPod) 80GB • from €2.49/h
MacBook Pro M4 Pro (24GB) 24GB • €2199
Lenovo ThinkPad P16s Gen 3 (RTX A500) 32GB • €1650
Framework 16 (Radeon RX 7700S) 32GB • €1900
Supermicro 1U Server (L40S 48GB) 48GB • €12000
Dell PowerEdge R760xa (2x RTX A6000) 96GB • €18500

Tip: for models over 13B parameters, consider the Mac Mini M4 Pro (48GB) or higher.
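The explorer's fit check reduces to a simple rule of thumb: weight size ≈ parameters × bits per weight / 8, plus headroom for the KV cache and runtime buffers. A sketch of that rule; the 1.2 overhead factor is an assumption for illustration, not a measured constant:

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: raw weight bytes plus ~20% for KV cache and buffers."""
    return params_b * bits_per_weight / 8 * overhead

def fits(params_b: float, bits_per_weight: float, device_vram_gb: float) -> bool:
    """Does the model plausibly fit on a device with the given VRAM?"""
    return estimate_vram_gb(params_b, bits_per_weight) <= device_vram_gb

# Gemma 3 27B at Q4 (~4.5 bits/weight) on a 24GB Mac Mini M4:
print(fits(27, 4.5, 24))   # True, ~18GB estimated
# The same model at FP16 (16 bits/weight) does not fit:
print(fits(27, 16, 24))    # False, ~65GB estimated
```

Longer context windows grow the KV cache well beyond 20%, so treat a result near the device limit as "tight" rather than "fits".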


Cloud Providers

When you need the cloud

For tasks that require models larger than what fits on local hardware, these providers offer API access at competitive prices.

OpenAI · $5 free credit · Highest quality reasoning, multimodal, image generation · Models: GPT-4o, GPT-4o-mini, o1, +3 more
Anthropic · $5 free credit · Best coding model, longest context (200K), safest · Models: Claude Opus 4, Claude Sonnet 4, Claude Haiku 3.5
Google · Free tier (15 RPM) · Cheapest flash model, 1M+ context, multimodal · Models: Gemini 2.0 Flash, Gemini 1.5 Pro, Imagen 3, +1 more
Mistral AI · €5 free credit · EU-based provider, strong coding, function calling · Models: Mistral Large, Mistral Small, Codestral, +1 more
Together AI · $5 free credit · Run any open model via API, cheapest open model hosting · Models: Llama 4 Scout, Qwen 3, FLUX.1, +1 more
Groq · Free tier (30 RPM) · Fastest inference (LPU hardware), low latency · Models: Llama 4 Scout, Gemma 4, Mistral (ultra-fast LPU inference)
Replicate · Free predictions · Per-second GPU billing, image/video models, easy deployment · Models: FLUX.1, Stable Diffusion, Whisper, +1 more
Model Provider Input $/1M Output $/1M Context
GPT-4o OpenAI $2.50 $10.00 128K
GPT-4o Mini OpenAI $0.15 $0.60 128K
GPT-4 Turbo OpenAI $10.00 $30.00 128K
Claude 3.5 Sonnet Anthropic $3.00 $15.00 200K
Claude 3 Haiku Anthropic $0.25 $1.25 200K
Claude 3 Opus Anthropic $15.00 $75.00 200K
Gemini 1.5 Pro Google $1.25 $5.00 2000K
Gemini 1.5 Flash Google $0.07 $0.30 1000K
Gemini 2.0 Flash Google $0.10 $0.40 1000K
Mistral Large Mistral $2.00 $6.00 128K
Mistral Small Mistral $0.10 $0.30 128K
Llama 3.1 70B Meta/Together $0.88 $0.88 128K
Llama 3.1 8B Meta/Together $0.18 $0.18 128K
DeepSeek V3 DeepSeek $0.27 $1.10 128K
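Since every price in the table is per million tokens, a monthly API bill is a straight multiplication. A quick sketch using the GPT-4o Mini row ($0.15 in / $0.60 out); the traffic volumes are hypothetical:

```python
def monthly_api_cost(in_tokens: int, out_tokens: int,
                     in_price_per_m: float, out_price_per_m: float) -> float:
    """Monthly cost in dollars given token volume and $/1M token prices."""
    return in_tokens / 1e6 * in_price_per_m + out_tokens / 1e6 * out_price_per_m

# Hypothetical workload: 20M input + 5M output tokens/month on GPT-4o Mini.
cost = monthly_api_cost(20_000_000, 5_000_000, 0.15, 0.60)
print(f"${cost:.2f}/month")   # $6.00/month
```

Swapping in the Claude 3 Opus row ($15.00 / $75.00) makes the same workload 112x more expensive, which is why model choice dominates cloud budgeting.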

Cloud or Local?

Cloud API

  • + Access to the largest models
  • + No hardware to maintain
  • + Instant scaling
  • - Data leaves your network
  • - Costs grow with usage
  • - Vendor dependency

Local AI (Edge)

  • + Data never leaves your network
  • + Fixed cost (hardware bought once)
  • + GDPR by design
  • + No network latency
  • ~ Smaller models
  • ~ Upfront investment
Ask about local deployment →
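The fixed-cost vs usage-cost trade-off in the two lists above can be reduced to a break-even point: hardware price divided by the monthly API bill it would replace. A sketch with hypothetical numbers (a €920 Mac Mini M4 against an assumed €75/month API spend):

```python
def breakeven_months(hardware_eur: float, monthly_api_eur: float) -> float:
    """Months until a one-time hardware cost matches the cumulative API bill."""
    return hardware_eur / monthly_api_eur

# Hypothetical: Mac Mini M4 24GB (€920) vs €75/month of equivalent API usage.
months = breakeven_months(920, 75)
print(f"Local hardware pays for itself after ~{months:.1f} months")
```

Electricity and maintenance shift the result somewhat, but for steady workloads the break-even typically lands within the hardware's useful life, which is the core argument for edge deployment.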
Newsletter

Get access to exclusive resources

Subscribe to unlock 230+ workflows, 43 agents, and 26 professional templates. Weekly insights, no spam.

Bonus: free EU AI Act checklist when you subscribe
Once a week · No spam · Cancel anytime
EU AI Act: 99 days until the deadline

15 minutes to evaluate your case

Initial consultation with no commitment. We analyze your infrastructure and recommend the optimal hybrid architecture.

No commitment · 15 minutes · Personalized proposal

136 pages of free resources · 26 compliance templates · 22 certified devices