Comprehensive directory of open-source and specialized AI models for enterprise deployment, including capabilities, benchmarks, and integration guidance.
High-performance 7B model outperforming Llama 2 13B on all benchmarks. Optimized for efficiency with grouped-query attention and sliding window attention.
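Most open checkpoints in this directory load the same way through Hugging Face `transformers`. A minimal sketch for Mistral 7B follows; the model ID is an assumption, so verify the current instruct checkpoint on the hub:

```python
# Minimal sketch: loading and prompting Mistral 7B via transformers.
# The checkpoint name is an assumption; confirm it on the Hugging Face hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain grouped-query attention in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```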
Multilingual model family with strong performance across 29+ languages. Excellent for international deployments with competitive English benchmarks.
Small language model optimized for edge deployment. Achieves GPT-3.5 level performance at a fraction of the size, ideal for on-device AI.
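For memory-constrained edge targets, 4-bit quantization is a common deployment path. A hedged sketch using `bitsandbytes` through `transformers` follows; the `microsoft/Phi-3-mini-4k-instruct` checkpoint name is an assumption based on Microsoft's published models:

```python
# Sketch: 4-bit quantized load of Phi-3-mini for low-memory deployment.
# Older transformers versions may additionally need trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, store in 4-bit
)
model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
```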
Mixture-of-Experts model with 671B total parameters (37B active). State-of-the-art open-source performance on coding and reasoning benchmarks.
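The total-versus-active parameter gap comes from sparse routing: a gating network selects a few experts per token, so only those experts' weights run in the forward pass. The sketch below is a generic top-k MoE illustration, not DeepSeek's actual implementation:

```python
# Illustrative top-k mixture-of-experts routing (generic, not DeepSeek's code).
# Only the k experts selected per token contribute to the output, which is
# why a 671B-total-parameter model can activate only ~37B per token.
import torch
import torch.nn.functional as F

def moe_forward(x, router, experts, k=2):
    """x: [tokens, d_model]; router: [d_model, n_experts] gating matrix."""
    logits = x @ router                        # [tokens, n_experts]
    weights, idx = logits.topk(k, dim=-1)      # keep the k best experts per token
    weights = F.softmax(weights, dim=-1)       # normalize gate weights over the k
    out = torch.zeros_like(x)
    for slot in range(k):
        for e, expert in enumerate(experts):
            mask = idx[:, slot] == e           # tokens whose slot picked expert e
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
    return out

experts = [torch.nn.Linear(64, 64) for _ in range(8)]  # toy experts
router = torch.randn(64, 8)
y = moe_forward(torch.randn(10, 64), router, experts)  # [10, 64]
```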
Meta's latest open model family with vision capabilities. Strong general-purpose performance with extensive fine-tuning ecosystem support.
Latest GLM model with 200K context window and exceptional coding performance. Outperforms competitors in 74 real-world coding tests with 30% better token efficiency.
Open-source MoE model unifying reasoning, coding, and agentic capabilities. Trained on 22T tokens including 7T for code/reasoning. Ranks 3rd on combined benchmarks.
Compact open-source model outperforming Llama-3-8B. Supports 128K context, web browsing, code execution, and function calling across 26 languages.
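Recent `transformers` releases (4.42+) can render tool schemas into a model's chat template directly from annotated Python functions. Whether GLM-4-9B's template consumes the `tools` argument exactly this way is an assumption; check the model card:

```python
# Hedged sketch: building a function-calling prompt via the chat template.
# The tool schema is derived from the function signature and docstring.
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny"  # illustrative stub

tokenizer = AutoTokenizer.from_pretrained(
    "THUDM/glm-4-9b-chat", trust_remote_code=True  # assumed checkpoint name
)
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # tool schema embedded in the model's expected format
```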
Unified RAG framework with 16x-128x semantic document compression. End-to-end differentiable retrieval and generation in continuous latent space.
Late interaction retrieval model with token-level matching. Enables efficient semantic search with pre-computed document representations.
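Late interaction (the ColBERT-style approach this entry describes) scores a document by matching each query token against its best document token and summing those maxima, which is what allows document token embeddings to be precomputed and indexed. A minimal sketch of that "MaxSim" operator, with illustrative names:

```python
# MaxSim scoring for late-interaction retrieval: per-query-token maxima
# over precomputed document token embeddings, summed into one score.
import torch

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """query_emb: [q_tokens, dim]; doc_emb: [d_tokens, dim]; both L2-normalized."""
    sim = query_emb @ doc_emb.T          # token-level cosine similarity matrix
    return sim.max(dim=1).values.sum()   # best doc token for each query token

q = torch.nn.functional.normalize(torch.randn(8, 128), dim=-1)
d = torch.nn.functional.normalize(torch.randn(200, 128), dim=-1)
score = maxsim_score(q, d)  # higher = better match
```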
Code-specialized Llama model with infilling capabilities. Supports multiple programming languages with strong performance on HumanEval.
Trained on The Stack v2 with 600+ programming languages. Optimized for code completion with 16K context window and fill-in-the-middle support.
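Fill-in-the-middle prompting (supported by both StarCoder 2 and Code Llama above) wraps the code before and after the gap in sentinel tokens and asks the model to generate the middle. The sketch below uses the StarCoder-convention tokens; Code Llama uses `<PRE>`/`<SUF>`/`<MID>` instead. Checkpoint name and token spellings are assumptions, so verify them against the tokenizer config:

```python
# Hedged sketch: fill-in-the-middle completion with StarCoder 2.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

prefix = "def average(xs):\n    "
suffix = "\n    return total / len(xs)\n"
# Sentinel tokens follow the StarCoder FIM convention.
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```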
Code-focused model achieving GPT-4-level performance on coding benchmarks. Strong results on HumanEval, MBPP, and DS-1000 with an efficient architecture.
Large Language and Vision Assistant with strong visual reasoning. Supports high-resolution images and achieves competitive performance on vision-language benchmarks.
Unified vision foundation model for captioning, detection, segmentation, and OCR. Compact size with strong zero-shot transfer capabilities.
Vision-language model with native multimodal function calling. Supports 128K context (~150 pages of documents or 1 hour of video). Processes text, charts, tables, and formulas in one pass.
Multi-functionality, multi-linguality, and multi-granularity embedding model. Supports dense, sparse, and multi-vector retrieval in 100+ languages.
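BGE-M3's three retrieval modes can be requested in a single encode call through the FlagEmbedding package. The argument and output key names below follow the project's README and are worth double-checking against your installed version:

```python
# Hedged sketch: dense, sparse, and multi-vector outputs from BGE-M3.
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
out = model.encode(
    ["What is grouped-query attention?"],
    return_dense=True,
    return_sparse=True,
    return_colbert_vecs=True,
)
dense = out["dense_vecs"]        # one dense vector per input text
sparse = out["lexical_weights"]  # token -> weight dicts for sparse retrieval
multi = out["colbert_vecs"]      # per-token vectors for late interaction
```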
LLM-based embedding model with state-of-the-art MTEB performance. Combines instruction-following with semantic similarity for flexible retrieval.
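Instruction-following embedders typically prepend a task instruction to the query (but not to the documents) before encoding. The entry does not name the model, so the sketch below is a generic illustration of that pattern using an E5-Mistral-style checkpoint as an example:

```python
# Hedged sketch: instruction-prefixed retrieval with an LLM-based embedder.
# Model name and prompt format are assumptions; follow your model's card.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-mistral-7b-instruct")  # example model
task = "Given a web search query, retrieve relevant passages that answer it"
query = f"Instruct: {task}\nQuery: how does sliding window attention work?"
docs = ["Sliding window attention restricts each token to a local context window."]

q_emb = model.encode([query], normalize_embeddings=True)
d_emb = model.encode(docs, normalize_embeddings=True)
print(q_emb @ d_emb.T)  # cosine similarity, since embeddings are normalized
```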
| Model | Category | Parameters | Best For | License |
|---|---|---|---|---|
| Mistral 7B | LLM | 7B | General purpose, efficiency | Apache 2.0 |
| Qwen 2.5 | LLM | 0.5B-72B | Multilingual applications | Apache 2.0 |
| Phi-3 | LLM | 3.8B-14B | Edge/mobile deployment | MIT |
| DeepSeek V3 | LLM (MoE) | 671B (37B active) | Coding, math, reasoning | MIT |
| CLaRa 7B | RAG | 7B | Document compression RAG | Apache 2.0 |
| Code Llama | Coding | 7B-70B | Code generation, infilling | Llama License |
| StarCoder 2 | Coding | 3B-15B | Code completion, 600+ langs | OpenRAIL-M |
| LLaVA 1.6 | Vision | 7B-34B | Visual QA, image understanding | Apache 2.0 |
| GLM-4.6 | LLM (MoE) | 355B (32B active) | Coding, 200K context | MIT |
| GLM-4.5 | LLM (MoE) | 355B (32B active) | Agentic AI, reasoning | MIT |
| GLM-4-9B | LLM | 9B | Multilingual, tool calling | MIT |
| GLM-4.6V | Vision | 9B-106B | Document understanding, tool calling | MIT |
| BGE-M3 | Embedding | 567M | Multilingual retrieval | MIT |
Hugging Face benchmark rankings for open-source language models across standard evaluation tasks.
Independent AI model comparison with quality, speed, and pricing benchmarks.
Massive Text Embedding Benchmark for comparing embedding model performance.
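Embedding models can be scored on MTEB tasks locally via the `mteb` package. A hedged sketch, assuming the documented `get_tasks`/`MTEB.run` API and using BGE-M3 as the model under test:

```python
# Hedged sketch: running one MTEB task against an embedding model.
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
tasks = mteb.get_tasks(tasks=["Banking77Classification"])  # example task
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")  # scores written as JSON
```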
Code generation benchmark with rigorous evaluation on HumanEval+ and MBPP+.