Practical implementation guides and configurations for LLM serving, RAG systems, vector databases, fine-tuning, and more.
Production configurations for llama.cpp, Ollama, vLLM, and DeepSeek R1. Optimized settings for different hardware.
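As a taste of what the serving recipes cover, here is a minimal offline-inference sketch using vLLM's Python API; the model name and all parameter values are illustrative placeholders, not tuned recommendations.

```python
# Minimal vLLM sketch (assumes the vllm package and a CUDA GPU are available;
# the model id and parameter values below are illustrative, not recommendations).
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # any Hugging Face model id; swap for your own
    gpu_memory_utilization=0.90,       # fraction of VRAM vLLM may reserve
    max_model_len=8192,                # cap context length so the KV cache fits
    tensor_parallel_size=1,            # >1 to shard across multiple GPUs
)

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)
outputs = llm.generate(["Explain paged attention in one paragraph."], params)
print(outputs[0].outputs[0].text)
```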
Hybrid search with BM25, optimal chunking strategies, reranking with cross-encoders, and ColBERT late interaction.
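Hybrid search ultimately needs a way to merge the keyword and vector rankings; one common, scale-free choice is reciprocal rank fusion, sketched below in plain Python with made-up document ids.

```python
# Reciprocal rank fusion (RRF): merge BM25 and vector-search rankings without
# having to normalize their score scales. Pure Python; doc ids are illustrative.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc ids into one fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]       # from a keyword index
vector_hits = ["doc1", "doc4", "doc3"]     # from an embedding index
print(rrf_fuse([bm25_hits, vector_hits]))  # ['doc1', 'doc3', 'doc4', 'doc7']
```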
Production configurations for Qdrant and ChromaDB with HNSW index tuning and performance optimization.
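A minimal sketch of creating a Qdrant collection with explicit HNSW parameters, assuming the qdrant-client Python package; the vector size and index values are illustrative starting points, not tuned settings.

```python
# Qdrant collection with explicit HNSW parameters (assumes qdrant-client;
# size, m, and ef_construct are illustrative starting points).
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, HnswConfigDiff

client = QdrantClient(":memory:")  # in-memory mode for local experiments

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=16,              # graph connectivity; higher = better recall, more RAM
        ef_construct=200,  # build-time beam width; higher = better index, slower build
    ),
)
```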
GGUF quantization selection guide and EXL2 vs GGUF decision matrix. Memory requirements by available VRAM.
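A rough way to reason about fit: weight memory is approximately parameter count times bits per weight divided by eight. The sketch below uses approximate community bits-per-weight figures for common GGUF quants, so treat the numbers as ballpark estimates only.

```python
# Back-of-envelope VRAM estimate for GGUF-quantized weights. The bits-per-weight
# figures are rough community approximations, not exact values.
BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8, "Q3_K_M": 3.9}

def weight_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the weights alone (no KV cache or activations)."""
    return params_billion * 1e9 * BPW[quant] / 8 / 1e9

for q in ("Q8_0", "Q5_K_M", "Q4_K_M"):
    print(f"7B at {q}: ~{weight_gb(7, q):.1f} GB")  # e.g. Q4_K_M is roughly 4.2 GB
```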
Unsloth LoRA configurations with QLoRA, optimal rank selection, and training hyperparameters.
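For illustration, a typical QLoRA-style adapter configuration expressed with Hugging Face peft's LoraConfig (Unsloth's get_peft_model takes analogous arguments); the rank, alpha, and target modules are common Llama-style choices, not universal recommendations.

```python
# A typical QLoRA-style adapter config using Hugging Face peft.
# Rank, alpha, and target modules are illustrative, not tuned values.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,             # adapter rank; 8-32 covers most instruction-tuning runs
    lora_alpha=16,    # scaling factor; often set equal to r
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[  # attention + MLP projections in Llama-style models
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```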
LangGraph agent workflows and CrewAI multi-agent configurations with state management and tool integration.
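A minimal LangGraph sketch, assuming the langgraph package: a typed state dict and two nodes wired in sequence, with plain Python functions standing in for real LLM and tool calls.

```python
# Minimal LangGraph workflow: shared typed state, two nodes, linear edges.
# The node bodies are stand-ins for real model / tool invocations.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    answer: str

def plan(state: State) -> dict:
    return {"answer": f"plan for: {state['question']}"}

def respond(state: State) -> dict:
    return {"answer": state["answer"] + " -> final response"}

graph = StateGraph(State)
graph.add_node("plan", plan)
graph.add_node("respond", respond)
graph.add_edge(START, "plan")
graph.add_edge("plan", "respond")
graph.add_edge("respond", END)

app = graph.compile()
print(app.invoke({"question": "What is hybrid search?", "answer": ""}))
```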
System prompt best practices and few-shot chain-of-thought templates for improved reasoning.
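One simple way to package this is to assemble the system prompt and worked examples into a chat-message list; the sketch below is plain Python, and the example problems and wording are illustrative rather than a canonical template.

```python
# Assemble a few-shot chain-of-thought prompt as chat messages.
# System text and worked examples are illustrative placeholders.
SYSTEM = "You are a careful assistant. Reason step by step, then give a final answer."

FEW_SHOT = [
    {"q": "A train travels 60 km in 1.5 hours. What is its average speed?",
     "a": "Reasoning: speed = distance / time = 60 / 1.5 = 40. Final answer: 40 km/h."},
]

def build_prompt(question: str) -> list[dict]:
    """Return chat messages: system prompt, worked examples, then the new question."""
    messages = [{"role": "system", "content": SYSTEM}]
    for shot in FEW_SHOT:
        messages.append({"role": "user", "content": shot["q"]})
        messages.append({"role": "assistant", "content": shot["a"]})
    messages.append({"role": "user", "content": question})
    return messages

print(build_prompt("If 3 pens cost $4.50, how much do 7 pens cost?"))
```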
Cursor .cursorrules configuration and Claude Code context management with CLAUDE.md patterns.
Common pitfalls and solutions for flash attention, context windows, RAG failures, OOM errors, and more.
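Long-context OOM errors are often KV-cache growth rather than weights; the back-of-envelope estimate below uses dimensions roughly matching a Llama-3-8B-class model (32 layers, 8 KV heads via GQA, head dim 128) and should be treated as illustrative.

```python
# Rough KV-cache size estimate, useful when diagnosing OOM at long context.
# Formula: 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes per element.
# Example dimensions roughly match a Llama-3-8B-class model; treat as illustrative.
def kv_cache_gb(layers=32, kv_heads=8, head_dim=128, seq_len=8192,
                bytes_per_elem=2, batch=1) -> float:
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem * batch / 1e9

print(f"{kv_cache_gb(seq_len=8192):.2f} GB")    # about 1.1 GB at 8k context (fp16, GQA)
print(f"{kv_cache_gb(seq_len=131072):.2f} GB")  # the same model at 128k context
```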
Self-RAG architecture, production deployment checklist, and embedding model selection matrix.
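A skeletal Self-RAG-style control loop for orientation: decide whether to retrieve, grade the retrieved passages, then generate from the surviving context. The llm() and retrieve() helpers are hypothetical stand-ins for your own model and retriever calls.

```python
# Skeletal Self-RAG-style loop. llm() and retrieve() are hypothetical stubs:
# wire them to your own model endpoint and vector / hybrid index.
def llm(prompt: str) -> str:
    raise NotImplementedError("call your model here")

def retrieve(query: str, k: int = 5) -> list[str]:
    raise NotImplementedError("call your index here")

def self_rag_answer(question: str) -> str:
    # 1. Ask the model whether retrieval is needed at all.
    needs_docs = llm(f"Does answering this need external documents? yes/no\n{question}")
    if "yes" in needs_docs.lower():
        # 2. Retrieve, then keep only passages the model grades as relevant.
        relevant = [p for p in retrieve(question)
                    if "yes" in llm("Is this passage relevant to the question? yes/no\n"
                                    f"Question: {question}\nPassage: {p}").lower()]
        context = "\n\n".join(relevant)
        return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
    # 3. Otherwise answer directly from parametric knowledge.
    return llm(question)
```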