Overview: Battle-tested patterns for chat, RAG, tool use, and evaluation with provider switching (OpenAI/Azure/OpenRouter).
Key Capabilities: Modular chains; RAG ingestion & chunking; pluggable embeddings with vector-store adapters (FAISS/PGVector/Chroma); reranking options;
tool-calling agents with timeouts/retries/guardrails; evaluation harness for regressions, grounding checks, and toxicity.
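The timeout/retry behavior for tool calls can be sketched as a simple retry-with-backoff loop. This is a minimal illustration, not the project's actual API: `call_tool_with_retries` and its parameters are hypothetical names, and the per-attempt timeout is enforced with a worker thread for brevity.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

def call_tool_with_retries(tool: Callable[..., Any], *args,
                           retries: int = 3, timeout: float = 5.0,
                           backoff: float = 0.5, **kwargs) -> Any:
    """Call a tool with a per-attempt timeout, retrying on failure
    with exponential backoff between attempts."""
    last_exc: Exception = RuntimeError("no attempts made")
    with ThreadPoolExecutor(max_workers=1) as pool:
        for attempt in range(retries):
            future = pool.submit(tool, *args, **kwargs)
            try:
                # Raises TimeoutError if the tool exceeds its budget,
                # or re-raises whatever the tool itself raised.
                return future.result(timeout=timeout)
            except Exception as exc:
                last_exc = exc
                time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"tool failed after {retries} attempts") from last_exc
```

Note that a timed-out call keeps its worker thread alive until the tool returns; a production agent would run tools as cancellable async tasks instead.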
Architecture: FastAPI inference server with SSE streaming and tracing; Next.js chat UI; modular retrievers and re-rankers;
eval dashboards and gold datasets.
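The SSE streaming contract between server and UI can be illustrated with a minimal frame encoder. The event shape (`{"token": ...}`) and the `[DONE]` sentinel below are assumptions modeled on common streaming APIs, not necessarily this project's exact wire format.

```python
import json
from typing import Iterable, Iterator

def sse_frames(events: Iterable[dict]) -> Iterator[str]:
    """Encode token events as Server-Sent Events: each frame is a
    'data:' line followed by a blank line, ending with a [DONE] sentinel."""
    for event in events:
        yield f"data: {json.dumps(event)}\n\n"
    yield "data: [DONE]\n\n"
```

A FastAPI handler would return such a generator via `StreamingResponse` with `media_type="text/event-stream"`, which the chat UI consumes with the browser's `EventSource` or a fetch-based reader.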
Security & Compliance: PII scrubbing; logging redaction; token budgeting; prompt-injection filters; domain allowlists.
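As a rough sketch of regex-based PII scrubbing for log redaction: the patterns below are illustrative and deliberately narrow; a production scrubber would rely on a vetted PII-detection library with much broader coverage.

```python
import re

# Illustrative patterns only; real deployments need broader, tested coverage.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def scrub_pii(text: str) -> str:
    """Replace each match with a [LABEL] placeholder before logging.
    SSNs are checked before phones so the broader phone pattern
    does not claim them first."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text
```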
Performance & Ops: Concurrency tuning, batching, vector index sizing; canary tests and A/B routing hooks.
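Two of these knobs can be shown with back-of-the-envelope helpers (the function names are hypothetical): fixed-size micro-batching of embedding requests, and memory sizing for a flat (exact) vector index at 4 bytes per float32 dimension.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

def micro_batches(items: Sequence[T], max_batch: int = 8) -> Iterator[List[T]]:
    """Group pending requests into fixed-size batches for one provider call."""
    for i in range(0, len(items), max_batch):
        yield list(items[i:i + max_batch])

def flat_index_bytes(n_vectors: int, dim: int, dtype_bytes: int = 4) -> int:
    """Approximate RAM for a flat float32 index (vectors only,
    excluding IDs and index overhead)."""
    return n_vectors * dim * dtype_bytes
```

For example, 1M vectors at 1,536 dimensions need roughly 6.1 GB for the raw vectors alone, before any index overhead; compressed index types trade recall for a much smaller footprint.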
Quick Start: Set provider keys in .env → docker compose up → open /chat demo and run the sample RAG flow.
Deliverables: Server, chains, evals, datasets, Next.js UI, monitoring dashboards.
FAQ: Self-hosted adapters for Ollama/vLLM; Unicode-tested multilingual pipeline.
