Architecture

I3K RAG Enterprise is a production-grade RAG system designed to run 100% locally, inside the customer perimeter. Every component is open source (the I3K RAG Enterprise code itself is licensed under AGPL-3.0), and no data ever leaves the deployment boundary.

This page is the conceptual overview. For installation and runtime details, see the quickstart and the deployment guide.

Components

An I3K RAG Enterprise deployment is composed of the following services:

Component	Port	Role
FastAPI backend	`:8000`	REST API surface, JWT user management, query orchestration
React + Vite frontend	`:3000`	Modern web UI with real-time updates
Qdrant	`:6333`	Vector database for chunk embeddings
Ollama	`:11434`	Local LLM inference server
SQLite	—	User and authentication database
I3K retrieval orchestrator	—	In-house RAG pipeline orchestration in Python, running inside the FastAPI backend
Apache Tika + Tesseract	—	Text extraction and OCR for ingested documents

The default LLM served by Ollama is Qwen3:14b-q4_K_M, with Mistral 7B Q4 available as a lighter alternative. The embedding model is BAAI/bge-m3, a multilingual model whose model card lists support for more than 100 languages. All inference runs locally, on NVIDIA CUDA, AMD ROCm, or CPU-only hosts.

No third-party calls

None of the components reach out to external services. Models, embeddings, vector store, user database and UI all live on the server you control.
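One way to check this rule on a given host is to verify that every configured service endpoint points at a loopback or private address. The sketch below is not part of the shipped code: the `SERVICES` mapping is hypothetical and simply mirrors the ports in the component table. Hostnames other than `localhost` are not resolved and are flagged for manual review, so Docker-style service names will show up in the result.

```python
from ipaddress import ip_address
from urllib.parse import urlparse

# Hypothetical endpoint map mirroring the component table; adjust to your deployment.
SERVICES = {
    "backend": "http://127.0.0.1:8000",
    "frontend": "http://127.0.0.1:3000",
    "qdrant": "http://127.0.0.1:6333",
    "ollama": "http://127.0.0.1:11434",
}


def is_internal(url: str) -> bool:
    """True if the URL host is 'localhost' or a literal loopback/private IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ip_address(host)
    except ValueError:
        # Other hostnames are not resolved here, so they count as unverified.
        return False
    return addr.is_loopback or addr.is_private


def external_endpoints(services: dict[str, str]) -> list[str]:
    """Names of services whose endpoints are not provably internal."""
    return [name for name, url in services.items() if not is_internal(url)]
```

An empty list from `external_endpoints(SERVICES)` means every endpoint in the mapping stays on the host or the private network.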

The 4-step RAG pipeline

Data flows through four well-defined steps: ingest and embed run when documents are added, while retrieve and generate run on every query. All four are coordinated by our in-house retrieval orchestrator inside the FastAPI backend.

Ingest. Documents are uploaded through the web UI or the REST API. Apache Tika extracts text from PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, TXT, MD, ODT, RTF, HTML and XML. Tesseract handles OCR for scanned PDF, DOCX, PPTX and XLSX files.
Embed & store. Extracted text is chunked, embedded with BAAI/bge-m3, and stored in Qdrant. Each vector carries metadata used for RBAC filtering at retrieval time.
Retrieve. Our in-house retrieval orchestrator runs semantic retrieval from Qdrant. Relevance threshold and top-K are configurable. Role-based filtering is applied at the retrieval layer, so users only ever see chunks they are entitled to.
Generate. Retrieved chunks are passed to the configured LLM (default: Qwen3:14b-q4_K_M via Ollama) to produce an answer grounded in the document context. Zero external calls.
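The retrieve and generate steps can be sketched in plain Python. This is an illustration under stated assumptions, not the in-house orchestrator: an in-memory list stands in for Qdrant, each chunk carries a hypothetical `allowed_roles` metadata field, and the default `top_k` and `threshold` values are placeholders. The last function only builds a request body for Ollama's `/api/generate` endpoint (assuming the tag `qwen3:14b-q4_K_M`); it does not send it.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    vector: list[float]
    allowed_roles: frozenset[str]  # hypothetical RBAC metadata stored with each vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, role, top_k=5, threshold=0.3):
    """Apply the role filter first, then keep the best top_k chunks above threshold."""
    visible = [c for c in chunks if role in c.allowed_roles]  # RBAC before ranking
    scored = [(cosine(query_vec, c.vector), c) for c in visible]
    scored = [(s, c) for s, c in scored if s >= threshold]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def build_generate_request(question, context_chunks, model="qwen3:14b-q4_K_M"):
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    context = "\n\n".join(c.text for c in context_chunks)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return {"model": model, "prompt": prompt, "stream": False}
```

Filtering by role before scoring means a chunk the caller cannot see never competes for a top-K slot and never reaches the prompt.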

Multi-user and RBAC

I3K RAG Enterprise ships with JWT authentication and three built-in roles:

User — query-only access. Can ask questions and read answers.
Super User — everything User can do, plus document upload and delete.
Admin — full system management, including user accounts and configuration.
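The three roles form a strict hierarchy, which can be sketched as a role-to-permission table. The role identifiers and permission names below are hypothetical, and the sketch assumes Admin also inherits every Super User permission; the actual backend may store and check roles differently.

```python
from enum import Enum


class Role(str, Enum):
    USER = "user"
    SUPER_USER = "super_user"
    ADMIN = "admin"


# Hypothetical permission names; each role inherits everything from the role below it.
_USER_PERMS = frozenset({"query"})
_SUPER_USER_PERMS = _USER_PERMS | {"upload_document", "delete_document"}
_ADMIN_PERMS = _SUPER_USER_PERMS | {"manage_users", "edit_config"}

PERMISSIONS = {
    Role.USER: _USER_PERMS,
    Role.SUPER_USER: _SUPER_USER_PERMS,
    Role.ADMIN: _ADMIN_PERMS,
}


def can(role: Role, permission: str) -> bool:
    """Return True if the given role grants the permission."""
    return permission in PERMISSIONS[role]
```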

Role enforcement happens at the retrieval layer: permissions are translated into Qdrant metadata filters, so a query never reaches the LLM with chunks the caller is not allowed to see.
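A minimal sketch of that translation, assuming each point's payload stores a hypothetical `allowed_roles` list: the caller's role becomes a `must` condition in the JSON body accepted by Qdrant's REST search endpoint (`POST /collections/{collection_name}/points/search`). The field name and default values are illustrative, not taken from the I3K code.

```python
import json


def rbac_search_body(query_vector, role, top_k=5, score_threshold=0.3):
    """Build a Qdrant REST search body that only matches points visible to `role`."""
    return {
        "vector": query_vector,
        "limit": top_k,
        "score_threshold": score_threshold,
        "with_payload": True,
        # Hypothetical payload field; on an array field, a "value" match succeeds
        # when any element of the array equals the given value.
        "filter": {
            "must": [{"key": "allowed_roles", "match": {"value": role}}]
        },
    }


if __name__ == "__main__":
    print(json.dumps(rbac_search_body([0.1, 0.2, 0.3], "user"), indent=2))
```

Because the condition is evaluated inside Qdrant, unauthorized chunks are excluded before scoring rather than removed afterwards.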

Data sovereignty

I3K RAG Enterprise is built around a single rule: nothing leaves the perimeter.

All LLMs run inside Ollama on the server itself.
Embeddings are computed locally by the FastAPI backend.
Qdrant stores vectors on local disk.
SQLite holds user and auth state on local disk.
No telemetry, no analytics, no model-update calls by default.

This makes the system suitable for environments where data residency, contractual confidentiality or regulatory requirements forbid sending content to external APIs.

Security

Transport — terminate TLS at the reverse proxy of your choice in front of the FastAPI backend.
Authentication — JWT tokens with configurable expiration.
Password hashing — bcrypt on the user database.
Audit log — application-level log of authentication, document operations and admin actions.
Network egress — none by default. The deployment can be run on a fully air-gapped host.
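To illustrate what configurable expiration means for JWTs, here is a standard-library sketch that issues and verifies an HS256 token whose `exp` claim is derived from a configurable lifetime. It is not the backend's code, which may well rely on a library such as PyJWT; the secret, the `role` claim, and the 30-minute default lifetime are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(subject, role, secret, ttl_seconds=1800, now=None):
    """Create an HS256 JWT whose exp claim is ttl_seconds after `now`."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "role": role, "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token, secret, now=None):
    """Return the payload if the signature is valid and the token is unexpired."""
    now = int(time.time()) if now is None else now
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if now >= payload["exp"]:  # RFC 7519: reject at or after the exp time
        raise ValueError("token expired")
    return payload
```

In a real deployment the secret would come from configuration and `ttl_seconds` from the expiration setting; a maintained JWT library is preferable to hand-rolled code.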

Scaling

A single-server deployment is production-ready for 10,000+ documents. Vertical scaling is the recommended first step: more RAM lets Qdrant keep larger indices in memory, a GPU lets Ollama serve Qwen3:14b at higher throughput, and additional cores shorten ingestion time.
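For a rough sizing of the vertical-scaling step, Qdrant's capacity-planning guidance gives a rule of thumb for keeping vectors in RAM: vectors × dimensions × 4 bytes × 1.5, where the extra 50% covers indexes, metadata, and temporary segments. BAAI/bge-m3 produces 1,024-dimensional dense vectors. The chunks-per-document figure below is an assumption; measure it on your own corpus. The estimate ignores payloads, quantization, and on-disk storage options.

```python
BYTES_PER_FLOAT32 = 4
QDRANT_OVERHEAD = 1.5  # rule-of-thumb factor for indexes, metadata, optimizer segments
BGE_M3_DIM = 1024      # dense embedding size of BAAI/bge-m3


def estimate_qdrant_ram_bytes(num_documents, chunks_per_document, dim=BGE_M3_DIM):
    """Rough RAM needed to keep all dense vectors in memory (no quantization)."""
    num_vectors = num_documents * chunks_per_document
    return int(num_vectors * dim * BYTES_PER_FLOAT32 * QDRANT_OVERHEAD)


# 10,000 documents at an assumed 50 chunks each -> 500,000 vectors.
ram = estimate_qdrant_ram_bytes(10_000, 50)
print(f"{ram / 2**30:.2f} GiB")  # 2.86 GiB
```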

For larger datasets and horizontal layouts, see the deployment guide.

Backup

Backups are handled by the integrated rclone layer, which supports 70+ storage providers (S3, Azure Blob, GCS, B2, SFTP, WebDAV and many others). The same mechanism backs up the Qdrant collection, the SQLite database and the uploaded source documents, so a restore brings the system back to a consistent state.
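Consistency matters most for SQLite, whose file should not simply be copied while the backend may be writing to it. A minimal sketch, assuming hypothetical paths and an rclone remote named `backup-remote`: Python's `sqlite3` online backup API writes a consistent copy into a staging directory, and the matching `rclone copy` command line is built but not executed. The Qdrant collection would typically be captured through its snapshot API (`POST /collections/{collection_name}/snapshots`); how the integrated layer actually sequences these steps is not described here.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(db_path: Path, staging_dir: Path) -> Path:
    """Take a consistent copy of a live SQLite database via the online backup API."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    target = staging_dir / db_path.name
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)  # copies pages under SQLite's own locking
    finally:
        dst.close()
        src.close()
    return target


def rclone_copy_command(staging_dir: Path, remote: str, prefix: str) -> list[str]:
    """Build, but do not run, an rclone command that uploads the staging directory."""
    return ["rclone", "copy", str(staging_dir), f"{remote}:{prefix}"]
```

Passing the returned list to `subprocess.run(cmd, check=True)` would perform the upload once the remote has been configured in rclone.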

Repository

Source code lives at github.com/I3K-IT/RAG-Enterprise under the AGPL-3.0 license.