Quickstart

I3K RAG Enterprise is a self-hosted RAG platform that runs 100% on your own infrastructure: no US cloud dependencies, no data leaving your perimeter, and an AGPL-3.0 license. The one-command installer brings up the full stack (Qdrant, Ollama, the FastAPI backend, the React frontend and the OCR pipeline) in roughly one hour on a typical connection, or about 15 minutes on a fast one.

This guide walks you from a clean Ubuntu host to your first RAG query.

Requirements

OS: Ubuntu 20.04+ (22.04 recommended)
RAM: 16 GB minimum, 32 GB recommended
Storage: 50 GB or more
GPU: NVIDIA CUDA (8–16 GB VRAM recommended), AMD ROCm, or CPU-only
Network: 80+ Mbit/s recommended for the initial model pull
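
If you want to check a host against these numbers before installing, a minimal sketch follows. It is not part of install.sh; the function names are illustrative, and it assumes a Linux host with /proc/meminfo, df and awk available.

```shell
#!/bin/sh
# Preflight sketch: compare this host with the documented minimums.
# Thresholds come from the requirements list; function names are illustrative.

MIN_RAM_GB=16
MIN_DISK_GB=50

# Print "ok" if $1 (measured) is at least $2 (required), otherwise "low".
at_least() {
    if [ "$1" -ge "$2" ]; then echo ok; else echo low; fi
}

# Total RAM in GiB, rounded to the nearest integer because the kernel
# reports slightly less than the installed amount.
ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 + 0.5 }' /proc/meminfo
}

# Free space in whole GiB on the filesystem that holds the given path.
disk_gb() {
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Print one line per check for the current directory's filesystem.
preflight() {
    ram=$(ram_gb)
    disk=$(disk_gb .)
    echo "RAM:  ${ram} GiB ($(at_least "$ram" "$MIN_RAM_GB"))"
    echo "Disk: ${disk} GiB free ($(at_least "$disk" "$MIN_DISK_GB"))"
}
```

Run preflight from the directory where you plan to clone the repository, so the disk check looks at the right filesystem.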

GPU vs CPU

GPU is strongly recommended for usable end-to-end latency on Qwen3:14b-q4_K_M. CPU-only works for development and small corpora; expect higher response times.

Installation (1 command)

Clone the repository and run the installer:

git clone https://github.com/I3K-IT/RAG-Enterprise.git
cd RAG-Enterprise
./install.sh

The installer is interactive. It asks you two things:

GPU type — NVIDIA, AMD or CPU. It configures Ollama and the embedding runtime accordingly.
LLM model — Qwen3:14b-q4_K_M (default, best quality on 16 GB VRAM) or Mistral 7B Q4 (lighter, fits on 8 GB).
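
If you are unsure which model to pick on an NVIDIA host, the sketch below reads the total VRAM and maps it to one of the two options. It is an illustration, not installer logic: the 15000 MiB threshold is an assumption, set a little under 16 GB because some cards report slightly less than their nominal size.

```shell
#!/bin/sh
# Sketch: suggest an answer to the installer's model question from detected VRAM.
# The threshold is an assumption for illustration, not taken from install.sh.

# Map total VRAM in MiB to one of the two models the installer offers.
suggest_model() {
    if [ "$1" -ge 15000 ]; then
        echo "Qwen3:14b-q4_K_M"
    else
        echo "Mistral 7B Q4"
    fi
}

# Total VRAM of the first NVIDIA GPU in MiB, or 0 if nvidia-smi is missing or fails.
detect_vram_mib() {
    vram=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null | head -n 1)
    echo "${vram:-0}"
}
```

Calling suggest_model "$(detect_vram_mib)" prints the suggested choice. On AMD or CPU-only hosts the detection returns 0, so the lighter model is suggested.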

The script then runs unattended. Total time is around one hour on a typical connection, ~15 minutes on a fast link.

What the script does

Pulls and configures Qdrant as the vector store (port 6333)
Installs Ollama (port 11434) and downloads the chosen LLM
Sets up the FastAPI backend (port 8000) with the I3K retrieval orchestrator pipeline
Builds and serves the React + Vite frontend (port 3000)
Initializes the SQLite user database with JWT auth and the three roles (User, Super User, Admin)
Installs Apache Tika and Tesseract for document parsing and OCR
Downloads the BAAI/bge-m3 embedding model (29 languages)

When it finishes, the installer prints the admin credentials it generated. Save them.
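
To confirm that the stack is actually up, you can probe the four ports listed above. This is a minimal sketch that assumes curl is installed and that every service listens on localhost; the short service names are labels chosen here, not names used by the installer.

```shell
#!/bin/sh
# Sketch: check that each service answers HTTP on its documented port.

# name:port pairs taken from the list of installer steps.
SERVICES="qdrant:6333 ollama:11434 backend:8000 frontend:3000"

# Succeeds if anything answers HTTP on the given port, whatever the status code.
port_answers() {
    curl -s -o /dev/null --max-time 3 "http://localhost:$1/"
}

# Print one status line per service; return non-zero if any port is silent.
check_services() {
    failed=0
    for entry in $SERVICES; do
        name=${entry%%:*}
        port=${entry##*:}
        if port_answers "$port"; then
            echo "$name ($port): up"
        else
            echo "$name ($port): no answer"
            failed=1
        fi
    done
    return "$failed"
}
```

check_services only tells you that a port responds. It does not verify that the chosen model has finished downloading.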

First login

Open the frontend:

http://localhost:3000

Log in with the admin account printed by the installer. From the left sidebar, go to Documents and upload your first file. Supported formats: PDF (with OCR for scanned pages), DOCX/DOC, PPTX/PPT, XLSX/XLS, TXT, MD, ODT, RTF, HTML, XML.

First upload and query

On upload, the backend pipeline runs:

Extraction — Apache Tika parses the file; Tesseract handles scanned PDFs via OCR.
Chunking — the text is split into semantic chunks.
Embedding — each chunk is encoded with BAAI/bge-m3 (multilingual, 29 languages).
Indexing — vectors are written to Qdrant with their metadata.
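
To see the result of the indexing step, you can query Qdrant's REST API directly. The sketch below uses Qdrant's standard collection endpoints. It assumes Qdrant is reachable on port 6333 without an API key; the collection name is not documented here, so list the collections first and pass the name you find.

```shell
#!/bin/sh
# Sketch: inspect what the indexing step wrote to Qdrant.
# Assumes no API key is configured; the collection name must be looked up.

QDRANT_URL=${QDRANT_URL:-http://localhost:6333}

# URL of one collection's details, which include its points_count.
collection_url() {
    echo "$QDRANT_URL/collections/$1"
}

# List all collections (GET /collections).
list_collections() {
    curl -s "$QDRANT_URL/collections"
}

# Show details for the collection named in $1 (GET /collections/<name>).
collection_info() {
    curl -s "$(collection_url "$1")"
}
```

A points_count that grows after each upload is a quick sign that chunks are being embedded and written.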

Once indexing completes, ask a question from the chat UI. The backend retrieves the relevant chunks from Qdrant, hands them to Ollama running your chosen LLM, and returns a grounded answer with citations to the source documents.
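
The generation half of that path can be sketched against Ollama's /api/generate endpoint. The prompt wording below is an assumption for illustration and is not the backend's actual prompt. The sketch also assumes jq is installed and that the model tag matches what ollama list reports on your host.

```shell
#!/bin/sh
# Sketch: send retrieved context plus a question to Ollama.
# The prompt template is illustrative; the real backend prompt is not documented here.

OLLAMA_URL=${OLLAMA_URL:-http://localhost:11434}
MODEL=${MODEL:-Qwen3:14b-q4_K_M}   # check `ollama list` for the exact tag

# Build a grounded prompt from retrieved context ($1) and the user question ($2).
build_prompt() {
    printf 'Answer using only the context below and cite your sources.\n\nContext:\n%s\n\nQuestion: %s\n' "$1" "$2"
}

# Post the prompt to Ollama without streaming and print the generated text.
ask_ollama() {
    jq -n --arg model "$MODEL" --arg prompt "$(build_prompt "$1" "$2")" \
        '{model: $model, prompt: $prompt, stream: false}' |
    curl -s "$OLLAMA_URL/api/generate" -H 'Content-Type: application/json' -d @- |
    jq -r '.response'
}
```

In the real pipeline the context comes from the chunks retrieved from Qdrant. Here you pass it in by hand, for example ask_ollama "Invoices are due in 30 days." "When are invoices due?".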

The same query path is also exposed as a REST API by the FastAPI backend on port 8000, so you can integrate I3K RAG Enterprise into your own applications. Endpoints are protected by JWT and honour the User / Super User / Admin role boundaries.
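
This guide does not list the endpoint paths, so the sketch below discovers them instead of guessing. It assumes the backend keeps FastAPI's default /openapi.json schema route, that jq is installed, and that you already hold a valid JWT; how to obtain a token is not covered here.

```shell
#!/bin/sh
# Sketch: discover the backend's endpoints and call one with a JWT.
# Assumes the default FastAPI schema route is enabled; no paths are hard-coded.

API_URL=${API_URL:-http://localhost:8000}

# Header line that carries a JWT bearer token.
auth_header() {
    printf 'Authorization: Bearer %s' "$1"
}

# Print every path the backend advertises in its OpenAPI schema.
list_endpoints() {
    curl -s "$API_URL/openapi.json" | jq -r '.paths | keys[]'
}

# GET a path ($2) using the token in $1. Take the path from list_endpoints.
api_get() {
    curl -s -H "$(auth_header "$1")" "$API_URL$2"
}
```

Requests made with a token for a lower role should be rejected on endpoints reserved for Super User or Admin, which is a quick way to check the role boundaries.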

Next steps

You now have a working single-node deployment. Read the architecture overview to understand how the components fit together, or jump to deployment topologies for multi-node setups, backups with rclone, and production hardening.
