Quickstart

I3K RAG Enterprise is a self-hosted RAG platform that runs 100% on your own infrastructure. No US cloud dependencies, no data leaving your perimeter, AGPL-3.0 licensed. The one-command installer brings up the full stack — Qdrant, Ollama, the FastAPI backend, the React frontend and the OCR pipeline — in roughly one hour, or about 15 minutes on a fast connection.

This guide walks you from a clean Ubuntu host to your first RAG query.

Requirements

  • OS: Ubuntu 20.04+ (22.04 recommended)
  • RAM: 16 GB minimum, 32 GB recommended
  • Storage: 50 GB or more
  • GPU: NVIDIA CUDA (8–16 GB VRAM recommended), AMD ROCm, or CPU-only
  • Network: 80+ Mbit/s recommended for the initial model pull
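
Before running the installer, it can help to check the host against the RAM and storage minimums listed above. The following is a minimal sketch, not part of the product: the function names are illustrative, it assumes a Linux host, and it assumes the 50 GB figure refers to free space on the volume you install to.

```python
import os
import shutil

# Minimums taken from the requirements list above.
MIN_RAM_GB = 16
MIN_DISK_GB = 50
# Assumption: the kernel reserves some memory, so a nominal 16 GB host
# reports slightly less. Allow a 10% margin on the RAM check.
RAM_TOLERANCE = 0.9


def total_ram_gb() -> float:
    """Total physical memory in GiB (Linux, via sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def free_disk_gb(path: str = "/") -> float:
    """Free space in GiB on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def preflight(ram_gb: float, disk_gb: float) -> list[str]:
    """Return a list of human-readable problems; empty means OK."""
    problems = []
    if ram_gb < MIN_RAM_GB * RAM_TOLERANCE:
        problems.append(f"RAM: {ram_gb:.1f} GB found, {MIN_RAM_GB} GB required")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.1f} GB free, {MIN_DISK_GB} GB required")
    return problems


if __name__ == "__main__":
    issues = preflight(total_ram_gb(), free_disk_gb("/"))
    print("\n".join(issues) if issues else "Host meets the minimum requirements.")
```

The check does not cover the GPU or network; those are easier to verify with the vendor tools and a speed test.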

GPU vs CPU

A GPU is strongly recommended for usable end-to-end latency with Qwen3:14b-q4_K_M. CPU-only mode works for development and small corpora, but expect noticeably longer response times.

Installation (1 command)

Clone the repository and run the installer:

git clone https://github.com/I3K-IT/RAG-Enterprise.git
cd RAG-Enterprise
./install.sh

The installer is interactive. It asks you two things:

  1. GPU type: NVIDIA, AMD or CPU. It configures Ollama and the embedding runtime accordingly.
  2. LLM model: Qwen3:14b-q4_K_M (default, best quality on 16 GB VRAM) or Mistral 7B Q4 (lighter, fits on 8 GB).

The script then runs unattended. Total time is around one hour on a typical connection, ~15 minutes on a fast link.
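
Most of the unattended time is likely spent downloading model weights, so the duration scales with your bandwidth. The arithmetic can be sketched as below. The 9 GB download size is a purely illustrative assumption, not a figure from this guide, and the estimate ignores container images, package installs and protocol overhead.

```python
def download_minutes(size_gb: float, mbit_per_s: float) -> float:
    """Ideal transfer time in minutes for `size_gb` (decimal GB) at a given rate."""
    megabits = size_gb * 8000  # 1 GB = 1000 MB = 8000 Mbit
    return megabits / mbit_per_s / 60


if __name__ == "__main__":
    size_gb = 9.0  # HYPOTHETICAL download size, for illustration only
    for rate in (20, 80, 250):
        minutes = download_minutes(size_gb, rate)
        print(f"{rate:>4} Mbit/s: {minutes:.0f} min")
```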

What the script does

  • Pulls and configures Qdrant as the vector store (port 6333)
  • Installs Ollama (port 11434) and downloads the chosen LLM
  • Sets up the FastAPI backend (port 8000) with the I3K retrieval orchestrator pipeline
  • Builds and serves the React + Vite frontend (port 3000)
  • Initializes the SQLite user database with JWT auth and the three roles (User, Super User, Admin)
  • Installs Apache Tika and Tesseract for document parsing and OCR
  • Downloads the BAAI/bge-m3 embedding model (29 languages)

When it finishes, the installer prints the admin credentials it generated. Save them.
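
Once the installer has finished, a quick way to confirm that the stack is up is to probe the four ports listed above. This is a minimal sketch using only the Python standard library. It assumes all services listen on the local host, and it only checks that a TCP connection succeeds, not that each service is healthy.

```python
import socket

# Ports as listed in the installer summary above.
SERVICES = {
    "Qdrant": 6333,
    "Ollama": 11434,
    "FastAPI backend": 8000,
    "React frontend": 3000,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:16} {'up' if up else 'DOWN'}")
```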

First login

Open the frontend:

http://localhost:3000

Log in with the admin account printed by the installer. From the left sidebar, go to Documents and upload your first file. Supported formats: PDF (with OCR for scanned pages), DOCX/DOC, PPTX/PPT, XLSX/XLS, TXT, MD, ODT, RTF, HTML, XML.

First upload and query

On upload, the backend pipeline runs:

  1. Extraction — Apache Tika parses the file; Tesseract handles scanned PDFs via OCR.
  2. Chunking — the text is split into semantic chunks.
  3. Embedding — each chunk is encoded with BAAI/bge-m3 (multilingual, 29 languages).
  4. Indexing — vectors are written to Qdrant with their metadata.
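
To make the chunking step concrete, here is a simplified sketch. It is not the platform's actual semantic chunker: it greedily packs paragraphs into chunks up to a character limit and keeps the source file name and position as metadata. The Chunk type, the chunk_text function and the 500-character default are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc: str    # source file name, kept as metadata for citations
    index: int  # position of the chunk within the document
    text: str


def chunk_text(doc: str, text: str, max_chars: int = 500) -> list[Chunk]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    A chunk is closed when adding the next paragraph would exceed
    max_chars. A single paragraph longer than max_chars is kept whole
    as its own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[Chunk] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(Chunk(doc, len(chunks), current))
            current = para
        else:
            current = candidate
    if current:
        chunks.append(Chunk(doc, len(chunks), current))
    return chunks
```

In a real pipeline each chunk's text would then be passed to the embedding model, and the resulting vector stored together with the doc and index fields.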

Once indexing completes, ask a question from the chat UI. The backend retrieves the relevant chunks from Qdrant, hands them to Ollama running your chosen LLM, and returns a grounded answer with citations to the source documents.
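
The retrieve-then-generate flow can be illustrated with a small, self-contained sketch. It stands in for the real components: an in-memory list replaces Qdrant, the vectors are assumed to come from the embedding model, and the prompt wording and payload keys (source, text) are hypothetical rather than taken from the product.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], index: list[tuple[list[float], dict]], k: int = 3) -> list[dict]:
    """Return the payloads of the k vectors most similar to query_vec."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [payload for _, payload in ranked[:k]]


def build_prompt(question: str, hits: list[dict]) -> str:
    """Assemble a grounded prompt with numbered, citable context passages."""
    context = "\n".join(
        f"[{i}] ({hit['source']}) {hit['text']}" for i, hit in enumerate(hits, start=1)
    )
    return (
        "Answer using only the context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The string returned by build_prompt is what would be sent to the LLM; the numbered passages are what lets the answer point back to its source documents.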

The same query path is also exposed as a REST API by the FastAPI backend on port 8000, so you can integrate I3K RAG Enterprise into your own applications. Endpoints are protected by JWT and honour the User / Super User / Admin role boundaries.
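
As a rough illustration of calling the API from your own code, the sketch below builds an authenticated request with the Python standard library. The endpoint path /api/query and the JSON field question are hypothetical placeholders, and sending the JWT as a Bearer token is an assumption based on common practice; check the backend's API reference for the real routes and schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # backend port from this guide
QUERY_PATH = "/api/query"           # HYPOTHETICAL path, not confirmed by this guide


def build_query_request(token: str, question: str) -> urllib.request.Request:
    """Build a POST request carrying the JWT as a Bearer token (assumed scheme)."""
    body = json.dumps({"question": question}).encode("utf-8")  # hypothetical schema
    return urllib.request.Request(
        BASE_URL + QUERY_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect or unit-test the headers and payload without a running backend.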

Next steps

You now have a working single-node deployment. Read the architecture overview to understand how the components fit together, or jump to deployment topologies for multi-node setups, backups with rclone, and production hardening.
