
RAG Chatbot

Welcome to the RAG Chatbot documentation. This guide covers installation, configuration, and usage of the Retrieval-Augmented Generation (RAG) chatbot system.

Overview

RAG Chatbot is an intelligent document-based chat system that lets users ask questions and receive accurate answers grounded in your uploaded content. It combines the power of large language models (LLMs) with a searchable knowledge base built from your documents.

Key Features

  • Intelligent Q&A - Ask questions in natural language and get answers sourced from your documents
  • Multiple LLM Providers - Choose between OpenAI (GPT-5.1) and Anthropic (Claude) for chat responses
  • Document Support - Upload PDFs, Word documents, PowerPoints, spreadsheets, images, and audio files
  • Media Import - Import transcripts from YouTube, Vimeo, SoundCloud, and 1000+ other platforms
  • Hybrid Search - Combines semantic (AI) search with keyword matching for accurate results
  • Conversation Memory - Maintains context across long conversations with automatic summarization
  • Embeddable Widget - Drop-in chat widget for any website
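
The conversation-memory feature above can be sketched in a few lines: keep the most recent turns verbatim and fold older turns into a running summary so the prompt stays bounded. This is a minimal illustration of the idea, not RAG Chatbot's actual implementation; the `summarize` function is a placeholder standing in for a real LLM summarization call, and all names here are hypothetical.

```python
def summarize(turns):
    """Placeholder summarizer: a real system would call an LLM here."""
    return " / ".join(t["content"].rstrip(".") for t in turns)

class ConversationMemory:
    """Keep the last `max_recent` turns verbatim; summarize the rest."""

    def __init__(self, max_recent=4):
        self.max_recent = max_recent
        self.summary = ""
        self.recent = []

    def add(self, role, content):
        self.recent.append({"role": role, "content": content})
        if len(self.recent) > self.max_recent:
            # Fold the overflowing oldest turns into the running summary.
            overflow = self.recent[: -self.max_recent]
            self.recent = self.recent[-self.max_recent:]
            prior = [{"role": "summary", "content": self.summary}] if self.summary else []
            self.summary = summarize(prior + overflow)

    def prompt_context(self):
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier conversation: {self.summary}")
        parts.extend(f"{t['role']}: {t['content']}" for t in self.recent)
        return "\n".join(parts)
```

The payoff is that prompt size stays roughly constant no matter how long the conversation runs, while earlier context survives in compressed form.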

How It Works

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Upload    │────▶│   Process    │────▶│   Store     │
│  Documents  │     │  & Chunk     │     │  Vectors    │
└─────────────┘     └──────────────┘     └─────────────┘
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Answer    │◀────│   Retrieve   │◀────│   Query     │
│   User      │     │   Context    │     │   Database  │
└─────────────┘     └──────────────┘     └─────────────┘
  1. Upload - Documents are uploaded through the web interface or API
  2. Process - The Docling service parses documents and extracts text
  3. Store - Text is split into chunks and stored with vector embeddings
  4. Query - User questions are converted to vectors and matched against stored content
  5. Retrieve - The most relevant chunks are retrieved using hybrid search
  6. Answer - An LLM generates a response based on the retrieved context
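
Steps 3 through 6 above can be sketched end to end in a few lines of Python. This is a toy illustration, not the system's code: the bag-of-words `embed` function stands in for a real embedding model, and the LLM call in `answer` is stubbed out.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    def __init__(self):
        self.chunks = []  # (text, vector) pairs

    def add(self, text):
        # Step 3: store each chunk alongside its embedding.
        self.chunks.append((text, embed(text)))

    def retrieve(self, query, k=2):
        # Steps 4-5: embed the query and rank chunks by similarity.
        qv = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(qv, c[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def answer(query, store):
    # Step 6: in the real system an LLM generates the response from this context.
    context = "\n".join(store.retrieve(query))
    return f"Answer based on:\n{context}"
```

In production the vectors live in PostgreSQL with pgvector rather than an in-memory list, but the shape of the loop is the same.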

System Requirements

Component   Minimum           Recommended
CPU         4 cores           8+ cores
RAM         8 GB              16+ GB
Storage     50 GB SSD         200+ GB SSD
GPU         None (CPU mode)   NVIDIA with 8+ GB VRAM

Supported Platforms

  • Operating System: AlmaLinux 9, RHEL 9, Rocky Linux 9, Ubuntu 22.04+
  • Cloud: AWS EC2, Google Cloud Compute Engine, Azure VMs
  • Container: Docker, Podman

Quick Start

For a quick installation, follow these steps:

  1. Install the system requirements on AlmaLinux 9
  2. Set up PostgreSQL 16 with pgvector
  3. Configure the application
  4. Deploy the Docling service
  5. Start using the chat interface

Architecture

The system consists of three main components:

PHP Application

The main web application handles:

  • Chat API endpoints
  • Document upload and management
  • User session management
  • Communication with LLM providers

PostgreSQL Database

Stores all persistent data:

  • Document metadata and content chunks
  • Vector embeddings for semantic search
  • Chat session history
  • Full-text search indexes
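
Hybrid search means the database returns two ranked lists per query: one from vector similarity and one from full-text search. A common way to merge such lists is Reciprocal Rank Fusion (RRF); whether RAG Chatbot fuses results exactly this way is an assumption, so treat this as a sketch of the general technique rather than the system's code.

```python
def rrf(ranked_lists, k=60):
    """Fuse several ranked lists of chunk ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in, so
    chunks ranked highly by multiple retrievers float to the top. k=60 is
    the conventional damping constant from the RRF literature.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, fusing a semantic ranking `["c2", "c1", "c3"]` with a keyword ranking `["c1", "c2", "c4"]` puts `c1` and `c2` ahead of the chunks that only one retriever found.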

Docling Service

A Python microservice that handles:

  • Document parsing (PDF, DOCX, images, etc.)
  • OCR for scanned documents
  • Table and layout extraction
  • Media downloading and transcription
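
The text Docling extracts feeds the chunking step described under How It Works. A minimal sketch of one common approach, fixed-size word windows with overlap so context is not lost at chunk boundaries, is below; the sizes are illustrative defaults, not the system's actual parameters.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into chunks of `chunk_size` words, each sharing
    `overlap` words with the previous chunk."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk then gets its own embedding and full-text index entry, which is why chunk size is a trade-off: smaller chunks retrieve more precisely, larger chunks carry more context into the prompt.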

Support

If you encounter issues:

  1. Check the Troubleshooting guide
  2. Review application logs in /var/www/chatbot/logs/
  3. Use the Debug RAG endpoint to diagnose search issues

Next Steps

Ready to get started? Head to the Installation Guide to set up the system on AlmaLinux 9.