RAG Chatbot¶
Welcome to the RAG Chatbot documentation. This guide covers installation, configuration, and usage of the Retrieval-Augmented Generation (RAG) chatbot system.
Overview¶
RAG Chatbot is an intelligent document-based chat system that allows users to ask questions and receive accurate answers based on your uploaded content. It combines the power of large language models (LLMs) with a searchable knowledge base built from your documents.
Key Features¶
- Intelligent Q&A - Ask questions in natural language and get answers sourced from your documents
- Multiple LLM Providers - Choose between OpenAI (GPT-5.1) and Anthropic (Claude) for chat responses
- Document Support - Upload PDFs, Word documents, PowerPoints, spreadsheets, images, and audio files
- Media Import - Import transcripts from YouTube, Vimeo, SoundCloud, and 1000+ other platforms
- Hybrid Search - Combines semantic (vector) search with keyword matching for more accurate results
- Conversation Memory - Maintains context across long conversations with automatic summarization
- Embeddable Widget - Drop-in chat widget for any website
How It Works¶
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Upload    │────▶│   Process    │────▶│    Store    │
│  Documents  │     │   & Chunk    │     │   Vectors   │
└─────────────┘     └──────────────┘     └─────────────┘
                                                │
                                                ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Answer    │◀────│   Retrieve   │◀────│    Query    │
│    User     │     │   Context    │     │  Database   │
└─────────────┘     └──────────────┘     └─────────────┘
- Upload - Documents are uploaded through the web interface or API
- Process - The Docling service parses documents and extracts text
- Store - Text is split into chunks and stored with vector embeddings
- Query - User questions are converted to vectors and matched against stored content
- Retrieve - The most relevant chunks are retrieved using hybrid search
- Answer - An LLM generates a response based on the retrieved context
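To make the flow concrete, here is a condensed sketch of the ingest and query steps. It is illustrative only: the chunks table, the text-embedding-3-small model, and the use of the openai and psycopg client libraries are assumptions, not the application's actual code (the production path runs through the PHP application and the Docling service).

```python
import psycopg
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> list[float]:
    # Turn text into a vector; the embedding model name is an assumption.
    resp = client.embeddings.create(model="text-embedding-3-small", input=[text])
    return resp.data[0].embedding

def to_pgvector(vec: list[float]) -> str:
    # pgvector accepts a '[x,y,z]' literal cast to ::vector.
    return "[" + ",".join(str(x) for x in vec) + "]"

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    # Naive fixed-size chunking; the real pipeline may split on layout or sentences.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

parsed_text = "…text extracted from an uploaded document by the parsing step…"

with psycopg.connect("dbname=chatbot") as conn:
    # Store: one row per chunk, content plus its embedding.
    for piece in chunk(parsed_text):
        conn.execute(
            "INSERT INTO chunks (content, embedding) VALUES (%s, %s::vector)",
            (piece, to_pgvector(embed(piece))),
        )

    # Query + Retrieve: embed the question and pull the nearest chunks.
    question = "What does the warranty cover?"
    rows = conn.execute(
        "SELECT content FROM chunks ORDER BY embedding <=> %s::vector LIMIT 5",
        (to_pgvector(embed(question)),),
    ).fetchall()
    context = "\n\n".join(content for (content,) in rows)
    # Answer: `context` is then passed to the chat LLM along with the question.
```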
System Requirements¶
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8+ cores |
| RAM | 8 GB | 16+ GB |
| Storage | 50 GB SSD | 200+ GB SSD |
| GPU | None (CPU mode) | NVIDIA with 8+ GB VRAM |
Supported Platforms¶
- Operating System: AlmaLinux 9, RHEL 9, Rocky Linux 9, Ubuntu 22.04+
- Cloud: AWS EC2, Google Cloud Compute Engine, Azure VMs
- Container: Docker, Podman
Quick Start¶
For a quick installation, follow these steps:
- Provision an AlmaLinux 9 server that meets the system requirements
- Set up PostgreSQL 16 with pgvector (a minimal setup sketch follows this list)
- Configure the application
- Deploy the Docling service
- Start using the chat interface
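As a minimal sketch of the pgvector step above, the snippet below enables the extension and creates a chunk table. The connection string, table layout, and the 1536-dimension vector column are assumptions; the actual schema is created by the Installation Guide and depends on the embedding model you choose.

```python
import psycopg

# Connection string and credentials are placeholders.
with psycopg.connect("dbname=chatbot user=postgres") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS chunks (
            id          bigserial PRIMARY KEY,
            document_id bigint,
            content     text NOT NULL,
            embedding   vector(1536)  -- dimension must match the embedding model
        )
    """)
```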
Architecture¶
The system consists of three main components:
PHP Application¶
The main web application handles:
- Chat API endpoints
- Document upload and management
- User session management
- Communication with LLM providers
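The chat API can be exercised directly over HTTP. The endpoint path, payload fields, and response shape below are hypothetical placeholders, not the documented API; see the API reference for the real interface.

```python
import requests

# Hypothetical endpoint and payload; check the API documentation for the real shape.
resp = requests.post(
    "https://chatbot.example.com/api/chat",
    json={"session_id": "demo-session", "message": "What does the warranty cover?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # expected to contain the answer plus the source chunks used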
PostgreSQL Database¶
Stores all persistent data:
- Document metadata and content chunks
- Vector embeddings for semantic search
- Chat session history
- Full-text search indexes
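Because embeddings and full-text indexes live in the same database, hybrid search can be pictured as a single SQL query that blends pgvector distance with full-text ranking. The sketch below is an assumption-laden illustration: the table and column names, the 0.7/0.3 weights, and the on-the-fly to_tsvector call (rather than a stored, indexed tsvector column) are not the application's actual query.

```python
import psycopg

HYBRID_SQL = """
SELECT content,
       0.7 * (1 - (embedding <=> %(qvec)s::vector))           -- semantic score
     + 0.3 * ts_rank(to_tsvector('english', content),
                     plainto_tsquery('english', %(qtext)s))    -- keyword score
       AS score
FROM chunks
ORDER BY score DESC
LIMIT 5
"""

def hybrid_search(conn: psycopg.Connection, query_text: str, query_vec: list[float]):
    # Format the query embedding as a pgvector literal and run the blended query.
    vec_literal = "[" + ",".join(str(x) for x in query_vec) + "]"
    return conn.execute(HYBRID_SQL, {"qvec": vec_literal, "qtext": query_text}).fetchall()
```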
Docling Service¶
A Python microservice that handles:
- Document parsing (PDF, DOCX, images, etc.)
- OCR for scanned documents
- Table and layout extraction
- Media downloading and transcription
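The service presumably wraps the open-source Docling library. As a rough sketch of the parsing step, this is what a direct library call looks like; the service itself exposes this behind its own HTTP API, so treat the snippet as illustrative rather than the service's interface.

```python
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("manual.pdf")               # also handles DOCX, images, etc.
markdown_text = result.document.export_to_markdown()   # text ready for chunking
```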
Support¶
If you encounter issues:
- Check the Troubleshooting guide
- Review application logs in /var/www/chatbot/logs/
- Use the Debug RAG endpoint to diagnose search issues
Next Steps¶
Ready to get started? Head to the Installation Guide to set up the system on AlmaLinux 9.