Database Setup¶
This guide covers installing PostgreSQL 16 with the pgvector extension on AlmaLinux 9.
Overview¶
The RAG Chatbot uses two databases:
- ragdb - Per-tenant database storing RAG data (documents, chunks, embeddings, chat history). Uses the
ragchatbotschema. Each tenant can have its own ragdb on a separate server for data isolation. - chatdb - Shared database storing admin/widget configuration (widget settings, prompts, inquiries, tenant prompts). Uses the
tenant_settingsschema withtenant_idisolation.
Both require:
- PostgreSQL 16 - Modern relational database
- pgvector - Vector similarity search extension for AI embeddings (ragdb only)
Step 1: Install PostgreSQL 16¶
Add the official PostgreSQL repository:
# Install the repository RPM
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# Disable the built-in PostgreSQL module
sudo dnf -qy module disable postgresql
# Install PostgreSQL 16
sudo dnf install -y postgresql16-server postgresql16-contrib postgresql16-devel
Initialize and start PostgreSQL:
# Initialize the database cluster
sudo /usr/pgsql-16/bin/postgresql-16-setup initdb
# Enable and start PostgreSQL
sudo systemctl enable postgresql-16
sudo systemctl start postgresql-16
# Verify it's running
sudo systemctl status postgresql-16
Step 2: Install pgvector Extension¶
pgvector enables vector similarity search for AI embeddings:
# Install build dependencies
sudo dnf install -y gcc make git
# Clone pgvector
cd /tmp
git clone --branch v0.8.1 https://github.com/pgvector/pgvector.git
cd pgvector
# Build and install
export PATH=/usr/pgsql-16/bin:$PATH
make
sudo make install
Verify the installation:
Step 3: Configure PostgreSQL¶
Edit the PostgreSQL configuration for better performance:
Recommended settings:
# Memory settings (adjust based on available RAM)
shared_buffers = 2GB # 25% of total RAM
effective_cache_size = 6GB # 75% of total RAM
work_mem = 256MB
maintenance_work_mem = 512MB
# Connection settings
max_connections = 100
listen_addresses = 'localhost' # Or '*' for remote access
# Performance
random_page_cost = 1.1 # For SSD storage
effective_io_concurrency = 200 # For SSD storage
# Logging (optional)
log_statement = 'ddl'
log_min_duration_statement = 1000 # Log queries over 1 second
Configure client authentication:
Add or modify:
# Local connections
local all all peer
host all all 127.0.0.1/32 scram-sha-256
host all all ::1/128 scram-sha-256
# Allow from web server (if on different host)
# host ragdb raguser 192.168.1.0/24 scram-sha-256
# host chatdb chatuser 192.168.1.0/24 scram-sha-256
Restart PostgreSQL:
Step 4: Create Database Users and Databases¶
Connect as the postgres superuser:
Create the RAG database (per-tenant):
-- Create the RAG database user
CREATE USER raguser WITH PASSWORD 'your_secure_password';
-- Create the RAG database
CREATE DATABASE ragdb OWNER raguser;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE ragdb TO raguser;
Create the shared admin database:
-- Create the admin database user
CREATE USER chatuser WITH PASSWORD 'your_secure_password';
-- Create the admin database
CREATE DATABASE chatdb OWNER chatuser;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE chatdb TO chatuser;
-- Exit psql
\q
Security Note
Use strong, unique passwords for each database user. Never use the example passwords in production.
Step 5: Enable pgvector Extension¶
Connect to the ragdb database:
Enable the required extensions:
-- Enable pgvector for vector operations
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify the extension
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
-- Exit psql
\q
Note
pgvector is only needed in ragdb. The chatdb database does not use vector operations.
Step 6: Import the Schemas¶
The application includes schema files for both databases:
RAG Database (ragdb)¶
Tables (schema: ragchatbot):
| Table | Purpose |
|---|---|
documents | Stores uploaded document metadata |
chunks | Contains text chunks from documents |
embeddings | Vector embeddings for semantic search |
chat_sessions | User chat session data |
chat_messages | Individual chat messages |
Key Features:
- Vector Search: Uses pgvector's
vector(1536)type for OpenAI embeddings - Full-Text Search: PostgreSQL
tsvectorfor keyword matching - HNSW Index: Fast approximate nearest neighbor search
- Cascading Deletes: Automatic cleanup when documents are removed
Admin Database (chatdb)¶
Tables (schema: tenant_settings):
| Table | Purpose |
|---|---|
widget_settings | Widget appearance and behavior config per tenant |
widget_prompts | Quick action prompts displayed in the chat widget |
widget_inquiries | Pre-chat inquiry form configuration |
tenant_prompts | Customizable LLM system prompts per tenant |
All tables include a tenant_id column for multi-tenant isolation.
Verify Tables¶
Expected output:
List of relations
Schema | Name | Type | Owner
-------------+---------------+-------+---------
ragchatbot | chat_messages | table | raguser
ragchatbot | chat_sessions | table | raguser
ragchatbot | chunks | table | raguser
ragchatbot | documents | table | raguser
ragchatbot | embeddings | table | raguser
Expected output:
List of relations
Schema | Name | Type | Owner
-----------------+------------------+-------+----------
tenant_settings | tenant_prompts | table | chatuser
tenant_settings | widget_inquiries | table | chatuser
tenant_settings | widget_prompts | table | chatuser
tenant_settings | widget_settings | table | chatuser
Step 7: Test Database Connections¶
Test connectivity from the command line:
# Test ragdb
psql -h localhost -U raguser -d ragdb -c "SELECT 1;"
# Test chatdb
psql -h localhost -U chatuser -d chatdb -c "SELECT 1;"
Test from PHP:
# Test ragdb
php -r "
\$pdo = new PDO('pgsql:host=localhost;dbname=ragdb', 'raguser', 'your_password');
echo 'ragdb: Connected successfully!\n';
"
# Test chatdb
php -r "
\$pdo = new PDO('pgsql:host=localhost;dbname=chatdb', 'chatuser', 'your_password');
echo 'chatdb: Connected successfully!\n';
"
Database Maintenance¶
Backup¶
Create regular backups:
# ragdb backup (per-tenant RAG data)
pg_dump -h localhost -U raguser -d ragdb > ragdb_backup_$(date +%Y%m%d).sql
pg_dump -h localhost -U raguser -d ragdb | gzip > ragdb_backup_$(date +%Y%m%d).sql.gz
# chatdb backup (shared admin/widget config)
pg_dump -h localhost -U chatuser -d chatdb > chatdb_backup_$(date +%Y%m%d).sql
pg_dump -h localhost -U chatuser -d chatdb | gzip > chatdb_backup_$(date +%Y%m%d).sql.gz
Restore¶
Restore from backup:
# ragdb
psql -h localhost -U raguser -d ragdb < ragdb_backup_20260201.sql
# chatdb
psql -h localhost -U chatuser -d chatdb < chatdb_backup_20260201.sql
Vacuum and Analyze¶
Run periodically for optimal performance:
Reindex¶
Rebuild indexes if search becomes slow:
Monitoring¶
Check Database Size¶
SELECT pg_size_pretty(pg_database_size('ragdb'));
SELECT pg_size_pretty(pg_database_size('chatdb'));
Check Table Sizes¶
Run in each database (ragdb or chatdb):
SELECT
schemaname || '.' || relname as table,
pg_size_pretty(pg_total_relation_size(relid)) as total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
Check Index Usage¶
SELECT
indexrelname as index,
idx_scan as scans,
pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
Troubleshooting¶
Connection Refused¶
Check if PostgreSQL is running:
Verify it's listening:
Authentication Failed¶
Check pg_hba.conf settings and ensure both users exist:
Extension Not Found¶
Ensure pgvector was installed correctly:
If missing, rebuild and reinstall pgvector.
Slow Queries¶
Check for missing indexes:
If the HNSW index isn't being used, recreate it:
DROP INDEX IF EXISTS embeddings_embedding_idx;
CREATE INDEX embeddings_embedding_idx ON embeddings
USING hnsw (embedding vector_cosine_ops);
Next Steps¶
- Configure the application with your database credentials
- Deploy the Docling service for document processing
- Start using the chat interface