Skip to content

Database Setup

This guide covers installing PostgreSQL 16 with the pgvector extension on AlmaLinux 9.

Overview

The RAG Chatbot uses two databases:

  • ragdb - Per-tenant database storing RAG data (documents, chunks, embeddings, chat history). Uses the ragchatbot schema. Each tenant can have its own ragdb on a separate server for data isolation.
  • chatdb - Shared database storing admin/widget configuration (widget settings, prompts, inquiries, tenant prompts). Uses the tenant_settings schema with tenant_id isolation.

Both require:

  • PostgreSQL 16 - Modern relational database
  • pgvector - Vector similarity search extension for AI embeddings (ragdb only)

Step 1: Install PostgreSQL 16

Add the official PostgreSQL repository:

# Install the repository RPM
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm

# Disable the built-in PostgreSQL module
sudo dnf -qy module disable postgresql

# Install PostgreSQL 16
sudo dnf install -y postgresql16-server postgresql16-contrib postgresql16-devel

Initialize and start PostgreSQL:

# Initialize the database cluster
sudo /usr/pgsql-16/bin/postgresql-16-setup initdb

# Enable and start PostgreSQL
sudo systemctl enable postgresql-16
sudo systemctl start postgresql-16

# Verify it's running
sudo systemctl status postgresql-16

Step 2: Install pgvector Extension

pgvector enables vector similarity search for AI embeddings:

# Install build dependencies
sudo dnf install -y gcc make git

# Clone pgvector
cd /tmp
git clone --branch v0.8.1 https://github.com/pgvector/pgvector.git
cd pgvector

# Build and install
export PATH=/usr/pgsql-16/bin:$PATH
make
sudo make install

Verify the installation:

ls /usr/pgsql-16/lib/vector.so
# Should show the file exists

Step 3: Configure PostgreSQL

Edit the PostgreSQL configuration for better performance:

sudo nano /var/lib/pgsql/16/data/postgresql.conf

Recommended settings:

# Memory settings (adjust based on available RAM)
shared_buffers = 2GB              # 25% of total RAM
effective_cache_size = 6GB        # 75% of total RAM
work_mem = 256MB
maintenance_work_mem = 512MB

# Connection settings
max_connections = 100
listen_addresses = 'localhost'    # Or '*' for remote access

# Performance
random_page_cost = 1.1            # For SSD storage
effective_io_concurrency = 200    # For SSD storage

# Logging (optional)
log_statement = 'ddl'
log_min_duration_statement = 1000  # Log queries over 1 second

Configure client authentication:

sudo nano /var/lib/pgsql/16/data/pg_hba.conf

Add or modify:

# Local connections
local   all             all                                     peer
host    all             all             127.0.0.1/32            scram-sha-256
host    all             all             ::1/128                 scram-sha-256

# Allow from web server (if on different host)
# host    ragdb     raguser         192.168.1.0/24          scram-sha-256
# host    chatdb    chatuser        192.168.1.0/24          scram-sha-256

Restart PostgreSQL:

sudo systemctl restart postgresql-16

Step 4: Create Database Users and Databases

Connect as the postgres superuser:

sudo -u postgres psql

Create the RAG database (per-tenant):

-- Create the RAG database user
CREATE USER raguser WITH PASSWORD 'your_secure_password';

-- Create the RAG database
CREATE DATABASE ragdb OWNER raguser;

-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE ragdb TO raguser;

Create the shared admin database:

-- Create the admin database user
CREATE USER chatuser WITH PASSWORD 'your_secure_password';

-- Create the admin database
CREATE DATABASE chatdb OWNER chatuser;

-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE chatdb TO chatuser;

-- Exit psql
\q

Security Note

Use strong, unique passwords for each database user. Never use the example passwords in production.

Step 5: Enable pgvector Extension

Connect to the ragdb database:

sudo -u postgres psql -d ragdb

Enable the required extensions:

-- Enable pgvector for vector operations
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify the extension
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';

-- Exit psql
\q

Note

pgvector is only needed in ragdb. The chatdb database does not use vector operations.

Step 6: Import the Schemas

The application includes schema files for both databases:

RAG Database (ragdb)

sudo -u postgres psql -d ragdb -f /var/www/chatbot/database/ragdb.sql

Tables (schema: ragchatbot):

Table Purpose
documents Stores uploaded document metadata
chunks Contains text chunks from documents
embeddings Vector embeddings for semantic search
chat_sessions User chat session data
chat_messages Individual chat messages

Key Features:

  • Vector Search: Uses pgvector's vector(1536) type for OpenAI embeddings
  • Full-Text Search: PostgreSQL tsvector for keyword matching
  • HNSW Index: Fast approximate nearest neighbor search
  • Cascading Deletes: Automatic cleanup when documents are removed

Admin Database (chatdb)

sudo -u postgres psql -d chatdb -f /var/www/chatbot/database/chatdb.sql

Tables (schema: tenant_settings):

Table Purpose
widget_settings Widget appearance and behavior config per tenant
widget_prompts Quick action prompts displayed in the chat widget
widget_inquiries Pre-chat inquiry form configuration
tenant_prompts Customizable LLM system prompts per tenant

All tables include a tenant_id column for multi-tenant isolation.

Verify Tables

# Check ragdb tables
sudo -u postgres psql -d ragdb -c "\dt ragchatbot.*"

Expected output:

              List of relations
   Schema    |     Name      | Type  |  Owner
-------------+---------------+-------+---------
 ragchatbot  | chat_messages | table | raguser
 ragchatbot  | chat_sessions | table | raguser
 ragchatbot  | chunks        | table | raguser
 ragchatbot  | documents     | table | raguser
 ragchatbot  | embeddings    | table | raguser
# Check chatdb tables
sudo -u postgres psql -d chatdb -c "\dt tenant_settings.*"

Expected output:

                 List of relations
     Schema      |       Name       | Type  |  Owner
-----------------+------------------+-------+----------
 tenant_settings | tenant_prompts   | table | chatuser
 tenant_settings | widget_inquiries | table | chatuser
 tenant_settings | widget_prompts   | table | chatuser
 tenant_settings | widget_settings  | table | chatuser

Step 7: Test Database Connections

Test connectivity from the command line:

# Test ragdb
psql -h localhost -U raguser -d ragdb -c "SELECT 1;"

# Test chatdb
psql -h localhost -U chatuser -d chatdb -c "SELECT 1;"

Test from PHP:

# Test ragdb
php -r "
\$pdo = new PDO('pgsql:host=localhost;dbname=ragdb', 'raguser', 'your_password');
echo 'ragdb: Connected successfully!\n';
"

# Test chatdb
php -r "
\$pdo = new PDO('pgsql:host=localhost;dbname=chatdb', 'chatuser', 'your_password');
echo 'chatdb: Connected successfully!\n';
"

Database Maintenance

Backup

Create regular backups:

# ragdb backup (per-tenant RAG data)
pg_dump -h localhost -U raguser -d ragdb > ragdb_backup_$(date +%Y%m%d).sql
pg_dump -h localhost -U raguser -d ragdb | gzip > ragdb_backup_$(date +%Y%m%d).sql.gz

# chatdb backup (shared admin/widget config)
pg_dump -h localhost -U chatuser -d chatdb > chatdb_backup_$(date +%Y%m%d).sql
pg_dump -h localhost -U chatuser -d chatdb | gzip > chatdb_backup_$(date +%Y%m%d).sql.gz

Restore

Restore from backup:

# ragdb
psql -h localhost -U raguser -d ragdb < ragdb_backup_20260201.sql

# chatdb
psql -h localhost -U chatuser -d chatdb < chatdb_backup_20260201.sql

Vacuum and Analyze

Run periodically for optimal performance:

sudo -u postgres psql -d ragdb -c "VACUUM ANALYZE;"

Reindex

Rebuild indexes if search becomes slow:

sudo -u postgres psql -d ragdb -c "REINDEX DATABASE ragdb;"

Monitoring

Check Database Size

SELECT pg_size_pretty(pg_database_size('ragdb'));
SELECT pg_size_pretty(pg_database_size('chatdb'));

Check Table Sizes

Run in each database (ragdb or chatdb):

SELECT
    schemaname || '.' || relname as table,
    pg_size_pretty(pg_total_relation_size(relid)) as total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

Check Index Usage

SELECT
    indexrelname as index,
    idx_scan as scans,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

Troubleshooting

Connection Refused

Check if PostgreSQL is running:

sudo systemctl status postgresql-16

Verify it's listening:

ss -tlnp | grep 5432

Authentication Failed

Check pg_hba.conf settings and ensure both users exist:

sudo -u postgres psql -c "SELECT usename FROM pg_user WHERE usename IN ('raguser', 'chatuser');"

Extension Not Found

Ensure pgvector was installed correctly:

ls -la /usr/pgsql-16/lib/vector.so

If missing, rebuild and reinstall pgvector.

Slow Queries

Check for missing indexes:

EXPLAIN ANALYZE SELECT * FROM embeddings
ORDER BY embedding <=> '[0.1,0.2,...]'::vector LIMIT 5;

If the HNSW index isn't being used, recreate it:

DROP INDEX IF EXISTS embeddings_embedding_idx;
CREATE INDEX embeddings_embedding_idx ON embeddings
USING hnsw (embedding vector_cosine_ops);

Next Steps

  1. Configure the application with your database credentials
  2. Deploy the Docling service for document processing
  3. Start using the chat interface