Skip to content

Database Setup

This guide covers installing PostgreSQL 16 with the pgvector extension on AlmaLinux 9.

Overview

The RAG Chatbot requires:

  • PostgreSQL 16 - Modern relational database
  • pgvector - Vector similarity search extension for AI embeddings

Step 1: Install PostgreSQL 16

Add the official PostgreSQL repository:

# Install the repository RPM
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm

# Disable the built-in PostgreSQL module
sudo dnf -qy module disable postgresql

# Install PostgreSQL 16
sudo dnf install -y postgresql16-server postgresql16-contrib postgresql16-devel

Initialize and start PostgreSQL:

# Initialize the database cluster
sudo /usr/pgsql-16/bin/postgresql-16-setup initdb

# Enable and start PostgreSQL
sudo systemctl enable postgresql-16
sudo systemctl start postgresql-16

# Verify it's running
sudo systemctl status postgresql-16

Step 2: Install pgvector Extension

pgvector enables vector similarity search for AI embeddings:

# Install build dependencies
sudo dnf install -y gcc make git

# Clone pgvector
cd /tmp
git clone --branch v0.8.1 https://github.com/pgvector/pgvector.git
cd pgvector

# Build and install
export PATH=/usr/pgsql-16/bin:$PATH
make
sudo make install

Verify the installation:

ls /usr/pgsql-16/lib/vector.so
# Should show the file exists

Step 3: Configure PostgreSQL

Edit the PostgreSQL configuration for better performance:

sudo nano /var/lib/pgsql/16/data/postgresql.conf

Recommended settings:

# Memory settings (adjust based on available RAM)
shared_buffers = 2GB              # 25% of total RAM
effective_cache_size = 6GB        # 75% of total RAM
work_mem = 256MB
maintenance_work_mem = 512MB

# Connection settings
max_connections = 100
listen_addresses = 'localhost'    # Or '*' for remote access

# Performance
random_page_cost = 1.1            # For SSD storage
effective_io_concurrency = 200    # For SSD storage

# Logging (optional)
log_statement = 'ddl'
log_min_duration_statement = 1000  # Log queries over 1 second

Configure client authentication:

sudo nano /var/lib/pgsql/16/data/pg_hba.conf

Add or modify:

# Local connections
local   all             all                                     peer
host    all             all             127.0.0.1/32            scram-sha-256
host    all             all             ::1/128                 scram-sha-256

# Allow from web server (if on different host)
# host    ragdb     raguser         192.168.1.0/24          scram-sha-256

Restart PostgreSQL:

sudo systemctl restart postgresql-16

Step 4: Create Database User

Connect as the postgres superuser:

sudo -u postgres psql

Create the application user and database:

-- Create the database user
CREATE USER raguser WITH PASSWORD 'your_secure_password';

-- Create the database
CREATE DATABASE ragdb OWNER raguser;

-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE ragdb TO raguser;

-- Exit psql
\q

Security Note

Use a strong password for the database user. Never use the example passwords in production.

Step 5: Enable pgvector Extension

Connect to the new database:

sudo -u postgres psql -d ragdb

Enable the required extensions:

-- Enable pgvector for vector operations
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify the extension
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';

-- Exit psql
\q

Step 6: Import the Schema

The application includes a schema file that creates all required tables:

# Import the schema
sudo -u postgres psql -d ragdb -f /var/www/chatbot/database/schema.sql

Schema Overview

The schema creates the following tables:

Table Purpose
documents Stores uploaded document metadata
chunks Contains text chunks from documents
embeddings Vector embeddings for semantic search
chat_sessions User chat session data
chat_messages Individual chat messages

Key Features

  • Vector Search: Uses pgvector's vector(1536) type for OpenAI embeddings
  • Full-Text Search: PostgreSQL tsvector for keyword matching
  • HNSW Index: Fast approximate nearest neighbor search
  • Cascading Deletes: Automatic cleanup when documents are removed

Step 7: Run Migrations

Apply any database migrations:

# Migration 1: Add conversation summarization support
sudo -u postgres psql -d ragdb -f /var/www/chatbot/database/migrations/001_add_conversation_summary.sql

# Migration 2: Add full-text search
sudo -u postgres psql -d ragdb -f /var/www/chatbot/database/migrations/002_add_fulltext_search.sql

Verify the tables exist:

sudo -u postgres psql -d ragdb -c "\dt"

Expected output:

              List of relations
 Schema |     Name      | Type  |  Owner
--------+---------------+-------+---------
 public | chat_messages | table | raguser
 public | chat_sessions | table | raguser
 public | chunks        | table | raguser
 public | documents     | table | raguser
 public | embeddings    | table | raguser

Step 8: Test Database Connection

Test connectivity from the command line:

psql -h localhost -U raguser -d ragdb -c "SELECT 1;"

Test from PHP:

php -r "
\$pdo = new PDO('pgsql:host=localhost;dbname=ragdb', 'raguser', 'your_password');
echo 'Connected successfully!';
"

Database Maintenance

Backup

Create regular backups:

# Full backup
pg_dump -h localhost -U raguser -d ragdb > backup_$(date +%Y%m%d).sql

# Compressed backup
pg_dump -h localhost -U raguser -d ragdb | gzip > backup_$(date +%Y%m%d).sql.gz

Restore

Restore from backup:

# From SQL file
psql -h localhost -U raguser -d ragdb < backup_20260201.sql

# From compressed file
gunzip -c backup_20260201.sql.gz | psql -h localhost -U raguser -d ragdb

Vacuum and Analyze

Run periodically for optimal performance:

sudo -u postgres psql -d ragdb -c "VACUUM ANALYZE;"

Reindex

Rebuild indexes if search becomes slow:

sudo -u postgres psql -d ragdb -c "REINDEX DATABASE ragdb;"

Monitoring

Check Database Size

SELECT pg_size_pretty(pg_database_size('ragdb'));

Check Table Sizes

SELECT
    relname as table,
    pg_size_pretty(pg_total_relation_size(relid)) as total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

Check Index Usage

SELECT
    indexrelname as index,
    idx_scan as scans,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

Troubleshooting

Connection Refused

Check if PostgreSQL is running:

sudo systemctl status postgresql-16

Verify it's listening:

ss -tlnp | grep 5432

Authentication Failed

Check pg_hba.conf settings and ensure the user exists:

sudo -u postgres psql -c "SELECT usename FROM pg_user;"

Extension Not Found

Ensure pgvector was installed correctly:

ls -la /usr/pgsql-16/lib/vector.so

If missing, rebuild and reinstall pgvector.

Slow Queries

Check for missing indexes:

EXPLAIN ANALYZE SELECT * FROM embeddings
ORDER BY embedding <=> '[0.1,0.2,...]'::vector LIMIT 5;

If the HNSW index isn't being used, recreate it:

DROP INDEX IF EXISTS embeddings_embedding_idx;
CREATE INDEX embeddings_embedding_idx ON embeddings
USING hnsw (embedding vector_cosine_ops);

Next Steps

  1. Configure the application with your database credentials
  2. Deploy the Docling service for document processing
  3. Start using the chat interface