Database Setup¶
This guide covers installing PostgreSQL 16 with the pgvector extension on AlmaLinux 9.
Overview¶
The RAG Chatbot requires:
- PostgreSQL 16 - Modern relational database
- pgvector - Vector similarity search extension for AI embeddings
Step 1: Install PostgreSQL 16¶
Add the official PostgreSQL repository:
# Install the repository RPM
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
# Disable the built-in PostgreSQL module
sudo dnf -qy module disable postgresql
# Install PostgreSQL 16
sudo dnf install -y postgresql16-server postgresql16-contrib postgresql16-devel
Initialize and start PostgreSQL:
# Initialize the database cluster
sudo /usr/pgsql-16/bin/postgresql-16-setup initdb
# Enable and start PostgreSQL
sudo systemctl enable postgresql-16
sudo systemctl start postgresql-16
# Verify it's running
sudo systemctl status postgresql-16
Step 2: Install pgvector Extension¶
pgvector enables vector similarity search for AI embeddings:
# Install build dependencies
sudo dnf install -y gcc make git
# Clone pgvector
cd /tmp
git clone --branch v0.8.1 https://github.com/pgvector/pgvector.git
cd pgvector
# Build and install
export PATH=/usr/pgsql-16/bin:$PATH
make
sudo make install
Verify the installation:
Step 3: Configure PostgreSQL¶
Edit the PostgreSQL configuration for better performance:
Recommended settings:
# Memory settings (adjust based on available RAM)
shared_buffers = 2GB # 25% of total RAM
effective_cache_size = 6GB # 75% of total RAM
work_mem = 256MB
maintenance_work_mem = 512MB
# Connection settings
max_connections = 100
listen_addresses = 'localhost' # Or '*' for remote access
# Performance
random_page_cost = 1.1 # For SSD storage
effective_io_concurrency = 200 # For SSD storage
# Logging (optional)
log_statement = 'ddl'
log_min_duration_statement = 1000 # Log queries over 1 second
Configure client authentication:
Add or modify:
# Local connections
local all all peer
host all all 127.0.0.1/32 scram-sha-256
host all all ::1/128 scram-sha-256
# Allow from web server (if on different host)
# host ragdb raguser 192.168.1.0/24 scram-sha-256
Restart PostgreSQL:
Step 4: Create Database User¶
Connect as the postgres superuser:
Create the application user and database:
-- Create the database user
CREATE USER raguser WITH PASSWORD 'your_secure_password';
-- Create the database
CREATE DATABASE ragdb OWNER raguser;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE ragdb TO raguser;
-- Exit psql
\q
Security Note
Use a strong password for the database user. Never use the example passwords in production.
Step 5: Enable pgvector Extension¶
Connect to the new database:
Enable the required extensions:
-- Enable pgvector for vector operations
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify the extension
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
-- Exit psql
\q
Step 6: Import the Schema¶
The application includes a schema file that creates all required tables:
Schema Overview¶
The schema creates the following tables:
| Table | Purpose |
|---|---|
documents | Stores uploaded document metadata |
chunks | Contains text chunks from documents |
embeddings | Vector embeddings for semantic search |
chat_sessions | User chat session data |
chat_messages | Individual chat messages |
Key Features¶
- Vector Search: Uses pgvector's
vector(1536)type for OpenAI embeddings - Full-Text Search: PostgreSQL
tsvectorfor keyword matching - HNSW Index: Fast approximate nearest neighbor search
- Cascading Deletes: Automatic cleanup when documents are removed
Step 7: Run Migrations¶
Apply any database migrations:
# Migration 1: Add conversation summarization support
sudo -u postgres psql -d ragdb -f /var/www/chatbot/database/migrations/001_add_conversation_summary.sql
# Migration 2: Add full-text search
sudo -u postgres psql -d ragdb -f /var/www/chatbot/database/migrations/002_add_fulltext_search.sql
Verify the tables exist:
Expected output:
List of relations
Schema | Name | Type | Owner
--------+---------------+-------+---------
public | chat_messages | table | raguser
public | chat_sessions | table | raguser
public | chunks | table | raguser
public | documents | table | raguser
public | embeddings | table | raguser
Step 8: Test Database Connection¶
Test connectivity from the command line:
Test from PHP:
php -r "
\$pdo = new PDO('pgsql:host=localhost;dbname=ragdb', 'raguser', 'your_password');
echo 'Connected successfully!';
"
Database Maintenance¶
Backup¶
Create regular backups:
# Full backup
pg_dump -h localhost -U raguser -d ragdb > backup_$(date +%Y%m%d).sql
# Compressed backup
pg_dump -h localhost -U raguser -d ragdb | gzip > backup_$(date +%Y%m%d).sql.gz
Restore¶
Restore from backup:
# From SQL file
psql -h localhost -U raguser -d ragdb < backup_20260201.sql
# From compressed file
gunzip -c backup_20260201.sql.gz | psql -h localhost -U raguser -d ragdb
Vacuum and Analyze¶
Run periodically for optimal performance:
Reindex¶
Rebuild indexes if search becomes slow:
Monitoring¶
Check Database Size¶
Check Table Sizes¶
SELECT
relname as table,
pg_size_pretty(pg_total_relation_size(relid)) as total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
Check Index Usage¶
SELECT
indexrelname as index,
idx_scan as scans,
pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
Troubleshooting¶
Connection Refused¶
Check if PostgreSQL is running:
Verify it's listening:
Authentication Failed¶
Check pg_hba.conf settings and ensure the user exists:
Extension Not Found¶
Ensure pgvector was installed correctly:
If missing, rebuild and reinstall pgvector.
Slow Queries¶
Check for missing indexes:
If the HNSW index isn't being used, recreate it:
DROP INDEX IF EXISTS embeddings_embedding_idx;
CREATE INDEX embeddings_embedding_idx ON embeddings
USING hnsw (embedding vector_cosine_ops);
Next Steps¶
- Configure the application with your database credentials
- Deploy the Docling service for document processing
- Start using the chat interface