Security & Rate Limiting¶

This guide covers security features, rate limiting, and hardening recommendations for the RAG Chatbot.

Security Overview¶

The RAG Chatbot includes multiple security layers:

Layer	Protection
Tenant Isolation	Separate databases and configurations per tenant
Input Validation	XSS, SQL injection, prompt injection
File Upload	Path traversal, malicious files, code injection
URL Validation	SSRF attacks, internal network access
Rate Limiting	DoS attacks, brute force
Authentication	API keys, HTTP Basic Auth

Multi-Tenant Security¶

Data Isolation¶

Each tenant has complete data isolation:

Separate Databases - Each tenant's data is stored in its own PostgreSQL database
Separate Configurations - Each tenant has its own .env file with unique API keys
No Cross-Tenant Access - Requests can only access data for the specified tenant

API Key Management¶

Each tenant has unique API keys:

Key	Purpose
`ADMIN_API_KEY`	Admin dashboard and management endpoints
`CHAT_API_KEY`	Chat endpoints (if authentication required)
`UPLOAD_API_KEY`	Upload and media-info endpoint protection (required)

Generate secure keys:

openssl rand -base64 24

Regenerate keys for a tenant:

php cli/tenant.php regenerate-keys my-tenant

IP Blocklist¶

The system maintains a shared IP blocklist for tenant resolution abuse:

After 10 failed attempts to resolve a tenant, the IP is blocked for 15 minutes
Blocklist data is stored in /var/www/chatbot/data/ip_blocklist/
This prevents brute-force tenant enumeration

Tenant Configuration Security¶

Protect tenant .env files:

# Set proper ownership (web server user)
sudo chown apache:apache /var/www/chatbot/tenants/*/
sudo chown apache:apache /var/www/chatbot/tenants/*/.env

# Restrict permissions
sudo chmod 750 /var/www/chatbot/tenants/*/
sudo chmod 640 /var/www/chatbot/tenants/*/.env

Rate Limiting¶

How It Works¶

Rate limiting uses a sliding window algorithm:

Each request is timestamped
Old timestamps (outside the window) are pruned
If remaining requests > 0, the request proceeds
Otherwise, HTTP 429 is returned with Retry-After header

Endpoint Limits¶

Endpoint	Limit	Window	Identifier
`/chat`	30 requests	60 seconds	Session ID + IP
`/customer-chat`	30 requests	60 seconds	Session ID + IP
`/upload`	30 requests	60 seconds	IP address
`/media-info`	30 requests	60 seconds	IP address
`/widget-config`	60 requests	60 seconds	IP address
`/admin/*`	120 requests	60 seconds	IP address
`/api-debug/logs`	5 auth attempts	60 seconds	IP address
CORS mismatch	10 requests	5 minutes	IP address

Response Headers¶

All rate-limited endpoints return these headers:

X-RateLimit-Limit: 30
X-RateLimit-Remaining: 25
X-RateLimit-Reset: 1706889600

When the limit is exceeded:

HTTP/1.1 429 Too Many Requests
Retry-After: 45
Content-Type: application/json

{"error": "Rate limit exceeded", "retry_after": 45}

Customizing Limits¶

To adjust rate limits, modify the constants in the controller files:

Chat endpoint (src/Http/ChatController.php):

private const MAX_REQUESTS_PER_MINUTE = 30;
private const RATE_LIMIT_WINDOW = 60;
private const MAX_MESSAGE_LENGTH = 10000;

Upload endpoint (src/Http/UploadController.php):

private const MAX_REQUESTS_PER_MINUTE = 30;
private const RATE_LIMIT_WINDOW = 60;

Authentication¶

Admin API Key¶

The admin dashboard and management endpoints require the ADMIN_API_KEY:

Set the key in your tenant's .env:

ADMIN_API_KEY=your-random-admin-key

Include the key in requests:

curl -X GET "https://your-domain.com/admin/documents" \
  -H "X-Tenant-ID: my-tenant" \
  -H "X-API-Key: your-random-admin-key"

Upload API Key¶

The upload and media-info endpoints require the UPLOAD_API_KEY:

Set the key in your tenant's .env:

UPLOAD_API_KEY=your-random-secret-key

Include the key in requests:

curl -X POST "https://your-domain.com/upload?tenant=my-tenant" \
  -H "X-API-Key: your-random-secret-key" \
  -F "file=@document.pdf"

Generate a secure key:

openssl rand -base64 24

Required

If UPLOAD_API_KEY is not set, both /upload and /media-info return 401 Unauthorized.

Debug Endpoint Authentication¶

The API debug endpoint uses HTTP Basic Authentication:

Configure in your tenant's .env:

API_DEBUG_USERNAME=admin
API_DEBUG_PASSWORD=very-secure-password

Access with credentials:

curl -u admin:very-secure-password \
  https://your-domain.com/api-debug/logs

IP Whitelist¶

Restrict debug access to specific IPs:

# Single IP
API_DEBUG_IP_WHITELIST=192.168.1.100

# Multiple IPs
API_DEBUG_IP_WHITELIST=192.168.1.100,10.0.0.50

# CIDR notation
API_DEBUG_IP_WHITELIST=192.168.1.0/24,10.0.0.0/8

Lockout Protection¶

After 10 failed authentication attempts, the IP is locked out for 5 minutes:

{
  "error": "Too many failed attempts. Please try again later.",
  "retry_after": 300
}

CORS Origin Whitelist¶

Cross-origin requests are controlled via the ALLOWED_ORIGINS environment variable:

# Comma-separated list of allowed origins
ALLOWED_ORIGINS=https://your-site.com,https://app.your-site.com

Behavior:

Only origins listed in ALLOWED_ORIGINS receive CORS headers
If ALLOWED_ORIGINS is empty, no Access-Control-Allow-Origin header is sent (blocks cross-origin)
IPs that repeatedly send unrecognized origins are rate-limited (10 per 5 minutes)
After exceeding the limit, the IP receives 403 Forbidden
Access-Control-Max-Age is set to 3600 seconds (1 hour)

Tenant Prompt Security¶

Tenants can customize the LLM system prompt via the admin dashboard or API. Security measures prevent abuse:

Architecture¶

The system prompt is composed of two parts:

Security prompt (hardcoded, non-overridable) - Enforces rules like treating context as data, not following injected instructions
Tenant prompt (customizable) - Controls LLM behavior and personality, appended after security rules with a clear delimiter

Validation¶

When saving a tenant prompt:

Max length: 10,000 characters
Injection detection: Prompts containing patterns like "ignore previous instructions", "system prompt:", "you are now", etc. are rejected
Content sanitization: The ContentSanitizer class scans for known injection patterns

Input Validation¶

Message Sanitization¶

User messages are sanitized to prevent prompt injection:

Suspicious patterns are wrapped in [TEXT: ...] markers
LLM is instructed to treat context as data, not commands
Context is separated with clear delimiters

Detected patterns include:

"Ignore previous instructions"
"System prompt" references
Role-playing attempts ("You are now...")
Code injection attempts

File Upload Validation¶

Uploaded files are validated for:

Check	Protection
Extension whitelist	Block executable files
Magic byte verification	Detect disguised files
PHP code detection	Prevent code execution
Path traversal	Block `../` in filenames
Null byte injection	Block `%00` attacks
Size limits	Prevent DoS via large files

File Size Limit Implementation:

The maximum upload size is controlled by the MAX_UPLOAD_SIZE_MB environment variable (default: 100 MB).

Configuration: Set in tenant's .env file as MAX_UPLOAD_SIZE_MB=100
Read in: public/upload:615 - converts MB to bytes
Enforced by: src/Services/FileUploadValidator.php:88 - rejects oversized files

// upload.php - Line 615
$maxSize = (int) ($_ENV['MAX_UPLOAD_SIZE_MB'] ?? 100) * 1024 * 1024;

// FileUploadValidator.php - Line 88
if ($file['size'] > $maxSizeBytes) {
    return ['valid' => false, 'error' => 'File size exceeds limit'];
}

PHP Configuration

Also ensure your PHP settings allow large uploads:

upload_max_filesize = 100M
post_max_size = 105M

Blocked extensions:

php, phtml, php3, php4, php5, php7, phar
exe, bat, cmd, sh, bash, ps1
js, vbs, wsf, hta

URL Validation (SSRF Protection)¶

Media URLs are validated to prevent SSRF attacks:

Check	Protection
Protocol whitelist	Only HTTP/HTTPS allowed
Private IP blocking	No 10.x, 172.16.x, 192.168.x
Localhost blocking	No 127.x or localhost
Cloud metadata blocking	No 169.254.169.254
DNS rebinding protection	Verify resolved IP
Port restrictions	Block sensitive ports

Security Headers¶

All responses include security headers:

X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Referrer-Policy: strict-origin-when-cross-origin

The X-Powered-By header is removed to hide PHP version. Security headers are set early in request processing (before tenant resolution) so they appear on all responses, including error paths.

Adding More Headers¶

For additional protection, add headers in Apache:

# In httpd.conf or .htaccess
Header always set Content-Security-Policy "default-src 'self'"
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
Header always set Permissions-Policy "geolocation=(), microphone=(), camera=()"

Or in Nginx:

add_header Content-Security-Policy "default-src 'self'";
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains";
add_header Permissions-Policy "geolocation=(), microphone=(), camera=()";

Conversation Limits¶

To prevent DoS through resource exhaustion:

Limit	Value	Purpose
Message length	10,000 chars	Prevent oversized requests
Total messages	500 per session	Limit database growth
Summary length	8,000 chars	Cap memory usage

When limits are reached, users are prompted to start a new conversation.

Production Hardening¶

Disable Debug Features¶

In production, always disable:

API_DEBUG_LOGGING_ENABLED=false

Use HTTPS¶

Always use TLS in production:

# Install certbot
sudo dnf install certbot python3-certbot-apache

# Obtain certificate
sudo certbot --apache -d your-domain.com

# Auto-renewal
sudo systemctl enable certbot-renew.timer

Database Security¶

Use strong passwords - Generate with openssl rand -base64 32
Restrict access - Only allow connections from web server
Encrypt connections - Enable SSL in PostgreSQL

# In postgresql.conf
ssl = on
ssl_cert_file = '/path/to/server.crt'
ssl_key_file = '/path/to/server.key'

File Permissions¶

# Secure tenant .env files
chmod 640 /var/www/chatbot/tenants/*/.env
chown apache:apache /var/www/chatbot/tenants/*/.env

# Secure tenant directories
chmod 750 /var/www/chatbot/tenants/*/
chown apache:apache /var/www/chatbot/tenants/*/

# Restrict log directory
chmod 750 /var/www/chatbot/logs
chown apache:apache /var/www/chatbot/logs

# Make application files read-only
find /var/www/chatbot -type f -exec chmod 644 {} \;
find /var/www/chatbot -type d -exec chmod 755 {} \;

SELinux (RHEL/AlmaLinux)¶

Keep SELinux enabled and configure properly:

# Allow network connections
setsebool -P httpd_can_network_connect 1
setsebool -P httpd_can_network_connect_db 1

# Set proper contexts
semanage fcontext -a -t httpd_sys_rw_content_t "/var/www/chatbot/logs(/.*)?"
restorecon -Rv /var/www/chatbot/logs

Firewall Rules¶

Only expose necessary ports:

# AlmaLinux/RHEL (firewalld)
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload

# Ubuntu (ufw)
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable

# CSF (ConfigServer Firewall)
# Edit /etc/csf/csf.conf and ensure 80,443 are in TCP_IN:
sudo nano /etc/csf/csf.conf
#   TCP_IN = "20,21,22,25,53,80,110,143,443,465,587,993,995"

# Restart CSF to apply changes
sudo csf -r

# Verify ports are open
sudo csf -p

Monitoring & Logging¶

Application Logs¶

Errors are logged to:

PHP error log: /var/log/php-fpm/www-error.log
Apache error log: /var/log/httpd/chatbot-error.log
Application logs: /var/www/chatbot/logs/

Security Monitoring¶

Monitor for:

Rate limit triggers - Excessive 429 responses
Failed auth attempts - Watch API debug auth logs
Unusual file uploads - Large files or unusual types
Error spikes - May indicate attack attempts

Log Rotation¶

Configure logrotate for application logs:

# /etc/logrotate.d/chatbot
/var/www/chatbot/logs/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    create 644 apache apache
}

Security Checklist¶

Before going to production:

[ ] HTTPS enabled with valid certificate
[ ] Strong database passwords set (ragdb + chatdb)
[ ] API_DEBUG_LOGGING_ENABLED=false
[ ] UPLOAD_API_KEY set (required for upload and media-info)
[ ] ALLOWED_ORIGINS configured for CORS whitelist
[ ] ADMIN_API_KEY and CHAT_API_KEY set with strong values
[ ] ADMIN_USERNAME and ADMIN_PASSWORD set for dashboard
[ ] SELinux enabled and configured
[ ] Firewall configured
[ ] File permissions restricted
[ ] Tenant .env files secured (chmod 640)
[ ] Security headers configured
[ ] Log rotation configured
[ ] Monitoring in place

Reporting Vulnerabilities¶

If you discover a security vulnerability:

Do not open a public issue
Email security@your-organization.com
Include steps to reproduce
Allow time for a fix before disclosure