
API Endpoints

This reference documents all available API endpoints for the RAG Chatbot.

Base URL

All endpoints are relative to your installation path:

https://your-domain.com/chatbot/public/

Authentication

Most endpoints are public by default. Optional authentication is available:

  • Upload endpoint: Protected with X-API-Key header when UPLOAD_API_KEY is set
  • API debug logs endpoint (/api-debug/logs): Protected with HTTP Basic Authentication

Chat Endpoint

Send messages and receive AI-powered responses based on your knowledge base.

Request

POST /chat
Content-Type: application/json

Body Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Unique identifier for the conversation |
| message | string | Yes | User's question or message |
| metadata | object | No | Optional filters for document search |

Metadata Filters:

{
  "metadata": {
    "speaker": "John",           // Filter by speaker name
    "topic": "AI",               // Filter by topic
    "file_id": "doc_123",        // Filter to specific document
    "document_type": "transcript" // Filter by document type
  }
}
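
As a minimal sketch of how a client might assemble this request body, the helper below builds the JSON payload and omits the metadata object when no filter is given. The function name build_chat_payload is hypothetical and not part of the API; only the field names come from the parameters listed above.

```python
import json


def build_chat_payload(session_id, message, **filters):
    """Return the JSON body for POST /chat as a string.

    Keyword arguments become metadata filters (speaker, topic,
    file_id, document_type). Filters set to None are dropped.
    """
    payload = {"session_id": session_id, "message": message}
    metadata = {key: value for key, value in filters.items() if value is not None}
    if metadata:
        # Only send "metadata" when at least one filter has a value.
        payload["metadata"] = metadata
    return json.dumps(payload)


body = build_chat_payload(
    "user_12345",
    "What did John say about AI?",
    speaker="John",
    topic="AI",
)
print(body)
```

The resulting string can be sent as the request body with Content-Type: application/json.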

Response

Success (200 OK):

{
  "response": "Based on the quarterly report, revenue increased by 15%...",
  "sources": [
    {
      "filename": "Q3-Report.pdf",
      "chunk_index": 3,
      "similarity": 0.8543,
      "keyword_score": 0.75,
      "combined_score": 0.8230
    }
  ],
  "context_used": true,
  "llm_provider": "openai"
}

No Relevant Content:

{
  "response": "I don't have enough information to answer that question.",
  "sources": [],
  "context_used": false
}

Conversation Limit Reached:

{
  "response": "This conversation has reached its maximum length. Click the **+** button in the header to start a new conversation.",
  "sources": [],
  "context_used": false,
  "limit_reached": true
}

Example

curl -X POST https://your-domain.com/chatbot/public/chat \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "user_12345",
    "message": "What were the key findings in the Q3 report?"
  }'
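
Once the JSON body is decoded, a client usually needs to tell the three response shapes apart. The sketch below does this from the context_used and limit_reached fields. The helper names classify_reply and format_sources are illustrative, and the sample dict is made-up data in the documented shape.

```python
def classify_reply(data):
    """Classify a decoded /chat response body (a dict)."""
    if data.get("limit_reached"):
        return "limit_reached"  # the conversation must be restarted
    if not data.get("context_used"):
        return "no_context"     # no relevant content was found
    return "answered"


def format_sources(data):
    """Render each source as 'filename (chunk N, score S)'."""
    return [
        f"{s['filename']} (chunk {s['chunk_index']}, score {s['combined_score']:.2f})"
        for s in data.get("sources", [])
    ]


# Sample data in the documented response shape (values are illustrative).
reply = {
    "response": "Based on the quarterly report, revenue increased by 15%...",
    "sources": [
        {"filename": "Q3-Report.pdf", "chunk_index": 3, "combined_score": 0.82}
    ],
    "context_used": True,
}
print(classify_reply(reply), format_sources(reply))
```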

Rate Limits

| Limit | Value |
| --- | --- |
| Requests | 30 per minute |
| Message length | 10,000 characters |
| Identifier | Session ID + IP address |

Rate Limit Headers:

X-RateLimit-Limit: 30
X-RateLimit-Remaining: 25
X-RateLimit-Reset: 1706889600
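
A client can use these headers to pause before it hits the limit. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds, which the sample value suggests but this reference does not state. The function name seconds_until_reset is hypothetical, and headers is taken to be a plain dict keyed by the exact header names.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before sending the next request.

    Returns 0.0 while X-RateLimit-Remaining is above zero.
    Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    # If the header is missing, assume the client is not rate limited.
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    return max(0.0, reset_at - now)


exhausted = {
    "X-RateLimit-Limit": "30",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1706889600",
}
# Thirty seconds before the reset time, the helper asks for a 30 second wait.
print(seconds_until_reset(exhausted, now=1706889570))
```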

Upload Endpoint

Upload documents or import media for the knowledge base.

File Upload

POST /upload.php
Content-Type: multipart/form-data

Form Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | file | Yes | Document file to upload |
| metadata | string | No | JSON string with custom metadata |
| options | string | No | JSON string with processing options |

Processing Options:

{
  "enable_ocr": true,
  "enable_tables": true,
  "enable_layout": true
}

URL Import

POST /upload.php
Content-Type: multipart/form-data

Form Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Media URL (YouTube, Vimeo, etc.) |
| subtitle_langs | string | No | Comma-separated language codes (default: "en") |
| metadata | string | No | JSON string with custom metadata |

Response (Single Document)

{
  "success": true,
  "message": "Document processed and imported successfully",
  "document": {
    "filename": "report.pdf",
    "document_id": 42,
    "document_type": "pdf",
    "page_count": 15,
    "word_count": 5230,
    "tables_count": 3,
    "processing_time_ms": 2450
  }
}

Response (Playlist)

{
  "success": true,
  "message": "Playlist processed: 8 of 10 videos imported",
  "assistant_message": "I've imported 8 videos from the playlist...",
  "playlist": {
    "title": "Training Series",
    "video_count": 10,
    "imported_count": 8
  },
  "documents": [
    {
      "title": "Video 1",
      "document_id": 43,
      "word_count": 1250
    }
  ],
  "failed_videos": [
    {
      "title": "Video 5",
      "url": "https://...",
      "error_type": "no_transcript",
      "error": "No transcript available",
      "retryable": true
    }
  ]
}
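
Because a playlist import can succeed partially, a client may want to collect the videos that failed and are marked retryable. The following sketch works on the decoded response. The helper names are hypothetical, and the sample dict is illustrative data in the documented shape.

```python
def retryable_urls(upload_response):
    """Return URLs of failed videos whose 'retryable' flag is true."""
    return [
        video["url"]
        for video in upload_response.get("failed_videos", [])
        if video.get("retryable")
    ]


def import_summary(upload_response):
    """Return (imported, total) counts from the playlist block."""
    playlist = upload_response.get("playlist", {})
    return playlist.get("imported_count", 0), playlist.get("video_count", 0)


# Illustrative response; the URLs are placeholders.
result = {
    "success": True,
    "playlist": {"title": "Training Series", "video_count": 10, "imported_count": 8},
    "failed_videos": [
        {"title": "Video 5", "url": "https://example.com/v5",
         "error_type": "no_transcript", "retryable": True},
        {"title": "Video 9", "url": "https://example.com/v9",
         "error_type": "other", "retryable": False},
    ],
}
print(import_summary(result), retryable_urls(result))
```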

Examples

File Upload:

curl -X POST https://your-domain.com/chatbot/public/upload.php \
  -F "file=@document.pdf" \
  -F 'metadata={"department": "Engineering"}'

YouTube Import:

curl -X POST https://your-domain.com/chatbot/public/upload.php \
  -F "url=https://www.youtube.com/watch?v=dQw4w9WgXcQ" \
  -F "subtitle_langs=en,es"

With API Key:

curl -X POST https://your-domain.com/chatbot/public/upload.php \
  -H "X-API-Key: your-secret-key" \
  -F "file=@document.pdf"
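
The same file upload can be scripted from Python. This is a sketch rather than an official client: build_upload_fields and upload_file are hypothetical names, and upload_file relies on the third-party requests library (the one used in the Python SDK example), imported lazily so the helper above it stays usable without it.

```python
import json


def build_upload_fields(metadata=None, options=None):
    """Return the text form fields for POST /upload.php.

    Both values are sent as JSON strings, as the parameter
    descriptions above require.
    """
    fields = {}
    if metadata is not None:
        fields["metadata"] = json.dumps(metadata)
    if options is not None:
        fields["options"] = json.dumps(options)
    return fields


def upload_file(base_url, path, api_key=None, metadata=None, options=None):
    """Upload one document and return the decoded JSON response."""
    import requests  # third-party dependency, imported only when needed

    # The X-API-Key header is only needed when UPLOAD_API_KEY is set.
    headers = {"X-API-Key": api_key} if api_key else {}
    with open(path, "rb") as handle:
        response = requests.post(
            base_url.rstrip("/") + "/upload.php",
            files={"file": handle},
            data=build_upload_fields(metadata, options),
            headers=headers,
            timeout=120,  # arbitrary value; document processing can be slow
        )
    response.raise_for_status()
    return response.json()


fields = build_upload_fields(
    metadata={"department": "Engineering"},
    options={"enable_ocr": True},
)
print(fields)
```

A call such as upload_file("https://your-domain.com/chatbot/public", "document.pdf", api_key="your-secret-key") would mirror the last curl example.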

Rate Limits

| Limit | Value |
| --- | --- |
| Requests | 10 per minute |
| File size | 100 MB (configurable) |
| Identifier | IP address |

Debug Endpoint

Inspect documents and test search queries. Useful for troubleshooting.

List Documents

GET /debug-rag.php

Response:

{
  "mode": "documents",
  "stats": {
    "total_documents": 15,
    "total_chunks": 234,
    "total_embeddings": 234
  },
  "documents": [
    {
      "id": 42,
      "filename": "report.pdf",
      "chunk_count": 12,
      "embedding_count": 12,
      "has_embeddings": true,
      "source_url": null
    }
  ]
}
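
One way to use this listing when troubleshooting is to flag documents whose embeddings look incomplete. The sketch below assumes a healthy document has one embedding per chunk, as the sample counts suggest. The function name is hypothetical.

```python
def documents_missing_embeddings(listing):
    """Return filenames of documents that lack some or all embeddings.

    Assumption: embedding_count should equal chunk_count.
    """
    flagged = []
    for doc in listing.get("documents", []):
        incomplete = doc.get("embedding_count", 0) < doc.get("chunk_count", 0)
        if not doc.get("has_embeddings") or incomplete:
            flagged.append(doc["filename"])
    return flagged


# Illustrative listing in the documented shape.
listing = {
    "mode": "documents",
    "documents": [
        {"id": 42, "filename": "report.pdf", "chunk_count": 12,
         "embedding_count": 12, "has_embeddings": True},
        {"id": 43, "filename": "notes.txt", "chunk_count": 5,
         "embedding_count": 3, "has_embeddings": True},
        {"id": 44, "filename": "draft.docx", "chunk_count": 4,
         "embedding_count": 0, "has_embeddings": False},
    ],
}
print(documents_missing_embeddings(listing))
```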

Search Documents

GET /debug-rag.php?search=report

View Document Details

GET /debug-rag.php?doc=42

Response:

{
  "mode": "document",
  "document": {
    "id": 42,
    "filename": "report.pdf",
    "metadata": {...}
  },
  "chunks": [
    {
      "id": 101,
      "chunk_index": 0,
      "text_preview": "First 200 characters...",
      "has_embedding": true
    }
  ]
}

Test Search Query

GET /debug-rag.php?query=What+is+AI&threshold=0.25

Query Parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| query | (none) | The search query to test |
| threshold | 0.30 | Minimum combined score for results to be used |

Understanding the Threshold:

The threshold parameter controls how strict the search matching is. It represents the minimum combined score (0.0 to 1.0) a document chunk must achieve to be considered relevant. The combined score is calculated as:

combined_score = (vector_similarity × 0.7) + (keyword_score × 0.3)

This is useful for debugging because:

  • Results not appearing? Lower the threshold (e.g., 0.15) to see chunks that are being rejected and understand why
  • Irrelevant results appearing? Raise the threshold (e.g., 0.40) to filter out weak matches
  • Want to compare scores? The response lists both passed_results and rejected_by_threshold, so you can see exactly which chunks were filtered out and what they scored

Debugging Tip

If the chat returns "I don't have enough information," test the same question here with a low threshold (0.1) to see what chunks exist and their similarity scores. This reveals whether the issue is missing content or a threshold that's too strict.
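
The scoring rule can be reproduced offline to check why a chunk passed or failed. This is a minimal sketch of the formula given above, not the server's actual code. Treating a score equal to the threshold as a pass is an assumption based on the word "minimum".

```python
VECTOR_WEIGHT = 0.7
KEYWORD_WEIGHT = 0.3


def combined_score(vector_similarity, keyword_score):
    """Weighted sum of the two scores, as in the formula above."""
    return vector_similarity * VECTOR_WEIGHT + keyword_score * KEYWORD_WEIGHT


def passes(vector_similarity, keyword_score, threshold=0.30):
    """True when the chunk meets the minimum combined score.

    Assumption: the comparison is inclusive (>=).
    """
    return combined_score(vector_similarity, keyword_score) >= threshold


score = combined_score(0.82, 0.65)
print(round(score, 2))  # 0.77

# A weak match (combined score 0.20) passes at 0.15 but not at 0.40.
print(passes(0.2, 0.2, threshold=0.15), passes(0.2, 0.2, threshold=0.40))
```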

Response:

{
  "mode": "query",
  "query": "What is AI",
  "threshold": 0.25,
  "results_above_threshold": 3,
  "results_below_threshold": 2,
  "passed_results": [
    {
      "chunk_id": 101,
      "filename": "ai-guide.pdf",
      "vector_similarity": 0.82,
      "keyword_score": 0.65,
      "combined_score": 0.77
    }
  ],
  "rejected_by_threshold": [...],
  "diagnosis": [
    "3 results above threshold 0.25"
  ]
}
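
The two counters in this response are enough to separate the cases described in the debugging tip. The sketch below is one possible reading of them. The function name diagnose and its return labels are hypothetical.

```python
def diagnose(debug_response):
    """Classify a decoded query-mode response from /debug-rag.php."""
    above = debug_response.get("results_above_threshold", 0)
    below = debug_response.get("results_below_threshold", 0)
    if above > 0:
        return "ok"                    # at least one chunk is usable
    if below > 0:
        return "threshold_too_strict"  # chunks exist but were rejected
    return "missing_content"           # nothing matched at all


print(diagnose({"results_above_threshold": 3, "results_below_threshold": 2}))
print(diagnose({"results_above_threshold": 0, "results_below_threshold": 4}))
print(diagnose({"results_above_threshold": 0, "results_below_threshold": 0}))
```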

API Debug Logs Endpoint

View and manage API call logs. Requires authentication.

Authentication

HTTP Basic Authentication using credentials from .env:

  • Username: API_DEBUG_USERNAME
  • Password: API_DEBUG_PASSWORD
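
For clients that build the header by hand, the Basic scheme is the Base64 encoding of username:password. The sketch below reads the two variables from the process environment. The helper name basic_auth_header and the fallback values are placeholders, not defaults of the application.

```python
import base64
import os


def basic_auth_header(username, password):
    """Return the Authorization header for HTTP Basic Authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder fallbacks are used only if the variables are not set.
headers = basic_auth_header(
    os.environ.get("API_DEBUG_USERNAME", "admin"),
    os.environ.get("API_DEBUG_PASSWORD", "change-me"),
)
print(headers["Authorization"].startswith("Basic "))
```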

List Sessions

GET /api-debug/logs
Authorization: Basic <base64-credentials>

Response:

{
  "sessions": ["session_123", "session_456"],
  "total_sessions": 2,
  "log_path": "/var/www/chatbot/logs/api_debug"
}

Get Session Logs

GET /api-debug/logs/{session_id}
Authorization: Basic <base64-credentials>

Response:

{
  "session_id": "session_123",
  "message_count": 5,
  "summary": {
    "total_input_tokens": 1250,
    "total_output_tokens": 850,
    "total_tokens": 2100,
    "total_duration_ms": 3450.25
  },
  "messages": [
    {
      "timestamp": "2026-02-02T10:30:00Z",
      "provider": "openai",
      "model": "gpt-4",
      "tokens": {"input": 250, "output": 170},
      "duration_ms": 690
    }
  ]
}
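
The summary block and the messages list can be combined for quick usage reports. The helpers below are an illustrative sketch with hypothetical names. The sample dict mirrors the documented shape but lists only one of the five messages.

```python
def average_tokens_per_message(log):
    """Return total_tokens / message_count, or 0.0 for an empty session."""
    count = log.get("message_count", 0)
    if count == 0:
        return 0.0
    return log["summary"]["total_tokens"] / count


def tokens_by_model(log):
    """Sum input and output tokens for each model in the messages list."""
    totals = {}
    for message in log.get("messages", []):
        used = message["tokens"]["input"] + message["tokens"]["output"]
        totals[message["model"]] = totals.get(message["model"], 0) + used
    return totals


session_log = {
    "session_id": "session_123",
    "message_count": 5,
    "summary": {"total_input_tokens": 1250, "total_output_tokens": 850,
                "total_tokens": 2100},
    "messages": [
        {"provider": "openai", "model": "gpt-4",
         "tokens": {"input": 250, "output": 170}, "duration_ms": 690},
    ],
}
print(average_tokens_per_message(session_log), tokens_by_model(session_log))
```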

Delete Session Logs

DELETE /api-debug/logs/{session_id}
Authorization: Basic <base64-credentials>

Response:

{
  "success": true,
  "message": "Logs for session 'session_123' deleted"
}

Rate Limits

| Limit | Value |
| --- | --- |
| Auth attempts | 5 per minute |
| Lockout | After 10 failures, 5-minute lockout |

Error Responses

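All endpoints return errors in the same format: a JSON object with an error field that describes the problem. Rate-limit errors also include a retry_after field.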

Client Errors (4xx)

400 Bad Request:

{
  "error": "Missing required fields: session_id and message"
}

401 Unauthorized:

{
  "error": "Authentication required"
}

429 Too Many Requests:

{
  "error": "Rate limit exceeded",
  "retry_after": 45
}

Server Errors (5xx)

500 Internal Server Error:

{
  "error": "An error occurred processing your request. Please try again."
}

503 Service Unavailable:

{
  "error": "Document parsing service unavailable"
}
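
Since every error shares this shape, a client can map responses to a single exception type. The following is a sketch under that assumption. ApiError and error_from_response are hypothetical names, and treating 429 and 5xx as retryable is a design choice of the example, not a rule stated by the API.

```python
class ApiError(Exception):
    """Error reported by the API as {"error": "...", ...}."""

    def __init__(self, status, message, retry_after=None):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message
        self.retry_after = retry_after

    @property
    def is_retryable(self):
        # Assumption: rate limiting and server-side failures may succeed later.
        return self.status == 429 or self.status >= 500


def error_from_response(status, body):
    """Return an ApiError for 4xx/5xx responses, or None otherwise."""
    if status < 400:
        return None
    body = body if isinstance(body, dict) else {}
    return ApiError(
        status,
        body.get("error", "Unknown error"),
        body.get("retry_after"),
    )


err = error_from_response(429, {"error": "Rate limit exceeded", "retry_after": 45})
print(err, err.retry_after, err.is_retryable)
```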

Response Headers

All endpoints include the security headers below; the CORS headers apply to the upload endpoint:

Security Headers:

X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Referrer-Policy: strict-origin-when-cross-origin

CORS Headers (on upload):

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, OPTIONS
Access-Control-Allow-Headers: Content-Type, X-API-Key

SDK Examples

PHP

<?php
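// Assumes Guzzle was installed with Composer (composer require guzzlehttp/guzzle).
require 'vendor/autoload.php';

$client = new GuzzleHttp\Client();

$response = $client->post('https://your-domain.com/chatbot/public/chat', [
    'json' => [
        'session_id' => 'user_123',
        'message' => 'What is in the report?'
    ]
]);

// Cast the body stream to a string before decoding it.
$data = json_decode((string) $response->getBody(), true);
echo $data['response'];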

Python

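import requests

response = requests.post(
    'https://your-domain.com/chatbot/public/chat',
    json={
        'session_id': 'user_123',
        'message': 'What is in the report?'
    },
    timeout=30,  # avoid waiting indefinitely on a slow response
)
response.raise_for_status()  # raise on 4xx/5xx before reading the body

data = response.json()
print(data['response'])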

JavaScript

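// Top-level await needs an ES module; otherwise wrap this in an async function.
const response = await fetch('https://your-domain.com/chatbot/public/chat', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        session_id: 'user_123',
        message: 'What is in the report?'
    })
});

// fetch resolves even for 4xx/5xx responses, so check the status explicitly.
if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data.response);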