API Endpoints

This reference documents all available API endpoints for the RAG Chatbot.

Base URL

All endpoints are relative to your installation path:

https://your-domain.com/chatbot/public/

Authentication

Most endpoints are public by default. Optional authentication is available:

Upload endpoint: Protected with X-API-Key header when UPLOAD_API_KEY is set
API debug logs endpoint: Protected with HTTP Basic Authentication

Chat Endpoint

Send messages and receive AI-powered responses based on your knowledge base.

Request

POST /chat
Content-Type: application/json

Body Parameters:

Parameter	Type	Required	Description
`session_id`	string	Yes	Unique identifier for the conversation
`message`	string	Yes	User's question or message
`metadata`	object	No	Optional filters for document search

Metadata Filters:

{
  "metadata": {
    "speaker": "John",           // Filter by speaker name
    "topic": "AI",               // Filter by topic
    "file_id": "doc_123",        // Filter to specific document
    "document_type": "transcript" // Filter by document type
  }
}
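
The comments in the block above are annotations only; JSON does not allow comments, so they must be left out of a real request body. As a minimal sketch, the helper below assembles a request body and attaches metadata only when at least one filter is set. The name build_chat_payload is hypothetical and not part of the API, and the assumption that unset filters should be omitted rather than sent as null is ours.

```python
import json


def build_chat_payload(session_id, message, **filters):
    """Build the JSON body for POST /chat (hypothetical helper, not part of the API)."""
    payload = {"session_id": session_id, "message": message}
    # Assumption: filters left as None are omitted instead of being sent as null.
    metadata = {key: value for key, value in filters.items() if value is not None}
    if metadata:
        payload["metadata"] = metadata
    return payload


payload = build_chat_payload(
    "user_12345",
    "What did John say about AI?",
    speaker="John",
    topic="AI",
    file_id=None,
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed as the JSON body of the request, for example through the json= argument of requests.post.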

Response

Success (200 OK):

{
  "response": "Based on the quarterly report, revenue increased by 15%...",
  "sources": [
    {
      "filename": "Q3-Report.pdf",
      "chunk_index": 3,
      "similarity": 0.8543,
      "keyword_score": 0.75,
      "combined_score": 0.8230
    }
  ],
  "context_used": true,
  "llm_provider": "openai"
}

No Relevant Content:

{
  "response": "I don't have enough information to answer that question.",
  "sources": [],
  "context_used": false
}

Conversation Limit Reached:

{
  "response": "This conversation has reached its maximum length. Click the **+** button in the header to start a new conversation.",
  "sources": [],
  "context_used": false,
  "limit_reached": true
}
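
The three response shapes above can be told apart by the context_used and limit_reached fields. The sketch below is one way a client might branch on them. It assumes limit_reached is simply absent when the limit has not been reached, as in the examples; classify_chat_response is a hypothetical helper name.

```python
def classify_chat_response(body):
    """Classify a parsed /chat response (hypothetical helper, not part of the API)."""
    # Assumption: limit_reached is missing, not false, on ordinary responses.
    if body.get("limit_reached", False):
        return "limit_reached"
    if body.get("context_used", False):
        return "answered"
    return "no_context"


answered = {"response": "...", "sources": [{"filename": "Q3-Report.pdf"}], "context_used": True}
no_context = {"response": "...", "sources": [], "context_used": False}
limit = {"response": "...", "sources": [], "context_used": False, "limit_reached": True}

for body in (answered, no_context, limit):
    print(classify_chat_response(body))
```

Checking limit_reached first matters because that response also carries context_used set to false.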

Example

curl -X POST https://your-domain.com/chatbot/public/chat \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "user_12345",
    "message": "What were the key findings in the Q3 report?"
  }'

Rate Limits

Limit	Value
Requests	30 per minute
Message length	10,000 characters
Identifier	Session ID + IP address

Rate Limit Headers:

X-RateLimit-Limit: 30
X-RateLimit-Remaining: 25
X-RateLimit-Reset: 1706889600
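
A client can use these headers to decide whether to pause before the next request. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds, which the example value suggests but this reference does not state. seconds_until_reset is a hypothetical helper name.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before sending again (hypothetical helper)."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0  # quota left, no need to wait
    # Assumption: the reset header is a Unix timestamp in seconds.
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    current = time.time() if now is None else now
    return max(0.0, float(reset_at - current))


exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1706889600"}
available = {"X-RateLimit-Remaining": "25", "X-RateLimit-Reset": "1706889600"}

print(seconds_until_reset(exhausted, now=1706889570))
print(seconds_until_reset(available, now=1706889570))
```

The lookup here is case-sensitive because it uses a plain dictionary; HTTP client libraries usually expose headers through a case-insensitive mapping.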

Upload Endpoint

Upload documents or import media for the knowledge base.

File Upload

POST /upload.php
Content-Type: multipart/form-data

Form Parameters:

Parameter	Type	Required	Description
`file`	file	Yes	Document file to upload
`metadata`	string	No	JSON string with custom metadata
`options`	string	No	JSON string with processing options

Processing Options:

{
  "enable_ocr": true,
  "enable_tables": true,
  "enable_layout": true
}
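
Because metadata and options are form fields of type string, a client has to serialize them to JSON before sending. A minimal sketch of that step follows; build_upload_fields is a hypothetical helper name, and the department key is only an illustration of custom metadata.

```python
import json


def build_upload_fields(metadata=None, options=None):
    """Serialize the optional form fields for POST /upload.php (hypothetical helper)."""
    fields = {}
    if metadata is not None:
        fields["metadata"] = json.dumps(metadata)
    if options is not None:
        fields["options"] = json.dumps(options)
    return fields


fields = build_upload_fields(
    metadata={"department": "Engineering"},
    options={"enable_ocr": True, "enable_tables": True, "enable_layout": False},
)
print(fields)
```

With the requests library, these fields would typically be passed as data= alongside files={"file": ...}, which produces a multipart/form-data request.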

URL Import

POST /upload.php
Content-Type: multipart/form-data

Form Parameters:

Parameter	Type	Required	Description
`url`	string	Yes	Media URL (YouTube, Vimeo, etc.)
`subtitle_langs`	string	No	Comma-separated language codes (default: "en")
`metadata`	string	No	JSON string with custom metadata

Response (Single Document)

{
  "success": true,
  "message": "Document processed and imported successfully",
  "document": {
    "filename": "report.pdf",
    "document_id": 42,
    "document_type": "pdf",
    "page_count": 15,
    "word_count": 5230,
    "tables_count": 3,
    "processing_time_ms": 2450
  }
}

Response (Playlist)

{
  "success": true,
  "message": "Playlist processed: 8 of 10 videos imported",
  "assistant_message": "I've imported 8 videos from the playlist...",
  "playlist": {
    "title": "Training Series",
    "video_count": 10,
    "imported_count": 8
  },
  "documents": [
    {
      "title": "Video 1",
      "document_id": 43,
      "word_count": 1250
    }
  ],
  "failed_videos": [
    {
      "title": "Video 5",
      "url": "https://...",
      "error_type": "no_transcript",
      "error": "No transcript available",
      "retryable": true
    }
  ]
}
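
The failed_videos array makes it possible to retry only the entries flagged as retryable. A minimal sketch, using a made-up sample response; the URLs are hypothetical placeholders and retryable_urls is not part of the API.

```python
def retryable_urls(upload_response):
    """Collect URLs of failed videos flagged as retryable (hypothetical helper)."""
    failed = upload_response.get("failed_videos", [])
    return [video["url"] for video in failed if video.get("retryable")]


# Sample data for illustration only; the URLs and the second error_type are invented.
sample = {
    "success": True,
    "failed_videos": [
        {"title": "Video 5", "url": "https://example.com/video-5",
         "error_type": "no_transcript", "retryable": True},
        {"title": "Video 7", "url": "https://example.com/video-7",
         "error_type": "hypothetical_error", "retryable": False},
    ],
}

print(retryable_urls(sample))
```

Each returned URL could then be submitted again as a single URL import.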

Examples

File Upload:

curl -X POST https://your-domain.com/chatbot/public/upload.php \
  -F "file=@document.pdf" \
  -F 'metadata={"department": "Engineering"}'

YouTube Import:

curl -X POST https://your-domain.com/chatbot/public/upload.php \
  -F "url=https://www.youtube.com/watch?v=dQw4w9WgXcQ" \
  -F "subtitle_langs=en,es"

With API Key:

curl -X POST https://your-domain.com/chatbot/public/upload.php \
  -H "X-API-Key: your-secret-key" \
  -F "file=@document.pdf"

Rate Limits

Limit	Value
Requests	10 per minute
File size	100 MB (configurable)
Identifier	IP address

Debug Endpoint

Inspect documents and test search queries. Useful for troubleshooting.

List Documents

GET /debug-rag.php

Response:

{
  "mode": "documents",
  "stats": {
    "total_documents": 15,
    "total_chunks": 234,
    "total_embeddings": 234
  },
  "documents": [
    {
      "id": 42,
      "filename": "report.pdf",
      "chunk_count": 12,
      "embedding_count": 12,
      "has_embeddings": true,
      "source_url": null
    }
  ]
}
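
One possible use of this listing is to spot documents whose chunks were not all embedded. The sketch below treats an embedding_count lower than chunk_count as a sign of incomplete embedding; that interpretation is an assumption, not something this reference states. The sample data and the helper name are hypothetical.

```python
def documents_missing_embeddings(listing):
    """Return filenames with fewer embeddings than chunks (hypothetical helper)."""
    return [
        doc["filename"]
        for doc in listing.get("documents", [])
        # Assumption: every chunk should have exactly one embedding.
        if doc["embedding_count"] < doc["chunk_count"]
    ]


# Sample data for illustration only.
listing = {
    "mode": "documents",
    "documents": [
        {"id": 42, "filename": "report.pdf", "chunk_count": 12, "embedding_count": 12},
        {"id": 43, "filename": "notes.txt", "chunk_count": 8, "embedding_count": 5},
    ],
}

print(documents_missing_embeddings(listing))
```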

Search Documents

GET /debug-rag.php?search=report

View Document Details

GET /debug-rag.php?doc=42

Response:

{
  "mode": "document",
  "document": {
    "id": 42,
    "filename": "report.pdf",
    "metadata": {...}
  },
  "chunks": [
    {
      "id": 101,
      "chunk_index": 0,
      "text_preview": "First 200 characters...",
      "has_embedding": true
    }
  ]
}

Test Search Query

GET /debug-rag.php?query=What+is+AI&threshold=0.25

Query Parameters:

Parameter	Default	Description
`query`	—	The search query to test
`threshold`	0.30	Minimum combined score for results to be used

Understanding the Threshold:

The threshold parameter controls how strict the search matching is. It represents the minimum combined score (0.0 to 1.0) a document chunk must achieve to be considered relevant. The combined score is calculated as:

combined_score = (vector_similarity × 0.7) + (keyword_score × 0.3)

This is useful for debugging because:

Results not appearing? Lower the threshold (e.g., 0.15) to see chunks that are being rejected and understand why
Irrelevant results appearing? Raise the threshold (e.g., 0.40) to filter out weak matches
Compare scores: the response shows both passed_results and rejected_by_threshold, so you can see exactly which chunks were filtered out and what they scored

Debugging Tip

If the chat returns "I don't have enough information," test the same question here with a low threshold (0.1) to see what chunks exist and their similarity scores. This reveals whether the issue is missing content or a threshold that's too strict.

Response:

{
  "mode": "query",
  "query": "What is AI",
  "threshold": 0.25,
  "results_above_threshold": 3,
  "results_below_threshold": 2,
  "passed_results": [
    {
      "chunk_id": 101,
      "filename": "ai-guide.pdf",
      "vector_similarity": 0.82,
      "keyword_score": 0.65,
      "combined_score": 0.77
    }
  ],
  "rejected_by_threshold": [...],
  "diagnosis": [
    "3 results above threshold 0.25"
  ]
}
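
The combined score formula and the threshold check can be reproduced locally to sanity-check a debug response. The sketch below uses the 0.7 and 0.3 weights given above and assumes a chunk passes when its combined score is greater than or equal to the threshold; whether the server's comparison is inclusive is not stated here. The function names are hypothetical.

```python
def combined_score(vector_similarity, keyword_score):
    """Weighted score as given in this reference: 70% vector, 30% keyword."""
    return vector_similarity * 0.7 + keyword_score * 0.3


def partition_by_threshold(chunks, threshold=0.30):
    """Split chunks into (passed, rejected) lists (hypothetical helper)."""
    passed, rejected = [], []
    for chunk in chunks:
        score = combined_score(chunk["vector_similarity"], chunk["keyword_score"])
        scored = {**chunk, "combined_score": round(score, 4)}
        # Assumption: a score exactly equal to the threshold passes.
        (passed if score >= threshold else rejected).append(scored)
    return passed, rejected


chunks = [
    {"chunk_id": 101, "vector_similarity": 0.82, "keyword_score": 0.65},
    {"chunk_id": 102, "vector_similarity": 0.20, "keyword_score": 0.10},
]
passed, rejected = partition_by_threshold(chunks, threshold=0.25)
print([c["chunk_id"] for c in passed], [c["chunk_id"] for c in rejected])
```

For chunk 101 this gives 0.574 + 0.195 = 0.769, which matches the 0.77 shown in the example response after rounding to two decimals.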

API Debug Logs Endpoint

View and manage API call logs. Requires authentication.

Authentication

HTTP Basic Authentication using credentials from .env:

Username: API_DEBUG_USERNAME
Password: API_DEBUG_PASSWORD
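
In HTTP Basic Authentication, the Authorization header value is the word Basic followed by the Base64 encoding of username:password. A minimal sketch of building that header; the credentials shown are placeholders, and basic_auth_header is a hypothetical helper name.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header for HTTP Basic Authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials; real values come from API_DEBUG_USERNAME and
# API_DEBUG_PASSWORD in .env.
headers = basic_auth_header("admin", "secret")
print(headers)
```

Most HTTP clients can build this header themselves, for example through the auth=(username, password) argument of requests.get.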

List Sessions

GET /api-debug/logs
Authorization: Basic <base64-credentials>

Response:

{
  "sessions": ["session_123", "session_456"],
  "total_sessions": 2,
  "log_path": "/var/www/chatbot/logs/api_debug"
}

Get Session Logs

GET /api-debug/logs/{session_id}
Authorization: Basic <base64-credentials>

Response:

{
  "session_id": "session_123",
  "message_count": 5,
  "summary": {
    "total_input_tokens": 1250,
    "total_output_tokens": 850,
    "total_tokens": 2100,
    "total_duration_ms": 3450.25
  },
  "messages": [
    {
      "timestamp": "2026-02-02T10:30:00Z",
      "provider": "openai",
      "model": "gpt-4",
      "tokens": {"input": 250, "output": 170},
      "duration_ms": 690
    }
  ]
}
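
The summary block looks like an aggregate of the per-message entries. The sketch below recomputes it from a messages list, assuming each total is a plain sum of the per-message values; that relationship is inferred from the field names, not stated in this reference. The sample data and helper name are hypothetical.

```python
def summarize_messages(messages):
    """Recompute summary totals from per-message entries (hypothetical helper)."""
    total_input = sum(m["tokens"]["input"] for m in messages)
    total_output = sum(m["tokens"]["output"] for m in messages)
    return {
        "total_input_tokens": total_input,
        "total_output_tokens": total_output,
        "total_tokens": total_input + total_output,
        "total_duration_ms": sum(m["duration_ms"] for m in messages),
    }


# Sample data for illustration only.
messages = [
    {"tokens": {"input": 250, "output": 170}, "duration_ms": 690},
    {"tokens": {"input": 300, "output": 200}, "duration_ms": 710.5},
]
summary = summarize_messages(messages)
print(summary)
```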

Delete Session Logs

DELETE /api-debug/logs/{session_id}
Authorization: Basic <base64-credentials>

Response:

{
  "success": true,
  "message": "Logs for session 'session_123' deleted"
}

Rate Limits

Limit	Value
Auth attempts	5 per minute
Lockout	5 minutes, triggered after 10 failed attempts

Error Responses

All endpoints return errors in the same JSON format: an object with an error message, plus extra fields such as retry_after where relevant.

Client Errors (4xx)

400 Bad Request:

{
  "error": "Missing required fields: session_id and message"
}

401 Unauthorized:

{
  "error": "Authentication required"
}

429 Too Many Requests:

{
  "error": "Rate limit exceeded",
  "retry_after": 45
}
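
A client can honor the 429 response by waiting for retry_after before trying again. The sketch below assumes retry_after is expressed in seconds, which this reference does not state. The request itself is passed in as a callable so the retry logic stays independent of any HTTP library; call_with_retry is a hypothetical helper name.

```python
import time


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out.

    send() must return a (status_code, parsed_body) tuple.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        # Assumption: retry_after is a number of seconds.
        sleep(body.get("retry_after", 1))


# Simulated responses: one rate-limit error, then a success.
responses = iter([
    (429, {"error": "Rate limit exceeded", "retry_after": 45}),
    (200, {"response": "ok"}),
])
waits = []
status, body = call_with_retry(lambda: next(responses), sleep=waits.append)
print(status, waits)
```

Injecting the sleep function keeps the example testable without actually pausing; in real use the default time.sleep applies.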

Server Errors (5xx)

500 Internal Server Error:

{
  "error": "An error occurred processing your request. Please try again."
}

503 Service Unavailable:

{
  "error": "Document parsing service unavailable"
}

Response Headers

All endpoints include the security headers below; the upload endpoint also sends CORS headers.

Security Headers:

X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Referrer-Policy: strict-origin-when-cross-origin

CORS Headers (on upload):

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, OPTIONS
Access-Control-Allow-Headers: Content-Type, X-API-Key

SDK Examples

PHP

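<?php
// Load Guzzle through Composer's autoloader (assumes guzzlehttp/guzzle is installed).
require 'vendor/autoload.php';

$client = new GuzzleHttp\Client();

// The 'json' option encodes the body and sets Content-Type: application/json.
$response = $client->post('https://your-domain.com/chatbot/public/chat', [
    'json' => [
        'session_id' => 'user_123',
        'message' => 'What is in the report?'
    ]
]);

// Cast the body stream to a string before decoding.
$data = json_decode((string) $response->getBody(), true);
echo $data['response'];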

Python

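import requests

# The json= argument serializes the body and sets Content-Type: application/json.
response = requests.post(
    'https://your-domain.com/chatbot/public/chat',
    json={
        'session_id': 'user_123',
        'message': 'What is in the report?'
    },
    timeout=30,  # seconds; requests applies no timeout by default
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body

data = response.json()
print(data['response'])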

JavaScript

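// Top-level await needs an ES module; otherwise wrap this in an async function.
const response = await fetch('https://your-domain.com/chatbot/public/chat', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        session_id: 'user_123',
        message: 'What is in the report?'
    })
});

// fetch only rejects on network failure, so check the HTTP status explicitly.
if (!response.ok) {
    throw new Error(`Chat request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data.response);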