Docling Service¶
The Docling service is a Python-based microservice that handles document parsing, OCR, and media transcription. This guide covers deployment on AWS and Google Cloud.
Overview¶
Docling provides:
- Document parsing - PDF, DOCX, PPTX, and more
- OCR - Text extraction from scanned documents and images
- Table extraction - Structured data from tables
- Layout analysis - Document structure preservation
- Media processing - Video/audio download and transcription
System Requirements¶
Minimum (CPU Mode)¶
| Resource | Requirement |
|---|---|
| CPU | 4 cores |
| RAM | 8 GB |
| Storage | 50 GB SSD |
| OS | Linux (Ubuntu 22.04, AlmaLinux 9) |
Recommended (GPU Mode)¶
| Resource | Requirement |
|---|---|
| GPU | NVIDIA with 8+ GB VRAM |
| CPU | 8 cores |
| RAM | 16 GB |
| Storage | 100 GB SSD |
| Driver | CUDA 11.8+ |
AWS Deployment¶
Option 1: EC2 with Docker¶
Launch Instance¶
- Go to EC2 Console → Launch Instance
- Choose Amazon Linux 2023 or Ubuntu 22.04
- Select instance type:
    - CPU: c6i.xlarge (4 vCPU, 8 GB RAM)
    - GPU: g4dn.xlarge (1x T4 GPU, 4 vCPU, 16 GB RAM)
- Configure storage: 100 GB gp3
- Security group: Allow inbound port 8001 from your web server
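If you prefer scripting the launch instead of using the console, a roughly equivalent AWS CLI call looks like this (a sketch: the AMI ID, key pair, and security group are placeholders you must replace):
aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type g4dn.xlarge \
  --key-name your-key-pair \
  --security-group-ids sg-xxxxxxxx \
  --block-device-mappings 'DeviceName=/dev/xvda,Ebs={VolumeSize=100,VolumeType=gp3}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=docling-service}]'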
Install Docker¶
# Amazon Linux 2023
sudo dnf install -y docker
sudo systemctl enable docker
sudo systemctl start docker
sudo usermod -aG docker $USER
# For GPU support: add the NVIDIA container toolkit repository (requires the NVIDIA driver to be installed)
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit
sudo systemctl restart docker
Deploy Docling¶
# Clone or copy the docling-service directory
cd /opt
git clone https://github.com/your-org/rag-chatbot.git
cd rag-chatbot/docling-service
# Configure environment
cp .env.example .env
nano .env # Adjust settings
# Start service (GPU)
docker compose --profile gpu up -d
# Or start service (CPU)
docker compose --profile cpu up -d
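Once the containers are up, a quick check confirms the service is reachable (this assumes the compose file publishes the service on host port 8001, matching the security group rule below):
docker compose ps
curl http://localhost:8001/health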
Configure Security Group¶
Ensure your EC2 security group allows:
| Type | Port | Source |
|---|---|---|
| Custom TCP | 8001 | Web server IP/Security Group |
Option 2: ECS with Fargate¶
For managed container deployment:
Create Task Definition¶
{
"family": "docling-service",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "4096",
"memory": "8192",
"containerDefinitions": [
{
"name": "docling",
"image": "your-ecr-repo/docling-service:latest",
"portMappings": [
{
"containerPort": 8000,
"protocol": "tcp"
}
],
"environment": [
{"name": "DOCLING_DEVICE_MODE", "value": "cpu"},
{"name": "DOCLING_MAX_CONCURRENT_JOBS", "value": "2"}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/docling-service",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "ecs"
}
}
}
]
}
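Register the task definition by saving the JSON above to a file and passing it to the AWS CLI (the file name is arbitrary):
aws ecs register-task-definition --cli-input-json file://docling-task.json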
Create Service¶
aws ecs create-service \
--cluster your-cluster \
--service-name docling-service \
--task-definition docling-service:1 \
--desired-count 1 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-xxx],securityGroups=[sg-xxx]}"
AWS GPU Instance Types¶
| Instance | GPU | VRAM | vCPU | RAM | Use Case |
|---|---|---|---|---|---|
| g4dn.xlarge | 1x T4 | 16 GB | 4 | 16 GB | Development, low traffic |
| g4dn.2xlarge | 1x T4 | 16 GB | 8 | 32 GB | Production, moderate traffic |
| g5.xlarge | 1x A10G | 24 GB | 4 | 16 GB | Production, high performance |
| p3.2xlarge | 1x V100 | 16 GB | 8 | 61 GB | Heavy workloads |
Google Cloud Deployment¶
Option 1: Compute Engine with Docker¶
Create VM Instance¶
Using gcloud CLI:
# CPU instance
gcloud compute instances create docling-service \
--zone=us-central1-a \
--machine-type=e2-standard-4 \
--image-family=ubuntu-2204-lts \
--image-project=ubuntu-os-cloud \
--boot-disk-size=100GB \
--boot-disk-type=pd-ssd
# GPU instance
gcloud compute instances create docling-service-gpu \
--zone=us-central1-a \
--machine-type=n1-standard-4 \
--accelerator=type=nvidia-tesla-t4,count=1 \
--image-family=ubuntu-2204-lts \
--image-project=ubuntu-os-cloud \
--boot-disk-size=100GB \
--boot-disk-type=pd-ssd \
--maintenance-policy=TERMINATE
Install Docker and NVIDIA Drivers¶
# Install Docker
sudo apt update
sudo apt install -y docker.io
sudo systemctl enable docker
sudo usermod -aG docker $USER
# For GPU: Install NVIDIA drivers
sudo apt install -y nvidia-driver-535
sudo reboot
# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo systemctl restart docker
Deploy Docling¶
cd /opt
git clone https://github.com/your-org/rag-chatbot.git
cd rag-chatbot/docling-service
cp .env.example .env
nano .env
# GPU mode
docker compose --profile gpu up -d
# CPU mode
docker compose --profile cpu up -d
Configure Firewall¶
gcloud compute firewall-rules create allow-docling \
--direction=INGRESS \
--action=ALLOW \
--rules=tcp:8001 \
--source-ranges=YOUR_WEB_SERVER_IP/32
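From the web server, verify that the rule allows traffic (replace the IP with the Docling VM's internal address):
curl http://DOCLING_VM_IP:8001/health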
Option 2: Cloud Run¶
For serverless deployment (CPU only):
# Build and push image
cd docling-service
gcloud builds submit --tag gcr.io/your-project/docling-service
# Deploy to Cloud Run
gcloud run deploy docling-service \
--image gcr.io/your-project/docling-service \
--platform managed \
--region us-central1 \
--memory 8Gi \
--cpu 4 \
--timeout 300 \
--concurrency 4 \
--min-instances 1 \
--max-instances 5
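Cloud Run does not read a local .env file, so pass configuration as environment variables. A sketch (adjust names and values using the settings listed under Configuration below):
gcloud run services update docling-service \
  --region us-central1 \
  --set-env-vars DOCLING_DEVICE_MODE=cpu,DOCLING_MAX_CONCURRENT_JOBS=2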
GCP GPU Machine Types¶
| Machine Type | GPU | VRAM | vCPU | RAM |
|---|---|---|---|---|
| n1-standard-4 + T4 | 1x T4 | 16 GB | 4 | 15 GB |
| n1-standard-8 + T4 | 1x T4 | 16 GB | 8 | 30 GB |
| a2-highgpu-1g | 1x A100 | 40 GB | 12 | 85 GB |
Host-Based Firewall¶
If running Docling on a standalone server (not cloud-managed), configure the host firewall to allow port 8001 from your web server:
# AlmaLinux/RHEL (firewalld)
sudo firewall-cmd --permanent --add-port=8001/tcp
sudo firewall-cmd --reload
# Ubuntu (ufw)
sudo ufw allow from YOUR_WEB_SERVER_IP to any port 8001
# CSF (ConfigServer Firewall)
# Option 1: Allow specific IP
sudo csf -a YOUR_WEB_SERVER_IP
# Option 2: Open port globally (less secure)
# Edit /etc/csf/csf.conf and add 8001 to TCP_IN
sudo csf -r
Security Best Practice
Only allow port 8001 from your web server's IP address, not from all sources. The Docling service should not be exposed to the public internet.
Configuration¶
Environment Variables¶
Create .env in the docling-service directory (for example, /opt/rag-chatbot/docling-service/.env if you followed the deployment steps above):
# Service binding
DOCLING_SERVICE_HOST=0.0.0.0
DOCLING_SERVICE_PORT=8000
# Security (optional)
DOCLING_API_KEY=your-secret-key
# Device mode: auto, cuda, cpu, mps
DOCLING_DEVICE_MODE=auto
# GPU selection (for multi-GPU systems)
DOCLING_CUDA_DEVICE_ID=0
# Processing limits
DOCLING_MAX_FILE_SIZE_MB=100
DOCLING_MAX_BATCH_SIZE=50
DOCLING_MAX_CONCURRENT_JOBS=4
DOCLING_JOB_TIMEOUT_SECONDS=3600
# Feature toggles
DOCLING_ENABLE_OCR=true
DOCLING_ENABLE_TABLE_EXTRACTION=true
DOCLING_ENABLE_LAYOUT_ANALYSIS=true
DOCLING_ENABLE_MATH_DETECTION=true
# OCR language
DOCLING_OCR_LANGUAGE=en
# Output format
DOCLING_OUTPUT_FORMAT=markdown
# Temporary storage
DOCLING_TEMP_DIR=/tmp/docling
DOCLING_RESULTS_TTL_SECONDS=3600
# Logging
DOCLING_LOG_LEVEL=INFO
Performance Tuning¶
CPU Mode¶
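For CPU-only hosts, force CPU mode and keep concurrency low. A suggested starting point (values are illustrative; tune to your core count):
DOCLING_DEVICE_MODE=cpu
DOCLING_MAX_CONCURRENT_JOBS=2
DOCLING_MAX_BATCH_SIZE=20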
GPU Mode¶
DOCLING_DEVICE_MODE=cuda
DOCLING_CUDA_DEVICE_ID=0
DOCLING_MAX_CONCURRENT_JOBS=4
DOCLING_MAX_BATCH_SIZE=50
Proxy Configuration¶
When importing media from YouTube and other platforms, you may encounter rate limiting or geo-restrictions. The Docling service includes a proxy rotation system for yt-dlp that helps bypass these issues.
Why Use Proxies?¶
- Rate limiting - YouTube may block repeated requests from the same IP
- Geo-restrictions - Access content available only in certain regions
- Reliability - Automatic failover if one proxy becomes unavailable
Setting Up Proxies¶
- Copy the example proxy file in the config directory:
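For example (the exact file names depend on your checkout; this assumes an example file named proxies.txt.example ships in config/):
cd /opt/rag-chatbot/docling-service
cp config/proxies.txt.example config/proxies.txt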
- Edit the proxy file and add your proxies (one per line):
Supported formats:
# HTTP proxies
http://proxy1.example.com:8080
http://user:password@proxy2.example.com:8080
# HTTPS proxies
https://secure-proxy.example.com:443
# SOCKS5 proxies (better anonymity)
socks5://proxy3.example.com:1080
socks5://user:password@socks-proxy.example.com:1080
- Configure the environment to point to the proxy file:
# Path to proxy list file
DOCLING_PROXY_FILE_PATH=/path/to/docling-service/config/proxies.txt
# How often to rotate to next proxy (seconds)
DOCLING_PROXY_ROTATION_INTERVAL_SECONDS=60
- Restart the service:
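For the Docker deployment described above, re-run compose so the container is recreated with the new environment (a plain restart does not re-read .env; use the profile you deployed with):
docker compose --profile gpu up -d   # or: docker compose --profile cpu up -d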
Proxy Rotation Behavior¶
The proxy manager automatically rotates through your proxy list:
| Setting | Description |
|---|---|
| DOCLING_PROXY_FILE_PATH | Path to file containing proxy list |
| DOCLING_PROXY_ROTATION_INTERVAL_SECONDS | Seconds between automatic rotation (default: 60) |
How it works:
- Proxies are loaded from config/proxies.txt on startup
- yt-dlp uses the current proxy for all media downloads
- Automatically rotates to the next proxy at the configured interval
- Skips invalid proxy entries and logs warnings
- Reloads the proxy file automatically if it changes (no restart needed)
- Credentials are masked in log output for security
Testing Proxies¶
Verify your proxies work before adding them:
# Test HTTP proxy
curl -x http://proxy.example.com:8080 https://api.ipify.org
# Test SOCKS5 proxy
curl -x socks5://proxy.example.com:1080 https://api.ipify.org
# Test authenticated proxy
curl -x http://user:pass@proxy.example.com:8080 https://api.ipify.org
Proxy Providers
For production use, consider residential proxy services that provide rotating IPs. Datacenter proxies are more likely to be blocked by media platforms.
Connecting to Main Application¶
Update your main application's .env:
# If Docling is on the same server
DOCLING_SERVICE_URL=http://localhost:8001
# If Docling is on a different server
DOCLING_SERVICE_URL=http://10.0.1.50:8001
# If using API key authentication
DOCLING_API_KEY=your-secret-key
# Increase timeout for large files
DOCLING_TIMEOUT=300
Health Monitoring¶
Health Check Endpoint¶
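The service exposes a health endpoint at /health. Assuming the default host port mapping of 8001, you can query it with:
curl http://YOUR_IP:8001/health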
Response:
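(The exact fields depend on the service version; the shape below is an illustrative sketch.)
{
  "status": "healthy"
}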
Monitoring with CloudWatch (AWS)¶
Create a CloudWatch alarm for the health endpoint:
aws cloudwatch put-metric-alarm \
--alarm-name docling-health \
--metric-name HealthCheckStatus \
--namespace Custom/Docling \
--statistic Average \
--period 60 \
--threshold 1 \
--comparison-operator LessThanThreshold \
--evaluation-periods 3
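The alarm watches a custom metric, so something on the instance has to publish it. A minimal publisher sketch, run from cron every minute (assumes the AWS CLI is installed and has permission to call cloudwatch:PutMetricData):
#!/bin/bash
# Publish 1 when /health returns HTTP 200, 0 otherwise (schedule via cron, e.g. every minute)
CODE=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://localhost:8001/health)
VALUE=0
[ "$CODE" = "200" ] && VALUE=1
aws cloudwatch put-metric-data \
  --namespace Custom/Docling \
  --metric-name HealthCheckStatus \
  --value "$VALUE"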
Monitoring with Cloud Monitoring (GCP)¶
Create an uptime check in Cloud Console:
- Go to Monitoring → Uptime Checks
- Create check for http://YOUR_IP:8001/health
- Set alerting threshold
Troubleshooting¶
Service Won't Start¶
# Check container logs
docker compose logs docling
# Check if port is in use
ss -tlnp | grep 8001
# Verify GPU access
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Out of Memory (OOM)¶
Reduce concurrent jobs:
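For example, in .env (values are illustrative; lower them until OOM errors stop, then recreate the container):
DOCLING_MAX_CONCURRENT_JOBS=2
DOCLING_MAX_BATCH_SIZE=20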
CUDA Errors¶
# Verify NVIDIA driver
nvidia-smi
# Check CUDA version
nvcc --version
# Reinstall container toolkit
sudo apt install --reinstall nvidia-container-toolkit
sudo systemctl restart docker
Slow Processing¶
- Enable GPU mode if available
- Reduce concurrent jobs to prevent resource contention
- Use SSD storage for temp directory
- Increase instance size
Scaling¶
Horizontal Scaling¶
For high-traffic deployments, run multiple Docling instances:
- Deploy multiple instances behind a load balancer
- Use shared storage for results (S3, GCS)
- Configure sticky sessions if needed
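A minimal nginx reverse-proxy sketch for fronting two instances (IPs and port are placeholders; add ip_hash to the upstream block if you need sticky sessions):
upstream docling_backend {
    least_conn;
    server 10.0.1.50:8001;
    server 10.0.1.51:8001;
}

server {
    listen 8001;
    location / {
        proxy_pass http://docling_backend;
        proxy_read_timeout 300s;
    }
}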
Auto-Scaling (AWS)¶
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name docling-asg \
--launch-template LaunchTemplateName=docling-template \
--min-size 1 \
--max-size 5 \
--target-group-arns arn:aws:elasticloadbalancing:...