Docling Service

The Docling service is a Python-based microservice that handles document parsing, OCR, and media transcription. This guide covers deployment on AWS and Google Cloud.

Overview

Docling provides:

  • Document parsing - PDF, DOCX, PPTX, and more
  • OCR - Text extraction from scanned documents and images
  • Table extraction - Structured data from tables
  • Layout analysis - Document structure preservation
  • Media processing - Video/audio download and transcription

System Requirements

Minimum (CPU Mode)

Resource   Requirement
CPU        4 cores
RAM        8 GB
Storage    50 GB SSD
OS         Linux (Ubuntu 22.04, AlmaLinux 9)

Recommended (GPU Mode)

Resource   Requirement
GPU        NVIDIA with 8+ GB VRAM
CPU        8 cores
RAM        16 GB
Storage    100 GB SSD
Driver     CUDA 11.8+

AWS Deployment

Option 1: EC2 with Docker

Launch Instance

  1. Go to EC2 Console → Launch Instance
  2. Choose Amazon Linux 2023 or Ubuntu 22.04
  3. Select instance type:
     • CPU: c6i.xlarge (4 vCPU, 8 GB RAM)
     • GPU: g4dn.xlarge (1x T4 GPU, 4 vCPU, 16 GB RAM)
  4. Configure storage: 100 GB gp3
  5. Security group: Allow inbound port 8001 from your web server
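
If you prefer the CLI, the console steps above map onto a single run-instances call. The AMI, key pair, subnet, and security group IDs below are placeholders; substitute your own.

# GPU example; use --instance-type c6i.xlarge for CPU mode
# DeviceName is /dev/sda1 on Ubuntu AMIs (/dev/xvda on Amazon Linux)
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type g4dn.xlarge \
  --key-name your-key \
  --subnet-id subnet-xxxxxxxx \
  --security-group-ids sg-xxxxxxxx \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=docling-service}]'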

Install Docker

# Amazon Linux 2023
sudo dnf install -y docker
sudo systemctl enable docker
sudo systemctl start docker
sudo usermod -aG docker $USER  # log out and back in so the group change takes effect

# For GPU support: add the NVIDIA Container Toolkit repo, then install and enable it
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Deploy Docling

# Clone or copy the docling-service directory
cd /opt
git clone https://github.com/your-org/rag-chatbot.git
cd rag-chatbot/docling-service

# Configure environment
cp .env.example .env
nano .env  # Adjust settings

# Start service (GPU)
docker compose --profile gpu up -d

# Or start service (CPU)
docker compose --profile cpu up -d

Configure Security Group

Ensure your EC2 security group allows:

Type         Port   Source
Custom TCP   8001   Web server IP/Security Group
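
The equivalent rule can be created from the CLI; the group IDs below are placeholders.

# Allow port 8001 only from the web server's security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol tcp \
  --port 8001 \
  --source-group sg-yyyyyyyy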

Option 2: ECS with Fargate

For managed container deployment:

Create Task Definition

{
  "family": "docling-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "4096",
  "memory": "8192",
  "containerDefinitions": [
    {
      "name": "docling",
      "image": "your-ecr-repo/docling-service:latest",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "DOCLING_DEVICE_MODE", "value": "cpu"},
        {"name": "DOCLING_MAX_CONCURRENT_JOBS", "value": "2"}
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/docling-service",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
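
To register this task definition, save the JSON to a file and pass it to the CLI. Create the log group first, since the awslogs driver expects it to exist; a Fargate task will also need an execution role with CloudWatch Logs permissions, which is omitted above. The file name below is just an example.

# Create the log group referenced above, then register the task definition
aws logs create-log-group --log-group-name /ecs/docling-service --region us-east-1
aws ecs register-task-definition --cli-input-json file://docling-task.json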

Create Service

aws ecs create-service \
  --cluster your-cluster \
  --service-name docling-service \
  --task-definition docling-service:1 \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-xxx],securityGroups=[sg-xxx]}"

AWS GPU Instance Types

Instance       GPU       VRAM    vCPU   RAM     Use Case
g4dn.xlarge    1x T4     16 GB   4      16 GB   Development, low traffic
g4dn.2xlarge   1x T4     16 GB   8      32 GB   Production, moderate traffic
g5.xlarge      1x A10G   24 GB   4      16 GB   Production, high performance
p3.2xlarge     1x V100   16 GB   8      61 GB   Heavy workloads

Google Cloud Deployment

Option 1: Compute Engine with Docker

Create VM Instance

Using gcloud CLI:

# CPU instance
gcloud compute instances create docling-service \
  --zone=us-central1-a \
  --machine-type=e2-standard-4 \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=100GB \
  --boot-disk-type=pd-ssd

# GPU instance
gcloud compute instances create docling-service-gpu \
  --zone=us-central1-a \
  --machine-type=n1-standard-4 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=100GB \
  --boot-disk-type=pd-ssd \
  --maintenance-policy=TERMINATE
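
Once the instance is running, SSH in with gcloud before continuing with the steps below:

# Use docling-service-gpu for the GPU instance
gcloud compute ssh docling-service --zone=us-central1-a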

Install Docker and NVIDIA Drivers

# Install Docker
sudo apt update
sudo apt install -y docker.io
sudo systemctl enable docker
sudo usermod -aG docker $USER

# For GPU: Install NVIDIA drivers
sudo apt install -y nvidia-driver-535
sudo reboot

# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo systemctl restart docker

Deploy Docling

cd /opt
git clone https://github.com/your-org/rag-chatbot.git
cd rag-chatbot/docling-service

cp .env.example .env
nano .env

# GPU mode
docker compose --profile gpu up -d

# CPU mode
docker compose --profile cpu up -d

Configure Firewall

gcloud compute firewall-rules create allow-docling \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:8001 \
  --source-ranges=YOUR_WEB_SERVER_IP/32
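
As written, this rule applies to every instance in the VPC network. To scope it to the Docling VM only, tag the instance and restrict the rule to that tag (the tag name below is arbitrary):

# Tag the instance, then limit the firewall rule to that tag
gcloud compute instances add-tags docling-service --tags=docling --zone=us-central1-a
gcloud compute firewall-rules update allow-docling --target-tags=docling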

Option 2: Cloud Run

For serverless deployment (CPU only):

# Build and push image
cd docling-service
gcloud builds submit --tag gcr.io/your-project/docling-service

# Deploy to Cloud Run
gcloud run deploy docling-service \
  --image gcr.io/your-project/docling-service \
  --platform managed \
  --region us-central1 \
  --memory 8Gi \
  --cpu 4 \
  --timeout 300 \
  --concurrency 4 \
  --min-instances 1 \
  --max-instances 5
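
Cloud Run assigns the service an HTTPS URL rather than exposing port 8001, so use that URL as DOCLING_SERVICE_URL in the main application (ideally together with DOCLING_API_KEY, since the endpoint is reachable from the internet):

# Retrieve the generated service URL for use as DOCLING_SERVICE_URL
gcloud run services describe docling-service \
  --region us-central1 \
  --format 'value(status.url)'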

GCP GPU Machine Types

Machine Type         GPU       VRAM    vCPU   RAM
n1-standard-4 + T4   1x T4     16 GB   4      15 GB
n1-standard-8 + T4   1x T4     16 GB   8      30 GB
a2-highgpu-1g        1x A100   40 GB   12     85 GB

Host-Based Firewall

If running Docling on a standalone server (not cloud-managed), configure the host firewall to allow port 8001 from your web server:

# AlmaLinux/RHEL (firewalld)
sudo firewall-cmd --permanent --add-port=8001/tcp
sudo firewall-cmd --reload

# Ubuntu (ufw)
sudo ufw allow from YOUR_WEB_SERVER_IP to any port 8001

# CSF (ConfigServer Firewall)
# Option 1: Allow specific IP
sudo csf -a YOUR_WEB_SERVER_IP

# Option 2: Open port globally (less secure)
# Edit /etc/csf/csf.conf and add 8001 to TCP_IN
sudo csf -r

Security Best Practice

Only allow port 8001 from your web server's IP address, not from all sources. The Docling service should not be exposed to the public internet.


Configuration

Environment Variables

Create /opt/docling-service/.env:

# Service binding
DOCLING_SERVICE_HOST=0.0.0.0
DOCLING_SERVICE_PORT=8000  # container port; the compose file publishes it on host port 8001

# Security (optional)
DOCLING_API_KEY=your-secret-key

# Device mode: auto, cuda, cpu, mps
DOCLING_DEVICE_MODE=auto

# GPU selection (for multi-GPU systems)
DOCLING_CUDA_DEVICE_ID=0

# Processing limits
DOCLING_MAX_FILE_SIZE_MB=100
DOCLING_MAX_BATCH_SIZE=50
DOCLING_MAX_CONCURRENT_JOBS=4
DOCLING_JOB_TIMEOUT_SECONDS=3600

# Feature toggles
DOCLING_ENABLE_OCR=true
DOCLING_ENABLE_TABLE_EXTRACTION=true
DOCLING_ENABLE_LAYOUT_ANALYSIS=true
DOCLING_ENABLE_MATH_DETECTION=true

# OCR language
DOCLING_OCR_LANGUAGE=en

# Output format
DOCLING_OUTPUT_FORMAT=markdown

# Temporary storage
DOCLING_TEMP_DIR=/tmp/docling
DOCLING_RESULTS_TTL_SECONDS=3600

# Logging
DOCLING_LOG_LEVEL=INFO

Performance Tuning

CPU Mode

DOCLING_DEVICE_MODE=cpu
DOCLING_MAX_CONCURRENT_JOBS=2
DOCLING_MAX_BATCH_SIZE=20

GPU Mode

DOCLING_DEVICE_MODE=cuda
DOCLING_CUDA_DEVICE_ID=0
DOCLING_MAX_CONCURRENT_JOBS=4
DOCLING_MAX_BATCH_SIZE=50

Proxy Configuration

When importing media from YouTube and other platforms, you may encounter rate limiting or geo-restrictions. The Docling service includes a proxy rotation system for yt-dlp that helps bypass these issues.

Why Use Proxies?

  • Rate limiting - YouTube may block repeated requests from the same IP
  • Geo-restrictions - Access content available only in certain regions
  • Reliability - Automatic failover if one proxy becomes unavailable

Setting Up Proxies

  1. Copy the example proxy file in the config directory:
cd /path/to/docling-service
cp config/proxies.example.txt config/proxies.txt

  2. Edit the proxy file and add your proxies (one per line):
nano config/proxies.txt

Supported formats:

# HTTP proxies
http://proxy1.example.com:8080
http://user:password@proxy2.example.com:8080

# HTTPS proxies
https://secure-proxy.example.com:443

# SOCKS5 proxies (better anonymity)
socks5://proxy3.example.com:1080
socks5://user:password@socks-proxy.example.com:1080

  3. Configure the environment to point to the proxy file:
# Path to proxy list file
DOCLING_PROXY_FILE_PATH=/path/to/docling-service/config/proxies.txt

# How often to rotate to next proxy (seconds)
DOCLING_PROXY_ROTATION_INTERVAL_SECONDS=60

  4. Restart the service:
docker compose restart

Proxy Rotation Behavior

The proxy manager automatically rotates through your proxy list:

Setting                                   Description
DOCLING_PROXY_FILE_PATH                   Path to file containing proxy list
DOCLING_PROXY_ROTATION_INTERVAL_SECONDS   Seconds between automatic rotation (default: 60)

How it works:

  • Proxies are loaded from config/proxies.txt on startup
  • yt-dlp uses the current proxy for all media downloads
  • Automatically rotates to the next proxy at the configured interval
  • Skips invalid proxy entries and logs warnings
  • Reloads the proxy file automatically if it changes (no restart needed)
  • Credentials are masked in log output for security

Testing Proxies

Verify your proxies work before adding them:

# Test HTTP proxy
curl -x http://proxy.example.com:8080 https://api.ipify.org

# Test SOCKS5 proxy
curl -x socks5://proxy.example.com:1080 https://api.ipify.org

# Test authenticated proxy
curl -x http://user:pass@proxy.example.com:8080 https://api.ipify.org
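
To vet an entire proxy file at once, a small shell loop works. This is a sketch: it skips blank lines and comments in config/proxies.txt and prints the exit IP observed through each proxy.

# Print the exit IP seen through each proxy in config/proxies.txt
while IFS= read -r proxy; do
  case "$proxy" in ''|'#'*) continue ;; esac
  printf '%s -> ' "$proxy"
  curl -sS --max-time 15 -x "$proxy" https://api.ipify.org || printf 'FAILED'
  printf '\n'
done < config/proxies.txt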

Proxy Providers

For production use, consider residential proxy services that provide rotating IPs. Datacenter proxies are more likely to be blocked by media platforms.


Connecting to Main Application

Update your main application's .env:

# If Docling is on the same server
DOCLING_SERVICE_URL=http://localhost:8001

# If Docling is on a different server
DOCLING_SERVICE_URL=http://10.0.1.50:8001

# If using API key authentication
DOCLING_API_KEY=your-secret-key

# Increase timeout for large files
DOCLING_TIMEOUT=300
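
From the web server, verify connectivity before restarting the application (adjust the host to match your setup):

# Run on the web server; expects the health JSON shown in the next section
curl http://10.0.1.50:8001/health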

Health Monitoring

Health Check Endpoint

curl http://localhost:8001/health

Response:

{
  "status": "healthy",
  "version": "1.0.0",
  "device": "cuda",
  "models_loaded": true
}

Monitoring with CloudWatch (AWS)

Create a CloudWatch alarm for the health endpoint:

aws cloudwatch put-metric-alarm \
  --alarm-name docling-health \
  --metric-name HealthCheckStatus \
  --namespace Custom/Docling \
  --statistic Average \
  --period 60 \
  --threshold 1 \
  --comparison-operator LessThanThreshold \
  --evaluation-periods 3
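
The alarm above watches a custom metric, so something has to publish it; the service does not do this on its own. One minimal approach (an assumption, not part of the service) is a cron job on the instance that probes /health and pushes 1 or 0 to CloudWatch:

#!/bin/bash
# Hypothetical helper, e.g. /usr/local/bin/docling-health-metric.sh, run from cron every minute
if curl -fsS --max-time 10 http://localhost:8001/health > /dev/null; then
  value=1
else
  value=0
fi
aws cloudwatch put-metric-data \
  --namespace Custom/Docling \
  --metric-name HealthCheckStatus \
  --value "$value"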

Monitoring with Cloud Monitoring (GCP)

Create an uptime check in Cloud Console:

  1. Go to Monitoring → Uptime Checks
  2. Create check for http://YOUR_IP:8001/health
  3. Set alerting threshold

Troubleshooting

Service Won't Start

# Check container logs
docker compose logs docling

# Check if port is in use
ss -tlnp | grep 8001

# Verify GPU access
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Out of Memory (OOM)

Reduce concurrent jobs:

DOCLING_MAX_CONCURRENT_JOBS=1
DOCLING_MAX_BATCH_SIZE=10
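
To confirm the container is actually being OOM-killed rather than failing for another reason, check the kernel log and current memory usage:

# Look for OOM-killer activity and check live memory usage
sudo dmesg -T | grep -iE 'killed process|out of memory'
docker stats --no-stream
free -h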

CUDA Errors

# Verify NVIDIA driver
nvidia-smi

# Check CUDA version
nvcc --version

# Reinstall container toolkit
sudo apt install --reinstall nvidia-container-toolkit
sudo systemctl restart docker

Slow Processing

  1. Enable GPU mode if available
  2. Reduce concurrent jobs to prevent resource contention
  3. Use SSD storage for temp directory
  4. Increase instance size

Scaling

Horizontal Scaling

For high-traffic deployments, run multiple Docling instances:

  1. Deploy multiple instances behind a load balancer
  2. Use shared storage for results (S3, GCS)
  3. Configure sticky sessions if needed

Auto-Scaling (AWS)

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name docling-asg \
  --launch-template LaunchTemplateName=docling-template \
  --min-size 1 \
  --max-size 5 \
  --target-group-arns arn:aws:elasticloadbalancing:...

Auto-Scaling (GCP)

gcloud compute instance-groups managed create docling-mig \
  --base-instance-name docling \
  --template docling-template \
  --size 1 \
  --zone us-central1-a

gcloud compute instance-groups managed set-autoscaling docling-mig \
  --max-num-replicas 5 \
  --target-cpu-utilization 0.7 \
  --zone us-central1-a