Docling Service

The Docling service is a Python-based microservice that handles document parsing, OCR, and media transcription. This guide covers deployment on AWS and Google Cloud.

Overview

Docling provides:

  • Document parsing - PDF, DOCX, PPTX, and more
  • OCR - Text extraction from scanned documents and images
  • Table extraction - Structured data from tables
  • Layout analysis - Document structure preservation
  • Media processing - Video/audio download and transcription

System Requirements

Minimum (CPU Mode)

Resource   Requirement
CPU        4 cores
RAM        8 GB
Storage    50 GB SSD
OS         Linux (Ubuntu 22.04, AlmaLinux 9)

Recommended (GPU Mode)

Resource   Requirement
GPU        NVIDIA with 8+ GB VRAM
CPU        8 cores
RAM        16 GB
Storage    100 GB SSD
Driver     CUDA 11.8+

AWS Deployment

Option 1: EC2 with Docker

Launch Instance

  1. Go to EC2 Console → Launch Instance
  2. Choose Amazon Linux 2023 or Ubuntu 22.04
  3. Select instance type:
     • CPU: c6i.xlarge (4 vCPU, 8 GB RAM)
     • GPU: g4dn.xlarge (1x T4 GPU, 4 vCPU, 16 GB RAM)
  4. Configure storage: 100 GB gp3
  5. Security group: Allow inbound port 8001 from your web server
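
If you prefer the CLI, the console steps above map onto a single run-instances call. The AMI, key pair, subnet, and security group IDs below are placeholders; substitute your own.

# GPU example; use --instance-type c6i.xlarge for CPU mode
# DeviceName is /dev/sda1 on Ubuntu AMIs (/dev/xvda on Amazon Linux)
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type g4dn.xlarge \
  --key-name your-key \
  --subnet-id subnet-xxxxxxxx \
  --security-group-ids sg-xxxxxxxx \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=docling-service}]'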

Install Docker

# Amazon Linux 2023
sudo dnf install -y docker
sudo systemctl enable docker
sudo systemctl start docker
sudo usermod -aG docker $USER  # log out and back in so the group change takes effect

# For GPU support: add the NVIDIA Container Toolkit repo, then install and enable it
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Deploy Docling

# Clone or copy the docling-service directory
cd /opt
git clone https://github.com/your-org/rag-chatbot.git
cd rag-chatbot/docling-service

# Configure environment
cp .env.example .env
nano .env  # Adjust settings

# Start service (GPU)
docker compose --profile gpu up -d

# Or start service (CPU)
docker compose --profile cpu up -d

Configure Security Group

Ensure your EC2 security group allows:

Type         Port   Source
Custom TCP   8001   Web server IP/Security Group
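
The equivalent rule can be created from the CLI; the group IDs below are placeholders.

# Allow port 8001 only from the web server's security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxx \
  --protocol tcp \
  --port 8001 \
  --source-group sg-yyyyyyyy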

Option 2: ECS with Fargate

For managed container deployment:

Create Task Definition

{
  "family": "docling-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "4096",
  "memory": "8192",
  "containerDefinitions": [
    {
      "name": "docling",
      "image": "your-ecr-repo/docling-service:latest",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "DOCLING_DEVICE_MODE", "value": "cpu"},
        {"name": "DOCLING_MAX_CONCURRENT_JOBS", "value": "2"}
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/docling-service",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
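
To register this task definition, save the JSON to a file and pass it to the CLI. Create the log group first, since the awslogs driver expects it to exist; a Fargate task will also need an execution role with CloudWatch Logs permissions, which is omitted above. The file name below is just an example.

# Create the log group referenced above, then register the task definition
aws logs create-log-group --log-group-name /ecs/docling-service --region us-east-1
aws ecs register-task-definition --cli-input-json file://docling-task.json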

Create Service

aws ecs create-service \
  --cluster your-cluster \
  --service-name docling-service \
  --task-definition docling-service:1 \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-xxx],securityGroups=[sg-xxx]}"

AWS GPU Instance Types

Instance       GPU       VRAM    vCPU   RAM     Use Case
g4dn.xlarge    1x T4     16 GB   4      16 GB   Development, low traffic
g4dn.2xlarge   1x T4     16 GB   8      32 GB   Production, moderate traffic
g5.xlarge      1x A10G   24 GB   4      16 GB   Production, high performance
p3.2xlarge     1x V100   16 GB   8      61 GB   Heavy workloads

Google Cloud Deployment

Option 1: Compute Engine with Docker

Create VM Instance

Using gcloud CLI:

# CPU instance
gcloud compute instances create docling-service \
  --zone=us-central1-a \
  --machine-type=e2-standard-4 \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=100GB \
  --boot-disk-type=pd-ssd

# GPU instance
gcloud compute instances create docling-service-gpu \
  --zone=us-central1-a \
  --machine-type=n1-standard-4 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=100GB \
  --boot-disk-type=pd-ssd \
  --maintenance-policy=TERMINATE
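
Once the instance is running, SSH in with gcloud before continuing with the steps below:

# Use docling-service-gpu for the GPU instance
gcloud compute ssh docling-service --zone=us-central1-a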

Install Docker and NVIDIA Drivers

# Install Docker
sudo apt update
sudo apt install -y docker.io
sudo systemctl enable docker
sudo usermod -aG docker $USER

# For GPU: Install NVIDIA drivers
sudo apt install -y nvidia-driver-535
sudo reboot

# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo systemctl restart docker

Deploy Docling

cd /opt
git clone https://github.com/your-org/rag-chatbot.git
cd rag-chatbot/docling-service

cp .env.example .env
nano .env

# GPU mode
docker compose --profile gpu up -d

# CPU mode
docker compose --profile cpu up -d

Configure Firewall

gcloud compute firewall-rules create allow-docling \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:8001 \
  --source-ranges=YOUR_WEB_SERVER_IP/32
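
As written, this rule applies to every instance in the VPC network. To scope it to the Docling VM only, tag the instance and restrict the rule to that tag (the tag name below is arbitrary):

# Tag the instance, then limit the firewall rule to that tag
gcloud compute instances add-tags docling-service --tags=docling --zone=us-central1-a
gcloud compute firewall-rules update allow-docling --target-tags=docling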

Option 2: Cloud Run

For serverless deployment (CPU only):

# Build and push image
cd docling-service
gcloud builds submit --tag gcr.io/your-project/docling-service

# Deploy to Cloud Run
gcloud run deploy docling-service \
  --image gcr.io/your-project/docling-service \
  --platform managed \
  --region us-central1 \
  --memory 8Gi \
  --cpu 4 \
  --timeout 300 \
  --concurrency 4 \
  --min-instances 1 \
  --max-instances 5
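
Cloud Run assigns the service an HTTPS URL rather than exposing port 8001, so use that URL as DOCLING_SERVICE_URL in the main application (ideally together with DOCLING_API_KEY, since the endpoint is reachable from the internet):

# Retrieve the generated service URL for use as DOCLING_SERVICE_URL
gcloud run services describe docling-service \
  --region us-central1 \
  --format 'value(status.url)'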

GCP GPU Machine Types

Machine Type         GPU       VRAM    vCPU   RAM
n1-standard-4 + T4   1x T4     16 GB   4      15 GB
n1-standard-8 + T4   1x T4     16 GB   8      30 GB
a2-highgpu-1g        1x A100   40 GB   12     85 GB

Host-Based Firewall

If running Docling on a standalone server (not cloud-managed), configure the host firewall to allow port 8001 from your web server:

# AlmaLinux/RHEL (firewalld)
sudo firewall-cmd --permanent --add-port=8001/tcp
sudo firewall-cmd --reload

# Ubuntu (ufw)
sudo ufw allow from YOUR_WEB_SERVER_IP to any port 8001

# CSF (ConfigServer Firewall)
# Option 1: Allow specific IP
sudo csf -a YOUR_WEB_SERVER_IP

# Option 2: Open port globally (less secure)
# Edit /etc/csf/csf.conf and add 8001 to TCP_IN
sudo csf -r

Security Best Practice

Only allow port 8001 from your web server's IP address, not from all sources. The Docling service should not be exposed to the public internet.


Configuration

Environment Variables

Create /opt/docling-service/.env:

# Service binding
DOCLING_SERVICE_HOST=0.0.0.0
DOCLING_SERVICE_PORT=8000  # container port; the compose file publishes it on host port 8001

# Security (optional)
DOCLING_API_KEY=your-secret-key

# Device mode: auto, cuda, cpu, mps
DOCLING_DEVICE_MODE=auto

# GPU selection (for multi-GPU systems)
DOCLING_CUDA_DEVICE_ID=0

# Processing limits
DOCLING_MAX_FILE_SIZE_MB=100
DOCLING_MAX_BATCH_SIZE=50
DOCLING_MAX_CONCURRENT_JOBS=4
DOCLING_JOB_TIMEOUT_SECONDS=3600

# Feature toggles
DOCLING_ENABLE_OCR=true
DOCLING_ENABLE_TABLE_EXTRACTION=true
DOCLING_ENABLE_LAYOUT_ANALYSIS=true
DOCLING_ENABLE_MATH_DETECTION=true

# OCR language
DOCLING_OCR_LANGUAGE=en

# Output format
DOCLING_OUTPUT_FORMAT=markdown

# Temporary storage
DOCLING_TEMP_DIR=/tmp/docling
DOCLING_RESULTS_TTL_SECONDS=3600

# Logging
DOCLING_LOG_LEVEL=INFO

Performance Tuning

CPU Mode

DOCLING_DEVICE_MODE=cpu
DOCLING_MAX_CONCURRENT_JOBS=2
DOCLING_MAX_BATCH_SIZE=20

GPU Mode

DOCLING_DEVICE_MODE=cuda
DOCLING_CUDA_DEVICE_ID=0
DOCLING_MAX_CONCURRENT_JOBS=4
DOCLING_MAX_BATCH_SIZE=50

Proxy Configuration

When importing media from YouTube and other platforms, you may encounter rate limiting or geo-restrictions. The Docling service includes a proxy rotation system for yt-dlp that helps bypass these issues.

Why Use Proxies?

  • Rate limiting - YouTube may block repeated requests from the same IP
  • Geo-restrictions - Access content available only in certain regions
  • Reliability - Automatic failover if one proxy becomes unavailable

Setting Up Proxies

  1. Copy the example proxy file in the config directory:
cd /path/to/docling-service
cp config/proxies.example.txt config/proxies.txt

  2. Edit the proxy file and add your proxies (one per line):
nano config/proxies.txt

Supported formats:

# HTTP proxies
http://proxy1.example.com:8080
http://user:password@proxy2.example.com:8080

# HTTPS proxies
https://secure-proxy.example.com:443

# SOCKS5 proxies (better anonymity)
socks5://proxy3.example.com:1080
socks5://user:password@socks-proxy.example.com:1080

  3. Configure the environment to point to the proxy file:
# Path to proxy list file
DOCLING_PROXY_FILE_PATH=/path/to/docling-service/config/proxies.txt

# How often to rotate to next proxy (seconds)
DOCLING_PROXY_ROTATION_INTERVAL_SECONDS=60

  4. Restart the service:
docker compose restart

Proxy Rotation Behavior

The proxy manager automatically rotates through your proxy list:

Setting                                   Description
DOCLING_PROXY_FILE_PATH                   Path to file containing proxy list
DOCLING_PROXY_ROTATION_INTERVAL_SECONDS   Seconds between automatic rotation (default: 60)

How it works:

  • Proxies are loaded from config/proxies.txt on startup
  • yt-dlp uses the current proxy for all media downloads
  • Automatically rotates to the next proxy at the configured interval
  • Skips invalid proxy entries and logs warnings
  • Reloads the proxy file automatically if it changes (no restart needed)
  • Credentials are masked in log output for security

Testing Proxies

Verify your proxies work before adding them:

# Test HTTP proxy
curl -x http://proxy.example.com:8080 https://api.ipify.org

# Test SOCKS5 proxy
curl -x socks5://proxy.example.com:1080 https://api.ipify.org

# Test authenticated proxy
curl -x http://user:pass@proxy.example.com:8080 https://api.ipify.org
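
To vet an entire proxy file at once, a small shell loop works. This is a sketch: it skips blank lines and comments in config/proxies.txt and prints the exit IP observed through each proxy.

# Print the exit IP seen through each proxy in config/proxies.txt
while IFS= read -r proxy; do
  case "$proxy" in ''|'#'*) continue ;; esac
  printf '%s -> ' "$proxy"
  curl -sS --max-time 15 -x "$proxy" https://api.ipify.org || printf 'FAILED'
  printf '\n'
done < config/proxies.txt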

Proxy Providers

For production use, consider residential proxy services that provide rotating IPs. Datacenter proxies are more likely to be blocked by media platforms.


Connecting to Main Application

Update your main application's .env:

# If Docling is on the same server
DOCLING_SERVICE_URL=http://localhost:8001

# If Docling is on a different server
DOCLING_SERVICE_URL=http://10.0.1.50:8001

# If using API key authentication
DOCLING_API_KEY=your-secret-key

# Increase timeout for large files
DOCLING_TIMEOUT=300
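
From the web server, verify connectivity before restarting the application (adjust the host to match your setup):

# Run on the web server; expects the health JSON shown in the next section
curl http://10.0.1.50:8001/health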

Health Monitoring

Health Check Endpoint

curl http://localhost:8001/health

Response:

{
  "status": "healthy",
  "version": "1.0.0",
  "device": "cuda",
  "models_loaded": true
}

Monitoring with CloudWatch (AWS)

Create a CloudWatch alarm for the health endpoint:

aws cloudwatch put-metric-alarm \
  --alarm-name docling-health \
  --metric-name HealthCheckStatus \
  --namespace Custom/Docling \
  --statistic Average \
  --period 60 \
  --threshold 1 \
  --comparison-operator LessThanThreshold \
  --evaluation-periods 3
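
The alarm above watches a custom metric, so something has to publish it; the service does not do this on its own. One minimal approach (an assumption, not part of the service) is a cron job on the instance that probes /health and pushes 1 or 0 to CloudWatch:

#!/bin/bash
# Hypothetical helper, e.g. /usr/local/bin/docling-health-metric.sh, run from cron every minute
if curl -fsS --max-time 10 http://localhost:8001/health > /dev/null; then
  value=1
else
  value=0
fi
aws cloudwatch put-metric-data \
  --namespace Custom/Docling \
  --metric-name HealthCheckStatus \
  --value "$value"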

Monitoring with Cloud Monitoring (GCP)

Create an uptime check in Cloud Console:

  1. Go to Monitoring → Uptime Checks
  2. Create check for http://YOUR_IP:8001/health
  3. Set alerting threshold

Troubleshooting

Service Won't Start

# Check container logs
docker compose logs docling

# Check if port is in use
ss -tlnp | grep 8001

# Verify GPU access
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Out of Memory (OOM)

Reduce concurrent jobs:

DOCLING_MAX_CONCURRENT_JOBS=1
DOCLING_MAX_BATCH_SIZE=10
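
To confirm the container is actually being OOM-killed rather than failing for another reason, check the kernel log and current memory usage:

# Look for OOM-killer activity and check live memory usage
sudo dmesg -T | grep -iE 'killed process|out of memory'
docker stats --no-stream
free -h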

CUDA Errors

# Verify NVIDIA driver
nvidia-smi

# Check CUDA version
nvcc --version

# Reinstall container toolkit
sudo apt install --reinstall nvidia-container-toolkit
sudo systemctl restart docker

Slow Processing

  1. Enable GPU mode if available
  2. Reduce concurrent jobs to prevent resource contention
  3. Use SSD storage for temp directory
  4. Increase instance size

Scaling

Horizontal Scaling

For high-traffic deployments, run multiple Docling instances:

  1. Deploy multiple instances behind a load balancer
  2. Use shared storage for results (S3, GCS)
  3. Configure sticky sessions if needed

Auto-Scaling (AWS)

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name docling-asg \
  --launch-template LaunchTemplateName=docling-template \
  --min-size 1 \
  --max-size 5 \
  --target-group-arns arn:aws:elasticloadbalancing:...

Auto-Scaling (GCP)

gcloud compute instance-groups managed create docling-mig \
  --base-instance-name docling \
  --template docling-template \
  --size 1 \
  --zone us-central1-a

gcloud compute instance-groups managed set-autoscaling docling-mig \
  --max-num-replicas 5 \
  --target-cpu-utilization 0.7 \
  --zone us-central1-a