Self-Host Paperless-ngx: Document Management 2026
TL;DR
Paperless-ngx (GPL 3.0, ~20K GitHub stars, Python/TypeScript) eliminates physical filing cabinets. Scan documents, drop PDFs into a watched folder, and Paperless automatically OCRs them, suggests tags and correspondents, and makes them full-text searchable. Adobe Acrobat charges $12.99/month for OCR and PDF management. Paperless-ngx is free and stores everything locally. After setup: every receipt, tax document, medical form, and letter is searchable in under 2 seconds.
Key Takeaways
- Paperless-ngx: GPL 3.0, ~20K stars — OCR + full-text search + tag-based document organization
- Auto-tagging: ML-based classifier learns from your manual tags and auto-suggests on new documents
- Consume folder: Drop files in a folder → Paperless automatically imports and OCRs them
- Full-text search: OCR makes every word in every PDF searchable
- Correspondents: Track who documents are from (IRS, Bank of America, doctor's office, etc.)
- Document types: Categorize by type (Invoice, Receipt, Medical, Tax, Contract, etc.)
Part 1: Docker Setup
# docker-compose.yml
services:
  broker:
    image: redis:7-alpine
    restart: unless-stopped
  db:
    image: postgres:15-alpine
    restart: unless-stopped
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: "${POSTGRES_PASSWORD}"
    volumes:
      - db_data:/var/lib/postgresql/data
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: paperless
    restart: unless-stopped
    ports:
      - "8000:8000"
    volumes:
      - paperless_data:/usr/src/paperless/data
      - paperless_media:/usr/src/paperless/media
      - /path/to/consume:/usr/src/paperless/consume # Watch this folder
      - /path/to/export:/usr/src/paperless/export # Export goes here
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_DBHOST: db
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: "${POSTGRES_PASSWORD}"
      PAPERLESS_DBNAME: paperless
      PAPERLESS_SECRET_KEY: "${SECRET_KEY}"
      PAPERLESS_URL: "https://docs.yourdomain.com"
      PAPERLESS_ADMIN_USER: admin
      PAPERLESS_ADMIN_PASSWORD: "${ADMIN_PASSWORD}"
      PAPERLESS_ADMIN_MAIL: admin@yourdomain.com
      PAPERLESS_TIME_ZONE: America/Los_Angeles
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_TIKA_ENABLED: 1 # Enable for Office docs (DOCX, XLSX, etc.)
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
    depends_on:
      - broker
      - db
  # Required for Office document conversion:
  gotenberg:
    image: docker.io/gotenberg/gotenberg:7.10
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
  tika:
    image: ghcr.io/paperless-ngx/tika:latest
    restart: unless-stopped
volumes:
  db_data:
  paperless_data:
  paperless_media:
# .env
POSTGRES_PASSWORD=your-db-password
SECRET_KEY=your-50-char-secret-key
ADMIN_PASSWORD=your-admin-password
# Create the consume and export folders (mount these in place of /path/to/consume and /path/to/export above):
mkdir -p ~/paperless/consume ~/paperless/export
docker compose up -d
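Before the first `docker compose up -d`, the placeholders in .env need real values. A minimal sketch for generating them, assuming a Unix-like system with `/dev/urandom`, `od`, and `cut`; the helper names `gen_secret` and `write_env` are made up for this example and are not part of Paperless-ngx:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Paperless-ngx) for filling in .env.

# Print LENGTH random hex characters (default 50).
gen_secret() {
  local length="${1:-50}"
  # Each random byte becomes two hex characters, so read half as many bytes.
  head -c "$(( (length + 1) / 2 ))" /dev/urandom \
    | od -An -tx1 | tr -d ' \n' | cut -c1-"$length"
}

# Write a .env file with freshly generated values; refuses to overwrite.
write_env() {
  local target="${1:-.env}"
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target" >&2
    return 1
  fi
  (
    umask 077  # keep the new file readable by its owner only
    {
      printf 'POSTGRES_PASSWORD=%s\n' "$(gen_secret 32)"
      printf 'SECRET_KEY=%s\n' "$(gen_secret 50)"
      printf 'ADMIN_PASSWORD=%s\n' "$(gen_secret 24)"
    } >"$target"
  )
}
```

Running `write_env .env` next to docker-compose.yml produces the three variables the compose file expects. The lengths are arbitrary choices for this sketch.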
Part 2: HTTPS with Caddy
docs.yourdomain.com {
reverse_proxy localhost:8000
}
Part 3: Import Documents
Method 1: Consume folder (automated)
Drop any file into the consume folder:
# Any of these formats work:
cp ~/Downloads/tax-return-2025.pdf ~/paperless/consume/
cp ~/Downloads/bank-statement.pdf ~/paperless/consume/
cp ~/Desktop/receipt.jpg ~/paperless/consume/
# Paperless watches the folder and automatically:
# 1. Moves file to media storage
# 2. Runs OCR (Tesseract)
# 3. Extracts text
# 4. Suggests tags/correspondent/document type via ML classifier
# 5. Makes it searchable
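When moving many files at once, it can help to filter out file types Paperless will not accept. The sketch below is one way to do that; the extension list is an assumption based on the commonly documented formats (Office formats only work with the Tika and Gotenberg services enabled), and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy only files Paperless is likely to accept.

# Succeeds if the file name has an extension from the assumed supported list.
is_supported() {
  local name ext
  name="${1##*/}"
  case "$name" in
    *.*) ext="$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')" ;;
    *) return 1 ;;  # no extension at all
  esac
  case "$ext" in
    pdf|png|jpg|jpeg|tif|tiff|gif|webp|txt) return 0 ;;
    docx|xlsx|pptx|odt|ods|odp) return 0 ;;  # needs Tika + Gotenberg
    *) return 1 ;;
  esac
}

# Copy supported files from SRC_DIR into CONSUME_DIR and report the rest.
queue_documents() {
  local src_dir="$1" consume_dir="$2" file
  for file in "$src_dir"/*; do
    [ -f "$file" ] || continue
    if is_supported "$file"; then
      cp -- "$file" "$consume_dir/"
    else
      echo "skipped (unsupported type): $file" >&2
    fi
  done
}
```

For example, `queue_documents ~/Downloads ~/paperless/consume` copies the PDFs and images and prints a note for everything else.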
Method 2: Upload via web UI
- Documents → Upload → drag and drop files
- Multiple files at once
Method 3: Email ingestion
# Optional: add to the docker-compose.yml environment to control how often mail is checked:
PAPERLESS_EMAIL_TASK_CRON: "*/10 * * * *"
# Mail accounts are configured in the Paperless web UI, not through environment variables:
# Settings → Mail → Add mail account:
IMAP server: mail.yourdomain.com
Username: paperless@yourdomain.com
Password: your-email-password
Emails that match your mail rules are imported as documents automatically.
Part 4: Organization System
Correspondents
Track who documents are from:
- Settings → Correspondents → Add:
IRS, Bank of America, Blue Cross, Employer, Landlord
Paperless auto-assigns based on patterns you define, or learns from your corrections.
Document types
Categorize by type:
Invoice, Receipt, Tax Return, Medical Record, Insurance, Contract, Letter
Tags
Tag freely:
2025-taxes, medical-2025, car, home, reimbursable
Tags are the primary organization tool — a document can have multiple tags.
Date extraction
Paperless extracts dates from document content automatically. For receipts or letters, it finds the date in the text.
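To make the idea concrete, here is a rough sketch of pulling the first date out of OCR text with a regular expression. It only illustrates the concept: Paperless's own date parsing handles many more formats, and the function name `first_date` is invented for this example.

```shell
#!/usr/bin/env bash
# Illustration only; this is not how Paperless-ngx parses dates internally.

# Print the first ISO (YYYY-MM-DD) or US-style (MM/DD/YYYY) date found on stdin.
first_date() {
  grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}|[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}' | head -n 1
}
```

Feeding it the text of a receipt, for instance `printf 'Invoice\nDate: 03/14/2025\n' | first_date`, prints `03/14/2025`.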
Part 5: Full-Text Search
# Search examples in the web UI:
"electric bill" → finds all utility bills
correspondent:IRS → all IRS documents
tag:2025-taxes → all 2025 tax documents
type:Invoice → all invoices
created:[2025-01-01 TO 2025-12-31] → documents from 2025
content:"account number" → documents containing that phrase
Combine filters:
correspondent:IRS tag:2025-taxes type:"Tax Return"
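If you build the same kinds of searches repeatedly, a small helper can assemble the query string for you. This is a sketch with a hypothetical function name, `build_query`; it only joins `field:value` pairs and quotes values containing spaces, following the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper that assembles a search string from field=value pairs.

# Usage: build_query correspondent=IRS tag=2025-taxes "type=Tax Return"
build_query() {
  local pair field value query=""
  for pair in "$@"; do
    field="${pair%%=*}"
    value="${pair#*=}"
    # Quote values that contain spaces so they stay a single phrase.
    case "$value" in
      *" "*) value="\"$value\"" ;;
    esac
    query="${query:+$query }$field:$value"
  done
  printf '%s\n' "$query"
}
```

Calling `build_query correspondent=IRS tag=2025-taxes "type=Tax Return"` prints the combined filter shown above, ready to paste into the search box.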
Part 6: Scanner Integration
Network scanners (SANE)
# Scan directly to consume folder via command line:
scanimage --device="brother5:net1;dev0" \
--format=pdf \
--resolution=300 \
--mode=Color \
> ~/paperless/consume/scan-$(date +%Y%m%d-%H%M%S).pdf
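Redirecting straight into the consume folder creates the file before the scan has finished, and a failed scan can leave an empty file behind. One possible refinement, sketched below, scans into a staging directory first and moves the result only when it is non-empty. The function name `scan_to_consume` and the choice of the consume folder's parent as the default staging directory are assumptions of this example; the device string is copied from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the scanimage call shown above.

# Scan into a staging directory, then move the result into the consume folder.
# Usage: scan_to_consume CONSUME_DIR [STAGING_DIR]
scan_to_consume() {
  local consume_dir="$1"
  local staging_dir="${2:-$(dirname "$consume_dir")}"
  local name tmp
  name="scan-$(date +%Y%m%d-%H%M%S).pdf"
  tmp="$(mktemp "$staging_dir/scan.XXXXXX")" || return 1

  # List available devices with `scanimage -L` and adjust --device to match.
  if scanimage --device="brother5:net1;dev0" --format=pdf \
       --resolution=300 --mode=Color >"$tmp" && [ -s "$tmp" ]; then
    mv -- "$tmp" "$consume_dir/$name"
    echo "$consume_dir/$name"
  else
    rm -f -- "$tmp"
    echo "scan failed; nothing was queued" >&2
    return 1
  fi
}
```

`scan_to_consume ~/paperless/consume` prints the path of the queued file on success. Keeping the staging directory on the same filesystem as the consume folder makes the final move a rename rather than a copy.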
iOS/Android scanning
Use a scanning app that saves to a folder synced with your consume folder:
- iOS: Scanner Pro, Microsoft Lens → save to Nextcloud → watched by Paperless
- Android: Adobe Scan, Microsoft Lens → save to synced folder
Automatic scan workflow
Scanner app (iOS/Android)
→ Saves to Nextcloud folder (auto-sync)
→ Nextcloud folder is also your Paperless consume path
→ Paperless auto-imports and OCRs
→ Document searchable within 60 seconds
Part 7: ML Auto-Classifier
Paperless learns from your tagging behavior:
# Train the classifier manually:
docker exec paperless python manage.py document_create_classifier
# After training, Paperless suggests:
# - Correspondent (who it's from)
# - Document type
# - Tags
# - Storage path
# The more you correct suggestions, the better it gets.
Custom matching rules
# Settings → Tags → Edit tag → Add matching rule:
Tag: "medical"
Algorithm: "Any word"
Pattern: "physician diagnosis prescription copay deductible"
Case insensitive: Yes
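The rule above can be read as "assign the tag if any one of these words appears in the document". The sketch below imitates that behavior with `grep` so you can try a pattern against sample text before saving it. It is an approximation for illustration, not Paperless's own matching code, and `matches_any_word` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Illustration of "Any word" semantics; not Paperless-ngx's own implementation.

# Succeeds if TEXT contains at least one whole word from PATTERN (case-insensitive).
# Usage: matches_any_word TEXT PATTERN
matches_any_word() {
  local text="$1"
  # Subshell so that disabling globbing does not leak into the caller.
  (
    set -f
    for word in $2; do  # intentional splitting of PATTERN on whitespace
      if printf '%s\n' "$text" | grep -qiwF -e "$word"; then
        exit 0
      fi
    done
    exit 1
  )
}
```

With the pattern from the rule, `matches_any_word "Your COPAY is due" "physician diagnosis prescription copay deductible"` succeeds, while a utility bill that mentions none of those words does not match.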
Part 8: Export and Backup
# Export all documents (preserves metadata):
docker exec paperless document_exporter /usr/src/paperless/export
# This creates (file names are illustrative):
# export/
# ├── manifest.json ← metadata for all documents (tags, dates, correspondents)
# ├── document_001.pdf ← exported document files (actual names vary)
# ├── document_002.jpg
# └── ...
# The manifest lets you re-import into a fresh Paperless instance with document_importer.
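# Database backup (run from the directory that contains docker-compose.yml):
docker compose exec -T db pg_dump -U paperless paperless \
| gzip > paperless-db-$(date +%Y%m%d).sql.gz
# Media backup (the volume name is prefixed with your Compose project name;
# check it with `docker volume ls`; reading the mountpoint usually requires root):
tar -czf paperless-media-$(date +%Y%m%d).tar.gz \
"$(docker volume inspect paperless_paperless_media --format '{{.Mountpoint}}')"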
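Dated archives accumulate quickly if the backup runs on a schedule. A minimal retention sketch, assuming the archives keep the `paperless-*.gz` naming used above and live in a single directory; the function name `prune_backups` and the 30-day default are choices made for this example:

```shell
#!/usr/bin/env bash
# Hypothetical retention helper for the dated archives created above.

# Delete paperless-*.gz files in BACKUP_DIR last modified more than KEEP_DAYS days ago.
# Usage: prune_backups BACKUP_DIR [KEEP_DAYS]
prune_backups() {
  local backup_dir="$1" keep_days="${2:-30}"
  find "$backup_dir" -maxdepth 1 -type f -name 'paperless-*.gz' \
    -mtime +"$keep_days" -print -delete
}
```

`prune_backups ~/backups 30` prints each file it removes and leaves newer archives and unrelated files untouched.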
Maintenance
# Update:
docker compose pull
docker compose up -d
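# Check the document archive for inconsistencies:
docker exec paperless python manage.py document_sanity_checker
# Re-run OCR on a single document (e.g., if OCR failed), here document ID 42:
docker exec paperless python manage.py document_archiver --overwrite --document 42
# Retrain the classifier, then re-apply correspondent, tag, and document type matching to all documents:
docker exec paperless python manage.py document_create_classifier
docker exec paperless python manage.py document_retagger -c -T -t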
# Logs:
docker compose logs -f webserver