The Complete Self-Hosted AI Stack in 2026
The Case for a Self-Hosted AI Stack
The typical AI tooling bill for a 5-person development team in 2026:
- ChatGPT Plus (5 users): $1,200/year
- GitHub Copilot (5 users): $2,280/year
- Claude Pro (5 users): $1,500/year
- Midjourney (5 users): $600/year
- Total: ~$5,580/year
A self-hosted AI stack running on a $35/month dedicated server: $420/year — with no per-query limits, no data leaving your network, and the ability to run models fine-tuned on your specific domain.
This guide covers building a production-ready self-hosted AI infrastructure using the best open source tools in 2026.
Stack Overview
┌─────────────────────────────────────────────────┐
│ Reverse Proxy (Caddy) │
│ HTTPS + subdomain routing │
└────────┬───────────────────────────────────────┘
│
┌────┴────────────────────────────────────┐
│ │
┌───▼────┐ ┌──────────┐ ┌────────────┐ ┌──▼─────┐
│ Open │ │ Dify │ │ n8n │ │Continue│
│ WebUI │ │ │ │ │ │ .dev │
│(Chat) │ │(RAG/Apps)│ │(Automation)│ │ (Code) │
└───┬────┘ └────┬─────┘ └─────┬──────┘ └────────┘
│ │ │
└─────────────┼──────────────┘
│
┌─────────────▼──────────────┐
│ Ollama │
│ (Local LLM serving) │
│ Llama 3.1, Mistral, Qwen │
└────────────────────────────┘
│ │
┌────▼────┐ ┌──────▼──────┐
│ Weaviate│ │ PostgreSQL │
│(Vector) │ │+ pgvector │
└─────────┘ └─────────────┘
Component Roles
| Component | Purpose | Alternative to |
|---|---|---|
| Ollama | Run local LLMs | OpenAI API |
| Open WebUI | ChatGPT-like interface | ChatGPT |
| Dify | AI app builder + RAG | Dify.ai Cloud, LangChain |
| n8n | Workflow automation | Zapier, Make |
| Continue.dev | AI code completion | GitHub Copilot |
| Weaviate | Vector database | Pinecone |
| PostgreSQL | Application database | Supabase |
| Caddy | HTTPS reverse proxy | Nginx + Certbot |
Hardware Requirements
Minimum Stack (Text-Only, CPU Inference)
| Component | Server | Monthly |
|---|---|---|
| Everything | Hetzner CPX31 (8GB, 4 cores) | $10 |
| Model quality | 7B parameters max | — |
| Inference speed | 5-10 tokens/sec | — |
CPU inference works. It's slow for 7B+ models but functional for non-latency-sensitive workloads.
Recommended Stack (GPU Inference)
| Component | Server | Monthly |
|---|---|---|
| AI + services | Hetzner GEX44 (RTX 4000, 20GB VRAM) | ~$90 |
| Or split: services | Hetzner CPX31 (8GB) | $10 |
| + AI inference | RunPod RTX 4090 (24GB) | ~$0.74/hr |
| Model quality | 70B parameters | — |
| Inference speed | 30-80 tokens/sec | — |
Most cost-efficient approach: a CPU server for Dify, n8n, Open WebUI, and Weaviate ($10/month), plus a GPU instance you start and stop on demand for heavy AI tasks.
Local Machine Option
For individual developers:
- MacBook M3 Pro (36GB): Runs quantized 70B models at 15-25 tokens/sec. No server cost.
- PC with RTX 4090 (24GB VRAM): Runs 30-70B models fast.
This guide focuses on server deployment for teams.
Step 1: Provision and Configure Server
# On your server (example: Hetzner CPX41, 16GB, 6 cores)
sudo apt update && sudo apt upgrade -y
# Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
# Install Caddy for HTTPS
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update && sudo apt install caddy
Point DNS records to your server IP before proceeding:
- ai.yourdomain.com → your server IP (Open WebUI)
- dify.yourdomain.com → your server IP (Dify)
- n8n.yourdomain.com → your server IP (n8n)
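Caddy can only issue certificates once each record resolves to the server. A small check script saves a failed ACME round-trip (a sketch: `api.ipify.org` is one public IP-echo service, an assumption here — you can also hardcode the server IP; the script is written to a file so you can run it after DNS propagates):

```shell
# Write a DNS sanity-check script; run it on the server once records propagate
cat > check-dns.sh <<'EOF'
#!/bin/bash
SERVER_IP=$(curl -s https://api.ipify.org)   # or hardcode your server IP
for host in ai.yourdomain.com dify.yourdomain.com n8n.yourdomain.com; do
  resolved=$(dig +short "$host" | tail -n1)
  if [ "$resolved" = "$SERVER_IP" ]; then
    echo "$host OK"
  else
    echo "$host -> ${resolved:-unresolved} (expected $SERVER_IP)"
  fi
done
EOF
chmod +x check-dns.sh
```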
Step 2: Deploy the Core Stack
Create a workspace directory:
mkdir -p /opt/ai-stack && cd /opt/ai-stack
Create docker-compose.yml:
services:
# ===========================
# Local LLM Engine
# ===========================
ollama:
image: ollama/ollama:latest
container_name: ollama
volumes:
- ollama_data:/root/.ollama
restart: unless-stopped
# Uncomment for NVIDIA GPU:
# runtime: nvidia
# environment:
# - NVIDIA_VISIBLE_DEVICES=all
# ===========================
# Chat Interface
# ===========================
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
ports:
- "127.0.0.1:3000:8080"
volumes:
- open-webui:/app/backend/data
environment:
OLLAMA_BASE_URL: http://ollama:11434
WEBUI_SECRET_KEY: ${OPENWEBUI_SECRET}
restart: unless-stopped
depends_on:
- ollama
# ===========================
# AI App Builder + RAG
# ===========================
dify-api:
image: langgenius/dify-api:latest
container_name: dify-api
ports:
- "127.0.0.1:5001:5001"
    environment:
      MODE: api
      SECRET_KEY: ${DIFY_SECRET}
      # Dify reads discrete DB_* / REDIS_* variables rather than a single URL
      DB_HOST: db
      DB_PORT: 5432
      DB_USERNAME: dify
      DB_PASSWORD: ${DB_PASSWORD}
      DB_DATABASE: dify
      REDIS_HOST: redis
      REDIS_PORT: 6379
      CELERY_BROKER_URL: redis://redis:6379/1
      STORAGE_TYPE: local
volumes:
- dify_storage:/app/api/storage
depends_on:
- db
- redis
restart: unless-stopped
dify-worker:
image: langgenius/dify-api:latest
container_name: dify-worker
    environment:
      MODE: worker
      SECRET_KEY: ${DIFY_SECRET}
      DB_HOST: db
      DB_PORT: 5432
      DB_USERNAME: dify
      DB_PASSWORD: ${DB_PASSWORD}
      DB_DATABASE: dify
      REDIS_HOST: redis
      REDIS_PORT: 6379
      CELERY_BROKER_URL: redis://redis:6379/1
      STORAGE_TYPE: local
volumes:
- dify_storage:/app/api/storage
depends_on:
- db
- redis
restart: unless-stopped
dify-web:
image: langgenius/dify-web:latest
container_name: dify-web
ports:
- "127.0.0.1:3001:3000"
    environment:
      # dify-web's documented endpoint variables
      CONSOLE_API_URL: https://dify.yourdomain.com
      APP_API_URL: https://dify.yourdomain.com
restart: unless-stopped
# ===========================
# Workflow Automation
# ===========================
n8n:
image: n8nio/n8n:latest
container_name: n8n
ports:
- "127.0.0.1:5678:5678"
environment:
DB_TYPE: postgresdb
DB_POSTGRESDB_HOST: db
DB_POSTGRESDB_DATABASE: n8n
DB_POSTGRESDB_USER: n8n
DB_POSTGRESDB_PASSWORD: ${DB_PASSWORD}
N8N_ENCRYPTION_KEY: ${N8N_SECRET}
N8N_HOST: n8n.yourdomain.com
WEBHOOK_URL: https://n8n.yourdomain.com/
volumes:
- n8n_data:/home/node/.n8n
depends_on:
- db
restart: unless-stopped
# ===========================
# Vector Database
# ===========================
weaviate:
image: cr.weaviate.io/semitechnologies/weaviate:latest
container_name: weaviate
ports:
- "127.0.0.1:8080:8080"
environment:
QUERY_DEFAULTS_LIMIT: 25
AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
PERSISTENCE_DATA_PATH: /var/lib/weaviate
DEFAULT_VECTORIZER_MODULE: none
CLUSTER_HOSTNAME: node1
volumes:
- weaviate_data:/var/lib/weaviate
restart: unless-stopped
# ===========================
# Shared Database + Cache
# ===========================
db:
image: postgres:16-alpine
container_name: postgres
environment:
POSTGRES_PASSWORD: ${DB_PASSWORD}
      # the dify and n8n databases are created by init-multi-db.sh below
volumes:
- pg_data:/var/lib/postgresql/data
- ./init-multi-db.sh:/docker-entrypoint-initdb.d/init-multi-db.sh
restart: unless-stopped
redis:
image: redis:7-alpine
container_name: redis
volumes:
- redis_data:/data
restart: unless-stopped
volumes:
ollama_data:
open-webui:
dify_storage:
n8n_data:
weaviate_data:
pg_data:
redis_data:
Create .env:
# Database password (use for all services)
DB_PASSWORD=GenerateAStrongPassword123
# Secrets (generate each with: openssl rand -base64 32)
OPENWEBUI_SECRET=secret1
DIFY_SECRET=secret2
N8N_SECRET=secret3
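Rather than hand-typing placeholders, you can generate the whole .env in one step (a sketch using `openssl rand`; the variable names match the compose file above). A hex password for the database avoids characters like `/` and `+` that would need escaping in connection URLs:

```shell
# Generate random credentials and write .env (overwrites any existing file)
cat > .env <<EOF
DB_PASSWORD=$(openssl rand -hex 24)
OPENWEBUI_SECRET=$(openssl rand -base64 32)
DIFY_SECRET=$(openssl rand -base64 32)
N8N_SECRET=$(openssl rand -base64 32)
EOF
chmod 600 .env   # secrets should not be world-readable
```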
Create init-multi-db.sh (initializes multiple PostgreSQL databases):
#!/bin/bash
set -e
# Create the users first, then databases owned by them: on Postgres 15+,
# GRANT ... ON DATABASE alone no longer permits creating tables in the
# public schema, but the database owner always can.
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" <<-EOSQL
    CREATE USER dify WITH PASSWORD '$POSTGRES_PASSWORD';
    CREATE DATABASE dify OWNER dify;
    CREATE USER n8n WITH PASSWORD '$POSTGRES_PASSWORD';
    CREATE DATABASE n8n OWNER n8n;
EOSQL
chmod +x init-multi-db.sh
Step 3: Start the Stack
docker compose up -d
Monitor all containers:
docker compose ps
docker compose logs -f
Initial startup takes 3-10 minutes as databases initialize and container images are pulled.
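Once everything reports "Up", a small smoke-test script saves clicking through each UI (a sketch; the ports match the localhost bindings in the compose file above, and the Weaviate readiness path is its standard `/v1/.well-known/ready` endpoint — write the script now, run it on the server):

```shell
# Write a smoke-test script for the locally-bound services
cat > check-stack.sh <<'EOF'
#!/bin/bash
set -u
docker compose ps
# -f makes curl treat HTTP error statuses as failures
for svc in "Open WebUI|http://127.0.0.1:3000" \
           "Dify web|http://127.0.0.1:3001" \
           "n8n|http://127.0.0.1:5678" \
           "Weaviate|http://127.0.0.1:8080/v1/.well-known/ready"; do
  name=${svc%%|*}; url=${svc##*|}
  if curl -fsS "$url" > /dev/null 2>&1; then
    echo "${name}: OK"
  else
    echo "${name}: not responding at ${url}"
  fi
done
docker exec ollama ollama list   # confirms the Ollama daemon answers
EOF
chmod +x check-stack.sh
```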
Step 4: Configure Caddy for HTTPS
Create /etc/caddy/Caddyfile:
# Open WebUI - Chat Interface
ai.yourdomain.com {
reverse_proxy localhost:3000
}
# Dify - AI App Builder
# Dify's API container serves /console/api, /api, /v1 and /files;
# everything else goes to the web frontend
dify.yourdomain.com {
    handle /console/api/* {
        reverse_proxy localhost:5001
    }
    handle /api/* {
        reverse_proxy localhost:5001
    }
    handle /v1/* {
        reverse_proxy localhost:5001
    }
    handle /files/* {
        reverse_proxy localhost:5001
    }
    handle {
        reverse_proxy localhost:3001
    }
}
# n8n - Workflow Automation
n8n.yourdomain.com {
reverse_proxy localhost:5678
}
sudo systemctl restart caddy
Caddy automatically obtains SSL certificates for all configured domains.
Step 5: Download AI Models
Pull models into Ollama:
# Efficient general-purpose model (best quality per RAM)
docker exec ollama ollama pull llama3.1:8b
# For code assistance
docker exec ollama ollama pull qwen2.5-coder:7b
# For embeddings (required for RAG in Dify and Open WebUI)
docker exec ollama ollama pull nomic-embed-text
# If you have 16GB+ RAM: better quality model
docker exec ollama ollama pull qwen2.5:14b
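After pulling, confirm a model actually loads and generates (a sketch; run it on the server — the first generation is slower while the model loads into memory, and `--verbose` makes Ollama print timing stats including tokens/sec):

```shell
# Write a model smoke-test script; the prompt is illustrative
cat > test-model.sh <<'EOF'
#!/bin/bash
# List installed models, then do a one-shot generation with timing stats
docker exec ollama ollama list
docker exec ollama ollama run llama3.1:8b --verbose "Reply with one word: ready"
EOF
chmod +x test-model.sh
```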
Step 6: Configure Open WebUI
Navigate to https://ai.yourdomain.com
- Create admin account (first user becomes admin)
- Settings → Models: Verify Ollama models appear
- Settings → Web Search: Configure SearXNG or Brave API for web search
- Admin Panel → Users: Enable open registration or manage user accounts
Connect to External APIs (Optional)
If you also want cloud model access alongside local models:
Settings → Connections → OpenAI:
- Enter your OpenAI API key
- Open WebUI shows both local (Ollama) and cloud models in one dropdown
Step 7: Set Up Dify
Navigate to https://dify.yourdomain.com
1. Create admin account
2. Settings → Model Provider → Ollama:
   - Base URL: http://ollama:11434
   - Add models: llama3.1:8b, nomic-embed-text
3. Knowledge → Create knowledge base:
   - Upload your documentation, PDFs, wikis
   - Select nomic-embed-text as embedding model
   - This creates a searchable vector index
4. Build your first AI app:
   - Studio → Create App → Chatbot
   - Attach the knowledge base for RAG
   - Deploy as an API or embedded widget
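Once an app is published, other services can call it over Dify's app API. A sketch of the request shape (the endpoint and fields follow Dify's chat-messages API; the `app-...` key comes from the app's API Access page, and the query text is illustrative — the payload is only written to a file here, so substitute your real key and domain before sending):

```shell
# Build the request body for Dify's POST /v1/chat-messages endpoint
cat > /tmp/dify-request.json <<'EOF'
{
  "inputs": {},
  "query": "How do I request vacation days?",
  "response_mode": "blocking",
  "user": "demo-user"
}
EOF
# Then, from any machine:
# curl -s https://dify.yourdomain.com/v1/chat-messages \
#   -H "Authorization: Bearer app-YOUR-APP-KEY" \
#   -H "Content-Type: application/json" \
#   -d @/tmp/dify-request.json
```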
Example Use Cases for Dify
Internal Documentation Bot: Upload all your internal docs and wikis to a knowledge base. Create a chatbot that answers questions about your company's processes.
Customer FAQ Bot: Upload product documentation, embed the chatbot on your support page, route queries to your team only when the AI can't answer.
Code Review Assistant: Create a workflow that takes a GitHub PR diff, sends it to the LLM with a code review prompt, and posts the result as a PR comment.
Step 8: Configure n8n
Navigate to https://n8n.yourdomain.com
- Create admin account
- Settings → Credentials → Add:
  - Ollama: Base URL http://ollama:11434
  - Your other integrations (Slack, GitHub, databases)
Useful AI Automations to Build
Daily Summary: Every morning, n8n queries your project management tool, sends the list to Ollama for summarization, posts to Slack.
Email Triage: n8n watches your inbox, classifies emails by topic using Ollama, labels important emails and drafts responses for review.
PR Review Bot: Trigger on GitHub PR creation, send diff to Ollama for analysis, post review comments automatically.
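Each of these automations boils down to an HTTP Request node calling Ollama's REST API from inside the compose network. A sketch of the request body for the daily summary (the endpoint and fields are Ollama's standard /api/generate; the prompt is illustrative, and the payload is only written to a file here):

```shell
# Build the JSON body n8n's HTTP Request node would POST to Ollama
cat > /tmp/summary-request.json <<'EOF'
{
  "model": "llama3.1:8b",
  "prompt": "Summarize these open tickets in three bullet points:\n- ...",
  "stream": false
}
EOF
# From n8n (or any container on the compose network):
# curl -s http://ollama:11434/api/generate -d @/tmp/summary-request.json
# The reply is JSON with the generated text in its "response" field.
```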
Step 9: Set Up Continue.dev for Code Completion
Continue.dev runs in VS Code or JetBrains IDEs and provides AI code completion using your local Ollama models.
Install the Extension
In VS Code:
- Extensions → Search "Continue"
- Install the Continue extension
Configure
Create ~/.continue/config.json:
{
  "models": [
    {
      "title": "Qwen Coder (Local)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen Coder Autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  }
}
Note that the compose file above does not publish Ollama's port, and ai.yourdomain.com serves Open WebUI over HTTPS, not the Ollama API. To reach Ollama from your workstation, add `ports: ["127.0.0.1:11434:11434"]` to the ollama service on the server, then tunnel it locally with `ssh -N -L 11434:localhost:11434 user@your-server`. Alternatively, expose Ollama through Caddy on its own subdomain (behind authentication) and point apiBase at that URL.
Monitoring Your Stack
Resource Usage
# Overall container stats
docker stats
# Disk usage
docker system df
Uptime Monitoring
Deploy Uptime Kuma alongside your AI stack to monitor all services:
# Add to docker-compose.yml (and declare "uptime-kuma:" under the
# top-level volumes: key)
  uptime-kuma:
    image: louislam/uptime-kuma:1
    container_name: uptime-kuma
    ports:
      - "127.0.0.1:3002:3001"
    volumes:
      - uptime-kuma:/app/data
    restart: unless-stopped
Add to Caddyfile:
status.yourdomain.com {
reverse_proxy localhost:3002
}
Monitor:
- Ollama API: http://ollama:11434/api/tags (from inside the compose network — the port is not published to the host)
- Open WebUI: https://ai.yourdomain.com
- Dify: https://dify.yourdomain.com
- n8n: https://n8n.yourdomain.com
Logs
# Follow all logs
docker compose logs -f
# Specific service
docker compose logs -f ollama
Cost Analysis
Commercial AI Tools (5-Person Team, Annual)
| Tool | Annual Cost |
|---|---|
| ChatGPT Plus × 5 | $1,200 |
| GitHub Copilot × 5 | $2,280 |
| Zapier Professional | $588 |
| Pinecone Starter | $120 |
| Total | $4,188 |
Self-Hosted Stack (Annual)
| Component | Server | Annual |
|---|---|---|
| Everything (CPU only) | Hetzner CPX41 (16GB) | $228 |
| Everything (dedicated GPU) | Hetzner GEX44 | ~$1,080 |
| Domain | — | $12 |
| Total (CPU) | — | $240 |
CPU option savings: $3,948/year. Models run slower but all team services are covered.
GPU option savings: ~$3,096/year. Models run fast, full production capability.
Backup the Stack
# Database backup (all databases)
mkdir -p /opt/backups
docker exec postgres pg_dumpall -U postgres | gzip > /opt/backups/stack-$(date +%Y%m%d).sql.gz
# Automated backup script
cat > /opt/backup-ai-stack.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="/opt/backups/ai-stack"
mkdir -p "$BACKUP_DIR"
docker exec postgres pg_dumpall -U postgres | gzip > "$BACKUP_DIR/db-$(date +%Y%m%d).sql.gz"
find "$BACKUP_DIR" -type f -mtime +14 -delete
EOF
chmod +x /opt/backup-ai-stack.sh
(crontab -l 2>/dev/null; echo "0 3 * * * /opt/backup-ai-stack.sh") | crontab -
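The `find ... -mtime +14 -delete` rotation can be sanity-checked locally before trusting it with real backups (a self-contained demo in a temp directory; `touch -d` with a relative date needs GNU coreutils):

```shell
# Simulate the backup directory: one stale dump, one fresh dump
DEMO_DIR=$(mktemp -d)
touch -d "20 days ago" "$DEMO_DIR/db-old.sql.gz"
touch "$DEMO_DIR/db-new.sql.gz"

# Same rotation the backup script runs: remove files older than 14 days;
# -type f keeps find from ever matching the directory itself
find "$DEMO_DIR" -type f -mtime +14 -delete

ls "$DEMO_DIR"   # only db-new.sql.gz remains
```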
Find All Stack Components on OSSAlt
Browse all AI tools and alternatives on OSSAlt — compare Ollama, Dify, n8n, Open WebUI, and every other open source AI infrastructure component with deployment guides and feature comparisons.