The Complete Self-Hosted AI Stack in 2026
The Case for a Self-Hosted AI Stack
The typical AI tooling bill for a 5-person development team in 2026:
- ChatGPT Plus (5 users): $1,200/year
- GitHub Copilot (5 users): $2,280/year
- Claude Pro (5 users): $1,500/year
- Midjourney (5 users): $600/year
- Total: ~$5,580/year
A self-hosted AI stack running on a $35/month dedicated server: $420/year — with no per-query limits, no data leaving your network, and the ability to run models fine-tuned on your specific domain.
This guide covers building a production-ready self-hosted AI infrastructure using the best open source tools in 2026.
Stack Overview
┌─────────────────────────────────────────────────┐
│ Reverse Proxy (Caddy) │
│ HTTPS + subdomain routing │
└────────┬───────────────────────────────────────┘
│
┌────┴────────────────────────────────────┐
│ │
┌───▼────┐ ┌──────────┐ ┌────────────┐ ┌──▼─────┐
│ Open │ │ Dify │ │ n8n │ │Continue│
│ WebUI │ │ │ │ │ │ .dev │
│(Chat) │ │(RAG/Apps)│ │(Automation)│ │ (Code) │
└───┬────┘ └────┬─────┘ └─────┬──────┘ └────────┘
│ │ │
└─────────────┼──────────────┘
│
┌─────────────▼──────────────┐
│ Ollama │
│ (Local LLM serving) │
│ Llama 3.1, Mistral, Qwen │
└────────────────────────────┘
│ │
┌────▼────┐ ┌──────▼──────┐
│ Weaviate│ │ PostgreSQL │
│(Vector) │ │+ pgvector │
└─────────┘ └─────────────┘
Component Roles
| Component | Purpose | Alternative to |
|---|---|---|
| Ollama | Run local LLMs | OpenAI API |
| Open WebUI | ChatGPT-like interface | ChatGPT |
| Dify | AI app builder + RAG | Dify.ai Cloud, LangChain |
| n8n | Workflow automation | Zapier, Make |
| Continue.dev | AI code completion | GitHub Copilot |
| Weaviate | Vector database | Pinecone |
| PostgreSQL | Application database | Supabase |
| Caddy | HTTPS reverse proxy | Nginx + Certbot |
Hardware Requirements
Minimum Stack (Text-Only, CPU Inference)
| Component | Server | Monthly |
|---|---|---|
| Everything | Hetzner CPX31 (8GB, 4 cores) | $10 |
| Model quality | 7B parameters max | — |
| Inference speed | 5-10 tokens/sec | — |
CPU inference works. It's slow for 7B+ models but functional for non-latency-sensitive workloads.
Recommended Stack (GPU Inference)
| Component | Server | Monthly |
|---|---|---|
| AI + services | Hetzner GEX44 (RTX 4000, 20GB VRAM) | ~$90 |
| Or split: services | Hetzner CPX31 (8GB) | $10 |
| + AI inference | RunPod RTX 4090 (24GB) | ~$0.74/hr |
| Model quality | 70B parameters | — |
| Inference speed | 30-80 tokens/sec | — |
Most cost-efficient approach: a CPU server for Dify, n8n, Open WebUI, and Weaviate ($10/month), plus a GPU instance you start and stop on demand for heavy AI tasks.
Local Machine Option
For individual developers:
- MacBook M3 Pro (36GB): Runs quantized 70B models at 15-25 tokens/sec. No server cost.
- PC with RTX 4090 (24GB VRAM): Runs 30-70B models fast.
This guide focuses on server deployment for teams.
Step 1: Provision and Configure Server
# On your server (example: Hetzner CPX41, 16GB, 6 cores)
sudo apt update && sudo apt upgrade -y
# Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
# Install Caddy for HTTPS
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update && sudo apt install caddy
Point DNS records to your server IP before proceeding:
- ai.yourdomain.com → your server IP (Open WebUI)
- dify.yourdomain.com → your server IP (Dify)
- n8n.yourdomain.com → your server IP (n8n)
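Caddy can only issue certificates once each record resolves to the server. A small check script saves a failed ACME round-trip (a sketch: `api.ipify.org` is one public IP-echo service, an assumption here — you can also hardcode the server IP; the script is written to a file so you can run it after DNS propagates):

```shell
# Write a DNS sanity-check script; run it on the server once records propagate
cat > check-dns.sh <<'EOF'
#!/bin/bash
SERVER_IP=$(curl -s https://api.ipify.org)   # or hardcode your server IP
for host in ai.yourdomain.com dify.yourdomain.com n8n.yourdomain.com; do
  resolved=$(dig +short "$host" | tail -n1)
  if [ "$resolved" = "$SERVER_IP" ]; then
    echo "$host OK"
  else
    echo "$host -> ${resolved:-unresolved} (expected $SERVER_IP)"
  fi
done
EOF
chmod +x check-dns.sh
```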
Step 2: Deploy the Core Stack
Create a workspace directory:
mkdir -p /opt/ai-stack && cd /opt/ai-stack
Create docker-compose.yml:
services:
# ===========================
# Local LLM Engine
# ===========================
ollama:
image: ollama/ollama:latest
container_name: ollama
volumes:
- ollama_data:/root/.ollama
restart: unless-stopped
# Uncomment for NVIDIA GPU:
# runtime: nvidia
# environment:
# - NVIDIA_VISIBLE_DEVICES=all
# ===========================
# Chat Interface
# ===========================
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
ports:
- "127.0.0.1:3000:8080"
volumes:
- open-webui:/app/backend/data
environment:
OLLAMA_BASE_URL: http://ollama:11434
WEBUI_SECRET_KEY: ${OPENWEBUI_SECRET}
restart: unless-stopped
depends_on:
- ollama
# ===========================
# AI App Builder + RAG
# ===========================
dify-api:
image: langgenius/dify-api:latest
container_name: dify-api
ports:
- "127.0.0.1:5001:5001"
    environment:
      MODE: api
      SECRET_KEY: ${DIFY_SECRET}
      # Dify reads discrete DB_* / REDIS_* variables rather than a single URL
      DB_HOST: db
      DB_PORT: 5432
      DB_USERNAME: dify
      DB_PASSWORD: ${DB_PASSWORD}
      DB_DATABASE: dify
      REDIS_HOST: redis
      REDIS_PORT: 6379
      CELERY_BROKER_URL: redis://redis:6379/1
      STORAGE_TYPE: local
volumes:
- dify_storage:/app/api/storage
depends_on:
- db
- redis
restart: unless-stopped
dify-worker:
image: langgenius/dify-api:latest
container_name: dify-worker
    environment:
      MODE: worker
      SECRET_KEY: ${DIFY_SECRET}
      DB_HOST: db
      DB_PORT: 5432
      DB_USERNAME: dify
      DB_PASSWORD: ${DB_PASSWORD}
      DB_DATABASE: dify
      REDIS_HOST: redis
      REDIS_PORT: 6379
      CELERY_BROKER_URL: redis://redis:6379/1
      STORAGE_TYPE: local
volumes:
- dify_storage:/app/api/storage
depends_on:
- db
- redis
restart: unless-stopped
dify-web:
image: langgenius/dify-web:latest
container_name: dify-web
ports:
- "127.0.0.1:3001:3000"
    environment:
      # dify-web's documented endpoint variables
      CONSOLE_API_URL: https://dify.yourdomain.com
      APP_API_URL: https://dify.yourdomain.com
restart: unless-stopped
# ===========================
# Workflow Automation
# ===========================
n8n:
image: n8nio/n8n:latest
container_name: n8n
ports:
- "127.0.0.1:5678:5678"
environment:
DB_TYPE: postgresdb
DB_POSTGRESDB_HOST: db
DB_POSTGRESDB_DATABASE: n8n
DB_POSTGRESDB_USER: n8n
DB_POSTGRESDB_PASSWORD: ${DB_PASSWORD}
N8N_ENCRYPTION_KEY: ${N8N_SECRET}
N8N_HOST: n8n.yourdomain.com
WEBHOOK_URL: https://n8n.yourdomain.com/
volumes:
- n8n_data:/home/node/.n8n
depends_on:
- db
restart: unless-stopped
# ===========================
# Vector Database
# ===========================
weaviate:
image: cr.weaviate.io/semitechnologies/weaviate:latest
container_name: weaviate
ports:
- "127.0.0.1:8080:8080"
environment:
QUERY_DEFAULTS_LIMIT: 25
AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
PERSISTENCE_DATA_PATH: /var/lib/weaviate
DEFAULT_VECTORIZER_MODULE: none
CLUSTER_HOSTNAME: node1
volumes:
- weaviate_data:/var/lib/weaviate
restart: unless-stopped
# ===========================
# Shared Database + Cache
# ===========================
db:
image: postgres:16-alpine
container_name: postgres
environment:
POSTGRES_PASSWORD: ${DB_PASSWORD}
      # the dify and n8n databases are created by init-multi-db.sh below
volumes:
- pg_data:/var/lib/postgresql/data
- ./init-multi-db.sh:/docker-entrypoint-initdb.d/init-multi-db.sh
restart: unless-stopped
redis:
image: redis:7-alpine
container_name: redis
volumes:
- redis_data:/data
restart: unless-stopped
volumes:
ollama_data:
open-webui:
dify_storage:
n8n_data:
weaviate_data:
pg_data:
redis_data:
Create .env:
# Database password (use for all services)
DB_PASSWORD=GenerateAStrongPassword123
# Secrets (generate each with: openssl rand -base64 32)
OPENWEBUI_SECRET=secret1
DIFY_SECRET=secret2
N8N_SECRET=secret3
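Rather than hand-typing placeholders, you can generate the whole .env in one step (a sketch using `openssl rand`; the variable names match the compose file above). A hex password for the database avoids characters like `/` and `+` that would need escaping in connection URLs:

```shell
# Generate random credentials and write .env (overwrites any existing file)
cat > .env <<EOF
DB_PASSWORD=$(openssl rand -hex 24)
OPENWEBUI_SECRET=$(openssl rand -base64 32)
DIFY_SECRET=$(openssl rand -base64 32)
N8N_SECRET=$(openssl rand -base64 32)
EOF
chmod 600 .env   # secrets should not be world-readable
```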
Create init-multi-db.sh (initializes multiple PostgreSQL databases):
#!/bin/bash
set -e
# Create the users first, then databases owned by them: on Postgres 15+,
# GRANT ... ON DATABASE alone no longer permits creating tables in the
# public schema, but the database owner always can.
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" <<-EOSQL
    CREATE USER dify WITH PASSWORD '$POSTGRES_PASSWORD';
    CREATE DATABASE dify OWNER dify;
    CREATE USER n8n WITH PASSWORD '$POSTGRES_PASSWORD';
    CREATE DATABASE n8n OWNER n8n;
EOSQL
chmod +x init-multi-db.sh
Step 3: Start the Stack
docker compose up -d
Monitor all containers:
docker compose ps
docker compose logs -f
Initial startup takes 3-10 minutes as databases initialize and container images are pulled.
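Once everything reports "Up", a small smoke-test script saves clicking through each UI (a sketch; the ports match the localhost bindings in the compose file above, and the Weaviate readiness path is its standard `/v1/.well-known/ready` endpoint — write the script now, run it on the server):

```shell
# Write a smoke-test script for the locally-bound services
cat > check-stack.sh <<'EOF'
#!/bin/bash
set -u
docker compose ps
# -f makes curl treat HTTP error statuses as failures
for svc in "Open WebUI|http://127.0.0.1:3000" \
           "Dify web|http://127.0.0.1:3001" \
           "n8n|http://127.0.0.1:5678" \
           "Weaviate|http://127.0.0.1:8080/v1/.well-known/ready"; do
  name=${svc%%|*}; url=${svc##*|}
  if curl -fsS "$url" > /dev/null 2>&1; then
    echo "${name}: OK"
  else
    echo "${name}: not responding at ${url}"
  fi
done
docker exec ollama ollama list   # confirms the Ollama daemon answers
EOF
chmod +x check-stack.sh
```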
Step 4: Configure Caddy for HTTPS
Create /etc/caddy/Caddyfile:
# Open WebUI - Chat Interface
ai.yourdomain.com {
reverse_proxy localhost:3000
}
# Dify - AI App Builder
# Dify's API container serves /console/api, /api, /v1 and /files;
# everything else goes to the web frontend
dify.yourdomain.com {
    handle /console/api/* {
        reverse_proxy localhost:5001
    }
    handle /api/* {
        reverse_proxy localhost:5001
    }
    handle /v1/* {
        reverse_proxy localhost:5001
    }
    handle /files/* {
        reverse_proxy localhost:5001
    }
    handle {
        reverse_proxy localhost:3001
    }
}
# n8n - Workflow Automation
n8n.yourdomain.com {
reverse_proxy localhost:5678
}
sudo systemctl restart caddy
Caddy automatically obtains SSL certificates for all configured domains.
Step 5: Download AI Models
Pull models into Ollama:
# Efficient general-purpose model (best quality per RAM)
docker exec ollama ollama pull llama3.1:8b
# For code assistance
docker exec ollama ollama pull qwen2.5-coder:7b
# For embeddings (required for RAG in Dify and Open WebUI)
docker exec ollama ollama pull nomic-embed-text
# If you have 16GB+ RAM: better quality model
docker exec ollama ollama pull qwen2.5:14b
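After pulling, confirm a model actually loads and generates (a sketch; run it on the server — the first generation is slower while the model loads into memory, and `--verbose` makes Ollama print timing stats including tokens/sec):

```shell
# Write a model smoke-test script; the prompt is illustrative
cat > test-model.sh <<'EOF'
#!/bin/bash
# List installed models, then do a one-shot generation with timing stats
docker exec ollama ollama list
docker exec ollama ollama run llama3.1:8b --verbose "Reply with one word: ready"
EOF
chmod +x test-model.sh
```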
Step 6: Configure Open WebUI
Navigate to https://ai.yourdomain.com
- Create admin account (first user becomes admin)
- Settings → Models: Verify Ollama models appear
- Settings → Web Search: Configure SearXNG or Brave API for web search
- Admin Panel → Users: Enable open registration or manage user accounts
Connect to External APIs (Optional)
If you also want cloud model access alongside local models:
Settings → Connections → OpenAI:
- Enter your OpenAI API key
- Open WebUI shows both local (Ollama) and cloud models in one dropdown
Step 7: Set Up Dify
Navigate to https://dify.yourdomain.com
1. Create admin account
2. Settings → Model Provider → Ollama:
   - Base URL: http://ollama:11434
   - Add models: llama3.1:8b, nomic-embed-text
3. Knowledge → Create knowledge base:
   - Upload your documentation, PDFs, wikis
   - Select nomic-embed-text as embedding model
   - This creates a searchable vector index
4. Build your first AI app:
   - Studio → Create App → Chatbot
   - Attach the knowledge base for RAG
   - Deploy as an API or embedded widget
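Once an app is published, other services can call it over Dify's app API. A sketch of the request shape (the endpoint and fields follow Dify's chat-messages API; the `app-...` key comes from the app's API Access page, and the query text is illustrative — the payload is only written to a file here, so substitute your real key and domain before sending):

```shell
# Build the request body for Dify's POST /v1/chat-messages endpoint
cat > /tmp/dify-request.json <<'EOF'
{
  "inputs": {},
  "query": "How do I request vacation days?",
  "response_mode": "blocking",
  "user": "demo-user"
}
EOF
# Then, from any machine:
# curl -s https://dify.yourdomain.com/v1/chat-messages \
#   -H "Authorization: Bearer app-YOUR-APP-KEY" \
#   -H "Content-Type: application/json" \
#   -d @/tmp/dify-request.json
```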
Example Use Cases for Dify
Internal Documentation Bot: Upload all your internal docs and wikis to a knowledge base. Create a chatbot that answers questions about your company's processes.
Customer FAQ Bot: Upload product documentation, embed the chatbot on your support page, route queries to your team only when the AI can't answer.
Code Review Assistant: Create a workflow that takes a GitHub PR diff, sends it to the LLM with a code review prompt, and posts the result as a PR comment.
Step 8: Configure n8n
Navigate to https://n8n.yourdomain.com
- Create admin account
- Settings → Credentials → Add:
  - Ollama: Base URL http://ollama:11434
  - Your other integrations (Slack, GitHub, databases)
Useful AI Automations to Build
Daily Summary: Every morning, n8n queries your project management tool, sends the list to Ollama for summarization, posts to Slack.
Email Triage: n8n watches your inbox, classifies emails by topic using Ollama, labels important emails and drafts responses for review.
PR Review Bot: Trigger on GitHub PR creation, send diff to Ollama for analysis, post review comments automatically.
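Each of these automations boils down to an HTTP Request node calling Ollama's REST API from inside the compose network. A sketch of the request body for the daily summary (the endpoint and fields are Ollama's standard /api/generate; the prompt is illustrative, and the payload is only written to a file here):

```shell
# Build the JSON body n8n's HTTP Request node would POST to Ollama
cat > /tmp/summary-request.json <<'EOF'
{
  "model": "llama3.1:8b",
  "prompt": "Summarize these open tickets in three bullet points:\n- ...",
  "stream": false
}
EOF
# From n8n (or any container on the compose network):
# curl -s http://ollama:11434/api/generate -d @/tmp/summary-request.json
# The reply is JSON with the generated text in its "response" field.
```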
Step 9: Set Up Continue.dev for Code Completion
Continue.dev runs in VS Code or JetBrains IDEs and provides AI code completion using your local Ollama models.
Install the Extension
In VS Code:
- Extensions → Search "Continue"
- Install the Continue extension
Configure
Create ~/.continue/config.json:
{
  "models": [
    {
      "title": "Qwen Coder (Local)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen Coder Autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  }
}
Note that the compose file above does not publish Ollama's port, and ai.yourdomain.com serves Open WebUI over HTTPS, not the Ollama API. To reach Ollama from your workstation, add `ports: ["127.0.0.1:11434:11434"]` to the ollama service on the server, then tunnel it locally with `ssh -N -L 11434:localhost:11434 user@your-server`. Alternatively, expose Ollama through Caddy on its own subdomain (behind authentication) and point apiBase at that URL.
Monitoring Your Stack
Resource Usage
# Overall container stats
docker stats
# Disk usage
docker system df
Uptime Monitoring
Deploy Uptime Kuma alongside your AI stack to monitor all services:
# Add to docker-compose.yml (and declare "uptime-kuma:" under the
# top-level volumes: key)
  uptime-kuma:
    image: louislam/uptime-kuma:1
    container_name: uptime-kuma
    ports:
      - "127.0.0.1:3002:3001"
    volumes:
      - uptime-kuma:/app/data
    restart: unless-stopped
Add to Caddyfile:
status.yourdomain.com {
reverse_proxy localhost:3002
}
Monitor:
- Ollama API: http://ollama:11434/api/tags (from inside the compose network — the port is not published to the host)
- Open WebUI: https://ai.yourdomain.com
- Dify: https://dify.yourdomain.com
- n8n: https://n8n.yourdomain.com
Logs
# Follow all logs
docker compose logs -f
# Specific service
docker compose logs -f ollama
Cost Analysis
Commercial AI Tools (5-Person Team, Annual)
| Tool | Annual Cost |
|---|---|
| ChatGPT Plus × 5 | $1,200 |
| GitHub Copilot × 5 | $2,280 |
| Zapier Professional | $588 |
| Pinecone Starter | $120 |
| Total | $4,188 |
Self-Hosted Stack (Annual)
| Component | Server | Annual |
|---|---|---|
| Everything (CPU only) | Hetzner CPX41 (16GB) | $228 |
| Everything (dedicated GPU) | Hetzner GEX44 | ~$1,080 |
| Domain | — | $12 |
| Total (CPU) | — | $240 |
CPU option savings: $3,948/year. Models run slower but all team services are covered.
GPU option savings: ~$3,096/year. Models run fast, full production capability.
Backup the Stack
# Database backup (all databases)
mkdir -p /opt/backups
docker exec postgres pg_dumpall -U postgres | gzip > /opt/backups/stack-$(date +%Y%m%d).sql.gz
# Automated backup script
cat > /opt/backup-ai-stack.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="/opt/backups/ai-stack"
mkdir -p "$BACKUP_DIR"
docker exec postgres pg_dumpall -U postgres | gzip > "$BACKUP_DIR/db-$(date +%Y%m%d).sql.gz"
find "$BACKUP_DIR" -type f -mtime +14 -delete
EOF
chmod +x /opt/backup-ai-stack.sh
(crontab -l 2>/dev/null; echo "0 3 * * * /opt/backup-ai-stack.sh") | crontab -
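The `find ... -mtime +14 -delete` rotation can be sanity-checked locally before trusting it with real backups (a self-contained demo in a temp directory; `touch -d` with a relative date needs GNU coreutils):

```shell
# Simulate the backup directory: one stale dump, one fresh dump
DEMO_DIR=$(mktemp -d)
touch -d "20 days ago" "$DEMO_DIR/db-old.sql.gz"
touch "$DEMO_DIR/db-new.sql.gz"

# Same rotation the backup script runs: remove files older than 14 days;
# -type f keeps find from ever matching the directory itself
find "$DEMO_DIR" -type f -mtime +14 -delete

ls "$DEMO_DIR"   # only db-new.sql.gz remains
```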
Find All Stack Components on OSSAlt
Browse all AI tools and alternatives on OSSAlt — compare Ollama, Dify, n8n, Open WebUI, and every other open source AI infrastructure component with deployment guides and feature comparisons.