Open-source alternatives guide
How to Migrate from ChatGPT to Open WebUI + Ollama 2026
ChatGPT Plus costs $20/month and sends your conversations to OpenAI. Open WebUI + Ollama gives you the same chat interface running locally, with full privacy.
Why Leave ChatGPT?
ChatGPT Plus costs $20/month ($240/year). Every conversation you have is sent to OpenAI's servers, logged, and potentially used for training. At scale — a team of 10 paying for Plus — that's $2,400/year.
More significantly: if your work involves confidential information, client data, code that can't leave your organization, or anything sensitive, ChatGPT's cloud model creates real risk.
Open WebUI + Ollama solves both problems:
- Cost: Self-hosted LLMs run on your hardware. No monthly subscription.
- Privacy: All conversation data stays on your machine or server. Nothing leaves your network.
- Quality: In 2026, open models like Llama 3.3 70B and Mistral Small 3 24B perform comparably to GPT-4 for most everyday tasks.
This guide walks you through setting up a ChatGPT-equivalent experience that you control.
What You're Building
The stack has two components:
Ollama: A runtime that downloads and serves open source language models locally. Think of it as the backend that loads the AI models and handles inference. Supports 100+ models.
Open WebUI: A web interface that looks and works like ChatGPT. Chat history, conversation management, file uploads, model switching, user accounts, and more. Connects to Ollama as its model backend.
Together they give you a ChatGPT-like interface with complete data privacy.
Hardware Requirements
Your hardware determines which models you can run and how fast they respond.
For Local Use (Your Own Machine)
| Hardware | RAM | Models You Can Run |
|---|---|---|
| MacBook M1/M2/M3 (8GB) | 8GB | 3B-7B models (fast) |
| MacBook M1/M2/M3 (16GB) | 16GB | 7B-13B models (fast) |
| MacBook M3 Pro/Max (36GB) | 36GB | 30B-70B models (fast) |
| PC with NVIDIA RTX 3060 (12GB VRAM) | 12GB | 7B-13B models (GPU-fast) |
| PC with NVIDIA RTX 4090 (24GB VRAM) | 24GB | 30B-70B models (GPU-fast) |
Apple Silicon (M-series) Macs have unified memory — both CPU and GPU share it. This makes Macs excellent for local model inference without a discrete GPU.
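A rough way to sanity-check the table above: a quantized model needs approximately its parameter count times the bits per weight, plus fixed overhead for the KV cache and runtime. The sketch below uses 4.5 bits/weight to approximate Q4 quantization (the Ollama default for most models); that figure and the 1.5GB overhead are rules of thumb, not Ollama specifications.

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: float = 4.5,
                    overhead_gb: float = 1.5) -> float:
    """Rough RAM estimate for a quantized model.

    Rule of thumb (assumption, not an Ollama formula): weights take
    params * bits / 8 bytes, plus fixed overhead for KV cache and runtime.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return round(weights_gb + overhead_gb, 1)

print(estimate_ram_gb(7))    # ~5.4 GB: fits on an 8GB machine
print(estimate_ram_gb(70))   # ~40.9 GB: needs the 48GB tier
```

This matches the tables: 7B models fit in 8GB, while 70B models need around 48GB.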
For Server/Team Use (VPS)
To serve a team, you need a server with enough RAM for your chosen model:
| Model Size | RAM Required | Hetzner Server | Monthly Cost |
|---|---|---|---|
| 7B (good quality) | 8GB | CPX21 | $6.50 |
| 13B (better quality) | 16GB | CPX31 | $10 |
| 70B (best quality) | 48GB | CCX43 | $54 |
CPU-only inference is slow but works: On a CPX21 without a GPU, 7B models generate approximately 5-10 tokens/second. That is roughly reading speed, so interactive chat remains workable, though noticeably slower than a cloud service.
GPU servers: For 70B models at useful speed, GPU instances are necessary. GPU cloud servers cost $1-8/hour. Run them on-demand if you don't need 24/7 service.
Step 1: Install Ollama
macOS
# Install via Homebrew
brew install ollama
# Or download from ollama.com
curl -L https://ollama.com/download/ollama-darwin.zip -o ollama.zip
unzip ollama.zip
Linux
curl -fsSL https://ollama.com/install.sh | sh
Ollama installs as a system service and starts automatically.
Windows
Download the installer from ollama.com and run it. The native Windows build includes NVIDIA GPU support, so WSL2 is no longer required.
Verify Installation
ollama --version
# ollama version 0.5.x
Step 2: Download Your First Model
Choose a model based on your hardware:
Recommended Starting Models
For 8GB RAM (lighter models):
ollama pull llama3.2:3b # Meta Llama 3.2 3B — fast, decent quality
ollama pull phi4-mini # Microsoft Phi-4 Mini — surprisingly capable
For 16GB RAM (sweet spot):
ollama pull llama3.1:8b # Meta Llama 3.1 8B — good balance
ollama pull mistral # Mistral 7B v0.3 — fast and capable
ollama pull qwen2.5:7b # Alibaba Qwen 2.5 7B — strong coding
For 32GB+ RAM (high quality):
ollama pull llama3.1:70b # Meta Llama 3.1 70B — near GPT-4 quality
ollama pull qwen2.5:32b # Qwen 2.5 32B — excellent coding
ollama pull mistral-small3 # Mistral Small 3 24B — efficient and capable
Model downloads range from 2GB (3B models) to 40GB (70B models). Models are stored in ~/.ollama/models/.
Test Your Model
ollama run llama3.1:8b
# >>> Hello! How can I help you today?
Type a message and press Enter. When the model responds, you've confirmed Ollama is working. Exit with /bye or Ctrl+D.
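Beyond the CLI, Ollama also listens on port 11434 with a local REST API, which is handy for scripting. A minimal stdlib sketch against its /api/generate endpoint (setting "stream": false returns the whole reply in one JSON response):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

def build_request(prompt: str, model: str = "llama3.1:8b") -> urllib.request.Request:
    """Build a one-shot (non-streaming) request for Ollama's /api/generate."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )

def generate(prompt: str, model: str = "llama3.1:8b") -> str:
    """Send the prompt and return the model's reply (needs Ollama running)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]

# Usage, with Ollama running and the model pulled:
#   print(generate("Why is the sky blue? One sentence."))
```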
Step 3: Install Open WebUI
Open WebUI provides the ChatGPT-like interface. Install via Docker:
Docker (Simplest)
# If Ollama is running locally on the same machine.
# Note: --network=host requires Linux; on macOS/Windows, publish a port
# with -p 3000:8080 and set OLLAMA_BASE_URL=http://host.docker.internal:11434
docker run -d \
--name open-webui \
--network=host \
-v open-webui:/app/backend/data \
-e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
--restart always \
ghcr.io/open-webui/open-webui:main
Access Open WebUI at http://localhost:8080
Docker Compose (Recommended for Production)
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama_data:/root/.ollama
    restart: always
    # For NVIDIA GPU support (requires the NVIDIA Container Toolkit), uncomment:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    volumes:
      - open-webui:/app/backend/data
    environment:
      OLLAMA_BASE_URL: http://ollama:11434
      WEBUI_SECRET_KEY: your-secret-key-here
    restart: always
    depends_on:
      - ollama

volumes:
  ollama_data:
  open-webui:
docker compose up -d
# Pull models via the Ollama container
docker exec ollama ollama pull llama3.1:8b
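The compose file uses a placeholder for WEBUI_SECRET_KEY (Open WebUI uses this value to sign session tokens). Generate a real one rather than inventing a string; one approach is to write it to a .env file, which docker compose reads automatically for ${WEBUI_SECRET_KEY}-style substitution:

```shell
# Generate a 64-character hex secret and store it in .env;
# reference it in the compose file as: WEBUI_SECRET_KEY: ${WEBUI_SECRET_KEY}
WEBUI_SECRET_KEY=$(openssl rand -hex 32)
echo "WEBUI_SECRET_KEY=$WEBUI_SECRET_KEY" > .env
```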
Configure Open WebUI
- Navigate to http://localhost:3000 (or your server IP)
- Create an admin account (first account automatically becomes admin)
- Open WebUI will auto-detect your Ollama installation
Step 4: Configure to Match Your ChatGPT Workflow
Set Default Model
Go to Settings → Interface → Default Model and select your preferred model. For daily use, Llama 3.1 8B or Mistral 7B work well.
Enable Web Search
Open WebUI supports web search integration — similar to ChatGPT's browsing feature. Configure in Settings → Web Search:
Options:
- SearXNG (self-hosted, fully private): Best for privacy
- Brave Search API (free tier: 2,000 queries/month)
- Tavily (developer-friendly API)
Enable Document Upload (RAG)
Open WebUI has built-in RAG (Retrieval-Augmented Generation) — upload PDFs, text files, and other documents and chat with them.
Configure in Settings → Documents:
- Set chunk size and overlap
- Configure the embedding model (Open WebUI downloads one automatically)
This replicates ChatGPT's "Upload files" feature, but your documents never leave your server.
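To make the chunk size and overlap settings concrete, here is an illustrative splitter. This is not Open WebUI's actual implementation, just a sketch of what the two numbers control: chunk size bounds how much text goes into each embedding, and overlap keeps a sentence that straddles a boundary retrievable from both neighboring chunks.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Illustrative only: real splitters usually also respect sentence
    and paragraph boundaries rather than cutting at exact offsets.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Larger overlap improves recall at chunk boundaries but increases the number of embeddings stored.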
Connect External API Models (Optional)
If you also want access to cloud models like GPT-4 or Claude alongside local models, add API keys in Settings → Connections:
OpenAI API: sk-...
Anthropic API: sk-ant-...
Open WebUI shows all models (local Ollama + API) in a single model dropdown. Switch between a local Llama model and cloud GPT-4 in the same interface.
Step 5: Migrate Your ChatGPT Habits
Export ChatGPT History (Optional)
If you want to preserve your ChatGPT conversation history:
- ChatGPT → Settings → Data Controls → Export Data
- Wait for email with download link
- Download ZIP containing conversations.json
Open WebUI doesn't have an import function for ChatGPT exports, but you can reference old conversations manually or build a simple script to reformat and import them.
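A minimal sketch of such a reformatting script, which flattens the export into markdown for reference. It assumes the 2024-era export layout, where conversations.json is a list of conversations, each with a "title" and a "mapping" of message nodes; OpenAI has changed this format before, so inspect your own export first. The function name is ours, not part of any tool.

```python
import json

def export_to_markdown(conversations: list) -> str:
    """Flatten a ChatGPT conversations.json export into one markdown document.

    Assumes each conversation has a "title" and a "mapping" of nodes whose
    "message" entries carry an author role and text parts. Node order in the
    mapping is not guaranteed to be chronological; follow the parent/children
    links if you need strict ordering.
    """
    out = []
    for conv in conversations:
        out.append(f"# {conv.get('title', 'Untitled')}")
        for node in conv.get("mapping", {}).values():
            msg = node.get("message")
            if not msg:
                continue
            role = msg.get("author", {}).get("role", "unknown")
            parts = msg.get("content", {}).get("parts", [])
            text = "\n".join(p for p in parts if isinstance(p, str)).strip()
            if text:
                out.append(f"**{role}**: {text}")
    return "\n\n".join(out)

# Usage:
#   with open("conversations.json") as f:
#       print(export_to_markdown(json.load(f)))
```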
Custom System Prompts
ChatGPT Plus lets you set custom instructions. Open WebUI calls these "System Prompts":
- Settings → Interface → System Prompt
- Add your custom instructions (persona, response style, domain context)
Example system prompt for a coding assistant:
You are an expert programmer. When writing code:
- Default to TypeScript unless another language is specified
- Include error handling
- Add brief comments for non-obvious logic
- Suggest testing approaches when relevant
Conversation Management
Open WebUI mirrors ChatGPT's conversation sidebar:
- Previous conversations organized by date
- Search through conversation history
- Pin important conversations
- Archive or delete old chats
Keyboard Shortcuts
Open WebUI supports many ChatGPT-familiar shortcuts:
- Ctrl/Cmd + Enter: Submit message
- Ctrl/Cmd + Shift + O: New conversation
- ↑: Edit last message
Step 6: Team Setup (Optional)
If you're replacing ChatGPT for a team:
Enable Multi-User Mode
Open WebUI supports multiple user accounts:
- Admin creates accounts via Admin Panel → Users → Add User
- Or enable Open Registration for self-service signup
- Assign roles: Admin, User
Set Usage Policies
In Admin Panel → Settings:
- Limit which models specific user groups can access
- Set rate limits (if needed for resource management)
- Configure shared conversation spaces
Configure Authentication (Team)
For team deployments, integrate with your identity provider:
- OAuth: Google, GitHub, Microsoft, Authentik, Keycloak
- LDAP: Active Directory integration
- SAML: Enterprise SSO
Model Quality Comparison vs ChatGPT
In 2026, the gap between the best open models and ChatGPT has closed significantly for common tasks:
| Task | ChatGPT GPT-4o | Llama 3.1 70B | Mistral Small 3 |
|---|---|---|---|
| General Q&A | Excellent | Excellent | Very Good |
| Code generation | Excellent | Excellent | Excellent |
| Creative writing | Excellent | Very Good | Good |
| Reasoning | Excellent | Very Good | Good |
| Math | Excellent | Very Good | Good |
| Multimodal (images) | Yes | Via llava model | Limited |
| Response speed (local) | Fast (cloud) | Slow (needs hardware) | Fast |
For most everyday tasks — code help, writing assistance, question answering, summarization — local 7B-13B models are sufficient. 70B models approach GPT-4 quality.
Cost Analysis
ChatGPT Plus (Per Year)
| Users | Annual Cost |
|---|---|
| 1 | $240 |
| 5 | $1,200 |
| 10 | $2,400 |
| 25 | $6,000 |
Self-Hosted Open WebUI + Ollama
Personal Mac use: $0 (runs on your existing hardware)
Dedicated server for a team:
| Server | Model Quality | Annual |
|---|---|---|
| Hetzner CPX21 (8GB) | 7B models | $78 |
| Hetzner CPX31 (16GB) | 13B models | $120 |
| Hetzner CPX51 (32GB) | 30B models | $360 |
For a 10-person team: Open WebUI server at $120/year vs ChatGPT Plus at $2,400/year. Savings: $2,280/year.
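The arithmetic behind that comparison, as a sketch using the article's own figures (the helper name is ours; adjust server_annual_cost for whatever tier you pick):

```python
CHATGPT_PLUS_MONTHLY = 20  # per user, per month

def annual_savings(users: int, server_annual_cost: float) -> float:
    """Annual savings from replacing per-seat ChatGPT Plus with one shared server."""
    chatgpt_annual = users * CHATGPT_PLUS_MONTHLY * 12
    return chatgpt_annual - server_annual_cost

print(annual_savings(10, 120))  # prints 2280: 10-person team on the CPX31 tier
```

Note that this ignores the admin time of running the server, which is the real cost of self-hosting.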
Extending Your Setup: Pipelines, Tools, and Advanced Features
A working Open WebUI + Ollama setup is the foundation. Once basic chat is running, several extensions transform it from a ChatGPT replacement into a more capable system tuned to your specific use case.
Tool use and function calling. Modern models (Llama 3.1, Qwen 2.5, Mistral Nemo) support function calling — the ability to invoke external tools from within a conversation. Open WebUI's tool framework lets you define custom tools in Python that the model can call: fetch a URL, query a database, check a calendar, look up a stock price. The model decides when to call a tool based on the conversation context, calls it, and incorporates the result into its response. This is what makes ChatGPT's browsing feature work, and it's fully replicable in a self-hosted setup.
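The tool-file convention can be sketched as follows. As I understand the current convention, a tool is a Python file defining a `Tools` class, and Open WebUI builds the function-calling schema from the method type hints and docstrings; both example methods below are invented for illustration, so check the current Open WebUI docs for the exact interface before relying on it.

```python
import datetime

class Tools:
    """A minimal Open WebUI tool file (sketch; verify against current docs).

    The schema shown to the model is derived from type hints and docstrings,
    and the model decides when to invoke each method mid-conversation.
    """

    def get_current_date(self) -> str:
        """Get today's date in ISO format."""
        return datetime.date.today().isoformat()

    def word_count(self, text: str) -> int:
        """Count the words in a piece of text."""
        return len(text.split())
```

Because the docstrings become the model-facing descriptions, write them as instructions for the model, not for human readers of the code.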
Pipelines for pre/post-processing. Open WebUI's Pipelines feature (introduced in late 2024) lets you insert processing steps between user messages and model responses. Common use cases: automatically prepend context to every message from a specific user (their role, their project, their preferences), filter requests that match certain patterns (content moderation, query classification), log all conversations to a database for compliance purposes, or route different question types to different models. Pipelines are Python scripts that run as a sidecar service alongside Open WebUI.
RAG with your own documents. Open WebUI's built-in document upload and RAG (Retrieval-Augmented Generation) allows you to chat with your own files — PDFs, text documents, markdown files. Upload a document and reference it in conversation with #filename, and the model answers based on the document's content rather than only its training data. For teams with internal documentation, this feature replaces ChatGPT's file upload capability while keeping documents entirely on your infrastructure. For more sophisticated RAG pipelines — multiple document collections, custom embedding models, hybrid search — the best open source RAG frameworks in 2026 covers LlamaIndex, AnythingLLM, and RAGFlow as dedicated alternatives.
Model Context Protocol (MCP) integration. MCP is an emerging standard (originally from Anthropic) for connecting AI assistants to external data sources and tools. Ollama models accessed through Open WebUI can use MCP server connections to pull data from GitHub, Jira, Notion, or any MCP-compatible service. This extends the model's context beyond what's in its training data to include live, application-specific information.
Coding assistance integration. Open WebUI and Ollama can serve as the backend for dedicated coding assistant tools. Continue.dev, the open source IDE extension, supports Ollama as a model provider for both tab autocomplete and chat. This means your local model setup powers your editor's AI assistance without any cloud API calls. See the best open source AI coding assistants in 2026 for the full workflow of connecting Ollama to your development environment.
Scaling to a team server. When moving from personal use to a shared team server, the key configuration changes are: enable authentication via Settings → Admin Panel (prevent anonymous access), configure OAuth or LDAP for team SSO (Open WebUI supports Google, Microsoft, GitHub, and OIDC providers), and set usage quotas per user group if you're on resource-constrained hardware. The LocalAI vs Ollama vs LM Studio comparison covers the infrastructure options for serving multiple users from a shared model server.
Find More AI Alternatives
Browse all ChatGPT alternatives on OSSAlt — compare Open WebUI, LibreChat, Jan, AnythingLLM, and every other open source AI interface with deployment guides and model comparisons.
The SaaS-to-Self-Hosted Migration Guide (Free PDF)
Step-by-step: infrastructure setup, data migration, backups, and security for 15+ common SaaS replacements. Used by 300+ developers.
Join 300+ self-hosters. Unsubscribe in one click.