Dify: Self-Hosted AI Agents Without Code 2026
Dify crossed 100,000 GitHub stars in October 2025 — making it one of the fastest-growing AI infrastructure repos ever. It lets you build chatbots, AI workflows, autonomous agents, and RAG pipelines through a visual editor, then expose them as REST APIs. The self-hosted Community Edition is completely free with no usage limits, no per-message credits, and no data leaving your server.
If you've been paying for OpenAI Assistants, LangSmith, or PromptLayer for prototyping — or building LLM app infrastructure from scratch — Dify is worth 30 minutes of your time.
TL;DR
Dify is the most polished no-code AI application builder available for self-hosting in 2026. It handles the entire LLM stack: model provider integrations (OpenAI, Anthropic, Ollama, DeepSeek, and dozens more), RAG over your documents, a visual workflow builder with Python/JS code nodes, agent reasoning strategies, and a built-in chat UI. The self-hosted Community Edition is fully free. The trade-off vs. Flowise: Dify runs 11 Docker containers and needs 4 GB RAM minimum; for simple use cases, Flowise is lighter.
Key Takeaways
- 100,000+ GitHub stars (crossed Oct 2025); Apache 2.0 license with minor commercial restrictions
- v1.10.x current (2026); v1.0.0 introduced the plugin ecosystem in 2025
- Supported LLMs: OpenAI, Anthropic, Gemini, DeepSeek, Llama, Mistral, Ollama (local), any OpenAI-compatible endpoint
- Self-hosted Community Edition: no credit limits, no per-message fees, no data sent to Dify servers
- 11 Docker containers: API, worker, web, plugin daemon, PostgreSQL, Redis, Weaviate, Nginx, sandbox, SSRF proxy, worker beat
- Minimum requirements: 4 GB RAM, 2 CPU cores; 8 GB+ recommended for production with large knowledge bases
What You Can Build
Chatbots — Conversational apps with persistent memory, session history, and context management. Connect to any LLM provider, tune system prompts, and set context window limits. Deploy as a hosted chat UI or embed via iframe.
AI Workflows — Multi-step pipelines using a visual node editor. Nodes include: LLM (query a model), Code (Python/JS sandbox), HTTP Request (call external APIs), Knowledge Retrieval (RAG lookup), Tool (web search, calculator, etc.), Agent (autonomous reasoning), Conditional Branch, Loop, and Variable Aggregator. Build a pipeline that ingests a PDF, extracts data, transforms it with Python, queries a database, and returns a formatted report — all without writing a backend.
Autonomous Agents — The Agent Node gives an LLM planning capability using strategies like Chain-of-Thought, Tree-of-Thought, Graph-of-Thought, or BoT. The agent decides which tools to use, executes them, observes results, and iterates. Connect tools for web search, code execution, file reading, and custom HTTP endpoints.
RAG Pipelines — Upload documents (PDF, Word, PPT, Markdown, plain text, URLs), configure chunking and embedding, and query them with hybrid retrieval. Dify handles the full pipeline: ingestion → chunking → embedding → vector storage → retrieval → reranking. Built-in support for Weaviate (bundled), Qdrant, Milvus, Pinecone, pgvector, and Chroma.
APIs — Every Dify app generates a REST API endpoint with an API key. Use Dify as a Backend-as-a-Service layer: your front-end or other services call the Dify API, and Dify handles the LLM orchestration. OpenAI-compatible API format available for drop-in replacement.
Docker Compose Setup
```bash
# Clone the repository
git clone https://github.com/langgenius/dify.git
cd dify/docker

# Copy environment config
cp .env.example .env

# Edit .env — at minimum, set a secure SECRET_KEY
# SECRET_KEY=$(openssl rand -base64 42)

# Start all services
docker compose up -d
```
Access Dify at http://localhost (port 80 via Nginx). Create your admin account on first visit.
The 11 containers started:
```yaml
# Core application
api:           # Flask backend (langgenius/dify-api)
worker:        # Celery async worker
worker_beat:   # Celery scheduler
web:           # Next.js frontend (langgenius/dify-web)
plugin_daemon: # Plugin execution sandbox

# Infrastructure
db:            # PostgreSQL (primary database)
redis:         # Task queue + caching
weaviate:      # Bundled vector database
nginx:         # Reverse proxy (ports 80/443)
ssrf_proxy:    # Outbound HTTP proxy (SSRF protection)
sandbox:       # Isolated code execution for Code nodes
```
Production .env settings to configure:
```bash
# Required: generate a strong secret key
SECRET_KEY=your-generated-secret-key-here

# Set your domain for cookie security
CONSOLE_API_URL=https://dify.yourdomain.com
APP_API_URL=https://dify.yourdomain.com

# Storage: local (default) or S3/Azure/GCS for file uploads
STORAGE_TYPE=local
# For S3:
# STORAGE_TYPE=s3
# S3_BUCKET_NAME=your-bucket
# AWS_ACCESS_KEY_ID=...
# AWS_SECRET_ACCESS_KEY=...

# File size limits
UPLOAD_FILE_SIZE_LIMIT=50        # MB
UPLOAD_IMAGE_FILE_SIZE_LIMIT=10  # MB
```
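The commented `openssl rand -base64 42` line in the setup steps is one way to generate `SECRET_KEY`; if openssl isn't handy, a Python one-liner produces an equivalent value (the exact length isn't mandated by Dify — it just needs to be long and random):

```python
import base64
import secrets

# 42 random bytes, base64-encoded — same shape as `openssl rand -base64 42`
secret_key = base64.b64encode(secrets.token_bytes(42)).decode()
print(secret_key)
```

Paste the output into `.env` as `SECRET_KEY=...` before the first `docker compose up`.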
Connecting LLM Providers
After first login: Settings → Model Provider → add providers.
For cloud models (OpenAI, Anthropic, etc.), paste your API key. The model appears immediately in the workflow editor's model selector.
For local models via Ollama:
```bash
# First, set up Ollama (if not already running)
docker run -d \
  --name ollama \
  --gpus all \
  -p 11434:11434 \
  -v ollama:/root/.ollama \
  ollama/ollama

# Pull a model
docker exec ollama ollama pull qwen2.5:14b
```
In Dify Settings → Model Provider → Ollama:
- Base URL: http://host.docker.internal:11434 (macOS/Windows) or http://172.17.0.1:11434 (Linux, the default Docker bridge host IP)
- Select the models to use from the ones pulled in Ollama
This gives you fully private, offline LLM inference — no API keys, no data sent anywhere.
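The correct Base URL depends on where Docker is running; a tiny helper (hypothetical, it just encodes the rule stated above) makes the choice explicit, using `sys.platform`-style identifiers:

```python
def ollama_base_url(platform: str) -> str:
    """Return the Ollama base URL Dify should use, per host OS.

    Docker Desktop (macOS/Windows) exposes the host as
    host.docker.internal; on Linux the default Docker bridge
    gateway is 172.17.0.1.
    """
    if platform in ("darwin", "win32"):
        return "http://host.docker.internal:11434"
    return "http://172.17.0.1:11434"

print(ollama_base_url("linux"))  # http://172.17.0.1:11434
```

If you've changed Docker's bridge network or run Ollama on another machine, substitute that host instead.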
Building a RAG Knowledge Base
RAG (Retrieval Augmented Generation) is where Dify shines. Building a document Q&A system takes under 10 minutes:
- Knowledge tab → Create Knowledge Base
- Upload documents (PDF, Word, Markdown, URLs — up to 50 MB per file)
- Indexing mode: Economy (keyword only) or High Quality (embeddings, requires embedding API key)
- Retrieval setting: Semantic Search (vector), Full-text Search, or Hybrid (recommended)
- Enable Reranking for better result quality (requires a reranker API like Cohere)
- Set Top K (how many chunks to retrieve) and score threshold
Once indexed, attach the knowledge base to any chatbot or workflow node. The Knowledge Retrieval node in workflows takes a query string and returns the top matching chunks.
For fully local RAG (no external APIs):
- Use Ollama for both the chat LLM and embedding model
- Dify supports local embedding via Ollama's embedding endpoint
- All data stays on your server
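Dify configures chunking for you in the indexing step, but it helps to see what chunk size and overlap actually mean. A toy fixed-size chunker (illustrative only — not Dify's internal implementation, which is token- and separator-aware):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap.

    Overlap keeps content that straddles a chunk boundary
    retrievable from either neighboring chunk.
    """
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("x" * 1200, chunk_size=500, overlap=50)
print(len(chunks), [len(c) for c in chunks])
```

Larger chunks give the LLM more context per hit; smaller chunks make Top K retrieval more precise. Tune both against your actual documents.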
Dify vs Flowise vs LangFlow
| Dimension | Dify | Flowise | LangFlow |
|---|---|---|---|
| Target user | Business users + devs | Developers | Python developers |
| No-code UX | Best-in-class | Good | Steeper curve |
| Debug tooling | Full trace logs, version history | Minimal | Moderate |
| Nested workflows | Yes (loops, branches, sub-flows) | Limited | Yes |
| Plugin ecosystem | Yes (marketplace, v1.0+) | No | No |
| RAG | Built-in, rich | Plugin-based | Plugin-based |
| Resource usage | 4 GB+ RAM (11 containers) | ~1 GB RAM | ~2 GB RAM |
| Setup complexity | Moderate | Very simple | Moderate |
| License | Apache 2.0 + restrictions | MIT | MIT |
| Enterprise SSO | Yes (paid) | No | Limited |
Choose Dify if: You want the most polished builder with full MLOps observability, rich debugging, and the ability to let non-engineers build and deploy AI apps. Dify's workflow editor is genuinely better than its competitors.
Choose Flowise if: You want a lightweight single-container deployment with minimal setup. Flowise is the fastest path from zero to a working LangChain/LlamaIndex pipeline.
Choose LangFlow if: You're a Python developer who needs to modify component internals and want full code-level control over the pipeline.
The Plugin Ecosystem (v1.0+)
Dify's v1.0.0 release introduced a plugin marketplace — tools, model providers, and agent strategies installable like browser extensions:
- Tools: web search, code execution, image generation, file operations, API connectors
- Model providers: new providers added via plugin (no Dify version upgrade required)
- Agent strategies: custom reasoning modules (beyond built-in CoT/ToT)
- Extensions: custom integrations for Slack, Notion, GitHub, Google Drive
Install from the Marketplace (built into the UI) or via plugin URL. Community-contributed plugins follow the same sandbox architecture as built-in tools.
Exposing Dify as an API
Every app gets an API endpoint accessible via the Dify backend URL:
```python
import requests

# Chat with a Dify chatbot app
response = requests.post(
    "http://your-dify-server/v1/chat-messages",
    headers={
        "Authorization": "Bearer your-app-api-key",
        "Content-Type": "application/json",
    },
    json={
        "inputs": {},
        "query": "Summarize the Q3 financial report",
        "response_mode": "blocking",
        "conversation_id": "",
        "user": "user-123",
    },
)
print(response.json()["answer"])
```
For streaming responses (real-time output):
```python
json={
    "response_mode": "streaming",  # Returns SSE stream
    ...
}
```
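In streaming mode the response body is a server-sent event stream: each event is a `data: {...}` line carrying an `event` type and, for message events, an `answer` fragment. A minimal parser sketch (event shapes assumed from the SSE convention — check the Dify API docs for the full event list):

```python
import json

def collect_answer(sse_lines):
    """Concatenate the `answer` fragments from Dify-style SSE lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and keep-alives
        event = json.loads(line[len("data: "):])
        if event.get("event") == "message":
            parts.append(event.get("answer", ""))
    return "".join(parts)

# Canned events instead of a live HTTP stream, for illustration:
sample = [
    'data: {"event": "message", "answer": "Hello"}',
    'data: {"event": "message", "answer": ", world"}',
    'data: {"event": "message_end"}',
]
print(collect_answer(sample))  # Hello, world
```

With `requests`, pass `stream=True` and feed `response.iter_lines(decode_unicode=True)` into a parser like this to render tokens as they arrive.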
The OpenAI-compatible API lets you swap in Dify for any app already using the OpenAI SDK — just change the base_url to your Dify server and the api_key to your app's API key.
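The swap amounts to changing exactly two client settings; a small helper (hypothetical name, the `/v1` path matching the endpoint shown above) captures it:

```python
def openai_compat_config(dify_server: str, app_api_key: str) -> dict:
    """Settings an OpenAI-SDK app needs to point at Dify instead.

    Only base_url and api_key change; request and response
    shapes stay OpenAI-compatible.
    """
    return {
        "base_url": f"{dify_server.rstrip('/')}/v1",
        "api_key": app_api_key,
    }

cfg = openai_compat_config("http://your-dify-server/", "app-xxxx")
print(cfg["base_url"])  # http://your-dify-server/v1
```

Pass these into your existing OpenAI client constructor and the rest of the application code stays untouched.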
MCP Protocol Support
Dify added HTTP-based MCP (Model Context Protocol, spec 2025-03-26) support in 2025. This means:
- External MCP clients (Claude Desktop, other MCP hosts) can invoke Dify workflows as tools
- Dify agents can consume external MCP servers as tools
- Interoperability with the growing MCP ecosystem (GitHub, filesystem, databases) without custom integration code
This is significant for homelab and enterprise deployments where you want Dify to serve as a central AI orchestration layer that other agents and tools connect to.
Self-Hosted vs. Dify Cloud
| | Community Edition (Self-Hosted) | Cloud Professional | Cloud Team |
|---|---|---|---|
| Price | Free | $59/month | $159/month |
| Message credits | Unlimited | 5,000/month | 10,000/month |
| Apps | Unlimited | 50 | Unlimited |
| Vector storage | Unlimited (your disk) | 5 GB | 20 GB |
| Documents | Unlimited | 500 | 1,000 |
| SSO/SAML | Enterprise license | No | Yes |
| Data residency | Your server | Dify servers | Dify servers |
For privacy-sensitive use cases — medical records, legal documents, proprietary code — self-hosted Community Edition is the only option that keeps data on your infrastructure. The unlimited usage is a genuine advantage over the credit-based cloud tiers.
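The credit math from the table is worth doing explicitly. A back-of-envelope sketch (the VPS price is a hypothetical example, and this ignores your own time and hardware depreciation):

```python
# Effective per-message cost on the Cloud Professional tier
plan_price = 59.0  # USD/month
credits = 5_000    # message credits/month

per_message = plan_price / credits
print(f"${per_message:.4f} per message")

# Messages/month at which a $20/month VPS beats the plan on price alone
vps_cost = 20.0  # hypothetical small VPS
breakeven = vps_cost / per_message
print(f"break-even ≈ {breakeven:.0f} messages/month")
```

At any sustained volume past that break-even point, the self-hosted edition wins on marginal cost, before even counting the data-residency benefit.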
When to Use Dify
Use Dify if:
- You want to build AI-powered apps without writing a custom backend
- You need RAG over internal documents without sending them to OpenAI's servers
- Your team includes non-engineers who need to modify AI prompts and workflows
- You want to compare LLM providers side-by-side with the same workflow
- You're building on local models (Ollama) for complete privacy
Skip Dify if:
- You only need a simple chatbot with no workflow logic (use Open WebUI directly)
- You need extreme customization of LangChain/LlamaIndex pipeline internals (use LangFlow)
- Your VPS has under 4 GB RAM (use Flowise instead)
- You need enterprise SSO without paying for the enterprise license
Browse all AI agent alternatives at OSSAlt. Related: Activepieces vs n8n automation comparison, self-hosted LLM guide with DeepSeek and Qwen.
How to Keep a Private AI Stack Useful After Launch
The hard part of a self-hosted AI stack is not getting the first model to answer a prompt. The hard part is building a system people continue to trust after the novelty fades. That means choosing a narrow set of approved models, documenting which one is the default for chat, extraction, and coding, and instrumenting latency so users know whether a bad answer came from the model itself or from an overloaded GPU. Teams that skip this governance stage often end up with a chaotic playground: five half-configured models, two abandoned vector stores, and nobody certain which workflow should be used for production tasks. A better pattern is to define tiers. Use a fast local model for internal drafting, a stronger model for longer-form reasoning, and a deterministic workflow layer for retrieval, approvals, and handoff.
This is also why adjacent tooling matters more than model benchmarks suggest. Dify is useful when you need repeatable workflows, prompt versioning, and API exposure rather than just a chat box. n8n matters because many valuable AI automations are not conversational at all; they are document triage, summarization, enrichment, and notification chains triggered by ordinary business events. And Authentik closes a gap that many AI teams ignore: once the stack contains internal docs, tickets, and customer data, you need role-aware access and auditability instead of a shared admin password on a sidecar dashboard.
Where Self-Hosted AI Wins and Where It Still Does Not
Self-hosted AI clearly wins when privacy, marginal cost, and workflow control dominate the decision. It is hard to justify sending internal runbooks, legal drafts, or product strategy documents to a third-party model API if a competent local setup handles the workload acceptably. The economics are also favorable for high-volume teams. Once the hardware is purchased or rented, the per-query cost becomes predictable, and experimentation becomes cheaper because nobody is afraid of API burn from testing prompts and embeddings. That changes behavior. Teams iterate more, keep more institutional knowledge in retrieval systems, and are more willing to build automations around routine analysis.
Where self-hosted AI still loses is turnkey convenience at the very top end of model quality. Frontier hosted models remain easier to access and often stronger for ambiguous reasoning, multimodal synthesis, and long-context work. The mature way to handle this is not ideology. It is workload routing. Keep sensitive, repetitive, and operationally embedded tasks on your infrastructure. Reserve external APIs for the few cases where a measurable quality gap justifies the trade-off. Articles on self-hosted AI are stronger when they acknowledge that split, because that is how experienced teams actually deploy these systems.