# Self-Hosting Grafana for Observability (2026)
Grafana is the industry-standard open source observability platform — dashboards, alerting, and data exploration across metrics, logs, and traces. Self-hosting gives you unlimited dashboards, data sources, and users.
## Requirements

- VPS with 2 GB RAM minimum (4 GB with Prometheus + Loki)
- Docker and Docker Compose
- Domain name (e.g., grafana.yourdomain.com)
- 20+ GB disk
## Step 1: Create Docker Compose

Create `docker-compose.yml`:

```yaml
services:
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SERVER_ROOT_URL=https://grafana.yourdomain.com
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=your-admin-password
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SMTP_ENABLED=true
      - GF_SMTP_HOST=smtp.resend.com:587
      - GF_SMTP_USER=resend
      - GF_SMTP_PASSWORD=re_your_api_key
      - GF_SMTP_FROM_ADDRESS=grafana@yourdomain.com

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - prometheus_data:/prometheus
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'

volumes:
  grafana_data:
  prometheus_data:
```
## Step 2: Configure Prometheus

Create `prometheus.yml`:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Server metrics
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  # Add your application metrics
  # - job_name: 'my-app'
  #   static_configs:
  #     - targets: ['my-app:8080']
  #   metrics_path: /metrics
```
## Step 3: Start the Stack

```bash
docker compose up -d
```
## Step 4: Reverse Proxy (Caddy)

Add a site block to `/etc/caddy/Caddyfile`:

```
grafana.yourdomain.com {
    reverse_proxy localhost:3000
}
```

Then restart Caddy:

```bash
sudo systemctl restart caddy
```
## Step 5: Add Data Sources

- Open https://grafana.yourdomain.com
- Log in with the admin credentials
- Go to Connections → Data sources → Add data source
| Data Source | URL | Use Case |
|---|---|---|
| Prometheus | http://prometheus:9090 | Metrics (CPU, RAM, custom) |
| Loki | http://loki:3100 | Log aggregation |
| PostgreSQL | host:5432 | Database metrics |
| InfluxDB | http://influxdb:8086 | Time series data |
| Elasticsearch | http://elasticsearch:9200 | Logs and search |
| CloudWatch | AWS credentials | AWS metrics |
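Data sources can also be provisioned from files rather than clicked together in the UI, which makes rebuilds reproducible. A sketch, assuming you mount a local `./provisioning` directory to `/etc/grafana/provisioning` in the grafana service:

```yaml
# ./provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```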
## Step 6: Import Pre-Built Dashboards
Grafana has 1000+ community dashboards at grafana.com/grafana/dashboards.
Essential dashboards to import:
| Dashboard ID | Name | For |
|---|---|---|
| 1860 | Node Exporter Full | Server metrics |
| 3662 | Prometheus Overview | Prometheus health |
| 14055 | Docker Containers | Container metrics |
| 12708 | PostgreSQL | Database metrics |
| 763 | Redis | Redis metrics |
To import:
- Dashboards → New → Import
- Enter the dashboard ID
- Select your data source
- Click Import
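Imported dashboards can likewise be provisioned from JSON files on disk, so they survive a rebuild of the container. A sketch, assuming dashboard JSON is stored in a mounted `/var/lib/grafana/dashboards` directory:

```yaml
# ./provisioning/dashboards/default.yml
apiVersion: 1
providers:
  - name: default
    type: file
    options:
      path: /var/lib/grafana/dashboards
```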
## Step 7: Create Custom Dashboards

Example: an application metrics dashboard.

- Dashboards → New Dashboard → Add visualization
- Select the Prometheus data source
- Use PromQL queries:

```promql
# CPU usage per container
rate(container_cpu_usage_seconds_total[5m]) * 100

# Memory usage per container, in MB
container_memory_usage_bytes / 1024 / 1024

# HTTP request rate
rate(http_requests_total[5m])

# HTTP error rate (5xx)
rate(http_requests_total{status=~"5.."}[5m])

# Request latency (p95)
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
```

Note that the `container_*` metrics require a cAdvisor exporter (not part of the stack above), and the `http_*` metrics only exist if your application is instrumented to export them.
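The p95 query relies on `histogram_quantile`, which linearly interpolates within cumulative histogram buckets. A minimal Python sketch of the idea (illustrative only, not the Prometheus implementation; the bucket data is made up):

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs sorted by
    bound, ending with (inf, total). Mirrors PromQL's approach of
    linear interpolation inside the bucket containing the target rank.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                # Rank falls in the +Inf bucket: the best we can
                # report is the highest finite bound.
                return prev_bound
            # Interpolate the rank's position within this bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

# 100 requests: 50 took <= 0.1s, 90 took <= 0.5s, all took <= 1.0s
latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (float("inf"), 100)]
print(round(histogram_quantile(0.95, latency_buckets), 4))  # 0.75
```

This is also why p95 accuracy depends on bucket boundaries: the estimate can never be more precise than the bucket the rank lands in.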
## Step 8: Set Up Alerting

- Go to Alerting → Alert rules → New alert rule
- Define your condition
Example alerts:
| Alert | Condition | Severity |
|---|---|---|
| High CPU | CPU > 80% for 5 min | Warning |
| Disk full | Disk usage > 90% | Critical |
| Service down | Up metric = 0 for 1 min | Critical |
| High error rate | 5xx > 1% of requests | Warning |
| Memory pressure | RAM > 90% for 10 min | Warning |
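For reference, the conditions in the table translate to PromQL expressions along these lines (a sketch: the `node_*` metrics come from node-exporter, while `http_requests_total` assumes your application exports a counter with a `status` label):

```promql
# High CPU: busy fraction across all cores above 80%
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80

# Disk full: root filesystem above 90%
(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 90

# Service down
up == 0

# High error rate: 5xx above 1% of all requests
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01

# Memory pressure: above 90% used
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
```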
- Configure Contact points (where alerts go):
- Email (via SMTP)
- Slack webhook
- Discord webhook
- PagerDuty
- Telegram
## Step 9: Add Log Aggregation with Loki (Optional)

Add to `docker-compose.yml` (and register `loki_data` under the top-level `volumes:` key, alongside `grafana_data` and `prometheus_data`, or Compose will reject the file):

```yaml
  loki:
    image: grafana/loki:latest
    container_name: loki
    restart: unless-stopped
    ports:
      - "3100:3100"
    volumes:
      - loki_data:/loki
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:latest
    container_name: promtail
    restart: unless-stopped
    volumes:
      - /var/log:/var/log:ro
      - ./promtail-config.yml:/etc/promtail/config.yml
    command: -config.file=/etc/promtail/config.yml
```
Create `promtail-config.yml`:

```yaml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          __path__: /var/log/*.log
```
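The static `/var/log` scrape covers host logs only. To also ship container logs, Promtail supports Docker service discovery; a sketch, assuming you additionally mount `/var/run/docker.sock` into the promtail container:

```yaml
# Additional entry under scrape_configs in promtail-config.yml
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      # Label each log stream with its container name (strip the leading /)
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container
```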
## Production Hardening

Backups:

```bash
# Grafana (dashboards, users, settings)
docker cp grafana:/var/lib/grafana/grafana.db /backups/grafana-$(date +%Y%m%d).db

# Prometheus data (if needed; usually regenerated from exporters)
docker run --rm -v prometheus_data:/data -v /backups:/backup alpine \
  tar czf /backup/prometheus-$(date +%Y%m%d).tar.gz /data
```
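To run the Grafana backup nightly and prune old copies, a cron sketch (the schedule, paths, and 14-day retention are examples; note that `%` must be escaped as `\%` in crontab entries):

```
# /etc/cron.d/grafana-backup
0 3 * * * root docker cp grafana:/var/lib/grafana/grafana.db /backups/grafana-$(date +\%Y\%m\%d).db
30 3 * * * root find /backups -name 'grafana-*.db' -mtime +14 -delete
```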
Data retention:

```yaml
# Prometheus: keep 30 days (already set in the Compose command flags above)
command:
  - '--storage.tsdb.retention.time=30d'

# Loki: configure retention in the Loki config
# limits_config:
#   retention_period: 744h  # 31 days
```
Updates:

```bash
docker compose pull
docker compose up -d
```
Security:

- Disable sign-up: `GF_USERS_ALLOW_SIGN_UP=false`
- Set up OIDC auth for team access
- Restrict Prometheus/Loki to the internal network
- Use read-only data source connections
## Resource Usage
| Stack | RAM | CPU | Disk |
|---|---|---|---|
| Grafana only | 512 MB | 1 core | 5 GB |
| Grafana + Prometheus | 2 GB | 2 cores | 20 GB |
| Full stack (+ Loki) | 4 GB | 4 cores | 50 GB |
## VPS Recommendations
| Provider | Spec (full stack) | Price |
|---|---|---|
| Hetzner | 4 vCPU, 8 GB RAM | €8/month |
| DigitalOcean | 2 vCPU, 4 GB RAM | $24/month |
| Linode | 2 vCPU, 4 GB RAM | $24/month |
## Why Self-Host Grafana
Grafana Cloud's free tier limits you to 10,000 active series, 50 GB of logs, and 50 GB of traces — limits that a single production application can exceed within days of launch. The Pro plan starts at $8/month but scales linearly with usage. A team monitoring 5 microservices across staging and production can easily hit $50–100/month just in Grafana Cloud costs, before you factor in separate charges for Prometheus-compatible metric storage.
Self-hosted Grafana is free regardless of how many dashboards, users, or data sources you connect. The only cost is your server — a Hetzner CX21 (€3.79/month, 2 vCPU, 2 GB RAM) handles Grafana for teams of up to 20 users with room to spare. Add Prometheus and Loki on the same server and you have a complete observability stack for under €10/month.
Data sovereignty: Grafana Cloud routes your metrics and logs through Grafana Labs' infrastructure. If you're under GDPR, HIPAA, or SOC 2 compliance requirements, self-hosting keeps sensitive telemetry — which may include user IDs, request paths, and business metrics — within your own infrastructure. Financial services, healthcare, and SaaS companies with enterprise customers increasingly find self-hosting non-negotiable for their monitoring stack.
Plugin ecosystem: Grafana has 300+ community plugins. Some — including specialized data source connectors and custom authentication plugins — are only available when self-hosting. Grafana Cloud restricts plugin installation and frequently limits which plugins are available on lower-tier plans.
When NOT to self-host Grafana: If you have fewer than 5 services to monitor, Grafana Cloud's free tier is probably sufficient. The operational overhead of managing your own Grafana instance — updates, backups, SSL, data retention tuning — isn't worth it at small scale. Also consider the managed option if your team has no DevOps capacity; Grafana on a VPS requires active maintenance and is not "set and forget."
## Prerequisites (Expanded)
Understanding what each requirement actually means before you start helps avoid mid-deployment surprises.
2 GB RAM minimum (4 GB with Prometheus + Loki): The RAM requirement isn't Grafana itself — Grafana alone runs comfortably in 256 MB. The larger footprint comes from Prometheus, which holds its time-series data in memory for fast querying, and Loki, which buffers log ingestion. For a production stack with all three services, plan for 4 GB minimum. If you're only running Grafana with external data sources (like a hosted Prometheus), 1 GB is workable.
20+ GB disk: Prometheus compresses metrics well, but data still accumulates. With a 30-day retention window and moderate scrape frequency (15s), expect 5–15 GB of Prometheus data. Loki log storage varies dramatically by log verbosity — a chatty application can generate gigabytes per day. Start with 40 GB and monitor disk usage with Grafana's Node Exporter dashboard.
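As a rough sanity check, Prometheus's documentation estimates about 1–2 bytes per sample after compression, so disk use can be estimated from active series count, scrape interval, and retention. A back-of-the-envelope sketch (the series count below is illustrative):

```python
def prometheus_disk_gb(active_series, scrape_interval_s=15,
                       retention_days=30, bytes_per_sample=2):
    """Estimate Prometheus disk use: samples/sec x bytes/sample x retention.

    bytes_per_sample=2 is the conservative end of the ~1-2 bytes/sample
    figure from the Prometheus storage docs.
    """
    samples_per_sec = active_series / scrape_interval_s
    total_bytes = samples_per_sec * bytes_per_sample * retention_days * 86_400
    return total_bytes / 1024**3

# e.g. ~30,000 active series (a few node-exporter hosts plus an
# instrumented application) at a 15s scrape interval, 30-day retention
print(round(prometheus_disk_gb(30_000), 1))  # 9.7
```

This lines up with the 5–15 GB range above; double the series count or halve the scrape interval and the estimate doubles.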
Ubuntu 22.04 LTS is the recommended OS: its repositories carry Docker packages that work with Compose v2, it has extensive community troubleshooting documentation, and it receives security patches through 2027. Debian 12 is a solid alternative.
For choosing the right VPS for your observability stack — particularly if you're deciding between Hetzner, DigitalOcean, and Vultr — see the VPS comparison for self-hosters. Network egress pricing matters when exporters are sending metrics from multiple servers.
## Production Security Hardening
Grafana with weak security is a significant risk — it has access to metrics from every service you're monitoring, and those metrics often reveal sensitive operational details. Take the following hardening steps seriously.
Firewall with UFW: Expose only the ports that need to be public. Prometheus (9090) and node-exporter (9100) should never be reachable from the internet.

```bash
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
```

Be aware that ports published by Docker bypass UFW, because Docker inserts its own iptables rules. To keep Prometheus and node-exporter truly internal, remove their `ports:` entries from the Compose file, or bind them to localhost only (e.g. `"127.0.0.1:9090:9090"`).
Fail2ban: Brute-force attacks on Grafana's login page are common once the service is publicly accessible.

```bash
sudo apt install fail2ban
```

Create `/etc/fail2ban/filter.d/grafana.conf`:

```ini
[Definition]
failregex = logger=context userId=0 orgId=0 uname= t=\S+ level=warn msg="Invalid username or password" \S+ remote_addr=<HOST>
ignoreregex =
```
Add to `/etc/fail2ban/jail.local`:

```ini
[grafana]
enabled = true
port = http,https
filter = grafana
logpath = /var/log/grafana/grafana.log
maxretry = 5
bantime = 3600
```

Note that Dockerized Grafana logs to stdout by default; for fail2ban to see a log file, mount a host directory at `/var/lib/grafana`'s log path and set `GF_LOG_MODE=file`, or point `logpath` at wherever your logs actually land.
Keep secrets out of docker-compose.yml: Store GF_SECURITY_ADMIN_PASSWORD, SMTP credentials, and any API keys in a .env file that is excluded from version control. Never commit credentials to a Git repository.
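A sketch of the pattern (the variable names match the Compose file above; Docker Compose reads a `.env` file in the project directory automatically):

```
# .env (add this file to .gitignore)
GF_SECURITY_ADMIN_PASSWORD=your-admin-password
GF_SMTP_PASSWORD=re_your_api_key
```

Then reference the values in `docker-compose.yml` as `- GF_SECURITY_ADMIN_PASSWORD=${GF_SECURITY_ADMIN_PASSWORD}` and so on, so the literal credentials never appear in the committed file.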
Disable SSH password authentication: After confirming key-based SSH access works:

```bash
sudo sed -i 's/#PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart sshd
```
Restrict Prometheus access: Prometheus has no authentication by default. Add a reverse proxy with basic auth in front of it, or restrict it to Docker's internal network only (remove the ports entry from the Compose service definition).
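If Prometheus genuinely must be reachable from outside, a Caddy basic-auth sketch (the hostname and bcrypt hash are placeholders; generate a real hash with `caddy hash-password`):

```
prometheus.yourdomain.com {
    basic_auth {
        admin $2a$14$replace-with-your-bcrypt-hash
    }
    reverse_proxy localhost:9090
}
```

The directive is spelled `basic_auth` in Caddy 2.8+; older Caddy 2.x versions use `basicauth`.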
Automatic security updates:

```bash
sudo apt install unattended-upgrades
sudo dpkg-reconfigure --priority=low unattended-upgrades
```
For a complete hardening checklist covering log monitoring, certificate management, and network security for self-hosted services, see the self-hosting security checklist.
## Troubleshooting Common Issues

### Grafana shows "Data source not found" for Prometheus

The most common cause is a typo in the Prometheus URL. When Grafana and Prometheus share a Docker Compose network, the correct URL is http://prometheus:9090, using the service name rather than localhost. If you named the service differently in your Compose file, use that name. Verify connectivity with:

```bash
docker exec grafana wget -qO- http://prometheus:9090/-/healthy
```
### Prometheus can't scrape targets ("connection refused")

Check that the scrape target is actually reachable from inside the Prometheus container. A common mistake is scraping localhost:9100 for node-exporter when the two run in different containers; use the Docker service name (node-exporter:9100) instead. Also confirm the target port isn't firewalled: node-exporter's port 9100 should be reachable on Docker's internal network but not exposed publicly.
### Grafana dashboards show "No data" after importing

Dashboard templates reference a data source by name. If your Prometheus data source is named "Prometheus" but the dashboard template expects "prometheus" (lowercase) or "default", queries return no data. Go to Dashboard settings → Variables and update the data source variable to match your actual data source name.
### Container uses too much disk space

Prometheus stores data in the `prometheus_data` Docker volume. Without a retention policy, it accumulates indefinitely. Check volume sizes with `docker system df -v`, and add `--storage.tsdb.retention.time=30d` to the Prometheus command flags to limit retention. For Loki, configure `retention_period` in the Loki config file.
### Alerts fire but notifications aren't being sent

Test your notification channel from Alerting → Contact points → Test. If the test succeeds but real alerts don't notify, check that the alert rule's notification policy routes to the correct contact point. Also verify the SMTP credentials if using email alerts; authentication failures show up in Grafana's logs:

```bash
docker compose logs grafana | grep -i smtp
```
### Grafana is slow or unresponsive under load

Grafana itself is lightweight. Slowness usually originates from expensive PromQL queries running against large Prometheus datasets. Use the Query Inspector in dashboard edit mode to measure query execution time. Keep `rate()` windows proportionate to your scrape interval, and avoid querying a wider time range than needed. For very large datasets, consider Thanos or Grafana Mimir (the successor to Cortex) as a long-term Prometheus storage backend.
## Extending Grafana with Additional Data Sources
Grafana's power comes from its breadth of data source integrations. Beyond Prometheus and Loki, the plugin ecosystem connects Grafana to dozens of specialized data stores.
InfluxDB is a popular time-series database that pairs naturally with Grafana — many IoT and hardware monitoring setups use InfluxDB + Grafana + Telegraf as an alternative to the Prometheus stack. PostgreSQL and MySQL can be queried directly in Grafana, making it useful for business dashboards that pull from application databases. Elasticsearch integrations bring log search capabilities that complement Loki for cases where full-text search is needed.
Alertmanager integration is where Grafana's alerting capabilities fully emerge. Rather than managing alert rules and routing separately in Alertmanager, Grafana's unified alerting (introduced in Grafana 8 and the default since Grafana 9) centralizes alert definitions, silences, and routing in the Grafana UI. Contact points can route to Slack, PagerDuty, email, OpsGenie, Telegram, or any webhook endpoint. Alert rules can reference any configured data source, so you can alert on Prometheus metrics, Loki log patterns, and database queries from the same interface.
Dashboard sharing deserves mention for teams. Grafana supports public dashboards (accessible without login) for sharing infrastructure status with stakeholders, embedded panels in other web applications via iframe, and snapshot URLs that capture a dashboard's current state for sharing in incident reports. These features make Grafana useful not just for monitoring but as a reporting and communication tool.
For a broader view of observability tooling, see the best open source monitoring tools — Netdata, Zabbix, and Uptime Kuma each serve different niches that complement Grafana's dashboarding layer.
Set up automated server backups with restic to protect your Grafana dashboards and Prometheus data.
Compare monitoring and observability tools on OSSAlt — features, data sources, and pricing side by side.
See open source alternatives to Grafana on OSSAlt.