Grafana + Prometheus + Loki: Self-Hosted Observability Stack 2026
TL;DR
The Grafana + Prometheus + Loki stack is the open source equivalent of Datadog or New Relic — complete observability for your servers and containers with no per-seat or per-host pricing. Prometheus collects metrics, Loki aggregates logs, and Grafana visualizes everything on dashboards. Setup takes ~30 minutes with Docker Compose. Pre-built dashboards for common scenarios are available at grafana.com/grafana/dashboards.
Key Takeaways
- Prometheus (~55K stars): Pulls metrics from your services and stores them as time-series data
- Grafana (~62K stars): Dashboards and visualization for Prometheus + Loki + 40+ other data sources
- Loki (~23K stars): Log aggregation — like Prometheus but for logs
- Resource usage: The full stack, including node_exporter and cAdvisor, idles at roughly 475MB RAM, so budget at least 512MB
- No per-host pricing: Unlike Datadog ($15–35/host/month), this runs for server cost only
- Pre-built dashboards: Import Node Exporter Full (ID: 1860) for instant host visibility
Stack Overview
Your servers/containers
    ↓ metrics exposed at /metrics
node_exporter   ← host CPU/RAM/disk/network
cAdvisor        ← Docker container metrics
Your app        ← custom Prometheus metrics
    ↓ Prometheus scrapes every 15s
Prometheus      ← stores time-series metrics (15-day default retention; set to 30d below)
    ↓ Promtail ships logs
Promtail        ← reads Docker container logs → sends to Loki
Loki            ← stores and indexes logs by labels
    ↑ Grafana queries both
Grafana         ← dashboards, alerting, visualization
Docker Compose: Full Stack
# docker-compose.yml
version: '3.8'
volumes:
  prometheus_data:
  grafana_data:
  loki_data:
networks:
  monitoring:
    driver: bridge
services:
  # ─── Prometheus (metrics store) ────────────────────────────────────
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'  # Keep 30 days of metrics
      - '--web.enable-lifecycle'             # Allow hot-reload
    ports:
      - "9090:9090"
    networks:
      - monitoring
  # ─── Node Exporter (host metrics) ──────────────────────────────────
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - "9100:9100"
    networks:
      - monitoring
  # ─── cAdvisor (Docker container metrics) ───────────────────────────
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring
  # ─── Loki (log storage) ────────────────────────────────────────────
  loki:
    image: grafana/loki:latest
    container_name: loki
    restart: unless-stopped
    volumes:
      - ./loki/loki-config.yml:/etc/loki/local-config.yaml:ro
      - loki_data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    ports:
      - "3100:3100"
    networks:
      - monitoring
  # ─── Promtail (log shipper) ────────────────────────────────────────
  promtail:
    image: grafana/promtail:latest
    container_name: promtail
    restart: unless-stopped
    volumes:
      - ./promtail/promtail-config.yml:/etc/promtail/config.yml:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro  # Needed by docker_sd_configs in promtail-config.yml
    command: -config.file=/etc/promtail/config.yml
    networks:
      - monitoring
  # ─── Grafana (dashboards) ──────────────────────────────────────────
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
      - GF_SERVER_ROOT_URL=https://grafana.yourdomain.com
      - GF_SMTP_ENABLED=true  # Optional: alerting emails
      - GF_SMTP_HOST=${SMTP_HOST}
      - GF_SMTP_USER=${SMTP_USER}
      - GF_SMTP_PASSWORD=${SMTP_PASSWORD}
      - GF_SMTP_FROM_ADDRESS=grafana@yourdomain.com
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    ports:
      - "3000:3000"
    networks:
      - monitoring
    depends_on:
      - prometheus
      - loki
# .env
GRAFANA_PASSWORD=your-strong-password
SMTP_HOST=smtp.yourdomain.com:587
SMTP_USER=grafana@yourdomain.com
SMTP_PASSWORD=your-smtp-password
Configuration Files
prometheus/prometheus.yml
global:
  scrape_interval: 15s      # How often to scrape targets
  evaluation_interval: 15s  # How often to evaluate alert rules
rule_files:
  - /etc/prometheus/alerts.yml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
  # Add your own services:
  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp:8000']  # If your app exposes /metrics
    metrics_path: '/metrics'
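The `myapp` job assumes the application serves metrics in the Prometheus text exposition format at `/metrics`. As a rough illustration of what that endpoint returns, here is a minimal sketch using only the Python standard library. The metric name `myapp_requests_total`, the `REQUEST_COUNTS` dictionary, and port 8000 are assumptions chosen to match the scrape target above; a real service would more likely use an official Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real app would increment these per request.
REQUEST_COUNTS = {"GET": 0, "POST": 0}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP myapp_requests_total Total HTTP requests handled.",
        "# TYPE myapp_requests_total counter",
    ]
    for method, value in sorted(counts.items()):
        lines.append(f'myapp_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counters at /metrics and 404s everything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving /metrics on the given port (assumed: 8000)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` inside a container named `myapp` on the `monitoring` network would make the `myapp:8000` target scrapeable.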
prometheus/alerts.yml
groups:
  - name: infrastructure
    rules:
      - alert: HighCPULoad
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU is {{ $value | humanize }}% for 5 minutes"
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory on {{ $labels.instance }}"
          description: "Memory is {{ $value | humanize }}% used"
      - alert: DiskSpaceLow
        expr: (1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100 > 85
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "{{ $labels.mountpoint }} is {{ $value | humanize }}% full"
      - alert: ContainerDown
        # Fires per container once cAdvisor has not seen it for over a minute.
        # absent(container_last_seen{...}) would only fire if every container series disappeared.
        expr: time() - container_last_seen{name=~".+"} > 60
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} is down"
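To make the arithmetic behind the HighCPULoad and HighMemoryUsage expressions concrete, the sketch below mirrors them in plain Python. It is a simplification: `idle_rate` only approximates what PromQL's `rate()` does (no counter-reset handling or extrapolation), and the sample numbers are made up for illustration.

```python
def idle_rate(first, last, window_seconds):
    """Per-second increase of an idle-CPU counter over a window (simplified rate())."""
    return (last - first) / window_seconds


def cpu_busy_percent(idle_rates):
    """Mirror of: 100 - avg(rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100."""
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


def memory_used_percent(mem_available_bytes, mem_total_bytes):
    """Mirror of: (1 - MemAvailable / MemTotal) * 100."""
    return (1 - mem_available_bytes / mem_total_bytes) * 100


# Illustrative samples: two cores, each idle for 15 s of a 120 s window.
rates = [idle_rate(1000.0, 1015.0, 120), idle_rate(2000.0, 2015.0, 120)]
cpu = cpu_busy_percent(rates)                    # 87.5, above the 80% threshold
mem = memory_used_percent(1 * 1024**3, 8 * 1024**3)  # 87.5, above the 85% threshold
```

In both cases the alert would only fire if the condition held for the full `for: 5m` duration.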
loki/loki-config.yml
auth_enabled: false
server:
  http_listen_port: 3100
ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
  filesystem:
    directory: /loki/chunks
limits_config:
  retention_period: 30d
  ingestion_rate_mb: 16
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem  # Loki 3.x rejects retention_enabled without a delete request store
promtail/promtail-config.yml
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  # Collect all Docker container logs
  # (requires /var/run/docker.sock to be mounted into the Promtail container):
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_image']
        target_label: 'image'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'service'
  # Collect system logs:
  - job_name: syslog
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log
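Promtail delivers logs by POSTing to the `/loki/api/v1/push` URL configured under `clients`. To show roughly what such a request looks like, and to give a way of sending a test line without Promtail, here is a hedged sketch using the Python standard library. The helper names `build_push_payload` and `push_logs` and the example labels are hypothetical; only the endpoint path comes from the config above.

```python
import json
import time
import urllib.request


def build_push_payload(labels, lines, now_ns=None):
    """Build the JSON body for Loki's /loki/api/v1/push endpoint."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Each entry is [timestamp in nanoseconds as a string, log line].
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


def push_logs(loki_url, labels, lines):
    """POST log lines to Loki and return the HTTP status (204 indicates success)."""
    body = json.dumps(build_push_payload(labels, lines)).encode("utf-8")
    req = urllib.request.Request(
        loki_url.rstrip("/") + "/loki/api/v1/push",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

For example, `push_logs("http://localhost:3100", {"job": "manual-test"}, ["hello from python"])` should make the line searchable in Grafana with `{job="manual-test"}`, assuming Loki's port is published as in the Compose file.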
Start Everything
# Create config directories:
mkdir -p prometheus loki grafana/provisioning/datasources promtail
# Place config files as above, then:
docker compose up -d
# Verify services are up:
docker compose ps
# Check Prometheus targets:
# Open http://your-server:9090/targets — all should be UP
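The manual checks above can also be scripted. The sketch below polls each service's health endpoint (`/-/healthy` for Prometheus, `/ready` for Loki, `/api/health` for Grafana) on the host ports published by the Compose file. Using `localhost` is an assumption; substitute your server's address. Loki can report not-ready for a short while after startup, so a failed first check is not necessarily a problem.

```python
import urllib.error
import urllib.request

# Health endpoints on the host ports published by the Compose file.
# "localhost" is an assumption; replace it with your server's address.
ENDPOINTS = {
    "prometheus": "http://localhost:9090/-/healthy",
    "loki": "http://localhost:3100/ready",
    "grafana": "http://localhost:3000/api/health",
}


def http_status(url, timeout=3):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def check_stack(endpoints=ENDPOINTS, fetch=http_status):
    """Map each service name to True if its health endpoint answered 200."""
    return {name: fetch(url) == 200 for name, url in endpoints.items()}
```

Running `print(check_stack())` on the Docker host gives a quick pass/fail per service; the `fetch` parameter exists so the function can be exercised without a live stack.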
Grafana Setup (First Login)
- Open http://your-server:3000
- Login: admin / your GRAFANA_PASSWORD
- Add Prometheus data source:
  - Connections → Data sources → Add data source → Prometheus (Configuration → Data Sources in older Grafana versions)
  - URL: http://prometheus:9090
  - Save & Test
- Add Loki data source:
  - Connections → Data sources → Add data source → Loki
  - URL: http://loki:3100
  - Save & Test
Provision Data Sources Automatically
# grafana/provisioning/datasources/datasources.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
Import Pre-Built Dashboards
Grafana has 1,000+ community dashboards. Import by ID in Dashboards → Import:
| Dashboard | ID | Description |
|---|---|---|
| Node Exporter Full | 1860 | Complete host metrics (CPU, RAM, disk, network) |
| Docker Monitoring | 193 | Docker container stats |
| cAdvisor | 14282 | Detailed container resource usage |
| Loki Dashboard | 13639 | Log volume and error rates |
| Nginx Metrics | 9614 | Nginx request rates, latencies |
Import steps: Dashboards → + New → Import → Enter ID → Load → Select Prometheus data source → Import
PromQL Examples
Query Prometheus directly or use in Grafana panels:
# CPU usage per server (percentage):
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory used (GB):
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / 1024^3
# Disk usage per mount:
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100
# Network traffic (Mbps in; bytes/s × 8 bits, divided by 10^6):
rate(node_network_receive_bytes_total[5m]) * 8 / 1e6
# Container CPU by name (% of one core):
sum(rate(container_cpu_usage_seconds_total{name!=""}[5m])) by (name) * 100
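Querying Prometheus directly means calling its HTTP API, where instant queries go to `/api/v1/query`. The following is a minimal sketch using the Python standard library; the helper names are hypothetical, and the parser assumes the result is an instant vector with an `instance` label, which holds for the node_exporter queries above but not for every expression.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def parse_instant_vector(payload):
    """Turn an instant-query response into {instance: float_value}."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


def query(base_url, promql):
    """Run an instant query against Prometheus and return parsed samples."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=5) as resp:
        return parse_instant_vector(json.load(resp))
```

For instance, `query("http://localhost:9090", "(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100")` would return disk usage keyed by instance; with several mounts per instance the later sample overwrites the earlier one, so a fuller version would key on the whole label set.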
LogQL Examples
Query Loki logs in Grafana's Explore view:
# All logs from a specific container:
{container="my-app"}
# Filter for errors:
{container="my-app"} |= "ERROR"
# Count error rate over time:
rate({container="my-app"} |= "ERROR" [5m])
# Parse JSON logs and filter by field:
{container="api"} | json | status_code >= 500
# Find slow requests:
{container="api"} | json | duration_ms > 1000
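The same log queries can be run outside Grafana through Loki's HTTP API, where range queries go to `/loki/api/v1/query_range`. This sketch, standard library only, builds such a URL and flattens a log-stream response into sorted lines. The helper names are hypothetical, and `extract_lines` assumes a `streams` result (log queries), not the `matrix` result that metric queries such as `rate(...)` return.

```python
import json
import urllib.parse
import urllib.request


def build_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Return a query_range URL for a LogQL expression and a nanosecond time window."""
    params = {
        "query": logql,
        "start": str(start_ns),
        "end": str(end_ns),
        "limit": str(limit),
    }
    return base_url.rstrip("/") + "/loki/api/v1/query_range?" + urllib.parse.urlencode(params)


def extract_lines(payload):
    """Flatten a 'streams' result into (timestamp_ns, line) tuples, oldest first."""
    entries = [
        (int(ts), line)
        for stream in payload["data"]["result"]
        for ts, line in stream["values"]
    ]
    return sorted(entries)


def fetch_lines(base_url, logql, start_ns, end_ns, limit=100):
    """Run the range query against Loki and return the flattened log lines."""
    url = build_range_url(base_url, logql, start_ns, end_ns, limit)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return extract_lines(json.load(resp))
```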
Grafana Alerting
Set up alerts in Grafana that send to Slack, email, or PagerDuty:
- Alerting → Contact Points → Add → Slack or Email
- Alerting → Alert Rules → New Rule:
  - When: avg(node_memory_MemAvailable_bytes) < 200000000
  - For: 5 minutes
  - Message: "Server low on memory: {{ humanize1024 $values.A.Value }}B available"
Resource Usage
| Service | RAM | CPU (idle) |
|---|---|---|
| Prometheus | ~100MB | Low |
| Grafana | ~150MB | Low |
| Loki | ~100MB | Low |
| node-exporter | ~15MB | Minimal |
| cAdvisor | ~80MB | Low |
| Promtail | ~30MB | Minimal |
| Total | ~475MB | Low |
Entire stack runs comfortably on a $6/month VPS with 1GB RAM.
Cost Comparison
| Solution | Cost at 10 hosts |
|---|---|
| Datadog | $150–350/month |
| New Relic | $100–200/month |
| Grafana + Prometheus + Loki | ~$0 (server cost) |
| Grafana Cloud (managed) | Free tier; ~$8/host for large scale |
Compare all open source monitoring alternatives at OSSAlt.com/alternatives/datadog.