
Grafana + Prometheus + Loki: Self-Hosted Observability Stack 2026

By the OSSAlt Team
Tags: grafana, prometheus, loki, observability, monitoring, self-hosting, docker, 2026

TL;DR

The Grafana + Prometheus + Loki stack is the open source equivalent of Datadog or New Relic — complete observability for your servers and containers with no per-seat or per-host pricing. Prometheus collects metrics, Loki aggregates logs, and Grafana visualizes everything on dashboards. Setup takes ~30 minutes with Docker Compose. Pre-built dashboards for common scenarios are available at grafana.com/grafana/dashboards.

Key Takeaways

  • Prometheus (~55K stars): Pulls metrics from your services and stores them as time-series data
  • Grafana (~62K stars): Dashboards and visualization for Prometheus + Loki + 40+ other data sources
  • Loki (~23K stars): Log aggregation — like Prometheus but for logs
  • Resource usage: Full stack runs on ~512MB RAM; add node_exporter and cAdvisor for complete coverage
  • No per-host pricing: Unlike Datadog ($15–35/host/month), this runs for server cost only
  • Pre-built dashboards: Import Node Exporter Full (ID: 1860) for instant host visibility

Stack Overview

Your servers/containers
    ↓ metrics exposed at /metrics
node_exporter  ← host CPU/RAM/disk/network
cAdvisor       ← Docker container metrics
Your app       ← custom Prometheus metrics

    ↓ Prometheus scrapes every 15s
Prometheus     ← stores time-series metrics (15-day default retention)

    ↓ Promtail ships logs
Promtail       ← reads Docker container logs → sends to Loki
Loki           ← stores and indexes logs by labels

    ↑ Grafana queries both
Grafana        ← dashboards, alerting, visualization

Docker Compose: Full Stack

# docker-compose.yml
version: '3.8'   # Optional — the version key is obsolete in Docker Compose v2+

volumes:
  prometheus_data:
  grafana_data:
  loki_data:

networks:
  monitoring:
    driver: bridge

services:
  # ─── Prometheus (metrics store) ────────────────────────────────────
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'   # Keep 30 days of metrics
      - '--web.enable-lifecycle'               # Allow hot-reload
    ports:
      - "9090:9090"
    networks:
      - monitoring

  # ─── Node Exporter (host metrics) ──────────────────────────────────
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - "9100:9100"
    networks:
      - monitoring

  # ─── cAdvisor (Docker container metrics) ───────────────────────────
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring

  # ─── Loki (log storage) ────────────────────────────────────────────
  loki:
    image: grafana/loki:latest
    container_name: loki
    restart: unless-stopped
    volumes:
      - ./loki/loki-config.yml:/etc/loki/local-config.yaml:ro
      - loki_data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    ports:
      - "3100:3100"
    networks:
      - monitoring

  # ─── Promtail (log shipper) ────────────────────────────────────────
  promtail:
    image: grafana/promtail:latest
    container_name: promtail
    restart: unless-stopped
    volumes:
      - ./promtail/promtail-config.yml:/etc/promtail/config.yml:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yml
    networks:
      - monitoring

  # ─── Grafana (dashboards) ──────────────────────────────────────────
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
      - GF_SERVER_ROOT_URL=https://grafana.yourdomain.com
      - GF_SMTP_ENABLED=true                       # Optional: alerting emails
      - GF_SMTP_HOST=${SMTP_HOST}
      - GF_SMTP_USER=${SMTP_USER}
      - GF_SMTP_PASSWORD=${SMTP_PASSWORD}
      - GF_SMTP_FROM_ADDRESS=grafana@yourdomain.com
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    ports:
      - "3000:3000"
    networks:
      - monitoring
    depends_on:
      - prometheus
      - loki

# .env
GRAFANA_PASSWORD=your-strong-password
SMTP_HOST=smtp.yourdomain.com:587
SMTP_USER=grafana@yourdomain.com
SMTP_PASSWORD=your-smtp-password

Configuration Files

prometheus/prometheus.yml

global:
  scrape_interval: 15s       # How often to scrape targets
  evaluation_interval: 15s   # How often to evaluate alert rules

rule_files:
  - /etc/prometheus/alerts.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  # Add your own services:
  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp:8000']  # If your app exposes /metrics
    metrics_path: '/metrics'
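For your own services, the official prometheus_client libraries are the usual way to expose a /metrics endpoint. To show what Prometheus actually scrapes, here is a stdlib-only sketch of the text exposition format — the `myapp` job name, port 8000, and `myapp_http_requests_total` metric are illustrative, matching the config above:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory counters; a real app would use the prometheus_client library,
# which also handles registries and content-type negotiation for you.
REQUEST_COUNT = {"GET": 0, "POST": 0}

def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP myapp_http_requests_total Total HTTP requests handled.",
        "# TYPE myapp_http_requests_total counter",
    ]
    for method, value in sorted(counts.items()):
        lines.append(f'myapp_http_requests_total{{method="{method}"}} {value}')
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics(REQUEST_COUNT).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

# To serve on the port the scrape config targets:
# HTTPServer(("0.0.0.0", 8000), MetricsHandler).serve_forever()
```

Once this is reachable at `myapp:8000/metrics`, the scrape job above picks it up with no further changes.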

prometheus/alerts.yml

groups:
  - name: infrastructure
    rules:
      - alert: HighCPULoad
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU is {{ $value | humanize }}% for 5 minutes"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory on {{ $labels.instance }}"
          description: "Memory is {{ $value | humanize }}% used"

      - alert: DiskSpaceLow
        expr: (1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100 > 85
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "{{ $labels.mountpoint }} is {{ $value | humanize }}% full"

      - alert: NoContainerMetrics
        expr: absent(container_last_seen{name=~".+"})
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No container metrics received — cAdvisor is down or no containers are running"
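The threshold expressions above are plain arithmetic over gauge values. As a sanity check of the HighMemoryUsage rule, the same computation in Python (the byte counts are made-up sample values):

```python
def memory_used_percent(avail_bytes: float, total_bytes: float) -> float:
    """Mirror of the PromQL expression (1 - MemAvailable / MemTotal) * 100."""
    return (1 - avail_bytes / total_bytes) * 100

# A host with 8 GiB total and 1 GiB available is 87.5% used,
# which crosses the 85% warning threshold in the rule above.
used = memory_used_percent(1 * 1024**3, 8 * 1024**3)
assert used > 85
```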

loki/loki-config.yml

auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
  filesystem:
    directory: /loki/chunks

limits_config:
  retention_period: 30d
  ingestion_rate_mb: 16

compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem   # Required when retention_enabled is true (Loki 3.x)

promtail/promtail-config.yml

server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # Collect all Docker container logs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_image']
        target_label: 'image'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'service'

  # Collect system logs:
  - job_name: syslog
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log
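The relabel_configs above turn Docker service-discovery metadata into Loki labels. A small sketch of what that mapping does — the metadata keys mirror the config, the values are invented, and note that Prometheus-style regexes are fully anchored:

```python
import re

def relabel(meta: dict) -> dict:
    """Apply the promtail relabel rules above to discovered container metadata."""
    labels = {}
    # The regex '/(.*)' strips the leading slash Docker puts on container names.
    m = re.fullmatch(r"/(.*)", meta["__meta_docker_container_name"])
    if m:
        labels["container"] = m.group(1)
    labels["image"] = meta["__meta_docker_container_image"]
    labels["service"] = meta["__meta_docker_container_label_com_docker_compose_service"]
    return labels

labels = relabel({
    "__meta_docker_container_name": "/my-app",
    "__meta_docker_container_image": "myorg/my-app:1.2",
    "__meta_docker_container_label_com_docker_compose_service": "my-app",
})
# labels == {"container": "my-app", "image": "myorg/my-app:1.2", "service": "my-app"}
```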

Start Everything

# Create config directories:
mkdir -p prometheus loki grafana/provisioning promtail

# Place config files as above, then:
docker compose up -d

# Verify services are up:
docker compose ps

# Check Prometheus targets:
# Open http://your-server:9090/targets — all should be UP

Grafana Setup (First Login)

  1. Open http://your-server:3000
  2. Login: admin / your GRAFANA_PASSWORD
  3. Add Prometheus data source:
    • Configuration → Data Sources → Add → Prometheus
    • URL: http://prometheus:9090
    • Save & Test
  4. Add Loki data source:
    • Configuration → Data Sources → Add → Loki
    • URL: http://loki:3100
    • Save & Test

Provision Data Sources Automatically

# grafana/provisioning/datasources/datasources.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100

Import Pre-Built Dashboards

Grafana has 1,000+ community dashboards. Import by ID in Dashboards → Import:

Dashboard          | ID    | Description
-------------------|-------|------------------------------------------------
Node Exporter Full | 1860  | Complete host metrics (CPU, RAM, disk, network)
Docker Monitoring  | 193   | Docker container stats
cAdvisor           | 14282 | Detailed container resource usage
Loki Dashboard     | 13639 | Log volume and error rates
Nginx Metrics      | 9614  | Nginx request rates, latencies

Import steps: Dashboards → + New → Import → Enter ID → Load → Select Prometheus data source → Import


PromQL Examples

Query Prometheus directly or use in Grafana panels:

# CPU usage per server (percentage):
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory used (GB):
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / 1024^3

# Disk usage per mount:
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100

# Network traffic (Mbps in):
rate(node_network_receive_bytes_total[5m]) * 8 / 1024^2

# Container CPU by name:
sum(rate(container_cpu_usage_seconds_total{name!=""}[5m])) by (name) * 100
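rate() underpins most of these queries: it turns a monotonically increasing counter into a per-second average over the window. A rough sketch of the idea (real Prometheus additionally handles counter resets and extrapolates to the window boundaries):

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp, value) samples."""
    (t0, v0), (tn, vn) = samples[0], samples[-1]
    return (vn - v0) / (tn - t0)

# Counter went from 100 to 400 over a 5-minute window -> 1.0 per second.
samples = [(0, 100), (60, 160), (300, 400)]
assert simple_rate(samples) == 1.0
```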

LogQL Examples

Query Loki logs in Grafana's Explore view:

# All logs from a specific container:
{container="my-app"}

# Filter for errors:
{container="my-app"} |= "ERROR"

# Count error rate over time:
rate({container="my-app"} |= "ERROR" [5m])

# Parse JSON logs and filter by field:
{container="api"} | json | status_code >= 500

# Find slow requests:
{container="api"} | json | duration_ms > 1000
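The `|=` and `| json` stages form a filter pipeline over log lines. The last query above behaves roughly like this sketch (the log lines are invented):

```python
import json

def slow_requests(lines, threshold_ms=1000):
    """Mimic: {container="api"} | json | duration_ms > 1000"""
    matched = []
    for line in lines:
        try:
            entry = json.loads(line)       # the `| json` parser stage
        except json.JSONDecodeError:
            continue                       # non-JSON lines drop out of the pipeline
        if entry.get("duration_ms", 0) > threshold_ms:
            matched.append(entry)
    return matched

logs = [
    '{"path": "/orders", "duration_ms": 2400}',
    '{"path": "/health", "duration_ms": 3}',
    'plain text line',
]
# Only the /orders entry survives the pipeline.
```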

Grafana Alerting

Set up alerts in Grafana that send to Slack, email, or PagerDuty:

  1. Alerting → Contact Points → Add → Slack or Email
  2. Alerting → Alert Rules → New Rule:
    When: avg(node_memory_MemAvailable_bytes) < 200000000 (≈200MB free)
    For: 5 minutes
    Message: "Server low on memory: {{ $values.A | humanizeBytes }}"
    

Resource Usage

Service       | RAM    | CPU (idle)
--------------|--------|-----------
Prometheus    | ~100MB | Low
Grafana       | ~150MB | Low
Loki          | ~100MB | Low
node-exporter | ~15MB  | Minimal
cAdvisor      | ~80MB  | Low
Promtail      | ~30MB  | Minimal
Total         | ~475MB | Low

Entire stack runs comfortably on a $6/month VPS with 1GB RAM.


Cost Comparison

Solution                    | Cost at 10 hosts
----------------------------|------------------------------------
Datadog                     | $150–350/month
New Relic                   | $100–200/month
Grafana + Prometheus + Loki | ~$0 (server cost only)
Grafana Cloud (managed)     | Free tier; ~$8/host at larger scale

Compare all open source monitoring alternatives at OSSAlt.com/alternatives/datadog.
