
Open-source alternatives guide

Grafana + Prometheus + Loki 2026

Set up a complete self-hosted observability stack with Grafana, Prometheus, and Loki in 2026. Metrics, logs, alerting, and pre-built dashboards — all in Docker.

OSSAlt Team

TL;DR

The Grafana + Prometheus + Loki stack is the open source equivalent of Datadog or New Relic — complete observability for your servers and containers with no per-seat or per-host pricing. Prometheus collects metrics, Loki aggregates logs, and Grafana visualizes everything on dashboards. Setup takes ~30 minutes with Docker Compose. Pre-built dashboards for common scenarios are available at grafana.com/grafana/dashboards.

Key Takeaways

  • Prometheus (~55K stars): Pulls metrics from your services and stores them as time-series data
  • Grafana (~62K stars): Dashboards and visualization for Prometheus + Loki + 40+ other data sources
  • Loki (~23K stars): Log aggregation — like Prometheus but for logs
  • Resource usage: The full stack, including node_exporter and cAdvisor for host and container coverage, runs in ~475MB RAM
  • No per-host pricing: Unlike Datadog ($15–35/host/month), this runs for server cost only
  • Pre-built dashboards: Import Node Exporter Full (ID: 1860) for instant host visibility

Stack Overview

Your servers/containers
    ↓ metrics exposed at /metrics
node_exporter  ← host CPU/RAM/disk/network
cAdvisor       ← Docker container metrics
Your app       ← custom Prometheus metrics

    ↓ Prometheus scrapes every 15s
Prometheus     ← stores time-series metrics (15-day default retention)

    ↓ Promtail ships logs
Promtail       ← reads Docker container logs → sends to Loki
Loki           ← stores and indexes logs by labels

    ↑ Grafana queries both
Grafana        ← dashboards, alerting, visualization

Docker Compose: Full Stack

# docker-compose.yml
# The top-level "version" key is obsolete in Compose v2 and omitted here

volumes:
  prometheus_data:
  grafana_data:
  loki_data:

networks:
  monitoring:
    driver: bridge

services:
  # ─── Prometheus (metrics store) ────────────────────────────────────
  prometheus:
    image: prom/prometheus:latest   # pin a specific version in production
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'   # Keep 30 days of metrics
      - '--web.enable-lifecycle'               # Allow hot-reload
    ports:
      - "9090:9090"
    networks:
      - monitoring

  # ─── Node Exporter (host metrics) ──────────────────────────────────
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - "9100:9100"
    networks:
      - monitoring

  # ─── cAdvisor (Docker container metrics) ───────────────────────────
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring

  # ─── Loki (log storage) ────────────────────────────────────────────
  loki:
    image: grafana/loki:latest
    container_name: loki
    restart: unless-stopped
    volumes:
      - ./loki/loki-config.yml:/etc/loki/local-config.yaml:ro
      - loki_data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    ports:
      - "3100:3100"
    networks:
      - monitoring

  # ─── Promtail (log shipper) ────────────────────────────────────────
  promtail:
    image: grafana/promtail:latest   # note: Promtail is in maintenance mode; Grafana Alloy is its successor
    container_name: promtail
    restart: unless-stopped
    volumes:
      - ./promtail/promtail-config.yml:/etc/promtail/config.yml:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yml
    networks:
      - monitoring

  # ─── Grafana (dashboards) ──────────────────────────────────────────
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
      - GF_SERVER_ROOT_URL=https://grafana.yourdomain.com
      - GF_SMTP_ENABLED=true                       # Optional: alerting emails
      - GF_SMTP_HOST=${SMTP_HOST}
      - GF_SMTP_USER=${SMTP_USER}
      - GF_SMTP_PASSWORD=${SMTP_PASSWORD}
      - GF_SMTP_FROM_ADDRESS=grafana@yourdomain.com
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    ports:
      - "3000:3000"
    networks:
      - monitoring
    depends_on:
      - prometheus
      - loki

# .env
GRAFANA_PASSWORD=your-strong-password
SMTP_HOST=smtp.yourdomain.com:587
SMTP_USER=grafana@yourdomain.com
SMTP_PASSWORD=your-smtp-password

Configuration Files

prometheus/prometheus.yml

global:
  scrape_interval: 15s       # How often to scrape targets
  evaluation_interval: 15s   # How often to evaluate alert rules

rule_files:
  - /etc/prometheus/alerts.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  # Add your own services:
  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp:8000']  # If your app exposes /metrics
    metrics_path: '/metrics'
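If your application doesn't expose metrics yet, the official client libraries (e.g. prometheus_client for Python) are the normal route. As a stdlib-only sketch of what a scrape target actually serves, the following answers on /metrics in the Prometheus text exposition format — the metric name and port are illustrative:

```python
# Minimal sketch of a Prometheus scrape target (standard library only).
# In a real app, prefer the official prometheus_client package; this just
# illustrates the text format Prometheus scrapes. Names are illustrative.
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(counters):
    """Render {metric_name: value} in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    counters = {"myapp_requests_total": 0}

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        MetricsHandler.counters["myapp_requests_total"] += 1
        body = render_metrics(MetricsHandler.counters).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve on port 8000 (matching the 'myapp' scrape target above):
#   HTTPServer(("0.0.0.0", 8000), MetricsHandler).serve_forever()
```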

prometheus/alerts.yml

Note: Prometheus evaluates these rules and shows their state in its UI, but sending notifications from them requires pairing Prometheus with Alertmanager. For this stack, the Grafana alerting covered later is the simpler route.

groups:
  - name: infrastructure
    rules:
      - alert: HighCPULoad
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU is {{ $value | humanize }}% for 5 minutes"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory on {{ $labels.instance }}"
          description: "Memory is {{ $value | humanize }}% used"

      - alert: DiskSpaceLow
        expr: (1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"})) * 100 > 85
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "{{ $labels.mountpoint }} is {{ $value | humanize }}% full"

      - alert: ContainerDown
        expr: absent(container_last_seen{name=~".+"})  # fires only when cAdvisor reports no containers at all
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No container metrics received"

loki/loki-config.yml

auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
  filesystem:
    directory: /loki/chunks

limits_config:
  retention_period: 30d
  ingestion_rate_mb: 16

compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem   # required by Loki 3.x when retention is enabled

promtail/promtail-config.yml

server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml   # mount a volume here to avoid re-shipping logs after a restart

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # Collect all Docker container logs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_image']
        target_label: 'image'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'service'

  # Collect system logs:
  - job_name: syslog
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log

Start Everything

# Create config directories:
mkdir -p prometheus loki grafana/provisioning promtail

# Place config files as above, then:
docker compose up -d

# Verify services are up:
docker compose ps

# Check Prometheus targets:
# Open http://your-server:9090/targets — all should be UP
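
Before starting (or after editing prometheus.yml), the config can be validated with promtool, which ships inside the prom/prometheus image. And because the compose file enables --web.enable-lifecycle, Prometheus hot-reloads config without a restart:

```shell
# Validate the Prometheus config using promtool from the official image:
docker run --rm -v "$(pwd)/prometheus:/cfg:ro" --entrypoint promtool \
  prom/prometheus:latest check config /cfg/prometheus.yml

# Hot-reload a changed config (requires --web.enable-lifecycle, set above):
curl -X POST http://localhost:9090/-/reload
```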

Grafana Setup (First Login)

  1. Open http://your-server:3000
  2. Login: admin / your GRAFANA_PASSWORD
  3. Add Prometheus data source:
    • Connections → Data sources → Add data source → Prometheus
    • URL: http://prometheus:9090
    • Save & Test
  4. Add Loki data source:
    • Connections → Data sources → Add data source → Loki
    • URL: http://loki:3100
    • Save & Test

Provision Data Sources Automatically

# grafana/provisioning/datasources/datasources.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
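
Dashboards can be provisioned the same way. A sketch of a dashboard provider file — the folder layout is an assumption; any dashboard JSON files placed in the mounted path are loaded at startup:

```yaml
# grafana/provisioning/dashboards/dashboards.yml
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    options:
      path: /var/lib/grafana/dashboards
```

You would also mount a host directory at that path in the Grafana service, e.g. ./grafana/dashboards:/var/lib/grafana/dashboards:ro.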

Import Pre-Built Dashboards

Grafana has 1,000+ community dashboards. Import by ID in Dashboards → Import:

Dashboard            ID      Description
Node Exporter Full   1860    Complete host metrics (CPU, RAM, disk, network)
Docker Monitoring    193     Docker container stats
cAdvisor             14282   Detailed container resource usage
Loki Dashboard       13639   Log volume and error rates
Nginx Metrics        9614    Nginx request rates, latencies

Import steps: Dashboards → + New → Import → Enter ID → Load → Select Prometheus data source → Import


PromQL Examples

Query Prometheus directly or use in Grafana panels:

# CPU usage per server (percentage):
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory used (GB):
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / 1024^3

# Disk usage per mount:
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100

# Network traffic (Mbps in):
rate(node_network_receive_bytes_total[5m]) * 8 / 1024^2

# Container CPU by name:
sum(rate(container_cpu_usage_seconds_total{name!=""}[5m])) by (name) * 100
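
These queries can also be run outside Grafana against Prometheus's HTTP API (/api/v1/query is part of its stable API). A stdlib-only helper sketch — the function names are illustrative:

```python
# Sketch of an instant query against the Prometheus HTTP API (stdlib only).
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def instant_query(base_url, promql):
    """Run an instant query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(base_url, promql)) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


# Example (with the stack running):
#   for series in instant_query("http://localhost:9090", "up"):
#       print(series["metric"].get("job"), series["value"][1])
```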

LogQL Examples

Query Loki logs in Grafana's Explore view:

# All logs from a specific container:
{container="my-app"}

# Filter for errors:
{container="my-app"} |= "ERROR"

# Count error rate over time:
rate({container="my-app"} |= "ERROR" [5m])

# Parse JSON logs and filter by field:
{container="api"} | json | status_code >= 500

# Find slow requests:
{container="api"} | json | duration_ms > 1000

Grafana Alerting

Set up alerts in Grafana that send to Slack, email, or PagerDuty:

  1. Alerting → Contact Points → Add → Slack or Email
  2. Alerting → Alert Rules → New Rule:
    When: avg(node_memory_MemAvailable_bytes) < 200000000   (≈200MB available)
    For: 5 minutes
    Message: "Server low on memory: {{ $values.A | humanizeBytes }}"
    

Resource Usage

Service        RAM      CPU (idle)
Prometheus     ~100MB   Low
Grafana        ~150MB   Low
Loki           ~100MB   Low
node-exporter  ~15MB    Minimal
cAdvisor       ~80MB    Low
Promtail       ~30MB    Minimal
Total          ~475MB   Low

Entire stack runs comfortably on a $6/month VPS with 1GB RAM.


Cost Comparison

Solution                      Cost at 10 hosts
Datadog                       $150–350/month
New Relic                     $100–200/month
Grafana + Prometheus + Loki   ~$0 (server cost only)
Grafana Cloud (managed)       Free tier; ~$8/host for large scale

Compare all open source monitoring alternatives at OSSAlt.com/alternatives/datadog.

See open source alternatives to Grafana on OSSAlt.

Monitoring and Operational Health

Deploying a self-hosted service without monitoring is running blind. At minimum, set up three layers: uptime monitoring, resource monitoring, and log retention.

Uptime monitoring with Uptime Kuma gives you HTTP endpoint checks every 30-60 seconds with alerts to Telegram, Slack, email, or webhook. Create a monitor for your primary application URL and any API health endpoints. The status page feature lets you communicate incidents to users without custom tooling.

Resource monitoring tells you when a container is leaking memory or when disk is filling up. Prometheus + Grafana is the standard self-hosted monitoring stack — Prometheus scrapes container metrics via cAdvisor, Grafana visualizes them with pre-built Docker dashboards. Set alerts for memory above 80% and disk above 75%; both give you time to act before they become incidents.

Log retention: Docker container logs are ephemeral, and the default json-file driver does not rotate them, so they grow without bound. Configure the json-file driver with max-size: 100m and max-file: 3 in your docker-compose.yml to cap log growth while keeping recent logs for debugging. For centralized log search across multiple containers, Loki integrates with the same Grafana instance.
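
In compose syntax, that logging configuration looks like this (service and image names are illustrative):

```yaml
services:
  my-app:
    image: my-app:1.2.3        # illustrative
    logging:
      driver: json-file
      options:
        max-size: "100m"       # rotate after 100MB per log file
        max-file: "3"          # keep at most 3 rotated files
```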

Backup discipline: Schedule automated backups of your Docker volumes using Duplicati or Restic. Back up to remote storage (Backblaze B2 or Cloudflare R2 cost $0.006/GB/month). Run a restore drill monthly — a backup that has never been tested is not a reliable backup. Your restore procedure documentation should live somewhere accessible from outside the failed server.

Update strategy: Pin Docker image versions in your compose file rather than using latest. Create a monthly maintenance window to review changelogs and update images. Major version updates often require running migration scripts before the new container starts — check the release notes before pulling.

Network Security and Hardening

Self-hosted services exposed to the internet require baseline hardening. The default Docker networking model exposes container ports directly — without additional configuration, any open port is accessible from anywhere.

Firewall configuration: Use ufw (Uncomplicated Firewall) on Ubuntu/Debian or firewalld on RHEL-based systems. Allow only ports 22 (SSH), 80 (HTTP redirect), and 443 (HTTPS). Block all other inbound ports. Docker bypasses ufw's OUTPUT rules by default — install the ufw-docker package or configure Docker's iptables integration to prevent containers from opening ports that bypass your firewall rules.
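
A baseline ufw policy matching the above looks like this (run as root or via sudo; ufw enable prompts for confirmation):

```shell
# Deny inbound by default, allow outbound, open only SSH and web ports.
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp    # SSH
sudo ufw allow 80/tcp    # HTTP (redirect to HTTPS)
sudo ufw allow 443/tcp   # HTTPS
sudo ufw enable
```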

SSH hardening: Disable password authentication and root login in /etc/ssh/sshd_config. Use key-based authentication only. Consider changing the default SSH port (22) to a non-standard port to reduce brute-force noise in your logs.

Fail2ban: Install fail2ban to automatically ban IPs that make repeated failed authentication attempts. Configure jails for SSH, Nginx, and any application-level authentication endpoints.

TLS/SSL: Use Let's Encrypt certificates via Certbot or Traefik's automatic ACME integration. Never expose services over HTTP in production. Configure HSTS headers to prevent protocol downgrade attacks. Check your SSL configuration with SSL Labs' server test — aim for an A or A+ rating.

Container isolation: Avoid running containers as root. Add user: "1000:1000" to your docker-compose.yml service definitions where the application supports non-root execution. Use read-only volumes (volumes: - /host/path:/container/path:ro) for configuration files the container only needs to read.
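
Combined in a compose service definition, the isolation options above look like this sketch (service and paths are illustrative; check that the image supports non-root execution first):

```yaml
services:
  my-app:
    image: my-app:1.2.3                        # illustrative
    user: "1000:1000"                          # run as non-root where supported
    volumes:
      - ./config/app.yml:/etc/app/app.yml:ro   # config mounted read-only
```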

Secrets management: Never put passwords and API keys directly in docker-compose.yml files committed to version control. Use Docker secrets, environment files (.env), or a secrets manager like Vault for sensitive configuration. Add .env to your .gitignore before your first commit.

Production Deployment Checklist

Before treating any self-hosted service as production-ready, work through this checklist. Each item represents a class of failure that will eventually affect your service if left unaddressed.

Infrastructure

  • Server OS is running latest security patches (apt upgrade / dnf upgrade)
  • Firewall configured: only ports 22, 80, 443 open
  • SSH key-only authentication (password auth disabled)
  • Docker and Docker Compose are current stable versions
  • Swap space configured (at minimum equal to RAM for <4GB servers)

Application

  • Docker image version pinned (not latest) in docker-compose.yml
  • Data directories backed by named volumes (not bind mounts to ephemeral paths)
  • Environment variables stored in .env file (not hardcoded in compose)
  • Container restart policy set to unless-stopped or always
  • Health check configured in Compose or Dockerfile
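
A compose service that ticks the pinning, restart-policy, and health-check boxes might look like this sketch (image name and health endpoint are illustrative; CMD-style health checks require curl in the image):

```yaml
services:
  my-app:
    image: my-app:1.2.3                      # pinned version, not latest
    restart: unless-stopped
    env_file: .env                           # secrets kept out of the compose file
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 20s
```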

Networking

  • SSL certificate issued and auto-renewal configured
  • HTTP requests redirect to HTTPS
  • Domain points to server IP (verify with dig +short your.domain)
  • Reverse proxy (Nginx/Traefik) handles SSL termination

Monitoring and Backup

  • Uptime monitoring configured with alerting
  • Automated daily backup of Docker volumes to remote storage
  • Backup tested with a successful restore drill
  • Log retention configured (no unbounded log accumulation)

Access Control

  • Default admin credentials changed
  • Email confirmation configured if the app supports it
  • User registration disabled if the service is private
  • Authentication middleware added if the service lacks native login

Conclusion and Getting Started

The self-hosting ecosystem has matured dramatically. What required significant Linux expertise in 2015 is now achievable for any developer comfortable with Docker Compose and a basic understanding of DNS. The tools have gotten better, the documentation has improved, and the community has built enough tutorials that most common configurations have been solved publicly.

The operational overhead that remains is real but manageable. A stable self-hosted service — one that is properly monitored, backed up, and kept updated — requires roughly 30-60 minutes of attention per month once the initial deployment is complete. That time investment is justified for services where data ownership, cost savings, or customization requirements make the cloud alternative unsuitable.

Start with one service. Trying to migrate your entire stack to self-hosted infrastructure at once is a recipe for an overwhelming weekend project that doesn't get finished. Pick the service where the cloud alternative is most expensive or where data ownership matters most, run it for 30 days, and then evaluate whether to expand.

Build your operational foundation before adding services. Get monitoring, backup, and SSL configured correctly for your first service before adding a second. These cross-cutting concerns become easier to extend to new services once the pattern is established, and much harder to retrofit to a fleet of services that were deployed without them.

Treat this like a product. Your self-hosted services have users (even if that's just you). Write a runbook. Document the restore procedure. Create a status page. These practices don't take long but they transform self-hosting from a series of experiments into reliable infrastructure you can depend on.

The community around self-hosted software is active and helpful. Reddit's r/selfhosted, the Awesome-Selfhosted GitHub list, and Discord servers for specific applications all have people who have already solved the problem you're encountering. The configuration questions that feel unique usually aren't.

The Grafana + Prometheus + Loki stack is the standard choice for self-hosted observability because its components are independently useful. You can run Prometheus without Loki, add Loki later for logs, and add Grafana Alloy as a replacement for Promtail when you need more flexible log processing. Start with Prometheus + Grafana for metrics visibility, and add Loki when you need to correlate log events with resource metric anomalies. The three-component stack gives you the same observability surface as Datadog or New Relic at near-zero marginal cost beyond the server running the containers.

Grafana's alerting capabilities are worth configuring early. Set up alerts for the services that matter most — database query latency, memory pressure, disk growth rate — before you have an incident that requires you to investigate retroactively. Grafana alerts can route to Slack, PagerDuty, email, or webhook, and alert rules can reference any Prometheus or Loki query you've already built for dashboards. The investment in alert configuration upfront pays for itself the first time an alert fires while you're asleep rather than discovering a problem from a user report.

The SaaS-to-Self-Hosted Migration Guide (Free PDF)

Step-by-step: infrastructure setup, data migration, backups, and security for 15+ common SaaS replacements. Used by 300+ developers.
