
Self-Host Loki: Log Aggregation Splunk Alternative 2026

Self-host Grafana Loki in 2026. AGPL 3.0, ~24K stars, Go — log aggregation with Promtail. Query logs with LogQL, correlate with Prometheus metrics in Grafana.

·OSSAlt Team

TL;DR

Grafana Loki (AGPL 3.0, ~24K GitHub stars, Go) is a horizontally scalable log aggregation system. Unlike Elasticsearch (which indexes all log content), Loki only indexes log labels — making it 10x cheaper on storage. Logs are queried with LogQL (similar to PromQL). Splunk charges $150+/GB/day. Loki self-hosted stores logs on local disk or S3 for pennies. Grafana has native Loki integration — you get logs alongside metrics in the same dashboards.

Key Takeaways

  • Loki: AGPL 3.0, ~24K stars, Go — label-indexed logs (not full-text index), cheap storage
  • Promtail: Agent that tails log files and Docker logs, ships to Loki
  • LogQL: Log query language — filter by labels, extract fields, aggregate
  • Grafana integration: Native Loki datasource — correlate logs and metrics in one view
  • 10x cheaper than Elasticsearch: No full-text index means tiny storage footprint
  • vs Elasticsearch: Loki = cheap+simple; Elasticsearch = full-text search+complex

Loki vs Elasticsearch vs Splunk

| Feature | Loki | Elasticsearch | Splunk |
|---|---|---|---|
| License | AGPL 3.0 | SSPL (not OSS) | Proprietary |
| Index type | Labels only | Full-text | Full-text |
| Storage cost | Low (10x cheaper) | High | Very high |
| Query language | LogQL | Elasticsearch DSL | SPL |
| Grafana integration | Native | Via plugin | Via plugin |
| Ingestion rate | High | High | High |
| Full-text search | No (regex only) | Yes | Yes |
| Self-host complexity | Low | Medium | High |

Part 1: Docker Compose Setup

# docker-compose.yml
services:
  loki:
    image: grafana/loki:latest
    container_name: loki
    restart: unless-stopped
    ports:
      - "3100:3100"
    volumes:
      - ./loki/loki-config.yml:/etc/loki/loki-config.yml:ro
      - loki_data:/loki
    command: -config.file=/etc/loki/loki-config.yml

  promtail:
    image: grafana/promtail:latest
    container_name: promtail
    restart: unless-stopped
    volumes:
      - ./promtail/promtail-config.yml:/etc/promtail/promtail-config.yml:ro
      - /var/log:/var/log:ro
      - /var/run/docker.sock:/var/run/docker.sock
    command: -config.file=/etc/promtail/promtail-config.yml
    depends_on:
      - loki

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_PASSWORD}"
      GF_USERS_ALLOW_SIGN_UP: "false"

volumes:
  loki_data:
  grafana_data:

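Once the stack is up (`docker compose up -d`), you can confirm end-to-end ingestion by pushing a test line straight to Loki's push API — the same endpoint Promtail uses. A minimal stdlib-only sketch; the localhost URL assumes the port mapping above, and the `smoke-test` job label is arbitrary:

```python
import json
import time
import urllib.request

def build_push_payload(labels: dict, lines: list) -> dict:
    """Build a Loki push-API body: one stream, nanosecond string timestamps."""
    now_ns = str(time.time_ns())
    return {"streams": [{"stream": labels,
                         "values": [[now_ns, line] for line in lines]}]}

payload = build_push_payload({"job": "smoke-test"}, ["hello from the push API"])

req = urllib.request.Request(
    "http://localhost:3100/loki/api/v1/push",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    urllib.request.urlopen(req, timeout=5)  # Loki replies 204 No Content on success
    print("pushed")
except OSError as exc:
    print(f"push failed (is the stack up?): {exc}")
```

Afterwards, the query `{job="smoke-test"}` in Grafana Explore should show the line.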
Part 2: Loki Configuration

# loki/loki-config.yml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://alertmanager:9093

limits_config:
  allow_structured_metadata: true
  volume_enabled: true
  retention_period: 744h    # 31 days

compactor:
  working_directory: /loki/retention
  delete_request_store: filesystem
  retention_enabled: true

Part 3: Promtail Configuration

# promtail/promtail-config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # All Docker container logs via Docker service discovery
  # (targets and file paths come from the Docker socket, so no
  # static_configs/__path__ entry is needed for this job):
  - job_name: containers
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'logstream'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'service'

  # Syslog:
  - job_name: syslog
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: my-server
          __path__: /var/log/syslog

  # Nginx access logs:
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - regex:
          expression: '^(?P<remote_addr>[\w.]+) - (?P<remote_user>[^ ]*) \[(?P<time_local>.*)\] "(?P<method>\S+) (?P<request>[^ ]*) (?P<protocol>[^ ]*)" (?P<status>\d+) (?P<body_bytes>\d+)'
      - labels:
          method:
          status:
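Before deploying the nginx job, it's worth sanity-checking the regex stage against a sample access-log line (the pattern below is copied from the pipeline above; the sample line is made up):

```python
import re

# Same pattern as the Promtail regex pipeline stage above
NGINX_RE = re.compile(
    r'^(?P<remote_addr>[\w.]+) - (?P<remote_user>[^ ]*) \[(?P<time_local>.*)\] '
    r'"(?P<method>\S+) (?P<request>[^ ]*) (?P<protocol>[^ ]*)" '
    r'(?P<status>\d+) (?P<body_bytes>\d+)'
)

sample = '192.168.1.10 - - [10/Feb/2026:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612'
m = NGINX_RE.match(sample)
assert m is not None
print(m.group("method"), m.group("status"), m.group("body_bytes"))  # GET 200 612
```

If the match fails against your real log format (e.g. nginx's `combined` format with referer and user-agent fields), adjust the Promtail expression accordingly before relying on the `method` and `status` labels.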

Part 4: Grafana Datasource for Loki

# grafana/provisioning/datasources/loki.yml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://loki:3100
    isDefault: false
    access: proxy
    jsonData:
      maxLines: 1000

Part 5: LogQL Queries

# All logs from a container:
{container="nginx"}

# Filter by log content:
{container="myapp"} |= "ERROR"

# Regex filter:
{container="myapp"} |~ "error|exception|panic"

# Exclude pattern:
{container="myapp"} != "health check"

# Parse JSON logs and filter by field:
{container="myapp"} | json | level="error"

# Count errors per minute:
sum(count_over_time({container="myapp"} |= "ERROR" [1m]))

# Error rate as percentage:
sum(rate({container="myapp"} |= "ERROR" [5m])) /
sum(rate({container="myapp"} [5m]))

# Slow requests (>1s) from nginx logfmt logs:
{job="nginx"} | logfmt | response_time > 1.0

# Top 10 request paths by max response time over 5m
# (LogQL has no sort/limit stage for log queries; use a metric query):
topk(10, max_over_time({job="nginx"} | logfmt | unwrap response_time [5m]) by (request))

# Logs from multiple services:
{service=~"api|worker|scheduler"} |= "ERROR"

# A specific user's activity (set the time range, e.g. last 24h, in Grafana):
{container="myapp"} | json | user_id="42"
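The mental model behind all of these queries — indexed label selection first, then a brute-force scan with line filters and parsers — can be sketched in a few lines of Python. A toy illustration only, not how Loki is implemented; the sample streams are invented:

```python
import json

# Toy streams: (labels, lines). In Loki, only the labels are indexed.
streams = [
    ({"container": "myapp"}, ['{"level":"error","msg":"db timeout"}',
                              '{"level":"info","msg":"ok"}']),
    ({"container": "nginx"}, ['GET / 200']),
]

def query(selector: dict, line_filter: str = "", json_field=None):
    """Select streams by labels, then scan lines (like {...} |= "x" | json)."""
    out = []
    for labels, lines in streams:
        if all(labels.get(k) == v for k, v in selector.items()):  # indexed step
            for line in lines:                                    # scan step
                if line_filter and line_filter not in line:
                    continue
                if json_field:
                    key, want = json_field
                    if json.loads(line).get(key) != want:
                        continue
                out.append(line)
    return out

# {container="myapp"} |= "error"
print(query({"container": "myapp"}, line_filter="error"))
# {container="myapp"} | json | level="error"
print(query({"container": "myapp"}, json_field=("level", "error")))
```

The label match is cheap because it hits the index; everything after the first `|` is a linear scan over the selected streams — which is why narrow selectors matter so much for query speed.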

Part 6: Grafana Dashboard — Logs Panel

  1. Grafana → + New Dashboard → + Add visualization
  2. Select Loki as data source
  3. Query: {container="myapp"} — all logs from container
  4. Visualization: Logs type
  5. Add a Time series panel with:
    • Query: sum(rate({container="myapp"} |= "ERROR" [5m]))
    • Shows error rate over time

Correlate logs with metrics

In a Grafana dashboard:

  1. Add Prometheus panel (e.g., request rate)
  2. Add Loki panel with {service="api"} |= "ERROR"
  3. Both panels share the same time range — click a spike in metrics, see the logs from that moment

Part 7: Loki Alert Rules

# loki/rules/alerts.yml
groups:
  - name: log_alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({container="myapp"} |= "ERROR" [5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in myapp logs"

      - alert: OOMKill
        expr: |
          count_over_time({job="syslog"} |= "Out of memory" [10m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "OOM kill detected on {{ $labels.host }}"

Part 8: S3 Storage for Large Scale

For more than a few servers, use S3 instead of local filesystem:

# loki-config.yml (S3 section):
common:
  storage:
    s3:
      endpoint: s3.amazonaws.com
      region: us-east-1
      bucketnames: your-loki-bucket
      access_key_id: "${AWS_ACCESS_KEY}"
      secret_access_key: "${AWS_SECRET_KEY}"

# Or MinIO (self-hosted S3):
common:
  storage:
    s3:
      endpoint: minio:9000
      bucketnames: loki
      access_key_id: "${MINIO_USER}"
      secret_access_key: "${MINIO_PASSWORD}"
      insecure: true
      s3forcepathstyle: true

Maintenance

# Update Loki stack:
docker compose pull
docker compose up -d

# Check Loki health:
curl http://localhost:3100/ready

# Check ingestion stats:
curl http://localhost:3100/metrics | grep loki_distributor

# Backup (the volume is named <compose-project>_loki_data; adjust to match):
tar -czf loki-backup-$(date +%Y%m%d).tar.gz \
  $(docker volume inspect loki_loki_data --format '{{.Mountpoint}}')

# Logs:
docker compose logs -f loki
docker compose logs -f promtail

Why Self-Host Loki

Splunk's pricing is notoriously opaque, but their standard workload pricing starts at $150 per GB ingested per day. A modest production environment generating 5GB of logs per day costs $750/day on Splunk — $273,750/year. Even Datadog's log management starts at $0.10/GB ingested plus $1.70/million log events/month. For a startup logging 10GB/day, that's $1,000+/month just for logs.

Loki's cost model is radically different. You pay for storage, not ingestion events. Since Loki only indexes labels (not log content), storage requirements are 5-10x smaller than Elasticsearch for the same log volume. A self-hosted Loki stack handling 10GB/day of logs fits comfortably on a $20-40/month VPS with 100GB storage. Annual cost: $240-480 versus $12,000+ on Datadog.

The Grafana integration is Loki's biggest practical advantage. If you're already running Grafana for metrics (Prometheus), adding Loki means your logs appear in the same dashboards, with the same time range controls, correlated with your metrics. When CPU spikes at 3 AM, you click into that time range and immediately see the error logs that caused it — no switching between tools, no separate log search interface.

Data residency and compliance are increasingly important. Shipping all your application logs to a third-party SaaS means that vendor has access to your error messages, user IDs, IP addresses, and potentially sensitive data in stack traces. Self-hosted Loki keeps logs on infrastructure you control.

When NOT to self-host Loki: If you need full-text search across log content (not just label-based filtering), Loki's regex-only approach will frustrate you — Elasticsearch handles this better. Also, Loki's operational complexity increases at scale — for large distributed systems with hundreds of log sources, the managed Grafana Cloud offering may be worth the cost for the operational simplicity.

Prerequisites

The Loki stack consists of three components — Loki itself, Promtail (log collector), and Grafana (UI) — and each has different resource requirements.

Server specs: Loki's resource usage scales with log ingestion rate, not log volume. For a single server running 10-20 Docker containers, 2 vCPUs and 4GB RAM handles the full PLG stack (Prometheus + Loki + Grafana) comfortably. If you're aggregating logs from multiple servers, scale Loki to 4 vCPUs and 8GB RAM. Storage is the main constraint — plan for 10-20GB per day of raw logs before Loki's compression (actual storage is typically 5x smaller). See our VPS comparison for self-hosters for good options at the 4 vCPU tier.

Operating system: Ubuntu 22.04 LTS. Docker's log driver integrations work best on Ubuntu, and Promtail's Docker service discovery requires access to /var/run/docker.sock.

Understanding labels: Loki's query performance depends entirely on label design. Labels should be low-cardinality values (container name, service name, log level) — never put user IDs, request IDs, or timestamps in labels. High-cardinality labels cause Loki to create millions of streams and degrade performance significantly.

Retention planning: Set retention_period in loki-config.yml before you start ingesting. The default is no retention (logs kept forever). For most setups, 30-90 days is appropriate. You'll also want automatic backups of the Loki data volume.

Skill level: Intermediate. Understanding of Docker Compose, basic YAML, and log concepts (what a label is, what a stream is) is needed.

Production Security Hardening

Loki's default configuration has no authentication — anyone who can reach port 3100 can query all your logs. Logs often contain sensitive data: error messages with user IDs, stack traces with file paths, request logs with IP addresses. Lock this down. Follow the self-hosting security checklist and implement these Loki-specific measures:

Firewall (UFW): Loki's port (3100) should only be accessible from Grafana and Promtail — never from the public internet.

sudo ufw default deny incoming
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Loki (3100) and Grafana internal (3000) should not be exposed directly
sudo ufw enable

Grafana authentication: Disable anonymous access and configure Grafana's built-in user management or OAuth. Set a strong admin password in the environment variable rather than using the default admin/admin.

Secrets management: Keep Grafana passwords, S3 credentials, and any other secrets in a .env file, never hardcoded in docker-compose.yml:

# .env (never commit this)
GRAFANA_PASSWORD=your-strong-grafana-password
AWS_ACCESS_KEY=your-s3-key
AWS_SECRET_KEY=your-s3-secret

# Keep it out of version control:
echo ".env" >> .gitignore

Disable SSH password authentication: Edit /etc/ssh/sshd_config: set PasswordAuthentication no and PermitRootLogin no. Restart: sudo systemctl restart ssh.

Enable auth_enabled: true in Loki for multi-tenant: If you're aggregating logs from multiple teams and need access control, enable auth. With auth_enabled: true, each request must include an X-Scope-OrgID header. Configure Grafana's Loki datasource with org-specific credentials.

Automatic security updates:

sudo apt install unattended-upgrades
sudo dpkg-reconfigure --priority=low unattended-upgrades

Regular backups: The loki_data volume contains all your indexed logs. Back this up with restic or borgbackup. See automated server backups with restic for a production-tested approach that handles Docker volumes elegantly.

Effective Log Management Strategy

Getting value from Loki requires more than just collecting logs — it requires a thoughtful label schema, sensible retention policies, and dashboards designed to surface actionable signals.

Label design is the most important architectural decision for a Loki deployment. Labels in Loki are like dimensions in a time-series database — they define what makes each log stream unique. The rule is to keep label cardinality low. Good labels are things like container (50 values for a typical server), service (10-20 values), environment (3 values: prod/staging/dev), and level (4 values: debug/info/warn/error). Bad labels are things like request_id, user_id, or trace_id — these create millions of unique streams, degrade query performance, and consume massive amounts of index storage.
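The cost of a label schema is easy to estimate: the worst-case stream count is the product of each label's cardinality (in practice only observed combinations create streams, so this is an upper bound). A quick back-of-the-envelope check, reusing the example values from the paragraph above:

```python
from math import prod

def max_streams(cardinalities: dict) -> int:
    """Worst-case stream count = product of per-label value counts."""
    return prod(cardinalities.values())

good = {"container": 50, "service": 20, "environment": 3, "level": 4}
print(max_streams(good))            # 12000 — comfortable for Loki

bad = dict(good, user_id=100_000)   # one high-cardinality label...
print(max_streams(bad))             # 1200000000 — 1.2 billion potential streams
```

A single high-cardinality label multiplies the entire schema, which is why `user_id` belongs in the log line (extracted at query time), never in a label.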

Instead of putting high-cardinality data in labels, extract it at query time using LogQL's parsing capabilities. A log line like {"level":"error","user_id":"12345","message":"payment failed"} should be labeled with just level and have user_id extracted at query time with | json | user_id="12345". This keeps your index small while still enabling rich queries.

Define alert thresholds based on baselines, not intuition. Your first step before writing any Loki alert rules should be running queries over a week of production logs to understand normal error rates. If your application generates 3-5 errors per minute normally, an alert threshold of 10 errors per minute might be appropriate. An alert that fires constantly because the threshold is too low trains your team to ignore it.

The Grafana "Explore" view is your primary tool for ad-hoc investigation. Unlike building dashboards (which requires knowing what you're looking for), Explore lets you iterate on LogQL queries interactively. When something goes wrong in production, open Explore, select a time range around the incident, and start with broad queries: {service="api"} |= "error". Then narrow by adding more filters until you find the specific log lines that explain the incident.

For multi-server environments, Promtail runs on each server and ships logs to your central Loki instance. Tag each server's logs with a host label: - replacement: your-server-1 in the Promtail config under target_label: host. This lets you query {host="server-1"} to isolate issues to specific machines, or use regex {host=~"server-[12]"} to query multiple servers simultaneously.
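Concretely, the per-server `host` label described above is a static relabel in each server's Promtail config (a sketch — set a unique `replacement` value per machine):

```yaml
# promtail-config.yml on each server
scrape_configs:
  - job_name: containers
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      - target_label: host
        replacement: your-server-1   # unique per machine
```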

Troubleshooting Common Issues

Loki returns "context deadline exceeded" on queries

This usually means your query is too broad and scanning too many log streams. Add more specific label selectors to narrow the stream set — {container="myapp"} instead of {job="containerlogs"}. Also check Loki's resource usage: docker stats loki. If Loki is CPU-bound during queries, it may need more resources or a query timeout increase in config.

Promtail not picking up new containers

Promtail's Docker service discovery refreshes based on refresh_interval. If you add a new container and Promtail doesn't start tailing its logs within the interval, check docker compose logs -f promtail. Common issues: the container isn't producing logs to stdout/stderr (some apps write to files instead), or the Docker socket isn't mounted correctly in the Promtail container.

"entry out of order" errors in Loki

Loki requires log entries within a stream to arrive in roughly increasing timestamp order unless unordered writes are allowed. Recent Loki versions set unordered_writes: true by default under limits_config; if you see these errors, confirm it hasn't been disabled there. Also check that your server's time is synchronized with NTP — clock skew between the logging host and Loki causes ordering issues.

Disk fills up rapidly

If you didn't configure retention, Loki keeps logs forever. Set retention_period: 744h (31 days) in limits_config and ensure the compactor is configured with retention_enabled: true. For immediate relief, you can manually delete old chunks from the storage directory, but the compactor is the proper mechanism.

Grafana shows "no data" for Loki queries

Verify Loki is healthy: curl http://localhost:3100/ready should return ready. Then check the datasource URL in Grafana — if Grafana and Loki are in the same Docker Compose network, use the service name (http://loki:3100), not localhost. Test the datasource from Grafana → Data Sources → Loki → Test.

High memory usage in Loki

Loki buffers recent chunks in memory before flushing them to storage. If memory usage is excessive, reduce chunk_idle_period and max_chunk_age in the ingester config so chunks flush to disk sooner. Also verify your label cardinality isn't exploding — curl http://localhost:3100/metrics | grep loki_ingester_memory_streams shows how many streams are currently held in memory.
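The flush-tuning knobs live in the ingester block (a sketch with moderately aggressive values; defaults vary by Loki version, so check your release's configuration reference):

```yaml
# loki-config.yml
ingester:
  chunk_idle_period: 30m   # flush a chunk after 30m with no new lines
  max_chunk_age: 1h        # force-flush chunks older than 1h regardless
```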

See all open source monitoring and logging tools at OSSAlt.com/categories/devops.

See open source alternatives to Grafana on OSSAlt.
