Open-source alternatives guide
Self-Host Loki: Log Aggregation Splunk Alternative 2026
Self-host Grafana Loki in 2026. AGPL 3.0, ~24K stars, Go — log aggregation with Promtail. Query logs with LogQL, correlate with Prometheus metrics in Grafana.
TL;DR
Grafana Loki (AGPL 3.0, ~24K GitHub stars, Go) is a horizontally scalable log aggregation system. Unlike Elasticsearch (which indexes all log content), Loki only indexes log labels — making it 10x cheaper on storage. Logs are queried with LogQL (similar to PromQL). Splunk charges $150+/GB/day. Loki self-hosted stores logs on local disk or S3 for pennies. Grafana has native Loki integration — you get logs alongside metrics in the same dashboards.
Key Takeaways
- Loki: AGPL 3.0, ~24K stars, Go — label-indexed logs (not full-text index), cheap storage
- Promtail: Agent that tails log files and Docker logs, ships to Loki
- LogQL: Log query language — filter by labels, extract fields, aggregate
- Grafana integration: Native Loki datasource — correlate logs and metrics in one view
- 10x cheaper than Elasticsearch: No full-text index means tiny storage footprint
- vs Elasticsearch: Loki = cheap+simple; Elasticsearch = full-text search+complex
Loki vs Elasticsearch vs Splunk
| Feature | Loki | Elasticsearch | Splunk |
|---|---|---|---|
| License | AGPL 3.0 | AGPL 3.0 / SSPL (AGPL option added 2024) | Proprietary |
| Index type | Labels only | Full-text | Full-text |
| Storage cost | Low (10x cheaper) | High | Very high |
| Query language | LogQL | Elasticsearch DSL | SPL |
| Grafana integration | Native | Via plugin | Via plugin |
| Ingestion rate | High | High | High |
| Full-text search | No (regex only) | Yes | Yes |
| Self-host complexity | Low | Medium | High |
Part 1: Docker Compose Setup
```yaml
# docker-compose.yml
services:
  loki:
    image: grafana/loki:latest
    container_name: loki
    restart: unless-stopped
    ports:
      - "3100:3100"
    volumes:
      - ./loki/loki-config.yml:/etc/loki/loki-config.yml:ro
      - loki_data:/loki
    command: -config.file=/etc/loki/loki-config.yml

  promtail:
    image: grafana/promtail:latest
    container_name: promtail
    restart: unless-stopped
    volumes:
      - ./promtail/promtail-config.yml:/etc/promtail/promtail-config.yml:ro
      - /var/log:/var/log:ro
      - /var/run/docker.sock:/var/run/docker.sock
    command: -config.file=/etc/promtail/promtail-config.yml
    depends_on:
      - loki

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_PASSWORD}"
      GF_USERS_ALLOW_SIGN_UP: "false"

volumes:
  loki_data:
  grafana_data:
```
Part 2: Loki Configuration
```yaml
# loki/loki-config.yml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://alertmanager:9093

limits_config:
  allow_structured_metadata: true
  volume_enabled: true
  retention_period: 744h  # 31 days

compactor:
  working_directory: /loki/retention
  delete_request_store: filesystem
  retention_enabled: true
```
Part 3: Promtail Configuration
```yaml
# promtail/promtail-config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # All Docker container logs, via Docker service discovery:
  - job_name: containers
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'logstream'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'service'

  # Syslog:
  - job_name: syslog
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          host: my-server
          __path__: /var/log/syslog

  # Nginx access logs:
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - regex:
          expression: '^(?P<remote_addr>[\w.]+) - (?P<remote_user>[^ ]*) \[(?P<time_local>.*)\] "(?P<method>\S+) (?P<request>[^ ]*) (?P<protocol>[^ ]*)" (?P<status>\d+) (?P<body_bytes>\d+)'
      - labels:
          method:
          status:
```

Note that the `containers` job uses only `docker_sd_configs` — Docker container logs are discovered through the Docker socket, not tailed via a `__path__` glob.
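The regex stage above is easy to get subtly wrong, so it is worth sanity-checking against a sample line before shipping. The same pattern can be exercised with Python's `re` module (the sample log line is invented for illustration):

```python
import re

# Same pattern as the Promtail pipeline stage above
NGINX_RE = re.compile(
    r'^(?P<remote_addr>[\w.]+) - (?P<remote_user>[^ ]*) '
    r'\[(?P<time_local>.*)\] "(?P<method>\S+) (?P<request>[^ ]*) '
    r'(?P<protocol>[^ ]*)" (?P<status>\d+) (?P<body_bytes>\d+)'
)

sample = '203.0.113.7 - - [10/Feb/2026:12:00:00 +0000] "GET /api/health HTTP/1.1" 200 512'

m = NGINX_RE.match(sample)
assert m is not None, "pattern did not match the sample line"
print(m.group("method"), m.group("status"))  # GET 200
```

If the pattern fails to match your real access-log format (custom `log_format` in nginx, for example), adjust the regex here first, then copy it back into the Promtail config.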
Part 4: Grafana Datasource for Loki
```yaml
# grafana/provisioning/datasources/loki.yml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://loki:3100
    isDefault: false
    access: proxy
    jsonData:
      maxLines: 1000
```
Part 5: LogQL Queries
```logql
# All logs from a container:
{container="nginx"}

# Filter by log content:
{container="myapp"} |= "ERROR"

# Regex filter:
{container="myapp"} |~ "error|exception|panic"

# Exclude pattern:
{container="myapp"} != "health check"

# Parse JSON logs and filter by field:
{container="myapp"} | json | level="error"

# Count errors per minute:
sum(count_over_time({container="myapp"} |= "ERROR" [1m]))

# Error rate as a fraction of all log lines:
sum(rate({container="myapp"} |= "ERROR" [5m])) /
sum(rate({container="myapp"} [5m]))

# Nginx requests slower than 1 second (LogQL log queries have no
# sort/limit pipeline — sort the results in Grafana, or wrap a
# metric query in topk() for a top-N):
{job="nginx"} | logfmt | response_time > 1.0

# Logs from multiple services:
{service=~"api|worker|scheduler"} |= "ERROR"

# A specific user's activity (the time range, e.g. last 24h, is set in Grafana):
{container="myapp"} | json | user_id="42"
```
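Loki also exposes these queries over HTTP at `/loki/api/v1/query_range`. A minimal Python sketch that builds such a request; the helper function, server URL, and timestamps are illustrative, not part of Loki itself:

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base: str, logql: str, start_ns: int, end_ns: int, limit: int = 100) -> str:
    """Build a Loki query_range URL for a LogQL query."""
    params = urllib.parse.urlencode({
        "query": logql,
        "start": start_ns,  # nanosecond Unix timestamps
        "end": end_ns,
        "limit": limit,
    })
    return f"{base}/loki/api/v1/query_range?{params}"


url = build_query_url(
    "http://localhost:3100",
    '{container="myapp"} |= "ERROR"',
    1700000000000000000,
    1700003600000000000,
)

# Against a running Loki you would then fetch and decode it:
# with urllib.request.urlopen(url) as resp:
#     streams = json.load(resp)["data"]["result"]

print(url.startswith("http://localhost:3100/loki/api/v1/query_range?"))  # True
```

This is handy for scripting (nightly error digests, CI checks on staging logs) without going through Grafana.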
Part 6: Grafana Dashboard — Logs Panel
- Grafana → + New Dashboard → + Add visualization
- Select Loki as the data source
- Query: `{container="myapp"}` — all logs from the container
- Visualization: Logs
- Add a Time series panel with the query `sum(rate({container="myapp"} |= "ERROR" [5m]))` — shows the error rate over time

Correlate logs with metrics
In a Grafana dashboard:
- Add a Prometheus panel (e.g., request rate)
- Add a Loki panel with `{service="api"} |= "ERROR"`
- Both panels share the same time range — click a spike in the metrics and see the logs from that moment
Part 7: Loki Alert Rules
```yaml
# loki/rules/alerts.yml
groups:
  - name: log_alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({container="myapp"} |= "ERROR" [5m])) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in myapp logs"

      - alert: OOMKill
        expr: |
          count_over_time({job="syslog"} |= "Out of memory" [10m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "OOM kill detected on {{ $labels.host }}"
```
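For the ruler to pick these rules up, the file must be visible inside the container under the `rules_directory` configured in Part 2 (`/loki/rules`). With `auth_enabled: false`, Loki uses a synthetic tenant ID of `fake`, and the local rule store looks in a per-tenant subdirectory, so one common approach (paths assume this guide's layout) is to mount the rules there:

```yaml
# docker-compose.yml: extra volume on the loki service
volumes:
  - ./loki/rules:/loki/rules/fake:ro
```

If rules do not fire, check `docker compose logs loki` for ruler errors about loading the rule group.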
Part 8: S3 Storage for Large Scale
For more than a few servers, use S3 instead of local filesystem:
```yaml
# loki-config.yml (S3 section):
common:
  storage:
    s3:
      endpoint: s3.amazonaws.com
      region: us-east-1
      bucketnames: your-loki-bucket
      access_key_id: "${AWS_ACCESS_KEY}"
      secret_access_key: "${AWS_SECRET_KEY}"
```

Or with MinIO (self-hosted S3):

```yaml
common:
  storage:
    s3:
      endpoint: minio:9000
      bucketnames: loki
      access_key_id: "${MINIO_USER}"
      secret_access_key: "${MINIO_PASSWORD}"
      insecure: true
      s3forcepathstyle: true
```

Note: Loki only substitutes `${...}` environment variables in its config file when started with the `-config.expand-env=true` flag — add it to the `command` in docker-compose.yml.
Maintenance
```bash
# Update Loki stack:
docker compose pull
docker compose up -d

# Check Loki health:
curl http://localhost:3100/ready

# Check ingestion stats:
curl -s http://localhost:3100/metrics | grep loki_distributor

# Back up the Loki data volume:
tar -czf loki-backup-$(date +%Y%m%d).tar.gz \
  $(docker volume inspect loki_loki_data --format '{{.Mountpoint}}')

# Logs:
docker compose logs -f loki
docker compose logs -f promtail
```
Why Self-Host Loki
Splunk's pricing is notoriously opaque, but their standard workload pricing starts at $150 per GB ingested per day. A modest production environment generating 5GB of logs per day costs $750/day on Splunk — $273,750/year. Even Datadog's log management starts at $0.10/GB ingested plus $1.70/million log events/month. For a startup logging 10GB/day, that's $1,000+/month just for logs.
Loki's cost model is radically different. You pay for storage, not ingestion events. Since Loki only indexes labels (not log content), storage requirements are 5-10x smaller than Elasticsearch for the same log volume. A self-hosted Loki stack handling 10GB/day of logs fits comfortably on a $20-40/month VPS with 100GB storage. Annual cost: $240-480 versus $12,000+ on Datadog.
The Grafana integration is Loki's biggest practical advantage. If you're already running Grafana for metrics (Prometheus), adding Loki means your logs appear in the same dashboards, with the same time range controls, correlated with your metrics. When CPU spikes at 3 AM, you click into that time range and immediately see the error logs that caused it — no switching between tools, no separate log search interface.
Data residency and compliance are increasingly important. Shipping all your application logs to a third-party SaaS means that vendor has access to your error messages, user IDs, IP addresses, and potentially sensitive data in stack traces. Self-hosted Loki keeps logs on infrastructure you control.
When NOT to self-host Loki: If you need full-text search across log content (not just label-based filtering), Loki's regex-only approach will frustrate you — Elasticsearch handles this better. Also, Loki's operational complexity increases at scale — for large distributed systems with hundreds of log sources, the managed Grafana Cloud offering may be worth the cost for the operational simplicity.
Prerequisites
The Loki stack consists of three components — Loki itself, Promtail (log collector), and Grafana (UI) — and each has different resource requirements.
Server specs: Loki's resource usage scales with log ingestion rate, not log volume. For a single server running 10-20 Docker containers, 2 vCPUs and 4GB RAM handles the full PLG stack (Prometheus + Loki + Grafana) comfortably. If you're aggregating logs from multiple servers, scale Loki to 4 vCPUs and 8GB RAM. Storage is the main constraint — plan for 10-20GB per day of raw logs before Loki's compression (actual storage is typically 5x smaller). See our VPS comparison for self-hosters for good options at the 4 vCPU tier.
Operating system: Ubuntu 22.04 LTS. Docker's log driver integrations work best on Ubuntu, and Promtail's Docker service discovery requires access to /var/run/docker.sock.
Understanding labels: Loki's query performance depends entirely on label design. Labels should be low-cardinality values (container name, service name, log level) — never put user IDs, request IDs, or timestamps in labels. High-cardinality labels cause Loki to create millions of streams and degrade performance significantly.
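Because every unique combination of label values becomes its own stream, the worst-case stream count is the product of the per-label value counts. A rough back-of-the-envelope in Python (the value counts are made-up illustrations):

```python
from math import prod

# Worst-case stream count = product of the number of values per label
good_labels = {"container": 50, "level": 4, "environment": 3}
bad_labels = {**good_labels, "user_id": 100_000}  # high-cardinality label added

print(prod(good_labels.values()))  # 600 streams: fine
print(prod(bad_labels.values()))   # 60000000 streams: Loki will struggle
```

A single bad label turns hundreds of streams into tens of millions, which is why IDs belong in the log line body, not in labels.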
Retention planning: Set retention_period in loki-config.yml before you start ingesting. The default is no retention (logs kept forever). For most setups, 30-90 days is appropriate. You'll also want automatic backups of the Loki data volume.
Skill level: Intermediate. Understanding of Docker Compose, basic YAML, and log concepts (what a label is, what a stream is) is needed.
Production Security Hardening
Loki's default configuration has no authentication — anyone who can reach port 3100 can query all your logs. Logs often contain sensitive data: error messages with user IDs, stack traces with file paths, request logs with IP addresses. Lock this down. Follow the self-hosting security checklist and implement these Loki-specific measures:
Firewall (UFW): Loki's port (3100) should only be accessible from Grafana and Promtail — never from the public internet.
```bash
sudo ufw default deny incoming
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Loki (3100) and Grafana internal (3000) should not be exposed directly
sudo ufw enable
```
Grafana authentication: Disable anonymous access and configure Grafana's built-in user management or OAuth. Set a strong admin password in the environment variable rather than using the default admin/admin.
Secrets management: Keep Grafana passwords, S3 credentials, and any other secrets in a .env file, never hardcoded in docker-compose.yml:
```bash
# .env (never commit this)
GRAFANA_PASSWORD=your-strong-grafana-password
AWS_ACCESS_KEY=your-s3-key
AWS_SECRET_KEY=your-s3-secret
```

```bash
echo ".env" >> .gitignore
```
Disable SSH password authentication: Edit /etc/ssh/sshd_config: set PasswordAuthentication no and PermitRootLogin no. Restart: sudo systemctl restart ssh.
Enable auth_enabled: true in Loki for multi-tenant: If you're aggregating logs from multiple teams and need access control, enable auth. With auth_enabled: true, each request must include an X-Scope-OrgID header. Configure Grafana's Loki datasource with org-specific credentials.
Automatic security updates:
```bash
sudo apt install unattended-upgrades
sudo dpkg-reconfigure --priority=low unattended-upgrades
```
Regular backups: The loki_data volume contains all your indexed logs. Back this up with restic or borgbackup. See automated server backups with restic for a production-tested approach that handles Docker volumes elegantly.
Effective Log Management Strategy
Getting value from Loki requires more than just collecting logs — it requires a thoughtful label schema, sensible retention policies, and dashboards designed to surface actionable signals.
Label design is the most important architectural decision for a Loki deployment. Labels in Loki are like dimensions in a time-series database — they define what makes each log stream unique. The rule is to keep label cardinality low. Good labels are things like container (50 values for a typical server), service (10-20 values), environment (3 values: prod/staging/dev), and level (4 values: debug/info/warn/error). Bad labels are things like request_id, user_id, or trace_id — these create millions of unique streams, degrade query performance, and consume massive amounts of index storage.
Instead of putting high-cardinality data in labels, extract it at query time using LogQL's parsing capabilities. A log line like {"level":"error","user_id":"12345","message":"payment failed"} should be labeled with just level and have user_id extracted at query time with | json | user_id="12345". This keeps your index small while still enabling rich queries.
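The effect of `| json | user_id="12345"` can be mimicked in Python to show why the index stays small: the filter runs over log lines at query time instead of living in the stream labels (the log lines here are invented):

```python
import json

# Only low-cardinality values are indexed as stream labels
stream_labels = {"container": "myapp", "level": "error"}

lines = [
    '{"level":"error","user_id":"12345","message":"payment failed"}',
    '{"level":"error","user_id":"67890","message":"timeout"}',
]

# Query-time equivalent of: {container="myapp"} | json | user_id="12345"
parsed = [json.loads(line) for line in lines]
matches = [entry for entry in parsed if entry.get("user_id") == "12345"]

print(len(matches), matches[0]["message"])  # 1 payment failed
```

Loki only had to index one stream (`container`, `level`); the `user_id` comparison happened while scanning line bodies, exactly as the `| json` pipeline stage does.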
Define alert thresholds based on baselines, not intuition. Your first step before writing any Loki alert rules should be running queries over a week of production logs to understand normal error rates. If your application generates 3-5 errors per minute normally, an alert threshold of 10 errors per minute might be appropriate. An alert that fires constantly because the threshold is too low trains your team to ignore it.
The Grafana "Explore" view is your primary tool for ad-hoc investigation. Unlike building dashboards (which requires knowing what you're looking for), Explore lets you iterate on LogQL queries interactively. When something goes wrong in production, open Explore, select a time range around the incident, and start with broad queries: {service="api"} |= "error". Then narrow by adding more filters until you find the specific log lines that explain the incident.
For multi-server environments, Promtail runs on each server and ships logs to your central Loki instance. Tag each server's logs with a host label, either via external_labels in the Promtail clients section or with a relabel_configs entry (replacement: your-server-1 under target_label: host). This lets you query {host="server-1"} to isolate issues to specific machines, or use a regex matcher like {host=~"server-[12]"} to query multiple servers simultaneously.
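One way to set a per-server label, assuming the clients block from Part 3, is external_labels on the Promtail client, which stamps every stream pushed from that host (the central-loki hostname is illustrative):

```yaml
# promtail-config.yml on each server
clients:
  - url: http://central-loki:3100/loki/api/v1/push
    external_labels:
      host: your-server-1
```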
Troubleshooting Common Issues
Loki returns "context deadline exceeded" on queries
This usually means your query is too broad and scanning too many log streams. Add more specific label selectors to narrow the stream set — {container="myapp"} instead of {job="containerlogs"}. Also check Loki's resource usage: docker stats loki. If Loki is CPU-bound during queries, it may need more resources or a query timeout increase in config.
Promtail not picking up new containers
Promtail's Docker service discovery refreshes based on refresh_interval. If you add a new container and Promtail doesn't start tailing its logs within the interval, check docker compose logs -f promtail. Common issues: the container isn't producing logs to stdout/stderr (some apps write to files instead), or the Docker socket isn't mounted correctly in the Promtail container.
"entry out of order" errors in Loki
Loki requires log entries to arrive in timestamp order within a stream. If your application produces out-of-order logs, add unordered_writes: true to the Loki config. Also check that your server's time is synchronized with NTP — clock skew between the logging host and Loki causes ordering issues.
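If you do need to set it explicitly, unordered_writes lives under limits_config; recent Loki releases default it to true:

```yaml
# loki-config.yml
limits_config:
  unordered_writes: true
```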
Disk fills up rapidly
If you didn't configure retention, Loki keeps logs forever. Set retention_period: 744h (31 days) in limits_config and ensure the compactor is configured with retention_enabled: true. For immediate relief, you can manually delete old chunks from the storage directory, but the compactor is the proper mechanism.
Grafana shows "no data" for Loki queries
Verify Loki is healthy: curl http://localhost:3100/ready should return ready. Then check the datasource URL in Grafana — if Grafana and Loki are in the same Docker Compose network, use the service name (http://loki:3100), not localhost. Test the datasource from Grafana → Data Sources → Loki → Test.
High memory usage in Loki
Loki caches chunks in memory for faster reads. If memory usage is excessive, reduce chunk_idle_period and max_chunk_age in the ingester config to flush chunks to disk sooner. Also verify your label cardinality isn't exploding: curl -s http://localhost:3100/metrics | grep loki_ingester_memory_streams shows how many streams are currently held in memory (loki_ingester_streams_created_total only ever counts up, so it measures churn, not current load).
See all open source monitoring and logging tools at OSSAlt.com/categories/devops.
See open source alternatives to Grafana on OSSAlt.
The SaaS-to-Self-Hosted Migration Guide (Free PDF)
Step-by-step: infrastructure setup, data migration, backups, and security for 15+ common SaaS replacements. Used by 300+ developers.
Join 300+ self-hosters. Unsubscribe in one click.