How to Self-Host Netdata Monitoring 2026
Self-host Netdata in 2026. GPL 3.0, ~69K stars, C — per-second metrics for servers, containers, databases. 800+ integrations, anomaly detection, alerts.
TL;DR
Netdata (GPL 3.0, ~69K GitHub stars, written in C) is one of the most comprehensive real-time infrastructure monitoring tools available. It collects metrics at 1-second resolution for CPU, memory, disk, network, processes, containers, databases, and 800+ other systems, and streams them to a polished web dashboard with zero configuration. Where Prometheus pulls metrics on a scrape interval, Netdata agents collect locally every second and can stream upstream in real time. Run it on any server for instant observability with no setup beyond docker compose up.
Key Takeaways
- Netdata: GPL 3.0, ~69K stars, C — 1-second per-metric resolution, 800+ collectors
- Zero config: Automatically detects and monitors MySQL, PostgreSQL, Redis, Nginx, Docker, etc.
- Anomaly detection: ML-based anomaly detection on every metric, no configuration needed
- Alerts: Built-in alert conditions for common failure patterns with notifications
- Distributed: Each agent is standalone; use Netdata Parents for centralized multi-host view
- vs Prometheus+Grafana: Netdata is turnkey; Prometheus is more customizable but needs configuration
Netdata vs Prometheus+Grafana vs Zabbix
| Feature | Netdata | Prometheus+Grafana | Zabbix |
|---|---|---|---|
| License | GPL 3.0 | Apache 2.0 | AGPL 3.0 |
| Setup time | Minutes | Hours | Hours |
| Resolution | 1 second | 15s default | 1 minute |
| Auto-discovery | Yes (800+ collectors) | Manual scrape config | Agent-based |
| Anomaly detection | Yes (ML, built-in) | Manual rules | Trigger-based |
| Long-term storage | 1 month (local) | Forever (disk) | DB-based |
| Dashboards | Built-in | Grafana required | Built-in |
| Alerting | Built-in | Alertmanager | Built-in |
| Agent RAM | ~100MB | ~50MB | ~200MB |
Part 1: Docker Setup
Single-node monitoring
```yaml
# docker-compose.yml
services:
  netdata:
    image: netdata/netdata:latest
    container_name: netdata
    restart: unless-stopped
    pid: host
    network_mode: host # Required for full network monitoring
    cap_add:
      - SYS_PTRACE
      - SYS_ADMIN
    security_opt:
      - apparmor:unconfined
    volumes:
      - netdataconfig:/etc/netdata
      - netdatalib:/var/lib/netdata
      - netdatacache:/var/cache/netdata
      - /etc/passwd:/host/etc/passwd:ro
      - /etc/group:/host/etc/group:ro
      - /etc/localtime:/etc/localtime:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /var/log:/host/var/log:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      NETDATA_CLAIM_TOKEN: "${NETDATA_CLAIM_TOKEN}" # Optional: Netdata Cloud claim
      NETDATA_CLAIM_URL: "https://app.netdata.cloud"
      NETDATA_CLAIM_ROOMS: "${NETDATA_CLAIM_ROOMS}"

volumes:
  netdataconfig:
  netdatalib:
  netdatacache:
```

```shell
docker compose up -d
```
Visit http://your-server:19999 — the Netdata dashboard loads immediately.
Note: `network_mode: host` is needed for Netdata to see host network interfaces and monitor processes properly. Port 19999 is exposed by default.
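To confirm the agent is healthy, query its info endpoint. The pipeline below avoids jq by pulling the version field out with grep and cut; the sample JSON is abbreviated (a live agent's /api/v1/info response is much larger):

```shell
# Sample (abbreviated) /api/v1/info response; the pipeline extracts "version".
info='{"version":"v1.44.0","uid":"abc123","os_name":"Ubuntu"}'
echo "$info" | grep -o '"version":"[^"]*"' | cut -d'"' -f4

# Against a live agent:
# curl -s http://localhost:19999/api/v1/info | grep -o '"version":"[^"]*"' | cut -d'"' -f4
```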
Part 2: HTTPS with Caddy
Since Netdata uses network_mode: host, it's already on port 19999. Add a Caddy reverse proxy on the same host:
```
metrics.yourdomain.com {
    reverse_proxy localhost:19999
}
```
Access Control
Netdata is open by default. Restrict access:
```shell
# Edit netdata.conf:
docker exec -it netdata sh
vi /etc/netdata/netdata.conf
```

Add to the `[web]` section:

```
[web]
    allow connections from = localhost 192.168.0.0/24 10.0.0.0/8
```
Or restrict entirely and access only via Caddy with basic auth:
```
metrics.yourdomain.com {
    basicauth {
        admin $2a$14$hashofyourpassword
    }
    reverse_proxy localhost:19999
}
```
Part 3: Auto-Detected Collectors
Netdata automatically detects and configures:
| Service | What it monitors |
|---|---|
| Docker | Container CPU, memory, network, I/O |
| PostgreSQL | Queries/sec, connections, replication lag, table bloat |
| MySQL/MariaDB | Queries, threads, InnoDB metrics |
| Redis | Operations/sec, memory, hit rate, keyspace |
| Nginx | Requests/sec, connections, response codes |
| Node.js | Event loop lag, heap, GC (if the app exposes a StatsD or Prometheus endpoint) |
| systemd | Service status, CPU, memory per service |
| Disk | IOPS, latency, utilization per device |
| Network | Packets/sec, bandwidth, errors per interface |
| CPU | Per-core utilization, interrupts, softirqs |
No configuration needed — Netdata finds running services automatically.
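You can see which collectors auto-detection actually enabled by listing the charts the agent is collecting. A sketch using only grep/cut (no jq); the sample response is abbreviated to the relevant shape:

```shell
# /api/v1/charts lists every chart being collected. The pipeline prints the
# unique collector prefixes (nginx, postgres, redis, ...) found in chart names.
charts='{"charts":{"postgres.connections":{},"redis.commands":{},"nginx.requests":{}}}'
echo "$charts" | grep -o '"[a-z_]*\.[a-z_]*"' | cut -d'"' -f2 | cut -d. -f1 | sort -u

# Against a live agent:
# curl -s http://localhost:19999/api/v1/charts | grep -o '"[a-z_]*\.[a-z_]*"' | cut -d'"' -f2 | cut -d. -f1 | sort -u
```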
Part 4: Custom Alerts
Default alerts cover common failure scenarios. Add custom ones:
```shell
# Edit alerts:
docker exec -it netdata sh
vi /etc/netdata/health.d/custom.conf
```

```
# Alert if disk usage > 85%:
 alarm: disk_space_warning
    on: disk.space
    os: linux
lookup: average -10m unaligned of used
  calc: $this * 100 / ($used + $avail)
 every: 1m
  warn: $this > 85
  crit: $this > 95
  info: disk ${label:mount_point} space utilization
 delay: down 5m multiplier 1.5 max 1h
    to: sysadmin

# Alert if PostgreSQL has too many connections:
 alarm: pg_connections_warning
    on: postgres.connections
lookup: average -5m unaligned
 every: 1m
  warn: $this > 80
  crit: $this > 95
  info: PostgreSQL active connections
    to: dba
```
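To sanity-check the thresholds, here is the same arithmetic the disk alarm's calc line performs, on sample values (90 GB used, 10 GB available):

```shell
# calc: $this * 100 / ($used + $avail) -- with average used = 90 and avail = 10:
used=90; avail=10
pct=$(( used * 100 / (used + avail) ))
echo "$pct"   # 90: above warn (85), below crit (95)
```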
Part 5: Notifications
Configure alert notifications:
```shell
# Edit notifications:
docker exec -it netdata vi /etc/netdata/health_alarm_notify.conf
```

```shell
# Slack:
SEND_SLACK="YES"
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
DEFAULT_RECIPIENT_SLACK="#alerts"

# Telegram:
SEND_TELEGRAM="YES"
TELEGRAM_BOT_TOKEN="your-bot-token"
DEFAULT_RECIPIENT_TELEGRAM="your-chat-id"

# ntfy:
SEND_NTFY="YES"
NTFY_URL="https://ntfy.yourdomain.com"
DEFAULT_RECIPIENT_NTFY="alerts"

# PagerDuty:
SEND_PAGERDUTY="YES"
PAGERDUTY_SERVICE_KEY="your-service-key"
```
Part 6: Distributed Multi-Server Setup
Monitor multiple servers from one UI using Netdata Parents:
On the parent server
Streaming is configured in stream.conf on both sides, not in docker-compose. On the parent, a section named after a shared API key accepts child connections (generate a key with `uuidgen`):

```shell
docker exec -it netdata vi /etc/netdata/stream.conf
```

```
# Accept streams from children presenting this API key.
# "allow from" takes Netdata simple patterns, not CIDR notation:
[your-api-key]
    enabled = yes
    allow from = 10.* 192.168.0.*
```
On each child server
```shell
# stream.conf on each child:
docker exec -it netdata vi /etc/netdata/stream.conf
```

```
[stream]
    enabled = yes
    destination = parent.yourdomain.com:19999
    api key = your-api-key  # must match the section name in the parent's stream.conf
```
All child server metrics stream to the parent in real-time. The parent UI shows all servers in one dashboard.
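To verify streaming, check the parent's info endpoint: its "mirrored_hosts" array lists every node whose metrics the parent holds, including itself. A grep/cut sketch on an abbreviated sample response:

```shell
# Sample (abbreviated) /api/v1/info from a parent; prints one hostname per line.
info='{"mirrored_hosts":["parent","web-01","db-01"]}'
echo "$info" | grep -o '"[^"]*"' | cut -d'"' -f2 | grep -v '^mirrored_hosts$'

# On the parent:
# curl -s http://localhost:19999/api/v1/info | grep -o '"[^"]*"' | cut -d'"' -f2 | grep -v '^mirrored_hosts$'
```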
Part 7: Anomaly Detection
Netdata runs ML models on every metric:
- Each metric is scored by unsupervised k-means models trained on its own recent history
- Anomaly Rate = percentage of metrics in anomalous state right now
- No configuration needed — it just works
- Access: Dashboard → Anomaly Advisor tab
```shell
# List alerts currently in WARNING or CRITICAL via the API:
curl "http://localhost:19999/api/v1/alarms?all" | jq '.alarms | to_entries[] | select(.value.status=="WARNING" or .value.status=="CRITICAL") | .value.name'
```
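Recent Netdata builds also expose the overall anomaly rate as a regular chart (the chart name anomaly_detection.anomaly_rate is an assumption here; verify it exists on your agent). A sketch of averaging it from the data API's CSV output, shown on a sample response:

```shell
# Sample CSV in the shape of /api/v1/data?...&format=csv;
# awk averages the anomaly-rate column, skipping the header row.
csv='time,anomaly_rate
1700000000,0.5
1700000001,2.0
1700000002,3.5'
echo "$csv" | awk -F, 'NR>1 {sum+=$2; n++} END {printf "%.1f\n", sum/n}'

# Live query (chart name assumed; adjust to your build):
# curl -s "http://localhost:19999/api/v1/data?chart=anomaly_detection.anomaly_rate&after=-60&format=csv"
```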
Maintenance
```shell
# Update Netdata:
docker compose pull
docker compose up -d

# Check Netdata version:
docker exec netdata netdata --version

# Logs:
docker compose logs -f netdata

# Reload health/alert config without a restart:
docker exec netdata netdatacli reload-health

# Backup config:
tar -czf netdata-config-$(date +%Y%m%d).tar.gz \
  $(docker volume inspect netdata_netdataconfig --format '{{.Mountpoint}}')
```
Why Self-Host Netdata
Datadog starts at $15/host/month for infrastructure monitoring — a 10-server setup costs $1,800/year before adding APM, logs, or synthetics. New Relic's compute-based pricing lands in similar territory for any meaningful fleet. Grafana Cloud's free tier cuts off after 10,000 series, and production workloads routinely exceed that. Netdata gives you 1-second resolution monitoring for every metric, ML-based anomaly detection, and 800+ auto-configuring collectors — for the cost of the server you're already running.
The 1-second resolution is the headline differentiator. Prometheus defaults to 15-second scrape intervals, Zabbix to 1-minute polling. When a CPU spike lasts 3 seconds and causes an outage, 15-second resolution tells you "something happened." Netdata's 1-second resolution shows you exactly which processes spiked, for how long, and what happened to disk I/O and network at the same moment. This matters enormously for debugging intermittent issues that don't show up in coarser metrics.
Zero-configuration discovery is the other standout. Netdata detects running MySQL, PostgreSQL, Redis, Nginx, Docker, and dozens of other services automatically — no YAML scrape configs, no exporters to install. The moment you run a container, Netdata starts collecting from it. For teams that are already managing complex infrastructure, eliminating the maintenance overhead of Prometheus exporters and scrape config files is a significant operational saving.
The anomaly detection capability runs without any configuration. Netdata trains unsupervised k-means models on every metric and flags deviations from normal behavior. This catches subtle degradations — a slow memory leak, gradually increasing disk I/O, a creeping queue backup — before they become outages. This kind of ML-based alerting typically requires expensive observability platforms or significant custom engineering.
When NOT to self-host Netdata. Netdata stores metrics locally (1 month by default). If you need indefinite long-term metric retention for capacity planning or compliance, Prometheus with remote write to object storage is a better fit. Netdata's dashboards are excellent for real-time operations but are less flexible for custom reporting compared to Grafana. And if your team is already invested in the Prometheus/Grafana ecosystem, adding Netdata alongside it can create alert duplication rather than reducing overhead.
Prerequisites
Netdata's resource footprint depends on how many metrics it collects. On a baseline server with moderate services, expect around 100–150MB RAM and 2–5% CPU at idle. With many containers and databases, RAM can climb to 300–400MB. A Hetzner CX22 (2 vCPU, 4GB RAM) at €4.50/month handles a well-loaded single node comfortably. If you're running the multi-server Parent setup (collecting from 10+ child nodes), upgrade to a CX32 (8GB RAM). Refer to the VPS comparison guide for a detailed breakdown of provider options for monitoring workloads.
Docker Engine 24+ and Docker Compose v2 are required. Note that Netdata's compose configuration uses network_mode: host and mounts several /proc and /sys paths — this is necessary for Netdata to see host-level metrics rather than just container metrics. On SELinux-enabled systems (CentOS, Fedora), you may need to add :z to bind mounts or set security_opt: - label=disable. The SYS_PTRACE and SYS_ADMIN capabilities are also required — these let Netdata inspect running processes and their resource usage. Without them, process-level monitoring (seeing which specific app is consuming CPU) is unavailable.
The Netdata dashboard runs on port 19999 with no authentication by default. This port should never be publicly exposed — proxy it through Caddy with at minimum HTTP basic auth. The recommended approach is to access it only over a private VPN or Tailscale/Headscale mesh, with no public exposure at all. Exposing monitoring dashboards publicly leaks infrastructure topology to anyone who discovers the URL.
For the multi-server setup, plan your network topology before deployment. The Parent node needs to be reachable from each child node on port 19999. If your servers are on different networks, you'll need either public exposure (not recommended without authentication) or a VPN mesh to link them privately.
DNS: create an A record for metrics.yourdomain.com pointing to your server. Caddy manages the TLS certificate automatically. Make sure port 80 and 443 are open in your firewall before Caddy starts, as Let's Encrypt uses an HTTP-01 challenge on port 80 for initial certificate issuance.
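A quick pre-flight sketch before first starting Caddy (the hostname is a placeholder; dig ships in the dnsutils/bind-utils package):

```shell
# DNS should already resolve to this server's public IP:
dig +short metrics.yourdomain.com

# Nothing else should be listening on 80/443 before Caddy starts:
ss -ltn | grep -E ':(80|443)\s' || echo "ports 80/443 free"
```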
Production Security Hardening
Netdata's dashboard shows detailed information about every process, service, and network connection on your server. Exposing it publicly without authentication is a significant information leak.
UFW firewall. Allow only SSH, HTTP, and HTTPS. Block 19999 from external access:
```shell
ufw default deny incoming
ufw default allow outgoing
ufw allow ssh
ufw allow 80/tcp
ufw allow 443/tcp
ufw enable
```
Port 19999 is blocked by default since ufw default deny incoming covers it. Confirm with ufw status.
Caddy with basic auth. Protect the metrics dashboard with a password at minimum:
```
metrics.yourdomain.com {
    basicauth {
        admin $2a$14$yourhashhere
    }
    reverse_proxy localhost:19999
}
```
Generate the hash with: caddy hash-password --plaintext yourpassword. (Caddy 2.8+ spells the directive basic_auth; the older basicauth form is deprecated but still accepted.)
Fail2ban. Enable the SSH jail to protect the server from brute-force login attempts:
```shell
apt install fail2ban -y
```

Create /etc/fail2ban/jail.local:

```ini
[DEFAULT]
bantime = 2h
findtime = 15m
maxretry = 5

[sshd]
enabled = true
```
SSH hardening. Disable password authentication:
```
# /etc/ssh/sshd_config
PasswordAuthentication no
PermitRootLogin no
```
Restart SSH: systemctl restart sshd. Ensure your SSH key is in ~/.ssh/authorized_keys before doing this.
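Before restarting, it is worth validating the edited config; sshd's test mode catches syntax errors that would otherwise lock you out:

```shell
# -t parses the config and exits non-zero on error; only restart if it passes.
sshd -t && systemctl restart sshd
```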
Automatic updates. OS security patches should apply without manual intervention:
```shell
apt install unattended-upgrades -y
dpkg-reconfigure --priority=low unattended-upgrades
```
Backup Netdata configuration. The netdataconfig volume contains your custom alert rules and any tuned collector configs. Back this up using Restic to avoid reconfiguring from scratch after a server rebuild — see the automated backup guide for the full workflow. The metrics data itself (in netdatalib and netdatacache) is ephemeral by nature and generally not worth backing up.
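A minimal restic sketch, assuming a repository is already initialized and RESTIC_REPOSITORY / RESTIC_PASSWORD are exported in the environment:

```shell
# Resolve the named volume's host path, then back it up with a tag
# so config snapshots are easy to find later.
CONF_DIR=$(docker volume inspect netdata_netdataconfig --format '{{.Mountpoint}}')
restic backup "$CONF_DIR" --tag netdata-config
```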
For a complete hardening reference, see the self-hosting security checklist.
Troubleshooting Common Issues
Netdata container starts but shows no host metrics. This usually means network_mode: host is not set, or the /proc and /sys mounts are missing. Verify the compose configuration includes both network_mode: host and all required volume mounts (particularly /proc:/host/proc:ro and /sys:/host/sys:ro). Without these, Netdata sees the container's namespaced view rather than the host.
Docker container metrics missing. Netdata needs access to the Docker socket to collect container metrics. Verify the volume mount /var/run/docker.sock:/var/run/docker.sock:ro is present in the compose file. Also confirm the SYS_PTRACE and SYS_ADMIN capabilities are listed under cap_add:.
Alert notifications not sending. Edit /etc/netdata/health_alarm_notify.conf inside the container and set SEND_SLACK="YES" (or your target). Then test with docker exec netdata /usr/libexec/netdata/plugins.d/alarm-notify.sh test. Common issues are incorrect webhook URLs or Telegram bot tokens with no permission to send messages to the target chat.
High CPU usage on the Netdata container. Netdata is written in C and is normally very efficient, but certain collectors (Python-based ones, in particular) can spike CPU if the monitored service is returning slow responses. Check the Netdata dashboard's own performance section — go to Netdata Monitoring → Plugins to see which collector is consuming CPU. Disable noisy collectors by editing the relevant config in /etc/netdata/.
Metrics data lost after container restart. Verify the Docker volumes netdatalib and netdatacache are defined and mounted correctly. If you're using bind mounts instead of named volumes, check file permissions. The Netdata process needs write access to its data directories.
Cannot access the Netdata dashboard over Caddy. Since Netdata uses network_mode: host, Caddy and Netdata are both on the host network and can communicate via localhost:19999 without a Docker bridge. If Caddy can't reach Netdata, confirm the Netdata container is healthy: docker compose ps and curl http://localhost:19999/api/v1/info. If Caddy is itself running in a container (not host mode), use host.docker.internal:19999 as the upstream address.
Database-specific collectors not showing up. Netdata auto-discovers services but needs to be able to connect to them. For PostgreSQL, Netdata needs the pg_stat_* views to be accessible. If PostgreSQL is running in a container without network mode host, Netdata may not be able to reach it. Configure the PostgreSQL collector manually with the DSN for your PostgreSQL container's IP or Docker network name — in current Netdata releases this lives in /etc/netdata/go.d/postgres.conf (older releases used python.d/postgres.conf).
Alerts firing incorrectly for expected conditions. Netdata ships with default alert thresholds that may not match your specific workload. A 90% disk usage alert may trigger legitimately in production systems where high disk utilization is normal. List the active rules with docker exec netdata ls /etc/netdata/health.d/ (stock defaults live under /usr/lib/netdata/conf.d/health.d/), copy the relevant file into /etc/netdata/health.d/, and adjust the warn/crit thresholds for your environment. Apply the change without a restart via docker exec netdata netdatacli reload-health.
Netdata Parent not receiving data from child nodes. Multi-server streaming requires the child's stream.conf to have the correct API key and destination. The same API key must be configured on the Parent's stream.conf as a section named after the key, with enabled = yes. Firewall rules on the Parent must allow inbound connections on port 19999 from the child server IPs. Use docker compose logs netdata on both parent and child to see streaming connection attempts and any rejection messages.
Very high cardinality metrics causing memory growth. If you have many short-lived containers (CI/CD build agents, for example), Netdata tracks metrics for each container, including terminated ones, and over time this increases memory use. Setting the database mode to ram in netdata.conf (the [db] section's mode option in current releases, [global] memory mode in older ones) caps usage at the cost of losing data after a restart, and a shorter retention helps too. For ephemeral container environments, also tighten which cgroups get per-container charts via the cgroups plugin's pattern filters so short-lived containers are excluded.
See all open source monitoring tools at OSSAlt.com/categories/devops.