
Best Open Source Datadog Alternatives for Metrics, Logs, and APM in 2026


TL;DR

Datadog charges $15–35/host/month ($150–350/month for 10 servers). The open source stack covers the same ground for only the cost of the servers it runs on. For most teams: Grafana + Prometheus + Loki replaces Datadog's metrics and log pipeline; SigNoz replaces Datadog's APM; Netdata provides real-time per-second metrics with zero config. No single tool matches Datadog feature-for-feature, but the combination gets you 90% there at 10% of the cost.

Key Takeaways

Grafana + Prometheus + Loki: Best metrics + logs stack (~62K, ~55K, and ~23K stars respectively)
SigNoz: Best APM (distributed tracing + metrics + logs unified), ~19K stars
Netdata: Best real-time monitoring, ~73K stars, 1-min setup, per-second resolution
VictoriaMetrics: Best Prometheus alternative for high-cardinality data, ~12K stars
Zabbix: Best for enterprise/network monitoring, GPL/AGPL, ~8K stars
Cost: $0 vs Datadog's $150–350/month for 10 hosts

What Datadog Provides (and OSS alternatives)

Datadog Feature	Open Source Replacement
Infrastructure metrics	Prometheus + Grafana / Netdata
Log management	Loki + Grafana / ELK Stack
APM (distributed tracing)	SigNoz / Jaeger + Grafana Tempo
Dashboards	Grafana
Alerting	Grafana Alerts / Alertmanager
Container monitoring	cAdvisor + Prometheus
Synthetic monitoring	Blackbox Exporter / Checkly (hosted SaaS, not OSS)
Profiling	Grafana Pyroscope

1. Grafana + Prometheus + Loki: The Core Stack

The most widely deployed open source monitoring stack. Covers metrics, logs, and dashboards with a consistent UI.

Prometheus (~55K stars): Metrics collection and storage
Grafana (~62K stars): Dashboards, visualization, alerting
Loki (~23K stars): Log aggregation and search
node_exporter + cAdvisor: Host and container metrics
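node_exporter and cAdvisor expose their metrics over HTTP in Prometheus's plain-text exposition format, which Prometheus scrapes on a schedule; your own application can do the same. The sketch below renders one custom counter in that format using only the standard library. The metric name `app_requests_total` and its labels are hypothetical, and in practice you would normally use the official `prometheus_client` library rather than hand-rolling the format.

```python
def escape_label_value(value: str) -> str:
    # The exposition format requires escaping backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter family in Prometheus text format.

    `samples` maps a tuple of (label, value) pairs to the counter's current value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
        selector = f"{{{label_str}}}" if label_str else ""  # omit braces when unlabeled
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


samples = {
    (("method", "GET"), ("status", "200")): 1027,
    (("method", "POST"), ("status", "500")): 3,
}
print(render_counter("app_requests_total", "Total HTTP requests handled.", samples), end="")
```

Serving that string at `/metrics` from any HTTP handler and adding the endpoint to Prometheus's scrape config is all a custom exporter needs.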

Deployment: See our full setup guide at /guides/grafana-prometheus-loki-self-hosted-observability-stack-2026.

Cost comparison:

Datadog Pro: $23/host × 10 hosts = $230/month
Grafana+Prometheus+Loki: one ~$15/month VPS running the whole stack = $15/month

Gaps vs Datadog:

No built-in APM / distributed tracing (add SigNoz for this)
Requires more configuration upfront
You run and maintain it yourself (Grafana Cloud's free tier is the option if you want it managed)

2. SigNoz: The APM Replacement

SigNoz is an open source APM (Application Performance Monitoring) platform with ~19K GitHub stars. It's the most direct replacement for Datadog APM — unified metrics, logs, and distributed tracing in one UI, built on OpenTelemetry.

What SigNoz Does

Distributed tracing: Track requests across microservices (like Datadog APM)
Service maps: Auto-generated topology of your services
Metrics: Infrastructure and custom application metrics
Log management: Correlated with traces and metrics
Alerts: Anomaly detection and threshold alerts
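Distributed tracing works because each service forwards a trace context to the next one. OpenTelemetry's default propagator uses the W3C Trace Context `traceparent` header, formatted as `version-traceid-parentid-flags`. The standard-library sketch below builds and parses that header to show what actually travels between services; the OpenTelemetry SDKs do this automatically, so treat it as an illustration rather than something to wire into production.

```python
import re
import secrets

# version 00, 32-hex trace ID, 16-hex parent span ID, 2-hex trace flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random 16-byte trace ID and 8-byte span ID."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Header for a downstream call: same trace ID, fresh span ID, same flags."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        raise ValueError(f"invalid traceparent: {incoming!r}")
    trace_id, parent_span_id, flags = match.groups()
    if int(trace_id, 16) == 0 or int(parent_span_id, 16) == 0:
        raise ValueError("all-zero IDs are invalid")
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = new_traceparent()
outgoing = child_traceparent(incoming)
print(incoming)
print(outgoing)  # same trace ID (second field), new span ID (third field)
```

Because every hop keeps the trace ID, the backend can stitch spans from different services into one request timeline, which is what the service map is built from.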

Quick Deploy

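# SigNoz runs as a multi-container Docker Compose deployment backed by ClickHouse.
# Rather than writing the compose file by hand, use the official installer.
# (Installer paths and ports can change between releases; check the install docs for your version.)

# Clone and run:
git clone https://github.com/SigNoz/signoz.git
cd signoz/deploy
./install.sh

# Access the UI at: http://your-server:3301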

Instrument Your App (OpenTelemetry)

SigNoz uses the OpenTelemetry standard — the same SDK works for any OTEL-compatible backend:

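// Node.js instrumentation (set OTEL_SERVICE_NAME so SigNoz can label the service):
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-http';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';

const sdk = new NodeSDK({
  // Traces and metrics go to SigNoz's OTLP/HTTP receiver on port 4318.
  traceExporter: new OTLPTraceExporter({
    url: 'http://your-signoz-server:4318/v1/traces',
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: 'http://your-signoz-server:4318/v1/metrics',
    }),
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

// Load this file before your application code so libraries get patched.
sdk.start();

// Flush buffered telemetry before the process exits.
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});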

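# Python:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(endpoint="http://your-signoz-server:4318/v1/traces")
provider = TracerProvider()  # picks up OTEL_SERVICE_NAME from the environment
provider.add_span_processor(BatchSpanProcessor(exporter))  # batch spans before export
trace.set_tracer_provider(provider)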

Resource requirements: SigNoz needs at least ~4GB RAM to run a small deployment, and more for production (ClickHouse is resource-intensive).

3. Netdata: Real-Time Monitoring

Netdata (~73K stars) provides per-second metrics with zero configuration. Install in 60 seconds, get instant visibility into CPU, memory, disk, network, Docker containers, Postgres, Redis, Nginx, and 800+ more.

Best for: Real-time operational monitoring — "is the server behaving normally right now?"

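# Install (official kickstart script):
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh && sh /tmp/netdata-kickstart.sh

# The dashboard is available immediately at:
# http://your-server:19999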

See our full guide: /guides/how-to-self-host-netdata-real-time-server-monitoring-2026

Gaps vs Datadog:

Limited long-term retention compared with a dedicated time-series database (history lives on each node's local disk)
No APM / distributed tracing
Less powerful alerting than Prometheus Alertmanager

4. VictoriaMetrics: High-Performance Prometheus

VictoriaMetrics is a fast, cost-efficient monitoring solution compatible with Prometheus and Grafana. ~12K GitHub stars, Apache 2.0 license.

Why VictoriaMetrics instead of Prometheus

5–10x lower storage requirements than Prometheus for the same data
Better performance at high cardinality (millions of time series)
Longer retention possible on the same hardware
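Prometheus-compatible: Grafana dashboards, PromQL queries, exporters, and remote_write work largely unchanged; alerting and recording rules run in the companion vmalert service, which sends alerts to Alertmanager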
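"High cardinality" means a large number of distinct time series, which for one metric is roughly the product of the distinct values each label can take. A back-of-the-envelope sketch with hypothetical fleet numbers shows how quickly container-level labels multiply:

```python
from math import prod


def estimate_series(dimensions: dict, metrics: int = 1) -> int:
    """Upper bound on active series: every label combination for every metric name."""
    return metrics * prod(dimensions.values())


# Hypothetical fleet: 200 services x 5 pods each, 20 endpoints per service,
# 4 status classes. Nested labels (pods within a service) are counted per parent.
dimensions = {"service": 200, "pod_per_service": 5, "endpoint": 20, "status_class": 4}
print(estimate_series(dimensions, metrics=10))
```

Ten such metrics already reach 800,000 potential series. It is an upper bound, since not every combination occurs, but it shows why per-container labels push Prometheus toward its limits.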

Quick Deploy

# docker-compose.yml
services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:latest   # pin a specific version in production
    ports:
      - "8428:8428"
    volumes:
      - victoriametrics_data:/victoria-metrics-data
    command:
      - '--storageDataPath=/victoria-metrics-data'
      - '--retentionPeriod=12'   # 12 months retention (a bare number means months)

volumes:
  victoriametrics_data:

Prometheus Config (just change remote_write URL)

# prometheus.yml — use VictoriaMetrics as remote storage:
remote_write:
  - url: http://victoriametrics:8428/api/v1/write

# Or replace Prometheus entirely: VictoriaMetrics can scrape targets itself when
# started with -promscrape.config pointing at a Prometheus-format scrape config.
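Because VictoriaMetrics serves the Prometheus HTTP query API, scripts and tools query it exactly as they would query Prometheus. A minimal standard-library sketch that builds an instant query against `/api/v1/query` and reads a vector result; the base URL reuses the `victoriametrics:8428` service from the compose file above, and the PromQL expression is only an example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url: str, promql: str) -> str:
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> dict:
    """Map each series' `instance` label to its value from a Prometheus-format vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def query(base_url: str, promql: str) -> dict:
    with urlopen(instant_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(resp.read().decode())


# Example (needs a running server): query("http://victoriametrics:8428", 'up{job="node"}')
```

The same functions work unchanged against a Prometheus server on port 9090, which is a quick way to confirm both backends return the same data during a parallel run.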

When to use: If Prometheus is struggling with storage costs or high-cardinality metrics from hundreds of services/containers.

5. Zabbix: Enterprise Infrastructure Monitoring

Zabbix is a mature enterprise monitoring platform with ~8K GitHub stars and 20+ years of production use. It is licensed under GPL 2.0 through version 6.x and AGPL 3.0 from Zabbix 7.0 onward.

What Zabbix Specializes In

Network monitoring: SNMP support, network device monitoring (switches, routers, firewalls)
Agent-based monitoring: Zabbix agent installed on servers
Auto-discovery: Discovers network devices automatically
Template library: 1,000+ pre-built templates for common software
Enterprise features: SLA tracking, trend prediction, audit log

When to Choose Zabbix

You're monitoring network infrastructure (not just servers/containers)
Your team already knows Zabbix
You need SNMP monitoring for network devices
You want a GUI-first monitoring tool (not YAML config)

Gaps: Steeper learning curve, heavier than Netdata, less modern than Grafana.

6. Elastic Stack (ELK): Log-Focused Observability

Elasticsearch + Logstash + Kibana — the original large-scale log management stack.

Component	Purpose
Elasticsearch	Full-text search and analytics engine for logs
Logstash	Log ingestion and transformation pipeline
Kibana	Visualization and search UI
Beats (Filebeat)	Lightweight log shippers

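License note: In 2021 Elasticsearch and Kibana moved from Apache 2.0 to a dual SSPL / Elastic License, neither of which is OSI-approved open source; in 2024 Elastic added AGPL 3.0 as a third license option. OpenSearch (the AWS-originated fork) remains Apache 2.0.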

When to use ELK/OpenSearch: You need powerful full-text log search across petabytes of logs. For most self-hosters, Loki + Grafana provides a simpler and lighter alternative.

Decision Guide

For most self-hosted teams (start here):
  Grafana + Prometheus + Loki
  → Add Netdata for real-time per-second monitoring
  → Add SigNoz for distributed tracing if needed

For high-cardinality metrics (100K+ time series):
  VictoriaMetrics as Prometheus backend

For real-time only (simplest setup):
  Netdata → 60 seconds to install, zero config

For APM + distributed tracing:
  SigNoz (all-in-one) or Jaeger + Grafana Tempo

For network/SNMP monitoring:
  Zabbix

For log search at scale:
  OpenSearch (Apache 2.0 Elasticsearch fork) + OpenSearch Dashboards

Cost Breakdown: Self-Hosted vs Datadog

Setup: 10 application servers + monitoring server

Solution	Infrastructure	License	Total/month
Datadog Pro (10 hosts)	Included	$230	$230
Datadog Enterprise (10 hosts)	Included	$350	$350
Grafana+Prometheus+Loki	$6–15/month VPS	$0	$6–15
SigNoz	$8/month VPS	$0	$8
Netdata	$0 (runs on monitored servers)	$0	$0
Full self-hosted stack	$15–20/month	$0	$15–20
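The monthly totals in the table come from simple per-host arithmetic. The sketch below reproduces them; the per-host rates ($23 and $35) are the ones implied by the table rather than an official Datadog price list, and real Datadog bills add APM, log, and custom-metric charges on top.

```python
def datadog_monthly(hosts: int, per_host: float) -> float:
    return hosts * per_host


def annual_savings(hosts: int, per_host: float, self_hosted_monthly: float) -> float:
    return 12 * (datadog_monthly(hosts, per_host) - self_hosted_monthly)


for tier, rate in {"Pro": 23, "Enterprise": 35}.items():
    dd = datadog_monthly(10, rate)
    # Compare against the upper end of the full self-hosted stack ($20/month).
    print(f"{tier}: ${dd:.0f}/month vs $20 self-hosted, "
          f"saves ${annual_savings(10, rate, 20):,.0f}/year")
```

This prints savings of $2,520/year against Pro and $3,960/year against Enterprise, before counting engineering time.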

Alerting and Incident Response Without Datadog

A monitoring stack without alerting is just a dashboard — the operational value comes from reliable, low-noise alerting that routes the right signal to the right person.

Grafana Alerting (v11+) is now the standard alerting layer for the self-hosted stack. Grafana Alerting supports:

Alert rules on any Grafana data source (Prometheus, Loki, InfluxDB, PostgreSQL)
Alert routing to PagerDuty, OpsGenie, Slack, email, and webhook
Alert grouping and inhibition rules (silence low-priority alerts when a high-severity one fires)
Multi-dimensional alerts (alert per-service, per-host, per-endpoint — matching Datadog's monitor scoping)
Notification policies with escalation chains

For teams coming from Datadog, the functional equivalence is high for threshold-based monitors. The configuration approach differs (Grafana alert rules can be edited in the UI or provisioned from YAML files; Datadog monitors are managed in its UI, via its API, or with Terraform), and Datadog's anomaly and composite monitors need more manual translation than simple thresholds.

Alertmanager (the Prometheus-native alerting component) handles routing, grouping, and deduplication of alerts generated by Prometheus alerting rules. For simpler stacks without Grafana, Prometheus alerting rules plus Alertmanager make a low-overhead alerting setup.
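Alertmanager's `group_by` setting batches alerts that share the listed label values into a single notification, which is what stops a 30-host outage from paging 30 times. The sketch below mimics only that grouping step, using hypothetical alert dicts; Alertmanager also applies timing (`group_wait`, `group_interval`), deduplication, and routing, all omitted here.

```python
from collections import defaultdict


def group_alerts(alerts: list, group_by: list) -> dict:
    """Bucket alerts by the values of the `group_by` labels (missing labels count as "")."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"labels": {"alertname": "HostDown", "cluster": "eu", "instance": f"node{i}"}}
    for i in range(30)
] + [{"labels": {"alertname": "DiskFull", "cluster": "eu", "instance": "node3"}}]

for key, members in group_alerts(alerts, ["alertname", "cluster"]).items():
    print(key, len(members))  # one notification per group
```

Grouping by `alertname` and `cluster` turns 31 firing alerts into two notifications; adding `instance` to `group_by` would bring back one notification per host.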

SigNoz includes a built-in alerting UI that creates alerts directly from traces, metrics, and logs in a single interface. For teams migrating from Datadog's APM-integrated monitors (alert when p99 latency > 500ms for service X), SigNoz's correlation between traces and alerts is the closest open source equivalent.
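A monitor such as "p99 latency > 500ms for service X" reduces to a percentile over span durations in a time window. The sketch below uses the nearest-rank method on hypothetical durations in milliseconds; SigNoz and Prometheus compute percentiles their own way (Prometheus histograms, for instance, interpolate within buckets), so exact values can differ.

```python
import math


def percentile_nearest_rank(values: list, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not values:
        raise ValueError("no samples in window")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def p99_breached(durations_ms: list, threshold_ms: float = 500.0) -> bool:
    return percentile_nearest_rank(durations_ms, 99) > threshold_ms


window = [120.0] * 98 + [900.0] * 2  # hypothetical: 2 slow requests out of 100
print(percentile_nearest_rank(window, 99), p99_breached(window))
```

Two slow requests out of 100 are enough to move the p99 to 900ms and trip the monitor, while a single one would not, which is why p99 alerts need enough traffic in the window to be meaningful.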

Migrating from Datadog: Practical Steps

The migration path from Datadog follows a parallel-run approach to avoid monitoring gaps:

Step 1: Install the self-hosted stack alongside Datadog. Run Prometheus and the Grafana stack for 1-2 weeks while still paying for Datadog. This validates that your metrics collection is complete and catches any gaps in coverage.

Step 2: Recreate your Datadog dashboards in Grafana. Datadog dashboard exports (JSON) don't import directly into Grafana, but the panel types (timeseries, table, histogram, heatmap) have direct equivalents. For teams with many dashboards, Grafana's HTTP API (`POST /api/dashboards/db`) or file-based dashboard provisioning allows bulk dashboard creation.
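For bulk recreation, each translated dashboard can be posted as JSON to Grafana's dashboard API. The sketch below only builds the request with the standard library; the Grafana URL, the service-account token, the dashboard UID, and the single panel are placeholders, and the Datadog-to-Grafana panel translation itself is left to you.

```python
import json
from typing import Optional
from urllib.request import Request


def dashboard_request(grafana_url: str, token: str, uid: str, title: str,
                      panels: list, folder_uid: Optional[str] = None) -> Request:
    """Build (but do not send) a create-or-update request for one dashboard."""
    body = {
        "dashboard": {"id": None, "uid": uid, "title": title, "panels": panels},
        "overwrite": True,  # re-running the migration updates instead of failing
    }
    if folder_uid:
        body["folderUid"] = folder_uid
    return Request(
        f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical translation of one Datadog timeseries widget.
panels = [{
    "type": "timeseries",
    "title": "Requests per second",
    "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
    "targets": [{"expr": "sum(rate(app_requests_total[5m]))"}],
}]
req = dashboard_request("http://grafana:3000", "YOUR_TOKEN", "api-overview", "API overview", panels)
# urllib.request.urlopen(req) sends it; loop over your exported dashboards to migrate in bulk.
```

Setting a stable `uid` per dashboard keeps the migration idempotent: re-running the script after fixing a panel updates the existing dashboard instead of creating a duplicate.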

Step 3: Migrate Datadog monitors to Grafana alerting rules. Map each Datadog monitor to a Grafana alert rule on the equivalent PromQL query, adding a Prometheus recording rule where the query is expensive. Test that alerts fire correctly by temporarily lowering thresholds.
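When you lower thresholds to test, remember that Prometheus and Grafana alert rules with a `for` (pending) period only fire after the condition has held continuously for that long: a rule moves from inactive to pending to firing. The small simulation below approximates that state machine with hypothetical evaluation timestamps in seconds, which helps explain why a test alert has not fired yet.

```python
def alert_states(evaluations: list, for_seconds: float) -> list:
    """Return the rule state after each (timestamp, condition_true) evaluation."""
    states, active_since = [], None
    for ts, condition in evaluations:
        if not condition:
            active_since = None  # any false evaluation resets the pending timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states


# Evaluated every 60s; condition becomes true at t=60; rule has `for: 2m`.
print(alert_states([(0, False), (60, True), (120, True), (180, True)], for_seconds=120))
```

With `for: 2m` the alert only fires at t=180, so a test that watches for one minute will see it stuck in pending; setting the pending period to zero during testing removes that delay.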

Step 4: Validate log coverage with Loki. Datadog's log management is often the most-used feature after metrics. A log shipper (Grafana Alloy, which replaces the now-deprecated Promtail) sends logs to Loki, and Grafana Explore provides the log search interface. For high-volume log environments, benchmark query performance against your retention requirements.
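The log shipper handles ingestion in practice, but a hand-pushed marker line is a quick smoke test during the parallel run: push it, then search for `{job="migration-check"}` in Grafana Explore. The sketch below builds such a request for Loki's push endpoint (`/loki/api/v1/push`) with the standard library; the Loki URL and the `job` label are placeholders.

```python
import json
import time
from urllib.request import Request


def loki_push_request(loki_url: str, labels: dict, lines: list) -> Request:
    """Build (but do not send) a Loki push request for one labeled stream."""
    start_ns = time.time_ns()
    # Timestamps are Unix epoch nanoseconds encoded as strings; offsetting by
    # the index keeps the lines in a strict order within the batch.
    values = [[str(start_ns + i), line] for i, line in enumerate(lines)]
    payload = {"streams": [{"stream": labels, "values": values}]}
    return Request(
        f"{loki_url.rstrip('/')}/loki/api/v1/push",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


marker = "migration-smoke-test"
req = loki_push_request("http://loki:3100", {"job": "migration-check"}, [marker, marker + " 2"])
# urllib.request.urlopen(req) sends it; Loki answers 204 No Content on success.
```

Keep labels low-cardinality (job, host, environment) and put variable data in the log line itself; Loki indexes labels, not content.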

Step 5: Cancel Datadog. After 2-4 weeks of parallel operation with no gaps, the migration is complete.

For the complete self-hosting setup, see our Grafana + Prometheus self-hosted observability guide and the best open source monitoring tools roundup for additional tool comparisons.

Infrastructure Requirements for Your Monitoring Stack

One practical question when evaluating the migration is what server resources the self-hosted stack requires. Datadog's agent runs on every monitored host and ships data to Datadog's cloud — you pay per host but don't provision monitoring infrastructure yourself. Self-hosting shifts that burden to you.

Grafana + Prometheus stack requirements (monitoring 10-30 servers):

A single VPS with 2 vCPUs and 4GB RAM handles the Prometheus + Grafana + Loki + Alertmanager stack comfortably at this scale
Disk: Prometheus TSDB ingests roughly 10-20GB per month at this scale, so the default 15-day retention keeps it to about half that on disk; budget 5-15GB per month for Loki log storage depending on verbosity
At 30-50 servers: upgrade to 4 vCPUs / 8GB RAM; consider VictoriaMetrics as the Prometheus backend for its better compression and lower memory footprint
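The Prometheus storage documentation gives a rule of thumb for sizing: needed disk ≈ retention time (seconds) × ingested samples per second × bytes per sample, with roughly 1-2 bytes per sample after compression. A sketch with hypothetical fleet numbers (30 hosts, about 1,000 series each, 15-second scrape interval):

```python
def prometheus_disk_gb(series: int, scrape_interval_s: float, retention_days: float,
                       bytes_per_sample: float = 2.0) -> float:
    """Estimate TSDB disk use in GB (1e9 bytes), using the pessimistic 2 bytes/sample."""
    samples_per_second = series / scrape_interval_s
    retention_seconds = retention_days * 86_400
    return retention_seconds * samples_per_second * bytes_per_sample / 1e9


for days in (15, 30):
    print(f"{days} days: ~{prometheus_disk_gb(30 * 1_000, 15, days):.1f} GB")
```

That works out to about 5.2GB at the default 15-day retention and 10.4GB for 30 days, consistent with the low end of the estimates above; measure your real ingestion rate with `rate(prometheus_tsdb_head_samples_appended_total[1h])` before sizing disks.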

Netdata (agent-based, runs on each monitored server):

Netdata runs on the same server it monitors, adding only 1-2% CPU overhead and 100-200MB RAM per host
No separate monitoring server needed for standalone Netdata — each node has its own dashboard
For centralized dashboards across many Netdata nodes: Netdata Cloud (free tier available), a Netdata parent node that receives streamed metrics, or scraping each node into Prometheus and charting it in Grafana

SigNoz (all-in-one APM):

Minimum 4 vCPUs / 8GB RAM for the SigNoz stack (ClickHouse-backed trace storage is resource-intensive)
Recommended 8 vCPUs / 16GB RAM for production workloads with >10 instrumented services
Docker Compose deployment is well-documented; Kubernetes Helm chart available for larger deployments

The infrastructure cost at $6-20/month is the headline number, but don't ignore the operational time budget: expect 4-8 hours of initial setup, then 1-2 hours per month for maintenance (updates, retention tuning, dashboard additions). Whether that beats Datadog billing depends on your hourly rate and fleet size: Datadog's per-host pricing grows with every server you add, while the maintenance budget grows much more slowly.
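One way to check the trade-off is to solve for the hourly rate at which first-year costs are equal. The sketch below uses this section's ranges (4-8 setup hours, 1-2 maintenance hours a month, $6-20 infrastructure) and the cost table's $230/month Datadog Pro figure for 10 hosts; it ignores incident time and anything Datadog bills beyond the per-host fee.

```python
def break_even_rate(setup_hours: float, monthly_hours: float,
                    infra_monthly: float, datadog_monthly: float) -> float:
    """Hourly rate at which first-year self-hosting cost equals first-year Datadog cost.

    Self-hosted: setup_hours * rate + 12 * (monthly_hours * rate + infra_monthly)
    Datadog:     12 * datadog_monthly
    """
    return 12 * (datadog_monthly - infra_monthly) / (setup_hours + 12 * monthly_hours)


print(break_even_rate(4, 1, 6, 230))    # optimistic end of the ranges
print(break_even_rate(8, 2, 20, 230))   # pessimistic end of the ranges
```

For 10 hosts the break-even lands between about $79 and $168 per engineering hour in year one. The Datadog side scales with host count while the time budget barely moves, so the break-even rate climbs quickly as the fleet grows.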

One underrated benefit of self-hosting observability: your monitoring data is yours indefinitely. Datadog's retention is limited by plan tier — Infrastructure Pro retains metrics for 15 months maximum. Grafana + Prometheus with a VictoriaMetrics long-term storage backend can retain years of metric history at minimal cost ($5-10/month for a high-density storage VPS). For teams that need historical trending for capacity planning, compliance audits, or incident post-mortems, unlimited retention is a structural advantage that no Datadog tier can match at equivalent cost.

For teams starting the migration, Netdata's self-hosting guide covers the quickest path to immediate monitoring coverage while the longer-term Grafana stack is being configured.

Compare all open source Datadog alternatives at OSSAlt.com/alternatives/datadog.
