SigNoz vs Datadog: Self-Hosted Observability 2026

A VP of Engineering budgeted $12,000/month for Datadog and received a bill for $147,000. Data from 47 companies shows Datadog's actual invoices run 3–12x higher than initial estimates. A 500-host enterprise deployment — fully outfitted with APM, logs, and custom metrics — was estimated by Datadog at $446,499 per month.

SigNoz (24,000+ GitHub stars) offers traces, metrics, and logs in a single unified interface, is Apache 2.0 licensed, and costs $49/month on its cloud tier or only infrastructure costs when self-hosted. This is the comparison worth reading before you sign a Datadog contract.

TL;DR

Datadog's pricing model has multiple billing dimensions (per-host, per-GB ingestion, per-million log events, per-custom-metric) that compound unpredictably. Teams reliably spend 3–12x their initial estimates. SigNoz delivers the same three observability pillars — logs, metrics, traces — in a single ClickHouse-backed tool, OpenTelemetry-native from day one, with no "custom metric" surcharge for using open standards. Self-hosted SigNoz costs infrastructure only; SigNoz Cloud starts at $49/month with usage-based pricing that scales predictably.

Key Takeaways

Datadog typical bill for 50 hosts with APM + logs: $8,000–$25,000/month — not the $2,300 base rate
SigNoz GitHub stars: 24,000+; Apache 2.0 license; ~10 million Docker pulls
SigNoz Cloud: $49/month includes $49 of usage; additional at $0.30/GB for logs and traces
Self-hosted cost: Infrastructure only — a 16 GB RAM server at ~$50–200/month depending on volume
OTel native: SigNoz was built on OpenTelemetry; Datadog bills OTel metrics as "custom metrics"
Query advantage: SigNoz exposes raw ClickHouse SQL for ad-hoc queries; Datadog's query languages are proprietary

Why Datadog Bills Are So High

Datadog uses a multi-dimensional pricing model that looks simple on the pricing page and becomes complex in practice:

Dimension 1: Per-host infrastructure

Infrastructure Pro: $15/host/month
APM (traces): $31/host/month
Just infra + APM for 50 hosts: $2,300/month

Dimension 2: Log ingestion + indexing (two separate charges)

Log ingestion: $0.10/GB
Log indexing: $1.70/million log events
A team ingesting 100 GB/day of logs: $300/month ingestion + potentially thousands in indexing

Dimension 3: Custom metrics

All OpenTelemetry-generated metrics count as "custom metrics"
Custom metrics pricing increases with cardinality — high-cardinality labels multiply costs
Adding OTel instrumentation to your stack can trigger significant unexpected charges

Dimension 4: High-water mark billing

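Datadog measures host count hourly, discards the top 1% of hourly samples, and bills at the highest remaining count
Auto-scaling events that outlast that 1% allowance (roughly 7 hours in a 30-day month), such as repeated traffic spikes that scale you from 50 to 200 hosts, raise the billed host count for the entire month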
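
To make the rule concrete, here is a minimal Python sketch of it as described above: sort the month's hourly host counts, discard the top 1%, and bill at the highest remaining value. The function names, the 720-hour month, and the combined $46/host rate (Infrastructure Pro plus APM from Dimension 1) are illustrative assumptions, not Datadog's actual billing code.

```python
def billable_hosts(hourly_counts, drop_fraction=0.01):
    """Return the host count billed after discarding the top share of hourly samples."""
    ranked = sorted(hourly_counts, reverse=True)
    dropped = int(len(ranked) * drop_fraction)  # e.g. 7 of 720 hourly samples
    return ranked[dropped]


def month_with_spike(baseline, peak, spike_hours, total_hours=720):
    """Build hypothetical hourly samples: spike_hours at peak, the rest at baseline."""
    return [peak] * spike_hours + [baseline] * (total_hours - spike_hours)


RATE_PER_HOST = 15 + 31  # Infrastructure Pro + APM, USD per host per month

samples = month_with_spike(baseline=50, peak=200, spike_hours=12)
hosts = billable_hosts(samples)
print(f"billed hosts: {hosts}, monthly cost: ${hosts * RATE_PER_HOST:,}")
```

Under these assumptions, a 12-hour scale-out to 200 hosts is billed as 200 hosts for the month ($9,200 rather than $2,300), because only 7 of the 720 hourly samples are discarded.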

Dimension 5: AI/LLM observability

Datadog's LLM Observability product bills at $120/day (~$3,600/month)
It auto-activates when AI spans are detected in your traces — no manual opt-in required

The compounding effect: a 50-host team with APM, moderate logging, and some custom metrics easily reaches $8,000–$25,000/month. A team that auto-scales during peak traffic and uses OTel can hit $50,000+ without anyone making a deliberate decision to spend that.
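
A rough way to see the compounding is to add up the dimensions with the list prices quoted above. The sketch below does that in Python. The average log event size (1 KB) and the share of events that get indexed (100%) are assumptions chosen for illustration, and custom metrics and peak billing are left out, so treat the result as a floor under those assumptions rather than a quote.

```python
from dataclasses import dataclass

INFRA_PER_HOST = 15.0         # USD/host/month (Infrastructure Pro)
APM_PER_HOST = 31.0           # USD/host/month
LOG_INGEST_PER_GB = 0.10      # USD per GB ingested
LOG_INDEX_PER_MILLION = 1.70  # USD per million indexed events


@dataclass
class Workload:
    hosts: int
    log_gb_per_day: float
    avg_event_bytes: int = 1_000   # assumption: about 1 KB per log event
    indexed_fraction: float = 1.0  # assumption: every ingested event is indexed
    days: int = 30


def estimate_monthly_bill(w: Workload) -> dict:
    """Break a monthly estimate into the billing dimensions listed above."""
    gb_per_month = w.log_gb_per_day * w.days
    events_millions = gb_per_month * 1e9 / w.avg_event_bytes / 1e6
    parts = {
        "hosts": w.hosts * (INFRA_PER_HOST + APM_PER_HOST),
        "log_ingestion": gb_per_month * LOG_INGEST_PER_GB,
        "log_indexing": events_millions * w.indexed_fraction * LOG_INDEX_PER_MILLION,
    }
    parts["total"] = sum(parts.values())
    return parts


for name, usd in estimate_monthly_bill(Workload(hosts=50, log_gb_per_day=100)).items():
    print(f"{name:>14}: ${usd:,.0f}")
```

With these inputs the host line comes to $2,300, ingestion to $300, and indexing to roughly $5,100, which is how a "$2,300" deployment lands near the bottom of the $8,000–$25,000 range before custom metrics are counted.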

SigNoz: What You Get

SigNoz is a unified observability platform built on ClickHouse (a columnar database designed for analytics). All three observability signals — traces, metrics, logs — share a single backend and query interface.

Architecture:

Your application (instrumented with OpenTelemetry SDK)
    ↓
OpenTelemetry Collector
    ↓
ClickHouse (storage + analytics)
    ↑
SigNoz Query Service
    ↑
SigNoz Frontend (React UI)

Core features:

Distributed tracing (APM) with flame graphs and service maps
Metrics monitoring with PromQL support
Log management with full-text search
Infrastructure monitoring (host metrics via OTel collector)
Unified dashboards across all signal types
Alerts with multiple notification channels (Slack, PagerDuty, webhook)

What SigNoz does not have (vs. Datadog):

RUM (Real User Monitoring) — no equivalent
Synthetic monitoring — no equivalent
SIEM security features — Datadog has security monitoring; SigNoz does not
Mobile APM — limited
Large pre-built integration library (500+ in Datadog vs. OTel collector plugins in SigNoz)

For the core observability use case — traces, metrics, logs for backend services — SigNoz covers it. For full-stack observability including frontend monitoring and security, Datadog's broader surface area is real.

Pricing Comparison

Scenario	Datadog	SigNoz Cloud	SigNoz Self-Hosted
10 hosts, APM + infra only	~$460/month	~$49–200/month	~$50–150/month VPS
50 hosts, APM + logs + metrics	$8,000–$25,000/month	~$500–2,000/month	~$200–500/month infra
With auto-scaling (peak 200 hosts)	Bill spikes permanently	No per-host charge	VPS cost stays fixed
OTel metrics	Custom metric surcharge	No surcharge	No surcharge
Enterprise (500 hosts)	~$446,000/month (Datadog estimate)	$4,000+/month	Custom

SigNoz Cloud pricing:

Teams: $49/month includes $49 of usage
Logs: $0.30/GB ingested
Traces: $0.30/GB ingested
Metrics: $0.10/million samples
No per-host charge
Data retention: 15/30/90/180 days or 1 year
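
The usage-based model is simple enough to estimate in a few lines of Python. This sketch uses the rates listed above and assumes the $49 base fee works as a monthly minimum that covers the first $49 of usage; that reading, the function name, and the sample volumes are assumptions for illustration.

```python
LOGS_PER_GB = 0.30                  # USD per GB ingested
TRACES_PER_GB = 0.30                # USD per GB ingested
METRICS_PER_MILLION_SAMPLES = 0.10  # USD per million samples
BASE_FEE = 49.0                     # assumed to cover the first $49 of usage


def signoz_cloud_monthly(logs_gb, traces_gb, metric_samples_millions):
    """Estimate a monthly bill: usage charges, with the base fee as a floor."""
    usage = (
        logs_gb * LOGS_PER_GB
        + traces_gb * TRACES_PER_GB
        + metric_samples_millions * METRICS_PER_MILLION_SAMPLES
    )
    return max(BASE_FEE, usage)


# Hypothetical month: 10 GB/day of logs, 5 GB/day of traces, 100M metric samples
bill = signoz_cloud_monthly(logs_gb=300, traces_gb=150, metric_samples_millions=100)
print(f"estimated SigNoz Cloud bill: ${bill:,.2f}")
```

Because no term depends on host count, scaling from 50 to 200 hosts changes the estimate only if it changes the volume of telemetry sent.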

Self-Hosted Setup with Docker Compose

SigNoz requires a minimum of 8 GB RAM and 4 CPU cores for Docker. Recommended production: 16 GB RAM, 8 CPU cores.

# Clone the SigNoz repository
git clone -b main https://github.com/SigNoz/signoz.git && cd signoz/deploy

# Start SigNoz (Docker Compose); on the main branch the compose file lives in the docker/ subdirectory
cd docker && docker compose up -d

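SigNoz will start at http://localhost:3301 (recent releases may serve the UI on port 8080 instead; check the port mapping in the compose file). On first load, it displays a sample application called "HotROD" showing how trace data looks.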

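A trimmed-down docker-compose.yaml showing the main services (illustrative: the official file defines additional services and configuration, and production deployments should pin image versions rather than use latest):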

version: "3"

services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.1.2-alpine
    restart: unless-stopped
    volumes:
      - ./data/clickhouse:/var/lib/clickhouse
      - ./clickhouse-config.xml:/etc/clickhouse-server/config.d/config.xml
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

  query-service:
    image: signoz/query-service:latest
    restart: unless-stopped
    environment:
      - ClickHouseUrl=tcp://clickhouse:9000
      - STORAGE=clickhouse
    depends_on:
      - clickhouse

  frontend:
    image: signoz/frontend:latest
    restart: unless-stopped
    ports:
      - "3301:3301"
    depends_on:
      - query-service

  otel-collector:
    image: signoz/signoz-otel-collector:latest
    restart: unless-stopped
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
    depends_on:
      - clickhouse

Kubernetes setup uses the official Helm chart:

helm repo add signoz https://charts.signoz.io
helm install -n platform --create-namespace signoz signoz/signoz

Instrumenting Your Application

SigNoz uses the standard OpenTelemetry SDK — the same instrumentation works for SigNoz, Jaeger, Tempo, or any OTel-compatible backend.

Node.js:

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://your-signoz-host:4318/v1/traces',
  }),
});

sdk.start();

Python:

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="http://your-signoz-host:4318/v1/traces")
    )
)
trace.set_tracer_provider(provider)

Because SigNoz is OTel-native, you're not writing vendor-specific code. If you decide to switch from SigNoz to Grafana Tempo or any other OTel backend later, you change the endpoint URL — not your instrumentation.
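
One way to make that portability explicit is to keep the endpoint out of the code entirely and set it through the standard OpenTelemetry environment variables, which the SDK exporters read when no URL is passed in code. The sketch below builds those variables in Python. The backend hostnames, the service name, and the helper function are hypothetical.

```python
import os

# Hypothetical collector endpoints; replace with your own hosts.
BACKENDS = {
    "signoz": "http://your-signoz-host:4318",
    "tempo": "http://your-tempo-host:4318",
}


def otel_env(service_name, backend):
    """Return standard OpenTelemetry SDK environment variables for one backend."""
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_ENDPOINT": BACKENDS[backend],
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    }


# Switching backends changes only the endpoint value; application code is untouched.
os.environ.update(otel_env("checkout-service", "signoz"))
```

The same variables can be set in a Dockerfile, a Kubernetes manifest, or a shell profile; the Python dictionary is just a compact way to show which values differ between backends.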

SigNoz vs the Grafana LGTM Stack

The other major open-source observability option is Grafana's LGTM stack: Loki (logs) + Grafana (dashboards) + Tempo (traces) + Mimir (metrics).

Dimension	SigNoz	Grafana LGTM Stack
Architecture	Unified (single ClickHouse)	4 separate backends
Setup complexity	Medium	High
Unified UI	Yes	Grafana frontend, but querying across systems requires context-switching
Log indexing	Full attribute indexing	Loki: max 15 low-cardinality attributes
High-cardinality support	All OTel attributes	Limited in Loki
Query language	PromQL + SQL + Query Builder	PromQL, LogQL, TraceQL (different per signal)
Managed cloud	SigNoz Cloud	Grafana Cloud
Best for	Teams wanting a single tool	Teams with existing Prometheus/Grafana investment

When to choose Grafana LGTM: If you're already running Prometheus and Grafana dashboards, adding Loki for logs and Tempo for traces is a natural extension. The tooling familiarity outweighs the complexity of managing four backends.

When to choose SigNoz: If you're starting from scratch or migrating from Datadog and want a single-pane-of-glass replacement with minimal operational complexity.

Migrating from Datadog to SigNoz

Step 1: Instrument with OpenTelemetry

If you're currently using the Datadog APM agent, you'll need to add OpenTelemetry instrumentation. Most languages have auto-instrumentation packages that handle this with minimal code changes:

# Node.js auto-instrumentation
npm install @opentelemetry/auto-instrumentations-node

# Python auto-instrumentation
pip install opentelemetry-distro opentelemetry-exporter-otlp && opentelemetry-bootstrap -a install

Step 2: Run parallel for 2–4 weeks

Run both Datadog and SigNoz simultaneously. Validate that SigNoz captures the same traces and metrics before cutting over. Keep Datadog as backup during the evaluation window.

Step 3: Recreate dashboards and alerts

SigNoz supports PromQL for metrics. Datadog dashboards cannot be imported directly, so they need to be recreated, but the underlying logic translates, and SigNoz's query builder lets you rebuild panels visually.

Step 4: Cancel Datadog

Datadog commitments are typically annual, so plan the migration to finish before your renewal date.

Who Uses SigNoz

SigNoz is particularly popular with:

Engineering teams hitting unexpected Datadog bills — the most common migration trigger
Startups on a budget — $0 software cost vs. $460/month minimum for 10 Datadog hosts
Teams with GDPR/data residency requirements — self-hosted SigNoz keeps telemetry data on-premise
OTel adopters — teams standardizing on OpenTelemetry who don't want to pay Datadog's custom metric surcharge

The Bottom Line

If you're spending more than $1,000/month on Datadog and primarily need traces, metrics, and logs for backend services, SigNoz is worth a 2-week evaluation. The setup time is an afternoon. The OTel instrumentation is portable. The cost difference compounds dramatically at scale.

The honest caveats: SigNoz lacks Datadog's RUM, synthetic monitoring, security monitoring, and mobile APM. If you rely on those features, the calculus changes. For core backend observability, SigNoz is feature-complete and dramatically cheaper.

Building Dashboards in SigNoz

SigNoz ships with pre-built dashboards for common infrastructure metrics (CPU, memory, disk, network via the OTel host metrics receiver). Custom dashboards use a drag-and-drop builder with support for:

Time series charts, bar charts, and pie charts
Metrics via PromQL or the visual query builder
Log-based panels using filter/aggregate on indexed attributes
Trace-based panels (P50/P95/P99 latency, error rate, throughput by service)

Creating a service latency dashboard:

Dashboard → New Dashboard → Add Panel
Select metric type: Traces
Filter by service.name = your-service
Aggregate: P95 of duration_nano
Group by: http.url for per-endpoint breakdown
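
For readers unsure what that panel computes, the following Python sketch reproduces the same filter, group, and aggregate steps over a handful of made-up spans. The span records are hypothetical, and the nearest-rank percentile is one common definition used here for clarity; the query engine may use an approximate quantile instead.

```python
import math
from collections import defaultdict

# Hypothetical span records; real spans carry many more attributes.
spans = (
    [{"service.name": "checkout", "http.url": "/cart", "duration_nano": ms * 1_000_000}
     for ms in range(1, 21)]
    + [{"service.name": "checkout", "http.url": "/pay", "duration_nano": ms * 1_000_000}
       for ms in range(100, 1001, 100)]
    + [{"service.name": "search", "http.url": "/query", "duration_nano": 5_000_000}]
)


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p95_by_endpoint(spans, service):
    """Filter by service.name, group by http.url, aggregate P95 of duration_nano (in ms)."""
    groups = defaultdict(list)
    for span in spans:
        if span["service.name"] == service:
            groups[span["http.url"]].append(span["duration_nano"])
    return {url: percentile(durations, 95) / 1e6 for url, durations in groups.items()}


for url, p95_ms in p95_by_endpoint(spans, "checkout").items():
    print(f"{url}: P95 = {p95_ms:.0f} ms")
```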

For teams migrating from Datadog, SigNoz's query builder is visually similar to Datadog's metrics explorer — the mental model transfers.

Alerts and On-Call Integration

SigNoz supports alert rules on any metric, log pattern, or trace condition. Notification channels:

Slack webhooks
PagerDuty
OpsGenie
Email
Generic webhook (connects to anything)

Example alert: Service error rate > 1%:

# In SigNoz Alerts → New Alert
alert_type: Metric
query: sum by (service_name) (rate(signoz_calls_total{status_code="STATUS_CODE_ERROR"}[5m]))
  / sum by (service_name) (rate(signoz_calls_total[5m])) * 100
condition: > 1
for: 5m
labels:
  severity: warning
annotations:
  summary: "{{ $labels.service_name }} error rate above 1%"

For on-call workflows, SigNoz integrates with PagerDuty and OpsGenie out of the box — no additional tooling required.
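
The interaction between the threshold and the "for" duration is easy to misread, so here is a simplified Python sketch of the evaluation logic. It assumes one evaluation per minute over (error calls, total calls) pairs. The function names and that cadence are illustrative, not how SigNoz implements rule evaluation.

```python
def error_rate_percent(errors, total):
    """Error rate as a percentage; an idle service counts as 0%."""
    return 0.0 if total == 0 else errors / total * 100


def should_fire(windows, threshold=1.0, for_minutes=5):
    """Decide whether the alert fires.

    windows: list of (error_calls, total_calls), one per minute, oldest first.
    The alert fires only if every one of the last for_minutes evaluations
    exceeds the threshold, so a single bad minute does not page anyone.
    """
    if len(windows) < for_minutes:
        return False
    recent = windows[-for_minutes:]
    return all(error_rate_percent(e, t) > threshold for e, t in recent)


history = [(0, 100), (3, 100), (4, 120), (2, 90), (5, 110), (6, 100)]
print("fire alert:", should_fire(history))
```

A shorter "for" window pages faster but is noisier; a longer one suppresses brief blips at the cost of slower detection.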

Storage and Retention Planning

ClickHouse is efficient but storage still accumulates. Plan retention based on your ingestion volume:

Retention	Logs @ 10 GB/day	Traces @ 5 GB/day
15 days	150 GB	75 GB
30 days	300 GB	150 GB
90 days	900 GB	450 GB
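
The table is plain multiplication of daily volume by retention days, which the short Python sketch below reproduces so you can plug in your own numbers. The compression_ratio parameter is a hypothetical knob for on-disk compression; the table above corresponds to a ratio of 1 (raw ingested volume).

```python
def storage_gb(gb_per_day, retention_days, compression_ratio=1.0):
    """Estimate stored volume in GB for a given ingestion rate and retention window."""
    return gb_per_day * retention_days / compression_ratio


print("Retention  Logs @ 10 GB/day  Traces @ 5 GB/day")
for days in (15, 30, 90):
    logs = storage_gb(10, days)
    traces = storage_gb(5, days)
    print(f"{days:>3} days   {logs:>8.0f} GB       {traces:>8.0f} GB")
```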

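SigNoz supports tiered retention: keep recent data on fast local storage, then move it to a cheaper cold tier for the rest of its retention period. Configure this in Settings → Retention. The values below summarize such a policy for illustration and are not a literal config file:

# Example: traces and logs move to the cold tier after 30 days, metrics after 90 days
traces:
  cold_tier_duration: 30d
logs:
  cold_tier_duration: 30d
metrics:
  cold_tier_duration: 90d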

For cost optimization, store older ClickHouse data on object storage (S3, GCS, MinIO) using ClickHouse's S3-backed disks and storage policies. This significantly reduces the local SSD requirement for long-retention configurations.

OpenTelemetry Collector Configuration

The OTel Collector acts as a pipeline between your applications and SigNoz. A production-oriented collector config handles batching, memory limiting, and sampling:

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 100ms
    send_batch_size: 10000
  memory_limiter:
    check_interval: 5s
    limit_mib: 1500
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-traces
        type: latency
        latency: {threshold_ms: 500}
      - name: probabilistic
        type: probabilistic
        probabilistic: {sampling_percentage: 10}

exporters:
  otlp:
    endpoint: "signoz-otel-collector:4317"
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]

The tail_sampling processor reduces trace volume by keeping only errors, slow requests (>500ms), and a 10% sample of the remaining traces. This alone can reduce trace storage by 70–80% while still retaining every error and slow trace.
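
The three policies combine as a logical OR: a trace is kept if any policy selects it. The Python sketch below mirrors that decision and estimates the kept fraction for a given traffic mix. The hash-based 10% selection is an assumption standing in for the collector's own probabilistic sampler, and the 10% error-or-slow share is a hypothetical workload.

```python
import hashlib


def keep_trace(trace_id, has_error, duration_ms, threshold_ms=500, sampling_percentage=10):
    """Keep a trace if it errored, was slow, or falls in the probabilistic sample."""
    if has_error:
        return True
    if duration_ms > threshold_ms:
        return True
    # Deterministic stand-in for probabilistic sampling: hash the trace ID into 100 buckets.
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 100
    return bucket < sampling_percentage


def expected_kept_fraction(error_or_slow_fraction, sampling_percentage=10):
    """Expected share of traces kept when the policies are combined with OR."""
    p = sampling_percentage / 100
    return error_or_slow_fraction + (1 - error_or_slow_fraction) * p


kept = expected_kept_fraction(0.10)
print(f"kept: {kept:.0%}, reduction: {1 - kept:.0%}")
```

If 10% of traces are errors or slow, about 19% of traces are kept. The actual reduction depends on how much of your traffic is healthy and fast.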
