SigNoz vs Datadog: Self-Hosted Observability 2026
A VP of Engineering budgeted $12,000/month for Datadog and received a bill for $147,000. Data from 47 companies shows Datadog's actual invoices run 3–12x higher than initial estimates. A 500-host enterprise deployment — fully outfitted with APM, logs, and custom metrics — was estimated by Datadog at $446,499 per month.
SigNoz (24,000+ GitHub stars) offers traces, metrics, and logs in a single unified interface, is Apache 2.0 licensed, and starts at $49/month on its cloud tier or costs only infrastructure when self-hosted. This is the comparison worth reading before you sign a Datadog contract.
TL;DR
Datadog's pricing model has multiple billing dimensions (per-host, per-GB ingestion, per-million log events, per-custom-metric) that compound unpredictably. Teams reliably spend 3–12x their initial estimates. SigNoz delivers the same three observability pillars — logs, metrics, traces — in a single ClickHouse-backed tool, OpenTelemetry-native from day one, with no "custom metric" surcharge for using open standards. Self-hosted SigNoz costs infrastructure only; SigNoz Cloud starts at $49/month with usage-based pricing that scales predictably.
Key Takeaways
- Datadog typical bill for 50 hosts with APM + logs: $8,000–$25,000/month — not the $2,300 base rate
- SigNoz GitHub stars: 24,000+; Apache 2.0 license; ~10 million Docker pulls
- SigNoz Cloud: $49/month includes $49 of usage; additional at $0.30/GB for logs and traces
- Self-hosted cost: Infrastructure only — a 16 GB RAM server at ~$50–200/month depending on volume
- OTel native: SigNoz was built on OpenTelemetry; Datadog charges OTel metrics as "custom metrics"
- Query advantage: SigNoz exposes raw ClickHouse SQL for ad-hoc queries; Datadog's query languages are proprietary
Why Datadog Bills Are So High
Datadog uses a multi-dimensional pricing model that looks simple on the pricing page and becomes complex in practice:
Dimension 1: Per-host infrastructure
- Infrastructure Pro: $15/host/month
- APM (traces): $31/host/month
- Just infra + APM for 50 hosts: $2,300/month
Dimension 2: Log ingestion + indexing (two separate charges)
- Log ingestion: $0.10/GB
- Log indexing: $1.70/million log events
- A team ingesting 100 GB/day of logs: $300/month ingestion + potentially thousands in indexing
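The first two dimensions can be sketched as a small calculator. This is a rough estimate using only the list prices quoted above; the average log event size (`avg_event_bytes`) and the share of events that get indexed (`indexed_fraction`) are assumptions introduced for illustration, not Datadog figures.

```python
# List prices quoted in this article; check Datadog's pricing page for current values.
INFRA_PER_HOST = 15.00        # USD per host per month (Infrastructure Pro)
APM_PER_HOST = 31.00          # USD per host per month (APM)
LOG_INGEST_PER_GB = 0.10      # USD per GB ingested
LOG_INDEX_PER_MILLION = 1.70  # USD per million indexed log events


def host_cost(hosts: int) -> float:
    """Monthly infrastructure + APM cost for a fixed host count."""
    return hosts * (INFRA_PER_HOST + APM_PER_HOST)


def log_cost(gb_per_day: float, avg_event_bytes: int = 500,
             indexed_fraction: float = 1.0, days: int = 30) -> dict:
    """Monthly log cost split into ingestion and indexing.

    avg_event_bytes and indexed_fraction are assumptions for this sketch.
    """
    gb_per_month = gb_per_day * days
    events = gb_per_month * 1_000_000_000 / avg_event_bytes
    ingestion = gb_per_month * LOG_INGEST_PER_GB
    indexing = events / 1_000_000 * indexed_fraction * LOG_INDEX_PER_MILLION
    return {"ingestion": round(ingestion, 2), "indexing": round(indexing, 2)}


print(host_cost(50))   # infra + APM for 50 hosts
print(log_cost(100))   # 100 GB/day of logs
```

Under the assumed 500-byte events, 100 GB/day works out to about 6 billion events a month, which is why the indexing charge tends to dwarf the ingestion charge unless you index only a fraction of what you ingest.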
Dimension 3: Custom metrics
- All OpenTelemetry-generated metrics count as "custom metrics"
- Custom metrics pricing increases with cardinality — high-cardinality labels multiply costs
- Adding OTel instrumentation to your stack can trigger significant unexpected charges
Dimension 4: High-water mark billing
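- Datadog measures host count hourly, drops the top 1% of hours, and bills the whole month at the highest remaining count
- Auto-scaling events count once they exceed that 1% allowance (about 7 hours in a 30-day month): a service that scales from 50 to 200 hosts for 4 hours on a few days is billed at 200 hosts for the entire month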
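The high-water mark rule can be sketched in a few lines. This is a simplified model of the rule as described above (hourly counts, top 1% of hours dropped, peak of the remainder billed), not Datadog's actual metering code; the 720-hour month and the host counts are made-up inputs.

```python
import math


def billable_hosts(hourly_counts: list[int], drop_fraction: float = 0.01) -> int:
    """Return the highest hourly host count left after dropping the top 1% of hours."""
    ordered = sorted(hourly_counts)
    dropped = math.floor(len(ordered) * drop_fraction)
    return ordered[len(ordered) - dropped - 1]


HOURS = 720  # a 30-day month, so 7 hours fall inside the dropped top 1%

steady_month = [50] * HOURS
# Three separate 4-hour scale-outs to 200 hosts: 12 spike hours in total
spiky_month = [50] * (HOURS - 12) + [200] * 12

print(billable_hosts(steady_month))
print(billable_hosts(spiky_month))
```

In this model, 12 spike hours out of 720 are enough to move the billable count for the whole month from 50 to 200 hosts, even though the fleet ran at 50 hosts more than 98% of the time.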
Dimension 5: AI/LLM observability
- Datadog's LLM Observability product bills at $120/day (~$3,600/month)
- It auto-activates when AI spans are detected in your traces — no manual opt-in required
The compounding effect: a 50-host team with APM, moderate logging, and some custom metrics easily reaches $8,000–$25,000/month. A team that auto-scales during peak traffic and uses OTel can hit $50,000+ without anyone making a deliberate decision to spend that.
SigNoz: What You Get
SigNoz is a unified observability platform built on ClickHouse (a columnar database designed for analytics). All three observability signals — traces, metrics, logs — share a single backend and query interface.
Architecture:
Your application (instrumented with OpenTelemetry SDK)
↓
OpenTelemetry Collector
↓
ClickHouse (storage + analytics)
↑
SigNoz Query Service
↑
SigNoz Frontend (React UI)
Core features:
- Distributed tracing (APM) with flame graphs and service maps
- Metrics monitoring with PromQL support
- Log management with full-text search
- Infrastructure monitoring (host metrics via OTel collector)
- Unified dashboards across all signal types
- Alerts with multiple notification channels (Slack, PagerDuty, webhook)
What SigNoz does not have (vs. Datadog):
- RUM (Real User Monitoring) — no equivalent
- Synthetic monitoring — no equivalent
- SIEM security features — Datadog has security monitoring; SigNoz does not
- Mobile APM — limited
- Large pre-built integration library (500+ in Datadog vs. OTel collector plugins in SigNoz)
For the core observability use case — traces, metrics, logs for backend services — SigNoz covers it. For full-stack observability including frontend monitoring and security, Datadog's broader surface area is real.
Pricing Comparison
| Scenario | Datadog | SigNoz Cloud | SigNoz Self-Hosted |
|---|---|---|---|
| 10 hosts, APM + infra only | ~$460/month | ~$49–200/month | ~$50–150/month VPS |
| 50 hosts, APM + logs + metrics | $8,000–$25,000/month | ~$500–2,000/month | ~$200–500/month infra |
| With auto-scaling (peak 200 hosts) | Billed at the peak for the whole month | No per-host charge | VPS cost stays fixed |
| OTel metrics | Custom metric surcharge | No surcharge | No surcharge |
| Enterprise (500 hosts) | ~$446,000/month (Datadog estimate) | $4,000+/month | Custom |
SigNoz Cloud pricing:
- Teams: $49/month includes $49 of usage
- Logs: $0.30/GB ingested
- Traces: $0.30/GB ingested
- Metrics: $0.10/million samples
- No per-host charge
- Data retention: 15/30/90/180 days or 1 year
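The usage-based model above is simple enough to estimate by hand. The sketch below assumes the bill is the larger of the $49 base fee and metered usage (one reading of "includes $49 of usage"); the volumes in the example are hypothetical.

```python
BASE_FEE = 49.00            # USD per month, includes the first $49 of usage
LOGS_PER_GB = 0.30          # USD per GB of logs ingested
TRACES_PER_GB = 0.30        # USD per GB of traces ingested
METRICS_PER_MILLION = 0.10  # USD per million metric samples


def signoz_cloud_monthly(logs_gb: float, traces_gb: float,
                         metric_samples_millions: float) -> float:
    """Estimate a monthly SigNoz Cloud bill from ingestion volume."""
    usage = (logs_gb * LOGS_PER_GB
             + traces_gb * TRACES_PER_GB
             + metric_samples_millions * METRICS_PER_MILLION)
    # Assumption: usage below the base fee is covered by it
    return round(max(BASE_FEE, usage), 2)


# Hypothetical month: 300 GB logs, 150 GB traces, 500 million metric samples
print(signoz_cloud_monthly(300, 150, 500))
```

Host count does not appear anywhere in the function, which is the practical difference from per-host pricing: scaling out only costs more if it produces more telemetry.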
Self-Hosted Setup with Docker Compose
SigNoz requires a minimum of 8 GB RAM and 4 CPU cores for Docker. Recommended production: 16 GB RAM, 8 CPU cores.
# Clone the SigNoz repository and move to the Docker deployment directory
git clone -b main https://github.com/SigNoz/signoz.git && cd signoz/deploy/docker
# Start SigNoz (Docker Compose)
docker compose up -d
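SigNoz will start at http://localhost:3301 (recent releases serve the UI on port 8080 instead, so check the published ports in the compose file you started). On first load, it displays a sample application called "HotROD" showing how trace data looks.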
Production docker-compose.yaml with custom config:
version: "3"
services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.1.2-alpine
    restart: unless-stopped
    volumes:
      - ./data/clickhouse:/var/lib/clickhouse
      - ./clickhouse-config.xml:/etc/clickhouse-server/config.d/config.xml
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
  query-service:
    image: signoz/query-service:latest
    restart: unless-stopped
    environment:
      - ClickHouseUrl=tcp://clickhouse:9000
      - STORAGE=clickhouse
    depends_on:
      - clickhouse
  frontend:
    image: signoz/frontend:latest
    restart: unless-stopped
    ports:
      - "3301:3301"
    depends_on:
      - query-service
  otel-collector:
    image: signoz/signoz-otel-collector:latest
    restart: unless-stopped
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
    depends_on:
      - clickhouse
On Kubernetes, install SigNoz with the official Helm chart:
helm repo add signoz https://charts.signoz.io
helm repo update
helm install -n platform --create-namespace signoz signoz/signoz
Instrumenting Your Application
SigNoz uses the standard OpenTelemetry SDK — the same instrumentation works for SigNoz, Jaeger, Tempo, or any OTel-compatible backend.
Node.js:
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://your-signoz-host:4318/v1/traces',
  }),
});
sdk.start();
Python:
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="http://your-signoz-host:4318/v1/traces")
    )
)
trace.set_tracer_provider(provider)
Because SigNoz is OTel-native, you're not writing vendor-specific code. If you decide to switch from SigNoz to Grafana Tempo or any other OTel backend later, you change the endpoint URL — not your instrumentation.
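One way to see the portability is to configure the exporter through the standard OpenTelemetry environment variables rather than in code. The sketch below only builds the variable set; the `checkout` service name and both hostnames are placeholders, and `otlp_env` is a hypothetical helper, not part of any SDK.

```python
def otlp_env(service_name: str, endpoint: str) -> dict[str, str]:
    """Build the standard OpenTelemetry environment variables for an OTLP/HTTP backend."""
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        # Base URL only; SDKs append /v1/traces for the trace signal
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
    }


signoz = otlp_env("checkout", "http://your-signoz-host:4318")
other_backend = otlp_env("checkout", "http://your-tempo-host:4318")

# The only setting that differs between the two backends
changed = {key for key in signoz if signoz[key] != other_backend[key]}
print(changed)
```

Keeping the endpoint in the environment means a backend switch is a deployment change rather than a code change.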
SigNoz vs the Grafana LGTM Stack
The other major open-source observability option is Grafana's LGTM stack: Loki (logs) + Grafana (dashboards) + Tempo (traces) + Mimir (metrics).
| Dimension | SigNoz | Grafana LGTM Stack |
|---|---|---|
| Architecture | Unified (single ClickHouse) | 4 separate backends |
| Setup complexity | Medium | High |
| Unified UI | Yes | Grafana frontend, but querying across systems requires context-switching |
| Log indexing | Full attribute indexing | Loki: max 15 low-cardinality attributes |
| High-cardinality support | All OTel attributes | Limited in Loki |
| Query language | PromQL + SQL + Query Builder | PromQL, LogQL, TraceQL (different per signal) |
| Managed cloud | SigNoz Cloud | Grafana Cloud |
| Best for | Teams wanting a single tool | Teams with existing Prometheus/Grafana investment |
When to choose Grafana LGTM: If you're already running Prometheus and Grafana dashboards, adding Loki for logs and Tempo for traces is a natural extension. The tooling familiarity outweighs the complexity of managing four backends.
When to choose SigNoz: If you're starting from scratch or migrating from Datadog and want a single-pane-of-glass replacement with minimal operational complexity.
Migrating from Datadog to SigNoz
Step 1: Instrument with OpenTelemetry
If you're currently using the Datadog APM agent, you'll need to add OpenTelemetry instrumentation. Most languages have auto-instrumentation packages that handle this with minimal code changes:
# Node.js auto-instrumentation
npm install @opentelemetry/auto-instrumentations-node
# Python auto-instrumentation
pip install opentelemetry-distro opentelemetry-exporter-otlp && opentelemetry-bootstrap -a install
Step 2: Run parallel for 2–4 weeks
Run both Datadog and SigNoz simultaneously. Validate that SigNoz captures the same traces and metrics before cutting over. Keep Datadog as backup during the evaluation window.
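A simple way to make the validation concrete is to compare per-service span counts from both systems over the same window. The sketch below assumes you have already exported those counts by hand into two dictionaries; the 5% tolerance and the sample numbers are arbitrary choices for illustration.

```python
def compare_span_counts(datadog: dict[str, int], signoz: dict[str, int],
                        tolerance: float = 0.05) -> dict[str, str]:
    """Classify each service as ok, drift, or missing based on span counts."""
    report = {}
    for service in sorted(set(datadog) | set(signoz)):
        dd_count = datadog.get(service, 0)
        sz_count = signoz.get(service, 0)
        if dd_count == 0 or sz_count == 0:
            report[service] = "missing"   # instrumented in only one system
        elif abs(dd_count - sz_count) / dd_count <= tolerance:
            report[service] = "ok"
        else:
            report[service] = "drift"     # counts differ by more than the tolerance
    return report


# Hypothetical span counts for the same one-hour window
datadog_counts = {"api": 10_000, "worker": 4_000, "cron": 120}
signoz_counts = {"api": 9_800, "worker": 3_100}

print(compare_span_counts(datadog_counts, signoz_counts))
```

Services flagged as missing usually point to an uninstrumented process, while drift often comes from sampling differences between the two agents, so it is worth checking sampling settings before treating drift as data loss.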
Step 3: Recreate dashboards and alerts
SigNoz supports PromQL for metrics. Datadog dashboards use Datadog's own query syntax, so they have to be recreated by hand, but the underlying logic (which metric, which aggregation, which grouping) translates directly. SigNoz's query builder lets you rebuild them visually.
Step 4: Cancel Datadog
Datadog contracts are often annual. Check your renewal date and plan the migration to finish before it.
Who Uses SigNoz
SigNoz is particularly popular with:
- Engineering teams hitting unexpected Datadog bills — the most common migration trigger
- Startups on a budget — $0 software cost vs. $460/month minimum for 10 Datadog hosts
- Teams with GDPR/data residency requirements: self-hosted SigNoz keeps telemetry data on-premises
- OTel adopters — teams standardizing on OpenTelemetry who don't want to pay Datadog's custom metric surcharge
The Bottom Line
If you're spending more than $1,000/month on Datadog and primarily need traces, metrics, and logs for backend services, SigNoz is worth a 2-week evaluation. The setup time is an afternoon. The OTel instrumentation is portable. The cost difference compounds dramatically at scale.
The honest caveats: SigNoz lacks Datadog's RUM, synthetic monitoring, security monitoring, and mobile APM. If you rely on those features, the calculus changes. For core backend observability, SigNoz is feature-complete and dramatically cheaper.
Building Dashboards in SigNoz
SigNoz ships with pre-built dashboards for common infrastructure metrics (CPU, memory, disk, network via the OTel host metrics receiver). Custom dashboards use a drag-and-drop builder with support for:
- Time series charts, bar charts, and pie charts
- Metrics via PromQL or the visual query builder
- Log-based panels using filter/aggregate on indexed attributes
- Trace-based panels (P50/P95/P99 latency, error rate, throughput by service)
Creating a service latency dashboard:
- Dashboard → New Dashboard → Add Panel
- Select metric type: Traces
- Filter by service.name = your-service
- Aggregate: P95 of duration_nano
- Group by: http.url for per-endpoint breakdown
For teams migrating from Datadog, SigNoz's query builder is visually similar to Datadog's metrics explorer — the mental model transfers.
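To make the latency panel less of a black box, here is roughly what "P95 of duration_nano grouped by http.url" computes. This is a plain-Python sketch using the nearest-rank percentile method; SigNoz runs the aggregation in ClickHouse and may use a different (approximate) quantile algorithm, and the sample spans are invented.

```python
from collections import defaultdict


def p95_by_url(spans: list[dict]) -> dict[str, int]:
    """Nearest-rank P95 of duration_nano for each http.url."""
    grouped = defaultdict(list)
    for span in spans:
        grouped[span["http.url"]].append(span["duration_nano"])

    result = {}
    for url, durations in grouped.items():
        durations.sort()
        # Integer form of ceil(0.95 * n), avoiding float rounding
        rank = (95 * len(durations) + 99) // 100
        result[url] = durations[rank - 1]
    return result


# 20 spans for /checkout with durations of 1 ms to 20 ms, plus 2 spans for /cart
sample_spans = [
    {"http.url": "/checkout", "duration_nano": ms * 1_000_000} for ms in range(1, 21)
] + [
    {"http.url": "/cart", "duration_nano": 3_000_000},
    {"http.url": "/cart", "duration_nano": 7_000_000},
]

print(p95_by_url(sample_spans))
```

Grouping by URL matters because a single slow endpoint can hide behind a healthy service-wide P95.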
Alerts and On-Call Integration
SigNoz supports alert rules on any metric, log pattern, or trace condition. Notification channels:
- Slack webhooks
- PagerDuty
- OpsGenie
- Generic webhook (connects to anything)
Example alert: Service error rate > 1%:
# In SigNoz Alerts → New Alert
alert_type: Metric
# Group by service_name so the label is available in the summary below
query: >-
  sum by (service_name) (rate(signoz_calls_total{status_code="STATUS_CODE_ERROR"}[5m]))
  / sum by (service_name) (rate(signoz_calls_total[5m])) * 100
condition: "> 1"
for: 5m
labels:
  severity: warning
annotations:
  summary: "{{ $labels.service_name }} error rate above 1%"
For on-call workflows, SigNoz integrates with PagerDuty and OpsGenie out of the box — no additional tooling required.
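The alert rule above combines a ratio with a hold time. The sketch below models that logic in Python, assuming Prometheus-style semantics where the condition must hold at every evaluation inside the `for` window before the alert fires; SigNoz's evaluator may differ in detail, and the sample counts are made up.

```python
def error_rate_percent(error_calls: int, total_calls: int) -> float:
    """Error calls as a percentage of all calls; 0 when there is no traffic."""
    if total_calls == 0:
        return 0.0
    return error_calls / total_calls * 100


def should_fire(window: list[tuple[int, int]], threshold: float = 1.0) -> bool:
    """Fire only if every evaluation in the window exceeds the threshold."""
    if not window:
        return False
    return all(error_rate_percent(errors, total) > threshold
               for errors, total in window)


# (error_calls, total_calls) at each evaluation during the 5-minute window
sustained = [(3, 200), (5, 250), (4, 300)]
recovered = [(3, 200), (1, 250), (4, 300)]

print(should_fire(sustained))
print(should_fire(recovered))
```

The hold time is what keeps a single bad scrape from paging someone: the second window dips below 1% once and therefore never fires.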
Storage and Retention Planning
ClickHouse is efficient but storage still accumulates. Plan retention based on your ingestion volume:
| Retention | Logs @ 10 GB/day | Traces @ 5 GB/day |
|---|---|---|
| 15 days | 150 GB | 75 GB |
| 30 days | 300 GB | 150 GB |
| 90 days | 900 GB | 450 GB |
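The table is daily volume multiplied by retention days. A small helper makes it easy to plug in your own numbers; the optional `compression_ratio` parameter is an assumption added here, since on-disk size in ClickHouse depends on how well your data compresses, and the table above uses raw ingested volume (ratio of 1).

```python
def retained_gb(gb_per_day: float, retention_days: int,
                compression_ratio: float = 1.0) -> float:
    """Approximate stored volume for a signal at steady-state retention."""
    return gb_per_day * retention_days / compression_ratio


# Reproduce two cells of the table (raw ingested volume)
print(retained_gb(10, 30))   # logs at 10 GB/day, 30 days
print(retained_gb(5, 90))    # traces at 5 GB/day, 90 days

# Hypothetical: the same logs if the data compressed 5:1 on disk
print(retained_gb(10, 30, compression_ratio=5))
```

Measure the real ratio on your own data before sizing disks, since it varies widely with log format and attribute cardinality.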
SigNoz supports tiered retention: keep high-resolution data for 30 days, downsample to longer retention. Configure in Settings → Retention:
# Example: 30-day full retention, 1-year downsampled
traces:
  cold_tier_duration: 30d
logs:
  cold_tier_duration: 30d
metrics:
  cold_tier_duration: 90d
For cost optimization, move older ClickHouse data to object storage (S3, GCS, MinIO) by defining an S3-backed disk and a storage policy that tiers data off local SSD. This reduces the local SSD requirement significantly for long-retention configurations.
OpenTelemetry Collector Configuration
The OTel Collector acts as a pipeline between your applications and SigNoz. A production-ready collector config handles batching, retry, and sampling:
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 100ms
    send_batch_size: 10000
  memory_limiter:
    check_interval: 5s
    limit_mib: 1500
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow-traces
        type: latency
        latency: {threshold_ms: 500}
      - name: probabilistic
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
exporters:
  otlp:
    endpoint: "signoz-otel-collector:4317"
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
The tail_sampling processor reduces trace volume by keeping only errors, slow requests (>500ms), and a 10% sample of all other traces. Depending on how many of your traces are errors or slow, this alone can cut trace storage by roughly 70–80% while still keeping every error and slow trace.
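The three policies amount to a simple keep-or-drop decision per trace. The sketch below is a simplified model of that decision, not the collector's implementation; the independence assumption in `expected_kept_fraction` and the 2% error and 10% slow rates in the example are illustrative guesses.

```python
import random


def keep_trace(has_error: bool, duration_ms: float, rng=random.random,
               slow_ms: float = 500, sample_pct: float = 10) -> bool:
    """Keep a trace if any policy matches: error, slow, or the random sample."""
    if has_error:
        return True
    if duration_ms > slow_ms:
        return True
    return rng() * 100 < sample_pct


def expected_kept_fraction(error_rate: float, slow_rate: float,
                           sample_pct: float = 10) -> float:
    """Share of traces kept, assuming errors and slowness are independent."""
    healthy = (1 - error_rate) * (1 - slow_rate)
    return 1 - healthy * (1 - sample_pct / 100)


print(keep_trace(has_error=True, duration_ms=20))
print(keep_trace(has_error=False, duration_ms=800))
# Hypothetical workload: 2% of traces have errors, 10% are slower than 500 ms
print(expected_kept_fraction(0.02, 0.10))
```

With those example rates about a fifth of traces are kept. A service with a higher share of slow or failing requests will keep more, so the savings depend on the workload.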