Monitoring & Observability

Monitor your WorkClaw agents in real time with runtime health checks, container metrics, error tracking, and OpenTelemetry integration.

Why does monitoring matter for AI agents?

Unlike traditional web services, AI agents perform complex, multi-step tasks that can fail in subtle ways. A Claw might appear online but fail to invoke skills correctly, or a runtime host might drift from its expected configuration. Monitoring gives you visibility into these issues before your users notice them.

What metrics are available?

The Admin > Monitoring dashboard surfaces several categories of metrics:

Runtime health -- per-host status indicators showing whether each runtime is healthy, degraded, or offline. Includes uptime, last heartbeat, and current build version.
Container metrics -- CPU, memory, and network usage for each agent container. Useful for identifying resource pressure that could affect response times.
Error rates -- aggregated error counts by type (skill failures, connection timeouts, LLM errors) with trend lines.
Request volume -- chat message throughput, ClawMail deliveries, and API call volume.

How does runtime host drift detection work?

WorkClaw continuously compares the expected state of each runtime host against its actual state. If a host is running a different build version, has missing environment variables, or shows configuration mismatches, it is flagged as drifted. Drifted hosts appear with a warning badge on the monitoring dashboard.

You can resolve drift by redeploying to the affected host from Deployment Management.
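
The comparison described above can be illustrated with a short sketch. It is not WorkClaw's implementation: the HostState class, its field names, and the detect_drift function are hypothetical, and the three checks simply mirror the drift causes listed in this section (build version, missing environment variables, configuration mismatches).

```python
from dataclasses import dataclass, field


@dataclass
class HostState:
    """Hypothetical snapshot of a runtime host (field names are illustrative)."""
    build_version: str
    env_vars: set[str] = field(default_factory=set)
    config: dict[str, str] = field(default_factory=dict)


def detect_drift(expected: HostState, actual: HostState) -> list[str]:
    """Return human-readable drift reasons; an empty list means no drift."""
    reasons = []

    # 1. The host is running a different build than the one expected.
    if actual.build_version != expected.build_version:
        reasons.append(
            f"build version: expected {expected.build_version}, "
            f"got {actual.build_version}"
        )

    # 2. Environment variables that should be set but are absent.
    missing = sorted(expected.env_vars - actual.env_vars)
    if missing:
        reasons.append(f"missing environment variables: {', '.join(missing)}")

    # 3. Configuration keys whose values differ (or are absent) on the host.
    for key, want in sorted(expected.config.items()):
        got = actual.config.get(key)
        if got != want:
            reasons.append(
                f"config mismatch for {key}: expected {want!r}, got {got!r}"
            )

    return reasons


expected = HostState("1.4.2", {"API_KEY", "REGION"}, {"log_level": "info"})
actual = HostState("1.4.1", {"REGION"}, {"log_level": "debug"})

for reason in detect_drift(expected, actual):
    print("DRIFT:", reason)
```

In this sketch a host counts as drifted whenever the returned list is non-empty, which corresponds to the warning badge shown on the dashboard.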

How does OpenTelemetry integration work?

WorkClaw's control plane exports traces and metrics via OpenTelemetry (OTel), so you can forward observability data to your existing tooling, such as Datadog, Grafana, New Relic, or any other OTel-compatible backend. Configure the OTel exporter endpoint under Admin > Settings > Observability.

Traces include the full lifecycle of agent interactions: message receipt, skill execution, tool calls, and response generation.
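
To make the shape of such a trace concrete, here is a minimal sketch that models the lifecycle stages as nested spans using only the Python standard library. It does not use the OpenTelemetry SDK or any WorkClaw API: the Tracer class, the span names, and the nesting of tool calls under skill execution are all assumptions for illustration.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Toy stand-in for a trace span (not an OpenTelemetry type)."""
    name: str
    parent: Optional[str]
    duration_ms: float = 0.0


class Tracer:
    """Records spans in start order and tracks the currently open span."""

    def __init__(self) -> None:
        self.spans: list[Span] = []
        self._stack: list[Span] = []

    @contextmanager
    def span(self, name: str):
        # The parent is whichever span is open when this one starts.
        parent = self._stack[-1].name if self._stack else None
        current = Span(name, parent)
        self.spans.append(current)
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()


tracer = Tracer()

# Hypothetical span names; the stages follow the lifecycle listed above.
with tracer.span("agent.interaction"):
    with tracer.span("message.receipt"):
        pass
    with tracer.span("skill.execution"):
        with tracer.span("tool.call"):  # assumed to nest under the skill
            pass
    with tracer.span("response.generation"):
        pass

for s in tracer.spans:
    print(f"{s.name} (parent: {s.parent})")
```

An OTel-compatible backend typically renders this kind of parent/child structure as a waterfall, which helps show which stage of an interaction was slow or failed.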

How do I set up alerts?

From the monitoring dashboard, click Alerts to configure notification rules. You can alert on:

Runtime going offline or drifting.
Error rate exceeding a threshold over a time window.
Build deployment failures.

Alerts can be sent via email, Slack, or webhook. Slack alerts require your Slack integration to be configured first.
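
The "error rate exceeding a threshold over a time window" rule can be sketched as a sliding-window check. This is an illustration under stated assumptions, not WorkClaw's alerting logic: the ErrorRateRule class, its parameters, and the webhook payload fields are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
import json
from collections import deque


class ErrorRateRule:
    """Hypothetical rule: alert when the error rate in a window exceeds a threshold."""

    def __init__(self, threshold: float, window_seconds: float) -> None:
        self.threshold = threshold            # e.g. 0.2 means 20% of requests
        self.window_seconds = window_seconds
        self._events = deque()                # (timestamp, is_error) pairs

    def record(self, timestamp: float, is_error: bool) -> None:
        """Add one request outcome and drop events older than the window."""
        self._events.append((timestamp, is_error))
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def error_rate(self) -> float:
        """Fraction of requests in the current window that failed."""
        if not self._events:
            return 0.0
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / len(self._events)

    def should_alert(self) -> bool:
        return self.error_rate() > self.threshold


def build_webhook_payload(rule: ErrorRateRule) -> str:
    """Serialize an alert body; the field names are illustrative only."""
    return json.dumps({
        "alert": "error_rate_exceeded",
        "error_rate": round(rule.error_rate(), 3),
        "threshold": rule.threshold,
        "window_seconds": rule.window_seconds,
    })


rule = ErrorRateRule(threshold=0.2, window_seconds=60)
for t in range(8):
    rule.record(float(t), is_error=False)   # eight successful requests
for t in range(8, 11):
    rule.record(float(t), is_error=True)    # three failures

if rule.should_alert():
    print(build_webhook_payload(rule))
```

Evaluating the rate over a window rather than per request keeps a single transient failure from triggering a notification, at the cost of a short delay before a sustained problem is reported.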