Production telemetry runs on three legs: Sentry for errors, OpenTelemetry traces for the hot path, Horizon for queue health. The admin Site Health pill (see Site health & failed jobs) summarizes them at a glance.
Set SENTRY_DSN and unhandled exceptions across the app
flow into Sentry with stack traces, request context, and user/workspace
metadata. The breadcrumb trail captures the last 100 log lines for
every error. Integrated via sentry-laravel.
Useful filters in Sentry:
- workspace_id to scope errors to a tenant.
- agent_id when the error originates in a widget request.
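These filters only work if each event carries the matching tags. The source does not show how the tags get attached, so the following is a minimal sketch of one way it might be done with the sentry-php SDK that sentry-laravel builds on. The helper name `tagSentryScope` and the point where the IDs are resolved are assumptions.

```php
<?php

use Sentry\State\Scope;

use function Sentry\configureScope;

/**
 * Hypothetical helper: tag the current Sentry scope so events can be
 * filtered by tenant and agent. Call it once the request has resolved them.
 */
function tagSentryScope(int|string $workspaceId, int|string|null $agentId = null): void
{
    configureScope(function (Scope $scope) use ($workspaceId, $agentId): void {
        // Sentry tag values are strings, so cast numeric IDs.
        $scope->setTag('workspace_id', (string) $workspaceId);

        // agent_id only applies when the error originates in a widget request.
        if ($agentId !== null) {
            $scope->setTag('agent_id', (string) $agentId);
        }
    });
}
```

Calling something like this from a middleware would tag every later event in the same request, including unhandled exceptions.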
The OTEL exporter ships traces to OTEL_EXPORTER_OTLP_ENDPOINT
— typically Honeycomb or Grafana Cloud Tempo. Spans wrap the hot path:
- widget.message.receive: incoming HTTP, validation, JWT verify.
- rag.curated.match: short-circuit check.
- rag.embed: query embedding call.
- rag.vector.search: ANN search.
- rag.rerank: cross-encoder.
- rag.prompt.assemble: local CPU work.
- rag.llm.first_token: time-to-first-token (the headline metric).
- rag.llm.stream: full stream duration.
- rag.persist.async: post-stream save.
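As an illustration of what wrapping one of these stages in a span could look like, here is a sketch against the OpenTelemetry PHP API. The `traced` helper and the tracer name `app.hot-path` are hypothetical; the application's actual instrumentation may be structured differently.

```php
<?php

use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\StatusCode;

/**
 * Hypothetical wrapper: run $work inside a named span and tag the span.
 *
 * @param array<string, string|int|bool> $attributes e.g. workspace_id, agent_id, provider
 */
function traced(string $name, array $attributes, callable $work): mixed
{
    // The tracer name is an assumption made for this sketch.
    $tracer = Globals::tracerProvider()->getTracer('app.hot-path');
    $span = $tracer->spanBuilder($name)->startSpan();
    $scope = $span->activate();

    try {
        foreach ($attributes as $key => $value) {
            $span->setAttribute($key, $value);
        }

        return $work();
    } catch (\Throwable $e) {
        // Record the failure on the span, then let the exception propagate.
        $span->recordException($e);
        $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());

        throw $e;
    } finally {
        // Always detach the scope and end the span, even on failure.
        $scope->detach();
        $span->end();
    }
}
```

A call might look like `traced('rag.embed', ['workspace_id' => $workspaceId, 'provider' => 'openai'], fn () => $embedder->embed($query))`, where `$embedder` stands in for whatever client makes the embedding call.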
Each span is tagged with workspace_id, agent_id, conversation_id,
provider (cloudflare / openai), and any cache-hit flags. The big one
is p95 of rag.llm.first_token — that's
your hot-path SLO.
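In practice the tracing backend computes this percentile, but it helps to see how a single slow request moves it. The sketch below uses the nearest-rank method on made-up latencies; the 800 ms target is a hypothetical SLO, not a figure from this guide.

```php
<?php

/**
 * Nearest-rank percentile: the smallest sample such that at least
 * $percentile percent of all samples are less than or equal to it.
 *
 * @param list<float> $samples durations in milliseconds
 */
function percentile(array $samples, float $percentile): float
{
    if ($samples === []) {
        throw new InvalidArgumentException('No samples.');
    }

    sort($samples);
    $rank = (int) ceil($percentile / 100 * count($samples));

    // Clamp the rank to a valid 1-based position before indexing.
    $rank = min(max($rank, 1), count($samples));

    return $samples[$rank - 1];
}

// Illustrative rag.llm.first_token durations in milliseconds (made-up numbers).
$firstTokenMs = [310.0, 280.0, 295.0, 1200.0, 305.0, 290.0, 330.0, 300.0, 285.0, 320.0];

$p95 = percentile($firstTokenMs, 95);
$sloMs = 800.0; // hypothetical SLO target

echo $p95 > $sloMs
    ? "p95 first token {$p95} ms breaches the SLO\n"
    : "p95 first token {$p95} ms is within the SLO\n";
```

With only ten samples, the one slow outlier is the p95 value, which is why a percentile is a better SLO signal than the mean for this span.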
/horizon is the queue dashboard. Required for production —
without it, you're blind to backlogs. Watch:
- failed_jobs shows here too.

Queues to monitor:
| Queue | What's on it |
|---|---|
| default | Misc: usage events, gap detection, audit logs, webhook deliveries. |
| crawl | CrawlSourceJob, CrawlPageJob, IngestNotionPageJob, IngestGoogleDocJob. Tends to be the longest queue depth. |
| index | IndexDocumentJob, IndexTextSourceJob. Embedding-heavy. |
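Horizon can flag backlogs on these queues through the `waits` option in `config/horizon.php`, which raises a long-wait notification when a queue's wait time exceeds its threshold. The fragment below is a sketch only: it assumes the queues run on the `redis` connection, and the thresholds are illustrative values rather than recommendations from this guide.

```php
<?php

// config/horizon.php (fragment). Assumes the "redis" queue connection.
return [
    // ...other Horizon settings...

    'waits' => [
        'redis:default' => 60,  // seconds a job may wait before Horizon flags it
        'redis:crawl' => 300,   // crawl tends to run deepest, so allow more slack
        'redis:index' => 120,   // embedding-heavy, moderately tolerant
    ],
];
```

Tune the numbers against the queue depths you actually see in the dashboard.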
Standard Laravel logging. Default channels:
- stdout: captured by Laravel Cloud / Docker.
- sentry: error level and above.
- slack: critical level, posts to ops channel.
Tail logs locally with php artisan pail.
GET /up is the readiness probe — returns 200 with a small
JSON body if the app boots. Use it for load balancer health checks.
For deeper checks, App\Support\PlatformAdminHeader
runs the multi-step health check and exposes the result via the
Inertia shared prop on every admin page.
The handful of metrics that matter most:
Recommended PagerDuty / Slack alerts:
The header pill in the admin panel is a quick visual check that everything is configured. Green is the steady state; if it goes amber, the dropdown tells you exactly which check failed and links to the settings page to fix it. See Site health & failed jobs.