Skip to main content

Observability

Scryon ships a complete observability stack by default — structured logs, metrics, distributed tracing, and error tracking — all wired with privacy guards.

Logs

Every log line carries an MDC-correlated context block:

INFO [reqId=87c3-5d3f userId=449b-cd2 callId=fa71-a20a] ... event=PIPELINE stage=PYANNOTE_STARTED ...
  • Correlation IDs. X-Request-Id is honoured if supplied; otherwise generated. Propagated through async workers by PipelineMdc.
  • Key=value pairs make the lines greppable and easy to parse with Loki / Datadog / CloudWatch.
  • Privacy. SafeLogSanitizer redacts emails, phone numbers, and obvious PII. REDACT_TRANSCRIPTS=true strips transcript text from INFO logs.

Pipeline events

ProcessingEventLogger emits one event per pipeline stage with event=PIPELINE, including:

FieldMeaning
stageE.g. AUDIO_PREPROCESSED, DIARIZED, TRANSCRIBED.
statusSTARTED / COMPLETED / FAILED / SKIPPED.
providerWhen applicable.
durationMsSet on terminal events.
errorCode / errorMessageShort opaque code + sanitized message.

Events are also persisted to call_processing_events for the owner-scoped /api/debug/calls/{callId}/events endpoint (gated by SCRYON_DEBUG_ENDPOINTS_ENABLED).

Metrics

ScryonMetrics exposes Micrometer counters and timers, scraped at /actuator/prometheus.

Selected metrics:

MetricTypeTags
scryon.calls.uploadedcounterdirection
scryon.calls.completedcounter
scryon.calls.failedcounterreason
scryon.audio.preprocessing.durationtimer
scryon.diarization.durationtimerprovider
scryon.transcription.durationtimerprovider
scryon.transcript.alignment.durationtimer
scryon.transcript.normalization.durationtimer
scryon.analysis.durationtimerprovider
scryon.voice.profile.createdcounter
scryon.voice.match.attemptedcounter
scryon.voice.match.outcomecounteroutcome

Distributed tracing

When SCRYON_TRACING_ENABLED=true and MANAGEMENT_TRACING_ENABLED=true, PipelineObservations wraps every stage in a Micrometer Observation that exports as an OpenTelemetry span via OTEL_EXPORTER_OTLP_ENDPOINT. Spans are nested under the request span and carry callId and stage attributes.

A typical trace for one call looks like:

[HTTP POST /api/calls/analyze]
└─ [pipeline.run]
├─ [pipeline.audio_preprocess]
├─ [pipeline.diarize]
│ └─ [http.client GET pyannote /v1/jobs]
├─ [pipeline.transcribe]
│ └─ [http.client POST lemonfox /v1/audio/transcriptions]
├─ [pipeline.align]
├─ [pipeline.normalize]
├─ [pipeline.voice_match]
├─ [pipeline.speaker_resolve]
└─ [pipeline.analyze]
└─ [http.client POST openai /v1/chat/completions]

Error tracking (Sentry)

When SENTRY_DSN is set, exceptions are captured by ScryonErrorReporter with:

  • A privacy-safe beforeSend callback that strips request bodies and PII headers.
  • callId and userId as tags.
  • The pipeline stage as a fingerprint.

Sentry is off when SENTRY_DSN is empty — no events are sent. CI and local dev are silent by default.

Health

EndpointPurposeAuth
GET /api/healthPublic liveness check, returns {status, version}.none
GET /actuator/healthSpring Boot health; details visible only when authorised.per MANAGEMENT_ENDPOINT_HEALTH_SHOW_DETAILS.
GET /actuator/infoBuild info.per actuator exposure.
GET /actuator/prometheusMetrics scrape.per actuator exposure (restrict in production).

Operating recommendations

  • Always scrape Prometheus. Even at a 30s cadence, the alerting headroom is invaluable.
  • Always set SENTRY_DSN in production. The cost-to-value ratio of error capture is unbeatable.
  • Enable tracing during incidents only. OTLP export is enabled by env var so you can flip it without a redeploy if your platform supports rolling env updates.
  • Cap actuator exposure. In hostile networks, restrict /actuator/prometheus behind a reverse proxy.

See Monitoring for production dashboards and alerts.