
Call processing pipeline

The async pipeline transforms an uploaded audio file into a transcribed, diarized, analysed call. Every stage is observable, idempotent, and recoverable.

Stages

Within TRANSCRIBING the worker executes these sub-stages in order:

# | Stage | Service | Notes
1 | AUDIO_PREPROCESSED | AudioPreprocessingService | Mono · 16 kHz · loudnorm · denoise. Falls back to original audio on failure.
2 | DIARIZED | DiarizationService (pyannoteAI) | Optional. Hints numSpeakers: 2 for phone calls.
3 | TRANSCRIBED | TranscriptionClient (Lemonfox) | Word-level Whisper output. Optionally async via callback.
4 | TRANSCRIPT_ALIGNED | TranscriptAlignmentService | Word ⨯ diarization-turn overlap.
5 | TRANSCRIPT_NORMALIZED | TranscriptNormalizationService | Stable spk_N IDs, segment merging, stutter cleanup.
6 | VOICE_MATCH_STARTED | VoiceMatchService | Optional. Pre-labels the authenticated user if a voice profile exists.
7 | SPEAKER_REFINED | SpeakerNameResolutionService | Names speakers from metadata + voice match + transcript text.
8 | ANALYZED | AnalysisClient (OpenAI) | Structured business analysis.
9 | ACTION_ITEMS_EXTRACTED | ActionItemService | Persists action items.

Each sub-stage is wrapped in a ProcessingEventLogger event and a ScryonMetrics timer.
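The wrapping described above could be sketched as a context manager that records a start event, an outcome event, and a duration for each stage. This is only an illustration: `observed_stage`, `events`, and `timings` are hypothetical in-memory stand-ins, not the real ProcessingEventLogger or ScryonMetrics APIs.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory stand-ins for ProcessingEventLogger and ScryonMetrics.
events = []   # (call_id, stage, outcome) tuples
timings = {}  # stage name -> list of durations in seconds


@contextmanager
def observed_stage(call_id, stage):
    """Record started/succeeded/failed events and a duration for one stage."""
    events.append((call_id, stage, "started"))
    start = time.perf_counter()
    try:
        yield
    except Exception:
        events.append((call_id, stage, "failed"))
        raise  # the caller decides whether the failure is hard or soft
    else:
        events.append((call_id, stage, "succeeded"))
    finally:
        # The timer is recorded on both success and failure.
        timings.setdefault(stage, []).append(time.perf_counter() - start)


with observed_stage("call-1", "DIARIZED"):
    pass  # the stage body would run here
```

Re-raising inside the wrapper keeps observability separate from the failure policy, so the same wrapper can serve both hard-fail and soft-fail stages.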

State machine

Status | Meaning
QUEUED | Accepted by the API; awaiting worker pickup.
TRANSCRIBING | One of the pipeline stages above is in flight.
ANALYZING | The LLM is producing the analysis.
COMPLETED | All artifacts persisted; transcripts and analysis are queryable.
FAILED | A non-recoverable error; errorReason carries a short, opaque code.

Transitions are persisted by CallPersistenceService and are guarded so a worker restart never produces a divergent state.
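One way to picture a guarded transition is a lookup table of allowed moves plus a no-op for replays. The sketch below is an assumption about how such a guard might look, not the CallPersistenceService implementation: the `ALLOWED` table is inferred from the status descriptions, and deliberate re-runs of COMPLETED calls are left out.

```python
from enum import Enum


class CallStatus(Enum):
    QUEUED = "QUEUED"
    TRANSCRIBING = "TRANSCRIBING"
    ANALYZING = "ANALYZING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Assumed transition table (hypothetical); re-runs of COMPLETED calls are not modelled.
ALLOWED = {
    CallStatus.QUEUED: {CallStatus.TRANSCRIBING, CallStatus.FAILED},
    CallStatus.TRANSCRIBING: {CallStatus.ANALYZING, CallStatus.FAILED},
    CallStatus.ANALYZING: {CallStatus.COMPLETED, CallStatus.FAILED},
    CallStatus.COMPLETED: set(),
    CallStatus.FAILED: set(),
}


def transition(current, target):
    """Return the new status, or raise if the move is not allowed."""
    if current == target:
        # A restarted worker replaying the same transition changes nothing.
        return current
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


status = transition(CallStatus.QUEUED, CallStatus.TRANSCRIBING)
status = transition(status, CallStatus.TRANSCRIBING)  # replay after a restart
```

Treating a repeated transition as a no-op is what lets a restarted worker resume without pushing the row into a state the table does not allow.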

Idempotency

  • Workers claim rows with SELECT ... FOR UPDATE SKIP LOCKED.
  • Every artifact write is keyed by (callId, artifactType) and replaces in place.
  • Re-running the pipeline on a COMPLETED call is a deliberate, idempotent operation that re-uses prior artifacts where possible (see CallProcessingService.runPipeline).
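As a rough illustration of the first two points, the sketch below keeps a PostgreSQL-style claim query as a string and runs a keyed upsert against an in-memory SQLite database. The table and column names (`calls`, `artifacts`, `call_id`, `artifact_type`, `payload`, `created_at`) and the `write_artifact` helper are hypothetical; the real schema may differ.

```python
import sqlite3

# Claim query in PostgreSQL syntax, shown for reference only (not executed here).
# Hypothetical table and column names.
CLAIM_SQL = """
SELECT id FROM calls
WHERE status = 'QUEUED'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE artifacts (
        call_id       TEXT NOT NULL,
        artifact_type TEXT NOT NULL,
        payload       TEXT NOT NULL,
        PRIMARY KEY (call_id, artifact_type)
    )
    """
)


def write_artifact(call_id, artifact_type, payload):
    """Insert the artifact, or replace the existing row with the same key."""
    conn.execute(
        """
        INSERT INTO artifacts (call_id, artifact_type, payload)
        VALUES (?, ?, ?)
        ON CONFLICT (call_id, artifact_type)
        DO UPDATE SET payload = excluded.payload
        """,
        (call_id, artifact_type, payload),
    )
    conn.commit()


write_artifact("call-1", "ANALYSIS_JSON", '{"v": 1}')
write_artifact("call-1", "ANALYSIS_JSON", '{"v": 2}')  # replaces in place
rows = conn.execute("SELECT payload FROM artifacts").fetchall()
```

Because the key is the primary key, a retried write cannot create a second row for the same call and artifact type; it can only overwrite the first.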

Failure handling

Failure | Behaviour
Preprocessing (ffmpeg) | Log + skip; use original audio.
Diarization | Log + fall back to Lemonfox built-in diarization (single-speaker if it can't either).
Transcription | Hard fail. The call moves to FAILED.
Alignment / normalization | Hard fail.
Voice match | Soft fail; transcript still ships, no VOICE_EMBEDDING label.
Speaker resolution | Soft fail; transcript ships with Speaker N labels.
Analysis | Hard fail.
Action items | Soft fail; transcript and analysis still surfaced.
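The hard/soft split above can be thought of as a per-stage policy consulted when a stage raises. The following is a minimal sketch under that assumption: the stage keys, `POLICY`, and `run_stages` are hypothetical names, and the fallbacks for preprocessing and diarization are simplified to "skip and continue".

```python
HARD, SOFT = "hard", "soft"

# Hypothetical policy table mirroring the failure behaviours listed above.
POLICY = {
    "preprocessing": SOFT,
    "diarization": SOFT,
    "transcription": HARD,
    "alignment": HARD,
    "normalization": HARD,
    "voice_match": SOFT,
    "speaker_resolution": SOFT,
    "analysis": HARD,
    "action_items": SOFT,
}


def run_stages(stages):
    """Run (name, callable) pairs in order; return (final_status, skipped_stages)."""
    skipped = []
    for name, fn in stages:
        try:
            fn()
        except Exception:
            if POLICY[name] == HARD:
                return "FAILED", skipped  # hard fail stops the pipeline
            skipped.append(name)          # soft fail: note it and carry on
    return "COMPLETED", skipped


def ok():
    pass


def boom():
    raise RuntimeError("provider unavailable")


result = run_stages([("transcription", ok), ("voice_match", boom), ("analysis", ok)])
```

In this example the voice-match failure is recorded and the run still completes, whereas a failure in transcription or analysis would end the run with FAILED.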

Stuck rows are reaped by StaleJobSweeper after SCRYON_STALE_JOB_TIMEOUT_MINUTES.
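A sweeper of this kind usually boils down to comparing each in-flight row's last update against a cutoff. The sketch below shows only that selection step. The `find_stale` helper, the `jobs` shape, and the choice of which statuses count as in flight are assumptions, and what StaleJobSweeper does with the rows it finds is not covered here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only these statuses count as "in flight" for the sweeper.
IN_FLIGHT = {"TRANSCRIBING", "ANALYZING"}


def find_stale(jobs, now, timeout_minutes):
    """Return ids of in-flight jobs whose last update is older than the timeout.

    jobs maps job id -> (status, updated_at); this shape is hypothetical.
    """
    cutoff = now - timedelta(minutes=timeout_minutes)
    return sorted(
        job_id
        for job_id, (status, updated_at) in jobs.items()
        if status in IN_FLIGHT and updated_at < cutoff
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
jobs = {
    "a": ("TRANSCRIBING", now - timedelta(minutes=45)),  # stuck
    "b": ("TRANSCRIBING", now - timedelta(minutes=5)),   # still fresh
    "c": ("COMPLETED", now - timedelta(hours=3)),        # terminal, ignored
}
# 30 is an example value standing in for SCRYON_STALE_JOB_TIMEOUT_MINUTES.
stale = find_stale(jobs, now, timeout_minutes=30)
```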

Privacy along the pipeline

Stage | What lives where | What we never store
Upload | TEMP_AUDIO key (S3) for up to OBJECT_STORAGE_TEMP_AUDIO_TTL_HOURS | Permanent audio.
Preprocessing | In-memory bytes | Local temp files (suppressed via multipart threshold).
Diarization | Raw provider response as DIARIZATION_JSON artifact | Provider tokens in logs.
Transcription | Raw Whisper response as RAW_TRANSCRIPT_JSON artifact | Transcript text in INFO logs when REDACT_TRANSCRIPTS=true.
Normalization | NORMALIZED_TRANSCRIPT_JSON artifact | Anything we'd be ashamed to show the speaker.
Voice match | voiceMatchScore (single float) on the transcript | Embedding vectors are kept opaque per provider; we never decode them.
Analysis | ANALYSIS_JSON artifact | Sensitive text in metrics or logs.
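To make the REDACT_TRANSCRIPTS row concrete, here is one possible shape for a log-safe transcript helper. It is a sketch only: `redaction_enabled` and `loggable_transcript` are hypothetical names, and both the accepted flag values and the placeholder format are assumptions.

```python
import os


def redaction_enabled(env=os.environ):
    """Assumption: only the value "true" (case-insensitive) enables redaction."""
    return env.get("REDACT_TRANSCRIPTS", "").strip().lower() == "true"


def loggable_transcript(text, redact):
    """Return a representation of transcript text that is safe to log."""
    if redact:
        # Keep only the length so logs stay useful without exposing content.
        return f"<redacted transcript, {len(text)} chars>"
    return text


message = loggable_transcript("hello", redact=True)
```

Routing every transcript-bearing log call through a single helper like this keeps the redaction decision in one place rather than scattered across stages.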

See Privacy & security for the full contract.