
Audio preprocessing

A short ffmpeg pipeline runs before audio is sent to diarization and transcription. Good preprocessing is the single highest-leverage fix for poor Whisper accuracy and over-segmented diarization.

Feature flag: SCRYON_AUDIO_PREPROCESSING_ENABLED=true (default). If ffmpeg is missing or fails, the worker falls back to the original audio, so the call still completes.

What the pipeline does

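The pipeline downmixes to mono, resamples to 16 kHz, applies a high-pass filter (80 Hz by default), runs an FFT denoiser and normalises loudness. The resulting audio is the input to the rest of the pipeline.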

Why each step

Step | Why
Mono downmix | Whisper expects mono. Cheap to do once.
16 kHz resample | Whisper's native rate. Saves bandwidth and provider cost.
High-pass 80 Hz | Removes HVAC rumble and AC compressor noise, which pyannote tended to classify as a separate speaker.
FFT denoise | Stationary background noise (fans, traffic) leaks into segments and confuses Whisper.
Loudness normalise | Some Android recorders record at -35 dBFS, and Whisper's accuracy drops sharply on low-volume audio.
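The steps above fit into a single ffmpeg invocation. The sketch below is a minimal Java illustration, not the actual AudioPreprocessingService: it assumes the FFT denoise step maps to ffmpeg's afftdn filter and that loudnorm runs with ffmpeg's default targets. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder for an ffmpeg command covering the five preprocessing steps. */
final class FfmpegArgsSketch {

    static List<String> buildArgs(String input, String output, int highpassHz,
                                  boolean denoiseEnabled, int nrDb, int noiseFloorDb) {
        List<String> filters = new ArrayList<>();
        // Mono downmix and 16 kHz resample first, so later filters process less data.
        filters.add("aformat=channel_layouts=mono");
        filters.add("aresample=16000");
        // High-pass: strip low-frequency rumble below the cutoff.
        filters.add("highpass=f=" + highpassHz);
        if (denoiseEnabled) {
            // Assumption: "FFT denoise" = afftdn (nr = reduction in dB, nf = noise floor in dB).
            filters.add("afftdn=nr=" + nrDb + ":nf=" + noiseFloorDb);
        }
        // Loudness normalisation with ffmpeg's default loudnorm targets.
        filters.add("loudnorm");

        List<String> args = new ArrayList<>(
                List.of("ffmpeg", "-hide_banner", "-nostdin", "-y", "-i", input));
        args.add("-af");
        args.add(String.join(",", filters));
        // loudnorm may output at a higher internal rate, so pin rate and channels again.
        args.addAll(List.of("-ar", "16000", "-ac", "1", output));
        return args;
    }
}
```

With the defaults, buildArgs("call.m4a", "call.wav", 80, true, 12, -25) produces a filter chain ending in highpass=f=80,afftdn=nr=12:nf=-25,loudnorm. Putting the downmix and resample at the front keeps every later filter working on a single 16 kHz channel.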

Tuning knobs

Variable | Default | Effect
SCRYON_AUDIO_DENOISE_ENABLED | true | Set to false to skip the denoise step entirely.
SCRYON_AUDIO_HIGHPASS_HZ | 80 | High-pass cutoff in Hz. Raise to ~120 Hz for tinny recordings; lower to 50 Hz to preserve male voice fundamentals.
SCRYON_AUDIO_DENOISE_NR_DB | 12 | Strength of noise reduction in dB. Raising it past ~20 dB starts suppressing quiet speech.
SCRYON_AUDIO_DENOISE_NOISE_FLOOR_DB | -25 | Estimated noise floor in dB. Adjust if you know the source's noise level.
SCRYON_AUDIO_PREPROCESSING_OUTPUT_FORMAT | wav | wav is the largest but lossless; mp3 halves the upload size.
SCRYON_AUDIO_PREPROCESSING_TIMEOUT_SECONDS | 60 | Per-file ffmpeg deadline in seconds.
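One way to read these knobs with the defaults listed above, sketched as a hypothetical Java record. The real configuration lives in ScryonProperties.AudioPreprocessing, whose shape may differ; the master switch SCRYON_AUDIO_PREPROCESSING_ENABLED is included alongside the tuning variables.

```java
import java.util.Map;

/** Hypothetical settings holder; defaults mirror the table above. */
record PreprocessingSettings(boolean enabled, boolean denoiseEnabled, int highpassHz,
                             int denoiseNrDb, int noiseFloorDb, String outputFormat,
                             int timeoutSeconds) {

    /** Reads settings from an environment map, falling back to documented defaults. */
    static PreprocessingSettings fromEnv(Map<String, String> env) {
        return new PreprocessingSettings(
                bool(env, "SCRYON_AUDIO_PREPROCESSING_ENABLED", true),
                bool(env, "SCRYON_AUDIO_DENOISE_ENABLED", true),
                integer(env, "SCRYON_AUDIO_HIGHPASS_HZ", 80),
                integer(env, "SCRYON_AUDIO_DENOISE_NR_DB", 12),
                integer(env, "SCRYON_AUDIO_DENOISE_NOISE_FLOOR_DB", -25),
                env.getOrDefault("SCRYON_AUDIO_PREPROCESSING_OUTPUT_FORMAT", "wav"),
                integer(env, "SCRYON_AUDIO_PREPROCESSING_TIMEOUT_SECONDS", 60));
    }

    private static boolean bool(Map<String, String> env, String key, boolean fallback) {
        String value = env.get(key);
        return value == null ? fallback : Boolean.parseBoolean(value.trim());
    }

    // Fails fast (NumberFormatException) on malformed numbers instead of guessing.
    private static int integer(Map<String, String> env, String key, int fallback) {
        String value = env.get(key);
        return value == null ? fallback : Integer.parseInt(value.trim());
    }
}
```

In a worker this would be called as PreprocessingSettings.fromEnv(System.getenv()); taking a plain Map keeps the parsing easy to test without touching the real environment.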

When to turn things off

Symptom | Try
Quiet speech disappears | Turn denoise off (SCRYON_AUDIO_DENOISE_ENABLED=false), or lower SCRYON_AUDIO_DENOISE_NR_DB to 6–8 dB.
Loud peaks distort | The loudnorm pass should prevent this; if it doesn't, the input is already clipped, so re-record.
ffmpeg missing on host | Install ffmpeg (recommended) or set SCRYON_AUDIO_PREPROCESSING_ENABLED=false.
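For the last row, a deployment check can confirm ffmpeg is reachable before the first call rather than discovering its absence per file. A minimal sketch, assuming a simple PATH scan is good enough; BinaryLookup is hypothetical and not necessarily how the worker detects ffmpeg. On Windows the binary name would be ffmpeg.exe.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: is an executable file called `name` in any PATH directory? */
final class BinaryLookup {

    static boolean isOnPath(String name, String pathEnv) {
        if (pathEnv == null || pathEnv.isEmpty()) {
            return false;
        }
        // PATH entries are separated by ':' on Unix and ';' on Windows.
        for (String dir : pathEnv.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Path.of(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return true;
            }
        }
        return false;
    }
}
```

Typical use: BinaryLookup.isOnPath("ffmpeg", System.getenv("PATH")) at startup, logging a warning (or failing the health check) when it returns false.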

Failure behaviour

  • ffmpeg missing → log a warning once, fall back to the original audio for all calls.
  • ffmpeg fails on a specific file → log the file's metadata, fall back for just that file.

In both cases the call still completes, just on the unprocessed audio. The metric scryon.audio.preprocessing.fallback{reason=...} is incremented so you can alert on widespread failures.
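The two fallback rules can be sketched as follows. This is a hypothetical Java illustration, not the service's code: the Preprocessor interface stands in for the real ffmpeg call, FfmpegMissingException is an invented signal for "binary not found", the reason labels are placeholders (the real values are not listed here), and an in-memory map replaces the metrics registry.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

/** Hypothetical wrapper applying the fallback rules around a preprocessing step. */
final class FallbackSketch {

    /** Hypothetical signal that the ffmpeg binary could not be found. */
    static final class FfmpegMissingException extends Exception {}

    /** Stand-in for the real ffmpeg invocation. */
    interface Preprocessor {
        Path process(Path input) throws Exception;
    }

    private static final Logger LOG = Logger.getLogger(FallbackSketch.class.getName());
    private final AtomicBoolean missingWarned = new AtomicBoolean(false);
    // Placeholder for the fallback{reason} counter.
    final Map<String, AtomicLong> fallbackCounts = new ConcurrentHashMap<>();
    private final Preprocessor preprocessor;

    FallbackSketch(Preprocessor preprocessor) {
        this.preprocessor = preprocessor;
    }

    /** Returns the processed file, or the original input if preprocessing failed. */
    Path prepare(Path input) {
        try {
            return preprocessor.process(input);
        } catch (FfmpegMissingException e) {
            // Warn only once per process; every call falls back while ffmpeg is missing.
            if (missingWarned.compareAndSet(false, true)) {
                LOG.warning("ffmpeg not found; using original audio for all calls");
            }
            countFallback("ffmpeg_missing");
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe cancellation.
            Thread.currentThread().interrupt();
            countFallback("ffmpeg_failed");
        } catch (Exception e) {
            // Only this file falls back; log what is known about it.
            LOG.warning("ffmpeg failed for " + input.getFileName() + ": " + e.getMessage());
            countFallback("ffmpeg_failed");
        }
        return input;
    }

    private void countFallback(String reason) {
        fallbackCounts.computeIfAbsent(reason, r -> new AtomicLong()).incrementAndGet();
    }
}
```

Returning the original Path on every failure path is what keeps the call alive; the caller never needs to know whether preprocessing ran.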

Code map

Class | Role
AudioPreprocessingService | Builds the filter chain and runs ffmpeg.
ScryonProperties.AudioPreprocessing | Configuration shape.

Telemetry

  • scryon.audio.preprocessing.duration (timer)
  • scryon.audio.preprocessing.fallback{reason} (counter)
  • event=PIPELINE stage=AUDIO_PREPROCESSED status=COMPLETED durationMs=...
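A minimal sketch of timing a preprocessing step and producing the completion line above. The key=value layout follows the documented line; the helper class is hypothetical, and if the service records scryon.audio.preprocessing.duration through a metrics library such as Micrometer (an assumption, since no library is named here), the same elapsed time would feed that timer.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for the AUDIO_PREPROCESSED pipeline log line. */
final class PipelineLogSketch {

    static String completedLine(long durationMs) {
        return "event=PIPELINE stage=AUDIO_PREPROCESSED status=COMPLETED durationMs=" + durationMs;
    }

    /** Runs the step, measures wall-clock time with a monotonic clock, returns the line. */
    static String timed(Runnable step) {
        long start = System.nanoTime();
        step.run();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return completedLine(elapsedMs);
    }
}
```

System.nanoTime is used rather than System.currentTimeMillis because it is monotonic and unaffected by wall-clock adjustments during a long ffmpeg run.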