
Storage layout

Object storage holds every byte of user content Scryon ever touches. The layout is provider-agnostic — keys look the same whether the backing store is AWS S3, Cloudflare R2, MinIO, or a local filesystem.

Key layout

users/{userId}/calls/{callId}/
├── temp/                        # TEMP_AUDIO (ephemeral)
│   └── audio-{originalName}
├── diarization/
│   └── diarization.json         # DIARIZATION_JSON
├── transcripts/
│   ├── raw.json                 # RAW_TRANSCRIPT_JSON
│   └── normalized.json          # NORMALIZED_TRANSCRIPT_JSON
└── analysis/
    └── analysis.json            # ANALYSIS_JSON

Key generation lives in StorageKeys, a single source of truth.
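As an illustration of what a single source of truth for keys might look like, here is a minimal Java sketch. Only the class name StorageKeys and the key layout come from this page; the method names and the file-name sanitizing step are assumptions, not the actual implementation.

```java
import java.util.UUID;

/** Hypothetical sketch of a key builder; the real StorageKeys may differ. */
final class StorageKeys {
    private StorageKeys() {}

    /** Common prefix for one call: users/{userId}/calls/{callId}/ */
    static String callPrefix(UUID userId, UUID callId) {
        return "users/" + userId + "/calls/" + callId + "/";
    }

    /** TEMP_AUDIO: temp/audio-{originalName} */
    static String tempAudio(UUID userId, UUID callId, String originalName) {
        return callPrefix(userId, callId) + "temp/audio-" + lastSegment(originalName);
    }

    /** DIARIZATION_JSON */
    static String diarization(UUID userId, UUID callId) {
        return callPrefix(userId, callId) + "diarization/diarization.json";
    }

    /** RAW_TRANSCRIPT_JSON */
    static String rawTranscript(UUID userId, UUID callId) {
        return callPrefix(userId, callId) + "transcripts/raw.json";
    }

    /** NORMALIZED_TRANSCRIPT_JSON */
    static String normalizedTranscript(UUID userId, UUID callId) {
        return callPrefix(userId, callId) + "transcripts/normalized.json";
    }

    /** ANALYSIS_JSON */
    static String analysis(UUID userId, UUID callId) {
        return callPrefix(userId, callId) + "analysis/analysis.json";
    }

    // Assumption: keep only the last path segment of the uploaded file name
    // so a client-supplied name cannot introduce extra directory levels.
    private static String lastSegment(String name) {
        int slash = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        return name.substring(slash + 1);
    }
}
```

Because every caller goes through the same builder, the layout can only change in one place, and taking UUID parameters rather than strings keeps free-form text out of the key prefix.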

Lifecycle

| Artifact | Lifetime | Sweep |
|---|---|---|
| TEMP_AUDIO | OBJECT_STORAGE_TEMP_AUDIO_TTL_HOURS (default 24h) | StaleTempAudioSweeper |
| DIARIZATION_JSON | Persistent | - |
| RAW_TRANSCRIPT_JSON | Persistent | - |
| NORMALIZED_TRANSCRIPT_JSON | Persistent | - |
| ANALYSIS_JSON | Persistent | - |

Raw audio (TEMP_AUDIO) is the only privacy-sensitive blob and the only artifact a sweep ever deletes. Everything else is durable and can be re-read on demand through /api/calls/{id}/transcript, /api/calls/{id}/analysis, and similar endpoints.
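To make the lifetime rule concrete, the following sketch shows one way a sweeper could decide whether an object is eligible for deletion. The class and method names are hypothetical and do not describe StaleTempAudioSweeper itself; only the TTL variable, its 24h default, and the key layout come from this page.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper illustrating the TEMP_AUDIO expiry rule. */
final class TempAudioExpiry {
    static final long DEFAULT_TTL_HOURS = 24;

    private TempAudioExpiry() {}

    /** Parses the value of OBJECT_STORAGE_TEMP_AUDIO_TTL_HOURS, falling back to 24h. */
    static Duration ttlFrom(String rawHours) {
        if (rawHours == null || rawHours.isBlank()) {
            return Duration.ofHours(DEFAULT_TTL_HOURS);
        }
        return Duration.ofHours(Long.parseLong(rawHours.trim()));
    }

    /** True only for keys shaped like users/{userId}/calls/{callId}/temp/audio-{name}. */
    static boolean isTempAudioKey(String key) {
        String[] parts = key.split("/");
        return parts.length == 6
                && parts[0].equals("users")
                && parts[2].equals("calls")
                && parts[4].equals("temp")
                && parts[5].startsWith("audio-");
    }

    /** An object is stale once its last-modified time plus the TTL lies strictly in the past. */
    static boolean isStale(Instant lastModified, Instant now, Duration ttl) {
        return lastModified.plus(ttl).isBefore(now);
    }

    /** Sweep-eligible means: a temp audio key AND older than the TTL. */
    static boolean shouldDelete(String key, Instant lastModified, Instant now, Duration ttl) {
        return isTempAudioKey(key) && isStale(lastModified, now, ttl);
    }
}
```

Checking the key shape before the age means a persistent artifact is never deleted, however old it is.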

Provider abstraction

Implementations sit behind ObjectStorageService:

| Bean | When | Code |
|---|---|---|
| LocalFileObjectStorageService | OBJECT_STORAGE_PROVIDER=local | Writes under OBJECT_STORAGE_LOCAL_PATH. |
| S3ObjectStorageService | OBJECT_STORAGE_PROVIDER=s3 | Uses the AWS SDK v2; works against any S3-compatible endpoint. |

For S3, the endpoint, region, credentials, and pathStyleAccess are configurable. Cloudflare R2, MinIO, Wasabi, and Backblaze B2 all work with OBJECT_STORAGE_PATH_STYLE_ACCESS=true.
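The abstraction can be pictured roughly as follows. This is a sketch under stated assumptions: the type names ObjectStorageService and LocalFileObjectStorageService come from this page, but the put/get/delete signatures and the path-escape check are guesses at a plausible shape, not the real interface. The S3-backed implementation is omitted because it needs the AWS SDK on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical shape of the storage abstraction; real method names may differ. */
interface ObjectStorageService {
    void put(String key, byte[] content);
    byte[] get(String key);
    void delete(String key);
}

/** Sketch of the local provider: each key becomes a file path under a root directory. */
final class LocalFileObjectStorageService implements ObjectStorageService {
    private final Path root; // would come from OBJECT_STORAGE_LOCAL_PATH

    LocalFileObjectStorageService(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    // Assumption: reject keys that would resolve outside the storage root.
    private Path resolve(String key) {
        Path target = root.resolve(key).normalize();
        if (!target.startsWith(root)) {
            throw new IllegalArgumentException("Key escapes storage root: " + key);
        }
        return target;
    }

    @Override
    public void put(String key, byte[] content) {
        try {
            Path target = resolve(key);
            Files.createDirectories(target.getParent());
            Files.write(target, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public byte[] get(String key) {
        try {
            return Files.readAllBytes(resolve(key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void delete(String key) {
        try {
            Files.deleteIfExists(resolve(key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Callers depend only on the interface and pass the same keys to every provider, which is what keeps the layout identical across S3-compatible stores and the local filesystem.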

Privacy

  • No public keys. Nothing in the bucket is publicly readable. Clients fetch transcripts and analysis through the REST API, which enforces ownership.
  • Presigned URLs are short-lived. When pyannoteAI uploads, it uses a presigned PUT URL we generate just-in-time and discard.
  • No phone numbers or names in keys. Keys are derived from UUIDs only.

Local dev

When running locally with OBJECT_STORAGE_PROVIDER=local, the layout under ./var/storage/ is identical to the S3 layout, so you can run ls -R var/storage to inspect what would be stored in production.

var/storage/users/449b4cd2-.../calls/f0a1d2e3-.../
├── temp/audio-call.m4a
├── diarization/diarization.json
├── transcripts/raw.json
├── transcripts/normalized.json
└── analysis/analysis.json