Backend onboarding

You are joining the team that owns the Spring Boot service that turns audio into transcripts and analyses. This page gets you from "git clone" to "first PR merged" in about a week.

If you have not yet read the Onboarding overview, start there. This page assumes you have the 30-minute mental model.


What you are signing up for

The backend's job, in one paragraph:

Accept an authenticated multipart audio upload. Persist a call_records row. Run an async pipeline that preprocesses the audio (ffmpeg), diarizes (pyannoteAI), transcribes (Lemonfox/Whisper), aligns + normalises into our v3 schema, resolves speaker names using layered evidence, calls an LLM for analysis, and extracts action items. Expose tight, versioned REST endpoints. Never log PII. Be idempotent.

That is the entire job. Everything else in the codebase is in service of that loop being accurate, private, and boringly reliable.


The tech

| Layer | Choice | Notes |
| --- | --- | --- |
| Language | Java 21 | Records, sealed types, pattern matching, virtual threads where they help. |
| Framework | Spring Boot 3.x | Standard stack: spring-web, spring-data-jpa, spring-security, spring-boot-actuator. |
| Build | Gradle (Kotlin DSL) | Wrapper checked in. JDK 21 is required to build. |
| Database | PostgreSQL 16 | Flyway-managed migrations. |
| Storage | S3-compatible | Tigris in prod, MinIO locally. |
| HTTP client | WebClient (reactive) | Non-blocking calls to providers. |
| Observability | Micrometer, OpenTelemetry, Sentry | Prometheus scrape; OTLP traces; Sentry events scrubbed. |
| External APIs | pyannoteAI, Lemonfox/Whisper, OpenAI | Each behind a feature flag in dev. |
| Tests | JUnit 5, Mockito, AssertJ, Testcontainers | Service tests use Testcontainers Postgres. |

Day 1 — get it running

1. Prereqs

# macOS
brew install openjdk@21 postgresql@16 ffmpeg
brew services start postgresql@16

# Verify
java -version # should print 21
psql --version # should print 16
ffmpeg -version

You also need an S3-compatible store. The simplest is MinIO via Docker:

docker run -d --name minio -p 9000:9000 -p 9001:9001 \
-e MINIO_ROOT_USER=minioadmin -e MINIO_ROOT_PASSWORD=minioadmin \
minio/minio server /data --console-address ':9001'

2. Clone and configure

git clone git@github.com:FluxonLabs/scryon-backend.git
cd scryon-backend
cp .env.sample .env.local # then edit

The config surface is documented in Configuration reference. For a first local run, the bare minimum is:

| Env var | Notes |
| --- | --- |
| SCRYON_API_KEY | Any random string. Used by clients (and your curls). |
| SPRING_DATASOURCE_URL | jdbc:postgresql://localhost:5432/scryon |
| SPRING_DATASOURCE_USERNAME / PASSWORD | Local Postgres user |
| SCRYON_S3_* | Point at your MinIO |
| PYANNOTE_ENABLED | false for the first run (uses a stub). Flip to true once you have a key. |
| SCRYON_TRANSCRIPTION_PROVIDER | stub for first run. Switch to lemonfox later. |
| SCRYON_LLM_PROVIDER | stub for first run. Switch to openai later. |

3. Boot it

createdb scryon
./gradlew bootRun

Flyway migrates on startup. Look for "Successfully applied N migrations" in the logs.

Smoke test:

curl -H "X-API-Key: $SCRYON_API_KEY" http://localhost:8080/api/health
# {"status":"UP"}

4. End-to-end with a tiny clip

Generate a 5-second silent WAV and feed it through the pipeline:

ffmpeg -f lavfi -i anullsrc=r=16000:cl=mono -t 5 sample.wav

curl -X POST http://localhost:8080/api/calls/analyze \
-H "X-API-Key: $SCRYON_API_KEY" \
-H "X-Local-User-Id: local-dev" \
-F "file=@sample.wav" \
-F "fileName=sample.wav"
# 202 { "callId": "...", "status": "QUEUED" }

Watch the logs and poll GET /api/calls/{callId} until status: COMPLETED. Open the transcript and analysis endpoints. With stubs you'll get deterministic placeholder content — that's fine.
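
If you would rather script the polling than re-run curl by hand, a minimal Java sketch could look like the following. It assumes the GET response carries the same "status" field as the upload response, and it treats FAILED as a terminal status, which is a guess and not a documented value. The class name, attempt count, and delay are hypothetical, and the endpoint may also expect the X-Local-User-Id header.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CallStatusPoller {

    // Pulls the "status" value out of a JSON body without a JSON library.
    // Good enough for a local smoke test; use a real parser in service code.
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    public static String extractStatus(String json) {
        Matcher m = STATUS.matcher(json);
        return m.find() ? m.group(1) : "UNKNOWN";
    }

    // Polls until the status is terminal or the attempts run out.
    // COMPLETED comes from this page; FAILED is an assumed terminal status.
    public static String pollUntilDone(Supplier<String> fetchBody, int maxAttempts, Duration delay)
            throws InterruptedException {
        String status = "UNKNOWN";
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            status = extractStatus(fetchBody.get());
            if (status.equals("COMPLETED") || status.equals("FAILED")) {
                return status;
            }
            Thread.sleep(delay.toMillis());
        }
        return status; // last status seen; not terminal if we got here
    }

    public static void main(String[] args) throws Exception {
        String callId = args[0];
        String apiKey = System.getenv("SCRYON_API_KEY");
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://localhost:8080/api/calls/" + callId))
                .header("X-API-Key", apiKey)
                .GET()
                .build();
        Supplier<String> fetch = () -> {
            try {
                return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            } catch (Exception e) {
                throw new IllegalStateException("status request failed", e);
            }
        };
        // Up to 60 attempts, 2 seconds apart.
        System.out.println(pollUntilDone(fetch, 60, Duration.ofSeconds(2)));
    }
}
```

The fetch step is passed in as a Supplier so the loop can be exercised with canned responses, without a running backend.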


Day 2 — walk one call end-to-end with logs open

The single most useful thing you can do on day 2 is follow one real call through the entire pipeline with the log filter callId=<id>.

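The pipeline runs these stages in order: preprocess (ffmpeg), diarize (pyannoteAI), transcribe (Lemonfox/Whisper), align and normalise into the v3 schema, resolve speaker names, LLM analysis, action-item extraction.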

Each stage:

  • Reads its inputs from the previous stage's persisted artifact (object storage or DB).
  • Writes its output as a new artifact.
  • Is idempotent: if it crashes halfway, the next attempt picks up from the last successful stage.
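
The three properties above can be illustrated with a toy orchestrator. This is a sketch of the general resume-from-last-artifact pattern, not the code in AnalysisPipeline.java: the class name, the in-memory map standing in for object storage or the DB, and the string artifacts are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class ResumablePipeline {

    // Stand-in for object storage / DB: stage name -> persisted artifact.
    private final Map<String, String> artifactStore;

    // Stages in execution order: name -> work that maps the previous artifact to the next.
    private final Map<String, UnaryOperator<String>> stages = new LinkedHashMap<>();

    public ResumablePipeline(Map<String, String> artifactStore) {
        this.artifactStore = artifactStore;
    }

    public ResumablePipeline stage(String name, UnaryOperator<String> work) {
        stages.put(name, work);
        return this;
    }

    // Runs the stages in order and returns the names of the stages that actually executed.
    // A stage whose artifact already exists is skipped, so a retry resumes after
    // the last stage that succeeded.
    public List<String> run(String input) {
        List<String> executed = new ArrayList<>();
        String previous = input;
        for (Map.Entry<String, UnaryOperator<String>> entry : stages.entrySet()) {
            String name = entry.getKey();
            String existing = artifactStore.get(name);
            if (existing != null) {
                previous = existing; // done on an earlier attempt; reuse its artifact
                continue;
            }
            String output = entry.getValue().apply(previous); // may throw
            artifactStore.put(name, output); // persist only after the stage succeeded
            executed.add(name);
            previous = output;
        }
        return executed;
    }
}
```

Because an artifact is written only after its stage returns, a crash mid-stage leaves no partial artifact behind, and the next run repeats that stage from the previous stage's output.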

Read Call processing pipeline once you have followed a real call. The doc will make twice as much sense after.

Key files to open

| File | Why |
| --- | --- |
| CallController.java | The HTTP boundary. |
| CallIntakeService.java | Where uploads become DB rows. |
| AnalysisPipeline.java | The orchestrator. |
| SpeakerNameResolutionService.java | The most subtle code in the codebase. Read it slowly. |
| LabelSource.java | The grammar of "why did we pick this name?" |
| TranscriptNormaliser.java | The v3 schema; the client contract. |
| application.yml / application-*.yml | The full config surface. |

Day 3 — privacy + conventions

Read these in order:

  1. Privacy & security — non-negotiable. Memorise the "hard rules" section. They are enforced in code review.
  2. Coding conventions — Java, naming, package, configuration, logging, error handling, transactions, HTTP client.
  3. Database migrations — every schema change is a Flyway file. Naming, rules, local reset.

Privacy-specific gotchas to internalise:

  • Phone numbers are masked (+91 98***45) anywhere they appear in logs or metric labels.
  • Transcript text never appears in metrics — only counts and durations.
  • Sentry events are scrubbed. Before you add a new field to an exception or log.error, check the scrubber config.
  • Voice embeddings never leave the backend. Not in API responses, not in logs.
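
To make the masking rule concrete, here is a small sketch that reproduces the +91 98***45 shape. It is not the project's actual masker: the class name is hypothetical, and the rule it applies (keep the country code, the first two and the last two national digits) is inferred from the single example above. It assumes the input has a space after the country code.

```java
public final class PhoneMasker {

    private PhoneMasker() {}

    // Masks a number written as "<country code> <national number>", e.g. "+91 98765 43245".
    // Assumed rule: keep the country code plus the first two and last two national digits.
    public static String mask(String phone) {
        String trimmed = phone.trim();
        int space = trimmed.indexOf(' ');
        // Everything up to and including the first space is treated as the country code.
        String prefix = space > 0 ? trimmed.substring(0, space + 1) : "";
        String digits = trimmed.substring(space + 1).replaceAll("\\D", "");
        if (digits.length() < 6) {
            return prefix + "***"; // too short to reveal any digits safely
        }
        return prefix + digits.substring(0, 2) + "***" + digits.substring(digits.length() - 2);
    }
}
```

With this sketch, PhoneMasker.mask("+91 98765 43245") yields "+91 98***45". Check the real helper and the scrubber config before relying on any particular format.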

Day 4 — pick a first PR

Look for issues labelled good first issue on the repo. Good candidates:

  • A new test for an existing edge case in SpeakerNameResolutionService.
  • Adding a missing metric or refining an existing label set.
  • A small application.yml cleanup or a documentation correction.
  • A typo or a renamed constant.

Stay away from these for your first PR:

  • The pipeline orchestrator.
  • Anything that adds a new env var.
  • Anything that adds a new external provider.

PR checklist (memorise)

  • Tests added or updated.
  • No new PII in logs / metrics / Sentry.
  • No new env var without a default and documentation in Configuration reference.
  • Flyway migration named V<n>__<snake_case>.sql and never edited after merge.
  • ./gradlew check passes locally.
  • Commit message describes the why, not the what.
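
The migration naming rule in the checklist is easy to check mechanically. The sketch below assumes that n is a plain positive integer and that the description is lowercase snake_case. The class name is hypothetical, and Flyway itself accepts more version formats than this team convention does.

```java
import java.util.regex.Pattern;

public final class MigrationNames {

    private MigrationNames() {}

    // V<n>__<snake_case>.sql: "V", an integer version, two underscores,
    // then lowercase words joined by single underscores.
    private static final Pattern VERSIONED =
            Pattern.compile("V[1-9][0-9]*__[a-z0-9]+(_[a-z0-9]+)*\\.sql");

    public static boolean isValid(String fileName) {
        return VERSIONED.matcher(fileName).matches();
    }
}
```

For example, a hypothetical V12__add_call_records_index.sql passes, while V12_add_index.sql (single underscore) and V12__AddIndex.sql (not snake_case) do not.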

Day 5 — ship it

Submit, iterate on review, merge. Congratulate yourself.


Week 2 — own a slice

Pick one of these and become the local expert:

| Slice | Where it lives |
| --- | --- |
| Diarization | DiarizationClient, Diarization feature doc |
| Transcription | TranscriptionClient, Transcription feature doc |
| Audio preprocessing | AudioPreprocessor, Audio preprocessing |
| Speaker resolution | SpeakerNameResolutionService, Speaker resolution |
| Voice embedding | VoiceMatchService, UserVoiceProfileService, Voice embedding |
| Analysis | AnalysisLlmService, ActionItemExtractor, Analysis |
| Observability | Micrometer config, OTLP, Sentry; Observability |

"Own a slice" means: read every file in it, run every test, write a one-pager explaining how it works to someone joining next month.


Week 3 — pair with Android

The backend and the Android app meet at a tight REST contract. Spend a day pairing with someone on the Android team and watching what they have to do to consume your endpoint shapes. You will find at least one thing that should be easier on the client side. Open a PR.

The Android client section is your reading material here, especially Upload pipeline and Status lifecycle.


Week 4 — on-call shadow

Read the Runbook and Troubleshooting cover to cover. Watch the on-call channel for a week. When an alert fires, follow the runbook yourself before asking the primary on-call what they would do. Then compare notes.


Reference shelf

Bookmark these:


Common stumbling blocks

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| ./gradlew bootRun fails with "Flyway checksum mismatch" | Someone (probably you) edited a migration after it was applied | Never edit a V<n>__*.sql after merge. Use a new migration. For local dev, dropdb scryon && createdb scryon. |
| 500 on POST /api/calls/analyze with S3 connection refused | MinIO not running, or SCRYON_S3_ENDPOINT wrong | Start MinIO; double-check the endpoint URL (note http:// and the port). |
| pyannoteAI 401 | PYANNOTE_API_KEY wrong, or feature not enabled | Verify the key in your .env.local; set PYANNOTE_ENABLED=true. |
| Pipeline never completes | Stub providers are enabled; a stub returns instantly, so check the logs for skipped stages | Switch to real providers, or accept the stub output. |
| Tests pass locally, fail in CI | Testcontainers Postgres version mismatch, or a TestPropertySource collision | Run ./gradlew clean test. Check that the TestPropertySource is unique per test class. |

For anything else, Troubleshooting is more thorough.


What "good" looks like after a month

  • You can describe the entire pipeline without notes.
  • You have shipped at least 5 PRs that touched non-trivial code.
  • You have written or improved one piece of documentation in scryon-docs.
  • You have an opinion about something we should change — and you wrote an ADR or filed an issue.
  • You can take an alert at 2 am and resolve it without paging the rest of the team.

Welcome.