Voice profile setup
The Android app supports opt-in speaker recognition via a voice profile. When enabled on the backend, the user's voice embedding helps the diarization pipeline label them as USER with their display name in transcripts.
Embedding bytes are never returned to the client. The backend stores them server-side only.
Feature gating
The feature is gated on a backend flag (enabled on GET /api/users/me/voice-profile/status):
- When the flag is off — the Settings card is hidden entirely. No microphone permission is requested and no UI is visible.
- When the flag is on — the card appears under Settings → Speaker recognition.
Settings card states
| State | UI |
|---|---|
| No profile yet | CTA Set up voice profile → opens the wizard |
| Profile exists | Provider/model, created/updated dates, sample count, Re-record voice sample + Delete voice profile (with confirm dialog → DELETE /api/users/me/voice-profile) |
Setup wizard flow
- Consent checkbox — must be checked before Upload activates. The boolean is sent verbatim to the backend as the
consentAcceptedmultipart part. - Microphone permission — requested only when the user taps the mic. If previously denied, an Open app settings deep-link is shown.
- Script display — the user reads aloud a short script (target 20–30 s, hard min 15 s, hard max 45 s).
- Recording —
VoiceSampleRecorderusesMediaRecorder, writes tocacheDir/voice_profile/only, auto-stops at the 45 s cap. - Upload —
POST /api/users/me/voice-profile(multipartfile+consentAccepted=true). - Cleanup — temp file deleted on success, cancel, screen dispose, and sign-out.
How it affects transcripts
Transcripts already render speaker info from the existing v2 fields (speakerLabel, role, speakerDisplayName). Once the backend attributes a speaker to the user's profile it sets role=USER and displayName=<name> — the existing chat bubble lights up with the (You) tag. No client change required.
The optional labelSource field on TranscriptSpeakerDto is plumbed for future UI work but does not change rendering today.
Privacy guarantees
| Data | Where it lives |
|---|---|
| Voice sample (temp) | cacheDir/voice_profile/ — deleted immediately after upload or on cancel |
| Voice embedding | Backend only — never returned to the client |
| Consent flag | Sent to backend as consentAccepted=true; stored server-side |
Key files
| File | Role |
|---|---|
data/voiceprofile/VoiceSampleRecorder.kt | MediaRecorder wrapper, cacheDir-only, auto-cleans |
data/repository/ScryonVoiceProfileRepository.kt | getStatus / upload / delete |
viewmodel/VoiceProfileViewModel.kt | Wizard state machine + Settings card backing |
ui/voiceprofile/VoiceProfileSetupScreen.kt | Full-screen wizard |
ui/shell/tabs/SettingsTabScreen.kt | Speaker recognition card (hidden when flag off) |
Backend reference
See Voice profile API and Voice embedding feature for server-side details.