Data Flow

Understand how data flows through urvo during voice conversations

During a voice conversation, data flows through multiple urvo components. Understanding this flow is essential for security-conscious organizations.

Overview

This guide explains:

The complete voice pipeline architecture
What data passes through each component
What data is stored on urvo's infrastructure
How to control the storage of call recordings and transcripts

Understanding Log Types

urvo generates two distinct types of logs during calls:

Log Type	Description	Visibility
System Logs	Internal operational logs used by urvo for debugging, monitoring, and system health	urvo internal only — never shared with customers
Call Logs	Conversation data including transcripts, recordings, and call metadata	Available to customers via the dashboard

Note: System logs are strictly internal to urvo and are never shared with customers or uploaded to external storage. They contain infrastructure-level data used for urvo's operational purposes only.

Voice Pipeline Architecture

urvo orchestrates a voice pipeline with multiple modular components. Each component handles a specific part of the voice conversation flow.

Complete Pipeline Flow

The following describes the end-to-end flow of a voice call through urvo:

Transport Layer — Audio enters via SIP or Twilio telephony
Speech-to-Text (Transcriber) — User audio is converted to text in real-time
Orchestration Layer — urvo's proprietary models handle endpointing, interruption detection, emotion detection, and backchanneling
Language Model (LLM) — Generates conversational responses based on transcribed user input
Text-to-Speech (Voice) — Converts LLM responses into spoken audio
Transport Layer — Synthesized audio is streamed back to the user

Throughout this pipeline, artifacts are generated: call recordings, transcripts, and call logs.
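The ordering of the six stages above can be sketched as a simple chain of functions. This is only a toy model for reasoning about where data moves: every name here (Turn, run_turn, and the stage functions) is hypothetical and not part of any urvo API, and each stage body is a placeholder rather than real speech or language processing.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    """State carried through one conversational turn (illustrative only)."""
    user_audio: bytes
    transcript: str = ""
    emotion: str = ""
    reply_text: str = ""
    reply_audio: bytes = b""
    visited: List[str] = field(default_factory=list)


def transport_in(turn: Turn) -> None:
    # Audio arrives over SIP or Twilio; nothing to transform in this toy model.
    pass


def transcriber(turn: Turn) -> None:
    # Placeholder for streaming speech-to-text: treat the bytes as UTF-8 text.
    turn.transcript = turn.user_audio.decode("utf-8", errors="replace")


def orchestration(turn: Turn) -> None:
    # Placeholder for endpointing, interruption detection, and emotion detection.
    turn.emotion = "neutral"


def llm(turn: Turn) -> None:
    # Placeholder for response generation from the transcript.
    turn.reply_text = f"(reply to: {turn.transcript})"


def voice(turn: Turn) -> None:
    # Placeholder for text-to-speech: encode the reply text as bytes.
    turn.reply_audio = turn.reply_text.encode("utf-8")


def transport_out(turn: Turn) -> None:
    # Synthesized audio would be streamed back to the caller here.
    pass


# Stage order mirrors the pipeline flow listed above.
PIPELINE: List[Callable[[Turn], None]] = [
    transport_in, transcriber, orchestration, llm, voice, transport_out,
]


def run_turn(user_audio: bytes) -> Turn:
    """Pass one turn through every stage in order, recording the path taken."""
    turn = Turn(user_audio=user_audio)
    for stage in PIPELINE:
        stage(turn)
        turn.visited.append(stage.__name__)
    return turn
```

Calling run_turn(b"hello") returns a Turn whose visited list names the stages in pipeline order, which can be a convenient starting point when annotating each hop in a security review.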

Pipeline Components

1. Transport Layer

The transport layer handles real-time audio streaming between users and urvo.

Transport Type	Description	Use Case
SIP	Session Initiation Protocol	Traditional phone systems, PBX integration, SIP trunking
Twilio	Twilio telephony integration	PSTN calls, phone numbers, outbound dialing

2. Speech-to-Text (Transcriber)

Converts user audio into text in real time using streaming recognition. urvo uses its own speech-to-text infrastructure. There is no bring-your-own-key option for transcription.

3. Orchestration Layer

urvo runs proprietary real-time models that make conversations feel natural. These models run exclusively on urvo's infrastructure and are not customizable.

Model	Purpose
Endpointing	Detects when the user finishes speaking using audio-text fusion
Interruption Detection	Distinguishes barge-in from affirmations like "uh-huh"
Background Noise Filtering	Removes ambient sounds in real-time
Background Voice Filtering	Isolates primary speaker from TVs, echoes, and other voices
Backchanneling	Adds natural affirmations ("uh-huh", "yeah", "got it")
Emotion Detection	Analyzes emotional tone and passes it to the LLM
Filler Injection	Adds natural speech patterns ("um", "like", "so")

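Note: Orchestration models process data in real time and do not persist audio or intermediate results. All of their processing is ephemeral. The only data stored for a call are the call artifacts (recordings, transcripts, and call logs), subject to the settings described under Controlling Data Storage.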

4. Language Model (LLM)

Generates conversational responses based on transcribed user input. You can choose from urvo's available LLMs when configuring your agent.

Note: Bring-your-own-key (BYOK) is not supported for LLMs by default. If you need to use your own API key for a specific LLM, contact support@urvo.io.

5. Text-to-Speech (Voice)

Converts LLM responses into spoken audio. urvo uses its own text-to-speech infrastructure — there is no bring-your-own-key option for voice synthesis.

Default Data Flow

In the default configuration, urvo handles all pipeline components and stores artifacts on urvo's infrastructure.

What Is Stored by Default

Call recordings — Audio recordings of the full conversation
Transcripts — Full transcriptions with timestamps
Call logs — Metadata and component-level details for each call
Product usage metrics — Internal analytics (urvo only, not customer-accessible)
System logs — Operational logs (urvo only, not customer-accessible)

Controlling Data Storage

You can control whether call recordings and transcriptions are stored by adjusting settings on your agent's Configure page.

Disabling Call Recordings

To disable call recordings:

Go to your agent's Configure page
Scroll down to the Advanced section
Turn off "Enable Recordings"

When disabled, urvo will no longer store audio recordings for that agent's calls.

Disabling Transcriptions and Recordings

To disable both transcriptions and recordings:

Go to your agent's Configure page
Scroll down to the Advanced section
Set "Conversations Retention Period" to 0

Setting the retention period to 0 prevents urvo from storing both transcriptions and recordings for that agent's calls.
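The combined effect of the two settings can be sketched as a small decision function. This is a minimal model, not urvo code: the settings are changed in the dashboard, and the class name, field names, and artifact labels below are hypothetical. The retention unit is not specified in this guide, so the sketch only distinguishes 0 from any positive value. Call logs are included because the Artifacts Storage Summary lists them as disabled by the same retention setting.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class AgentStorageSettings:
    # Hypothetical field names mirroring the two settings in the Advanced section.
    enable_recordings: bool
    conversations_retention_period: int  # unit not specified in this guide


def stored_artifacts(settings: AgentStorageSettings) -> Set[str]:
    """Return the customer-facing artifacts stored under the given settings."""
    if settings.conversations_retention_period < 0:
        raise ValueError("retention period cannot be negative")

    # A retention period of 0 disables recordings, transcripts, and call logs.
    if settings.conversations_retention_period == 0:
        return set()

    artifacts = {"transcripts", "call_logs"}
    # Recordings are stored only while "Enable Recordings" is on.
    if settings.enable_recordings:
        artifacts.add("call_recordings")
    return artifacts
```

For example, stored_artifacts(AgentStorageSettings(False, 30)) yields only transcripts and call logs, where 30 is an arbitrary positive value chosen for illustration. Product usage metrics and system logs are left out of the model because they are internal to urvo and cannot be disabled.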

Custom Storage

If you need call data stored in your own cloud storage, contact support@urvo.io to discuss custom storage options.

What Data Passes Through urvo

The following describes what data is processed and how it is retained:

Data Type	Processing	Retention
Raw audio streams	Real-time routing to Transcriber / Voice	Ephemeral (the stream itself is not stored; call recordings are a separate artifact)
Transcribed text	Orchestration analysis, LLM routing	Call logs (unless disabled)
LLM responses	Filler injection, Voice routing	Call logs (unless disabled)
Emotion metadata	Passed to LLM context	Ephemeral
Call signaling	SIP / telephony management	Metadata only
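For a security review, it can help to keep the table above as a lookup that can be filtered by retention class. The following is a minimal sketch that transcribes the table as written. The enum, the dictionary keys, and the helper function are illustrative labels chosen for this example, not identifiers from urvo.

```python
from enum import Enum
from typing import Dict, List


class Retention(Enum):
    EPHEMERAL = "ephemeral"
    CALL_LOGS = "call logs (unless disabled)"
    METADATA_ONLY = "metadata only"


# Transcribed from the table above; keys are illustrative labels, not API names.
DATA_RETENTION: Dict[str, Retention] = {
    "raw_audio_streams": Retention.EPHEMERAL,
    "transcribed_text": Retention.CALL_LOGS,
    "llm_responses": Retention.CALL_LOGS,
    "emotion_metadata": Retention.EPHEMERAL,
    "call_signaling": Retention.METADATA_ONLY,
}


def data_types_with(retention: Retention) -> List[str]:
    """List the data types in a given retention class, sorted by name."""
    return sorted(name for name, r in DATA_RETENTION.items() if r is retention)
```

Filtering on Retention.CALL_LOGS gives the data types whose storage depends on the retention setting, which are the ones to revisit if an agent's configuration changes.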

Artifacts Storage Summary

Artifact	Default Location	Can Be Disabled
Call Recordings	urvo	Yes — Turn off "Enable Recordings" in the Advanced section
Transcripts	urvo	Yes — Set "Conversations Retention Period" to 0
Call Logs	urvo	Yes — Set "Conversations Retention Period" to 0
Product Usage Metrics	urvo	No — Internal to urvo
System Logs	urvo	No — Internal to urvo

Infrastructure Summary

The following summarizes what runs on urvo's infrastructure and what you can control:

Component	Infrastructure	Customizable
Transport	SIP / Twilio	Choose SIP or Twilio
Transcriber	urvo	urvo only
Orchestration	urvo	urvo only
LLM	urvo (multiple providers available)	Choose from available LLMs; contact support for bring-your-own-key (BYOK)
Voice	urvo	urvo only
Storage	urvo	Contact support for custom storage

Note: The Orchestration Layer (endpointing, interruption detection, emotion detection, backchanneling, filler injection) is urvo's core technology and runs exclusively on urvo infrastructure. Audio processed by these models is ephemeral and is not stored.

Questions?

If you have questions about urvo's data flow or storage practices, or need custom storage arrangements, reach out to support@urvo.io.