Data Flow

Understand how data flows through urvo during voice conversations

During a voice conversation, data passes through several urvo components. Understanding this flow is essential for security-conscious organizations.

Overview

This guide explains:

  • The complete voice pipeline architecture
  • What data passes through each component
  • What data is stored on urvo's infrastructure
  • How to control call recording and transcription storage

Understanding Log Types

urvo generates two distinct types of logs during calls:

| Log Type | Description | Visibility |
| --- | --- | --- |
| System Logs | Internal operational logs used by urvo for debugging, monitoring, and system health | urvo internal only; never shared with customers |
| Call Logs | Conversation data including transcripts, recordings, and call metadata | Available to customers via the dashboard |

Note: System logs are strictly internal to urvo and are never shared with customers or uploaded to external storage. They contain infrastructure-level data used for urvo's operational purposes only.

Voice Pipeline Architecture

urvo orchestrates a voice pipeline with multiple modular components. Each component handles a specific part of the voice conversation flow.

Complete Pipeline Flow

The following describes the end-to-end flow of a voice call through urvo:

  1. Transport Layer — Audio enters via SIP or Twilio telephony
  2. Speech-to-Text (Transcriber) — User audio is converted to text in real-time
  3. Orchestration Layer — urvo's proprietary models handle endpointing, interruption detection, emotion detection, and backchanneling
  4. Language Model (LLM) — Generates conversational responses based on transcribed user input
  5. Text-to-Speech (Voice) — Converts LLM responses into spoken audio
  6. Transport Layer — Synthesized audio is streamed back to the user

Throughout this pipeline, artifacts are generated: call recordings, transcripts, and call logs.
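
The six steps above can be sketched as a small data model, which may help when mapping the pipeline onto a data-flow diagram for a security review. This is an illustrative sketch only: `Stage`, `PIPELINE`, and `stages_touching` are hypothetical names introduced here and are not part of any urvo API, and the "touches" labels are one reading of the steps above rather than an official classification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the voice pipeline (hypothetical model, not a urvo API)."""
    name: str
    touches: frozenset  # kinds of data the stage handles: "audio" and/or "text"


# Stage order follows the numbered list above. The audio/text labels are an
# interpretation added for illustration.
PIPELINE = (
    Stage("Transport Layer (inbound)", frozenset({"audio"})),
    Stage("Speech-to-Text (Transcriber)", frozenset({"audio", "text"})),
    Stage("Orchestration Layer", frozenset({"audio", "text"})),
    Stage("Language Model (LLM)", frozenset({"text"})),
    Stage("Text-to-Speech (Voice)", frozenset({"audio", "text"})),
    Stage("Transport Layer (outbound)", frozenset({"audio"})),
)


def stages_touching(kind):
    """Return the names of all stages that handle the given kind of data."""
    return [stage.name for stage in PIPELINE if kind in stage.touches]


if __name__ == "__main__":
    for kind in ("audio", "text"):
        print(f"{kind}: {', '.join(stages_touching(kind))}")
```

Keeping the stages in an ordered tuple preserves the call direction, so the same structure can be reused to annotate each stage with the artifacts it contributes to.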

Pipeline Components

1. Transport Layer

The transport layer handles real-time audio streaming between users and urvo.

| Transport Type | Description | Use Case |
| --- | --- | --- |
| SIP | Session Initiation Protocol | Traditional phone systems, PBX integration, SIP trunking |
| Twilio | Twilio telephony integration | PSTN calls, phone numbers, outbound dialing |

2. Speech-to-Text (Transcriber)

Converts user audio into text in real-time using streaming recognition. urvo uses its own speech-to-text infrastructure — there is no bring-your-own-key option for transcription.

3. Orchestration Layer

urvo runs proprietary real-time models that make conversations feel natural. These models run exclusively on urvo's infrastructure and are not customizable.

| Model | Purpose |
| --- | --- |
| Endpointing | Detects when the user finishes speaking using audio-text fusion |
| Interruption Detection | Distinguishes barge-in from affirmations like "uh-huh" |
| Background Noise Filtering | Removes ambient sounds in real time |
| Background Voice Filtering | Isolates primary speaker from TVs, echoes, and other voices |
| Backchanneling | Adds natural affirmations ("uh-huh", "yeah", "got it") |
| Emotion Detection | Analyzes emotional tone and passes it to the LLM |
| Filler Injection | Adds natural speech patterns ("um", "like", "so") |

Note: Orchestration models process data in real time but do not persist audio or intermediate results; all of their processing is ephemeral. Only call-level artifacts are stored: recordings (if enabled), final transcripts, and call logs.

4. Language Model (LLM)

Generates conversational responses based on transcribed user input. You can choose from urvo's available LLMs when configuring your agent.

Note: Bring-your-own-key is not supported for LLMs by default. If you need to use your own API key for a specific LLM, contact support@urvo.io.

5. Text-to-Speech (Voice)

Converts LLM responses into spoken audio. urvo uses its own text-to-speech infrastructure — there is no bring-your-own-key option for voice synthesis.

Default Data Flow

In the default configuration, urvo handles all pipeline components and stores artifacts on urvo's infrastructure.

What Is Stored by Default

  • Call recordings — Audio recordings of the full conversation
  • Transcripts — Full transcriptions with timestamps
  • Call logs — Metadata and component-level details for each call
  • Product usage metrics — Internal analytics (urvo only, not customer-accessible)
  • System logs — Operational logs (urvo only, not customer-accessible)

Controlling Data Storage

You can control whether call recordings and transcriptions are stored by adjusting settings on your agent's Configure page.

Disabling Call Recordings

To disable call recordings:

  1. Go to your agent's Configure page
  2. Scroll down to the Advanced section
  3. Turn off "Enable Recordings"

When disabled, urvo will no longer store audio recordings for that agent's calls.

Disabling Transcriptions and Recordings

To disable both transcriptions and recordings:

  1. Go to your agent's Configure page
  2. Scroll down to the Advanced section
  3. Set "Conversations Retention Period" to 0

Setting the retention period to 0 prevents urvo from storing both transcriptions and recordings for that agent's calls.
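
The combined effect of the two settings can be sketched as a small decision function. This is a minimal model of the behavior described above, not urvo code: `AgentStorageSettings` and `stored_artifacts` are hypothetical names, the settings are changed on the Configure page rather than through any API shown here, and only recordings and transcripts are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentStorageSettings:
    """Hypothetical mirror of two settings in the Advanced section."""
    enable_recordings: bool              # the "Enable Recordings" toggle
    conversations_retention_period: int  # "Conversations Retention Period" value


def stored_artifacts(settings):
    """Return which of {recordings, transcripts} are stored for a call."""
    # A retention period of 0 disables both transcripts and recordings,
    # regardless of the recordings toggle.
    if settings.conversations_retention_period == 0:
        return set()
    artifacts = {"transcripts"}
    # Recordings are stored only while the toggle is on.
    if settings.enable_recordings:
        artifacts.add("call recordings")
    return artifacts


if __name__ == "__main__":
    # The value 30 is an arbitrary non-zero example, not a documented default.
    for settings in (
        AgentStorageSettings(enable_recordings=True, conversations_retention_period=30),
        AgentStorageSettings(enable_recordings=False, conversations_retention_period=30),
        AgentStorageSettings(enable_recordings=True, conversations_retention_period=0),
    ):
        print(settings, "->", sorted(stored_artifacts(settings)))
```

The retention check comes first because a value of 0 overrides the recordings toggle: with retention set to 0, nothing is stored even if "Enable Recordings" is left on.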

Custom Storage

If you need call data stored in your own cloud storage, contact support@urvo.io to discuss custom storage options.

What Data Passes Through urvo

The following describes what data is processed and how it is retained:

| Data Type | Processing | Retention |
| --- | --- | --- |
| Raw audio streams | Real-time routing to Transcriber / Voice | Ephemeral (not stored) |
| Transcribed text | Orchestration analysis, LLM routing | Call logs (unless disabled) |
| LLM responses | Filler injection, Voice routing | Call logs (unless disabled) |
| Emotion metadata | Passed to LLM context | Ephemeral |
| Call signaling | SIP / telephony management | Metadata only |
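
For a data inventory, the retention column above can be captured as a simple lookup. The sketch below is illustrative: `RETENTION` and `may_persist` are hypothetical names, and the values are transcribed from the table rather than read from any urvo API.

```python
# Retention per data type, transcribed from the table above.
RETENTION = {
    "raw audio streams": "ephemeral",
    "transcribed text": "call logs (unless disabled)",
    "llm responses": "call logs (unless disabled)",
    "emotion metadata": "ephemeral",
    "call signaling": "metadata only",
}


def may_persist(data_type):
    """Return True if the data type can leave a stored trace after the call."""
    return RETENTION[data_type] != "ephemeral"


def persistent_types():
    """List data types that are not purely ephemeral, in table order."""
    return [data_type for data_type in RETENTION if may_persist(data_type)]


if __name__ == "__main__":
    print(persistent_types())
```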

Artifacts Storage Summary

| Artifact | Default Location | Can Be Disabled |
| --- | --- | --- |
| Call Recordings | urvo | Yes: turn off "Enable Recordings" in the Advanced section |
| Transcripts | urvo | Yes: set "Conversations Retention Period" to 0 |
| Call Logs | urvo | Yes: set "Conversations Retention Period" to 0 |
| Product Usage Metrics | urvo | No: internal to urvo |
| System Logs | urvo | No: internal to urvo |

Infrastructure Summary

The following summarizes what runs on urvo's infrastructure and what you can control:

| Component | Infrastructure | Customization Options |
| --- | --- | --- |
| Transport | SIP / Twilio | Choose SIP or Twilio |
| Transcriber | urvo | urvo only |
| Orchestration | urvo | urvo only |
| LLM | urvo (multiple providers available) | Choose from available LLMs; contact support for BYOK |
| Voice | urvo | urvo only |
| Storage | urvo | Contact support for custom storage |

Note: The Orchestration Layer (endpointing, interruption detection, emotion detection, backchanneling, filler injection) is urvo's core technology and runs exclusively on urvo infrastructure. Audio processed by these models is ephemeral and is not stored.
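
When checking a list of security or compliance requirements against the summary above, it can help to group each request by how it can be met. The following is a sketch under stated assumptions: `Control`, `CUSTOMIZATION`, `triage`, and the requirement keys are hypothetical names chosen for illustration, and the mapping is a reading of the table, not an official urvo policy engine.

```python
from enum import Enum


class Control(Enum):
    SELF_SERVICE = "configurable when setting up the agent"
    CONTACT_SUPPORT = "contact support@urvo.io to discuss"
    URVO_ONLY = "runs on urvo infrastructure only"


# Mapping derived from the Infrastructure Summary table; keys are hypothetical.
CUSTOMIZATION = {
    "transport": Control.SELF_SERVICE,          # choose SIP or Twilio
    "transcriber": Control.URVO_ONLY,
    "orchestration": Control.URVO_ONLY,
    "llm_choice": Control.SELF_SERVICE,         # choose from available LLMs
    "llm_byok": Control.CONTACT_SUPPORT,        # bring-your-own-key for an LLM
    "voice": Control.URVO_ONLY,
    "custom_storage": Control.CONTACT_SUPPORT,
}


def triage(requirements):
    """Group requested customizations by how (or whether) they can be met."""
    groups = {control: [] for control in Control}
    for item in requirements:
        groups[CUSTOMIZATION[item]].append(item)
    return groups


if __name__ == "__main__":
    result = triage(["transport", "llm_byok", "transcriber", "custom_storage"])
    for control, items in result.items():
        print(f"{control.value}: {items}")
```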

Questions?

If you have questions about urvo's data flow or storage practices, or need custom storage arrangements, reach out to support@urvo.io.