
Engines & voices

The Configure tab is where you pick your Vio agent’s engine and its voice. Each choice shows an estimated cost per minute and latency, so you can trade cost against how fast and natural the agent sounds before you ever place a call. For most Indian SMBs the Pipeline engine with a Virtuo Speech (Indian) voice is the right default.

Engines

An engine is how the agent turns a caller’s speech into words, thinks, and speaks back. Vio offers two, shown as cards on the Configure tab.

  • Pipeline — combines a transcriber, a language model, and a voice as three stages. This gives the most control over each part and the lowest cost (~₹2.5–3/min). It is the best default for Indian languages and Hindi-English code-mixing.
  • Realtime — a single speech-to-speech model that listens and speaks in one step. It has the lowest latency and the most natural turn-taking, at a premium cost (~₹15/min).

| Engine | How it works | Latency | Cost / min | Best for |
|---|---|---|---|---|
| Pipeline | Transcriber → language model → voice, as three stages | Low | ~₹2.5–3 | Indian languages, high call volume, cost-sensitive use |
| Realtime | One speech-to-speech model that hears and speaks together | Lowest | ~₹15 | The most natural, human-like turn-taking |

The per-minute figures above are engine and voice COGS (what a minute of talk costs to run). They are estimates and vary with the voice you pick. For plan and billing detail, see Pricing.
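
To see what these per-minute estimates mean for a monthly bill, the short sketch below multiplies them by an expected call volume. It is an illustration only: the call volume and average call length are made-up inputs, the function name `monthly_cost_range` is hypothetical, and the rates are the approximate figures quoted above, not a Vio API or an official price list.

```python
from typing import Dict, Tuple

# Approximate per-minute COGS in INR, taken from the estimates above.
# Pipeline is quoted as a range; Realtime as a single figure.
ENGINE_COST_PER_MIN: Dict[str, Tuple[float, float]] = {
    "pipeline": (2.5, 3.0),
    "realtime": (15.0, 15.0),
}


def monthly_cost_range(engine: str, calls_per_month: int,
                       avg_minutes_per_call: float) -> Tuple[float, float]:
    """Return the (low, high) estimated monthly cost in INR for one engine."""
    low, high = ENGINE_COST_PER_MIN[engine]
    total_minutes = calls_per_month * avg_minutes_per_call
    return (low * total_minutes, high * total_minutes)


# Hypothetical workload: 2,000 calls a month, 3 minutes each (6,000 minutes).
for engine in ENGINE_COST_PER_MIN:
    low, high = monthly_cost_range(engine, calls_per_month=2000,
                                   avg_minutes_per_call=3)
    print(f"{engine}: ₹{low:,.0f} to ₹{high:,.0f} per month")
# pipeline: ₹15,000 to ₹18,000 per month
# realtime: ₹90,000 to ₹90,000 per month
```

At this assumed volume, Realtime costs roughly five to six times as much as Pipeline, which is why Pipeline is the suggested starting point for cost-sensitive, high-volume use.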

Voices

Below the engine you choose the voice the agent speaks in. Every voice card has a gender label and a Preview play button, so you can listen to the exact voice before selecting it.

  • Virtuo Speech (Indian) — natural Hindi-English code-mix voices built for Indian callers. These handle mixed-language speech the way people actually talk on the phone in India, and are the recommended pairing with the Pipeline engine.
  • English voices — clear, English-first voices for English-only use cases. The premium English voices offer a quality choice: a higher-quality voice that sounds richer, or a faster/cheaper voice model that trims cost and latency.
⚠️ The voice picker only lists voices available on your workspace. If a voice you expect is missing, it hasn’t been enabled for your account yet; contact support.

Which engine should you choose?

  • Start with Pipeline + Virtuo Speech (Indian) if your callers speak Hindi, English, or a mix — it is the cheapest and handles code-mixing well.
  • Switch to Realtime only when the most natural, lowest-latency conversation matters more than cost — for example a premium concierge or high-value sales line.
  • You can change the engine and voice at any time and re-test; nothing else in the agent needs to change.
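
The guidance above can be written down as a simple rule of thumb. The sketch below is one possible reading of it, not a Vio feature: the function `choose_engine`, the idea of a monthly budget ceiling, and the example numbers are all assumptions added for illustration, and the Realtime rate is the approximate ~₹15/min estimate from this page.

```python
REALTIME_COST_PER_MIN = 15.0  # approximate estimate in INR, not an official price


def choose_engine(prioritize_naturalness: bool, monthly_minutes: float,
                  monthly_budget_inr: float) -> str:
    """Suggest Realtime only when naturalness is the priority AND the
    (hypothetical) monthly budget covers its estimated cost; otherwise
    fall back to the Pipeline default."""
    realtime_cost = REALTIME_COST_PER_MIN * monthly_minutes
    if prioritize_naturalness and realtime_cost <= monthly_budget_inr:
        return "Realtime"
    return "Pipeline"


print(choose_engine(True, 1000, 20000))   # Realtime: ₹15,000 fits a ₹20,000 budget
print(choose_engine(True, 6000, 20000))   # Pipeline: ₹90,000 exceeds the budget
print(choose_engine(False, 1000, 20000))  # Pipeline: cost matters more here
```

Because the engine can be changed at any time, a result like this is only a starting point; re-test the agent after switching.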

Frequently asked

Q. Which engine should I choose for a Vio agent? Pick Pipeline for the lowest cost and strong Indian-language support — it fits most SMB use cases. Pick Realtime when you want the lowest latency and the most human turn-taking and can accept the premium per-minute cost.

Q. Can callers interrupt the agent while it’s speaking? Yes, when you enable barge-in. The agent stops talking and listens when the caller speaks over it. Turn it on in Advanced settings.

Q. Why can’t I hear crisp HD audio on a phone call? Ordinary phone lines are narrowband — the network itself compresses audio, so even a high-quality voice sounds like a phone call, not a studio recording. This is normal for any telephone system. See Troubleshooting if audio is genuinely broken rather than just narrowband.

Q. Do Virtuo Speech (Indian) voices speak Hindi and English in the same sentence? Yes. They are built for Hindi-English code-mixing, so the agent can switch mid-sentence the way Indian callers naturally do.

Next

Once the voice sounds right, write what the agent says: Persona & prompt. Then decide what it should collect in Variables.