Dust's transcription engine gets a significant accuracy boost with enhanced metadata support. Builders using audio-heavy agents need to evaluate the impact on their workflows.

Better transcription accuracy plus structured metadata support lets agents handle audio-heavy workflows more reliably while trimming downstream processing steps.
Signal analysis
Dust has upgraded its transcription engine to a newer model with measurably improved accuracy and expanded metadata extraction capabilities. For builders, this means audio input - whether customer calls, interviews, or voice-based interactions - will be converted to text with fewer errors and richer contextual information attached.
The metadata support is the quiet win here. Beyond converting speech to text, the engine can now extract additional structured data from audio - think speaker identification, timestamps, confidence scores, or domain-specific terms. This matters because agents can act on audio with far fewer downstream cleanup steps.
This isn't a minor point release. Accuracy improvements compound across use cases. An agent handling customer support transcription, interview analysis, or compliance recording processing will see a measurable reduction in hallucinations and correction overhead.
If you're currently using Dust agents for audio processing, test this immediately against your actual use cases. Pull a representative sample of your audio inputs and compare transcription output quality. Measure error rates, especially on industry-specific terminology or accented speech, where older models typically struggled.
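A simple way to make that comparison concrete is word error rate (WER): word-level edit distance divided by reference length. The sketch below is self-contained and the sample transcripts are illustrative, not real Dust output.

```python
# Sketch: compare old vs. new transcription output against a hand-checked
# reference using word error rate (WER). Transcripts here are made up.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

reference  = "please escalate ticket four two seven to tier two"
old_output = "please escalate ticket for two seven to tier to"
new_output = "please escalate ticket four two seven to tier two"

print(f"old WER: {word_error_rate(reference, old_output):.2f}")  # 0.22
print(f"new WER: {word_error_rate(reference, new_output):.2f}")  # 0.00
```

Run this over a few dozen representative clips per category (clear English, accented speech, noisy lines) rather than one aggregate number, so you can see where the gains actually land.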
The metadata support changes your agent design options. Instead of writing agents that ingest raw transcribed text, you can now structure audio processing to consume metadata directly - routing calls by speaker confidence levels, filtering by timestamp ranges, or triggering different handling paths based on extracted signal strength.
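As a minimal sketch of what metadata-driven routing could look like: the segment schema below (speaker, confidence, start/end timestamps) and the threshold are hypothetical assumptions, not a documented Dust payload.

```python
# Sketch: route transcript segments on transcription metadata instead of
# raw text. The segment fields and threshold are illustrative assumptions.

LOW_CONFIDENCE = 0.70  # hypothetical cutoff; tune against your own traffic

def route_segment(segment: dict) -> str:
    """Pick a handling path for one transcript segment."""
    if segment["confidence"] < LOW_CONFIDENCE:
        return "human_review"        # don't let the agent act on shaky text
    if segment["speaker"] == "customer":
        return "intent_extraction"
    return "archive"

segments = [
    {"speaker": "customer", "confidence": 0.94, "start": 0.0, "end": 4.2,
     "text": "I'd like to cancel my subscription."},
    {"speaker": "agent", "confidence": 0.91, "start": 4.2, "end": 6.0,
     "text": "Of course, let me pull up your account."},
    {"speaker": "customer", "confidence": 0.55, "start": 6.0, "end": 8.1,
     "text": "[inaudible] refund maybe?"},
]

for seg in segments:
    print(f"{seg['start']:>5.1f}s  {route_segment(seg)}")
```

The design point is that the branching logic never inspects the transcript text itself; the metadata alone carries enough signal to choose a path.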
Consider whether this upgrade eliminates tooling you currently bolt on. If you're using separate speaker identification services or metadata enrichment steps, those might now be redundant. Removing dependencies simplifies your agent stack and reduces latency.
This upgrade signals that Dust sees audio processing as a core competency, not a bonus feature. We're seeing the same pattern across platform tooling - transcription, audio understanding, and voice handling are moving from optional integrations to built-in capabilities. This reflects real builder demand.
The investment in metadata extraction specifically suggests Dust is positioning for agent workflows that treat audio as structured data, not just speech-to-text conversion. This is more sophisticated than commodity transcription services. It means the platform is betting that builders want agents that reason about audio at a deeper level.
Competitors will need to match this. If you're evaluating Dust against other agent platforms, audio transcription quality and metadata richness should be explicit evaluation criteria moving forward.
Better accuracy doesn't mean perfect accuracy. Transcription engines still struggle with background noise, overlapping speakers, and domain-specific terminology. The upgrade reduces errors, but doesn't eliminate the need for error handling in your agent logic.
Different use cases see different gains. A customer support call in English with clear audio will see larger accuracy improvements than a heavily accented international call with background noise. Test on samples that match your actual traffic patterns.
Metadata extraction quality depends heavily on audio quality and recording standards. A professionally recorded interview will yield rich, accurate metadata. A phone call on a bad connection will extract less reliable metadata. Design your agents to degrade gracefully when metadata confidence is low.
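Graceful degradation can be as simple as choosing a processing mode up front based on how trustworthy the metadata looks. The field names and thresholds in this sketch are illustrative assumptions, not a documented Dust API.

```python
# Sketch: degrade gracefully when transcription metadata is missing or
# unreliable. Field names and thresholds are illustrative assumptions.

def plan_processing(transcript: dict) -> str:
    """Choose a processing mode based on metadata trustworthiness."""
    meta = transcript.get("metadata") or {}
    confidence = meta.get("confidence")

    if confidence is None:
        # No metadata at all (e.g., an older recording): plain text only.
        return "text_only"
    if confidence < 0.6:
        # Metadata present but shaky (bad phone line, crosstalk):
        # keep the text, ignore speaker labels and timestamps.
        return "text_with_metadata_ignored"
    return "full_structured"

print(plan_processing({"text": "studio interview", "metadata": {"confidence": 0.92}}))
print(plan_processing({"text": "noisy phone call", "metadata": {"confidence": 0.41}}))
print(plan_processing({"text": "legacy recording"}))
```

The key property is that a low-quality input downgrades the agent's behavior instead of feeding unreliable speaker labels or timestamps into downstream routing.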