Weaviate adds Gemini Embedding 2 audio capabilities to multi2vec-google, expanding multimodal vector search beyond text and images. Replication and backup improvements included.

Builders can now index audio natively alongside text and images, simplifying multimodal search architecture while improving replication and backup reliability.
Signal analysis
Here at Lead AI Dot Dev, we're tracking the evolution of vector database capabilities, and Weaviate's latest release marks a meaningful shift toward comprehensive multimodal support. Version 1.36.6 introduces audio embedding support directly into the multi2vec-google module, which means developers can now index and search audio content alongside text and image vectors using Google's Gemini Embedding 2 Multimodal model.
This isn't a minor feature addition - audio as a first-class searchable modality fundamentally changes what builders can do with vector search. Previously, handling audio required external preprocessing pipelines or custom integrations. Now it's native to Weaviate's embedding pipeline.
The release also addresses infrastructure concerns: async replication gains binary encoding improvements for better performance under load, and backup mechanisms were enhanced to reduce operational friction. These are the kinds of changes that matter most in production environments, where scale and reliability aren't optional.
Audio multimodal support removes a major friction point for builders working with heterogeneous data sources. Think about a knowledge system that needs to index customer support calls, documentation videos, and written FAQs - previously you'd either transcribe the audio or maintain separate search indexes. Now a single vector space can handle all three.
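As a rough sketch of what that single-index setup could look like, here is a hypothetical Weaviate collection schema. The class name, property names, and the `audioFields` key are assumptions, modeled on the existing `textFields`/`imageFields` convention in the multi2vec-google module; the model identifier follows the naming in this release note and may differ in practice.

```python
# Hypothetical schema payload for a Weaviate class vectorized by
# multi2vec-google. "audioFields" mirrors the existing textFields/
# imageFields pattern and is an assumption based on this release;
# all names and values below are illustrative placeholders.
schema = {
    "class": "MediaItem",
    "vectorizer": "multi2vec-google",
    "moduleConfig": {
        "multi2vec-google": {
            "projectId": "my-gcp-project",      # placeholder GCP project
            "location": "us-central1",          # placeholder region
            "model": "gemini-embedding-2-multimodal",  # name per release note
            "textFields": ["transcriptSummary"],
            "imageFields": ["thumbnail"],
            "audioFields": ["recording"],       # new audio modality
        }
    },
    "properties": [
        {"name": "transcriptSummary", "dataType": ["text"]},
        {"name": "thumbnail", "dataType": ["blob"]},
        {"name": "recording", "dataType": ["blob"]},
    ],
}
```

The point is that support calls, video thumbnails, and written summaries all land in one class with one shared vector space, rather than three separate indexes.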
The Gemini Embedding 2 Multimodal model itself is significant here. Google's multimodal embeddings are trained on aligned text-image-audio datasets, which means cross-modal retrieval becomes viable. You could embed a user query in text and retrieve relevant audio segments, or vice versa. That capability wasn't practically available to Weaviate users before.
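To make the cross-modal idea concrete, here is a toy sketch of the retrieval mechanics: because text and audio embeddings share one space, ranking audio segments against a text query reduces to cosine similarity. The vectors and segment names are made up (real Gemini embeddings have hundreds of dimensions); only the ranking logic is the point.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim vectors standing in for real multimodal embeddings.
# In a shared space, a text query vector can be compared directly
# against vectors computed from audio segments.
text_query_vec = [0.9, 0.1, 0.0]
audio_segments = {
    "call_0413.wav#00:12": [0.85, 0.15, 0.05],  # semantically close
    "call_0413.wav#03:40": [0.10, 0.20, 0.95],  # unrelated content
}

ranked = sorted(
    audio_segments.items(),
    key=lambda kv: cosine(text_query_vec, kv[1]),
    reverse=True,
)
```

In Weaviate this comparison happens inside the index rather than in application code, but the geometry is the same: aligned training is what makes the text-to-audio distances meaningful.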
On the operational side, the replication improvements matter because they affect query performance and consistency guarantees. Binary encoding optimization typically translates to lower latency in distributed setups - critical for applications where vector search latency compounds through the stack. Backup enhancements reduce the blast radius if something goes wrong with your vector index.
If you're actively using Weaviate in production, evaluate whether audio content exists in your data ecosystem that's currently inaccessible to your vector search. Customer support recordings, training videos, product demos, interviews - these often contain valuable semantic content that text-only indexes miss.
For new projects: if multimodal search is even a possibility in your roadmap, this release removes a technical barrier. You can now build audio indexing into your initial architecture rather than bolting it on later. The cost is minimal - it's configuration, not infrastructure.
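The query side is equally lightweight. A standard Weaviate GraphQL `nearText` query works unchanged once audio-derived vectors live in the collection; the class and property names below are hypothetical, carried over from the kind of mixed-media class discussed above.

```python
# Standard Weaviate GraphQL nearText query. The text query is embedded
# with the same multimodal model, so results can include objects whose
# vectors came from audio fields. Class and property names are
# illustrative assumptions, not from the release notes.
query = """
{
  Get {
    MediaItem(
      nearText: {concepts: ["customer asks about refund policy"]}
      limit: 3
    ) {
      title
      _additional { distance }
    }
  }
}
"""
```

Nothing in the query mentions audio at all, which is the practical payoff: modality is a schema-time decision, not a query-time one.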
The replication and backup improvements are stability plays. If you're running Weaviate clusters, test the upgraded version in staging before production rollout. The binary encoding changes could affect compatibility with existing replicas, so plan your deployment carefully.
Thank you for listening. This has been Lead AI Dot Dev.