
Enterprises building voice-enabled workflows have had limited options for production-grade transcription: closed APIs with data residency risks, or open models that trade accuracy for deployability. Cohere's new open-weight ASR model, Transcribe, is built to compete on all four key differentiators: contextual accuracy, latency, control and cost.
Cohere says that Transcribe outperforms current leaders on accuracy, and unlike closed APIs, it can run on an organization's own infrastructure.
Transcribe, which can be accessed through an API or in Cohere's Model Vault as cohere-transcribe-03-2026, has 2 billion parameters and is licensed under Apache-2.0. The company said Transcribe has an average word error rate (WER) of just 5.42%, meaning it makes fewer mistakes than comparable models.
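For context on that headline number: WER is the word-level edit distance between a model's transcript and a human reference, divided by the reference length, so a 5.42% WER means roughly one error per 18 reference words. A minimal, self-contained sketch of the standard calculation (production evaluations typically use a library such as jiwer and apply text normalization first):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution in a five-word reference -> WER of 0.2 (20%)
print(word_error_rate("the meeting starts at noon",
                      "the meeting starts at note"))  # -> 0.2
```

Because WER counts insertions and deletions as well as substitutions, it can exceed 100% when a model hallucinates extra words.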
It is trained on 14 languages: English, French, German, Italian, Spanish, Greek, Dutch, Polish, Portuguese, Chinese, Japanese, Korean, Vietnamese and Arabic. The company did not specify which Chinese dialect the model was trained on.
Cohere said it trained the model "with a deliberate focus on minimizing WER, while keeping production readiness top-of-mind." According to Cohere, the result is a model that enterprises can plug directly into voice-powered automations, transcription pipelines and audio search workflows.
Self-hosted transcription for production pipelines
Until recently, enterprise transcription has been a trade-off: closed APIs offered accuracy but locked in data; open models offered control but lagged on performance. Unlike Whisper, which launched as a research model under an MIT license, Transcribe is available for commercial use from launch and can run on an organization's own local GPU infrastructure. Early users flagged the commercial-ready open-weight approach as significant for enterprise deployments.
Organizations can bring Transcribe to their own local instances, since Cohere said the model has a more manageable inference footprint for local GPUs. The company said it was able to do this because the model "extends the Pareto frontier, delivering state-of-the-art accuracy (low WER) while maintaining best-in-class throughput (high RTFx) within the 1B+ parameter model cohort."
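RTFx, cited in that quote, is the inverse real-time factor used by ASR throughput benchmarks: seconds of audio transcribed per second of compute, where values above 1 mean faster than real time. A minimal sketch of how a team might measure it for any transcription callable (the `transcribe` argument here is a stand-in for whatever inference function a deployment exposes, not a Cohere API):

```python
import time
from typing import Callable

def measure_rtfx(transcribe: Callable[[str], str],
                 audio_path: str,
                 audio_seconds: float) -> float:
    """Time one transcription call and return its RTFx:
    seconds of audio processed per wall-clock second of compute.
    Values above 1.0 mean the model runs faster than real time."""
    start = time.perf_counter()
    transcribe(audio_path)  # result is discarded; only timing matters here
    elapsed = time.perf_counter() - start
    return audio_seconds / elapsed
```

In practice, teams would average this over a batch of representative audio files and warm up the model first, since the initial call often includes one-time loading costs.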
How Transcribe stacks up
Transcribe outperformed speech-model stalwarts, including Whisper from OpenAI, which powers the voice feature of ChatGPT, and ElevenLabs, whose models many major retail brands deploy. It currently tops the Hugging Face ASR leaderboard with an average word error rate of 5.42%, ahead of Whisper Large v3 at 7.44%, ElevenLabs Scribe v2 at 5.83% and Qwen3-ASR-1.7B at 5.76%.
Transcribe also performed well on the other datasets tested by Hugging Face. On the AMI dataset, which measures meeting understanding and dialogue analysis, Transcribe logged a score of 8.15%. On the Voxpopuli dataset, which tests understanding of diverse accents, the model scored 5.87%, beaten only by Zoom Scribe.
Early users have flagged accuracy and local deployment as the standout factors, particularly for teams that have been routing audio data through external APIs and want to bring that workload in-house.
For engineering teams building RAG pipelines or agent workflows with audio inputs, Transcribe offers a path to production-grade transcription without the data residency and latency penalties of closed APIs.