ElevenLabs, an AI start-up that just raised a $180 million mega funding round, has been known mostly for its audio generation technology. The company has now taken a step in a different technical direction by releasing its first standalone speech-to-text model, called Scribe.
The start-up, valued at $3.3 billion, has helped many other companies provide text-to-speech solutions through its large collection of voices. Now, however, the company is looking to move into speech recognition and compete with the likes of Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI’s Whisper models.
ElevenLabs’ Scribe model supports over 99 languages at launch. The company places more than 25 languages in an “excellent accuracy” group, where the word error rate (WER) is below 5%. This list includes English (with a claimed accuracy of 97%), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese. Other languages fall into lower tiers: high (5 to 10% WER), good (10 to 20% WER), and moderate (25 to 50% WER).
The company said the model outperformed Google’s Gemini 2.0 Flash and OpenAI’s Whisper Large V3 across many languages on the FLEURS and Common Voice benchmarks.
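For readers unfamiliar with the metric: word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a model’s transcript and a reference transcript, divided by the number of words in the reference. The Python sketch below is a generic illustration, not ElevenLabs’ evaluation code. The function names are hypothetical, and how exact boundary values (such as precisely 5% or 10%) map to a tier is our assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing
                curr[j - 1] + 1,         # insertion: extra word in hypothesis
                prev[j - 1] + (r != h),  # substitution, or free match if equal
            )
        prev = curr
    return prev[-1] / len(ref)


def accuracy_tier(wer: float) -> str:
    """Map a WER to the reported tiers (boundary handling is an assumption)."""
    if wer < 0.05:
        return "excellent"
    if wer <= 0.10:
        return "high"
    if wer <= 0.20:
        return "good"
    if 0.25 <= wer <= 0.50:
        return "moderate"
    return "unclassified"  # the reported tiers leave 20-25% and >50% unassigned


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 3), accuracy_tier(wer))
```

Note that a transcript with 2 errors in 6 words lands at about 33% WER, which shows how quickly short utterances drop into the lower tiers; published benchmark figures are averaged over much larger test sets.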

ElevenLabs had already built a speech-to-text component for its AI conversational agent platform, which launched in 2024. However, this is the first time the company is releasing a standalone speech recognition model. In a conversation with TechCrunch last month, CEO Mati Staniszewski talked about improving speech recognition models.
“We want to better understand what you’re saying in a conversation. We are working on ways to move beyond just generating content, toward understanding and transcribing speech,” Staniszewski said at the time. “A lot of people say that speech-to-text is a solved problem. But for many languages, it’s pretty bad. We think we can build better speech recognition models because we have internal teams to annotate data and give us fast feedback.”
The model also offers smart speaker diarization to identify who is speaking, word-level timestamps for precise captions, and automatic tagging of audio events such as audience laughter. The start-up also lets users transcribe video content directly in its studio to add captions or subtitles.
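Word-level timestamps combined with speaker labels are what make it straightforward to build subtitle files. The following Python sketch groups timed words into SubRip (SRT) cues, starting a new cue whenever the speaker changes or a word limit is reached. The `Word` record and its fields are a hypothetical input shape chosen for illustration, not Scribe’s actual output format, and the seven-word limit is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical input shape; not the real Scribe response schema.
    text: str
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    speaker: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group timed words into SRT cues, splitting on speaker change or length."""
    cues: list[list[Word]] = []
    for w in words:
        same_speaker = bool(cues) and cues[-1][-1].speaker == w.speaker
        if same_speaker and len(cues[-1]) < max_words:
            cues[-1].append(w)
        else:
            cues.append([w])
    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w.text for w in cue)
        timing = f"{srt_timestamp(cue[0].start)} --> {srt_timestamp(cue[-1].end)}"
        blocks.append(f"{i}\n{timing}\n[{cue[0].speaker}] {text}\n")
    # A blank line separates consecutive SRT cues.
    return "\n".join(blocks)


sample = [
    Word("Hello", 0.0, 0.4, "A"),
    Word("there", 0.45, 0.8, "A"),
    Word("Hi", 1.0, 1.3, "B"),
]
print(words_to_srt(sample))
```

Splitting on speaker changes keeps each caption attributable to one voice, which is where diarization and word-level timing work together.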
Scribe currently handles only pre-recorded audio. The company said it will soon release a low-latency, real-time version of the model. Until then, it isn’t suitable for live transcription or voice note-taking.
ElevenLabs is pricing Scribe at $0.40 per hour of transcribed audio. While that price is affordable, some competitors currently offer cheaper audio transcription, albeit with differences in features.
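As a rough worked example of the quoted rate, the Python sketch below converts recording length into an estimated cost. It assumes billing is prorated linearly by the minute, which the announcement does not specify; the constant and function names are hypothetical. `Decimal` is used so that cent amounts are not distorted by binary floating-point rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

# Launch list price reported for Scribe; billing granularity is an assumption.
PRICE_PER_HOUR_USD = Decimal("0.40")


def transcription_cost(minutes: float) -> Decimal:
    """Estimate cost, assuming linear per-minute proration (hypothetical)."""
    hours = Decimal(str(minutes)) / Decimal(60)
    # Round to whole cents using conventional half-up rounding.
    return (hours * PRICE_PER_HOUR_USD).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# A 90-minute recording: 1.5 hours x $0.40 = $0.60
print(transcription_cost(90))
```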