Artificial intelligence is undoubtedly the hottest area of tech today, with venture capital dollars flowing into startups in the space at unprecedented levels.
Within that vast space, voice AI startups have emerged as a standout, attracting the attention of investors globally, Crunchbase data shows. Over the past 12-18 months, several voice AI companies have seen their valuations triple, a signal of accelerating market demand and perceived long-term worth.
One example of a voice AI company that has seen a massive valuation jump this year is ElevenLabs, which allows creators, enterprises and others to use AI software to replicate voices in dozens of languages. The Brooklyn, New York-based startup went from achieving unicorn status with an $80 million Series B raise in January 2024 to being valued at about $3.3 billion one year later with a $180 million Series C co-led by Iconiq Capital and Andreessen Horowitz. Other backers include Sequoia Capital, Valor Equity Partners, New Enterprise Associates and Endeavor Catalyst.
And on Sept. 8, ElevenLabs announced it will sell secondary shares to provide liquidity options for employees via a tender offer that would double the company’s valuation to $6.6 billion. In a LinkedIn post, ElevenLabs’ Carles Reina revealed that ElevenLabs had “passed $200M in ARR in 2.5 years.”
Appetite for acquisitions
Voice also remains an attractive segment for ambitious acquirers. In July, Meta acquired PlayAI, a startup that uses AI to generate human-sounding voices, for an undisclosed amount. Founded in 2023, PlayAI had raised a known $5.1 million, per Crunchbase data.
The PlayAI team’s “work in creating natural voices, along with a platform for easy voice creation” was a great match for Meta’s “work and road map, across AI Characters, Meta AI, Wearables and audio content creation,” according to an internal memo viewed by Bloomberg.
Tom Hulme, managing partner and head of Europe at GV, believes that budding voice AI companies are ripe for acquisition because while companies may need speech-to-text, text-to-speech, intent recognition and conversational AI, building those capabilities in-house “can take years.”
“As CEOs realize that natural language and voice are essential to deliver the best product experience at the largest possible scale in the biggest markets, they’ll often conclude that it’s much faster to acquire proven technology and teams, so one could expect acquisition opportunities to arise,” Hulme told Crunchbase News.
Controlled growth
The growing investment in voice AI isn’t surprising when you look at the rapid confluence of multiple fast-developing technologies — primarily LLMs and real-time voice recognition, according to Hulme.
“Speech recognition is finally achieving human-level accuracy, LLMs are better at understanding context and intent, while microphones are literally in every device and platform we use,” he said.
As a firm, GV has invested in several companies that fall under the voice AI category, including Nothing, Neuralink, Vocode and Synthesia.
“One of the things that drew us to [these] companies … is the founders’ fundamental belief in the opportunity in natural language and voice as a user interface,” Hulme added. “These companies are tackling different pieces of the conversational computing puzzle, but they share a vision of making humans’ interactions with machines truly natural and as low friction as possible.”
Another factor that makes voice AI startups so attractive is that natural language can be considered humans’ main API for development, Hulme noted, and that includes both understanding the world around us and communicating.
“WhatsApp users are sending millions of voice messages every day — that’s human behavior telling us how they want to communicate with technology in a frictionless way,” he said. “And LLMs have been trained on the internet, which is predominantly natural language, so it makes sense that natural language and voice are the most elegant way to interact with them.”
Jordan Crook, a partner with Betaworks, said her firm has invested in models, middleware, applications, agents and even hardware as it relates to voice AI. Notably, it backed Granola, an AI notepad that transcribes and summarizes meetings from your device.
“As a subset of our portfolio, many of those companies are experiencing tailwinds in usage and capability,” she wrote via email. “So it’s clear that after more than a decade of TTS/STT (text-to-speech/speech-to-text) being available, the current crop of audio-aware models have unlocked actual utility and mainstream usage of voice as an interface.”
Customer conversations
Voice AI startups of all sizes continue to raise venture funding. Customer support in particular is a growing area.
Loman AI, an Austin, Texas-based 24/7 AI-powered phone system for restaurants, recently announced that it raised a $3.5 million seed round led by Next Coast Ventures.
The company says it has driven “tens of millions” in order volume since its 2024 launch. Loman touts that its AI phone agent “answers every call,” takes pickup and delivery orders, books reservations, fields guest questions and syncs directly with leading POS and reservation systems. The result, it claims, is that restaurants see higher revenue from recaptured calls and “smart upsells,” while also cutting labor costs.
In June, Maven AGI, a startup that builds enterprise AI agents for customer support, raised a $50 million Series B led by Dell Technologies Capital. Founded in 2023, the Boston-based company has raised a total of $78 million in funding, per Crunchbase data. In a recent blog post, founder and CTO Sami Shalabi wrote that Maven’s voice AI agents for live calls can “understand context and respond naturally in any situation.”
He added: “Maven Voice is also the first to bring voice-to-voice AI into real-world production for faster responses, more natural interactions, and tone that stays intact.”
A ‘universal remote’ for the digital world
Then there are the companies working behind the scenes to help other AI companies grow their offerings. One example is AssemblyAI, an applied AI startup that builds advanced speech-to-text and audio intelligence models. It aims to make it easy for developers to add voice features, such as transcription and voice recognition, to their apps. For example, voice AI apps such as Granola and Fireflies.ai use AssemblyAI’s technology to power their features.
Founded in 2017, it has raised nearly $160 million to date, per Crunchbase data. Backers include Y Combinator, Accel, Insight Partners and Smith Point Capital, among others.
AssemblyAI’s technology has a variety of use cases, according to CEO and founder Dylan Fox. It’s used by contact centers and sales teams to transcribe and analyze customer calls, summarize conversations and detect key moments. As mentioned above, its tech powers features such as real-time subtitles, voice assistants and searchable transcripts for companies such as Granola, Veed and Zoom. In the healthcare space, it automatically generates patient visit notes from recorded conversations. It also creates captions and transcripts for videos, podcasts and meetings.
“It’s very clear that there is a big market opportunity for what we’re doing,” Fox told Crunchbase News in an interview. “For the first couple of years, the tech was bad, the market was small, and it took time before things started to really click and come together.
“And there’s still a huge surface area of stuff that is unexplored and untapped, because the tech still isn’t good enough for a lot of stuff,” he added. “So there’s still so much room to grow.”
Usage of AssemblyAI’s API has grown more than 250% year over year, according to Fox, who notes that the company currently has thousands of paying customers and more than half a million developers on its platform.
Looking ahead, Fox believes another big use case for AssemblyAI’s technology is real-time voice agents that people can talk to over the phone and plug into hardware.
“We work closely with companies like LiveKit, and there are so many in that space that are just taking off,” he said.
For GV’s Hulme, one of the most exciting trends underway in voice AI is that “we’re returning to humanity’s most natural form of communication.”
After decades of adapting ourselves to technology, “technology is finally adapting to us,” he said.
“Voice and natural language represent the ultimate accessibility hack, democratizing access to computational power for everyone who can think and communicate … It’s worth keeping an eye on because voice is becoming a type of universal remote for the digital world,” Hulme told Crunchbase News. “Whether it’s Big Tech companies or new startups, there are many players jockeying for advantages at the conversational layer.”
Illustration: Dom Guzman
