In 2016, Hammad Syed and Mahmoud Felfel, a former WhatsApp engineer, developed a Chrome extension designed to convert Medium articles into audio. This text-to-speech tool gained recognition on Product Hunt and eventually inspired a full-fledged business.
“A larger opportunity emerged in helping individuals and organizations produce realistic audio content for their applications,” Syed shared with TechCrunch. “With our technology, users can quickly create high-quality speech experiences without having to develop their own models.”
Their company, PlayAI (formerly PlayHT), positions itself as an “AI voice interface.” Customers can select from predesigned voices or clone voices and integrate text-to-speech features into their applications using PlayAI’s API. Additional controls allow users to tweak the voices’ cadence, tone, and intonation.
PlayAI offers various features, including a “playground” for uploading files to generate audio, and a dashboard for creating professional narrations and voiceovers. Recently, it ventured into AI agents capable of automating tasks like handling customer calls.
One notable innovation from PlayAI is PlayNote, a tool that converts PDFs, videos, images, songs, and other media into podcast-style content, summaries, debates, or children’s stories. Similar to Google’s NotebookLM, PlayNote processes uploaded files or URLs into scripts and utilizes multiple AI models to produce the final output.
A hands-on test showed promising results. The “podcast” setting, for example, delivered clips comparable in quality to NotebookLM, and the ability to process images and videos led to unique outputs. In one instance, an image of a chicken mole dish inspired a five-minute podcast script.
Despite its capabilities, PlayNote, and PlayAI in general, isn’t without flaws. As with many AI tools, it occasionally generates errors or misinterpretations. For instance, attempting to reformat legal documents might yield subpar results. Additionally, PlayNote’s flexibility can lead to strange creations, such as reimagining the Musk vs. OpenAI lawsuit as a bedtime story.
PlayAI’s advanced voice model, PlayDialog, uses conversational history and context to deliver speech with natural flow, appropriate tone, and varied pacing, making conversations feel more lifelike.
However, the company has faced criticism over its approach to safety. Its voice-cloning tool relies on users affirming that they have consent to clone a voice, but it lacks stringent enforcement mechanisms. In testing, it was easy to clone the voices of public figures such as Kamala Harris, raising concerns about potential misuse in scams or deepfakes.
PlayAI asserts that it blocks offensive, sexual, racist, or threatening content, yet testing revealed instances where inappropriate content slipped through without warnings. In response to reports of unauthorized cloning, PlayAI claims to swiftly ban offending users and remove cloned voices. The company also points to safeguards like premium pricing for its highest-quality clones, which may deter misuse.
Syed defends the company’s ethics, citing mechanisms to trace content origin and take corrective actions when necessary. Nonetheless, weak moderation could lead to legal challenges, particularly in states like Tennessee, where unauthorized AI replication of a person’s voice is legally restricted.
PlayAI’s data sources for training its models remain somewhat opaque. The company says it trains on open datasets, licensed materials, and proprietary data, and that it does not use user data for model training. However, the reliance on public data could lead to copyright disputes, a growing issue for AI vendors.
Voice cloning has also drawn criticism from voice actors concerned about job displacement and the misuse of their digital likenesses. While some startups have agreements with unions like SAG-AFTRA to ensure ethical practices, controversies persist, especially concerning the use of clones of deceased individuals.
Syed emphasizes PlayAI’s commitment to exclusivity, ensuring that users own the voice clones they create. Still, the company faces mounting competition from tech giants like Amazon, Microsoft, and Google, and from other voice startups such as ElevenLabs, which reportedly has a $3 billion valuation.
Despite these challenges, PlayAI continues to attract investors. Recently, it secured $21 million in a seed funding round led by 500 Startups and Kindred Ventures. The funds will be used to advance its generative AI models and expand its 40-person team, aiming to deliver faster and more sophisticated voice solutions for businesses.