OpenAI already has an AI capable of cloning voices

OpenAI is refining Voice Engine, its voice cloning technology. This expansion of the company’s existing text-to-speech API, which took about two years to develop, allows users to upload a 15-second speech sample to create a copy of that voice.

OpenAI began developing the tool at the end of 2022 and initially used it to generate the voices for its text-to-speech API, ChatGPT Voice and Read Aloud. It was later implemented to create the ChatGPT voice in the mobile apps. Throughout 2023, a small group has been using the tool to explore possible use cases, and OpenAI has now shared some of the results, although it has not yet announced when the tool will be available to the public.

“We hope to start a dialogue about the responsible use of synthetic voices and how society can adapt to these new capabilities. Based on these conversations and the results of these small tests, we will make a more informed decision about whether and how we implement this technology at scale,” the company said on its blog.

What possible uses does Voice Engine offer?

On its blog, OpenAI has shared some of the projects it has developed together with partners. The first is with Age of Learning, an education technology company focused on children’s academic success, which has used the technology to provide reading assistance for children and non-readers using emotive voices that sound natural and represent a wider variety of speakers than is possible with predefined voices. With GPT-4, the company developed real-time, personalized responses to engage with students.

Another test involves HeyGen, a platform dedicated to translating content such as videos and podcasts so that creators and companies can reach wider audiences around the world while maintaining the fluency and authenticity of their own voices. The company works closely with its enterprise customers to create custom, human-like avatars for a variety of content, from product marketing to sales demos. HeyGen used Voice Engine for video translation, which allows a speaker’s voice to be translated into multiple languages to reach a global audience. A special feature of Voice Engine is that it retains the original speaker’s native accent when translating; for example, if you generate English from an audio sample of a French speaker, the resulting speech retains the French accent.

OpenAI has also collaborated with Dimagi, a company that provides services in remote environments with the goal of reaching global communities. The company is developing tools to enable community health workers to provide a wide range of essential services, such as advice for breastfeeding mothers. To help these workers improve their skills, Dimagi uses Voice Engine and GPT-4 to provide interactive feedback in each worker’s primary language, including Swahili or more informal languages such as Sheng, a code-mixed language popular in Kenya.

OpenAI did not only share some of the uses of its voice tool; it also used the statement to point out: “We believe that any widespread use of synthetic voice technology should be accompanied by voice authentication capabilities that verify that the original speaker knowingly adds their voice to the service, as well as a blocked voice list that detects and prevents the creation of voices that are too similar to those of prominent people.”
