Microsoft creates an AI speech tool so realistic it has decided not to release it

Microsoft researchers claim VALL-E 2 has achieved “human parity” in speech generation, meaning that in their tests its speech performed on par with recordings of real people.

Published on: Jul 17, 2024, 13:58:34 IST

By Abhyjith K. Ashokan


Microsoft has created VALL-E 2, a text-to-speech AI tool so realistic that the company has decided not to release it to the public, citing the risk that its ability to imitate other people’s voices could be misused.

“VALL-E 2 is purely a research project,” Microsoft’s researchers wrote. “Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public.”

The tech giant’s researchers say that VALL-E 2 has achieved “human parity” in speech generation: on the benchmarks they used, its output was rated as natural and as true to the speaker’s voice as genuine human recordings.

What can VALL-E 2 be used for?

VALL-E 2 could be used to synthesise speech while preserving the speaker’s identity. According to Microsoft Research, this could be useful in education, entertainment, journalism, self-authored content, accessibility features, interactive voice response systems, translation, chatbots and more.

For all of these uses, the AI can replicate a person’s voice with high accuracy from just a few seconds of audio of them speaking.

How does VALL-E 2 manage to be so realistic?

VALL-E 2 owes its realism to two enhancements over its predecessor, VALL-E, known as “Repetition Aware Sampling” and “Grouped Code Modeling.”

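Repetition aware sampling refines how the AI picks each small unit of sound, or token, as it generates speech. It keeps track of how often a token has recently been repeated and, when the repetition gets too high, samples differently. This keeps the output stable and stops the model from getting stuck in a loop of repeated sounds, so the speech sounds more natural.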
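To make the idea concrete, here is a minimal Python sketch of repetition aware sampling, assuming a model that outputs a probability for every possible audio token. It is not Microsoft’s code: the function names and the top_p, window and threshold values are illustrative assumptions. The sketch first picks a token with nucleus sampling, then falls back to sampling from the full distribution when that token already fills too much of the recent history.

```python
import random


def nucleus_sample(probs, top_p, rng):
    """Sample from the smallest set of top tokens whose total probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, total = [], 0.0
    for token in ranked:
        nucleus.append(token)
        total += probs[token]
        if total >= top_p:
            break
    return rng.choices(nucleus, weights=[probs[t] for t in nucleus], k=1)[0]


def repetition_aware_sample(probs, history, rng, top_p=0.8, window=10, threshold=0.1):
    """Pick the next audio token while avoiding runs of the same token.

    probs: assumed per-token probabilities from the model (hypothetical input).
    history: tokens generated so far.
    top_p, window, threshold: illustrative values, not taken from Microsoft's work.
    """
    candidate = nucleus_sample(probs, top_p, rng)
    # Share of the last `window` steps already taken up by this candidate.
    repetition_ratio = history[-window:].count(candidate) / window
    if repetition_ratio > threshold:
        # Too repetitive: sample from the full distribution instead.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return candidate


# Usage: generate 20 tokens from a fixed, made-up distribution.
rng = random.Random(0)
history = []
for _ in range(20):
    history.append(repetition_aware_sample([0.7, 0.2, 0.1], history, rng))
print(history)
```

The fallback deliberately samples from the whole distribution rather than banning the repeated token outright, so a sound that genuinely needs repeating can still be chosen.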

Grouped code modeling organises these sound tokens into groups, so the AI processes fewer, larger units at a time. Shortening the sequence this way speeds up speech generation and makes long sentences easier to handle.
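Grouped code modeling can be illustrated in the same hedged way. The sketch below, again an assumption rather than Microsoft’s implementation, packs a flat sequence of audio tokens into fixed-size groups so that a model would take one step per group; the helper names, the padding token and the group size are hypothetical.

```python
def group_codes(codes, group_size, pad=0):
    """Pack a flat token sequence into tuples of `group_size` tokens.

    The last group is padded with `pad` (a hypothetical filler token) so that
    every group has the same size; a model would then take one step per group.
    """
    padded = list(codes) + [pad] * (-len(codes) % group_size)
    return [tuple(padded[i:i + group_size]) for i in range(0, len(padded), group_size)]


def ungroup_codes(groups, original_length):
    """Flatten groups back into single tokens and drop any padding."""
    flat = [token for group in groups for token in group]
    return flat[:original_length]


# Usage: ten made-up audio tokens grouped in fours need three steps instead of ten.
tokens = [12, 7, 7, 30, 5, 18, 2, 2, 9, 41]
groups = group_codes(tokens, group_size=4)
print(len(tokens), "tokens ->", len(groups), "steps")
```

With a group size of 2, for instance, a 100-token sequence becomes 50 steps. The trade-off is that each step has to predict several tokens at once.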
