Microsoft creates an AI speech tool so realistic they decide not to release it

Jul 17, 2024 01:58 PM IST

Microsoft researchers claim VALL-E 2 has achieved “human parity” in speech generation, meaning that whatever it says is indistinguishable from a human voice.

Microsoft has created VALL-E 2, a text-to-speech AI tool so realistic that the company has decided not to release it to the public, fearing that its ability to impersonate other people’s voices could be misused.

A Microsoft logo is seen on an office building in New York City, US. (Reuters)

“VALL-E 2 is purely a research project,” Microsoft’s researchers wrote. “Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public.”

The tech giant’s researchers say that VALL-E 2 has achieved “human parity” in speech generation, which means that whatever the AI says cannot be distinguished from a real human voice.

What can VALL-E 2 be used for?

VALL-E 2 could be used to synthesize speech while maintaining the speaker's identity. According to Microsoft Research, this could be useful in education, entertainment, journalism, self-authored content, accessibility features, interactive voice response systems, translation, chatbots, and so on.

For all of these uses, the AI can replicate a person's voice with high accuracy from just a few seconds of audio of that person speaking.

How does VALL-E 2 manage to be so realistic?

The AI owes its realism to two techniques, known as “Repetition Aware Sampling” and “Grouped Code Modeling.”

Repetition aware sampling cuts down on monotonous speech by keeping track of the small units of speech the AI has recently generated, such as words or syllables, and preventing them from repeating, so that the output sounds more natural.
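
For readers who want to see the idea in code, here is a minimal Python sketch of one way such a sampler could work. It is not Microsoft's implementation: the function names, the window of 10 recent units, the 0.5 repetition threshold and the 0.8 nucleus cut-off are all illustrative assumptions. The sketch first picks the next unit from the most likely candidates and, if that unit already dominates the recent history, draws again from the full distribution instead.

```python
import random
from typing import List, Sequence


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Sample from the smallest set of most likely units whose total probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: Sequence[int],
    rng: random.Random,
    top_p: float = 0.8,      # assumed nucleus cut-off
    window: int = 10,        # assumed number of recent units to inspect
    threshold: float = 0.5,  # assumed repetition ratio that triggers a re-draw
) -> int:
    """Pick the next unit, re-drawing from the full distribution if it repeats too often."""
    token = nucleus_sample(probs, top_p, rng)
    recent = list(history[-window:]) if window > 0 else []
    if recent:
        ratio = recent.count(token) / len(recent)
        if ratio > threshold:
            # The candidate dominates the recent history: sample from the
            # whole distribution so that less likely units get a chance.
            token = rng.choices(range(len(probs)), weights=list(probs), k=1)[0]
    return token
```

The re-draw does not forbid the repeated unit outright; it only widens the choice, which is one simple way to break a loop without distorting ordinary speech.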

Grouped code modeling shortens the sequence the AI has to work through by bundling units of speech into groups, so that it processes fewer items per sentence. This speeds up speech generation and eases the challenge of handling long sentences.
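
The grouping step can be illustrated with a short Python sketch. It is a simplified assumption rather than Microsoft's code: the function name `group_codes`, the integer codes and the padding value are hypothetical. It bundles a sequence of speech codes into fixed-size groups, so a model stepping through the groups takes fewer steps than one stepping through individual codes.

```python
from typing import List


def group_codes(codes: List[int], group_size: int, pad_id: int = 0) -> List[List[int]]:
    """Bundle consecutive codes into groups of group_size, padding the last group if needed."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    remainder = len(codes) % group_size
    # pad_id is an assumed filler value for an incomplete final group.
    padded = codes + [pad_id] * ((group_size - remainder) % group_size)
    return [padded[i:i + group_size] for i in range(0, len(padded), group_size)]


# 100 codes in groups of 4 leave only 25 steps for the model to process.
steps = group_codes(list(range(100)), group_size=4)
print(len(steps))
```

With a group size of 2, the five codes 1, 2, 3, 4, 5 become the three groups [1, 2], [3, 4] and [5, 0], where the final 0 is padding.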
