You’re reading Entrepreneur India, an international franchise of Entrepreneur Media.
A new AI tool from Microsoft has drawn considerable attention. Vall-E, a text-to-speech (TTS) system, can take a three-second recording of a person and then turn written text into speech in that person's voice. The most frightening and astonishing part of the tool is its accuracy.
According to Microsoft, Vall-E is a 'neural codec language model'. Unlike other voice generators, Vall-E takes a different approach to reach higher accuracy: it treats speech synthesis as a language-modelling task over the discrete codes of a neural audio codec, rather than generating the audio signal directly. Its training data amounts to 60,000 hours of English speech, which the company claims is hundreds of times larger than the data used by other existing systems.
According to Microsoft, what makes the tool stand out is that it "significantly outperforms" existing state-of-the-art zero-shot TTS systems in speech naturalness and speaker similarity. Vall-E does not need to be trained on recordings of a particular speaker; all it requires is a three-second audio recording and a text prompt.
The tool's most striking feature is its ability to preserve the speaker's emotional tone. Microsoft has demonstrated this on the tool's GitHub page. The three-second recording can be in any tone, such as angry, sleepy or disgusted, and Vall-E will recite the text in that same tone. The tool could be of great help to people who have lost their voice or their ability to speak. It could also add emotion to the otherwise flat voices of ordinary AI voice assistants.
One of the growing concerns about the tool stems from that very accuracy. It could be misused to spoof voice-based identification or to impersonate speakers. Given the problems already caused by fake news and widespread misinformation, Vall-E may simply help scammers up their game.