Google has announced its first direct speech-to-speech translation system, known as "Translatotron". While translating, the system aims to preserve the speaker's voice and tempo.
According to Ye Jia and Ron Weiss, AI experts at Google, the system is built on a sequence-to-sequence model. It takes spectrograms, which are visual representations of the frequencies in a sound over time, as input. The source speech is processed as spectrograms, and the output is likewise a set of spectrograms, this time of the translated content. Because no intermediate text is involved, speech in one language is translated directly into speech in another.
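To make the idea of a spectrogram concrete, here is a minimal sketch in plain Python. It is not Google's code: the function name `spectrogram`, the frame length, the hop size and the 8000 Hz sample rate are all illustrative assumptions, and a real system would use a fast library FFT rather than the naive transform shown here. The sketch slices a signal into overlapping frames and measures how much energy each frequency bin holds in each frame.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames; each frame lists one magnitude per frequency bin."""
    # A Hann window softens the frame edges and reduces spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT over the non-negative frequency bins (0 .. frame_len // 2).
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


SAMPLE_RATE = 8000  # Hz; an illustrative value, not taken from the article

# A pure 1000 Hz test tone, 512 samples long.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]
spec = spectrogram(tone)

# The strongest bin in the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
```

For the test tone, the strongest bin in each frame corresponds to 1000 Hz. Real speech spreads its energy across many bins that change from frame to frame, and it is this time-frequency picture that a model like Translatotron reads and writes.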
Speaker Encoder and Vocoder of Translatotron
The system relies on two separate components. One is the speaker encoder, which helps capture the source speaker's voice and maintain it in the synthesized speech. The other is the vocoder, which converts the output spectrograms into waveforms.
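The way these pieces fit together can be sketched as a simple data flow. The code below is a structural illustration only: `translate_speech` and the three `toy_*` functions are hypothetical placeholders invented for this sketch, not Google's models or any library's API. In the real system each stage is a trained neural network.

```python
from typing import Callable, List

Spectrogram = List[List[float]]  # frames x frequency bins
Embedding = List[float]          # fixed-length summary of a speaker's voice
Waveform = List[float]           # audio samples


def translate_speech(
    source_spec: Spectrogram,
    speaker_encoder: Callable[[Spectrogram], Embedding],
    translator: Callable[[Spectrogram, Embedding], Spectrogram],
    vocoder: Callable[[Spectrogram], Waveform],
) -> Waveform:
    """Chain the stages: voice summary, spectrogram translation, waveform synthesis."""
    voice = speaker_encoder(source_spec)          # capture the speaker's voice
    target_spec = translator(source_spec, voice)  # spectrogram in, spectrogram out
    return vocoder(target_spec)                   # spectrogram to waveform


# --- Hypothetical placeholders so the sketch runs; they do no real translation ---

def toy_speaker_encoder(spec: Spectrogram) -> Embedding:
    # Average each frequency bin over time as a crude stand-in for a voice summary.
    n_frames = len(spec)
    return [sum(frame[k] for frame in spec) / n_frames for k in range(len(spec[0]))]


def toy_translator(spec: Spectrogram, voice: Embedding) -> Spectrogram:
    # Placeholder: returns a copy of the input instead of translated speech.
    return [list(frame) for frame in spec]


HOP = 32  # assumed number of audio samples produced per spectrogram frame


def toy_vocoder(spec: Spectrogram) -> Waveform:
    # Placeholder: emits silence of the right length instead of real audio.
    return [0.0] * (len(spec) * HOP)


source = [[0.0, 1.0, 0.5], [0.0, 3.0, 0.5]]
waveform = translate_speech(source, toy_speaker_encoder, toy_translator, toy_vocoder)
```

The point of the sketch is the shape of the pipeline: no text appears anywhere between the input spectrogram and the output waveform, and the voice summary is passed alongside the source speech so that the synthesized output can sound like the original speaker.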
Jia and Weiss say that Translatotron is able to retain the original speaker's voice, which keeps the translated speech from sounding unusual or jarring. They added that they look forward to further research on this end-to-end direct translation approach. At the moment, Google's new system slightly lags behind the existing cascade systems, which chain speech recognition, text translation and speech synthesis.
Google is currently working on adding accents and more languages to the translation system. The company also intends to add a region-based pronunciation feature for real-time translation.
Google, the search giant, says that Translatotron demonstrates the feasibility of end-to-end direct speech translation. According to the company, it is the first end-to-end model to translate speech in one language directly into speech in another.