If you have ever tried to clone a voice, you will know that it usually requires collecting large datasets containing hours of recorded speech. These datasets are then used to train a new voice model. With this GitHub project, that may become a thing of the past.
The voice cloning project
This GitHub project demonstrates a remarkable method for Real-Time Voice Cloning, and it comes with a toolbox that lets anyone clone a voice from as little as five seconds of sample audio.
The repository was open-sourced this June as an implementation of the paper "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" (SV2TTS), paired with a vocoder that works in real time. The project was developed by Corentin Jemine.
The goal of SV2TTS, as described in the paper:
The goal of this work is to build a TTS system which can generate natural speech for a variety of speakers in a data efficient manner. We specifically address a zero-shot learning setting, where a few seconds of un-transcribed reference audio from a target speaker is used to synthesize new speech in that speaker’s voice, without updating any model parameters.
How does voice cloning work?
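It starts with the user, who records or loads a short voice sample into the toolbox. SV2TTS then works in three stages: a speaker encoder turns that sample into a compact digital representation (an embedding) of the voice, a synthesizer uses this embedding to generate a spectrogram from arbitrary text, and a vocoder converts the spectrogram into audio. Because no model parameters are updated, there is no new model to train: the toolbox can immediately speak any text you type in the style of the sampled voice.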
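To make the three-stage data flow concrete, here is a toy sketch in Python. It is not the project's code: `encode_window`, `synthesize` and `vocode` are hypothetical stand-ins for the real neural networks, passed in as plain functions. The one concrete step shown is how a single voice embedding can be built from a reference clip, by averaging per-window embeddings and normalizing the result to unit length, roughly the way speaker-verification encoders of this kind do. Note that nothing is trained: the reference audio only produces a vector that conditions the later stages.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    # Scale the vector to unit length so voices are compared by direction only.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def speaker_embedding(partials: Sequence[Sequence[float]]) -> Vector:
    # Average the per-window embeddings of the reference clip, then normalize.
    if not partials:
        raise ValueError("need at least one partial embedding")
    dim = len(partials[0])
    mean = [sum(p[i] for p in partials) / len(partials) for i in range(dim)]
    return l2_normalize(mean)


def clone_voice(
    reference_windows: Sequence[Sequence[float]],
    text: str,
    encode_window: Callable[[Sequence[float]], Vector],  # hypothetical encoder
    synthesize: Callable[[str, Vector], List[Vector]],   # hypothetical synthesizer
    vocode: Callable[[List[Vector]], List[float]],       # hypothetical vocoder
) -> List[float]:
    # Stage 1: speaker encoder -> one fixed-size embedding (no training happens here).
    emb = speaker_embedding([encode_window(w) for w in reference_windows])
    # Stage 2: synthesizer turns text + embedding into spectrogram frames.
    frames = synthesize(text, emb)
    # Stage 3: vocoder turns spectrogram frames into waveform samples.
    return vocode(frames)


if __name__ == "__main__":
    # Toy stand-ins just to exercise the data flow; they produce no real audio.
    windows = [[0.2, 0.4], [0.6, 0.0]]
    toy_encoder = lambda w: list(w)
    toy_synth = lambda text, e: [e for _ in text]
    toy_vocoder = lambda frames: [sum(f) for f in frames]
    print(clone_voice(windows, "hi", toy_encoder, toy_synth, toy_vocoder))
```

Passing the stages in as functions keeps the sketch focused on the point from the paper's zero-shot setting: swapping in a new voice only changes the embedding, never the models themselves.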
Cloning voices can have consequences
Voice recordings are currently often used as evidence in important court cases. That is why it matters that everyone knows voice cloning software is freely available and that such technology exists. In the future, audio evidence may lose some of its value, because voices can be convincingly faked. What do you think?