Clone a voice in 5 seconds

If you have looked into voice cloning, you will know that it usually requires collecting a large dataset containing hours of recorded speech. That dataset is then used to train a new voice model. With this GitHub project, that requirement may become a thing of the past.

Clone a voice project

This GitHub project demonstrates a remarkable method for real-time voice cloning. It comes with a toolbox that enables anyone to clone a voice from as little as five seconds of sample audio.

Image: voice cloning tool (from the paper)

This GitHub repository was open-sourced this June as an implementation of the paper Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS), paired with a vocoder that works in real time. The project was developed by Corentin Jemine.

The goal of the work, as the paper's authors describe it:

"The goal of this work is to build a TTS system which can generate natural speech for a variety of speakers in a data efficient manner. We specifically address a zero-shot learning setting, where a few seconds of un-transcribed reference audio from a target speaker is used to synthesize new speech in that speaker’s voice, without updating any model parameters."

Clone a voice, how does it work?

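It starts with the user, who feeds a short voice sample into the toolbox. No new model is trained on that sample: in the zero-shot setting the paper describes, no model parameters are updated. Instead, the pretrained system derives a representation of the speaker's voice from the sample and uses it at synthesis time, so the toolbox can immediately turn typed text into speech in the style of the sampled voice.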
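To make that flow concrete, here is a minimal Python sketch of the three stages an SV2TTS-style system chains together: a speaker encoder, a synthesizer, and a vocoder. Everything in it is a toy stand-in invented for illustration. The function names, the embedding size, and the arithmetic are assumptions and are not the repository's API or the paper's models. It only shows how data moves: audio of any length becomes a fixed-size voice vector, text plus that vector become frames, and frames become a waveform, with nothing being trained along the way.

```python
import math
from typing import List

EMBED_DIM = 8  # hypothetical size, chosen only to keep the toy small


def embed_speaker(samples: List[float], dim: int = EMBED_DIM) -> List[float]:
    """Toy speaker encoder: audio of any length -> fixed-size unit vector."""
    if not samples:
        raise ValueError("reference audio is empty")
    chunk = max(1, len(samples) // dim)
    feats = []
    for i in range(dim):
        part = samples[i * chunk:(i + 1) * chunk] or [0.0]
        # Average magnitude of each chunk stands in for a learned feature.
        feats.append(sum(abs(x) for x in part) / len(part))
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]


def synthesize_frames(text: str, embedding: List[float]) -> List[List[float]]:
    """Toy synthesizer: one frame per character, scaled by the voice vector."""
    return [[(ord(ch) % 32) / 32.0 * e for e in embedding] for ch in text]


def vocode(frames: List[List[float]], samples_per_frame: int = 4) -> List[float]:
    """Toy vocoder: expands every frame into a few waveform samples."""
    wave = []
    for frame in frames:
        level = sum(frame) / len(frame)
        for k in range(samples_per_frame):
            wave.append(level * math.sin(2 * math.pi * k / samples_per_frame))
    return wave


def clone_voice(reference: List[float], text: str) -> List[float]:
    """Chain the three stages; no parameters are updated anywhere."""
    embedding = embed_speaker(reference)         # 1. who is speaking
    frames = synthesize_frames(text, embedding)  # 2. what is said, in that voice
    return vocode(frames)                        # 3. audible waveform


if __name__ == "__main__":
    # About five seconds of fake audio, assuming a 16 kHz sample rate.
    reference = [math.sin(0.01 * n) for n in range(80000)]
    print(len(clone_voice(reference, "Hello")))
```

In a real system each stage is a trained neural network, but the shape of the pipeline is the same: the reference clip is only read once to compute the voice vector, which is why a few seconds of audio is enough.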

Cloning voices can have consequences

Voice recordings are often used as documentary evidence in important court cases. That makes it important for everyone to know that voice cloning software is freely available and that this technology exists. Because voices can now be counterfeited, audio evidence may lose some of its value in the future. What do you think?
