In September 2016, Google DeepMind released WaveNet, which represented a revolution in the quality of audio synthesis. Instead of using traditional vocoding or spectrogram inversion techniques, WaveNet directly generated an audio waveform sample by sample using a deep neural network. Unfortunately, WaveNet was impractical to use in production because its sample-by-sample generation meant that it took minutes to produce a second of audio. Recently, DeepMind introduced a method in which a trained WaveNet is used to train a student network, built on inverse autoregressive flows, that is capable of producing audio in parallel. The resulting student model produces audio just as good as that of the original WaveNet but about 3000 times faster.

This exciting technique could also be applied to autoregressive models of other modalities, eliminating one of the main disadvantages of autoregressive models, their slow sequential sampling, compared with generative adversarial networks and variational autoencoders (the other two prominent families of deep generative models).
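
To make the contrast between sample-by-sample and parallel generation concrete, here is a minimal toy sketch in plain Python. It is not WaveNet and not DeepMind's code: the fixed `WEIGHTS`, the `loc_scale` function, and both sampler names are hypothetical stand-ins for a trained network, and the distillation step that trains the student from the teacher is not shown. The only point illustrated is the dependency structure: the autoregressive sampler needs its own earlier outputs, while the inverse-autoregressive-flow-style sampler needs only earlier noise values, all of which are available up front.

```python
import math
import random

# Hypothetical stand-in for a trained causal network: a fixed weighted sum
# over the previous few values. A real model would be a deep network.
WEIGHTS = [0.5, 0.3, 0.2]


def context(values, t):
    """Weighted sum of the values strictly before position t (causal)."""
    return sum(w * values[t - 1 - j]
               for j, w in enumerate(WEIGHTS) if t - 1 - j >= 0)


def loc_scale(ctx):
    """Toy network output: a mean and a strictly positive scale."""
    return math.tanh(ctx), math.exp(-1.0 + 0.5 * math.tanh(ctx))


def sample_autoregressive(noise):
    """Teacher-style sampling: x[t] depends on earlier outputs x[<t]."""
    x = []
    for t in range(len(noise)):  # inherently sequential loop
        mu, sigma = loc_scale(context(x, t))
        x.append(mu + sigma * noise[t])
    return x


def sample_iaf(noise):
    """Student-style sampling: x[t] depends only on earlier noise z[<t]."""
    def one_sample(t):
        mu, sigma = loc_scale(context(noise, t))
        return mu + sigma * noise[t]

    # Each position is independent of the other outputs, so these calls
    # could run in any order or all at once on parallel hardware.
    return [one_sample(t) for t in range(len(noise))]


if __name__ == "__main__":
    rng = random.Random(0)
    z = [rng.gauss(0.0, 1.0) for _ in range(8)]
    print("sequential:", sample_autoregressive(z))
    print("parallel:  ", sample_iaf(z))
```

The two samplers generally produce different waveforms from the same noise, because they define different mappings. In the method described in the abstract, the student is trained so that its output distribution matches the teacher's, not so that individual samples coincide.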


Grant Reaber

Grant is interested in deep generative models of audio and other high-dimensional signals. He is especially fascinated by the potential of such models to create new kinds of sound and imagery that did not exist before. His new startup, Respeecher, focuses on the transformation of speech, and its first product allows one person to speak with the voice of another. Grant has an extensive background in mathematics and computer science and holds a Ph.D. in Philosophy from the University of Aberdeen.

Event Timeslots (1)

Track B (Lower Floor)
Grant Reaber