Variational autoencoders (VAEs) have become one of the most popular unsupervised learning techniques for modeling complex data distributions, such as images and audio. In this workshop, we’ll start with a general introduction to VAEs and then review some of their more interesting modifications, such as VQ-VAE (which has been shown to learn a rudimentary phoneme-level language model from raw audio without any supervision), beta-VAE (which learns disentangled codes), and possibly others. We’ll then extract latent codes from audio using pretrained models and play around with them: visualize, perturb, generate, and listen.

Dmytro Bielievtsov

Professional GPU abuser. Machine learning engineer. For the last couple of years I’ve mainly been focusing on modelling human speech, and on the problem of speech-to-speech mapping in particular. My previous work was mostly related to electrophysiological and fMRI data analysis, models of brain dynamics, and behavioral data analysis.

Grant Reaber

Grant is interested in deep generative models of audio and other high-dimensional signals. He is especially fascinated by the potential of such models to create new kinds of sound and imagery that did not exist before. His new startup, Respeecher, focuses on the transformation of speech, and its first product allows one person to speak with the voice of another. Grant has an extensive background in mathematics and computer science and holds a Ph.D. in Philosophy from the University of Aberdeen.

Event Timeslots

Workshop Room B
Dmytro Bielievtsov, Grant Reaber