A few weeks ago, Janos and I attended the Deep Learning Summer School at the University of Montreal. Various well-known researchers covered topics related to deep learning, from reinforcement learning to computational neuroscience (see the list of speakers with slides and videos). Here are a few ideas that I found interesting in the talks (this list is far from exhaustive):

#### Cross-modal learning (Antonio Torralba)

You can do transfer learning in convolutional neural nets by freezing the parameters in some layers and retraining others on a different domain for the same task (paper). For example, if you have a neural net for scene recognition trained on real images of bedrooms, you could reuse the same architecture to recognize drawings of bedrooms. The last few layers represent abstractions like “bed” or “lamp”, which apply to drawings just as well as to real images, while the first few layers represent textures, which would differ between the two data modalities of real images and drawings. More generally, the last few layers are task-dependent and modality-independent, while the first few layers are the opposite.

#### Importance weighted autoencoders (Ruslan Salakhutdinov)

The variational autoencoder (VAE) is a popular generative model that constructs an autoencoder out of a generative network (encoder) and recognition network (decoder). It then trains these networks to optimize a variational approximation of the posterior distribution by maximizing a lower bound on the log likelihood. IWAE is a variation that tightens the variational lower bound by relaxing the assumptions about the form of the posterior distribution . While the VAE maximizes a lower bound based on a single sample from the recognition distribution, the IWAE lower bound uses a weighted average over several samples. Applying importance weighting over several samples avoids the failure mode where the VAE objective penalizes models that produce even a few samples through the recognition network that don’t fit the posterior from the generative network, and taking several samples allows for better approximation of the posterior and thus a tighter lower bound.(The IWAE paper also gives a more intuitive introduction to VAE than the original paper, in my opinion.)

#### Variations on RNNs (Yoshua Bengio)

This talk mentioned a few recurrent neural network (RNN) models that were unfamiliar to me. Variational RNNs introduce some elements of variational autoencoders into RNNs by adding latent variables (z) into the top hidden layer (paper). The RNN internal structure is entirely deterministic besides the output probability model, so it can be helpful to inject a higher-level source of noise to model highly structured data (e.g. speech). This was further extended with multiresolution RNNs, which are variational and hierarchical (paper). Another interesting model is real-time recurrent learning, a more biologically plausible alternative to backpropagation through time, where gradients are computed in an online feedforward manner without revisiting past history backwards. The originally proposed version involves a fairly inefficient exact computation of parameter gradients, while a more efficient recent approach approximates the forward gradient instead (paper).

Some other talks I really liked but ran out of steam to write about: Joelle Pineau’s intro to reinforcement learning, Pieter Abbeel on deep reinforcement learning, Shakir Mohamed on deep generative models, Surya Ganguli on neuroscience and deep learning.

(Part 2 will be a guest post by Janos on policy gradients – coming soon!)