Highlights from the Deep Learning Summer School

A few weeks ago, Janos and I attended the Deep Learning Summer School at the University of Montreal. Various well-known researchers covered topics related to deep learning, from reinforcement learning to computational neuroscience (see the list of speakers with slides and videos). Here are a few ideas that I found interesting in the talks (this list is far from exhaustive):

Cross-modal learning (Antonio Torralba)

You can do transfer learning in convolutional neural nets by freezing the parameters in some layers and retraining others on a different domain for the same task (paper). For example, if you have a neural net for scene recognition trained on real images of bedrooms, you could reuse the same architecture to recognize drawings of bedrooms. The last few layers represent abstractions like “bed” or “lamp”, which apply to drawings just as well as to real images, while the first few layers represent textures, which would differ between the two data modalities of real images and drawings. More generally, the last few layers are task-dependent and modality-independent, while the first few layers are the opposite.


Importance weighted autoencoders (Ruslan Salakhutdinov)

vaeThe variational autoencoder (VAE) is a popular generative model that constructs an autoencoder out of a generative network (encoder) and recognition network (decoder). It then trains these networks to optimize a variational approximation of the posterior distribution by maximizing a lower bound on the log likelihood. IWAE is a variation that tightens the variational lower bound by relaxing the assumptions about the form of the posterior distribution . While the VAE maximizes a lower bound based on a single sample from the recognition distribution, the IWAE lower bound uses a weighted average over several samples. Applying importance weighting over several samples avoids the failure mode where the VAE objective penalizes models that produce even a few samples through the recognition network that don’t fit the posterior from the generative network, and taking several samples allows for better approximation of the posterior and thus a tighter lower bound.(The IWAE paper also gives a more intuitive introduction to VAE than the original paper, in my opinion.)

Variations on RNNs (Yoshua Bengio)

hierarchical rnnThis talk mentioned a few recurrent neural network (RNN) models that were unfamiliar to me. Variational RNNs introduce some elements of variational autoencoders into RNNs by adding latent variables (z) into the top hidden layer (paper). The RNN internal structure is entirely deterministic besides the output probability model, so it can be helpful to inject a higher-level source of noise to model highly structured data (e.g. speech). This was further extended with multiresolution RNNs, which are variational and hierarchical (paper). Another interesting model is real-time recurrent learning, a more biologically plausible alternative to backpropagation through time, where gradients are computed in an online feedforward manner without revisiting past history backwards. The originally proposed version involves a fairly inefficient exact computation of parameter gradients, while a more efficient recent approach approximates the forward gradient instead (paper).

Some other talks I really liked but ran out of steam to write about: Joelle Pineau’s intro to reinforcement learning, Pieter Abbeel on deep reinforcement learning, Shakir Mohamed on deep generative models, Surya Ganguli on neuroscience and deep learning.


Clopen AI: Openness in different aspects of AI development

1-clopen-setThere has been a lot of discussion about the appropriate level of openness in AI research in the past year – the OpenAI announcement, the blog post Should AI Be Open?, a response to the latter, and Nick Bostrom’s thorough paper Strategic Implications of Openness in AI development.

There is disagreement on this question within the AI safety community as well as outside it. Many people are justifiably afraid of concentrating power to create AGI and determine its values in the hands of one company or organization. Many others are concerned about the information hazards of open-sourcing AGI and the resulting potential for misuse. In this post, I argue that some sort of compromise between openness and secrecy will be necessary, as both extremes of complete secrecy and complete openness seem really bad. The good news is that there isn’t a single axis of openness vs secrecy – we can make separate judgment calls for different aspects of AGI development, and develop a set of guidelines.

Information about AI development can be roughly divided into two categories – technical and strategic. Technical information includes research papers, data, source code (for the algorithm, objective function), etc. Strategic information includes goals, forecasts and timelines, the composition of ethics boards, etc. Bostrom argues that openness about strategic information is likely beneficial both in terms of short- and long-term impact, while openness about technical information is good on the short-term, but can be bad on the long-term due to increasing the race condition. We need to further consider the tradeoffs of releasing different kinds of technical information.

Sharing papers and data is both more essential for the research process and less potentially dangerous than sharing code, since it is hard to reconstruct the code from that information alone. For example, it can be difficult to reproduce the results of a neural network algorithm based on the research paper, given the difficulty of tuning the hyperparameters and differences between computational architectures.

Releasing all the code required to run an AGI into the world, especially before it’s been extensively debugged, tested, and safeguarded against bad actors, would be extremely unsafe. Anyone with enough computational power could run the code, and it would be difficult to shut down the program or prevent it from copying itself all over the Internet.

However, releasing none of the source code is also a bad idea. It would currently be impractical, given the strong incentives for AI researchers to share at least part of the code for recognition and replicability. It would also be suboptimal, since sharing some parts of the code is likely to contribute to safety. For example, it would make sense to open-source the objective function code without the optimization code, which would reveal what is being optimized for but not how. This could make it possible to verify whether the objective is sufficiently representative of society’s values – the part of the system that would be the most understandable and important to the public anyway.

It is rather difficult to verify to what extent a company or organization is sharing their technical information on AI development, and enforce either complete openness or secrecy. There is not much downside to specifying guidelines for what is expected to be shared and what isn’t. Developing a joint set of openness guidelines on the short and long term would be a worthwhile endeavor for the leading AI companies today.

(Cross-posted to the FLI blog and Approximately Correct. Thanks to Jelena Luketina and Janos Kramar for their detailed feedback on this post!)

New AI safety research agenda from Google Brain

Google Brain just released an inspiring research agenda, Concrete Problems in AI Safety, co-authored by researchers from OpenAI, Berkeley and Stanford. This document is a milestone in setting concrete research objectives for keeping reinforcement learning agents and other AI systems robust and beneficial. The problems studied are relevant both to near-term and long-term AI safety, from cleaning robots to higher-stakes applications. The paper takes an empirical focus on avoiding accidents as modern machine learning systems become more and more autonomous and powerful.

Reinforcement learning is currently the most promising framework for building artificial agents – it is thus especially important to develop safety guidelines for this subfield of AI. The research agenda describes a comprehensive (though likely non-exhaustive) set of safety problems, corresponding to where things can go wrong when building AI systems:

  • Mis-specification of the objective function by the human designer. Two common pitfalls when designing objective functions are negative side-effects and reward hacking (also known as wireheading), which are likely to happen by default unless we figure out how to guard against them. One of the key challenges is specifying what it means for an agent to have a low impact on the environment while achieving its objectives effectively.

  • Extrapolation from limited information about the objective function. Even with a correct objective function, human supervision is likely to be costly, which calls for scalable oversight of the artificial agent.

  • Extrapolation from limited training data or using an inadequate model. We need to develop safe exploration strategies that avoid irreversibly bad outcomes, and build models that are robust to distributional shift – able to fail gracefully in situations that are far outside the training data distribution.

The AI research community is increasingly focusing on AI safety in recent years, and Google Brain’s agenda is part of this trend. It follows on the heels of the Safely Interruptible Agents paper from Google DeepMind and the Future of Humanity Institute, which investigates how to avoid unintended consequences from interrupting or shutting down reinforcement learning agents. We at FLI are super excited that industry research labs at Google and OpenAI are spearheading and fostering collaboration on AI safety research, and look forward to the outcomes of this work.

(Cross-posted from the FLI blog.)

Using humility to counteract shame

u0sm9wx“Pride is not the opposite of shame, but its source. True humility is the only antidote to shame.”

Uncle Iroh, “Avatar: The Last Airbender”


Shame is one of the trickiest emotions to deal with. It is difficult to think about, not to mention discuss with others, and gives rise to insidious ugh fields and negative spirals. Shame often underlies other negative emotions without making itself apparent – anxiety or anger at yourself can be caused by unacknowledged shame about the possibility of failure. It can stack on top of other emotions – e.g. you start out feeling upset with someone, and end up being ashamed of yourself for feeling upset, and maybe even ashamed of feeling ashamed if meta-shame is your cup of tea. The most useful approach I have found against shame is invoking humility.

What is humility, anyway? It is often defined as a low view of your own importance, and tends to be conflated with modesty. Another common definition that I find more useful is acceptance of your own flaws and shortcomings. This is more compatible with confidence, and helpful irrespective of your level of importance or comparison to other people. What humility feels like to me on a system 1 level is a sense of compassion and warmth towards yourself while fully aware of your imperfections (while focusing on imperfections without compassion can lead to beating yourself up). According to LessWrong, “to be humble is to take specific actions in anticipation of your own errors”, which seems more like a possible consequence of being humble than a definition.

Humility is a powerful tool for psychological well-being and instrumental rationality that is more broadly applicable than just the ability to anticipate errors by seeing your limitations more clearly. I can summon humility when I feel anxious about too many upcoming deadlines, or angry at myself for being stuck on a rock climbing route, or embarrassed about forgetting some basic fact in my field that I am surely expected to know by the 5th year of grad school.

While humility comes naturally to some people, others might find it useful to explicitly build an identity as a humble person. How can you invoke this mindset? One way is through negative visualization or pre-hindsight, considering how your plans could fail, which can be time-consuming and usually requires system 2. A faster and less effortful way is to is to imagine a person, real or fictional, who you consider to be humble. I often bring to mind my grandfather, or Uncle Iroh from the Avatar series, sometimes literally repeating the above quote in my head, sort of like an affirmation. I don’t actually agree that humility is the only antidote to shame, but it does seem to be one of the most effective.

(Cross-posted to LessWrong. Thanks to Janos Kramar for his feedback on this post.)

Introductory resources on AI safety research

Reading list to get up to speed on the main ideas in the field. The resources are selected for relevance and/or brevity, and the list is not meant to be comprehensive. [Updated on 15 August 2017.]


For a popular audience:

Cade Metz, 2017. New York Times: Teaching A.I. Systems to Behave Themselves

FLI. AI risk background and FAQ. At the bottom of the background page, there is a more extensive list of resources on AI safety.

Tim Urban, 2015. Wait But Why: The AI Revolution. An accessible introduction to AI risk forecasts and arguments (with cute hand-drawn diagrams, and a few corrections from Luke Muehlhauser).

OpenPhil, 2015. Potential risks from advanced artificial intelligence. An overview of AI risks and timelines, possible interventions, and current actors in this space.

For a more technical audience:

Stuart Russell:

  • The long-term future of AI (longer version), 2015. A video of Russell’s classic talk, discussing why it makes sense for AI researchers to think about AI safety, and going over various misconceptions about the issues.
  • Concerns of an AI pioneer, 2015. An interview with Russell on the importance of provably aligning AI with human values, and the challenges of value alignment research.
  • On Myths and Moonshine, 2014. Russell’s response to the “Myth of AI” question on Edge.org, which draws an analogy between AI research and nuclear research, and points out some dangers of optimizing a misspecified utility function.

Scott Alexander, 2015. No time like the present for AI safety work. An overview of long-term AI safety challenges, e.g. preventing wireheading and formalizing ethics.

Victoria Krakovna, 2015. AI risk without an intelligence explosion. An overview of long-term AI risks besides the (overemphasized) intelligence explosion / hard takeoff scenario, arguing why intelligence explosion skeptics should still think about AI safety.

Stuart Armstrong, 2014. Smarter Than Us: The Rise Of Machine Intelligence. A short ebook discussing potential promises and challenges presented by advanced AI, and the interdisciplinary problems that need to be solved on the way there.

Technical overviews

Soares and Fallenstein, 2017. Aligning Superintelligence with Human Interests: A Technical Research Agenda

Amodei, Olah, et al, 2016. Concrete Problems in AI safety. Research agenda focusing on accident risks that apply to current ML systems as well as more advanced future AI systems.

Jessica Taylor et al, 2016. Alignment for Advanced Machine Learning Systems

FLI, 2015. A survey of research priorities for robust and beneficial AI

Jacob Steinhardt, 2015. Long-Term and Short-Term Challenges to Ensuring the Safety of AI Systems. A taxonomy of AI safety issues that require ordinary vs extraordinary engineering to address.

Nate Soares, 2015. Safety engineering, target selection, and alignment theory. Identifies and motivates three major areas of AI safety research.

Nick Bostrom, 2014. Superintelligence: Paths, Dangers, Strategies. A seminal book outlining long-term AI risk considerations.

Steve Omohundro, 2007. The basic AI drives. A classic paper arguing that sufficiently advanced AI systems are likely to develop drives such as self-preservation and resource acquisition independently of their assigned objectives.

Technical work

Value learning:

Smitha Milli et al. Should robots be obedient? Obedience to humans may sound like a great thing, but blind obedience can get in the way of learning human preferences.

William Saunders et al, 2017. Trial without Error: Towards Safe Reinforcement Learning via Human Intervention. (blog post)

Amin, Jiang, and Singh, 2017. Repeated Inverse Reinforcement Learning. Separates the reward function into a task-specific component and an intrinsic component. In a sequence of task, the agent learns the intrinsic component while trying to avoid surprising the human.

Dylan Hadfield-Menell et al, 2016. Cooperative inverse reinforcement learning. Defines value learning as a cooperative game where the human tries to teach the agent about their reward function, rather than giving optimal demonstrations like in standard IRL.

Owain Evans et al, 2016. Learning the Preferences of Ignorant, Inconsistent Agents.

Reward gaming / wireheading:

Tom Everitt et al, 2017. Reinforcement learning with a corrupted reward channel. A formalization of the reward misspecification problem in terms of true and corrupt reward, a proof that RL agents cannot overcome reward corruption, and a framework for giving the agent extra information to overcome reward corruption. (blog post)

Amodei and Clark, 2016. Faulty Reward Functions in the Wild. An example of reward function gaming in a boat racing game, where the agent gets a higher score by going in circles and hitting the same targets than by actually playing the game.

Everitt and Hutter, 2016. Avoiding Wireheading with Value Reinforcement Learning. An alternative to RL that reduces the incentive to wirehead.

Laurent Orseau, 2015. Wireheading. An investigation into how different types of artificial agents respond to opportunities to wirehead (unintended shortcuts to maximize their objective function).

Interruptibility / corrigibility:

Dylan Hadfield-Menell et al. The Off-Switch Game. This paper studies the interruptibility problem as a game between human and robot, and investigates which incentives the robot could have to allow itself to be switched off.

El Mahdi El Mhamdi et al, 2017. Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning.

Orseau and Armstrong, 2016. Safely interruptible agents. Provides a formal definition of safe interruptibility and shows that off-policy RL agents are more interruptible than on-policy agents. (blog post)

Nate Soares et al, 2015. Corrigibility. Designing AI systems without incentives to resist corrective modifications by their creators.

Scalable oversight:

Christiano, Leike et al, 2017. Deep reinforcement learning from human preferences. Communicating complex goals to AI systems using human feedback (comparing pairs of agent trajectory segments).

David Abel et al. Agent-Agnostic Human-in-the-Loop Reinforcement Learning.


Armstrong and Levinstein, 2017. Low Impact Artificial Intelligences. An intractable but enlightening definition of low impact for AI systems.

Babcock, Kramar and Yampolskiy, 2017. Guidelines for Artificial Intelligence Containment.

Scott Garrabrant et al, 2016. Logical Induction. A computable algorithm for the logical induction problem.

Note: I did not include literature on less neglected areas of the field like safe exploration, distributional shift, adversarial examples, or interpretability (see e.g. Concrete Problems or the CHAI bibliography for extensive references on these topics).

Collections of technical works

CHAI bibliography

MIRI publications

FHI publications

FLI grantee publications (scroll down)

Paul Christiano. AI control. A blog on designing safe, efficient AI systems (approval-directed agents, aligned reinforcement learning agents, etc).

If there are any resources missing from this list that you think are a must-read, please let me know! If you want to go into AI safety research, check out these guidelines and the AI Safety Syllabus.

(Thanks to Ben Sancetta, Taymon Beal and Janos Kramar for their feedback on this post.)

To contribute to AI safety, consider doing AI research

Among those concerned about risks from advanced AI, I’ve encountered people who would be interested in a career in AI research, but are worried that doing so would speed up AI capability relative to safety. I think it is a mistake for AI safety proponents to avoid going into the field for this reason (better reasons include being well-positioned to do AI safety work, e.g. at MIRI or FHI). This mistake contributed to me choosing statistics rather than computer science for my PhD, which I have some regrets about, though luckily there is enough overlap between the two fields that I can work on machine learning anyway.

I think the value of having more AI experts who are worried about AI safety is far higher than the downside of adding a few drops to the ocean of people trying to advance AI. Here are several reasons for this:

  1. Concerned researchers can inform and influence their colleagues, especially if they are outspoken about their views.
  2. Studying and working on AI brings understanding of the current challenges and breakthroughs in the field, which can usefully inform AI safety work (e.g. wireheading in reinforcement learning agents).
  3. Opportunities to work on AI safety are beginning to spring up within academia and industry, e.g. through FLI grants. In the next few years, it will be possible to do an AI-safety-focused PhD or postdoc in computer science, which would hit two birds with one stone.

To elaborate on #1, one of the prevailing arguments against taking long-term AI safety seriously is that not enough experts in the AI field are worried. Several prominent researchers have commented on the potential risks (Stuart Russell, Bart Selman, Murray Shanahan, Shane Legg, and others), and more are concerned but keep quiet for reputational reasons. An accomplished, strategically outspoken and/or well-connected expert can make a big difference in the attitude distribution in the AI field and the level of familiarity with the actual concerns (which are not about malevolence, sentience, or marching robot armies). Having more informed skeptics who have maybe even read Superintelligence, and fewer uninformed skeptics who think AI safety proponents are afraid of Terminators, would produce much needed direct and productive discussion on these issues. As the proportion of informed and concerned researchers in the field approaches critical mass, the reputational consequences for speaking up will decrease.

A year after FLI’s Puerto Rico conference, the subject of long-term AI safety is no longer taboo among AI researchers, but remains rather controversial. Addressing AI risk on the long term will require safety work to be a significant part of the field, and close collaboration between those working on safety and capability of advanced AI. Stuart Russell makes the apt analogy that “just as nuclear fusion researchers consider the problem of containment of fusion reactions as one of the primary problems of their field, issues of control and safety will become central to AI as the field matures”. If more people who are already concerned about AI safety join the field, we can make this happen faster, and help wisdom win the race with capability.

(Cross-posted to LessWrong. Thanks to Janos Kramar for his help with editing this post.)

2015-16 New Year review

2015 progress


  • Finished paper on the Selective Bayesian Forest Classifier algorithm
  • Made an R package for SBFC (beta)
  • Worked at Google on unsupervised learning for the Knowledge Graph with Moshe Looks during the summer (paper)
  • Joined the HIPS research group at Harvard CS and started working with the awesome Finale Doshi-Velez
  • Ratio of coding time to writing time was too high overall


  • Co-organized two meetings to brainstorm biotechnology risks
  • Co-organized two Machine Learning Safety meetings
  • Gave a talk at the Shaping Humanity’s Trajectory workshop at EA Global
  • Helped organize NIPS symposium on societal impacts of AI

Rationality / effectiveness:

  • Extensive use of FollowUpThen for sending reminders to future selves
  • Mapped out my personal bottlenecks
  • Sleep:
    • Tracked insomnia (26% of nights) and sleep time (average 1:30am, stayed up past 1am on 31% of nights)
    • Started working on sleep hygiene
    • Stopped using melatonin (found it ineffective)

Random cool things I did:

  • Improv class
  • Aerial silks class
  • Climbed out of a glacial abyss (moulin)
  • Placed second at Toastmasters area speech contest

2015 prediction outcomes

Out of the 17 predictions I made a year ago, 5 were true, and the rest were false.

  1. Submit the SBFC paper for publication (95%)
  2. Submit another paper besides SBFC (40%)
  3. Present SBFC results at a conference (JSM, ICML or NIPS) (40%) – presented at a workshop (NESS)
  4. Get a new external fellowship to replace my expiring NSERC fellowship (50%)
  5. Skim at least 20 research papers in machine learning (70%) – probably a lot more
  6. Write at least 12 blog posts (70%) – wrote 9 posts
  7. Climb a 5.12 without rope cheating (50%) – no longer endorsed at this level
  8. Lead climb a 5.11a (50%) – no longer endorsed at this level
  9. Do 10 pullups in a row (60%) – no longer endorsed at this level
  10. Meditate at least 150 times (80%) – 206 times
  11. Record at least 150 new thoughts (70%) – recorded 62, no longer endorsed at this level
  12. Make at least 100 Anki cards by the end of the year (70%)
  13. Read at least 10 books (60%) – read 4 books, no longer endorsed at this level
  14. Attend Burning Man (90%)
  15. Boston will have a second rationalist house by the end of the year (30%)
  16. FLI will hire a full-time project manager or administrator (80%) – no, but we now have a full time website editor…
  17. FLI will start a project on biotech safety (70%) – had some meetings, but no concrete action plan yet


  • low predictions, 30-60%: 0/8 = 0% (super overconfident)
  • high predictions, 70-95%: 5/9 = 56% (overconfident)

(Yikes! Worse than last year…)


  • I forgot about most of these goals after a few months – will need a recurring reminder for next year.
  • All 3 physical goals ended up disendorsed – I think I set those way too high. My climbing habits got disrupted by moving to California in summer and a hand injury, so I’m still trying to return to my spring 2014 skill level.

2016 goals and predictions

Given the overconfidence of last year’s predictions, toning it down for next year.


  1. Finish PhD thesis (70%)
  2. Write at least 12 blog posts (40%)
  3. Meditate at least 200 days (50%)
  4. Exercise at least 200 days (50%)
  5. Do at least 5 pullups in a row (40%)
  6. Record at least 50 new thoughts (50%)
  7. Stay up at most 20% of the nights (40%)
  8. Do at least 10 pomodoros per week on average (50%)


  1. At least one paper accepted for publication (70%)
  2. I will get at least one fellowship (40%)
  3. Insomnia at most 20% of nights (20%)
  4. FLI will co-organize at least 3 AI safety workshops (50%)