
AI Safety Highlights from NIPS 2016

This year’s Neural Information Processing Systems conference was larger than ever, with almost 6000 people attending, hosted in a huge convention center in Barcelona, Spain. The conference started off with two exciting announcements on open-sourcing collections of environments for training and testing general AI capabilities – the DeepMind Lab and the OpenAI Universe. Among other things, this is promising for testing safety properties of ML algorithms. OpenAI has already used their Universe environment to give an entertaining and instructive demonstration of reward hacking that illustrates the challenge of designing robust reward functions.

I was happy to see a lot of AI-safety-related content at NIPS this year. The ML and the Law symposium and Interpretable ML for Complex Systems workshop focused on near-term AI safety issues, while the Reliable ML in the Wild workshop also covered long-term problems. Here are some papers relevant to long-term AI safety:

Inverse Reinforcement Learning

Cooperative Inverse Reinforcement Learning (CIRL) by Hadfield-Menell, Russell, Abbeel, and Dragan (main conference). This paper addresses the value alignment problem by teaching the artificial agent about the human’s reward function, using instructive demonstrations rather than optimal demonstrations like in classical IRL (e.g. showing the robot how to make coffee vs having it observe coffee being made). (3-minute video)


Generalizing Skills with Semi-Supervised Reinforcement Learning by Finn, Yu, Fu, Abbeel, and Levine (Deep RL workshop). This work addresses the scalable oversight problem by proposing the first tractable algorithm for semi-supervised RL. This allows artificial agents to robustly learn reward functions from limited human feedback. The algorithm uses an IRL-like approach to infer the reward function, using the agent’s own prior experiences in the supervised setting as an expert demonstration.

Towards Interactive Inverse Reinforcement Learning by Armstrong and Leike (Reliable ML workshop). This paper studies the incentives of an agent that is trying to learn about the reward function while simultaneously maximizing the reward. The authors discuss some ways to reduce the agent’s incentive to manipulate the reward learning process.

Should Robots Have Off Switches? by Milli, Hadfield-Menell, and Russell (Reliable ML workshop). This poster examines some adverse effects of incentivizing artificial agents to be compliant in the off-switch game (a variant of CIRL).

Safe exploration

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes by Turchetta, Berkenkamp, and Krause (main conference). This paper develops a reinforcement learning algorithm called Safe MDP that can explore an unknown environment without getting into irreversible situations, unlike classical RL approaches.

Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear by Lipton, Gao, Li, Chen, and Deng (Reliable ML workshop). This work addresses the ‘Sisyphean curse’ of DQN algorithms forgetting past experiences, as they become increasingly unlikely under a new policy, and therefore eventually repeating catastrophic mistakes. The paper introduces an approach called ‘intrinsic fear’, which maintains a model for how likely different states are to lead to a catastrophe within some number of steps.


Most of these papers were related to inverse reinforcement learning – while IRL is a promising approach, it would be great to see more varied safety material at the next NIPS (fingers crossed for some innovative contributions from Rocket AI!). There were some more safety papers on other topics at UAI this summer: Safely Interruptible Agents (formalizing what it means to incentivize an agent to obey shutdown signals) and A Formal Solution to the Grain of Truth Problem (providing a broad theoretical framework for multiple agents learning to predict each other in arbitrary computable games).

(Cross-posted to Approximately Correct and the FLI blog. Thanks to Jan Leike, Zachary Lipton, and Janos Kramar for providing feedback on this post.)

OpenAI unconference on machine learning

Last weekend, I attended OpenAI’s self-organizing conference on machine learning (SOCML 2016), meta-organized by Ian Goodfellow (thanks Ian!). It was held at OpenAI’s new office, with several floors of large open spaces. The unconference format was intended to encourage people to present current ideas alongside completed work. The schedule mostly consisted of 2-hour blocks with broad topics like “reinforcement learning” and “generative models”, guided by volunteer moderators. I especially enjoyed the sessions on neuroscience and AI and on transfer learning, which had smaller and more manageable groups than the crowded popular sessions, and diligent moderators who wrote down the important points on the whiteboard. Overall, I had more interesting conversations but also more auditory overload at SOCML than at other conferences.

To my excitement, there was a block for AI safety along with the other topics. The safety session became a broad introductory Q&A, moderated by Nate Soares, Jelena Luketina and me. Some topics that came up: value alignment, interpretability, adversarial examples, weaponization of AI.


AI safety discussion group (image courtesy of Been Kim)

One value alignment question was how to incorporate a diverse set of values that represents all of humanity in the AI’s objective function. We pointed out that there are two complementary problems: 1) getting the AI’s values to be in the small part of values-space that’s human-compatible, and 2) averaging over that space in a representative way. People generally focus on the ways in which human values differ from each other, which leads them to underestimate the difficulty of the first problem and overestimate the difficulty of the second. We also agreed on the importance of allowing for moral progress by not locking in the values of AI systems.

Nate mentioned some alternatives to goal-optimizing agents – quantilizers and approval-directed agents. We also discussed the limitations of using blacklisting/whitelisting in the AI’s objective function: blacklisting is vulnerable to unforeseen shortcuts and usually doesn’t work from a security perspective, and whitelisting hampers the system’s ability to come up with creative solutions (e.g. the controversial move 37 by AlphaGo in the second game against Lee Sedol).

Been Kim brought up the recent EU regulation on the right to explanation for algorithmic decisions. This seems easy to game due to lack of good metrics for explanations. One proposed metric was that a human would be able to predict future model outputs from the explanation. This might fail for better-than-human systems by penalizing creative solutions if applied globally, but seems promising as a local heuristic.

Ian Goodfellow mentioned the difficulties posed by adversarial examples: an imperceptible adversarial perturbation to an image can make a convolutional network misclassify it with very high confidence. There might be some kind of No Free Lunch theorem where making a system more resistant to adversarial examples would trade off with performance on non-adversarial data.
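To make the adversarial example idea concrete, here is a minimal sketch using the fast gradient sign method, one well-known way to build such perturbations (not necessarily the construction discussed in the session). Everything in it is an illustrative assumption: the classifier is a toy logistic model rather than a convolutional network, and the weights, the 784-pixel "image" and the epsilon value are made up. The point is only that a change of at most 0.05 per pixel can add up across many pixels and flip a confident prediction.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def predict(weights, pixels):
    """Probability of class 1 under a toy linear (logistic) classifier."""
    return sigmoid(sum(w * p for w, p in zip(weights, pixels)))


def sign(v):
    return (v > 0) - (v < 0)


# Hypothetical model and input: alternating +1/-1 weights, pixels near 0.5.
n_pixels = 784
weights = [1.0 if i % 2 == 0 else -1.0 for i in range(n_pixels)]
image = [0.5 + 0.02 * sign(w) for w in weights]  # confidently class 1

# For true label 1, the gradient of the cross-entropy loss
# with respect to the input is (p - 1) * w.
p_clean = predict(weights, image)
grad = [(p_clean - 1.0) * w for w in weights]

# Step each pixel by epsilon in the direction that increases the loss.
epsilon = 0.05
adversarial = [p + epsilon * sign(g) for p, g in zip(image, grad)]
p_adv = predict(weights, adversarial)

print(f"clean: {p_clean:.4f}  adversarial: {p_adv:.4f}")
```

The per-pixel change is bounded by epsilon, but the change in the logit scales with the number of pixels, which is one common explanation for why high-dimensional inputs are so easy to attack.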

We also talked about dual-use AI technologies, e.g. advances in deep reinforcement learning for robotics that could end up being used for military purposes. It was unclear whether corporations or governments are more trustworthy with using these technologies ethically: corporations have a profit motive, while governments are more likely to weaponize the technology.


More detailed notes by Janos coming soon! For a detailed overview of technical AI safety research areas, I highly recommend reading Concrete Problems in AI Safety.

Cross-posted to the FLI blog.

Highlights from the Deep Learning Summer School

A few weeks ago, Janos and I attended the Deep Learning Summer School at the University of Montreal. Various well-known researchers covered topics related to deep learning, from reinforcement learning to computational neuroscience (see the list of speakers with slides and videos). Here are a few ideas that I found interesting in the talks (this list is far from exhaustive):

Cross-modal learning (Antonio Torralba)

You can do transfer learning in convolutional neural nets by freezing the parameters in some layers and retraining others on a different domain for the same task (paper). For example, if you have a neural net for scene recognition trained on real images of bedrooms, you could reuse the same architecture to recognize drawings of bedrooms. The last few layers represent abstractions like “bed” or “lamp”, which apply to drawings just as well as to real images, while the first few layers represent textures, which would differ between the two data modalities of real images and drawings. More generally, the last few layers are task-dependent and modality-independent, while the first few layers are the opposite.


Importance weighted autoencoders (Ruslan Salakhutdinov)

The variational autoencoder (VAE) is a popular generative model that constructs an autoencoder out of a recognition network (encoder) and a generative network (decoder). It then trains these networks to optimize a variational approximation of the posterior distribution by maximizing a lower bound on the log likelihood. The importance weighted autoencoder (IWAE) is a variation that tightens the variational lower bound by relaxing the assumptions about the form of the posterior distribution. While the VAE maximizes a lower bound based on a single sample from the recognition distribution, the IWAE lower bound uses a weighted average over several samples. Applying importance weighting over several samples avoids the failure mode where the VAE objective penalizes models that produce even a few samples through the recognition network that don’t fit the posterior from the generative network, and taking several samples allows for better approximation of the posterior and thus a tighter lower bound. (The IWAE paper also gives a more intuitive introduction to VAE than the original paper, in my opinion.)
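The difference between the two bounds can be checked numerically on a toy problem. The sketch below is an assumption-laden illustration, not code from the talk or the paper: it uses a one-dimensional model with prior z ~ N(0, 1) and likelihood x | z ~ N(z, 1), so the exact marginal is x ~ N(0, 2) and the exact posterior is N(x/2, 1/2). The recognition distribution is deliberately mismatched (it is just the prior), and all function names are hypothetical. With k = 1 the estimate is the usual VAE bound; with larger k it is the importance weighted bound.

```python
import math
import random


def log_normal_pdf(value, mean, variance):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * variance) + (value - mean) ** 2 / variance)


def log_mean_exp(log_values):
    """Numerically stable log of the average of exp(log_values)."""
    peak = max(log_values)
    return peak + math.log(sum(math.exp(v - peak) for v in log_values) / len(log_values))


def k_sample_bound(x, q_mean, q_variance, k, rng):
    """One Monte Carlo draw of the k-sample bound (k=1 is the VAE bound)."""
    log_weights = []
    for _ in range(k):
        z = rng.gauss(q_mean, math.sqrt(q_variance))
        log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
        log_weights.append(log_joint - log_normal_pdf(z, q_mean, q_variance))
    return log_mean_exp(log_weights)


def average_bound(x, q_mean, q_variance, k, repeats=2000, seed=0):
    """Average many draws to estimate the expected bound."""
    rng = random.Random(seed)
    total = sum(k_sample_bound(x, q_mean, q_variance, k, rng) for _ in range(repeats))
    return total / repeats


x = 1.0
log_px = log_normal_pdf(x, 0.0, 2.0)             # exact log likelihood in this toy model
vae_bound = average_bound(x, 0.0, 1.0, k=1)      # single-sample bound, mismatched q
iwae_bound = average_bound(x, 0.0, 1.0, k=50)    # importance weighted bound, same q
exact_q_bound = average_bound(x, x / 2, 0.5, k=1, repeats=10)  # q equals the true posterior

print(vae_bound, iwae_bound, log_px)
```

When the recognition distribution equals the true posterior, every importance weight equals p(x), so even the single-sample bound is tight. With a mismatched recognition distribution, averaging the weights over more samples before taking the log moves the bound closer to the true log likelihood.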

Variations on RNNs (Yoshua Bengio)

This talk mentioned a few recurrent neural network (RNN) models that were unfamiliar to me. Variational RNNs introduce some elements of variational autoencoders into RNNs by adding latent variables (z) into the top hidden layer (paper). The RNN internal structure is entirely deterministic besides the output probability model, so it can be helpful to inject a higher-level source of noise to model highly structured data (e.g. speech). This was further extended with multiresolution RNNs, which are variational and hierarchical (paper). Another interesting approach is real-time recurrent learning, a more biologically plausible alternative to backpropagation through time, where gradients are computed in an online feedforward manner without revisiting past history backwards. The originally proposed version involves a fairly inefficient exact computation of parameter gradients, while a more efficient recent approach approximates the forward gradient instead (paper).

Some other talks I really liked but ran out of steam to write about: Joelle Pineau’s intro to reinforcement learning, Pieter Abbeel on deep reinforcement learning, Shakir Mohamed on deep generative models, Surya Ganguli on neuroscience and deep learning.

Highlights and impressions from NIPS conference on machine learning

This year’s NIPS was an epicenter of the current enthusiasm about AI and deep learning – there was a visceral sense of how quickly the field of machine learning is progressing, and two new AI startups were announced. Attendance has almost doubled compared to the 2014 conference (I hope they make it multi-track next year), and several popular workshops were standing room only. Given that there were only 400 accepted papers and almost 4000 people attending, most people were there to learn and socialize. The conference was a socially intense experience that reminded me a bit of Burning Man – the overall sense of excitement, the high density of spontaneous interesting conversations, the number of parallel events at any given time, and of course the accumulating exhaustion.

Some interesting talks and posters

Sergey Levine’s robotics demo at the crowded Deep Reinforcement Learning workshop (we showed up half an hour early to claim spots on the floor). This was one of the talks that gave me a sense of fast progress in the field. The presentation started with videos from this summer’s DARPA robotics challenge, where the robots kept falling down while trying to walk or open a door. Levine proceeded to outline his recent work on guided policy search, alternating between trajectory optimization and supervised training of the neural network, and granularizing complex tasks. He showed demos of robots successfully performing various high-dexterity tasks, like opening a door, screwing on a bottle cap, or putting a coat hanger on a rack. Impressive!

Generative image models using a pyramid of adversarial networks by Denton & Chintala. Generating realistic-looking images using one neural net as a generator and another as an evaluator – the generator tries to fool the evaluator by making the image indistinguishable from a real one, while the evaluator tries to tell real and generated images apart. Starting from a coarse image, successively finer images are generated using the adversarial networks from the coarser images at the previous level of the pyramid. The resulting images were mistaken for real images 40% of the time in the experiment, and around 80% of them looked realistic to me when staring at the poster.

Path-SGD by Salakhutdinov et al, a scale-invariant version of the stochastic gradient descent algorithm. Standard SGD uses the L2 norm as the measure of distance in the parameter space, and rescaling the weights can have large effects on optimization speed. Path-SGD instead regularizes the maximum norm of incoming weights into any unit, minimizing the max-norm over all rescalings of the weights. The resulting norm (called a “path regularizer”) is shown to be invariant to weight rescaling. Overall a principled approach with good empirical results.
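The rescaling invariance behind Path-SGD is easy to see on a tiny ReLU network. The sketch below is only an illustration under stated assumptions, not the algorithm from the paper: the two-layer network, its weights, and the helper names are made up, and the "path norm" here is taken to be the sum over input-to-output paths of the product of squared weights. Multiplying a hidden unit's incoming weights by c and dividing its outgoing weights by c leaves the network's output and this path norm unchanged, while the ordinary L2 norm of the weights changes a lot.

```python
def relu(v):
    return max(0.0, v)


def forward(w1, w2, x):
    """Two-layer ReLU network: w1 is hidden-by-input, w2 is output-by-hidden."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def l2_norm_squared(w1, w2):
    return sum(w * w for layer in (w1, w2) for row in layer for w in row)


def path_norm_squared(w1, w2):
    """Sum over input->hidden->output paths of the product of squared weights."""
    return sum(
        w2[k][j] ** 2 * w1[j][i] ** 2
        for k in range(len(w2))
        for j in range(len(w1))
        for i in range(len(w1[0]))
    )


def rescale_hidden_unit(w1, w2, unit, c):
    """Scale a unit's incoming weights by c > 0 and its outgoing weights by 1/c."""
    new_w1 = [[w * c for w in row] if j == unit else list(row) for j, row in enumerate(w1)]
    new_w2 = [[w / c if j == unit else w for j, w in enumerate(row)] for row in w2]
    return new_w1, new_w2


# Hypothetical weights and input.
w1 = [[1.0, -2.0], [0.5, 3.0]]
w2 = [[2.0, -1.0]]
x = [1.5, 0.25]

r1, r2 = rescale_hidden_unit(w1, w2, unit=0, c=4.0)

print(forward(w1, w2, x), forward(r1, r2, x))                 # same function
print(path_norm_squared(w1, w2), path_norm_squared(r1, r2))   # same path norm
print(l2_norm_squared(w1, w2), l2_norm_squared(r1, r2))       # different L2 norm
```

Since the two weight settings compute the same function, an optimizer whose geometry depends on the L2 norm treats them differently for no functional reason, which is the motivation for measuring distance with a rescaling-invariant quantity instead.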

End-to-end memory networks by Sukhbaatar et al (video), an extension of memory networks – neural networks that learn to read and write to a memory component. Unlike traditional memory networks, the end-to-end version eliminates the need for supervision at each layer. This makes the method applicable to a wider variety of domains – it is competitive both with memory networks for question answering and with LSTMs for language modeling. It was fun to see the model perform basic inductive reasoning about locations, colors and sizes of objects.

Neural GPUs (video), Deep visual analogy-making (video), On-the-job learning, and many others.

Algorithms Among Us symposium (videos)

A highlight of the conference was the Algorithms Among Us symposium on the societal impacts of machine learning, which I helped organize along with others from FLI. The symposium consisted of 3 panels and accompanying talks – on near-term AI impacts, timelines to general AI, and research priorities for beneficial AI. The symposium organizers (Adrian Weller, Michael Osborne and Murray Shanahan) gathered an impressive array of AI luminaries with a variety of views on the subject, including Cynthia Dwork from Microsoft, Yann LeCun from Facebook, Andrew Ng from Baidu, and Shane Legg from DeepMind. All three panel topics generated lively debate among the participants.

Andrew Ng took his famous statement that “worrying about general AI is like worrying about overpopulation on Mars” to the next level, namely “overpopulation on Alpha Centauri” (is Mars too realistic these days?). His main argument was that even superforecasters can’t predict anything 5 years into the future, so any predictions on longer time horizons are useless. This seemed like an instance of the all-too-common belief that “we don’t know, therefore we are safe”. As Murray pointed out, having complete uncertainty past a 5-year horizon means that you can’t rule out reaching general AI in 20 years either. Encouragingly, Ng endorsed long-term AI safety research, saying that it’s not his cup of tea but someone should be working on it.

With regard to roadmapping the remaining milestones to general AI, Yann LeCun gave an apt analogy of traveling through mountains in the fog – there are some you can see, and an unknown number hiding in the fog. He also argued that advanced AI is unlikely to be human-like, and cautioned against anthropomorphizing it.

In the research priorities panel, Shane Legg gave some specific recommendations – goal system stability, interruptibility, sandboxing / containment, and formalization of various thought experiments (e.g. in Superintelligence). He pointed out that AI safety is both overblown and underemphasized – while the risks from advanced AI are not imminent the way they are usually portrayed in the media, more thought and resources need to be devoted to the challenging research problems involved.

One question that came up during the symposium is the importance of interpretability for AI systems, which is actually the topic of my current research project. There was some disagreement about the tradeoff between effectiveness and interpretability. LeCun thought that the main advantage of interpretability is increased robustness, and improvements to transfer learning should produce that anyway, without decreases in effectiveness. Percy Liang argued that transparency is needed to explain to the rest of the world what machine learning systems are doing, which is increasingly important in many applications. LeCun also pointed out that machine learning systems that are usually considered transparent, such as decision trees, aren’t necessarily so. There was also disagreement about what interpretability means in the first place – as Cynthia Dwork said, we need a clearer definition before making any conclusions. It seems that more work is needed both on defining interpretability and on figuring out how to achieve it without sacrificing effectiveness.

Overall, the symposium was super interesting and gave a lot of food for thought (here’s a more detailed summary by Ariel from FLI). Thanks to Adrian, Michael and Murray for their hard work in putting it together.

AI startups

It was exciting to see two new AI startups announced at NIPS – OpenAI, led by Ilya Sutskever and backed by Musk, Altman and others, and Geometric Intelligence, led by Zoubin Ghahramani and Gary Marcus.

OpenAI is a non-profit with a mission to democratize AI research and keep it beneficial for humanity, and a whopping $1Bn in funding pledged. They believe that it’s safer to have AI breakthroughs happening in a non-profit, unaffected by financial interests, rather than monopolized by for-profit corporations. The intent to open-source the research seems clearly good in the short and medium term, but raises some concerns in the long run when getting closer to general AI. As an OpenAI researcher emphasized in an interview, “we are not obligated to share everything – in that sense the name of the company is a misnomer”, and decisions to open-source the research would in fact be made on a case-by-case basis.

While OpenAI plans to focus on deep learning in their first few years, Geometric Intelligence is developing an alternative approach to deep learning that can learn more effectively from less data. Gary Marcus argues that we need to learn more from how human minds acquire knowledge in order to build advanced AI (an inspiration for the venture was observing his toddler learn about the world). I’m looking forward to what comes out of the variety of approaches taken by these new companies and other research teams.

(Cross-posted on the FLI blog. Thanks to Janos Kramar for his help with editing this post.)

Future of Life Institute’s recent milestones in AI safety

In January, many months of effort by FLI’s founders and volunteers finally came to fruition – the Puerto Rico conference, open letter and grants program announcement took place in rapid succession. The conference was a resounding success according to many of the people there, who were impressed with the quality of the ideas presented and the way it was organized. There were opportunities for the attendees to engage with each other at different levels of structure, from talks to panels to beach breakout groups. The relaxed Caribbean atmosphere seemed to put everyone at ease, and many candid and cooperative conversations happened between attendees with rather different backgrounds and views.

It was fascinating to observe many of the AI researchers get exposed to various AI safety ideas for the first time. Stuart Russell’s argument that the variables that are not accounted for by the objective function tend to be pushed to extreme values, Nick Bostrom’s presentation on takeoff speeds and singleton/multipolar scenarios, and other key ideas were received quite well. One attending researcher summed it up along these lines: “It is so easy to obsess about the next building block towards general AI, that we often forget to ask ourselves a key question – what happens when we succeed?”.

A week after the conference, the open letter outlining the research priorities went public. The letter and research document were the product of many months of hard work and careful thought by Daniel Dewey, Max Tegmark, Stuart Russell, and others. It was worded in optimistic and positive terms – the most negative word in the whole thing was “pitfalls”. Nevertheless, the media’s sensationalist lens twisted the message into things like “experts pledge to rein in AI research” to “warn of a robot uprising” and “protect mankind from machines”, invariably accompanied by a Terminator image or a Skynet reference. When the grants program was announced soon afterwards, the headlines became “Elon Musk donates…” to “keep killer robots at bay”, “keep AI from turning evil”, you name it. Those media portrayals shared a key misconception of the underlying concerns, that AI has to be “malevolent” to be dangerous, while the most likely problematic scenario in our minds is a misspecified general AI system with beneficial or neutral objectives. While a few reasonable journalists actually bothered to get in touch with FLI and discuss the ideas behind our efforts, most of the media herd stampeded ahead under the alarmist Terminator banner.

The open letter expresses a joint effort by the AI research community to step up to the challenge of advancing AI safety as responsible scientists. My main worry about this publicity angle is that this might be the first major exposure to AI safety concerns for many people, including AI researchers who would understandably feel attacked and misunderstood by the media’s framing of their work. It is really unfortunate to have some researchers turned away from the cause of keeping AI beneficial and safe without even engaging with the actual concerns and arguments.

I am sometimes asked by reporters whether too much emphasis on superintelligence concerns is “distracting” from the more immediate AI impacts like the automation of jobs and autonomous weapons. While the media hype is certainly not helpful towards making progress on either the near-term or long-term concerns, there is a pervasive false dichotomy here, as both of these domains are in dire need of more extensive research. The near-term economic and legal issues are already on the horizon, while the design and forecasting of general AI is a complex interdisciplinary research challenge that will likely take decades, so it is of utmost importance to begin the work as soon as possible.

The grants program on AI safety, fueled by Elon Musk’s generous donation, is now well under way, with the initial proposals due March 1. The authors of the best initial proposals will be invited to submit a more detailed full proposal by May 17. I hope that our program will help kickstart the emerging subfield of AI safety, stimulate open discussion of the ideas among the AI experts, and broaden the community of researchers working on these important and difficult questions. Stuart Russell put it well in his talk at the Puerto Rico conference: “Solving this problem should be an intrinsic part of the field, just as containment is a part of fusion research. It isn’t ‘Ethics of AI’, it’s common sense!”.