New AI safety research agenda from Google Brain

Google Brain just released an inspiring research agenda, Concrete Problems in AI Safety, co-authored by researchers from OpenAI, Berkeley and Stanford. This document is a milestone in setting concrete research objectives for keeping reinforcement learning agents and other AI systems robust and beneficial. The problems studied are relevant to both near-term and long-term AI safety, from cleaning robots to higher-stakes applications. The paper takes an empirical perspective on avoiding accidents as modern machine learning systems become increasingly autonomous and powerful.

Reinforcement learning is currently the most promising framework for building artificial agents – it is thus especially important to develop safety guidelines for this subfield of AI. The research agenda describes a comprehensive (though likely non-exhaustive) set of safety problems, corresponding to where things can go wrong when building AI systems:

  • Mis-specification of the objective function by the human designer. Two common pitfalls when designing objective functions are negative side-effects and reward hacking (also known as wireheading), which are likely to happen by default unless we figure out how to guard against them. One of the key challenges is specifying what it means for an agent to have a low impact on the environment while achieving its objectives effectively.

  • Extrapolation from limited information about the objective function. Even with a correct objective function, human supervision is likely to be costly, which calls for scalable oversight of the artificial agent.

  • Extrapolation from limited training data or using an inadequate model. We need to develop safe exploration strategies that avoid irreversibly bad outcomes, and build models that are robust to distributional shift – able to fail gracefully in situations that are far outside the training data distribution.
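As a toy illustration of the last point (my own sketch, not from the paper), one simple way for a model to "fail gracefully" under distributional shift is to abstain when a test input is far from the training data. The distance measure and threshold below are arbitrary choices, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: inputs the model has actually seen.
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

def distance_to_training(x, X_train, k=10):
    """Mean Euclidean distance from x to its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return np.sort(dists)[:k].mean()

# Calibrate an abstention threshold on the training data itself.
train_scores = np.array([distance_to_training(x, X_train) for x in X_train])
threshold = np.quantile(train_scores, 0.99)

def predict_or_abstain(x, model=lambda x: int(x.sum() > 0)):
    """Only predict when x looks like the training distribution."""
    if distance_to_training(x, X_train) > threshold:
        return None  # abstain / defer to a human instead of guessing
    return model(x)

print(predict_or_abstain(np.array([0.1, -0.2])))   # in-distribution -> prediction
print(predict_or_abstain(np.array([15.0, 20.0])))  # far out-of-distribution -> None
```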

The AI research community has been paying increasing attention to AI safety in recent years, and Google Brain’s agenda is part of this trend. It follows on the heels of the Safely Interruptible Agents paper from Google DeepMind and the Future of Humanity Institute, which investigates how to avoid unintended consequences from interrupting or shutting down reinforcement learning agents. We at FLI are super excited that industry research labs at Google and OpenAI are spearheading and fostering collaboration on AI safety research, and look forward to the outcomes of this work.

(Cross-posted from the FLI blog.)

Using humility to counteract shame

“Pride is not the opposite of shame, but its source. True humility is the only antidote to shame.”

Uncle Iroh, “Avatar: The Last Airbender”

 

Shame is one of the trickiest emotions to deal with. It is difficult to think about, not to mention discuss with others, and gives rise to insidious ugh fields and negative spirals. Shame often underlies other negative emotions without making itself apparent – anxiety or anger at yourself can be caused by unacknowledged shame about the possibility of failure. It can stack on top of other emotions – e.g. you start out feeling upset with someone, and end up being ashamed of yourself for feeling upset, and maybe even ashamed of feeling ashamed if meta-shame is your cup of tea. The most useful approach I have found against shame is invoking humility.

What is humility, anyway? It is often defined as a low view of your own importance, and tends to be conflated with modesty. Another common definition that I find more useful is acceptance of your own flaws and shortcomings. This is more compatible with confidence, and helpful irrespective of your level of importance or comparison to other people. What humility feels like to me on a system 1 level is a sense of compassion and warmth towards yourself while fully aware of your imperfections (while focusing on imperfections without compassion can lead to beating yourself up). According to LessWrong, “to be humble is to take specific actions in anticipation of your own errors”, which seems more like a possible consequence of being humble than a definition.

Humility is a powerful tool for psychological well-being and instrumental rationality that is more broadly applicable than just the ability to anticipate errors by seeing your limitations more clearly. I can summon humility when I feel anxious about too many upcoming deadlines, or angry at myself for being stuck on a rock climbing route, or embarrassed about forgetting some basic fact in my field that I am surely expected to know by the 5th year of grad school.

While humility comes naturally to some people, others might find it useful to explicitly build an identity as a humble person. How can you invoke this mindset? One way is through negative visualization or pre-hindsight, considering how your plans could fail, which can be time-consuming and usually requires system 2. A faster and less effortful way is to imagine a person, real or fictional, whom you consider humble. I often bring to mind my grandfather, or Uncle Iroh from the Avatar series, sometimes literally repeating the above quote in my head, sort of like an affirmation. I don’t actually agree that humility is the only antidote to shame, but it does seem to be one of the most effective.

(Cross-posted to LessWrong. Thanks to Janos Kramar for his feedback on this post.)

Introductory resources on AI safety research

At a recent AI safety meetup, people asked for a reading list to get up to speed on the main ideas in the field. The resources are selected for relevance and/or brevity, and the list is not meant to be comprehensive.

Motivation

For a popular audience:

FLI: AI risk background and FAQ. At the bottom of the background page, there is a more extensive list of resources on AI safety.

Tim Urban, Wait But Why: The AI Revolution. An accessible introduction to AI risk forecasts and arguments (with cute hand-drawn diagrams, and a few corrections from Luke Muehlhauser).

GiveWell: Potential risks from advanced artificial intelligence. An overview of AI risks and timelines, possible interventions, and current actors in this space.

Stuart Armstrong. Smarter Than Us: The Rise Of Machine Intelligence. A short ebook discussing potential promises and challenges presented by advanced AI, and the interdisciplinary problems that need to be solved on the way there.

For a more technical audience:

Stuart Russell:

  • The long-term future of AI (longer version). A video of Russell’s classic talk, discussing why it makes sense for AI researchers to think about AI safety, and going over various misconceptions about the issues.
  • Concerns of an AI pioneer. An interview with Russell on the importance of provably aligning AI with human values, and the challenges of value alignment research.
  • On Myths and Moonshine. Russell’s response to the “Myth of AI” question on Edge.org, which draws an analogy between AI research and nuclear research, and points out some dangers of optimizing a misspecified utility function.

Scott Alexander: No time like the present for AI safety work. An overview of long-term AI safety challenges, e.g. preventing wireheading and formalizing ethics.

Victoria Krakovna: AI risk without an intelligence explosion. An overview of long-term AI risks besides the (overemphasized) intelligence explosion / hard takeoff scenario, arguing why intelligence explosion skeptics should still think about AI safety.

Technical overviews

Amodei, Olah et al: Concrete Problems in AI Safety

Taylor et al (MIRI): Alignment for Advanced Machine Learning Systems

FLI: A survey of research priorities for robust and beneficial AI

MIRI: Aligning Superintelligence with Human Interests: A Technical Research Agenda

Jacob Steinhardt: Long-Term and Short-Term Challenges to Ensuring the Safety of AI Systems. A taxonomy of AI safety issues that require ordinary vs extraordinary engineering to address.

Nate Soares: Safety engineering, target selection, and alignment theory. Identifies and motivates three major areas of AI safety research.

Nick Bostrom: Superintelligence: Paths, Dangers, Strategies. A seminal book outlining long-term AI risk considerations.

Technical work

Steve Omohundro: The basic AI drives. Argues that sufficiently advanced AI systems are likely to develop drives such as self-preservation and resource acquisition independently of their assigned objectives.

Paul Christiano: AI control. A blog on designing safe, efficient AI systems (approval-directed agents, aligned reinforcement learning agents, etc).

MIRI: Corrigibility. Designing AI systems without incentives to resist corrective modifications by their creators.

Laurent Orseau: Wireheading. An investigation into how different types of artificial agents respond to wireheading opportunities (unintended shortcuts to maximize their objective function).

Collections of papers

MIRI publications

FHI publications

If there are any resources missing from this list that you think are a must-read, please let me know! If you want to go into AI safety research, check out these guidelines and the AI Safety Syllabus.

(Thanks to Ben Sancetta, Taymon Beal and Janos Kramar for their feedback on this post.)

To contribute to AI safety, consider doing AI research

Among those concerned about risks from advanced AI, I’ve encountered people who would be interested in a career in AI research, but are worried that doing so would speed up AI capability relative to safety. I think it is a mistake for AI safety proponents to avoid going into the field for this reason (a better reason to stay out would be already being well-positioned to do AI safety work elsewhere, e.g. at MIRI or FHI). This mistake contributed to me choosing statistics rather than computer science for my PhD, which I have some regrets about, though luckily there is enough overlap between the two fields that I can work on machine learning anyway.

I think the value of having more AI experts who are worried about AI safety is far higher than the downside of adding a few drops to the ocean of people trying to advance AI. Here are several reasons for this:

  1. Concerned researchers can inform and influence their colleagues, especially if they are outspoken about their views.
  2. Studying and working on AI brings understanding of the current challenges and breakthroughs in the field, which can usefully inform AI safety work (e.g. wireheading in reinforcement learning agents).
  3. Opportunities to work on AI safety are beginning to spring up within academia and industry, e.g. through FLI grants. In the next few years, it will be possible to do an AI-safety-focused PhD or postdoc in computer science, which would hit two birds with one stone.

To elaborate on #1, one of the prevailing arguments against taking long-term AI safety seriously is that not enough experts in the AI field are worried. Several prominent researchers have commented on the potential risks (Stuart Russell, Bart Selman, Murray Shanahan, Shane Legg, and others), and more are concerned but keep quiet for reputational reasons. An accomplished, strategically outspoken and/or well-connected expert can make a big difference in the attitude distribution in the AI field and the level of familiarity with the actual concerns (which are not about malevolence, sentience, or marching robot armies). Having more informed skeptics who have maybe even read Superintelligence, and fewer uninformed skeptics who think AI safety proponents are afraid of Terminators, would produce much needed direct and productive discussion on these issues. As the proportion of informed and concerned researchers in the field approaches critical mass, the reputational consequences for speaking up will decrease.

A year after FLI’s Puerto Rico conference, the subject of long-term AI safety is no longer taboo among AI researchers, but remains rather controversial. Addressing AI risk in the long term will require safety work to be a significant part of the field, and close collaboration between those working on the safety and the capabilities of advanced AI. Stuart Russell makes the apt analogy that “just as nuclear fusion researchers consider the problem of containment of fusion reactions as one of the primary problems of their field, issues of control and safety will become central to AI as the field matures”. If more people who are already concerned about AI safety join the field, we can make this happen faster, and help wisdom win the race with capability.

(Cross-posted to LessWrong. Thanks to Janos Kramar for his help with editing this post.)

2015-16 New Year review

2015 progress

Research:

  • Finished paper on the Selective Bayesian Forest Classifier algorithm
  • Made an R package for SBFC (beta)
  • Worked at Google on unsupervised learning for the Knowledge Graph with Moshe Looks during the summer (paper)
  • Joined the HIPS research group at Harvard CS and started working with the awesome Finale Doshi-Velez
  • Ratio of coding time to writing time was too high overall

FLI:

  • Co-organized two meetings to brainstorm biotechnology risks
  • Co-organized two Machine Learning Safety meetings
  • Gave a talk at the Shaping Humanity’s Trajectory workshop at EA Global
  • Helped organize NIPS symposium on societal impacts of AI

Rationality / effectiveness:

  • Extensive use of FollowUpThen for sending reminders to future selves
  • Mapped out my personal bottlenecks
  • Sleep:
    • Tracked insomnia (26% of nights) and sleep time (average 1:30am, stayed up past 1am on 31% of nights)
    • Started working on sleep hygiene
    • Stopped using melatonin (found it ineffective)

Random cool things I did:

  • Improv class
  • Aerial silks class
  • Climbed out of a glacial abyss (moulin)
  • Placed second at Toastmasters area speech contest

2015 prediction outcomes

Out of the 17 predictions I made a year ago, 5 were true, and the rest were false.

  1. Submit the SBFC paper for publication (95%)
  2. Submit another paper besides SBFC (40%)
  3. Present SBFC results at a conference (JSM, ICML or NIPS) (40%) – presented at a workshop (NESS)
  4. Get a new external fellowship to replace my expiring NSERC fellowship (50%)
  5. Skim at least 20 research papers in machine learning (70%) – probably a lot more
  6. Write at least 12 blog posts (70%) – wrote 9 posts
  7. Climb a 5.12 without rope cheating (50%) – no longer endorsed at this level
  8. Lead climb a 5.11a (50%) – no longer endorsed at this level
  9. Do 10 pullups in a row (60%) – no longer endorsed at this level
  10. Meditate at least 150 times (80%) – 206 times
  11. Record at least 150 new thoughts (70%) – recorded 62, no longer endorsed at this level
  12. Make at least 100 Anki cards by the end of the year (70%)
  13. Read at least 10 books (60%) – read 4 books, no longer endorsed at this level
  14. Attend Burning Man (90%)
  15. Boston will have a second rationalist house by the end of the year (30%)
  16. FLI will hire a full-time project manager or administrator (80%) – no, but we now have a full time website editor…
  17. FLI will start a project on biotech safety (70%) – had some meetings, but no concrete action plan yet

Calibration:

  • low predictions, 30-60%: 0/8 = 0% (super overconfident)
  • high predictions, 70-95%: 5/9 = 56% (overconfident)

(Yikes! Worse than last year…)

Conclusions:

  • I forgot about most of these goals after a few months – will need a recurring reminder for next year.
  • All 3 physical goals ended up disendorsed – I think I set those way too high. My climbing habits got disrupted by moving to California in summer and a hand injury, so I’m still trying to return to my spring 2014 skill level.

2016 goals and predictions

Given the overconfidence of last year’s predictions, I’m toning it down this year.

Resolutions:

  1. Finish PhD thesis (70%)
  2. Write at least 12 blog posts (40%)
  3. Meditate at least 200 days (50%)
  4. Exercise at least 200 days (50%)
  5. Do at least 5 pullups in a row (40%)
  6. Record at least 50 new thoughts (50%)
  7. Stay up at most 20% of the nights (40%)
  8. Do at least 10 pomodoros per week on average (50%)

Predictions:

  1. At least one paper accepted for publication (70%)
  2. I will get at least one fellowship (40%)
  3. Insomnia at most 20% of nights (20%)
  4. FLI will co-organize at least 3 AI safety workshops (50%)

Highlights and impressions from NIPS conference on machine learning

This year’s NIPS was an epicenter of the current enthusiasm about AI and deep learning – there was a visceral sense of how quickly the field of machine learning is progressing, and two new AI startups were announced. Attendance almost doubled compared to the 2014 conference (I hope they make it multi-track next year), and several popular workshops were standing room only. Given that there were only 400 accepted papers and almost 4000 people attending, most people were there to learn and socialize. The conference was a socially intense experience that reminded me a bit of Burning Man – the overall sense of excitement, the high density of spontaneous interesting conversations, the number of parallel events at any given time, and of course the accumulating exhaustion.

Some interesting talks and posters

Sergey Levine’s robotics demo at the crowded Deep Reinforcement Learning workshop (we showed up half an hour early to claim spots on the floor). This was one of the talks that gave me a sense of fast progress in the field. The presentation started with videos from this summer’s DARPA robotics challenge, where the robots kept falling down while trying to walk or open a door. Levine proceeded to outline his recent work on guided policy search, alternating between trajectory optimization and supervised training of the neural network, and granularizing complex tasks. He showed demos of robots successfully performing various high-dexterity tasks, like opening a door, screwing on a bottle cap, or putting a coat hanger on a rack. Impressive!

Generative image models using a pyramid of adversarial networks by Denton & Chintala. Generating realistic-looking images using one neural net as a generator and another as an evaluator – the generator tries to fool the evaluator by making the image indistinguishable from a real one, while the evaluator tries to tell real and generated images apart. Starting from a coarse image, successively finer images are generated by adversarial networks conditioned on the coarser image from the previous level of the pyramid. The resulting images were mistaken for real images 40% of the time in the experiment, and around 80% of them looked realistic to me when staring at the poster.
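For readers unfamiliar with the adversarial setup, here is a minimal sketch of the generator-vs-evaluator training loop on toy 2D data – the plain GAN objective rather than the Laplacian pyramid model from the paper, written assuming PyTorch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Real" data: samples from a 2D Gaussian the generator should imitate.
def real_batch(n=64):
    return torch.randn(n, 2) * 0.5 + torch.tensor([2.0, -1.0])

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))                # generator
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # evaluator

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = real_batch()
    fake = G(torch.randn(64, 8))

    # Evaluator: tell real (label 1) and generated (label 0) samples apart.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the evaluator into labeling its samples as real.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(5, 8)).detach())  # samples should drift towards the real distribution
```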

Path-SGD by Salakhutdinov et al, a scale-invariant version of the stochastic gradient descent algorithm. Standard SGD uses the L2 norm as the measure of distance in parameter space, and rescaling the weights can have large effects on optimization speed. Path-SGD instead regularizes the maximum norm of the incoming weights into any unit, minimized over all rescalings of the weights. The resulting norm (called a “path regularizer”) is shown to be invariant to weight rescaling. Overall a principled approach with good empirical results.
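The scale-invariance issue is easy to see in code: multiplying one layer of a ReLU network by a constant and dividing the next layer by the same constant leaves the function unchanged but changes the L2 norm, while the sum over input-output paths of products of squared weights (the quantity underlying the path regularizer) stays fixed. A small numpy check of this property (my own illustration, not the paper’s code):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(20, 5))   # input -> hidden weights
W2 = rng.normal(size=(3, 20))   # hidden -> output weights
x = rng.normal(size=5)

def net(W1, W2, x):
    return W2 @ np.maximum(W1 @ x, 0.0)  # two-layer ReLU network

def l2_norm(W1, W2):
    return np.sum(W1**2) + np.sum(W2**2)

def path_norm(W1, W2):
    # Sum over all input->hidden->output paths of the product of squared weights.
    return np.sum((W2**2) @ (W1**2))

c = 10.0
W1s, W2s = c * W1, W2 / c  # rescaled network

print(np.allclose(net(W1, W2, x), net(W1s, W2s, x)))  # True: same function
print(l2_norm(W1, W2), l2_norm(W1s, W2s))             # very different L2 norms
print(path_norm(W1, W2), path_norm(W1s, W2s))         # equal: invariant to rescaling
```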

End-to-end memory networks by Sukhbaatar et al (video), an extension of memory networks – neural networks that learn to read and write to a memory component. Unlike traditional memory networks, the end-to-end version eliminates the need for supervision at each layer. This makes the method applicable to a wider variety of domains – it is competitive both with memory networks for question answering and with LSTMs for language modeling. It was fun to see the model perform basic inductive reasoning about locations, colors and sizes of objects.
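The core read operation is soft attention over memory: the query is matched against each memory embedding, a softmax gives attention weights, and a weighted sum of separate output embeddings is added back to the query. Here is a single-hop numpy sketch with made-up dimensions (the actual model stacks several such hops and learns the embedding matrices end to end):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_mem = 16, 10                          # embedding size, number of memory slots

memory_in = rng.normal(size=(n_mem, d))    # embeddings used for addressing
memory_out = rng.normal(size=(n_mem, d))   # embeddings used for the read-out
query = rng.normal(size=d)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Single memory hop: attend over memories, then read out a weighted sum.
scores = memory_in @ query      # match query against each memory slot
weights = softmax(scores)       # soft attention (no per-layer supervision needed)
read = weights @ memory_out     # weighted combination of output embeddings
next_query = read + query       # fed to the next hop or the final answer layer

print(weights.round(2))
print(next_query.shape)
```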

Neural GPUs (video), Deep visual analogy-making (video), On-the-job learning, and many others.

Algorithms Among Us symposium (videos)

A highlight of the conference was the Algorithms Among Us symposium on the societal impacts of machine learning, which I helped organize along with others from FLI. The symposium consisted of 3 panels and accompanying talks – on near-term AI impacts, timelines to general AI, and research priorities for beneficial AI. The symposium organizers (Adrian Weller, Michael Osborne and Murray Shanahan) gathered an impressive array of AI luminaries with a variety of views on the subject, including Cynthia Dwork from Microsoft, Yann LeCun from Facebook, Andrew Ng from Baidu, and Shane Legg from DeepMind. All three panel topics generated lively debate among the participants.

Andrew Ng took his famous statement that “worrying about general AI is like worrying about overpopulation on Mars” to the next level, namely “overpopulation on Alpha Centauri” (is Mars too realistic these days?). His main argument was that even superforecasters can’t predict anything 5 years into the future, so any predictions on longer time horizons are useless. This seemed like an instance of the all-too-common belief that “we don’t know, therefore we are safe”. As Murray pointed out, having complete uncertainty past a 5-year horizon means that you can’t rule out reaching general AI in 20 years either. Encouragingly, Ng endorsed long-term AI safety research, saying that it’s not his cup of tea but someone should be working on it.

With regards to roadmapping the remaining milestones to general AI, Yann LeCun gave an apt analogy of traveling through mountains in the fog – there are some you can see, and an unknown number hiding in the fog. He also argued that advanced AI is unlikely to be human-like, and cautioned against anthropomorphizing it.

In the research priorities panel, Shane Legg gave some specific recommendations – goal system stability, interruptibility, sandboxing / containment, and formalization of various thought experiments (e.g. in Superintelligence). He pointed out that AI safety is both overblown and underemphasized – while the risks from advanced AI are not imminent the way they are usually portrayed in the media, more thought and resources need to be devoted to the challenging research problems involved.

One question that came up during the symposium was the importance of interpretability for AI systems, which happens to be the topic of my current research project. There was some disagreement about the tradeoff between effectiveness and interpretability. LeCun thought that the main advantage of interpretability is increased robustness, and that improvements to transfer learning should produce that anyway, without decreases in effectiveness. Percy Liang argued that transparency is needed to explain to the rest of the world what machine learning systems are doing, which is increasingly important in many applications. LeCun also pointed out that machine learning systems that are usually considered transparent, such as decision trees, aren’t necessarily so. There was also disagreement about what interpretability means in the first place – as Cynthia Dwork said, we need a clearer definition before drawing any conclusions. It seems that more work is needed both on defining interpretability and on figuring out how to achieve it without sacrificing effectiveness.

Overall, the symposium was super interesting and gave a lot of food for thought (here’s a more detailed summary by Ariel from FLI). Thanks to Adrian, Michael and Murray for their hard work in putting it together.

AI startups

It was exciting to see two new AI startups announced at NIPS – OpenAI, led by Ilya Sutskever and backed by Musk, Altman and others, and Geometric Intelligence, led by Zoubin Ghahramani and Gary Marcus.

OpenAI is a non-profit with a mission to democratize AI research and keep it beneficial for humanity, and a whopping $1Bn in funding pledged. They believe that it’s safer to have AI breakthroughs happening in a non-profit, unaffected by financial interests, rather than monopolized by for-profit corporations. The intent to open-source the research seems clearly good in the short and medium term, but raises some concerns in the long run when getting closer to general AI. As an OpenAI researcher emphasized in an interview, “we are not obligated to share everything – in that sense the name of the company is a misnomer”, and decisions to open-source the research would in fact be made on a case-by-case basis.

While OpenAI plans to focus on deep learning in their first few years, Geometric Intelligence is developing an alternative approach to deep learning that can learn more effectively from less data. Gary Marcus argues that we need to learn more from how human minds acquire knowledge in order to build advanced AI (an inspiration for the venture was observing his toddler learn about the world). I’m looking forward to what comes out of the variety of approaches taken by these new companies and other research teams.

(Cross-posted on the FLI blog. Thanks to Janos Kramar for his help with editing this post.)

Risks from general artificial intelligence without an intelligence explosion

“An ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind.”

– Computer scientist I. J. Good, 1965

Artificial intelligence systems we have today can be referred to as narrow AI – they perform well at specific tasks, like playing chess or Jeopardy, and on some classes of problems like Atari games. Many experts predict that general AI, which would be able to perform most tasks humans can, will be developed later this century, with median estimates around 2050. When people talk about long-term existential risk from the development of general AI, they commonly refer to the intelligence explosion (IE) scenario. AI risk skeptics often argue against AI safety concerns along the lines of “intelligence explosion sounds like science fiction and seems really unlikely, therefore there’s not much to worry about”. It’s unfortunate when AI safety concerns are rounded down to worries about IE. Unlike I. J. Good, I do not consider this scenario inevitable (though I find it relatively likely), and I would expect general AI to present an existential risk even if I knew for sure that an intelligence explosion were impossible.

Here are some dangerous aspects of developing general AI, besides the IE scenario:

  1. Human incentives. Researchers, companies and governments have professional and economic incentives to build AI that is as powerful as possible, as quickly as possible. There is no particular reason to think that humans are the pinnacle of intelligence – if we create a system without our biological constraints, with more computing power, memory, and speed, it could become more intelligent than us in important ways. The incentives are to continue improving AI systems until they hit physical limits on intelligence, and those limitations (if they exist at all) are likely to be above human intelligence in many respects.
  2. Convergent instrumental goals. Sufficiently advanced AI systems would by default develop drives like self-preservation, resource acquisition, and preservation of their objective functions, independent of their objective function or design. This was outlined in Omohundro’s paper and more concretely formalized in a recent MIRI paper. Humans routinely destroy animal habitats to acquire natural resources, and an AI system with any goal could always use more data centers or computing clusters.
  3. Unintended consequences. As in the stories of the Sorcerer’s Apprentice and King Midas, you get what you asked for, but not what you wanted. This already happens with narrow AI, as in the frequently cited example from the Bird & Layzell paper: a genetic algorithm was supposed to design an oscillator using a configurable circuit board, and instead designed a makeshift radio that picked up signals from neighboring computers to produce the requisite oscillating pattern. Unintended consequences produced by a general AI, more opaque and more powerful than a narrow AI, would likely be far worse.
  4. Value learning is hard. Specifying common sense and ethics in computer code is no easy feat. As argued by Stuart Russell, given a misspecified value function that omits variables that turn out to be important to humans, an optimization process is likely to set these unconstrained variables to extreme values. Think of what would happen if you asked a self-driving car to get you to the airport as fast as possible, without assigning value to obeying speed limits or avoiding pedestrians (see the toy sketch after this list). While researchers would have incentives to build in some level of common sense and understanding of human concepts that is needed for commercial applications like household robots, that might not be enough for general AI.
  5. Value learning is insufficient. Even an AI system with perfect understanding of human values and goals would not necessarily adopt them. Humans understand the “goals” of the evolutionary process that generated us, but don’t internalize them – in fact, we often “wirehead” our evolutionary reward signals, e.g. by eating sugar.
  6. Containment is hard. A general AI system with access to the internet would be able to hack thousands of computers and copy itself onto them, thus becoming difficult or impossible to shut down – this is a serious problem even with present-day computer viruses. When developing an AI system in the vicinity of general intelligence, it would be important to keep it cut off from the internet. Large scale AI systems are likely to be run on a computing cluster or on the cloud, rather than on a single machine, which makes isolation from the internet more difficult. Containment measures would likely pose sufficient inconvenience that many researchers would be tempted to skip them.

Some believe that if intelligence explosion does not occur, AI progress will occur slowly enough that humans can stay in control. Given that human institutions like academia or governments are fairly slow to respond to change, they may not be able to keep up with an AI that attains human-level or superhuman intelligence over months or even years. Humans are not famous for their ability to solve coordination problems. Even if we retain control over AI’s rate of improvement, it would be easy for bad actors or zealous researchers to let it go too far – as Geoff Hinton recently put it, “the prospect of discovery is too sweet”.

As a machine learning researcher, I care about whether my field will have a positive impact on humanity in the long term. The challenges of AI safety are numerous and complex (for a more technical and thorough exposition, see Jacob Steinhardt’s essay), and cannot be rounded off to a single scenario. I look forward to a time when disagreements about AI safety no longer derail into debates about IE, and instead focus on other relevant issues we need to figure out.

(Cross-posted on the FLI blog. Thanks to Janos Kramar for his help with editing this post.)