At this year’s EA Global London conference, Jan Leike and I ran a discussion session on the machine learning approach to AI safety. We explored some of the assumptions and considerations that come up as we reflect on different research agendas. Slides for the discussion can be found here.
The discussion focused on two topics. The first topic examined assumptions made by the ML safety approach as a whole, based on the blog post Conceptual issues in AI safety: the paradigmatic gap. The second topic zoomed into specification problems, which both of us work on, and compared our approaches to these problems.
Something I often hear in the machine learning community and media articles is “Worries about superintelligence are a distraction from the *real* problem X that we are facing today with AI” (where X = algorithmic bias, technological unemployment, interpretability, data privacy, etc). This competitive attitude gives the impression that immediate and longer-term safety concerns are in conflict. But is there actually a tradeoff between them?
We can make this question more specific: what resources might these two types of efforts be competing for?
Long-term AI safety is an inherently speculative research area, aiming to ensure safety of advanced future systems despite uncertainty about their design or algorithms or objectives. It thus seems particularly important to have different research teams tackle the problems from different perspectives and under different assumptions. While some fraction of the research might not end up being useful, a portfolio approach makes it more likely that at least some of us will be right.
In this post, I look at some dimensions along which assumptions differ, and identify some underexplored reasonable assumptions that might be relevant for prioritizing safety research. (In the interest of making this breakdown as comprehensive and useful as possible, please let me know if I got something wrong or missed anything important.)
There has been a lot of discussion about the appropriate level of openness in AI research in the past year – the OpenAI announcement, the blog post Should AI Be Open?, a response to the latter, and Nick Bostrom’s thorough paper Strategic Implications of Openness in AI development.
There is disagreement on this question within the AI safety community as well as outside it. Many people are justifiably afraid of concentrating power to create AGI and determine its values in the hands of one company or organization. Many others are concerned about the information hazards of open-sourcing AGI and the resulting potential for misuse. In this post, I argue that some sort of compromise between openness and secrecy will be necessary, as both extremes of complete secrecy and complete openness seem really bad. The good news is that there isn’t a single axis of openness vs secrecy – we can make separate judgment calls for different aspects of AGI development, and develop a set of guidelines.
Among those concerned about risks from advanced AI, I’ve encountered people who would be interested in a career in AI research, but are worried that doing so would speed up AI capability relative to safety. I think it is a mistake for AI safety proponents to avoid going into the field for this reason (better reasons include being well-positioned to do AI safety work, e.g. at MIRI or FHI). This mistake contributed to me choosing statistics rather than computer science for my PhD, which I have some regrets about, though luckily there is enough overlap between the two fields that I can work on machine learning anyway.
I think the value of having more AI experts who are worried about AI safety is far higher than the downside of adding a few drops to the ocean of people trying to advance AI. Here are several reasons for this:
- Concerned researchers can inform and influence their colleagues, especially if they are outspoken about their views.
- Studying and working on AI brings understanding of the current challenges and breakthroughs in the field, which can usefully inform AI safety work (e.g. wireheading in reinforcement learning agents).
- Opportunities to work on AI safety are beginning to spring up within academia and industry, e.g. through FLI grants. In the next few years, it will be possible to do an AI-safety-focused PhD or postdoc in computer science, which would hit two birds with one stone.
“An ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind.”
– Computer scientist I. J. Good, 1965
Artificial intelligence systems we have today can be referred to as narrow AI – they perform well at specific tasks, like playing chess or Jeopardy, and some classes of problems like Atari games. Many experts predict that general AI, which would be able to perform most tasks humans can, will be developed later this century, with median estimates around 2050. When people talk about long term existential risk from the development of general AI, they commonly refer to the intelligence explosion (IE) scenario. AI risk skeptics often argue against AI safety concerns along the lines of “Intelligence explosion sounds like science-fiction and seems really unlikely, therefore there’s not much to worry about”. It’s unfortunate when AI safety concerns are rounded down to worries about IE. Unlike I. J. Good, I do not consider this scenario inevitable (though relatively likely), and I would expect general AI to present an existential risk even if I knew for sure that intelligence explosion were impossible.