At a recent AI safety meetup, people asked for a reading list to get up to speed on the main ideas in the field. The resources are selected for relevance and/or brevity, and the list is not meant to be comprehensive.
For a popular audience:
GiveWell: Potential risks from advanced artificial intelligence. An overview of AI risks and timelines, possible interventions, and current actors in this space.
Stuart Armstrong: Smarter Than Us: The Rise of Machine Intelligence. A short ebook discussing the potential promises and challenges presented by advanced AI, and the interdisciplinary problems that need to be solved along the way.
For a more technical audience:
Stuart Russell:
- The long-term future of AI (longer version). A video of Russell's classic talk, discussing why it makes sense for AI researchers to think about AI safety, and going over various misconceptions about the issues.
- Concerns of an AI pioneer. An interview with Russell on the importance of provably aligning AI with human values, and the challenges of value alignment research.
- On Myths and Moonshine. Russell’s response to the “Myth of AI” question on Edge.org, which draws an analogy between AI research and nuclear research, and points out some dangers of optimizing a misspecified utility function.
Scott Alexander: No time like the present for AI safety work. An overview of long-term AI safety challenges, e.g. preventing wireheading and formalizing ethics.
Victoria Krakovna: AI risk without an intelligence explosion. An overview of long-term AI risks besides the (overemphasized) intelligence explosion / hard takeoff scenario, arguing why intelligence explosion skeptics should still think about AI safety.
Amodei, Olah et al: Concrete Problems in AI Safety. Discusses accident risks in machine learning systems, such as negative side effects, reward hacking, and unsafe exploration.
Taylor et al (MIRI): Alignment for Advanced Machine Learning Systems. A research agenda on alignment problems for machine learning systems, e.g. mild optimization and averting instrumental incentives.
Jacob Steinhardt: Long-Term and Short-Term Challenges to Ensuring the Safety of AI Systems. A taxonomy of AI safety issues that require ordinary vs. extraordinary engineering to address.
Nate Soares: Safety engineering, target selection, and alignment theory. Identifies and motivates three major areas of AI safety research.
Nick Bostrom: Superintelligence: Paths, Dangers, Strategies. A seminal book outlining long-term AI risk considerations.
Steve Omohundro: The basic AI drives. Argues that sufficiently advanced AI systems are likely to develop drives such as self-preservation and resource acquisition independently of their assigned objectives.
Paul Christiano: AI control. A blog on designing safe, efficient AI systems (approval-directed agents, aligned reinforcement learning agents, etc.).
MIRI: Corrigibility. Designing AI systems without incentives to resist corrective modifications by their creators.
Laurent Orseau: Wireheading. An investigation into how different types of artificial agents respond to wireheading opportunities (unintended shortcuts to maximize their objective function).
Collections of papers
(Thanks to Ben Sancetta, Taymon Beal and Janos Kramar for their feedback on this post.)