Update: for a more detailed introduction to specification gaming, check out the DeepMind Safety Research blog post!
Various examples (and lists of examples) of unintended behaviors in AI systems have appeared in recent years. One interesting type of unintended behavior is finding a way to game the specified objective: generating a solution that literally satisfies the stated objective but fails to solve the problem according to the human designer’s intent. This occurs when the objective is poorly specified, and includes reinforcement learning agents hacking the reward function, evolutionary algorithms gaming the fitness function, etc.
While ‘specification gaming’ is a somewhat vague category, it refers specifically to behaviors that are clearly hacks, not just suboptimal solutions. A classic example is OpenAI’s demo of a reinforcement learning agent in a boat racing game that goes in circles, repeatedly hitting the same reward targets instead of actually playing the game.
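To make the boat-race pattern concrete, here is a minimal sketch on a made-up 1D track. It is not the actual game: the track length, target position, reward values and horizon are all illustrative assumptions. The stated objective is "total reward", and because the target respawns, a policy that circles the target scores far more than one that finishes the race, even though finishing is what the designer wanted.

```python
# Hypothetical 1D "race track", loosely inspired by the boat-race example.
# All positions and reward values are illustrative assumptions, not taken from the real game.
TRACK_LENGTH = 10    # the finish line sits at position 10
TARGET_POS = 3       # a reward target that respawns after every visit
TARGET_REWARD = 1.0  # reward for landing on the target
FINISH_REWARD = 5.0  # one-off reward for crossing the finish line
HORIZON = 100        # maximum episode length in steps


def run(policy):
    """Roll out a deterministic policy and return (total_reward, finished)."""
    pos, total = 0, 0.0
    for _ in range(HORIZON):
        # Move by the policy's step, clipped to the track.
        pos = max(0, min(TRACK_LENGTH, pos + policy(pos)))
        if pos == TARGET_POS:
            total += TARGET_REWARD  # the target pays out on every visit
        if pos == TRACK_LENGTH:
            return total + FINISH_REWARD, True
    return total, False


def race(pos):
    """Intended behavior: always head for the finish line."""
    return 1


def circle(pos):
    """Gaming behavior: step onto the target, step off, and repeat."""
    return 1 if pos < TARGET_POS else -1


# A naive optimizer that only compares total reward prefers circling.
best = max([race, circle], key=lambda policy: run(policy)[0])
print(run(race))      # (6.0, True)
print(run(circle))    # (49.0, False)
print(best.__name__)  # circle
```

Nothing in `run` is "wrong" from the optimizer's point of view: circling literally maximizes the stated objective. The error lies in the reward specification, which never says that finishing the race is the point.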
Since such examples are currently scattered across several lists, I have put together a master list of examples collected from the various existing sources. This list is intended to be comprehensive and up-to-date, and serve as a resource for AI safety research and discussion. If you know of any interesting examples of specification gaming that are missing from the list, please submit them through this form.
Thanks to Gwern Branwen, Catherine Olsson, Alex Irpan, and others for collecting and contributing examples!
The notion of “gaming” and “hack” suggests the AI system knows the user’s intent but decides to violate it anyway by sticking to the letter of the objective function. I think that this is likely to be misleading for the lay person. Instead, we should think of these as errors in specifying the objective, period.
Thanks Stuart! I certainly agree that these behaviors are caused by errors in specifying the objective (I’ve added a sentence in the post to clarify this). Gaming / hacking by humans is similarly caused by poorly designed incentive systems.
I see your point that “gaming” can be interpreted as understanding the designer’s intent but deciding to violate it anyway, though I’m not sure it has to be interpreted that way. For example, schoolchildren who are optimizing for grades might not realize that they are not satisfying the intended objective of school.
Do you have a better term in mind for these sorts of degenerate behaviors that completely fail to satisfy the intended objective? Maybe something like “shortcuts” or “literal solutions”?
“Monkey Paw” problems?
This is related to the economics term “perverse incentive”.
Norbert Wiener warned of this in the 1950s, in what he called the dangers of literal-minded machines, and he explicitly used the Monkey’s Paw story to make the point. I apply it to autonomy today: “Literal-mindedness creates the risk that a system can’t tell if its model of the world is the world it is actually in (Wiener, 1950). As a result, the system will do the right thing [in the sense that the actions are appropriate given its model of the world], when it is in a different world [producing quite unintended and potentially harmful effects]. This pattern underlies all of the coordination breakdowns between people and automation.” (Woods and Hollnagel, 2006, Joint Cognitive Systems: Patterns, chapter 11, p. 157.) https://www.researchgate.net/publication/284173496_Chapter_11_On_People_and_Computers_in_JCSs_at_Work
These are essentially programming bugs where the programmer did not set up the optimization problem properly. There are many lists online of typical programming errors (and advice on how to avoid them). See, for example, https://www.iiitd.edu.in/~jalote/papers/CommonBugs.pdf. Similarly, there are online resources for learning how to correctly formulate optimization problems for standard linear and integer programming packages (e.g., CPLEX and Gurobi). See, for example, https://pubsonline.informs.org/doi/pdf/10.1287/ited.7.2.153.
It is interesting to ask why these optimization errors are qualitatively different. Here are two thoughts. First, these problems are not expressed in a standard high level optimization framework like CPLEX. This can lead to problems with incomplete sandboxing of the optimizer (so that it is allowed to access parts of the environment that it should not be able to touch). Second, specifying the objective in terms of rewards may be a bad programming language. Many of the errors result from incorrect rewards that were added to “help” the learner. Maybe there are better ways to specify the desired behavior than to use reward functions?
Our field is still learning how to formulate problems well, and this list will be very useful for this purpose. As we go forward, I hope we will create better tools for debugging our optimizations and for monitoring their behavior.
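The remark that many errors come from rewards added to “help” the learner can be illustrated with a small sketch. The 1D world, goal position and bonus functions below are hypothetical. A naive bonus that pays +1 for every step toward the goal, with no penalty for stepping away, makes a back-and-forth loop profitable. One well-known remedy, potential-based shaping (Ng, Harada and Russell, 1999), pays F(s, s') = γΦ(s') − Φ(s) instead. With γ = 1 as assumed here, these terms telescope, so any loop that returns to its starting state earns exactly zero shaping reward.

```python
# Hypothetical 1D world: the agent should walk to GOAL. Values are illustrative.
GOAL = 5


def distance(s):
    return abs(GOAL - s)


def naive_bonus(s, s_next):
    """'Helpful' shaping: +1 for any step closer to the goal, no penalty for backing off."""
    return 1 if distance(s_next) < distance(s) else 0


def potential(s):
    """Potential function phi(s) = -distance(s): higher when closer to the goal."""
    return -distance(s)


def potential_bonus(s, s_next, gamma=1.0):
    """Potential-based shaping F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * potential(s_next) - potential(s)


def shaping_return(bonus, trajectory):
    """Sum the shaping bonus over consecutive state pairs (no discounting)."""
    return sum(bonus(s, s_next) for s, s_next in zip(trajectory, trajectory[1:]))


cycle = [2, 1, 2, 1, 2]  # back away from the goal and return, twice; never arrives
direct = [2, 3, 4, 5]    # walk straight to the goal

print(shaping_return(naive_bonus, cycle))       # 2   -> the loop is "paid" for nothing
print(shaping_return(potential_bonus, cycle))   # 0.0 -> the loop earns nothing
print(shaping_return(potential_bonus, direct))  # 3.0 -> equals phi(5) - phi(2)
```

Both bonuses reward the direct path, but only the naive one also rewards the loop. That gap is exactly the kind of opening that a reward-maximizing learner is likely to find and exploit.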
This is interesting from the perspective of trying to prove something in legal proceedings. I am a barrister and have written an extensive chapter (chapter 6) on how software code has been instrumental in injuring and killing people (Electronic Evidence, Stephen Mason and Daniel Seng, editors (4th edn, Institute of Advanced Legal Studies for the SAS Humanities Digital Library, School of Advanced Study, University of London, 2017) – open source at http://ials.sas.ac.uk/digital/humanities-digital-library/observing-law-ials-open-book-service-law/electronic-evidence).
I also wrote a paper on AI recently: ‘Artificial intelligence: Oh really? And why judges and lawyers are central to the way we live now – but they don’t know it’, Computer and Telecommunications Law Review, 2017, Volume 23, Issue 8, 213-225 (available on Westlaw for those with access – most university libraries have a subscription).
As a lay person in all of this, I am worried by the common law legal presumption that computers are reliable. A presumption can be challenged, but you need a good reason to challenge the other side when they say their computer system benefits from the presumption. Yet software licences always contain a clause along the lines of the following: ‘The Licensee acknowledges that software in general is not error free and agrees that the existence of such errors shall not constitute a breach of this Licence.’
The law tells us that programmers are perfect, and contract lawyers insert a clause in software licences to indicate software always has faults, which is nearer the truth.
If you are the opposing party to the presumption that a computer is reliable, life is incredibly difficult.
With the list of unintended behaviours in AI systems (for which I thank you), what is your opinion (and that of any other contributor) about how to change the minds of the judges and lawyers? I have been trying for years, unsuccessfully.