Specification gaming examples in AI

Update: for a more detailed introduction to specification gaming, check out the DeepMind Safety Research blog post!

Various examples (and lists of examples) of unintended behaviors in AI systems have appeared in recent years. One interesting type of unintended behavior is finding a way to game the specified objective: generating a solution that literally satisfies the stated objective but fails to solve the problem according to the human designer’s intent. This occurs when the objective is poorly specified, and includes reinforcement learning agents hacking the reward function, evolutionary algorithms gaming the fitness function, etc.

While ‘specification gaming’ is a somewhat vague category, it is particularly referring to behaviors that are clearly hacks, not just suboptimal solutions. A classic example is OpenAI’s demo of a reinforcement learning agent in a boat racing game going in circles and repeatedly hitting the same reward targets instead of actually playing the game.


Since such examples are currently scattered across several lists, I have put together a master list of examples collected from the various existing sources. This list is intended to be comprehensive and up-to-date, and serve as a resource for AI safety research and discussion. If you know of any interesting examples of specification gaming that are missing from the list, please submit them through this form.

Thanks to Gwern Branwen, Catherine Olsson, Alex Irpan, and others for collecting and contributing examples!

32 thoughts on “Specification gaming examples in AI

  1. Stuart Russell

    The notion of “gaming” and “hack” suggests the AI system knows the user’s intent but decides to violate it anyway by sticking to the letter of the objective function. I think that this is likely to be misleading for the lay person. Instead, we should think of these as errors in specifying the objective, period.

    Liked by 3 people

    1. Victoria Krakovna Post author

      Thanks Stuart! I certainly agree that these behaviors are caused by errors in specifying the objective (I’ve added a sentence in the post to clarify this). Gaming / hacking by humans is similarly caused by poorly designed incentive systems.

      I see your point that “gaming” can be interpreted as understanding the designer’s intent but deciding to violate it anyway, though I’m not sure it has to be interpreted that way. For example, schoolchildren who are optimizing for grades might not realize that they are not satisfying the intended objective of school.

      Do you have a better term in mind for these sorts of degenerate behaviors that completely fail to satisfy the intended objective? Maybe something like “shortcuts” or “literal solutions”?


    2. David Woods

      Norbert Wiener warned of this in 1950’s: he called it the dangers of literal minded machines. he explicitly used the Monkey’s Paw story to make the point. I apply it to autonomy today: “Literal-mindedness creates the risk that a system can’t tell if its model of the world is the world it is actually in (Wiener, 1950). As a result, the system will do the right thing [in the sense that the actions are appropriate given its model of the world], when it is in a different world [producing quite unintended and potentially harmful effects]. This pattern underlies all of the coordination breakdowns between people and automation.” chapter 11, p. 157 Woods and Hollnagel 2006 Joint Cognitive Systems: Patterns. https://www.researchgate.net/publication/284173496_Chapter_11_On_People_and_Computers_in_JCSs_at_Work


  2. tdietterich

    These are essentially programming bugs where the programmer did not set up the optimization problem properly. There are many lists online of typically programming errors (and advice on how to avoid them). See, for example, https://www.iiitd.edu.in/~jalote/papers/CommonBugs.pdf. Similarly, there are online resources for learning how to correctly formulate optimization problems for standard linear and integer programming packages (e.g., CPLEX and Gurobi). See for example, https://pubsonline.informs.org/doi/pdf/10.1287/ited.7.2.153.

    It is interesting to ask why these optimization errors are qualitatively different. Here are two thoughts. First, these problems are not expressed in a standard high level optimization framework like CPLEX. This can lead to problems with incomplete sandboxing of the optimizer (so that it is allowed to access parts of the environment that it should not be able to touch). Second, specifying the objective in terms of rewards may be a bad programming language. Many of the errors result from incorrect rewards that were added to “help” the learner. Maybe there are better ways to specify the desired behavior than to use reward functions?

    Our field is still learning how to formulate problems well, and this list will be very useful for this purpose. As we go forward, I hope we will create better tools for debugging our optimizations and for monitoring their behavior.

    Liked by 2 people

  3. Pingback: Measuring and avoiding side effects using relative reachability | Deep Safety

  4. Stephen Mason

    This is interesting from the perspective of trying to prove something in legal proceedings. I am a barrister and have written an extensive chapter (chapter 6) on how software code has been instrumental in injuring and killing people (Electronic Evidence, Stephen Mason and Daniel Seng, editors (4th edn, Institute of Advanced Legal Studies for the SAS Humanities Digital Library, School of Advanced Study, University of London, 2017) – open source at http://ials.sas.ac.uk/digital/humanities-digital-library/observing-law-ials-open-book-service-law/electronic-evidence).
    I also wrote a paper on AI recently: ‘Artificial intelligence: Oh really? And why judges and lawyers are central to the way we live now – but they don’t know it’, Computer and Telecommunications Law Review, 2017, Volume 23, Issue 8, 213-225 (available on Westlaw for those with access – most university libraries have a subscription).
    As a lay person in all of this, I am worried by the common law legal presumption that computers are reliable. A presumption can be challenged, but you need to have a good reason for challenging the other side hen they say their computer system benefits from the presumption. Yet software licences always contain a clause along the lines of the following: ‘The Licensee acknowledges that software in general is not error free and agrees that the existence of such errors shall not constitute a breach of this Licence.’
    The law tells us that programmers are perfect, and contract lawyers insert a clause in software licences to indicate software always has faults, which is nearer the truth.
    If you are the opposing party to the presumption that a computer is reliable, life is incredibly difficult.
    With the list of unintended behaviours in AI systems (for which I thank you), what is your opinion (and that of any other contributor) about how to change the minds of the judges and lawyers? I have been trying for years, unsuccessfully.
    Stephen Mason

    Liked by 1 person

  5. Pingback: The Naughty AIs That Gamed The System | Hackaday

  6. Pingback: A catalog of creative cheats evolved by means of machine-learning programs / Boing Boing - Breaking News, CNN, BBC, Nairaland.com

  7. Pingback: How machine learning systems sometimes surprise us – TechCrunch

  8. Pingback: AIs Are Getting Better At Playing Video Games ... By Cheating | Kotaku Australia

  9. Pingback: AI, it turns out, can solve any problem | Mind Matters

  10. Pingback: Monthly Links | Zen Mischief

  11. Pingback: Dispute over reaction prediction puts machine learning’s pitfalls in spotlight | Research – Science Present

  12. Pingback: 2018-19 New Year review | Victoria Krakovna

  13. Pingback: Tweehonder dollar – Ionica Smeets

  14. Pingback: The case that AI threatens humanity, explained in 500 words – The Real News Nowadays

  15. Pingback: Learning preferences by looking at the world – My Blog

  16. Pingback: I, Black Box: Explainable Artificial Intelligence and the Limits of Human Deliberative Processes

  17. Pingback: The problem with the trolley problem – Yakanak News

  18. Pingback: AN #69 Stuart Russell 新书-为何我们需要替换人工智能标准模型? – AGI BAT

  19. Pingback: AN #75 用学到的游戏模型解决 Atari 和围棋问题以及一位 MIRI 成员的想法 – AGI BAT

  20. Pingback: Retrospective on the specification gaming examples list | Victoria Krakovna

  21. Pingback: Artificial intelligence as a central banker | VOX, CEPR Policy Portal – voxeu.org - AI+ NEWS

  22. Pingback: Os hackers de IA que estão chegando – Neotel Segurança Digital

  23. Pingback: BASALT: A Benchmark for Learning from Human Feedback - MKAI

  24. Pingback: BASALT: A Benchmark for Learning from Human Feedback – Robotics Content & News

  25. Pingback: Drugs, robots and the pursuit of pleasure – why experts are worried about AIs becoming addicts - Times News UK

  26. Pingback: AIs could become reward junkies — and experts are worried - w3techy

  27. Pingback: The Role of Cooperation in Responsible AI Development – Own Your AI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s