Specification gaming examples in AI

Various examples (and lists of examples) of unintended behaviors in AI systems have appeared in recent years. One interesting type of unintended behavior is finding a way to game the specified objective: generating a solution that literally satisfies the stated objective but fails to solve the problem according to the human designer’s intent. This occurs when the objective is poorly specified, and includes reinforcement learning agents hacking the reward function, evolutionary algorithms gaming the fitness function, etc.

While ‘specification gaming’ is a somewhat vague category, it is particularly referring to behaviors that are clearly hacks, not just suboptimal solutions. A classic example is OpenAI’s demo of a reinforcement learning agent in a boat racing game going in circles and repeatedly hitting the same reward targets instead of actually playing the game.

coast_runners

Since such examples are currently scattered across several lists, I have put together a master list of examples collected from the various existing sources. This list is intended to be comprehensive and up-to-date, and serve as a resource for AI safety research and discussion. If you know of any interesting examples of specification gaming that are missing from the list, please submit them through this form.

Thanks to Gwern Branwen, Catherine Olsson, Alex Irpan, and others for collecting and contributing examples!

Advertisements

7 thoughts on “Specification gaming examples in AI

  1. Stuart Russell

    The notion of “gaming” and “hack” suggests the AI system knows the user’s intent but decides to violate it anyway by sticking to the letter of the objective function. I think that this is likely to be misleading for the lay person. Instead, we should think of these as errors in specifying the objective, period.

    Liked by 3 people

    Reply
    1. Victoria Krakovna Post author

      Thanks Stuart! I certainly agree that these behaviors are caused by errors in specifying the objective (I’ve added a sentence in the post to clarify this). Gaming / hacking by humans is similarly caused by poorly designed incentive systems.

      I see your point that “gaming” can be interpreted as understanding the designer’s intent but deciding to violate it anyway, though I’m not sure it has to be interpreted that way. For example, schoolchildren who are optimizing for grades might not realize that they are not satisfying the intended objective of school.

      Do you have a better term in mind for these sorts of degenerate behaviors that completely fail to satisfy the intended objective? Maybe something like “shortcuts” or “literal solutions”?

      Like

      Reply
  2. tdietterich

    These are essentially programming bugs where the programmer did not set up the optimization problem properly. There are many lists online of typically programming errors (and advice on how to avoid them). See, for example, https://www.iiitd.edu.in/~jalote/papers/CommonBugs.pdf. Similarly, there are online resources for learning how to correctly formulate optimization problems for standard linear and integer programming packages (e.g., CPLEX and Gurobi). See for example, https://pubsonline.informs.org/doi/pdf/10.1287/ited.7.2.153.

    It is interesting to ask why these optimization errors are qualitatively different. Here are two thoughts. First, these problems are not expressed in a standard high level optimization framework like CPLEX. This can lead to problems with incomplete sandboxing of the optimizer (so that it is allowed to access parts of the environment that it should not be able to touch). Second, specifying the objective in terms of rewards may be a bad programming language. Many of the errors result from incorrect rewards that were added to “help” the learner. Maybe there are better ways to specify the desired behavior than to use reward functions?

    Our field is still learning how to formulate problems well, and this list will be very useful for this purpose. As we go forward, I hope we will create better tools for debugging our optimizations and for monitoring their behavior.

    Liked by 2 people

    Reply
  3. Pingback: Measuring and avoiding side effects using relative reachability | Deep Safety

  4. Stephen Mason

    This is interesting from the perspective of trying to prove something in legal proceedings. I am a barrister and have written an extensive chapter (chapter 6) on how software code has been instrumental in injuring and killing people (Electronic Evidence, Stephen Mason and Daniel Seng, editors (4th edn, Institute of Advanced Legal Studies for the SAS Humanities Digital Library, School of Advanced Study, University of London, 2017) – open source at http://ials.sas.ac.uk/digital/humanities-digital-library/observing-law-ials-open-book-service-law/electronic-evidence).
    I also wrote a paper on AI recently: ‘Artificial intelligence: Oh really? And why judges and lawyers are central to the way we live now – but they don’t know it’, Computer and Telecommunications Law Review, 2017, Volume 23, Issue 8, 213-225 (available on Westlaw for those with access – most university libraries have a subscription).
    As a lay person in all of this, I am worried by the common law legal presumption that computers are reliable. A presumption can be challenged, but you need to have a good reason for challenging the other side hen they say their computer system benefits from the presumption. Yet software licences always contain a clause along the lines of the following: ‘The Licensee acknowledges that software in general is not error free and agrees that the existence of such errors shall not constitute a breach of this Licence.’
    The law tells us that programmers are perfect, and contract lawyers insert a clause in software licences to indicate software always has faults, which is nearer the truth.
    If you are the opposing party to the presumption that a computer is reliable, life is incredibly difficult.
    With the list of unintended behaviours in AI systems (for which I thank you), what is your opinion (and that of any other contributor) about how to change the minds of the judges and lawyers? I have been trying for years, unsuccessfully.
    Stephen Mason

    Like

    Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s