Blog posts

Specification gaming: the flip side of AI ingenuity. Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, Matthew Rahtz, Tom Everitt, Ramana Kumar, Zac Kenton, Jan Leike, Shane Legg. DeepMind Blog, April 2020 (cross-posted to DeepMind Safety Research Blog, Alignment Forum).

Designing agent incentives to avoid side effects. Victoria Krakovna, Ramana Kumar, Laurent Orseau, Alexander Turner. DeepMind Safety Research Blog, March 2019.

Specifying AI safety problems in simple environments. Jan Leike, Victoria Krakovna, Laurent Orseau. DeepMind Blog, November 2017.


Avoiding Side Effects By Considering Future Tasks. Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg. NeurIPS workshop on Safety and Robustness in Decision Making. December 2019.

Modeling AGI Safety Frameworks with Causal Influence Diagrams. Tom Everitt, Ramana Kumar, Victoria Krakovna, Shane Legg. IJCAI AI Safety workshop. June 2019.

Penalizing Side Effects Using Stepwise Relative Reachability. Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg. IJCAI AI Safety workshop. February 2019 (version 2), June 2018 (version 1). (arXiv, version 2 blog post, version 1 blog post, code)

AI Safety Gridworlds. Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg. November 2017. (arXiv, blog post, code)

Reinforcement Learning with a Corrupted Reward Channel. Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg. IJCAI AI and Autonomy track. May 2017. (arXiv, demo, code)

Building Interpretable Models: From Bayesian Networks to Neural Networks. Viktoriya Krakovna (PhD thesis). September 2016.

Increasing the Interpretability of Recurrent Neural Networks Using Hidden Markov Models. Viktoriya Krakovna, Finale Doshi-Velez.

Interpretable Selection and Visualization of Features and Interactions Using Bayesian Forests. Viktoriya Krakovna, Chenguang Dai, Jun S. Liu. Statistics and Its Interface, Volume 11 Number 3. September 2018. (arXiv (older version), R package, code)

A Minimalistic Approach to Sum-Product Network Learning for Real Applications. Viktoriya Krakovna, Moshe Looks. International Conference on Learning Representations (ICLR) workshop track. May 2016. (arXiv, OpenReview, poster)

A generalized-zero-preserving method for compact encoding of concept lattices. Matthew Skala, Victoria Krakovna, Janos Kramar, Gerald Penn. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1512–1521, Uppsala, Sweden. July 2010.