How undesired goals can arise with correct rewards. Rohin Shah, Victoria Krakovna, Vikrant Varma, Zachary Kenton. DeepMind Blog, October 2022 (more detailed post at DeepMind Safety Research Blog).
ELK contest submission: route understanding through the human ontology. Received a prize in the category “Train a reporter that is useful to an auxiliary AI”.
Optimization concepts in the Game of Life. Victoria Krakovna and Ramana Kumar. Alignment Forum, October 2021.
Specification gaming: the flip side of AI ingenuity. Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik, Matthew Rahtz, Tom Everitt, Ramana Kumar, Zac Kenton, Jan Leike, Shane Legg. DeepMind Blog, April 2020 (cross-posted to DeepMind Safety Research Blog, Alignment Forum). (AN summary)
Specifying AI safety problems in simple environments. Jan Leike, Victoria Krakovna, Laurent Orseau. DeepMind Blog, November 2017.
Goal Misgeneralization: Why Correct Specifications Aren’t Enough For Correct Goals. Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, Zac Kenton. arXiv, October 2022.
Avoiding Side Effects By Considering Future Tasks. Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg. Neural Information Processing Systems, December 2020. (arXiv, code, AN summary)
Avoiding Tampering Incentives in Deep RL via Decoupled Approval. Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg. arXiv, November 2020. (blog post, AN summary)
Modeling AGI Safety Frameworks with Causal Influence Diagrams. Tom Everitt, Ramana Kumar, Victoria Krakovna, Shane Legg. IJCAI AI Safety workshop, June 2019. (AN summary)
Penalizing Side Effects Using Stepwise Relative Reachability. Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg. IJCAI AI Safety workshop, February 2019 (version 2), June 2018 (version 1). (arXiv, version 2 blog post, version 1 blog post, code, AN summary of version 1)
Reinforcement Learning with a Corrupted Reward Channel. Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg. IJCAI AI and Autonomy track, May 2017. (arXiv, demo, code)
Building Interpretable Models: From Bayesian Networks to Neural Networks. Victoria Krakovna (PhD thesis). September 2016.
Increasing the Interpretability of Recurrent Neural Networks Using Hidden Markov Models. Victoria Krakovna, Finale Doshi-Velez.
- International Conference on Machine Learning (ICML) Workshop on Human Interpretability in Machine Learning (WHI), June 2016. (arXiv)
- Neural Information Processing Systems Workshop on Interpretable Machine Learning for Complex Systems, December 2016. (arXiv, poster)
Interpretable Selection and Visualization of Features and Interactions Using Bayesian Forests. Victoria Krakovna, Chenguang Dai, Jun S. Liu. Statistics and Its Interface, Volume 11 Number 3, September 2018. (arXiv (older version), R package, code)
A Minimalistic Approach to Sum-Product Network Learning for Real Applications. Victoria Krakovna, Moshe Looks. International Conference on Learning Representations (ICLR) workshop track, May 2016. (arXiv, OpenReview, poster)
A generalized-zero-preserving method for compact encoding of concept lattices. Matthew Skala, Victoria Krakovna, Janos Kramar, Gerald Penn. Association for Computational Linguistics (ACL), Sweden, July 2010.