Category Archives: rationality

2023-24 New Year review

2023 review

Life updates

We received a special gift for New Year’s – Michael (“Misha”) arrived just in time to be born in 2023! Daniel is already getting the hang of rocking his brother and singing him lullabies.

Continue reading →

2022-23 New Year review

This is an annual post reviewing the last year and setting goals for next year. Overall, this was a reasonably good year with some challenges (the invasion of Ukraine and being sick a lot). Some highlights in this review are improving digital habits, reviewing sleep data from the Oura ring since 2019 and calibration of predictions since 2014, an updated set of Lights habits, the unreasonable effectiveness of nasal spray against colds, and of course baby pictures.

2022 review

Life updates

I am very grateful that my immediate family is in the West, and my relatives both in Ukraine and Russia managed to stay safe and avoid being drawn into the war on either side. In retrospect, it was probably good that my dad died in late 2021 and not a few months later when Kyiv was under attack, so we didn’t have to figure out how to get a bedridden cancer patient out of a war zone. It was quite surreal that the city that I had visited just a few months back was now under fire, and the people I had met there were now in danger. The whole thing was pretty disorienting and made it hard to focus on work for a while. I eventually mostly stopped checking the news and got back to normal life with some background guilt about not keeping up with what’s going on in the homeland.

Work

My work focused on threat models and inner alignment this year:

Made an overview talk on Paradigms of AI alignment: components and enablers and gave the talk in a few places.
Coauthored the Goal Misgeneralization: why correct rewards aren’t enough for correct goals paper and the associated DeepMind blog post.
Did a survey of DeepMind alignment team opinions on AGI ruin arguments, which received a lot of interest on the alignment forum.
Wrote a post on Refining the Sharp Left Turn threat model
Contributed to DeepMind alignment posts on Clarifying AI x-risk and Threat model literature review
Coauthored a prize-winning submission to the Eliciting Latent Knowledge contest: Route understanding through the human ontology.

Continue reading →

2020-21 New Year review

This is an annual post reviewing the last year and setting goals and predictions for next year. 2020 brought a combination of challenges from living in a pandemic and becoming a parent. Other highlights include not getting sick, getting a broader perspective on my life through decluttering, and going back to Ukraine for the first time. (This post was written in bits and pieces over the past two months.)

2020 review

Life updates

Janos and I had a son, Daniel, on Nov 11. He arrived almost 3 weeks later than expected (apparently he was waiting to be born on my late grandfather’s birthday), and has been a great source of cuddles, sound effects and fragmented sleep ever since.

Some work things also went well this year – I had a paper accepted at NeurIPS, and was promoted to senior research scientist. Also, I did not get covid, and survived half a year of working from home (much credit goes to the great company of my housemates). Overall, a lot of things to be grateful for.

Continue reading →

2019-20 New Year review

This is an annual post reviewing the last year and making resolutions and predictions for next year. This year’s edition features sleep tracking, intermittent fasting, overcommitment busting, and evaluating calibration for all annual predictions since 2014.

2019 review

AI safety research:

Wrote an updated version of the relative reachability paper, including an ablation study on design choices.
Coauthored a paper on modeling AGI Safety frameworks with causal influence diagrams, accepted to IJCAI workshop.
Wrote a paper on avoiding side effects by considering future tasks, accepted to NeurIPS workshop.
Co-ran a subteam of the safety team focusing on agent incentive design (ongoing).

AI safety outreach:

Co-organized FLI’s Beneficial AGI conference in Puerto Rico, a more long-term focused sequel to the original Puerto Rico conference and the Asilomar conference. This year I was the program chair for the technical safety track of the conference.
Co-organized the ICLR AI safety workshop, Safe Machine Learning: Specification, Robustness and Assurance. This was my first time running a paper reviewing process.
Gave a talk at the IJCAI AI safety workshop on specification, robustness and assurance problems.
Took part in the DeepMind podcast episode on AI safety (“I, robot”).

Continue reading →

2018-19 New Year review

2018 progress

Research / AI safety:

Wrote a paper on measuring side effects using relative reachability in May, and presented the results at the ICML GoalsRL workshop and the AI safety summer school. Since then, some new approaches have come out using my method as a baseline :).
Made a list of 30 specification gaming examples in AI (assembled from several existing lists). Since the list was posted in April, 16 new examples have been contributed through the form (thanks everyone!). The list received some attention on Twitter, and I was interviewed about it by Wired and the Times.
Was in the top 30% of NeurIPS reviewers.
Gave talks at the Oxford AI Society, EA Global London, etc.
Got involved in organizing the upcoming ICLR AI safety workshop, Safe Machine Learning: Specification, Robustness and Assurance.

Rationality / effectiveness:

Attended the CFAR mentoring workshop in Prague, and started running rationality training sessions with Janos at our group house.
Started using work cycles – focused work blocks (e.g. pomodoros) with built-in reflection prompts. I think this has increased my productivity and focus to some degree. The prompt “how will I get started?” has been surprisingly helpful given its simplicity.
Stopped eating processed sugar for health reasons at the end of 2017 and have been avoiding it ever since.
- This has been surprisingly easy, especially compared to my earlier attempts to eat less sugar. I think there are two factors behind this: avoiding sugar made everything taste sweeter (so many things that used to taste good now seem inedibly sweet), and the mindset shift from “this is a luxury that I shouldn’t indulge in” to “this is not food”.
- Unfortunately, I can’t draw any conclusions about the effects on my mood variables because of some issues with my data recording process :(.
Declining levels of insomnia (excluding jetlag):
- 22% of nights in the first half of 2017, 16% in the second half of 2017, 16% in the first half of 2018, 10% in the second half of 2018.
- This is probably an effect of the sleep CBT program I did in 2017, though avoiding sugar might be a factor as well.
Made some progress on reducing non-research commitments (talks, reviewing, organizing, etc).
- Set up some systems for this: a spreadsheet to keep track of requests to do things (with 0-3 ratings for workload and 0-2 ratings for regret) and a form to fill out whenever I’m thinking of accepting a commitment.
- My overall acceptance rate for commitments has gone down a bit from 29% in 2017 to 24% in 2018. The average regret per commitment went down from 0.66 in 2017 to 0.53 in 2018.
- However, since the number of requests has gone up, I ended up with more things to do overall: 12 commitments with a total of 23 units of workload in 2017 vs 19 commitments with a total of 33 units of workload in 2018. (1 unit of workload ~ 5 hours)
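
The arithmetic behind these commitment numbers can be sketched as follows. This is a minimal illustration, not the actual spreadsheet: the variable and function names are hypothetical, and the implied number of requests is only an estimate, since it is backed out of the rounded acceptance rates quoted above.

```python
# Figures quoted in the list above; all names here are illustrative assumptions.
YEARS = {
    2017: {"commitments": 12, "acceptance_rate": 0.29, "workload_units": 23},
    2018: {"commitments": 19, "acceptance_rate": 0.24, "workload_units": 33},
}
HOURS_PER_UNIT = 5  # "1 unit of workload ~ 5 hours"


def summarize(stats):
    """Derive rough totals from one year's commitment figures."""
    return {
        # Estimate only: the acceptance rate is rounded to a whole percent.
        "implied_requests": round(stats["commitments"] / stats["acceptance_rate"]),
        "approx_hours": stats["workload_units"] * HOURS_PER_UNIT,
        "units_per_commitment": stats["workload_units"] / stats["commitments"],
    }


for year, stats in YEARS.items():
    print(year, summarize(stats))
```

Under these assumptions, the number of requests roughly doubled between the two years, which is how a lower acceptance rate can still produce a higher total workload.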

Continue reading →

2017-18 New Year review

2017 progress

Research/career:

Coauthored the RL with reward corruption paper and presented the results at the U Toronto CS department, Workshop on Reliable AI, and Women in ML workshop.
Coauthored the AI Safety Gridworlds paper.
Gave a talk on Interpretability for AI safety at the NIPS Interpretable ML Symposium.
Gave a lot of people career advice on getting into AI safety research.

FLI / other AI safety:

Co-organized the Beneficial AI conference in Asilomar and gave a talk summarizing the work of FLI grantees presented at the Asilomar workshop.
Co-wrote the project examples document for the new grants program.
Spoke at the Tokyo AI & Society symposium (my first conference in Asia).
Started a new public Facebook group AI Safety Open Discussion.

Continue reading →

Takeaways from self-tracking data

I’ve been collecting data about myself on a daily basis for the past 3 years. Half a year ago, I switched from using 42goals (which I only remembered to fill out once every few days) to a Google form emailed to me daily (which I fill out consistently because I check email often). Now for the moment of truth – a correlation matrix!

The data consists of “mood variables” (anxiety, tiredness, and “zoneout” – how distracted / spacey I’m feeling), “action variables” (exercise and meditation) and sleep variables (hours of sleep, sleep start/end time, insomnia). There are 5 binary variables (meditation, exercise, evening/morning insomnia, headache) and the rest are ordinal or continuous. Almost all the variables have 6 months of data, except that I started tracking anxiety 5 months ago and zoneout 2 months ago.
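
For readers who want to try something similar, here is a minimal sketch of how a correlation matrix over this kind of daily data could be computed. It is not the code behind the post: the column names and values are made up, and it assumes plain Pearson correlation with missing days (`None`) skipped pairwise, so that variables tracked for a shorter period can still be compared with the rest.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation over the days where both values were recorded."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None  # not enough overlapping days
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up example data: one entry per day, None where a variable was not yet tracked.
data = {
    "sleep_hours": [7.5, 6.0, 8.0, 5.5, 7.0, 6.5],
    "tiredness": [2, 4, 1, 5, 2, 3],        # ordinal
    "exercise": [1, 0, 1, 0, 1, 0],         # binary
    "anxiety": [None, None, 2, 4, 1, 3],    # tracking started later
}
matrix = correlation_matrix(data)
```

With binary variables such as exercise or meditation, the Pearson coefficient is the same as the point-biserial correlation, so one function covers all the variable types mentioned above.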

Continue reading →

2016-17 New Year review

2016 progress

Research / career:

Got a job at DeepMind as a research scientist in AI safety.
Presented MiniSPN paper at ICLR workshop.
Finished RNN interpretability paper and presented at ICML and NIPS workshops.
Attended the Deep Learning Summer School.
Finished and defended PhD thesis.
Moved to London and started working at DeepMind.

FLI:

Talk and panel (moderator) at Effective Altruism Global X Boston
Talk and panel at the Governance of Emerging Technologies conference at ASU
Talk and panel at Brain Bar Budapest
AI safety session at OpenAI unconference
Talk and panel at Effective Altruism Global X Oxford
Talk and panel at Cambridge Catastrophic Risk Conference run by CSER

Continue reading →

Using humility to counteract shame

2015-16 New Year review

2015 progress

Research:

Finished paper on the Selective Bayesian Forest Classifier algorithm
Made an R package for SBFC (beta)
Worked at Google on unsupervised learning for the Knowledge Graph with Moshe Looks during the summer (paper)
Joined the HIPS research group at Harvard CS and started working with the awesome Finale Doshi-Velez
Ratio of coding time to writing time was too high overall

FLI:

Co-organized two meetings to brainstorm biotechnology risks
Co-organized two Machine Learning Safety meetings
Gave a talk at the Shaping Humanity’s Trajectory workshop at EA Global
Helped organize NIPS symposium on societal impacts of AI

Rationality / effectiveness:

Extensive use of FollowUpThen for sending reminders to future selves
Mapped out my personal bottlenecks
Sleep:
- Tracked insomnia (26% of nights) and sleep time (average 1:30am, stayed up past 1am on 31% of nights)
- Started working on sleep hygiene
- Stopped using melatonin (found it ineffective)

Continue reading →