Near-term motivation for AI alignment

AI alignment work is usually considered “longtermist”, i.e. motivated by preserving humanity’s long-term potential. This was the primary motivation when the alignment field got started around 20 years ago, when general AI seemed far away or impossible to most people working in AI. However, given the current rate of progress towards advanced AI capabilities, there is an increasingly relevant near-term motivation to think about alignment, even if you mostly or only care about people alive today. This is most of my personal motivation for working on alignment.

I would not be surprised if general AI is reached in the next few decades, in line with the latest AI expert survey’s median of 2059 for human-level AI (as estimated by authors at top ML conferences) and the Metaculus median of 2039. The Precipice gives a 10% probability of human extinction this century due to AI, i.e. within the lifetime of children alive today (and I would expect most of this probability to be concentrated in the next few decades, i.e. within our lifetimes). I used to refer to AI alignment work as “long-term AI safety”, but this term now seems misleading, since alignment would be more accurately described as “medium-term safety”.

While AI alignment has historically been associated with longtermism, there is a downside to relying on longtermist arguments for alignment concerns. Sometimes people seem to conclude that they don’t need to worry about alignment if they don’t care much about the long-term future. For example, one commonly cited argument for trying to reduce existential risk from AI is that “even if it’s unlikely and far away, it’s so important that we should worry about it anyway”. People understandably interpret this as Pascal’s mugging and bounce off. This kind of argument for alignment concerns is not very relevant these days, because existential risk from AI is not that unlikely (10% this century is actually a lot, and may be a conservative estimate) and general AI is not that far away (an average of 36 years in the AI expert survey).

Similarly, when considering specific paths to catastrophic risk from AI, a typical longtermist scenario involves an advanced AI system inventing molecular nanotechnology, which understandably sounds implausible to most people. I think a more likely path to catastrophic risk would involve general AI precipitating other catastrophic risks like pandemics (e.g. by doing biotechnology research) or taking over the global economy. If you’d like to learn about the most pertinent arguments for alignment concerns and plausible paths for AI to gain an advantage over humanity, check out Holden Karnofsky’s Most Important Century blog post series. 

In terms of my own motivation, honestly I don’t care that much about whether humanity gets to colonize the stars, reduce astronomical waste, or bring large numbers of future people into existence. These outcomes would be very cool but optional in my view. Of course I would like humanity to have a good long-term future, but I mostly care about people alive today. My main motivation for working on alignment is that I would like my loved ones and everyone else on the planet to have a future.

Sometimes people worry about a tradeoff between alignment concerns and other aspects of AI safety, such as ethics and fairness, but I still think this tradeoff is pretty weak. There are also many common interests between alignment and ethics that would be great for these communities to coordinate on. These include developing industry-wide safety standards and AI governance mechanisms, setting up model evaluations for safety, and deploying advanced AI systems slowly and cautiously. Ultimately, all these safety problems need to be solved to ensure that general AI systems have a positive impact on the world. I think the distribution of effort between AI capabilities and safety will need to shift more towards safety as more advanced AI systems are developed.

In conclusion, you don’t have to be a longtermist to care about AI alignment. I think the possible impacts on people alive today are significant enough to think about this problem, and the next decade is going to be a critical time for steering advanced AI technology towards safety. If you’d like to contribute to alignment research, here is a list of research agendas in this space and a good course to get up to speed on the fundamentals of AI alignment (more resources here).