Near-term motivation for AGI alignment

AGI alignment work is usually considered “longtermist”, that is, aimed at preserving humanity’s long-term potential. This was the primary motivation for the work when the alignment field got started around 20 years ago, when AGI seemed far away or impossible to most people in AI. However, given the current rate of progress towards general AI capabilities, there is an increasingly relevant near-term motivation to think about alignment, even if you mostly or only care about people alive today. This is most of my personal motivation for working on alignment.

I would not be surprised if AGI is reached in the next few decades, in line with the latest AI expert survey’s median of 2059 for human-level AI (as estimated by authors at top ML conferences) and the Metaculus median of 2039. The Precipice gives a 10% probability of human extinction this century due to AI, i.e. within the lifetime of children alive today (and I would expect most of this probability to be concentrated in the next few decades, i.e. within our lifetimes). I used to refer to AGI alignment work as “long-term AI safety”, but this term seems misleading now, since alignment would be more accurately described as “medium-term safety”.

While AGI alignment has historically been associated with longtermism, there is a downside to relying on longtermist arguments for alignment concerns: people sometimes conclude that they don’t need to worry about alignment if they don’t care much about the long-term future. For example, one commonly cited argument for trying to reduce existential risk from AI is that “even if it’s unlikely and far away, it’s so important that we should worry about it anyway”. People understandably interpret this as a Pascal’s mugging and bounce off. This kind of argument is not very relevant these days, because existential risk from AI is not that unlikely (10% this century is actually a lot, and may be a conservative estimate) and AGI is not that far away (a median of 36 years away in the AI expert survey).

Similarly, when it comes to specific paths to catastrophe from AGI, a typical longtermist scenario involves AGI inventing molecular nanotechnology, which understandably sounds implausible to most people. I think a more likely path would involve AGI precipitating other catastrophic risks like pandemics (e.g. by doing biotechnology research) or taking over the global economy. If you’d like to learn about the most pertinent arguments for alignment concerns and plausible paths for AI to gain an advantage over humanity, check out Holden Karnofsky’s Most Important Century blog post series.

In terms of my own motivation, honestly I don’t care that much about whether humanity gets to colonize the stars, whether astronomical waste is reduced, or whether large numbers of future people come to exist. These outcomes would be very cool but optional in my view. Of course I would like humanity to have a good long-term future, but I mostly care about people alive today. My main motivation for working on alignment is that I would like my loved ones and everyone else on the planet to have a future.

Sometimes people worry about a tradeoff between alignment concerns and other aspects of AI safety, such as ethics and fairness, but I think this tradeoff is fairly weak. There are also many common interests between alignment and ethics that would be great for these communities to coordinate on. These include developing industry-wide safety standards and AI governance mechanisms, setting up model evaluations for safety, and deploying advanced AI systems slowly and cautiously. Ultimately, all these safety problems need to be solved to ensure that AGI systems have a positive impact on the world. I think the distribution of effort between AI capabilities and safety will need to shift more towards safety as more advanced AI systems are developed.

In conclusion, you don’t have to be a longtermist to care about AGI alignment. I think the possible impacts on people alive today are significant enough to think about this problem, and the next decade is going to be a critical time for steering advanced AI technology towards safety. If you’d like to contribute, here is a list of research agendas in this space, and a good course to get up to speed on the fundamentals of AGI alignment.
