A mythic villain of the AI Safety movement is the Paperclip Maximizer—a machine so fixated on making paperclips, it liquidates humans so it can extract a few hundred milligrams of iron from our blood.
Yes, it’s an absurd thought experiment. But we see its basic dynamic play out daily: people, institutions, and algorithms are given simplistic objectives, which they pursue to the point of pathology. Creators hungry for clicks peddle fake news; corporations harm their customers in the name of “shareholder value”.
In a world of big data and rapid change, the relentless pursuit of numerical targets is an existential threat.
AI enthusiasts talk about the problem of alignment—how do we give intelligent machines a target that’s properly aligned with human values?
This is an impossibly hard task. No matter what objective you pick, an overly literal AI ends up hacking its way towards the goal. You tell the AI to make money, and it robs a bank or enslaves a nation. Tell the AI to make you smile, and it surgically alters your face. Tell it to make you happy, and you’ll wake up in a wireheaded bliss-dream.
If these AI-related thought experiments seem far-fetched, the same pattern plays out with humans: it’s impossible to incentivize the right behavior with a single reward. Here are a few real-world examples where good intentions went wrong:
Tying bank CEOs’ compensation to stock performance led to excessive risk-taking, helping precipitate the 2008 financial crisis
Engagement metrics on social media have rewarded bafflingly pointless and harmful content
Sales quotas at Wells Fargo led to widespread fraud
Standardized testing in schools hampers teachers, harms students, and creates perverse feedback loops
China’s zero-COVID policy wrecked its economy and ultimately resulted in a deadly surge in cases
And all these goals seem quite reasonable!
Usually if people are rewatching your video and commenting on it, it’s attention-worthy. Usually if your company’s sales are increasing and its stock is going up, you’re creating value. Usually if your students’ test scores are improving, they’re learning.
But not always.
Philosophers have tried to sweep this problem under the rug by inventing the concept of “utility”. Utility is supposed to encompass all our different values, weighted appropriately. Utility, by design, is something we can always maximize safely.
Problem is, utility is a fictional abstraction. There’s no way to measure utility as such.
We can come up with all sorts of indirect measurements that correlate with utility: life expectancy, serotonin levels, the number of people living above the poverty line, etc. But each of these falls into the alignment trap. (Everyone has to wear a helmet at all times! and take antidepressants! and be euthanized if poor!)
Every measurable metric can be gamed.
This is a common pattern in social disciplines. We invent an abstractly good but immeasurable goal, like utility, freedom, truth, or justice. Then we come up with measurable-but-fallible proxies for that goal: “value” is proxied by share prices; “knowledge” is proxied by test scores.
When these proxies are studied passively, they often yield good information. But when the proxy becomes an explicit goal for the agent being measured, the agent will find creative or devious ways to game the system.
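To see the mechanism in miniature, here’s a toy simulation (purely hypothetical numbers, just to illustrate the pattern): each agent has some true value we care about and a measurable proxy. When the proxy is merely observed, rewarding the top proxy scorers mostly rewards genuinely good agents; once agents can inflate the proxy independently of their true value, the leaderboard fills up with the best gamers instead.

```python
import random

random.seed(0)

def avg_true_value_of_top_scorers(gaming_effort):
    """Simulate 1,000 agents. `true_value` is what we actually care about;
    `proxy` is the metric we can measure and reward."""
    agents = []
    for _ in range(1000):
        true_value = random.gauss(0, 1)            # e.g. real learning, real value created
        noise = random.gauss(0, 0.5)               # honest measurement error
        gaming = gaming_effort * random.random()   # inflation unrelated to true value
        agents.append((true_value, true_value + noise + gaming))
    # Reward the top 10% by proxy, then ask how good they really are.
    top = sorted(agents, key=lambda a: a[1], reverse=True)[:100]
    return sum(t for t, _ in top) / len(top)

print("proxy merely observed:", avg_true_value_of_top_scorers(gaming_effort=0.0))
print("proxy actively gamed: ", avg_true_value_of_top_scorers(gaming_effort=5.0))
```

The only thing that changes between the two runs is how much effort goes into gaming; the metric itself is identical, yet it stops telling you anything once it becomes the target.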
Here’s behavioral economist Dan Ariely discussing the problem with proxy metrics:
Human beings adjust behavior based on the metrics they’re held against. Anything you measure will impel a person to optimize his score on that metric. What you measure is what you’ll get. Period.
This phenomenon plays out time and again in research studies. Give someone frequent flyer miles, and he’ll fly in absurd ways to optimize his miles.
When I was at MIT, I was measured on my ability to handle my yearly teaching load, using a complex equation of teaching points. The rating, devised to track performance on a variety of dimensions, quickly became an end in itself. Even though I enjoyed teaching, I found myself spending less time with students because I could earn more points doing other things. I began to scrutinize opportunities according to how many points were at stake. In optimizing this measure, I was not striving to gain more wealth or happiness. Nor did I believe that earning the most points would result in more effective learning. It was merely the standard of measurement given to me, so I tried to do well against it (and I admit that I was rather good at it).
There’s a broad pattern here: the “safer” a maximizer seems, the more abstract and immeasurable it is, and the more it has to rely on unsafe proxy variables.
Even if there existed a Good that is one and can be predicated generally or that exists separately and in and for itself, it would be clear that such a Good can neither be produced nor acquired by human beings. However, it is just such a Good that we are looking for.
—Aristotle
But the idea of a “safe maximizer” is disturbingly common. Various subcultures aggressively push for the maximization of their own preferred abstraction: Justice, Jesus, Democracy, Truth, Equality, Anarchy, Science.
Most of these maximizers are provisionally safe, because they encounter enough resistance that they can’t go off the rails. We’re nowhere close to radical anarchy, so it’s pretty safe for anarchists to pull as hard as they can in that direction.
But every maximizer has a horrific shadow that emerges as it gains dominance. Cancel culture was built by justice-maximizers; eugenics and the atom bomb by science-maximizers; Christofascism and fundamentalism by Jesus-maximizers.
In extremis, each variable not only yields perverse outcomes—it begins to negate itself. Justice becomes vindictive; science becomes dogmatic; religion becomes soul-crushing.
In the relentless pursuit of growth, the snake eats its own tail.
There is always a well-known solution to every human problem—neat, plausible, and wrong.
—H.L. Mencken
The obvious answer is that we need nuance. Rather than telling a CEO to “maximize shareholder value”, tell them to balance the needs of customers and employees in a way that grows the company over time. Right?
But nuance confuses people. It especially confuses institutions. Without clear goals, we get stuck in perpetual debate, and never move forward.
Here’s Dan Luu talking about learning this lesson at Microsoft:
When I was at MS, I remember initially being surprised at how unnuanced their communication was, but it really makes sense in hindsight.
For example, when I joined Azure, I asked people what the biggest risk to Azure was and the dominant answer was that if we had more global outages, major customers would lose trust in us and we'd lose them forever, permanently crippling the business.
Meanwhile, the only message VPs communicated was the need for high velocity. When I asked why there was no communication about the thing considered the highest risk to the business, the answer was if they sent out a mixed message that included reliability, nothing would get done.
The fear was that if they said that they needed to ship fast and improve reliability, reliability would be used as an excuse to not ship quickly and needing to ship quickly would be used as an excuse for poor reliability and they'd achieve none of their goals.
So we have a tradeoff: the more single-minded your goals, the more effectively you can execute on them. The more nuanced they are, the slower you’ll move—but at least there’s less chance of liquidating all humans.
There’s a paradox lurking here. If we completely eradicate the variable-maximizers, we’ve fallen prey to yet another maximizer: we start to maximize “balance”, and thereby put ourselves out of balance.
Scott Alexander paints a picture of this infinite regress in his short story, In the Balance:
Just as you think you have figured all this out, there will appear a vision of MLOXO7W, Demon-Kaiser of the Domain of Meta-Balance, who appears as a face twisted into a Moebius strip. It will tell you that sometimes it is right to seek balance, and other times right to seek excess, and that a life well-lived consists of excess when excess is needed, and balance when balance is needed….It will ask you to devote the Artifact and its power to balancing balance and imbalance, balancedly.
We can’t give up entirely on maximization. Maximization is the basic ordering principle of humanity: give someone an incentive, and watch them go. We like having straightforward, measurable goals.
So how can we avoid the danger?
The key is to accept and embrace variable-maximizers, but to keep many of them in play, with strong safeguards so we don’t get completely lost.
At a personal level, this can mean surrounding yourself with people and media that contradict your worldview. It can mean using therapy, meditation, and introspection to periodically “zoom out” and reorient. It can mean deliberately holding contradictory beliefs and living with the dissonance that creates.
At an institutional level, it means embracing conflict as creative tension. The most productive organizations I’ve seen give clear but conflicting incentives to each sub-organization: Sales is focused on short-term revenue; Product on creating long-term value; Engineering on delivering a reliable service. Each sub-organization can move obsessively towards its goal, while the CEO moderates the inevitable conflict that arises within the executive team.
The downside is that the executive—or, at an individual level, the you that comes out in moments of deep introspection—perpetually lives in a world of tension and uncertainty. The executive is in a constant state of turmoil, trying to keep any one goal from dominating to the point of excess.
With variable-maximizers around, there’s always a risk of getting lost. But with enough competition and the right safeguards in place, we can at least keep ourselves from drifting off to infinity.
Hopefully we’ll get to keep the iron in our blood too.