Monday, March 20, 2023

The Actual Alignment Problem

     Aligning artificial intelligence with human values appears to be extremely difficult, if not impossible. Before we get into this post, let's clear up some definitions.

    This website is called Aligned Intelligence Solutions. "Aligned" means "controlled," in a sense. An aligned AI will not kill all of humanity or put us all in a virtual hellscape. It will not lock in bad moral values, and it will respond to humanity's requests in a reasonable manner. It will lead to the flourishing of human life and positive moral outcomes. I never really liked the word "artificial" in the phrase "artificial intelligence." The word assumes that such intelligence won't be as robust as, or comparable to, human intelligence. Worse yet, it may lead some to believe that "artificial" beings aren't as deserving of moral consideration. For convenience, I'll still use the term when I refer to AGI, "artificial general intelligence" (human-level on every metric), and ASI, "artificial superintelligence" (far past human-level intelligence). "Solutions" is a pretty vague word, but it leaves room both for discussing solutions to the alignment problem and for providing actual tangible solutions.

    Aligning an AI system with human values seems extremely difficult. There are many reasons for this, laid out very well in the books Superintelligence and Human Compatible. Before we discuss potential solutions to the alignment problem, we should discuss the potential outcomes from a utilitarian perspective. For all of these, assume that the goal the AI is programmed with actually flows through to the outcome. Also, I am measuring each of these not against where we are right now, but against the utility of a world with no sentient life.

1. Worst outcome: A superintelligence is programmed to create the maximum amount of suffering in the universe. The ASI makes a lot of humans (biological or digital) and tortures them for near-eternity. More likely, it kills all of humanity, and then makes a lot of really intelligent digital agents and tortures them for eternity.

2. Very bad outcome: A dystopia is locked in. Not the maximum amount of suffering, but an incredible amount. There are a million sci-fi examples of this. Authoritarian regimes rule the universe, human freedom is massively curbed, or humans make AIs into slaves (despite the AIs deserving moral consideration). Maybe a bad theocratic government imposes its values on the future of humanity in a way that we can't reverse. For this to count as a Very Bad outcome, it would have to involve enough mass suffering to make it worse than simple non-existence. An example would be 90% of humans suffering for the rest of time while 10% live in luxury.

3. Neutral outcome: Every human is killed by the paperclip machine. Whoops. No more sentient life in the universe. Some would argue this is worse than the "worst outcome" above, because consciousness in and of itself is so valuable that even eternal torture is better than non-existence. I've had these arguments with people before, and I think those arguments are stupid.

4. Very good outcome: Things go really great! You could argue that 99% of humans thriving while only 1% suffer forever would qualify as great, but what I am imagining here is some sort of sci-fi utopia. We use intelligent machines to cure disease, spread among the stars, and make significant progress on morality. Everyone lives happily ever after (but utility is not maximized).

5. Best outcome: A superintelligence is programmed to create the maximum amount of well-being in the universe. I'm not sure what this would actually look like (since I am a dumb human), but it is some form of total utopia. Maybe it's a bunch of humans, or just a bunch of digital minds, having a great time; regardless, it's a total party.

    Why did I break out the outcomes in this way? Simple: I think that humans currently working on AGI are too shortsighted. Most organizations working on AI risk are only worried about human existential risk, the Neutral outcome laid out above. Yes, that sounds absolutely terrible compared to where we are now, but let us not forget that there are much worse outcomes. I have stated before that it may not matter who gets AGI first, because it may be the case that we are all doomed to die via the paperclip machine regardless. However, if the Very Bad and Worst outcomes are possible, maybe it does matter to a substantial degree. The alignment problem, in my opinion, does not refer to biased training data and black-box neural nets. It does not even refer to the problem of avoiding the paperclip maximizer. It refers to the sheer difficulty of achieving a Very Good or Best outcome. Assigning probabilities to any of these outcomes is silly, as they are not five discrete outcomes but rather points on a continuous scale. However, it seems clear to me that ending up on the good side of that scale will take a whole lot of intelligence, character, and collaboration, and I am not sure humanity is ready for that.

