Wednesday, April 19, 2023

Solving Alignment Would Be Terrible?

    If everyone in the world were given a genie that granted three wishes, everything would fall apart. Even if there were no "monkey's paw" problems, and every single person's true intention were granted, chaos would be the only outcome. I'd wish for "make me a million dollars, legally." Someone else would wish for "steal ten million from J.P. Morgan and make it untraceable." Another would wish for "push through legislation that would make it illegal to fish." Plenty of wishes would contradict, and the war would be won by the people with the most powerful genies. Regardless, society as we know it would collapse. This is why I'm wondering if solving alignment may actually be a horrible thing to do right now. Not the problem of finding the objective moral values of the universe and embedding them into all AI, but rather the problem of making an AI follow along with your arbitrary values (also called "wishes"). In a world of aligned AGI that can replicate, if every person is given a personal AGI, absurdity begins. The same wishes are pursued. Labor costs are now essentially zero, and the only real winners are the people with the most powerful genie. We wouldn't give everyone a nuke, just as we wouldn't want a small group of unelected people to have the only nukes. Given that the capabilities of an AGI will increase with time, I don't see how democratizing AGI leads to anything but madness. I also don't see how leaving AGI in the hands of a small group of people leads to anything but madness. I guess I only see madness.

    If anyone on Earth has access to a digital god, things will not go well. Even if that god is not all-powerful, things will not go well. I don't see a massive distinction between AGI and ASI, because at some level a human brain emulated in a computer is already superintelligent. It can think faster, access the entirety of human knowledge ("the internet"), and probably replicate pretty easily. Obviously I care way less about aligning AGI than I do about aligning ASI, but I need to remind myself that they are not so far off or necessarily different. What does all of this mean in the short term? Let's take interpretability, for example. If we knew exactly why a neural net made every decision, would that be a good thing? Would that create massive increases in AI capabilities and trust in AI, and lead to everyone getting a genie even sooner? Maybe not, and maybe having aligned genies is way better than having unaligned genies. But if some unaligned low-level genies start messing up and killing people, maybe we take a big step back as a society. Maybe we outlaw genies, or take a serious look at figuring out our values. If the aligned ride to AGI goes smoothly and then the first deaths occur in an abrupt human genocide, we'll be too late. Whether an ASI ends up being good for humanity or not greatly depends on the values it is following. Even if it is "aligned" to those values perfectly, things will probably go horribly wrong for most people. If you think power corrupts, wait until a small group of individuals determines the values of this Better God. This is why I am pretty hesitant about my idea to massively boost alignment research across the board. Yes, hesitant about an idea I came up with yesterday. Maybe research into corrigibility (figuring out how to turn AI off or change its values) is much more important than all other research. I really have no idea, but it is probably an important conversation to have.
