Aligned Intelligence Solutions: June 2023

Tuesday, June 20, 2023

Mind Crime: Part Two

Historically, I have spent much more time than the average person thinking about eternal life. Based on our current understanding of cosmology, it is likely that the universe is finite and temporal. It has not been around forever, and it is only so big. The universe is expanding at an accelerating rate, and in some scenarios even atoms will eventually be torn apart by the expansion of space. Thus, there is a time limit on the whole thing. The party will end. Eternal life of any sort is impossible. Perhaps we last trillions of years, perhaps through digitally uploading our minds and ramping up our processing power we make those trillions feel like trillions of trillions. But, regardless, there is an end. Our only way to true eternity appears to be outside of known science, and I don't find any current methods particularly convincing. So, no eternal torture, sounds pretty rad, right? Right?

Instead of eternal, everlasting Hell, what if you were just tortured for trillions of years? Still sounds pretty bad, in my humble opinion. Unfortunately, this is still within our power. I don't find the human brain particularly special. It is incredible, yes, and we still don't really understand how it works, but there don't seem to be any physical reasons that we can't replicate the same thing with silicon, eventually. There will probably be a time, potentially in the next few centuries, when we could digitally upload our brains to computers, or build brand new morally significant thinking machines in computers. This is not a road to eternity, but it is still a road to trillions of years of pleasure and/or pain. Life extension of this sort is almost incomprehensible at this point, but that doesn't mean we shouldn't think about it. It is on this time scale that things become particularly significant in utilitarian terms. A bad actor or unaligned AI could fit quite a bit of suffering into that timescale, especially if they can replicate digital minds on a mass scale (and there doesn't seem to be a reason they couldn't). Something on that scale could make slavery or the Holocaust look like a papercut, and I say that with all the recognition of the pain and brutality of those events. Why is this not talked about more? Because we are stuck in the naturalistic fallacy. Ask the average person if they care about the potential pain and suffering of computers, and you will be met with scoffs. You'll probably get the standard response that we dish out all too often: "who cares, they're not human." A dangerous sentence. A sentence that has been responsible for more pain and suffering than any other in human history.

The Near-Alignment Problem

Let's walk through a quick "would you rather." Would you rather have a horrible first date or a great marriage that ends horribly years down the line? In the first scenario, let's assume that you and this person are just simply not compatible. Your date dumps their entire drink on you at the start, and then starts to loudly complain to you about their ex. You are mortified. Your date then proceeds to explain to you that the world is flat, and they mention off-hand that most people are actually lizards in disguise. You find this mildly amusing, until you realize that it's only been ten minutes and you should probably wait a full hour in order to not be seen as rude. Not great, right?

Well, in the second scenario, assume the date goes perfectly. A great relationship of two years blossoms, and pretty soon you wind up married. Your partner seems perfect, and you are madly in love. You and your partner have three incredible children, and everything seems amazing. Then, four years into the marriage, your partner starts acting quite strange. They start to despise you for no discernible reason, and they start pushing your buttons in ways only someone who knows you intimately could. Out of the blue they file for divorce and aim to take the kids. You are blindsided, enraged. But they gaslight you over and over and claim that you are the crazy one. One day, you are rummaging through a drawer in the house when you find a sketchbook. You start to flip through it, and you find that it is full of crude drawings of lizard people, accompanied by rambling, incoherent sentences about you and your children. You begin to realize the obvious: your partner is losing their mind. Even worse, there are children at stake now. The divorce proceedings continue as normal, despite your pleadings. In public, your partner shows no signs of craziness. But sometimes, very infrequently, you catch a flicker of insanity in their eyes.

This is a very long-winded metaphor for AI alignment. I am saying that a relationship that goes 99% right but goes wrong at the very end could be much worse than a relationship that is a non-starter. In the same vein, if AI alignment goes 99% right but then goes wrong at the very end, that could be much worse than AI that fails to be aligned outright. How so? Well, the "first-date" AI could be something like a paperclip maximizer. We probably don't delegate as much authority to such a system, or if we accidentally do, we may notice some warning signs early on and remove authority quickly. The "marriage" AI might do everything we want for quite a long time. Maybe it maps the human value function exactly correctly, and knows exactly what we need. Then, for some unforeseen reason, it puts a negative sign in front of the human value function. Boom, now there is incredible suffering risk. By then, maybe our systems are largely controlled, offloaded. Maybe we are simply too dependent, with too many ties. Maybe we don't have the power to change course. By then, maybe the entire human race is on the line. Maybe we are in too deep.

Tuesday, June 13, 2023

The Lone Genius

Either humanity will solve AI alignment, or we won't. Whether we do or don't depends largely on the type of alignment ecosystem we build out beforehand. Not only is deep learning difficult, but it requires a large number of resources (algorithmic expertise, computational resources, training data) and thus a large number of human inputs. Humans will write the algorithms, humans will run the computational facilities and build the GPUs, and humans will create and clean the required training data. I would compare this creation to that of the atomic bomb in a lot of ways. You need a certain level of research progress: "hey, atoms make up everything and you can split them, which causes a chain reaction, which we can use to make a really, really big bomb that could kill thousands of people." This goes from the theoretical to applied science in a messy way: "hey we are at war with the Germans and they have really smart scientists. If we don't make this super-weapon, they will." For the atom bomb, the full industrial weight of the United States military was put behind the development of this super-weapon. And then, at some point it gets applied: "well we just dropped the weapon a couple of times and killed hundreds of thousands of people." During this process, an entire field of science (nuclear research) was involved. An entire landscape of military might was utilized. The development and testing of the bomb required an entire complex of bomb-able real estate, industrial machinery, and American workers.

Contrast this with chemically engineered pandemics. As we saw in the 2001 anthrax attacks, a very small number of people (or a single person) can create a bioweapon. Yes, decades of research in chemistry and biology will pave the way for such weapons (please for the love of god stop publishing research on how to make vaccine-proof smallpox), but an individual terrorist, if given the right skill set, could synthesize a highly transmissible and deadly virus. Maybe some state actor vaccinates its population and then releases super-smallpox on the rest of us, but it is more likely that a single individual with a school-shooter mentality learns biology. This is something we need to protect against (again, open source is good for some software, not chemically engineered bioweapons of mass destruction).

AGI, at this point in human history, is likely to be much more similar to nuclear weapons. The work of an entire field of researchers and an entire industry of engineers will lead to the development of AGI. Such a massive set of training data and such a large amount of compute is simply not accessible to lone individuals. There is a certain romanticization of the "lone genius": people such as Einstein, who contributed massively in their field, breaking away from the standard line of thinking and jumping to revolutionary conclusions seemingly overnight. There are also the engineers with massive individual impact, such as Linus Torvalds (creator of the Linux operating system and Git). However, even these impacts happen within a certain ecosystem, and are followed up by critical additions from their spiritual descendants. In some fields of science, a lone genius can create (Linux) or destroy (Smallpox 2.0). In the world of AI, it seems we are stuck with organization-level change. This can be a blessing, or it can be a curse. Who do you trust more, organizations (companies, governments, NGOs), or individuals (Einstein, Linus, the unknown individual who killed five people via anthrax)?
