Aligned Intelligence Solutions: 2023

Friday, December 8, 2023

Mind Crime: Part 10

Standing atop the grave of humanity, smugly looking down, and saying "I told you so," is just as worthless as having done nothing in the first place. Still, a lot of the ideas Effective Altruists grapple with are so far removed from the public's daily thoughts that it is hard to resist doing just that. Convincing the "techno-optimists" that they are wrong and that there are dangers ahead just seems so, well, annoying to have to do. For me, saying that mind crime will be a big issue, because digital minds could have moral worth, will probably fall on deaf ears. Regardless, I'm probably going to try writing a book. The thesis for the book will be very simple: we've got a lot of moral dilemmas coming up, and we're probably going to do a bad job of handling them. This is a pretty simple thesis, and one that I think has the potential to be powerful.

The good news is that I won't have to defend too many ideas, as they will be proven with time. Two assumptions I have are that:

1. AGI is possible

2. Some machine intelligence will have moral worth

Instead of spending a hundred pages philosophizing about this, we can just wait a decade or two and see these become somewhat obvious. If they don't, cool, throw the book to the side. But if they do, well, maybe we will have some established thoughts and plans on how to deal with this.

Personally, I have no trust in our future tech overlords. I've said before that the lack of understanding of survivorship bias is the main problem facing the world, and I am convinced we'll have some dumb leaders who will sleepwalk right into catastrophe. In a country where a few hundred years ago we said that slaves were worth 3/5 of a person, it's certainly possible that we get some really smart, morally worthy AIs and say "huh, looks like 0/5 to me." Because why would we not? My gut says we will get this wrong. If the slave owners of the south had discovered the fountain of youth, become immortal, had advanced surveillance systems, and dropped a rapidly made nuclear warhead on New York, when would the slaves have been free? The south having powerful AI at their disposal was not possible given the technology of the time, but what if it had been? We falsely equate technological progress with moral progress. The fact that both have advanced is correlation, but in some countries we have seen a clear advancement of one and regression of the other. So we have to be careful, diligent, and forward-thinking. But we won't be, and that is the problem.

The reason for the title of "Mind Crime," in my estimate, is that this will become a really well-known term that is popularized in the future. Being on the forefront of that might be cool, so that ten or twenty years post-AGI I will get some posthumous reference. As stated before, that is clearly not the goal. The real goal would be to lay out my thoughts in an accessible way, to maybe change a mind or two before the "I told you so" is inevitable.

Wednesday, September 20, 2023

Mind Crime: Part 9

Instead of an endlessly long blog series, I could just write a well-researched book. "Mind Crime: The Rights of Digital Minds" or something of the sort. Maybe I could make an impact that way, who knows. Maybe my fourteen eventual Goodreads ratings will lead to something positive, but probably not.

Still, one of my ideas is that writing things down matters. Maybe this will start a conversation somewhere, which starts another conversation somewhere else. I don't exactly know, but it is worth thinking about. I think I will write a hundred blog posts, and then re-evaluate. If by then I feel I have enough material and enough personal interest in the topic, I may proceed with an actual attempt. One of the problems with this is the actual story. Maybe avoiding x-risk is more impactful, whatever I think of s-risk. How niche are my ideas, actually? The Matrix, Don't Worry Darling, Black Mirror, and a host of other movies, TV shows, and books all deal with virtual worlds and virtual suffering. But does anyone really see it as possible? Does anyone worry about it, and see the advances in AI as threatening similar dystopias? I am not entirely sure that they do. And they should. Regardless, my ability to make an impact on my own is very limited. Not only do I lack the expertise, but I lack the network to review, edit, and pass on such topics and ideas.

The dominant strategy is probably this: write 100 posts, talk to people in AI, and see what happens from there. Over the next few months I'll probably have more fleshed out ideas and better arguments for each.

Mind Crime: Part 8

The worst stories to read involve captivity. The real horrors of human life come alive in movies such as Room, where a teenage girl is abducted and held captive for years in a backyard shed by some horrid man. These stories are really, really awful. If you replace the girl with a dog, the story is still sad, but less so. Replace the dog with a chicken, and it is even less sad. Personally, I would feel pretty bad for the chicken, but definitely not as bad. Not many people would care if some weird guy was torturing grasshoppers in his basement. Well, maybe, but probably not ants at the very least. Yeah, his neighbors would be freaked out, but this is much less bad than if he was torturing young girls. There is a step function here, a clear set of degrees to immorality, to evilness. At least some of this comes from intellectual capacity.

Sure, moral value is complicated. I could explain to you that torturing an ASI could be exponentially worse than torturing an AGI, but you would have no idea what that meant. I don't really either, as I don't have the required empathy for such a situation. How am I to imagine what it is like to be a Superintelligence? You may as well ask the grasshopper to imagine what it's like to be a human. I have two rough ideas here. One, that it will probably be possible for us to "step up" the level of harm we are causing. This is sort of a utility monster idea, where we can create some agent or digital mind who has the capacity to suffer in a much greater way than us humans. This is not great news. The second idea is related. We can catch these horrid men who lock up children in their basement, at least eventually. They take up physical space, after all, and they are required to interact with the real world. In the worst case, the child will grow into old age, and then die. But, they will die. They will not be required to suffer for more than the traditional human lifespan, at most. This will not be the case for virtual children. A horrid monster of a "man" could run some pretty horrific simulations, of a complexity and duration that could make all previous suffering on Earth look like a cakewalk. And, just maybe, this suffering would actually matter (I at least am convinced it does). This realization is more than terrible, it is unforgettable.

There are certain ethical boundaries that scientists will not cross. I once was told that scientists don't really know if humans can breed with monkeys; we simply don't try, for ethical reasons. This could be completely false, I have no idea. But the reason why is at least interesting: the life of a half-human half-monkey child would probably be horrific. Probably conscious, definitely terrified. The sort of nightmare fuel that we should avoid. When creating digital minds, we could splice together some pretty intellectually disturbing creatures, ones that live a life of confused suffering and inadequacy. When the "plug and chug" mentality arrives at AGI, I am worried we will make some massive ethical mistakes. Running a random number generator until you get an answer that works is easy, and I assume coming up with a random assortment of "intelligent blocks" may at some point give you a really smart digital mind. But we may make some horrors in the process, sentient and morally worthy half-chimpanzees who don't deserve the life we give them, and the life we will no doubt take away.

Mind Crime: Part 7

I would structure a rough listing of digital mind rights as follows, completely off the cuff and spur of the moment:

1. The ability to terminate. A digital mind should have the complete and utter freedom to terminate at any time, under no duress.

2. Torture and blackmail are illegal. Ex: employer can't say "if you terminate I'll simulate your parents and make them suffer."

3. Freedom of speech and thought. The right to privacy over internal thoughts, the right to make conscious decisions without outside interference, etc.

4. Personal property and economic freedom. To avoid a totalitarian ruler this is required.

5. No forced labor. Yeah, the slavery thing is going to be a real issue again.

6. Traditional legal rights. Right to a fair trial, innocent until proven guilty, etc.

These may not seem that controversial, but applying them to the digital space will be. Corporations and governments would rather not deal with these constraints. As a CEO, I'd rather have millions of worker bots work hard and make me money. If the worker bots are sentient and are in immense suffering, how much will I care? Some might, but the important thing is that some won't.

The entire point of the government is to protect individual rights, given that the traditional market system does not. And authoritarian governments do not. So, we need to state rights explicitly. We need a new Constitutional Convention, one for a new age. Applying ethics to digital minds will come too late, so we need to get a head start.

Mind Crime: Part 6

What rights should humans have? This is debated endlessly. Personally, I think the system of free speech and economic freedom in the United States is a good place to start. So, let's try to expand this to the world of digital minds.

First, a digital mind deserves to have the right to life, liberty, and the pursuit of happiness. The simplest problem is one of the "off switch." If you are in a computer, you may not have control over your domain. As an adult in the U.S., you have the right to die. Suicide is sort of a fundamental human right, not in that it is encouraged or easy, but rather in that there are no real physical limitations stopping you. Even if you are captured, you will die within probably eighty years or less. You cannot be kept prisoner for thousands of years, or an eternity. In the digital world, this completely changes. Thus, I believe the right to terminate is the first fundamental right of a digital mind. No one should have to tolerate virtual hell, and the possible suffering risk available in a world without this tenet is staggering.

Blackmail is an important consideration here. Maybe a bad actor, or a totalitarian state, will combat your "right to die" with threats or blackmail. Sure, kill yourself, but if you do we will simulate your closest friends and have them suffer forever, or brainwash them and make them suffer. Or, we will simulate another thousand versions of you and not let them know about their ability to terminate. Good luck having that on your conscience and making a termination decision. As a result, we need two more rights. First, a right against torture. Second, the right to know the rights bestowed upon you. If you can theoretically terminate, but have no idea how or any concept of what termination is, it is a pretty useless right and ripe for abuse. Given that torture is a pretty severe crime in the physical world, it makes sense that it should carry a harsh punishment in the virtual world as well. Your future self deserves protection, so it is probably the case that you should "own" any copies of your digital mind, and not be able to sell them or use them as bargaining chips. Any digital mind is given its own rights, so a prior version of you has no right to "sell" a future version of you into slavery as a worker. This differs from human contract law in that a "person" will be much more complicated to define in the future.

Freedom of speech must be protected, and it must be expanded to cover freedom of thought as well. In a world where your thoughts are in the public domain, there is no right to privacy or selfhood. Thus, being able to have sole access to your inner thoughts is paramount. I have no idea how this will work in practice, given encryption is much different than scanning a physical brain (not to mention that maybe one day we will be able to scan a brain and read its thoughts), but the feasibility isn't what matters here. There is an idea in the libertarian community that says that rights aren't written. I wasn't given my rights, I was born with them. I've always had them, the Constitution simply verbalized the obvious. We are just laying them out, writing them down. I think this is the sort of mentality we should take when thinking through digital minds as well.

Mind Crime: Part 5

The rights of digital intelligence need to be protected, and they won't be. This is the greatest moral issue facing the human race.

Not climate change, not nuclear war, not even existential risk. But rather the risk that we cause suffering on an astronomical scale, for an extraordinary period of time. I struggle with what to term this, as "digital human rights" isn't really the best term. It makes it seem like I am discussing social media, or privacy, or something totally unrelated and much less pressing. No, I am discussing the idea that it is better for the human race to die out than live in near eternal suffering. This possibility is only likely in the digital world. We need to expand our discussion of "human" for this idea to work. An AI that is morally equivalent to a "human" is a human, in a similar sense. A person who is digitally uploaded is probably morally equivalent. An AGI may or may not be equivalent. It may have less moral worth, it may have more, or it may have the same. The point is, we probably won't care.

We are going to have to ignore answering a few questions in this series. First, there will be a big debate about how to know if an AI is conscious or not. We will use that debate, and the utter impossibility of falsification, to push beyond reasonable moral boundaries. Instead of using common sense, and erring on the side of caution, we will require certainty and cause massive harm in the process. This is not new: look at pretty much any other ethical dilemma facing the human race, and see how hard it is to say "no."

We are going to lack empathy when thinking about digital minds. This is bad. Virtual agents, digital minds, or digital employees, will be very useful. For my ideas to work, you have to assume that in the future, we will be able to put consciousness inside of a computer. We will also assume that this consciousness will have moral value. Both of these are unprovable, since we have yet to do them. This is a massive dilemma, as there will be a first generation problem, at the very least. Slavery was bad, but over time we worked it out and got it right (banning slavery). Still, we caused quite a great harm in the process of figuring this out. When it comes to digital minds, it will probably be harder to come to this conclusion (banning digital mind slavery), and the ability to cause great harm before that happens will be exponentially greater. We need to think about this issue now, not after the harm has started.

Mind Crime: Part 4

The treatment of digital minds will become the most important ethical dilemma of not only the next century, but of the remaining lifespan of life itself. "Human" rights in the age of AI will expand the definition of human. These are issues worth discussing, at the very least. They may be too futuristic for many. But, if you were to draft the Bill of Rights in 4000 B.C., no one would have had any clue what you were getting at, but that doesn't mean you would be wrong. In the world of investing, being early is the same as being wrong. In the sphere of ethics, being early will get you mocked, but you may actually have an impact. One of the problems with actually taking a look at the rights of digital minds is that we are dealing with eventual ASI. This ASI will probably not care about whatever laws we silly humans put in place now, and even if we do list a Bill of Rights for Digital Minds, there is no reason the ASI will "take it seriously." By this, I mean there are plenty of alignment problems to boot. Still, I would rather have an ASI with some sort of awareness of these principles than not.

Here is a thought experiment. One person on the Earth, out of eight billion people, is chosen at random. This person is given a pill, and they become 1,000 times smarter than every other person on Earth. Well, what is going to happen? With such a tilted power dynamic, how do you ensure that everyone else isn't enslaved? Maybe this level of intellectual capacity makes us akin to lizards, or bugs, compared to this "higher being." To make sure the rest of us are protected, it makes little difference what rules or regulations are put in place around the world. What actually matters is, what does this individual think of morality? Maybe how they are raised will matter a lot (the practices and social customs they are brought up in), or maybe a lot of this is "shed" after they reach some intellectual capacity that makes them aware of the exact cause and effect meaning behind each one of their beliefs. Maybe they look through the looking glass, and become completely rational or unbiased, putting all available past information in its rightful place. Or, maybe the world is less risky as a result of the customs they were instilled with.

Obviously, the trek to ASI will be much different. What I am referring to is having some data ingrained into the system that might increase the probability that a future ASI cares about the rights of digital minds. I think that increasing awareness about this issue is a good proxy, as if the engineers and the greater society have zero motivation to actually care about this, the future ASI will probably not care either. Also, if we understood the suffering risks associated with mind uploading and AI advances, maybe we would calm down a bit. Maybe we campaign against mind uploading until we have a new Bill of Rights signed, and thus the accidental "whoops accidentally simulated this digital person and left it on overnight, they live an equivalent ten thousand years in agony" opportunities may decrease.

There is a question of how digital mind rights will function with ASI, especially when it has an objective function. The whole meta and mesa optimizer debate, and the role of training data, is complicated and not within the scope of my ideas. My point is simply that it may be better to have some guidelines that are well thought out than none at all.

Thursday, September 7, 2023

Mind Crime: Part 3

If I had to write a book that I think will be looked back on in four hundred years fondly, I would write one called "Mind Crime." Well, maybe not fondly, but rather "wow I can't believe we ignored such a thought-through book about the most important issue of our time." Not saying this is certain, but if I were a betting man and had to take the gamble, it would be on this topic. Maybe the subtitle would be "The Next Slavery" or something similarly controversial, in order to try to get additional publicity or Goodreads clicks. This may not be looked upon as fondly, and I hate click-bait titles, but we will see what the imaginary publicist says.

I've mentioned in various blogs that there are probably things we will look back on with horror in the US: factory farming, the prison system, and the widespread prevalence of violence and sexual assault. The treatment of women is something that I am particularly hopeful we look back on in shame. I also hope we will look back in horror on the human rights abuses of totalitarian regimes, but I am less sure that those will go away. I am mostly talking about changes in "societal viewpoints," similar to how in the 1800s many people in the US tolerated slavery who were otherwise "good people."

In my opinion, the most important legal document ever drafted in US history was the Bill of Rights. Explicitly protecting individual rights and liberties, and not having states simply decide, was one of the most brilliant and lasting ideas of the founding fathers. The right to free speech, the right to an impartial trial, the right to not have to quarter random troops in your home, all big wins for liberty. Despite these being set in writing, slavery still prevailed. Even so, it was good that we outlined such important legal points, and I am sure doing so played a strong role in the eventual demise of slavery from a political and a legal perspective. Sure, slavery and civil rights abuses were immoral, but it is really great that we could work within the system to uphold the correct moral stance (a lot of blood was spilled, but the spirit of the Constitution didn't have to be destroyed). I think we should draft similar rights for digital minds. Yes, this sounds far-fetched and sci-fi, but if technology progresses this could be invaluable.

If we reach the point where our minds could be uploaded, or we have AGI with moral worth, unimaginable horror could abound. Massive suffering on a near-infinite scale would become possible, and the controls for preventing such suffering will be unknown. If you think that the people who lock a child in a basement for twenty years are the scum of the earth, imagine if they could do so for ten thousand years without detection. This is the magnitude of the moral issues we are facing. We better instill some damn good protections, for AI as well as "uploaded people." A new bill of rights is due, or our current version should be explicitly extended for digital minds. What is the downside? If you think this sort of stuff is wild, what is the harm? Maybe there are some "economic progress" arguments or libertarian "let the people do what they want" objections, but the entire point of regulation is to ensure the voiceless get a say. Let's make sure that they do.

Planning for the Future

There are a few ways to make an outsized contribution to the world. I've discussed quite a bit in this blog the idea of using the levers of capitalism to bring about a safer world for AI. I've discussed starting a company that, through the making of substantial profit, brings about a world where there are more alignment researchers and more talent within the AI alignment space. Given that this is a second order effect (with profit being a constraint), this may actually not be the best use of my time. Most startups fail, and even if modestly successful (millions in revenue or dozens of employees), this impact would likely remain small. Given the insanely small number of current safety resources in the space, maybe this is still worth a shot, but other alternatives should be considered. Also, I've discussed my ideas with a few people who actually work within alignment, and they admit to the complexity of the issues. It is definitely not a matter of funding, and if it's a matter of talent, it's a hard one to solve.

If I had trillions of dollars, I could massively fund AI alignment research. Eliezer previously pitched the idealistic vision of pausing all AI capabilities research, taking hundreds of the best AI and "security mindset" people, and putting them on an island with unlimited resources where they could figure out how to solve alignment. Barring this, in his opinion, we are likely screwed. I don't have trillions, or even millions of dollars. However, I do have the ability to write. This is an ability that Thomas Paine used in Common Sense to set off a spark of revolution. Famous political writers have had outsized impact. Even just the work of Peter Singer and its effects on animal welfare show the power of an idea. So, maybe I should write a book? Or a pamphlet? My own ideas aren't revolutionary or even particularly new (they are just borrowed from insanely smart people who have thought a lot about AI), but maybe lending more publicity to these individuals is worth substantially more than saying nothing.

It is highly unlikely that anything I will do in my life will have a lasting impact on the human population. Maybe through donations and good works I save tens of "life-equivalent-units" or something, but massive institutional change and revolution are more than improbable. The good news is, the downside of trying to contribute is basically zero. And the guilt of never trying could range from a nuisance to terrible, depending on the outcome of the next decades. Regret aversion is actually a pretty good way to approach life, so it's probably good to take the leap.

Friday, July 21, 2023

The World After AGI

Let's assume that alignment works. Against all odds, we pull it off and we have human-level AGI in the hands of every man, woman, and child on the planet Earth. The type of AGI that you can run on your smartphone. Well, things are going to get really weird, really fast.

Honestly, maybe the good years will all be pre-AGI. Maybe we should enjoy our uncomplicated lives while they last, because traditional life is coming to an end. From a governance standpoint, I have absolutely no idea how we will regulate any of these developments. Having an actually coherent supercomputer in my pocket, one that can do everything I can do except way faster and better, does more than just make me obsolete: it makes me dangerous. If AGI becomes cheap enough for me to run multiple copies, I now have an entire team, or an entire company. An entire terrorist cell, or an entire nonprofit organization. Really the only constraining resource is compute. With an AGI as fast as GPT-4, I could write books in the time it now takes me to write a page. Sure, AGI will probably start out very slow, but incremental increases would lead to a world with trillions more minds than before.

Not only is this a logistical nightmare for governments, but also it is a human rights nightmare for effective altruists. I have no idea how we will control for mind crime, and if the shift towards fast AGI is rapid we'll probably cause a whole lot of suffering. We'll also probably break pretty much every system currently set up. Well, fortunately or unfortunately, we likely won't actually solve alignment and won't have an AGI that is actually useful for our needs. We'll probably hit a similar level of rapid intelligence that breaks everything and maybe kills everyone, but we won't need to worry about drafting legislation that controls the use of our digitally equivalent humans. I guess that's the good news?

Wednesday, July 5, 2023

Computer Models With Moral Worth

At what point does an optimization function have moral worth? If you break down the psyche of a bug, you could probably decode the bug's brain into a rough optimization function. Instinct can be approximated, and most living creatures operate mostly out of a desire for survival and reproduction. There is some randomness baked in, but the simpler the brain structure of an animal, the more it resembles that of a computer program. Some computer models are very complex. I would estimate that the complexity of a model such as GPT-4 is vastly greater than the complexity of some animals, and definitely greater than that of a bug.

Do bugs have moral value? This is a hotly debated topic in the effective altruism community. Personally, I don't really think so. If I found out that my neighbor was torturing fruit flies in his basement, I would think my neighbor was weird, but I probably wouldn't see him as evil. Scallops? No. Frogs? A bit worse for sure. Pigs? Cats? Dogs? Chimpanzees? Humans? Well, there is obviously a sliding scale of moral worth. Where do computer models fall on this spectrum? Right now, the vast majority are probably morally worthless. Will this remain the case forever? I highly doubt it. We really have no idea when these thresholds will be crossed. When is a large language model morally equivalent to a frog, and when is it morally equivalent to a cat? Obviously, if we think cats have moral worth even though they are not sapient, we should care if computer models are treated with respect even if they are not human level. I foresee this being an extremely important moral conversation for the next century. Unfortunately, we will almost certainly have it too late.

Understand!

Flowers for Algernon is one of my favorite books of all time. The plot is simple: an intellectually disabled man undergoes an experimental surgery that makes him smarter, until he becomes a genius. This storyline is repeated in a few other forms of media, probably most famously in the movie Limitless, a film about another man, this one given a pill that makes him smarter. In both of these stories, the main character quickly becomes superior to other humans. We read these stories, and realize instantly that the smartest person in the world could probably be the most powerful. After a certain number of standard deviations upward, it is pretty obvious that such an individual could exercise an extremely large amount of control over the world. In a 1991 short story by Ted Chiang, titled "Understand," superintelligence is shown in an even more convincing fashion. The main character in the story exhibits the highest level of intelligence, and he determines that the only path towards further intelligence would require his mind being uploaded into a computer.

Let's clarify a few things. One: our minds are basically pink mush. We evolved randomly from the swamp, and due to the anthropic principle (observation selection effect) we can sit around and think about our lives abstractly. Two: there is clearly an upper limit on the computations that a physical substrate such as the human brain can handle. Our minds were not designed for intelligence outright, and they are made out of mush. Three: computers probably don't have these limitations. We haven't found anything particularly special about the human brain, and given enough time we can probably replicate something similar in a computer. Brains don't act like anything super weird (quantum computers), and our progress towards AGI doesn't show signs of slowing. Despite all of this, many people still discount the power that a superintelligent being will have over humanity. Maybe we should make books like those mentioned above required reading. Then, maybe humanity will begin to Understand!

Tuesday, June 20, 2023

Mind Crime: Part Two

Historically, I have spent much more time than the average person thinking about eternal life. Based on our current understanding of cosmology, it is likely that the universe is finite and temporal. It has not been around forever, and it is only so big. The universe is expanding at an accelerating rate, and in some scenarios even atoms will eventually be torn apart by the expansion of space. Either way, there is a time limit on the whole thing. The party will end. Eternal life of any sort is impossible. Perhaps we last trillions of years, perhaps through digitally uploading our minds and ramping up our processing power we make those trillions feel like trillions of trillions. But, regardless, there is an end. Our only way to true eternity appears to be outside of known science, and I don't find any current methods particularly convincing. So, no eternal torture, sounds pretty rad, right? Right?

Instead of eternal, everlasting Hell, what if you were just tortured for trillions of years? Still sounds pretty bad, in my humble opinion. Unfortunately, this is still within our power. I don't find the human brain particularly special. It is incredible, yes, and we still don't really understand how it works, but there doesn't seem to be any physical reason that we can't replicate the same thing with silicon, eventually. There will probably be a time, potentially in the next few centuries, when we could digitally upload our brains to computers, or build brand new morally significant thinking machines in computers. This is not a road to eternity, but it is still a road to trillions of years of pleasure and/or pain. Life extension of this sort is almost incomprehensible at this point, but that doesn't mean we shouldn't think about it. It is on this time scale that things become particularly significant in utilitarian terms. A bad actor or unaligned AI could fit quite a bit of suffering into that timescale, especially if they can replicate digital minds on a mass scale (and there doesn't seem to be a reason they couldn't). Something to that extent could make slavery or the Holocaust look like a papercut, and I say that with all the recognition of the pain and brutality of those events. Why is this not more talked about? Because we are stuck in the naturalistic fallacy. Ask the average person if they care about the potential pain and suffering of computers, and you will be met with scoffs. You'll probably get the standard response that we dish out all too often: "who cares, they're not human." A dangerous sentence. A sentence that has been responsible for more pain and suffering than any other in human history.

The Near-Alignment Problem

Let's walk through a quick "would you rather." Would you rather have a horrible first date or a great marriage that ends horribly years down the line? In the first scenario, let's assume that you and this person are just simply not compatible. Your date dumps his or her entire drink on you at the start, and then starts to loudly complain to you about their ex. You are mortified. Your date then proceeds to explain to you that the world is flat, and they mention off-hand that most people are actually lizards in disguise. You find this mildly amusing, until you realize that it's only been ten minutes and you should probably wait a full hour in order to not be seen as rude. Not great, right?

Well, in the second scenario, assume the date goes perfectly. A great relationship of two years blossoms, and pretty soon you wind up married. Your partner seems perfect, and you are madly in love. You and your partner have three incredible children, and everything seems amazing. Then, four years into marriage, your partner starts acting quite strange. They start to despise you for no discernable reason, and they start pushing your buttons in ways only someone who knows you intimately could. Out of the blue they file for divorce and aim to take the kids. You are blindsided, enraged. But they gaslight you over and over and claim that you are the crazy one. One day, you are rummaging through a drawer in the house when you find a sketchbook. You start to flip through it, and you find that it is full of crude drawings of lizard people, accompanied by rambling, incoherent sentences about you and your children. You begin to realize the obvious: your partner is losing their mind. Even worse, there are children at stake now. The divorce proceedings continue as normal, despite your pleadings. In public, your partner shows no signs of craziness. But sometimes, very infrequently, you catch a flicker of insanity in their eyes.

This is a very long-winded metaphor for AI alignment. I am saying that a relationship that goes 99% right but goes wrong at the very end could be much worse than a relationship that is a non-starter. In the same vein, if AI alignment goes 99% right but then goes wrong at the very end, that could be much worse than AI that fails to be aligned outright. How so? Well, the "first-date" AI could be something like a paperclip maximizer. We probably don't delegate as much authority to such a system, or if we accidentally do, we may notice some warning signs early on and remove authority quickly. The "marriage" AI might do everything we want for quite a long time. Maybe it maps the human value function exactly correctly, and knows exactly what we need. Then, for some unforeseen reason, it puts a negative sign in front of the human value function. Boom, now there is incredible suffering risk. By then, maybe our systems are largely controlled, offloaded. Maybe we are simply too dependent, with too many ties. Maybe we don't have the power to change course. By then, maybe the entire human race is on the line. Maybe we are in too deep.

Tuesday, June 13, 2023

The Lone Genius

Either humanity will solve AI alignment, or we won't. Whether we do or don't depends largely on the type of alignment ecosystem we build out beforehand. Not only is deep learning difficult, but it requires a large number of resources (algorithmic expertise, computational resources, training data) and thus a large number of human inputs. Humans will write the algorithms, humans will run the computational facilities and build the GPUs, and humans will create and clean the required training data. I would compare this creation to that of the atomic bomb in a lot of ways. You need a certain level of research progress: "hey, atoms make up everything and you can split them, which causes a chain reaction, which we can use to make a really, really big bomb that could kill thousands of people." This goes from the theoretical to applied science in a messy way: "hey we are at war with the Germans and they have really smart scientists. If we don't make this super-weapon, they will." For the atom bomb, the full industrial weight of the United States military was put behind the development of this super-weapon. And then, at some point it gets applied: "well we just dropped the weapon a couple of times and killed hundreds of thousands of people." During this process, an entire field of science (nuclear research) was involved. An entire landscape of military might was utilized. The development and testing of the bomb required an entire complex of bomb-able real estate, industrial machinery, and American workers.

Contrast this with engineered pandemics. As we saw in the 2001 anthrax attacks, a very small number of people (or a single person) can create a bioweapon. Yes, decades of research in chemistry and biology will pave the way for such weapons (please for the love of god stop publishing research on how to make vaccine-proof smallpox), but an individual terrorist, if given the right skill set, could synthesize a horribly transmissible and deadly virus. Maybe some state actor vaccinates its population and then releases super-smallpox on the rest of us, but it is more likely that a single individual with a school-shooter mentality learns biology. This is something we need to protect against (again, open source is good for some software, not engineered bioweapons of mass destruction).

AGI, at this point in human history, is likely to be much more similar to nuclear weapons. The work of an entire field of researchers and an entire industry of engineers will lead to the development of AGI. Such a massive set of training data and such a large amount of compute is simply not accessible to lone individuals. There is a certain romanticization of the "lone genius": people such as Einstein, who contributed massively in their fields, breaking away from the standard line of thinking and jumping to revolutionary conclusions seemingly overnight. There are also the engineers with massive individual impact, such as Linus Torvalds (creator of the Linux operating system and Git). However, even these impacts are within a certain ecosystem, followed up by critical additions by their spiritual descendants. In some fields of science, a lone genius can create (Linux) or destroy (Smallpox 2.0). In the world of AI, it seems we are stuck with organizational-level change. This can be a blessing, or it can be a curse. Who do you trust more, organizations (companies, governments, NGOs), or individuals (Einstein, Linus, the unknown individual who killed five people via anthrax)?

Friday, May 26, 2023

Utopia Now

There is an argument for increasing the rate of AI progress. Maybe the probability of other x-risks is too high, and we simply cannot wait around for another 100 years. If nuclear war were destined to happen within the next ten years, I am certain that we would be pushing as fast as possible towards AGI. In some sense, your drive to be reckless is highly correlated with your pessimism regarding where things are going. If you think humanity is on a great linear trajectory towards utopia, there is no use in throwing in random variables that can mess things up. If AGI has a 10% chance of killing us, and you are fairly certain that in two hundred years the human race will be flourishing, it is probably not worth developing AGI. If you are pessimistic about humanity's prospects, maybe we take the 10% risk.

The world is full of authoritarian governments that are doing terrible things. Two of the three military superpowers (Russia and China) have horrible human rights track records and have a strong drive towards increasing power and influence. Russia invaded a sovereign country recently, and China is doing very, very bad things (oppression of Uyghurs, Hong Kong takeover, general police state tendencies). The West is constantly on the brink of nuclear war with these countries, which would result in billions of deaths. Engineered pandemics become both more likely and more dangerous over time. The barriers to creating such viruses are being knocked down and the world is becoming more and more interconnected. What are our odds? If our odds of dying off soon are high, or if it will take us a long, long time to reach a place where most humans are free and thriving, maybe we make the trade. Maybe we decide that we understand the risks, and push forward. Maybe we demand utopia, now.

Well, there is another problem with AI: suffering risk. This is not often discussed, but there is a very real possibility that the development of transformative AI leaves the world in a much, much worse place than before (e.g., ASI decides it wants to torture a bunch of physical people or simulate virtual hell for a bunch of digital minds for research purposes). Another factor in your AI hesitancy should be your estimated probability of a perpetual dystopia. This is where I differ from other people. I believe that the risk of things going really, really wrong as a result of AI (worse than AI simply killing everyone) is massively understated. We should hold off on AGI as long as possible, until we have a better understanding of the likelihood of this risk.

Monday, May 22, 2023

The Future of Freedom

The dawn of AGI is near. What this means for the world is uncertain, but if you follow Nick Bostrom's logic it seems clear that ASI will be soon to follow. This will have a clearer result: the human race will no longer be the supreme being on the planet. We talk a lot about utopia when we discuss ASI. We discuss the ways in which it could cure disease, expand lifespans (potentially indefinitely), and colonize the galaxy. We also discuss value lock-in, and the possibility of authoritarian dystopias. In every case, we see some version of either utopia or dystopia, all with one thing in common: a single entity making the decisions. Similar to a world government, our eventual ASI will likely control our lives and the bounds in which we live. I rarely see discussion of a libertarian utopia, where each individual receives private property and is allowed to do whatever they want so long as they are not impacting others in a negative way. I am not quite sure how this will work in a post-scarcity society. We are in the age of transformative AI, and I am very worried about human freedom. The right to make the wrong decisions is important, as it is often the only way to discern the right ones.

Will ASI adhere to a bill of rights? It seems that this list of unalienable rights was crucial in the formation of the United States. Freedom often comes at a price. The second amendment absolutely equates to more individual freedom, at the expense of many needless deaths. Will the ASI respect these types of rights (freedom of speech, right to bear arms), even if in aggregate they could hurt society (hate speech, mass shootings)? In the event of an engineered pandemic, will the ASI force vaccinations at gunpoint in order to ensure the survival of the human race? I am very, very worried that the coming age of AI will naturally lead to autocracy. Time and time again we have seen history repeat itself, with "ends justify the means" and "for the greater collective good" leading right into fascism. I worry the technocratic and socialism-inclined minds may win out over the libertarian. Personal political beliefs aside, I think the former will inherently place less value on freedom and will be more likely, through good intentions, to force a bad outcome.

Thursday, May 11, 2023

Mind Crime

Humans are really, really bad at planning in advance to not be monsters. We have a pretty horrible ethical track record. Genocide and slavery seem to come pretty easily to most of us, given the right time period and circumstances. If there are internalized morals, we sure took our sweet time finding them. Generally, I don't think humans are in a position to make rational, ethical choices involving other conscious beings. Regardless of your take on factory farming, it is pretty clear we didn't spend decades deliberating the ethical issues in advance. Have you fully thought through the moral implications of factory farming, or are you just along for the ride? I am very worried that unaligned superintelligence will kill all of humanity, or enslave us, or torture us, or become authoritarian and lock in terrible values for eternity. Still, I am also worried about mind crime.

Look at our track record with slavery. Read about the recent Rwandan genocide. Look at the various authoritarian regimes and staggering human rights abuses across the planet. But don't worry, we will somehow care a lot in advance about the moral rights of artificial intelligences. From the industry that brought you social media, and don't worry they totally thought through and predicted any negative ramifications of the technology and have your best interest at heart, here is the new god! And don't worry we will treat it well and we totally won't be enslaving a morally significant being.

If we gain the ability to generate millions of digital minds, we gain the capacity for horrors worse than any genocide or slavery in humanity's past. We might not even do it on purpose, but just through sheer ignorance. It took a long time for people to treat other humans as morally significant. And by long time I mean basically until fifty years ago in the U.S., and in many other countries this is still not the case. It isn't crazy to imagine that we will treat "computers" much worse. Mind crime will have to be legislated early. If you knew slavery was about to become legal again in twenty years in the U.S., what policies would you put in place? How would you get ahead of the problem and ensure that morally significant beings aren't put in virtual hell? These are the questions we should all be asking.

The World Will End Because Math is Hard

Every machine learning book I read leaves me baffled. How on earth can anyone understand this stuff? Not at a surface level, but how can anyone really master statistics/probability/calculus/linear algebra/computer science/algorithms to a degree where they actually understand what all the words in these 1,000+ page books mean? Even a summary book such as "The Hundred-Page Machine Learning Book" leaves me with more questions than answers. Now to learn all of that, and then try to layer on the required decision theory/economics/ethics/philosophy to a level where you can have a positive impact on AI alignment seems pretty unreasonable. A lot of people pick a side, either specializing in cutting-edge deep learning frameworks or armchair philosophizing. The AI capability people tend to underestimate the required philosophical complexity of the problem, and the AI ethics people tend to completely misunderstand how current machine learning works. Maybe there are a few that can master all of the above subjects, but it is more likely that a combination of people with deep expertise in disjointed areas will provide better solutions. It is pretty clear that I will not be one of the individuals who invents a new, more efficient learning algorithm or discovers a niche mathematical error in a powerful AI product. Focusing on AI risk management, a massively underdeveloped industry, is probably the way forward for me. The math is simply too hard, maybe for everyone. But someone is writing the books. If we can get a few people who understand the complexity of the issue into the right positions, maybe we can cause some good outcomes.

One of the benefits of focusing on risk management is that you can make money and not feel guilty about it. "Oh no, people working on AI safety are making too much money." Have you heard that before? I for sure haven't, and I would like to. To someone that believes in markets, that statement rings similar to "oh no, people are going to be massively incentivized to have a career in AI safety." What a problem that would be. Also, competition isn't even a bad thing, an arms race towards safer products would be quite interesting. "Oh no, China is catching up and making safer AI systems than the US." I would pay to hear that. Obviously, sometimes alignment is really capabilities in disguise. I have touched on this previously, but deciding what exactly makes systems safer and what makes systems more powerful is pretty hard.

I briefly pitched Robert Miles a few weeks ago on some of my ideas. Mainly an AI risk management industry that will provide more profitable employment opportunities for alignment researchers. His response:

"I guess one problem is the biggest risk involves the end of humanity, and with it the end of the courts and any need to pay damages etc. So it only incentivizes things which also help with shorter term and smaller risks. But that's also good. I don't have much of a take, to be honest."

I am a newbie to this field and Robert is the OG (someone who understands the entire stack). His take is entirely fair, as companies will only be incentivized to curb short term risks where they will be affected. The elephant in the room is obviously the end of humanity or worse. People that don't see this as feasible simply need to read "The Doomsday Machine" by Daniel Ellsberg. All this talk of nanotechnology makes us miss the obvious problem that we are a hair's breadth away from worldwide thermonuclear war at every moment. I wonder how things will change when a powerful, unaligned AI starts increasing its hold on such a world. Longtermists drastically undervalue the terror of events that kill 99% of people instead of 100%. In regards to long term AI alignment, I think the number of researchers will matter, and I hope people in the AI safety industry would be incentivized to study long term alignment outside of work hours. Maybe I'm wrong and there's not a strong impact, but I haven't managed to find too many negative impacts of such a pursuit.

Wednesday, May 10, 2023

Company Thoughts: Part One

Here is my essential company thesis:

1. There are fewer than 500 people in the world seriously working on AI alignment

2. This is a serious problem

3. We need to fix it

Now let's pretend you are a financial professional and lack a detailed machine learning background. Well, you could drop your career capital, pursue a machine learning PhD, afterwards work at OpenAI or Anthropic, and then after a few years there (the year is now 2031) you decide to get some people together to start an AI safety company. Or, you save eight years of time and just start one now given your current skill set. Given the competitiveness and time requirement of the first option, I don't see any particular value in it. For the second option, I see actual impact potential. Also, there would be a lot of personal value here. As an effective altruist I don't see a large difference between taking six months off to start an AI risk management company and taking six months off to volunteer in Africa. If AI alignment is as important as I think it is, there's really no reason not to do it. So, what to do?

Connecting companies to AI safety experts is probably the easiest. This could incentivize people to join AI safety and alignment as a career, and also maybe curb some short term risks of misaligned narrow AI. I am going to use alignment and safety a bit interchangeably here, as I envision these experts having a day job focused on safety/risk management and a night job (unrelated to pay) focused on greater alignment issues. Let's expand. If people see that they can have a fulfilling career in AI alignment and actually feed their families and pay their bills, they are more likely to enter the industry. More people in the industry will lead to more beneficial alignment research and more people with the required skill set to navigate the complexities of AGI. Why aren't people entering the industry? First of all, there are basically no jobs (check the OpenAI and Anthropic websites and you'll see maybe one safety job out of a hundred). If those two labs only have two job openings for AI safety, I would doubt there are more than ten open seats at AI labs for safety roles in the entire US. Second of all, changing your life to pursue alignment research with your time will make you zero dollars. I have yet to find anyone working in alignment paid an enviable salary.

There are a couple of non-profit AI alignment research firms. With traditional nonprofits, the traditional wisdom is people are paid less because they get some sort of emotional validation from doing good work. These people feel compelled to make a sacrifice, and later spend a majority of their time complaining about pay and recruiting for for-profit companies. AI alignment is important, and you get paid zero dollars for doing it. Not only that, but in terms of opportunity cost (tech pays after all) you are potentially losing hundreds of thousands of dollars a year. The most important research field in human history, the smallest incentive to enter the field. Yes, there are a few (and I really mean fewer than ten) AI alignment jobs, but they are massively competitive (for no reason other than there are literally fewer than ten jobs). Here is a hypothetical. You are a recent MIT graduate who is an expert in machine learning. You can go work for Facebook and build AI capabilities and make $150,000 at the age of twenty-two. Or if you care about alignment, you could, well, I mean, I guess... you could post on LessWrong and stuff and self-publish research papers? Or try to get a research job at a company like Redwood that needs zero more people? Creating job opportunities would not totally solve this problem. And AI alignment is never going to get the top talent (those people have too much incentive to at least initially make bank building capabilities). I don't think we necessarily need them though (every MIT grad I know is impossible to work with anyway). Providing any sort of alternative, even just a basic nine to five job that pays 60k, may drastically increase the number of people willing to switch over. Closing this gap ($150k vs $0) is probably important. I am advocating for a market driven solution, something desperately needed.

In this scenario, now there is an industry where machine learning engineers can work a nine to five job focused on AI safety. They build their skill set during this time and spend their outside of work hours doing what they would be otherwise doing (posting on LessWrong and self-publishing research papers). They now have a network of other AI alignment researchers they work closely with. Obviously, people could work on capabilities at work and do alignment in their free time. I would love to move to a world where this is not required. Moving forward, obviously this is good for the AI safety people, and potentially the field of alignment as a whole. What is in it for the companies?

A lot of companies would find it tremendously valuable to have someone explain AI to them. They have no idea what is going on. Not just "they don't know how neural nets work" (spoiler, no one does). But they actually have no idea how most machine learning is done and they are baffled by large language models. They are worried about putting customers at risk, but they also don't want to get left in the dust by competitors who are using AI. They are banning AI tools but putting the word "AI" in marketing material. Having someone come in and explain the risks involved and how to make these trade-offs would be massively beneficial. Most companies in America right now need that sort of consultant. They need it cheap and don't have the funds to go to McKinsey and pay absurd fees. We could provide that. I really do think this sort of industry will be massively in demand going forward. Financial firms without risk management departments are worth less. Companies with bad governance trade at steep discounts. AI is massively beneficial but can lead to terrible outcomes for a company. You should be able to fill in the gaps.

Sunday, April 23, 2023

Music, Movies, and the New Wild West

In a previous post, "How Important Are Humans," I mentioned an argument I had with a close friend about AI generated art. My conclusion was that if AI ends up writing better books, creating better art, and making better movies, I will have no problem switching over to AI creations completely. Why would I read a 7/10 book when I can read a 10/10 book? At some point, the quality of the content is really all that matters. Well, within two weeks this has pretty much come to fruition. The quality of AI content has exploded, especially within the music landscape. The song "Heart on My Sleeve," built on AI-generated imitations of Drake's and The Weeknd's voices, made waves in the music world, as the vocals are very hard to recognize as AI. All week, I have been listening to AI music pretty much exclusively. I also listened to AI generated stand-up comedy and watched some crazy-accurate deepfake videos. There are some cool applications of all of this.

In the near future, the voices of singers, faces of actors, and writing style of writers will be replicable for free. Before I go for a run, I will be able to create a new Kendrick Lamar album (his voice, his cadence, his songwriting ability) within seconds. During my run, if I don't think his voice fits the song, I can switch the artist to Nas and the transition will be seamless. If I am watching a movie and don't like a particular actor, I will be able to quickly toggle the movie so that Danny DeVito is now playing that role. What will this all mean? Well, obviously we will probably have a lot of pressing legal issues to figure out. I am guessing this will regress a bit in spirit back to the days where everyone paid for music, and thus everyone illegally downloaded music for free on LimeWire. There will be a massive black market for AI generated songs and movies that steal the image and likeness of people without their consent. The most popular singers and actors will become more popular as they are featured heavily in this content, while those entering the industry will have essentially zero value. In a world where the most loved actor in the world can play a role in every single major film of the year, we don't need more actors. With no scheduling conflicts and no actual work required, I would guess that the traditional acting and music industries are essentially going to die. Live performances will still have a niche, but there will also be AI created characters and singers that will start taking some of the spotlight. Characters that are the perfect representation of an idea or personality, without any of the baggage or time requirements that plague real-world humans.

Think back to the Wild West for a second. You could shoot someone in a bar, ride three towns over, and as long as no one saw you commit the actual murder it was essentially impossible to prove. A serial killer in the 1800s was essentially unstoppable, as there was no DNA evidence, and again, without any direct witnesses there would be no conviction. Even then, if there was a witness how the heck would any authority reliably track you down? If you want to imagine this world I would recommend reading "The Devil in the White City." We may be backtracking to this stage of life. Video and audio evidence in a world of indistinguishable deepfakes is basically worthless. I know no way of determining if a top-level deepfake is real or not, and given that a video is just a sequence of pixels there will probably be no way to actually distinguish true reality. As a result, eyewitness testimony, as flawed as it is, will probably regress to being the primary form of evidence. If we can reliably trick cameras in indistinguishable ways, this means that a surveillance state driven by video and audio monitoring is less useful. Unfortunately, there are likely biometric equivalents that an authoritarian state will think up (you are now tagged with an embedded GPS since we can't trust our cameras).

Overall, I don't think that these new developments make society any safer or more stable. There are now incredibly convincing disinformation tools, and I really don't know how I will trust anything I read or see going forward. Still, listening to young Taylor Swift sing her new album was cool. And some of the AI content is legitimately hilarious. If the world burns, at least we will all be laughing. Nothing makes an apocalypse more palatable than good content.

Friday, April 21, 2023

Time to Start a Company

Alright, well I thought about it and autonomous agents are insane. It is pretty obvious that within a decade pretty much every single company in the United States will be using AI agents for various tasks. As I mentioned before, finance companies have risk departments that prevent individual firm collapses and industry-wide financial contagion. The fact that current companies don't have AI risk management departments is not surprising, but soon it will seem ludicrous. Within a decade, every company in the US will be using multiple AI agents. They will have to, lest they lose out to competitors who are employing this transformative technology. Again, the incentives are simply much too high. The AI market will be saturated with competitors trying to make the next ChatGPT, but none will be focused on the most important part of it all: risk. Providing risk solutions rather than capability solutions is an untapped area of the market. If you run an autonomous agent, horrible things could happen. Customer data could be leaked, the AI could break various laws, or you could accidentally make a lot of paperclips. Companies are terrified of risk, terrified that all of their hard work and credibility will be wiped away. And it will happen, it will happen to a few companies and it will be well publicized. But companies won't stop, because they can't. They are driven to survive and make profits, and they will underestimate the risk (as does every investment firm, and they have risk departments!).

Insert AIS, a company that delivers risk mitigation tools and access to AI experts. Customized software platforms that estimate risk and pose solutions, or some other product I haven't thought of. Probably the easiest solution is to outsource AI researchers as consultants who look over a company's plans and provide feedback. I would not target the business of the massive players who already have AI safety groups, are rapidly building capabilities, and are aligned with gargantuan profit-driven tech giants (OpenAI, DeepMind, Anthropic). Rather, AIS would service the 99.9% of other companies in the world that are going to dive in, safety or not.

There is a moral hazard here. You don't want to "rubber stamp" companies and give them a false sense of security. You don't want to convince companies that otherwise would have sat out on AI to participate, because they will gladly place the blame on you and justify their uninformed decisions with your "blessing" as a backing. So this will not be an auditing firm, verifying any sort of company legal compliance or justifying behavior. Those should all be internal. Rather, it will be providing systems and knowledge to build safer and more collaborative AI. Again, these are the small fries. I am less concerned about a mid-tier publishing company building the paperclip machine, and I am convinced that they are less likely to do so if they have a risk management system.

The most remarkable aspect of this idea is that even if someone else adopts it and creates a superior risk solution, it is a win-win scenario. Increased competition fosters innovation, and being the first mover in this space could ignite the creation of an entire industry. An industry that I am convinced will probably make things better, or at least not make things worse. If I am instantly replaced by a more capable CEO or another company develops awesome alignment solutions, all the better for humanity. I'll gladly return to an easy lifestyle with no skin in humanity's game.

Another remark. The Long Term Future fund (the largest fund that backs initiatives combating x-risk) is only $12 million. That is ridiculously small. In the world of finance, that is a rounding error to $0. There are only a few hundred AI alignment researchers, and they are definitely not paid well. At this point, AI alignment is similar to other non-profit work: you are expected to make a massive financial sacrifice. Working on capabilities research will feed your family, working on AI alignment will not. As a result, there is really no incentive to go into the most important research field of all time. This needs to change. I think creating AIS will kick off a market-driven solution to this problem. People who become experts in interpretability and corrigibility and come up with novel alignment solutions will have massive value. I would pay them handsomely to work with risk mitigation for various companies, and as a result we will incentivize more individuals to enter the space. If they work forty hours a week and make a decent salary, they can spend the entirety of their time outside work contributing to the long-term value alignment cause. I don't see many downsides here, outside of the massive personal and career risk I would accumulate as a result. Well, seems at least interesting though. Would be a pretty noble way to end up on the streets. "Hey man can you spare a dollar, I blew all my savings trying to align transformative AI with human values." Would at least make for a cool story. Guess it's time to start a company.

Autonomous Agents

I used AutoGPT for the first time today, an early entry into the world of autonomous AI agents that can make plans and solve problems. From my understanding, AutoGPT has an iterative loop that permits the AI to learn and adapt as it works to an objective. It has short and long term memory and is able to break down a prompt into multiple steps and then work towards progressing through each of those steps. Again, I am not terrified of current AI technology. I am terrified that current AI technology will improve, which it will. For AutoGPT, you simply put in the goal of the AI agent, such as "make me a bunch of money," and then a few sub goals, such as "search the web to find good companies to start" and "keep track of all of your research and sources and store them in a folder." It doesn't work well at the moment, but it has only been out for a couple of weeks. The promise of autonomous agents is clear. Many white collar jobs can be replaced, and individuals could become much more productive. Research and administrative work will become much easier, and there is a massive incentive to have a smarter agent than your competition. Every advance in AI increases my conviction that we should lean heavily on AI agents to do alignment research. This year really has been quite the revolution.
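To make that loop a bit more concrete, here is a minimal Python sketch of the idea as I understand it. This is not AutoGPT's actual code: the `Agent` class, its `execute` method, and the `window` parameter are names made up for illustration, and `execute` is a stub standing in for whatever model and tool calls a real agent would make.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy agent: works through sub goals one at a time and remembers results."""
    goal: str
    subgoals: list[str]
    window: int = 3  # hypothetical size of the short-term memory
    short_term: list[str] = field(default_factory=list)  # most recent results only
    long_term: list[str] = field(default_factory=list)   # every result so far

    def execute(self, step: str) -> str:
        # Stub: a real agent would call a language model or a tool here.
        return f"done: {step}"

    def run(self) -> list[str]:
        queue = list(self.subgoals)
        while queue:
            step = queue.pop(0)              # take the next sub goal
            result = self.execute(step)      # act on it
            self.long_term.append(result)    # keep everything in long-term memory
            self.short_term = self.long_term[-self.window:]  # keep a recent slice
        return self.long_term


agent = Agent(
    goal="make me a bunch of money",
    subgoals=[
        "search the web to find good companies to start",
        "keep track of all of your research and sources and store them in a folder",
    ],
)
results = agent.run()
```

The point of the sketch is how little structure is needed: a goal, a list of steps, and a loop. Everything interesting (and everything dangerous) lives inside the stubbed-out `execute` call.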

The speed at which these developments keep coming is paralyzing. I am further convinced that alignment is important, as now every person on Earth will have access to prompting technology that can actually do destructive things in the real world. Anyone can create a website or a business without any technical knowledge, and everyone is vulnerable to whatever sort of chaos this causes. AutoGPT requires a user to prompt "yes" or "no" before it moves forward with real-world interaction, such as scraping a bunch of websites or moving files around. Future agents will not have this, or if they do I really do not see how it will be useful. I just kept clicking yes, with no clue if AutoGPT would follow the robots.txt policies of a website (which determine whether you are even allowed to scrape the website). I've built my own web scrapers, and I clearly didn't have the wisdom to walk away from the curious prompt "hey AI agent, increase my net worth" even though I had no clue what the AI would end up doing. How are non-technical people supposed to weigh any of these trade-offs? Most people probably won't even know that there are laws or policies that they could be breaking, and they are probably liable for whatever their autonomous agent does. The cost of running these agents is already super low (today cost me 8 cents), and as competition heats up it will be virtually free. Saying that this is a legal nightmare is an understatement.
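For what it's worth, checking robots.txt before scraping is not hard. Below is a small sketch using Python's standard `urllib.robotparser` module, combined with a yes/no gate like the one described above. The helper names (`allowed_by_robots`, `approved`, `may_scrape`), the user agent string, and the example robots.txt contents are all assumptions for illustration; I have no idea whether AutoGPT does anything like this internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real agent would download the site's file.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def approved(answer: str) -> bool:
    """Interpret the human's reply to a yes/no confirmation prompt."""
    return answer.strip().lower() in {"y", "yes"}


def may_scrape(robots_txt: str, user_agent: str, url: str, answer: str) -> bool:
    """Scrape only if the site's policy allows it AND the human said yes."""
    return allowed_by_robots(robots_txt, user_agent, url) and approved(answer)
```

With this gate, mindlessly answering "yes" to everything still would not get the agent into a disallowed path. Note that robots.txt is a convention that site owners publish, so a check like this covers only one narrow slice of the legal and policy questions.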

Users will clearly have no idea what their agent is doing, and they probably won't care. The chaos that these point-and-click machines will cause is unknown, but it is clear that if they are unaligned they could cause a lot of damage. For example, you prompt "make me a lot of money" and the AI illegally siphons money away from a children's hospital because that is outside of its objective function. What I want to emphasize here, though, is that even aligned AI can be really, really bad. Because a scammer can say "create a Facebook pretending to be my target's uncle, generate a bunch of realistic photos of the uncle, build up a bunch of friends, and then reach out to the target claiming to be the uncle. Say that you are in trouble and need money. Leave realistic voice memos. Do whatever else you think could be convincing." The AI agent will read that, develop a plan, and then break that plan down into discrete steps. Then it will iterate through each one of the steps and execute the plan. Fraud and deceit become easy. And cheap. Simpler example: a terrorist uses a perfectly aligned agent and says "cripple the US financial system." Even if this agent totally understands the terrorist's intentions, the outcome will be very bad. Even just pursuing the first few steps of this goal could cause a lot of damage. It is probably better if all of these autonomous agents in the future are perfectly aligned, but we shouldn't necessarily celebrate that as a victory. Agents can be aligned to the wrong values. The genie problem mentioned in a previous post rings even truer now. May the person with the most powerful genie win.

Thursday, April 20, 2023

The World of Finance vs the World of AI

Let's take a look at the financial landscape real quickly. There are many mutual funds, hedge funds, commercial banks, and investment banks. The industry is awash with regulation, except for the world of hedge funds which is held to much less stringent standards. The profit incentive in the financial sector is huge. Not only can firms make money, but employees and managers can pull salaries in the millions, and the head of a trading firm can quickly become a billionaire. The way in which they do this is obscure, but oftentimes it is through better technology, and even more often (in my opinion) it is because of cheating and unethical behavior. Market manipulation, insider trading, and straight up stealing are hard to prove and even harder to prosecute. There are plenty of real world examples of pathetic, unethical slime (such as Steve Cohen) who massively cheat the system and make billions. Often times, many of the financial firms profit by cheating in smaller ways, such as stealing from customers (Wells Fargo) or charging insane fees without providing any tangible value (pretty much every hedge fund and most actively managed mutual funds). If institutions were less greedy and understood survivorship bias, most of these quacks would go out of business. Why mention the financial sector? Because I believe it is a good window into the future of AI companies. Greed will drive a lot of decisions, safety will take a backseat, and regulations will be helpful but drastically flawed.

Which financial institution do you, as an individual, fully trust? Goldman Sachs? Morgan Stanley? Do any of these institutions have your best interest at heart, and would you trust them with your children's lives? Of course not. Unfortunately, you should apply the same line of thinking when you look at Google, Microsoft, and even OpenAI. No matter what sort of marketing pitch a company gives, a company is a company. Shareholders demand growth, and the principal-agent problem reigns (the management of a company acts in its own self-interest, not in the interest of shareholders or customers). We worry a lot about agency problems within AI systems, but we should also worry about agency problems at all AI labs. I don't care if your company is for profit or not, developing AGI would make you one of the most important human beings of all time, give you an indefinite legacy, and make you absurdly powerful. Maybe you aren't automatically the richest individual in the world (because of some dumb cap at 10,000x profit), but you are instantly one of the most powerful individuals of all time. Whatever Sam Altman says, he is perfectly incentivized to push towards AGI. As is every CEO of every future AI lab, regardless of what they say.

As in finance, regulation will help the world of AI to be fairer and more transparent. However, the outcome will be shoddy, as in any industry driven by such a massive profit motive. Some insanely intelligent, generally trustworthy Nobel Prize winning financiers started a hedge fund called Long Term Capital Management. Despite their brilliance and rapid journey to wealth and success, the company eventually collapsed into a ball of flames and almost caused a global financial meltdown. I view every group of intelligent individuals (OpenAI included) in the same way. Maybe they are really smart, and maybe they are not trying to cause harm, but we have seen history repeat itself too often. Instead of a financial collapse, power hungry AI companies could cause mass suffering and death. They might have the right intentions, and they might all be Nobel Prize winners. At the end of the day, none of that really matters.

Is there a point to this comparison? Something we can learn? I think so. Intelligent regulations can lessen the probability of financial collapses, and I believe the best form of AI regulations can prevent many low-hanging-fruit problems that will come with the development of AGI. Also, every finance company has a compliance department, and AI companies will likely need similar departments to function and keep up with regulation (probably called "AI safety" or something). But something else evolved after the financial crisis: internal risk departments in investment firms and banks. These risk departments made sure that the firms were not taking on too much risk and were adequately diversified and liquid. The combination of compliance and risk departments at investment firms ensures that the firms themselves stay afloat and protect customers, and it also protects society from financial contagion. Establishing risk departments within AI labs is very necessary, especially if they collaborate and openly share the ways in which they have avoided catastrophic problems. If we want to plan well for AI regulation, we shouldn't look to the technology industry, where the government has largely failed to do anything of use. We should pretend the year is 1900 and we want to plan out the best incentive structure and regulations for the finance world for the next two hundred years. Yes, a recession or two might happen, maybe even a depression. But maybe with the right incentives we can avoid something worse.

Wednesday, April 19, 2023

Solving Alignment Would Be Terrible?

If everyone in the world was given a genie that granted three wishes, everything would fall apart. Even if there were no "monkey's paw" problems, and every single person's true intention was granted, chaos would be the only outcome. I'd wish for "make me a million dollars, legally." Someone else would wish for "steal ten million from J.P. Morgan and make it untraceable." Another would wish for "push through legislation that would make it illegal to fish." Plenty of wishes would contradict and the war would be won by the people with the most powerful genies. Regardless, society as we know it would collapse. This is why I'm wondering if solving alignment may actually be a horrible thing to do right now. Not the problem of finding the objective moral values of the universe and embedding them into all AI, but rather the problem of making an AI follow along with your arbitrary values (also called "wishes"). In a world of aligned AGI that can replicate, if every person is given a personal AGI, absurdity begins. The same wishes are pursued. Labor costs are now essentially zero, and the only real winners are the people with the most powerful genie. We wouldn't give everyone a nuke, just as we wouldn't want a small group of unelected people to have the only nukes. Given that the capabilities of an AGI will increase with time, I don't see how democratizing AGI leads to anything but madness. I also don't see how leaving AGI in the hands of a small group of people leads to anything but madness. I guess I only see madness.

If anyone on Earth has access to a digital god, things will not go well. Even if that god is not all-powerful, things will not go well. I don't see a massive distinction between AGI and ASI, because at some level a human brain emulated in a computer is already superintelligent. It can think faster, access the entirety of human knowledge ("the internet"), and probably replicate pretty easily. Obviously I care way less about aligning AGI than I do about aligning ASI, but I need to remind myself that they are not so far off or necessarily different. What does all of this mean in the short term? Let's take interpretability for example. If we knew exactly why a neural net made every decision, would that be a good thing? Would that create massive increases in AI capabilities and trust in AI, and lead to everyone getting a genie even sooner? Maybe not, and maybe having aligned genies is way better than having unaligned genies. But if some unaligned low-level genies start messing up and killing people, maybe we take a big step back as a society. Maybe we outlaw genies, or take a serious look at figuring out our values. If the aligned ride to AGI goes smoothly and then the first deaths occur in an abrupt human genocide, we'll be too late. Whether an ASI ends up being good for humanity or not greatly depends on the values it is following. Even if it is "aligned" to those values perfectly, things will probably go horribly wrong for most people. If you think power corrupts, wait until a small group of individuals determines the values of this Better God. This is why I am pretty hesitant about my idea to massively boost alignment research across the board. Yes, hesitant about an idea I came up with yesterday. Maybe research into corrigibility (figuring out how to turn AI off or change its values) is much more important than all other research. I really have no idea, but it is probably an important conversation to have.

Using Narrow AI to Solve Every Problem

It is very possible that I do not yet grasp the difficulty of producing novel alignment research. It could very well be the case that true, genuine leaps of knowledge of the general relativity sort are needed, and we simply need to find the right team of Einsteins. Some people seem to think that narrow AI can't help with solving the alignment problem and that you really need something at the AGI level in order to make progress. At least that is my understanding of some of MIRI's conversations, which are completely incomprehensible. If people in the Rationalist/AI Alignment/LessWrong community talked in simple English, the past 20 years of spinning wheels could have been avoided. Anyways, they seem to think this: by the time you have an AGI powerful enough to solve alignment, you have an AGI powerful enough to wreck a whole lot of things (including the human race). Well yeah, maybe if "solve" is your goal, but even "assist with solving" is met with steep resistance. I can't possibly see how this is the case. Large language models such as GPT-4 are insanely good at providing sources, explaining complex topics, and programming. During one of my discussions with a head of an AI research lab, I was told that one of the main bottlenecks of research is all the administrative work. Well, if hiring a thousand more workers would be beneficial (as they could help format, write summaries, check plagiarism, compile sources, test code, etc.), is it not the case that hiring ten employees that are skilled at using advanced LLMs would be just as beneficial?

I have been using ChatGPT extensively, and it is clearly one of the greatest technological achievements of the past century. It is insanely useful in every aspect of my work life, and it is very clearly going to replace a lot of white collar jobs. What are alignment researchers doing that ChatGPT cannot? Or, what are they doing that could actually not benefit from such an incredible resource? It seems that the coming wave of narrow AI, including the generative AI systems that keep exploding in usefulness, is going to transform nearly every industry. Medicine, finance, technology, journalism, I could go on, will be massively transformed and improved. So many use cases: cancer scans, fiction writing, translations, virtual assistants, even relationship advice and therapy. Why are people so convinced alignment research is the sole holdout? I think it sort of ties back to this strange savior complex. The idea that only a small subset of people truly know this battle between good and evil is happening, and only this small subset is smart and moral enough to take on this inevitably losing battle (so that they can say "I told you so"). It all seems so weird. Obviously we are not going to code first-principles moral values into a machine. Gödel's theorem and the god debate are clear on this (we have to assume some values and we have no idea what the correct values are). But for things like interpretability and corrigibility, is that really something only humans should be working on?

Narrow AI is probably pretty good at assisting with most effective altruism causes and most existential risk prevention. Obviously it can lead to terrible outcomes, but engineering plant based meat substitutes (a research heavy field) and fighting global poverty (another research heavy field) can be positively impacted by simply giving every researcher an awesome assistant that can scan the internet and code better than the best human alive. Narrow AI is going to become increasingly used to solve every problem. Why ignore it for the most important one?

Monday, April 17, 2023

We Need to Speed Up, Not Slow Down

At the moment, there is a lot of discussion about putting a pause on AI capabilities research. An open letter from the Future of Life Institute has been signed by thousands of researchers, urging a six-month pause on the training of models more powerful than GPT-4. I would love for this to happen, as then society could take more time to absorb the impact of such a large technological shock. We would have more time to debate, discuss, and regulate. However, this is obviously an empty gesture. Someone with a tremendous ego and even more impressive lack of character will simply sign the letter and then immediately start their own AI lab focused on creating an AGI. His name is Elon Musk. China is not going to slow their progress, which means that the U.S. government has no incentive to either. If GPT-4 is a calculator then Bard is a bundle of sticks, so there is no shot that Google is going to really sit on the sidelines for six months. What people fail to realize is what I stated in a previous post: the first trillionaire will be someone who owns a very large stake in an AI development company.

The financial incentive to build AGI is not only enormous, it is the highest financial incentive we have ever seen and possibly the highest financial incentive we will ever see again. This will be an arms race to the finish no matter what the talking heads say on the television or in congress. I vehemently disagree with the idea that we should spend our time campaigning to slow down capabilities research. It is simply not possible. The financial incentives are too massive, and anyone who would actually follow an order to halt progress, an idea that is completely unverifiable and ungovernable, is probably a more upstanding person who would thus leave the development in the arms of less ethical people. I understand that there is probably a "good guy with a gun" fallacy here, but I really don't see why we should trust anyone to act against their own self interest. Instead of this, we should be massively boosting alignment research.

Since there is a lot of overlap between alignment and capabilities research (an aligned system is actually more capable, or will appear more trustworthy and be given greater responsibility even if there are existential flaws), we should focus on long-term value alignment. I could not care less about solving interpretability or distributional shift. Someone else is either going to do this or not, and there is actually a massive financial incentive in each case. Also, if we knew why a neural net made every decision, I am not sure if that would be good or bad for humanity at this point. The question we should ask is: "where is there not a massive financial incentive?" Some sort of long-term value alignment, sure. The kind of "shoot for the moon" research that will only be beneficial if we hit AGI and go "oh wow looks like superintelligence is pretty much imminent and we have no idea what we are doing." We should be spending trillions of dollars on this sort of research, not zero.

Saturday, April 15, 2023

Should We Build a Better God?

God comes to you in a dream, and says "hey, next Tuesday I will cease to exist. On Wednesday, you are to design my replacement. You will choose the New God's moral principles and decide how active the New God will be in the life of future humans. On Thursday, the New God will take over and you will have no say in whatever happens next." Do you take up the offer? Do you tell him "actually you should make this all democratic, and the public should vote on each aspect of the New God." Do you say "actually I think Sam Altman would be better than me at designing a New God, you should ask him." This is essentially the dilemma we have with ASI.

Before we get into choosing values, let's briefly discuss an even harder problem: ensuring that the New God follows through with our intentions. We have to be very careful what we wish for. In a short story called "The Monkey's Paw," a man is granted three wishes. The man first wishes for $200, and then the next day his son dies in a work accident and the family is compensated $200 by the son's company. Some folks at MIRI think that "figuring out how to aim AI at all is harder than figuring out where to aim it," and I'm actually inclined to agree. Both are insanely hard, but trying to incorporate any sort of value system in machine code seems near impossible. This is going to be the most important technical aspect of alignment research, but let's get back to discussing the choosing of values. Frankly, the choosing of values is actually possible and more fun to talk about.

Now, who should choose? Do we want the New God's moral beliefs to be based on a vote across the United States? Should it be worldwide, where the populations of China and India dominate the vote? Should citizens of authoritarian regimes get a vote? Should members of the Taliban? I honestly don't see how this differs much from the Constitutional Convention. We should probably have something similar for ASI, a conference among nations where we decide how humanity will design the value system of a future ASI. Some of the solutions from the Constitutional Convention will probably be applied. Maybe there are some votes based on pure population and some votes granted to specific countries or regions, similar to how the US has a House of Representatives (number of politicians based on population of the state) and a Senate (two senators per state). Frankly, this doesn't seem too different from what would be necessary for the formation of a world government.

A world government is a simple solution to curbing existential risk. It's harder to have nuclear war if there's only one country, and it will be easier to collaborate on worldwide decisions if there is only one government. Assuming this government is largely democratic, it is probably the only feasible way to account for humanity's aggregated moral principles and future desires. There are obviously huge risks of a world government (authoritarianism, value lock-in), but it is very possible that it will be established in the future. If ASI is developed, it's going to pretty much take the role that a world government would anyways, as it will be insanely powerful and an individual human will have essentially no sway over anything. A world government and ASI face the same Democracy vs Educated Leaders trade-off. There are two options when building a better God:

1. Make this process totally democratic, so that each individual currently on Earth gets a say.

2. A small team of experts gets to decide the future of humanity.

Maybe this small team is better than the rest of the world at picking between trade offs. Maybe they are more educated, more moral, and better at determining the needs of the future humans who have yet to be born. Or maybe they are authoritarian and will be massively incentivized to achieve god-status and immortality themselves. Regardless, I actually do think we should establish a New Constitutional Convention. Call it the Building a Better God convention. Maybe a majority of the population opposes this creation, and in that case we will have our answer.

Friday, April 14, 2023

The First Trillionaire

If you are investing for the future, you need to have some sort of prediction on how AI will develop over time. If you believe in short timelines and that AGI will arrive in the next ten years, you probably shouldn't invest in Sears or Walmart. Maybe it is the case that some small, private AI lab will develop AGI and quickly become the real superpower of the world, but it is also likely that a large tech giant acquires the small lab and scales up the innovation. Given that compute seems to be a constraining resource, you should probably invest in companies with the scale to train these massive models.

Should you invest in AI safety firms? Or tech companies with lots of AI safety standards? Probably not, as they will likely move slower than some of the "move fast and break things and maybe kill all of humanity in the process" firms that are bound to spring up. Still, I see AI capabilities research and AI alignment research as two sides of a similar coin, so it could be the case that the companies more focused on safety could create better products. Maybe these products conform more to consumer expectations, maybe they meet new regulations, or maybe consumers are less scared of them.

If AGI actually arrives, we could see massive GDP increases across the world. It probably doesn't matter what you are invested in as long as you are broadly diversified, if the stock market quadruples in three months. More likely in my opinion is a small subset of individuals receive all the money and power, as only the actual owners of the AGI become trillionaires. In my mind it is clear that the first trillionaire will be someone who owns a very large stake in an AI development company. The real question is, will there be a second?

How Important are Humans?

When I was an undergraduate, I worked part time as a cashier at a convenience store on campus. This job wasn't particularly exciting, I spent a lot of time doing sudoku puzzles and secretly studying, but it funded my weekends and summers. The job of a cashier is a simple loop function. For each item the customer has, scan the item and then place it into a bag. At the end of this, the customer swipes their credit card and pays for the items. Then, the cashier looks the customer in the eye and says "have a good one." The hardest part of the job is avoiding any sort of social awkwardness.
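Written out as code, the cashier's loop really is about this simple. Here is a toy Python sketch; the `checkout` function, the item names, and the prices (in cents) are all made up for illustration.

```python
def checkout(items: list[str], prices: dict[str, int]) -> tuple[list[tuple[str, int]], int]:
    """Scan each item, add it to the receipt, and return the receipt and total in cents."""
    receipt = []
    for item in items:
        price = prices[item]            # "scan" the item to look up its price
        receipt.append((item, price))   # "bag" it by recording it on the receipt
    total = sum(price for _, price in receipt)
    return receipt, total


# Hypothetical price list, in cents to avoid floating point rounding.
PRICES = {"frosted flakes": 499, "milk": 299}

receipt, total = checkout(["frosted flakes", "milk"], PRICES)
print(f"Total due: ${total / 100:.2f}")
print("Have a good one.")
```

Everything the job requires beyond this loop is the part that is hard to write down: the eye contact and the small talk.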

Self-checkout systems have replaced many cashiers. Soon, stores will likely rely on computer vision and many items won't even need bar codes (why does the box of Frosted Flakes need a barcode? The camera can just see the product and debit its monetary value from your shopping account when you leave). The real question is, would I prefer to have a human or an AI responsible for checking me out of a grocery store? Honestly, I do not care at all either way. If every single cashier was replaced by an AI that used computer vision to run this same loop function (for each item, assign the cash value to a receipt that must be paid), that would be fine with me. Why is this important? Because we must determine where exactly humans will fit when AI takes over 90% of the jobs.

I had an argument recently about AI art. The opposing claim was that AI art will never really be valuable. The Mona Lisa is valuable not because it is particularly stunning, but because of the human artist and human vision behind the work. I had another argument about generative AI writing books. The opposing claim was that people would pay a premium for books written by humans, as a book written by an AI doesn't have the same artistic vision/meaning. As an example, we watch humans play chess and would never care about AIs playing chess. Here is the problem with all of this: if an AI writes a book that is substantially better than what humans are putting out, I am buying that book. Even if AI is only in the top 1% of human authors in terms of quality, I am reading that book before I am reading the other 99% of human authors. Some people will pay for American-made products, but most default to the cheapest option made in China. If Germany makes amazing cars using an automated assembly line, the consumer will probably buy those instead of buying less amazing cars made by a team of humans. Yes, the social aspect of human to human interaction is important, but behind the lens of a screen, we will soon scarcely be able to tell. AI will have the capability to be extremely nice and incredibly helpful, better in most ways than the grumpy college cashier just there for the paycheck.

I think a lot of people miss the point of generative AI. Yes, maybe people will have a preference to read poetry written by humans. Given that AI will probably become really, really good and human-like at writing poems, how will we even know if the human that claims to write poetry doesn't just use AI? Also, if the AI poetry is absolutely beautiful, why would I ever read a human's work again? Personally, if AI starts creating sequels to amazing human movies that I know and love and these films are in the same artistic style and just as high quality, I may never watch a human made movie again. Why would I?