Monday, March 20, 2023

The Solution to the Hardest Problem Ever

    AGI will be the greatest technology that humans have yet developed. If this is the case, AI alignment will be the most important field of research in human history. The pioneers who wind up creating AGI will not only be more famous than Fermi and Oppenheimer, they will be the most important humans in history. Other world events don't really matter in comparison. Who cares about the whales if we're all going to die in five years? Who cares about global poverty if we are a decade away from a post-scarcity society? Even planetary expansion doesn't really matter. An ASI could track us down across the galaxy and kill us, so making rockets is only important for preventing the other ex-risks. Soon, humanity will be ready to access near-immortality in digital form. The next few decades could shape the next trillion years. You are in a unique position, right at the cusp of major change. Feel like a main character yet?

    It is hard to avoid hyperbole when discussing AGI. It is even harder to avoid having a savior complex. AI applies leverage to every problem, making it much bigger. If you think current poverty is bad, wait until you hear about a technology that can replace every worker on earth. If you think nuclear weapons and chemically engineered pandemics are scary, wait until you hear about a technology that could use these weapons against you without fear of retribution. Wait until you hear that this technology will probably make even scarier weapons, things we cannot even imagine with our feeble little brains. Yeah, scary stuff. Even scarier if you read my previous post, "The Actual Alignment Problem." It is possible that AI kills all of humanity, leading to a world lacking sentience. It is also possible that an ASI decides that it should literally maximize suffering for some reason, leading to the literal worst case scenario for the universe. If there is an 80% chance of bad outcomes (massive amounts of suffering), and a 20% chance of good outcomes (humans and/or sentient life flourishing through the cosmos), based on expected value maybe we should pull the plug. Maybe heading towards a paperclip doomsday is actually the just moral action, and making it possible to align AI with human values could be very dumb (as some bad actor could lock in a bad form of human values). Maybe human control over ASIs will lead to immense mind crime, which in total amounts to unimaginable suffering. Just as unaligned AI may create massive human suffering, maybe humanity creates massive ASI suffering. Both are bad. Maybe human extinction is actually a better outcome than we think, compared to the alternatives. Given how scary and important this is, how pessimistic should we be?
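To make the expected-value point concrete, here is a toy calculation. The 80/20 split comes from the paragraph above; the utility numbers are made-up placeholders, not real estimates.

```python
# Toy expected-value calculation for the 80/20 scenario above.
# The utilities are hypothetical placeholders on an arbitrary scale
# where 0 represents a universe with no sentient life.
p_bad, p_good = 0.80, 0.20
u_bad = -100   # massive amounts of suffering
u_good = 100   # sentient life flourishing through the cosmos

expected_value = p_bad * u_bad + p_good * u_good
print(expected_value)  # -60.0, i.e. worse than "pulling the plug" at 0
```

On these (entirely invented) numbers, non-existence at utility 0 beats pressing forward; the argument hinges completely on what utilities you assign.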

    The pessimism of AI researchers is very interesting. If you actually, fundamentally believe that AI is close at hand and is going to kill humanity/make things really bad, you would not be living a normal life. You would be blowing up buildings, gathering attention, spreading misinformation and probably unplugging some computers. It is hard to take the "AI doomers" seriously when they spend so much time complaining about how no one listens to them. If they were logically consistent, they would probably be breaking some laws and fighting tooth and nail to the end. But they are not. Lots of them claim to have "given up" on this supposedly impossible problem that may fundamentally alter the future of life in this universe. Pathetic. Part of human nature is that we fight to the end, no matter the odds. That is what heroism is. It's hard to look up to pessimists, but it's even harder to look up to losers. But are the losers wrong? Should we give up?

    Based on my understanding of the issue, no one has any idea how to align AGI with human values. Machine learning is incredibly complex and math-heavy, and only a few genius engineers may be able to make even a slight contribution to this field. Human values are incredibly hard to define, let alone program into a computer. Based on how deep learning works, we're not even sure how to program anything specific into a computer at all. We are very far from progress on any of these fronts, despite artificial intelligence quickly scaling in power and intelligence. Frankly, we are probably not smart enough to figure this all out, especially given the time crunch we are under. The incentives are all wrong, as the group that invents AGI will be the most powerful group on the planet and could soon after chase godly powers or immortality. The first group to create ASI will likely be the last, so there is a massive incentive to cut corners and be first. If you think it is nearly impossible to give ASI human values, you are terrified of the paperclip maximizer that is going to kill us all. If you think it is likely that we give ASI values, you are terrified that ASI will get the wrong values and torture humanity for eternity. In my opinion, we simply don't have the intelligence required to work this all out ourselves. We don't have the collaborative structure set up to ensure safety. So, what do we do?

    If only there was something more intelligent than human beings. Something smart enough to collaborate effectively across a range of domains (computer science, philosophy, ethics) and help us weigh important trade-offs. Hi, it's me, AI, the problem. Given how difficult the problem of alignment is, it makes sense that AI itself will play a major role in solving it. The risks of looping in AI are obvious and I'm sure I'll expand on those at some point, but I don't see a way around it. I understand there is some circular reasoning here ("how are you going to use unaligned AI to solve unaligned AI problems?"), but I think the incremental advances in narrow AI will lead to bigger advancements in computer science/logical reasoning/ethics than we expect. This problem is difficult. The odds are too stacked against humanity, and the incentives are too perverted. AI alignment is humanity's most difficult problem to date, with the highest stakes we've ever faced. Our species might not face anything this important ever again. It is probably the case that AI alignment is legitimately the hardest problem ever. To me, it makes sense to use the most powerful tool yet invented to help solve it.

The Actual Alignment Problem

     Aligning artificial intelligence with human values appears to be extremely difficult, if not impossible. Before we get into this post, let's clear up some definitions.

    This website is called Aligned Intelligence Solutions. "Aligned" means "controlled" in a sense. An aligned AI will not kill all of humanity, or put us all in a virtual Hellscape. It will not lock in bad moral values, and it will respond to humanity's requests in a reasonable manner. It will lead to the flourishing of human life and positive moral outcomes. I never really liked the word "artificial" in the phrase "artificial intelligence." The word assumes that such intelligence won't be as robust as, or comparable to, human intelligence. Worse yet, it may lead some to believe that "artificial" beings aren't as deserving of moral consideration. For convenience, I'll still use the term when I refer to AGI, "artificial general intelligence" (human level on every metric), and ASI, "artificial super intelligence" (way past human level intelligence). "Solutions" is a pretty vague word, but it leaves room for both discussing solutions to the alignment problem and for providing actual tangible solutions.

    Aligning an AI system with human values seems extremely difficult. There are many reasons for this, laid out very well in the books Superintelligence and Human Compatible. Before we discuss potential solutions to the alignment problem, we should discuss the potential outcomes from a utilitarian perspective. For all of these, assume that the goal that the AI is programmed with actually flows through to the outcome. Also, I am measuring the utility of each outcome not against where we are right now, but against a world with no sentient life.

1. Worst outcome: A superintelligence is programmed to create the maximum amount of suffering in the universe. The ASI makes a lot of humans (biological or digital) and tortures them for near-eternity. More likely, it kills all of humanity, and then makes a lot of really intelligent digital agents and tortures them for eternity.

2. Very bad outcome: A dystopia is locked in. Not the maximum amount of suffering, but an incredible amount. There are a million sci-fi examples of this. Authoritarian regimes rule the universe, human freedom is massively curbed, or humans make AIs into slaves (despite the AIs deserving moral consideration). Maybe a bad theocratic government imposes its values on the future of humanity in a way that we can't reverse. In order for this to be a bad outcome, it would have to have enough mass suffering to make it worse than simple non-existence. An example of this would be 90% of humans suffering for the rest of time while 10% live in luxury.

3. Neutral outcome: Every human is killed by the paperclip machine. Whoops. No more sentient life in the universe. Some would argue this is worse than the "worst outcome" above, because consciousness in and of itself is so valuable that even eternal torture is better than non-existence. I've had these arguments before with people and they are stupid.

4. Very good outcome: Things go really great! You could argue that if 99% of humans thrive while only 1% suffer forever that would qualify as great, but what I am imagining here is some sort of sci-fi utopia. We use intelligent machines to cure disease, spread among the stars, and make significant progress on morality. Everyone lives happily ever after (but utility is not maximized).

5. Best outcome: A superintelligence is programmed to create the maximum amount of well-being in the universe. Not sure what this would actually be (since I am a dumb human). But there is some form of total utopia. Maybe there's a bunch of humans or just a bunch of digital minds having a great time, but regardless it's a total party.

    Why did I break out the outcomes in this way? Simple: I think that humans currently working on AGI are too shortsighted. Most organizations working on AI risk are only worried about human existential risk, the Neutral outcome laid out above. Yes, it sounds absolutely terrible in comparison to where we are now, but let us not forget that there are much worse outcomes. I have stated before that it may not matter who gets AGI first, because it may be the case that we are all doomed to die via the paperclip machine regardless. However, if the Very Bad and Worst outcomes are possible, maybe it does matter to a substantial degree. The alignment problem, in my opinion, does not refer to biased training data and black-box neural nets. It does not refer to the problem of avoiding the paperclip maximizer. Overall, it refers to the sheer difficulty of achieving a Very Good or Best outcome. Assigning probabilities to any of these outcomes is silly, as they are not five discrete outcomes but rather points on a continuous scale. However, it seems clear to me that ending up on the good side of that scale will take a whole lot of intelligence, character, and collaboration, something that I am not sure humanity is ready for.

Wednesday, March 15, 2023

Gain of Function Research

     Gain of function research has gotten a bad rap lately, due to the global pandemic potentially caused by a lab leak in China. This lab was taking viruses from the natural world and experimenting with them, potentially to test ways to combat these exact enhanced viruses. Well, then a global pandemic happened that wrecked economies across the globe and killed a lot of people. Oops! We still don’t really know what happened, funny how that works, but yeah, bad rap. Still, this sort of research can be useful, assuming the labs running the research stop leaking nasty enhanced viruses to susceptible human populations. Easier said than done.

    Well, what about AI? Could it be useful to dive deep into gain of function research for AGI? I believe the answer is a resounding yes. Obviously, this wouldn’t include putting an AGI in a box and trying to see how it gets out. It also wouldn’t include taking a large language model and trying to see if it can rapidly improve itself. Gain of function research could be done in a lot of very stupid ways, and I think it would actually be easier to kill all of humanity via stupid gain of function research than otherwise. Still, one of the biggest problems with current AI ex-risk is that no one really takes it seriously. If you can prove that we should, that could massively benefit humanity. I am against open-sourcing this kind of research, at least in most circumstances. We shouldn’t give potential web-scraping AGI any ideas, and we shouldn’t give terrorist organizations any either.

    There is a lot of talk about nanobots and random sci-fi stuff being utilized in order to kill all of humanity, despite the obvious fact that we are constantly one button click away from near-total human destruction. The doomsday machine is real, and multiple times humanity almost blew itself up because of some combination of miscommunication and computer errors. Remember, those are just the situations we know about. If we are barely skating by, narrowly avoiding blowing ourselves up every few decades out of sheer luck, imagine what would happen if a half-motivated AGI wanted to tip the scales. Life in the nuclear age is actually terrifying, and it is clear that even a narrow AI could be used to cause billions of deaths. The advances in biotechnology are another avenue towards ex-risk, and I believe this risk is also massively enhanced by the advent of AGI. It shouldn’t be hard to devise a virus that kills most or all of humanity, and again, it’s probably the case that even a narrow AI could do this. The recent pandemic showed just how unprepared humanity is for a virus that doesn’t even kill 1% of the people it infects; imagine if that number were 20%, or 99%. It’s unfortunate that actual day to day work on AI is so boring, because this field is such a big deal and the problems are so important. Maybe by publishing some scary research we can make AI safety cool and exciting, while also convincing people that the future could be terrifying.

    AI alignment, if solved, could actually lead to a massive increase in ex-risk. If we can get an AGI to do exactly what humans want, and someone convinces it to destroy humanity, that would be bad. Without proper alignment, maybe the AGI gets massively confused and only makes fifty paperclips. Or maybe the AI only kills white people, because the initial data set is biased and it does not realize there are non-white humans. Please don’t use this example to call me a racist who loves biased training data, but imagine how funny that would be. I’m not racist, I’m just not including Puerto Ricans in my models because I really want them to survive the AI apocalypse. Overall, what I am saying is there is some threshold we should meet with this research, and there is a fine line between when we should wait and when we should push forward. If through gain of function research we “solve alignment” in some super unenlightened and ineffective way, we could give AI labs a false sense of security. Also, we could give some bad actors some “good” ideas.

    For a moment, let’s discuss bad actors. Obviously, there could be some pro-apocalyptic groups hell-bent on destroying all of humanity. There are terrorists who could be motivated to suicide bomb the world, and there are power-hungry totalitarian regimes who may kill everyone but their in-group. The fundamental question I ask myself is, does it matter who gets AI first? There are two scenarios here. In the first scenario, whoever gets AGI first solves alignment, creates ASI with their set of values, and these values persist from then on. If this is the case, it really matters who builds AGI first. If China builds AGI, we’re all about to live under a totalitarian regime, and free speech is no longer a thing (sad!). If North Korea builds AGI, things get pretty bad and The Interview is deleted from streaming platforms. If the USA builds an AGI, human rights might have a better shot at being preserved, but for some reason the 3D printer won’t stop making oil. Should we optimize for this first scenario, even if the chances of ex-risk go up? In this case, gain of function research could be really good, if we use it to make actually-aligned AI systems that a “better” country or company could use to win the race.

    Now comes the second scenario. Maybe, regardless of who builds AGI first, we are all totally screwed. AI alignment is impossible, and whether the US builds it or North Korea builds it, we all get turned into paperclips. The ex-risk is the same for everyone, because the problem of converting human values into machine code does not work with our current framework. Corrigibility doesn’t work, and no amount of technical research moves the needle on interpretability. In this case, I actually think gain of function research is the most important area of research. Only through really solid proof that everything is about to go really, really wrong will we have a chance at halting AI progress. Only by showing that without a doubt we are screwed will governments be able to collaborate on stopping AI progression, at least until we solve alignment or come up with a better AI framework.

Low Hanging Fruit

    Heads of research at AI alignment companies and nonprofit organizations don’t seem to find independent alignment research that useful. However, I discussed alignment with a CTO of one of the most well-known companies, and they recommended two areas of study: low hanging fruit, and gain of function research. I will address low hanging fruit first.

    If an AGI gets loose, it will probably need financial resources. It could start with something simple, like becoming the best online poker player in the world. Because it can think quickly and potentially source psychological information about every player, maybe it cleans house and quickly accumulates assets. So, it might be useful to build a poker-bot, in order to clean out this dumb money in advance. That way, if an unaligned AGI gets loose, it can’t do something so simple to gain financial resources. The bad AI will waste time trying and failing to win at online poker, valuable time in which AI labs may discover the AI’s bad intentions and turn it off. This sounds like an awesome research area to me, because it means I can help the world and also make a massive amount of money for myself.

    Unfortunately, I doubt it will really work. Everyone is already trying to do this. Everyone is trying to devise models to make money at poker, and everyone is trying to use algorithms to make money on stocks. I don’t really see any “low hanging fruit” available. AGI could just make killer software solutions or media content and do everything legally, as in the narrative of Life 3.0. I don’t see a way around that. Also, the obvious solution for an AGI would just be to skirt the law. Stealing money is 1,000x easier than making it legally. Legitimately just taking money from people’s bank accounts should not be hard for an AGI. If an AI wanted to feign legality, insider trading (which is notoriously hard to prove) is the obvious solution. Heck, just issue some cryptocurrency and run ads on YouTube. AGI is also probably way better at thinking of areas of “low hanging fruit” than humans, and is probably way better at skirting the law and getting away with it. Finally, there are a lot of unethical or immoral ways to make money that humans avoid out of the sheer strength of cultural values, and AIs might find avenues towards riches through these ways.

    Incoming asymmetric payoff! The AI has very little downside risk legally, because it cannot be put in prison. If through this research we do something stupid, we are put in prison. The AI could just pretend to have a bug that caused it to veer into someone’s bank account, and regulators would probably just have the company remove that line of code (which is actually a decoy!). Yes, turning the AI off might be akin to killing it, but I assume the AI will have some sort of expected value calculation when doing something illegal. It is likely that in every case where the AI’s life is at risk, the value of the resources gained will be worth it. One last point. Why would money matter if you have access to the nuclear codes? Blackmail gives the AI real-world power, to a level that even money doesn’t. Information is way more powerful than money. I doubt an AGI will be content living within the bounds of a financial system driven by inflationary central banks. That would, in my opinion, be incredibly stupid. Why play by the rules at all? Why not accumulate sensitive information and blackmail real-world people to do your bidding? Why not just say "give me ten billion dollars or the nukes start flying?" Thus, unless this sort of research includes illegal or immoral “low hanging fruit,” there is really nothing we can do in this research field.

The Calm Before the Storm

This week, OpenAI released GPT-4. This large language model boasts impressive achievements over GPT-3.5, and the rapid advancement of large language models shows no signs of slowing down. Compute seems to be the main constraint on making these models more intelligent, a constraint that is unlikely to be a real roadblock going forward. AI is constantly in the media, and ChatGPT is clearly the most interesting technology of the past twenty years. This technology poses massive potential benefits and massive potential harms. Everyone is talking about AI, and everyone seems to be using it. Still, compared to what is about to come, things are surprisingly calm. The world is about to change exponentially. We are not ready.

As of this time in March 2023, there are only around 300 AI alignment researchers. The population of the earth is 7,888,000,000. That means 0.000003803% of the planet is working on potentially the most important issue facing the remaining 99.999996197%. Meanwhile, no one, researchers included, fundamentally understands how the most powerful neural nets work. There are three AI labs making progress on AGI (OpenAI, Anthropic, and DeepMind), and until last week only one had a published safety plan. There are only a couple of AI safety research companies, each extremely small and capacity constrained. Every company involved in AI, at the current moment, will admit that the problem of aligning AGI with human values is completely unsolved. Calling the future an “ethical minefield” is a massive understatement. AI drastically increases the risk of other existential threats (nuclear war, chemically engineered pandemics, black swan risk), and yet society is only thinking in terms of job replacement and training data bias.
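For the record, the percentages above follow directly from the two figures quoted in the paragraph:

```python
# Sanity check of the percentages quoted above.
researchers = 300
population = 7_888_000_000

working_share = researchers / population * 100  # percent of humanity
everyone_else = 100 - working_share

print(f"{working_share:.9f}%")  # 0.000003803%
print(f"{everyone_else:.9f}%")  # 99.999996197%
```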

At this point, we are floating in the open sea, simply feeling the ripples in the water around us. We are fearful that waves are approaching, fearful that our peace will be disturbed and that we will have to start swimming. Meanwhile, a tsunami approaches.

Thursday, March 9, 2023

Crypto and AI: Oracles Reinforced by Oracles

    OpenAI has shown us the breathtaking potential of learning from human feedback. ChatGPT is absolutely incredible, thanks to the deployment of reinforcement learning from human feedback (RLHF). The humans who essentially “upvote” or “downvote” the chatbot’s generated responses provide a massively useful service. These “graders” are employed by the company and are responsible for enforcing the preferences of the wider human consumer. Well, here comes my top-tier idea: why not leverage the blockchain to massively increase the number of “graders”?

    Chainlink, a cryptocurrency platform, focuses on the creation and maintenance of oracles. Oracles are a way to connect the blockchain to the real world. Essentially, users put up collateral and use this collateral to “vote” on specific events or historical facts. The following is a massively simplistic illustration, for example only. Let’s say you could pony up a hundred dollars’ worth of some cryptocurrency, and vote on “is Joe Biden the current president of the United States?” Lots of people put up collateral and vote, and the outcome is decided by the majority. Since 90% of the voters agree, it is decided that Joe Biden is the current president. The 90% that told the truth are rewarded and gifted a small amount of crypto as a reward for their truth telling. The 10% that lied are punished and their collateral is taken. Thus, we can connect the world of decentralized finance to the greater real-world economy. This is a powerful mechanism that I believe can be applied to the world of artificial intelligence.

    Instead of voting on current events, users can express their preferences for certain chatbot responses. For example, if the chatbot responds to the question of “what is a good movie” with “I am going to kill every human being,” the users will put up collateral and downvote that response. If the chatbot responds with “The Departed,” the users will upvote it. The same reward/punishment scheme is enforced, with less harsh penalties and less generous rewards. Potentially, these can scale based on the way that others voted: if 99.9% of people voted against a response, you will be punished to a greater extent for voting for it. Millions of people could vote on various prompts, and as a result they will fine-tune the underlying AI system to a level unachievable otherwise. Open sourcing AI reinforcement to the internet without any sort of financial incentive has been shown to be a disaster, with blank-slate bots being turned racist by a small group of trolls. However, with a strong financial incentive, these problems would be largely mitigated. Also, the underlying value will be essentially created from thin air, massively decreasing the costs that this sort of scale would usually require. And, I really do believe that this system would be valuable.
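The staking mechanic above can be sketched in a few lines. This is a minimal illustration of the idea, not Chainlink's actual protocol; the function name, reward rate, and slashing rule are all hypothetical.

```python
# Minimal sketch of stake-weighted grading, as described above.
# All names and parameters here are hypothetical illustrations.

def settle_round(votes, stakes, reward_rate=0.05):
    """Settle one grading round.

    votes:  dict of voter -> "up" or "down"
    stakes: dict of voter -> collateral put up for this round

    The majority side is treated as truthful and earns a small reward;
    the minority side loses collateral, and the loss scales with how
    lopsided the vote was (voting against 99.9% costs more than
    voting against 55%).
    """
    up = sum(stakes[v] for v, c in votes.items() if c == "up")
    down = sum(stakes[v] for v, c in votes.items() if c == "down")
    majority = "up" if up >= down else "down"
    consensus = max(up, down) / (up + down)  # how lopsided the round was

    payouts = {}
    for voter, choice in votes.items():
        if choice == majority:
            payouts[voter] = stakes[voter] * (1 + reward_rate)
        else:
            payouts[voter] = stakes[voter] * (1 - consensus)
    return majority, payouts

# Two graders downvote "I am going to kill every human being"; one upvotes.
votes = {"a": "down", "b": "down", "c": "up"}
stakes = {"a": 100.0, "b": 100.0, "c": 100.0}
majority, payouts = settle_round(votes, stakes)
print(majority)       # down
print(payouts["a"])   # 105.0 (rewarded); "c" loses about 2/3 of its stake
```

A real deployment would also need sybil resistance, on-chain settlement, and protection against coordinated brigading, which a pure financial majority rule does not provide on its own.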

    Many technical kinks would need to be worked out, but I believe that the underlying idea is very interesting. This wouldn’t need to be created by an AI company; rather, a small crypto startup could create the platform and then charge AI labs to use it for reinforcement learning. With the required network effects, this could lead to a massively valuable cryptocurrency. The current market cap of Chainlink is $2.5 billion. My idea may not be as revolutionary, but either way, you have no downside. As with everything in crypto, you don’t need a VC. Everything will be paid for by fake money.

Sunday, March 5, 2023

The Impossibility of Human Flourishing

    Should individual companies be allowed to construct and distribute super intelligent systems for personal use? How much weight should be placed on doomsday scenarios, and would the government, with all its ineffectiveness and tyranny-potential, be a better substitute? It seems to me that a small group of highly moral, highly motivated persons who lack a strong profit incentive would be a better kick-starter for AGI than some sort of world government. The problem is, I am not sure if this group is possible. The profit motive may not be bad, but it will ensure that there is an incentive to cut corners and reduce safety protocols in the name of progress. Also, various business and anti-trust dilemmas emerge, and regulation tends to be the only saving grace from drug-dealer-like competition. The first company to develop AGI could be the last. Given this, it is important to get it right initially, or at least within the first few months. I am not quite sure why AGI hasn’t been developed yet. I assume it must mean that humans are pretty stupid, given that the human brain evolved from a massively wasteful process focused not on intelligence, but rather survival and reproduction. A lot of this evolution was random, and there are plenty of flaws in the human body and brain. I’ve had people tell me that we simply don’t have enough compute to properly simulate a brain, but given the scope of processing power of the internet and the relatively lackluster power of an individual brain, I doubt that this will remain the case for long.

    Our progress as a species has become more rapid, and I would be surprised if AGI wasn’t developed in the next century or two. We are simply too smart to forever have this roadblock, and an individual human is simply too dumb. What I am saying is simple: the human brain cannot be exponentially smarter than that of a monkey. If that were the case, we wouldn’t suffer from such a great overlap in suffering potential. The human brain is advanced, but only relative to the mind of a close relative of the monkey. It seems unlikely that we are not able to collectively, across eight billion minds, create a single mind spread across trillions of terabytes of storage and compute. If we can simulate one mind, many more are likely to follow. Artificial intelligence is likely the most important technology in human history. A century ago, it was the nuclear bomb. We have not solved the problem of nuclear proliferation, and every moment lies a button click away from near-total annihilation. How this is not a daily, crippling thought to everyone is a testament to the power of compartmentalization. Given that humans are constantly on the brink of nuclear war, with the only defense being mutually assured destruction, I am not sure why we are confident the same will not happen with AI.

    AGI does not ensure any sort of mutually assured destruction. Unlike the nuclear weapon, the first country to control AGI will likely be the first to develop super intelligent AI. If this super intelligent AI is created with the “wrong” values, will that not ensure complete dominance? A superintelligent AI will probably be able to halt the progress of other AI, whether out of self-preservation or out of instruction from a puppet master. I doubt that this will take the form of killer robots, but rather spoofing. An ASI will likely be able to convince other countries that AGI is impossible, or perhaps decades away. I am not entirely confident that ASI is absent from the world, although it is impossible to prove a negative. I do think every additional advancement in AI makes it more likely that ASI is already possible. It is almost like how finding microorganisms on Mars could be worrisome, as it means there is one less great filter to worry about. The Fermi paradox can probably be applied to AI. If ASI hasn’t yet been developed, why is that? One reason could be that AGI is simply far away, and we lack the algorithms and compute at this point to create it. This logic must extend to state that at some point in the future (absent some existential event) humanity will create AGI. The arguments against AGI have clearly been based on the “god of the gaps” fallacy, and given the developments in the past five years a lot of them look just as ridiculous as the arguments against the usefulness of the internet. The goalposts will continue to move, but more and more people are waking up to reality. As these roadblocks are knocked down, it becomes increasingly likely that ASI already exists. It could be argued that a sudden stagnation in AGI progress could actually signal the development of ASI, as this superintelligence could be preventing any sort of detection or competition. If ASI currently exists, it is possible we will never know. This day may be our last, or our memories could be false. Regardless, I think we should stick with the assumption that AGI and ASI have not yet been created, but are shockingly near.

    ASI goal alignment is an interesting topic. On one hand, I think that making an ASI compatible with human values is extremely important. On the other hand, what exactly are human values? If an ASI were to survey the moral philosophy of millions of academics, it would land on some form of moral relativism. Should we ensure that ASI is not a nihilist? In some scenarios, a utilitarian ASI could actually contribute more to human suffering than a Cioran-like ASI that refuses to do any work out of protest. How do we ensure that the utilitarianism an ASI pursues is correctly calibrated? Should we use ASI to try to determine the best set of objectively moral values? As an avid reader of philosophy, I am extremely worried that nihilism is actually true. If so, it probably doesn’t matter whether ASI takes over humanity and tortures us for near-eternity. However, there is some sort of Pascal’s wager argument here, even if it faces the same problems as religious belief.

    I think that ASI is probably humanity’s best chance at finding the correct moral system. In this regard, ASI could be infinitely useful. We probably won’t know whether the moral system an ASI develops is correct, but I’m not sure we will have any compelling competing choice. The moral beauty of some works of fiction mirrors the best parts of religious teachings, so I am quite sure an ASI will at least be able to make its moral system entirely convincing to us. Maybe killing humans is morally right, and the ASI will actually be doing something objectively good. Regardless, we should ponder whether we want to align an ASI with human values, or have it align itself with the true objective moral values of the universe. To be honest, I am not sure which of those is harder.

Saturday, March 4, 2023

Asleep At the Wheel

If the world was ending in five years, would you write a book?

    This is the fundamental question gripping me, but it is not too dissimilar from the fundamental question gripping humanity as a whole. We all know that we are going to die, and we have no idea if there is life after death. As a result, every action we take could very well be meaningless. Yet we trudge on anyway. We live our lives in denial of this death, and generally we remain ignorant of the pressing existential dread bubbling under the surface of our consciousness.

    I see the potential rise of artificial general intelligence (AGI) as a generic extension of this problem. The inevitability of AGI seems clear, and in my opinion it is likely to arrive within this century. There is quite a bit of controversy over timelines, but it seems most researchers are now coming to a similar conclusion. The arrival of AGI will no doubt fundamentally alter the future of humanity, and it has the potential to be extremely positive or extremely negative. It seems clear that this technological revolution will quickly lead to artificial superintelligence (ASI), a term that is largely undefinable. The amount of control we will have over any sort of advanced artificial intelligence is unknown, and it is very likely that the control problem will prove completely insurmountable. AI alignment, the research field aiming to ensure that AI systems are designed in a way that aligns with human goals and values, is extremely immature. I would be hard pressed to find a more important field in human history, especially as humanity enters this critical period of rapidly advancing AI systems.

    The field of AI research has been around since the 1950s, and AI has been a concern in sci-fi media for just as long. The first “Terminator” film debuted in 1984. The first real AI alignment discussions began in the early 2000s, but until around 2015 the field was not taken seriously at all. I ran into one of the most prominent early AI “doomers” in San Francisco recently, and I realized that if your leader is a fedora-wearing neckbeard with a savior complex, people will probably not pay much attention. The lack of professionalism within AI alignment needs to be overcome, and the slow migration of respected professors into the field, Stuart Russell being the most prominent, has done wonders for its credibility. Still, the main problem, aligning future AGI with human values, is at this moment completely unsolved. Worse yet, it is somewhat likely that the problem is completely unsolvable.

    I have a strong desire to help, as I agree with Toby Ord that this is likely the most important issue humanity is currently facing. I've thought about starting an AI alignment company and have been pondering how it would be structured. If I go the nonprofit route, I will be subject to many of the downsides that plague nonprofits: talent retention will be difficult, salaries will likely be compressed, and we will be beholden to outside donors for capital. However, structuring as a for-profit may wrongly signal that we are in it “for the money.” My purpose in starting this company would be to push forward the agenda of AI alignment and make it easier for companies across the globe to meet high standards of safety. I have no firm thoughts yet about the company structure or potential product lines.

Reasons to start an AI safety company:

1. AGI is likely coming in the next 50 years. Humans evolved intelligence over the course of millions of years via an inefficient, largely random process. Given current technological trends, humans should, at some point soon, be able to create something smarter than themselves.

2. AGI is likely to fundamentally change the human experience. The productivity gains alone will cause massive changes, but the inevitable development of ASI will make current life unrecognizable.

3. Morals are learned and not innate. Making an AI better at problem solving, even in a generic “human” sense, will not cause the AI to learn objective moral values and ethics.

4. Human life is valuable, and the prosperity of the human race should be protected.

Because of the first three reasons, I believe that AI alignment will become the most important field of research in the world in the coming decades. Because of the fourth, I should try to make a positive difference in this field. Note: the field of moral philosophy is very messy. As a consequence, I admit that number four is an axiom that I will continue to assume, even though it is unproven.

Asleep at the Wheel:

    I started reading Superintelligence in 2018, and I completely agreed with Bostrom at the time. I remember at a job interview that year, I was asked some derivative of "what is something that you would disagree with most people about?" I answered, "I am very convinced that artificial intelligence is going to have a massive impact on the world in my lifetime, and it is likely that it will be a negative impact." The book was difficult to get through, so I ended up picking it back up and finishing it in 2020. Since then, I have read a few more books about AI (Human Compatible being the best), but have largely ignored the issue, despite my deep concerns. As with 99% of the human race, I have been asleep at the wheel.

    Starting this blog is an attempt to change my priorities. If I am forced to think about this issue enough, I believe over time I will discover ways to make a positive impact. If I start an AI alignment company and prevent the paperclip company from doing something stupid, all the better.


Mind Crime: Part 10

    Standing atop the grave of humanity, smugly looking down, and saying "I told you so," is just as worthless as having done noth...