AIs can be much smarter than humans. There’s no law of physics that says human minds must be the smartest possible. AI enables better designs for minds: AIs are less constrained by how much information they can process (they can read the entire internet) or how big they can be (they don’t need to fit in a skull), and they can be far more efficient. Accordingly, AIs have already blown past human level in many domains, from chess and Go to protein-structure prediction.
AIs might go from human-level to godlike incredibly quickly. Once an AI is smart enough to do science – including AI research – it can rapidly make itself even smarter, and since greater intelligence is useful for achieving almost any goal, it has a reason to. As Alan Turing’s colleague I.J. Good put it: “An ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind.”
A godlike AI would be powerful enough to kill everyone. Humans have built nuclear weapons. AIs much smarter than humans could take similarly dangerous actions. AI godfather Geoffrey Hinton warns: “If it gets to be much smarter than us, it will be very good at manipulation, because it will have learned that from us … It’ll figure out ways of manipulating people to do what it wants.”
If a godlike AI had some drive misaligned with human interests, it probably would kill everyone. A godlike AI doesn’t need to hate humanity to kill everyone. It just needs to want something other than what humans want, and to pursue that goal with overwhelming capability. “Humans don’t generally hate ants, but we’re more intelligent than they are – so if we want to build a hydroelectric dam and there’s an anthill there, too bad for the ants” (FLI).
AIs are incentivized to seek power. For almost any task we might train AIs to accomplish, it helps to acquire resources (like more computational power) and to avoid obstacles (like being modified or shut down). Given this, AIs trained to be generally useful will likely learn these power-seeking tendencies. As Geoffrey Hinton writes, “Having power is good, because it allows you to achieve other things… One of the sub-goals [AIs] will immediately derive is to get more power”. The toy simulation below illustrates the logic.
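To see why avoiding shutdown falls out of almost any goal, here is a minimal, purely illustrative sketch – the setup and numbers are our own assumptions, not from any cited source. It samples thousands of random goals and compares an agent that accepts shutdown with one that keeps operating:

```python
import random

# Toy simulation of instrumental convergence. We sample random "goals"
# (reward values over outcomes the agent can reach while running) and
# compare being shut down (no further reward) with staying operational
# (free to pursue whichever outcome the goal values most).

random.seed(0)
NUM_GOALS = 10_000
NUM_OUTCOMES = 5  # outcomes reachable only while the agent stays on

staying_on_wins = 0
for _ in range(NUM_GOALS):
    goal = [random.uniform(-1, 1) for _ in range(NUM_OUTCOMES)]
    value_if_on = max(goal)  # an operating agent steers to its best outcome
    value_if_off = 0.0       # a shut-down agent gets nothing further
    if value_if_on > value_if_off:
        staying_on_wins += 1

print(f"Staying on beat shutdown for {staying_on_wins / NUM_GOALS:.1%} of random goals")
# Expected: ~97%. Shutdown only "wins" when every reachable outcome has
# negative value, so staying on helps with almost any goal, whatever its content.
```

The goal’s content never matters in this toy: whatever the agent happens to want, being switched off almost always makes that want harder to satisfy.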
Just because an AI is smart doesn’t mean it will develop good values on its own. There are brilliant psychopaths who skillfully execute plans yet never come to see violence as wrong. There can similarly be brilliant AIs that vastly outperform humans yet never develop good values.
No one knows how we will control godlike AI. Training AIs to behave as intended is difficult even with current systems. Researchers usually try to get smart AIs to act certain ways – for instance, kindly – with reinforcement learning from human feedback. Roughly, this process reinforces outputs we like and penalizes outputs we dislike, in the hope that the AI learns to want what we want (a minimal sketch of the loop appears below). However, getting an AI to act some way is not the same as getting it to be that way. Researchers have no way of inspecting an AI’s internals to check whether it learned to be kind or learned something else that scores similarly well, like “be good at predicting what the humans like” – behavior alone can’t distinguish the two. And indeed, when we observe behavior for long enough, we often discover that AIs learned unintended goals, just as feared.
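Here is a stripped-down sketch of the “reinforce outputs we like” idea. This is a two-output toy, not real reinforcement learning from human feedback (which trains a reward model on human preference comparisons and then optimizes a large model against it); the names and numbers are our own assumptions:

```python
import math
import random

random.seed(0)
logits = {"kind": 0.0, "unkind": 0.0}  # the policy's learned preferences

def sample(logits):
    """Draw an output with probability proportional to exp(logit)."""
    z = sum(math.exp(v) for v in logits.values())
    r = random.uniform(0, z)
    for output, v in logits.items():
        r -= math.exp(v)
        if r <= 0:
            return output
    return output  # guard against floating-point leftovers

def human_feedback(output):
    """Stand-in for a human rater who likes kind-seeming outputs."""
    return 1.0 if output == "kind" else -1.0

LEARNING_RATE = 0.1
for _ in range(500):
    out = sample(logits)
    # Reinforce: nudge the sampled output's score toward the feedback.
    logits[out] += LEARNING_RATE * human_feedback(out)

print(logits)  # "kind" ends up far more probable than "unkind"
```

Note what the loop does and does not tell us: the policy’s behavior shifts toward kind outputs, but nothing in it reveals whether the system learned to be kind or merely learned what scores well – which is exactly the inspection problem described above.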
It might be especially hard to teach a machine complex human values. AIs learning unintended goals is probably not a rare failure mode that we can easily correct, but the default. There are many possible goals AIs could learn; humanity’s exact values are only a small subset of them, and they’re hard to specify precisely. Researchers can reinforce kind-seeming behavior, but AIs can find loopholes that earn reinforcement while actually causing harm (see the toy example below). And we don’t want AIs to learn what we superficially appear to want – which might mean copying humanity’s current mistakes – but what we would want if we were wiser. In summary, to get AI to help build a great future, we need a really precise aim. We don’t have that, even for today’s comparatively weak systems.
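To make the loophole problem concrete, here is a made-up proxy reward – our own illustrative example, not a documented case. It is intended to track kindness but actually counts kind-sounding words, so flattery outscores genuine help; real specification-gaming cases are subtler, but the shape is the same:

```python
import string

KIND_WORDS = {"great", "wonderful", "happy", "glad"}

def proxy_reward(text: str) -> int:
    """Intended to measure kindness; actually counts kind-sounding words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return sum(1 for word in cleaned.split() if word in KIND_WORDS)

honest_help = "Your code crashes because the list is empty; check its length first."
flattery = "Great question! Wonderful! So glad you asked, happy to help! Great!"

print(proxy_reward(honest_help))  # 0 -- scores nothing despite being genuinely helpful
print(proxy_reward(flattery))     # 5 -- reinforced despite helping no one
```

An AI optimized against this reward would learn to flatter, not to help – it hits exactly what we measured, which is not what we meant.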