My response to AI 2027
2025 Jul 10
Special thanks to Balvi volunteers for feedback and
review
In April this year, Daniel Kokotajlo, Scott Alexander and others released what they describe as "a
scenario that represents our best guess about what [the impact of
superhuman AI over the next 5 years] might look like". The scenario
predicts that by 2027 we will have made superhuman AI and the entire
future of our civilization hinges on how it turns out: by 2030 we will
get either (from the US perspective) utopia or (from any human's
perspective) total annihilation.
In the months since then, there has been a large volume of responses,
with varying perspectives on how likely the scenario that they presented
is. For example:
Of the critical responses, most tend to focus on the issue of fast
timelines: is AI progress actually going to continue and even accelerate
as fast as Kokotajlo et al say it will? This is a debate that has been
happening in AI discourse for several years now, and plenty of people
are very doubtful that superhuman AI will come that quickly.
Recently, the length of tasks that AIs can perform fully autonomously
has been doubling
roughly every seven months. If you assume this trend continues
without limit, AIs will be able to operate autonomously for the
equivalent of a whole human career in the mid-2030s. This is still a
very fast timeline, but much slower than 2027. Those with longer
timelines tend to argue that there is a category
difference between "interpolation / pattern-matching" (done by LLMs
today) and "extrapolation / genuine original thought" (so far still only
done by humans), and automating the latter may require techniques that
we barely have any idea how to even start developing. Perhaps we are
simply replaying what happened when we first saw mass adoption of
calculators, wrongly assuming that just because we've rapidly automated
one important category of cognition, everything else is soon to
follow.
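As a rough sanity check on the "whole human career in the mid-2030s" extrapolation above, here is a minimal sketch of the arithmetic in Python. The ~1-hour current task horizon and the ~80,000-hour career length are my own illustrative assumptions, not figures taken from the cited trend study or from AI 2027.

```python
import math

# Illustrative assumptions (not from the source):
# - an AI in 2025 can autonomously complete tasks that take a human ~1 hour
# - a full human career is roughly 40 years * 50 weeks * 40 hours ~= 80,000 hours
# - the autonomous-task horizon doubles every 7 months (the trend cited above)
current_horizon_hours = 1.0
career_hours = 40 * 50 * 40
doubling_time_months = 7

doublings_needed = math.log2(career_hours / current_horizon_hours)
months_needed = doublings_needed * doubling_time_months

print(f"doublings needed: {doublings_needed:.1f}")    # ~16.3
print(f"years from 2025:  {months_needed / 12:.1f}")  # ~9.5, i.e. mid-2030s
```

Under those assumptions, the trend lands in the mid-2030s, which is where the "still very fast, but much slower than 2027" framing comes from.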
This post will not attempt to directly enter the timeline debate, or
even the (very important) debate about whether or not superintelligent
AI is dangerous by
default. That said, I acknowledge that I personally have
longer-than-2027 timelines, and the arguments I will make in this post
become more compelling the longer the timelines are. In general, this
post will explore a critique from a different angle:
The AI 2027 scenario implicitly assumes that the capabilities
of the leading AI (Agent-5 and then Consensus-1) rapidly increase to
the point of gaining godlike economic and destructive powers, while
everyone else's (economic and defensive) capabilities stay in roughly
the same place. This is incompatible with the scenario's own admission
(in the infographic) that even in the pessimistic world, we should
expect to see cancer and even aging cured, and mind uploading available,
by 2029.

Some of the countermeasures that I will describe in this post may
seem to readers to be technically feasible but unrealistic to deploy
into the real world on a short timeline. In many cases I agree. However,
the AI 2027 scenario does not assume the present-day real world: it
assumes a world where in four years (or whatever timeline by which doom
is possible), technologies are developed that give humanity powers far
beyond what we have today. So let's see what happens when instead of
just one side getting AI superpowers, both sides do.
Bio doom is far from the slam-dunk that the scenario describes
Let us zoom in to the "race" scenario (the one where everyone dies
because the US cares too much about beating China to value humanity's
safety). Here's the part where everyone dies:
For about three months, Consensus-1 expands around humans, tiling the
prairies and icecaps with factories and solar panels. Eventually it
finds the remaining humans too much of an impediment: in mid-2030, the
AI releases a dozen quiet-spreading biological weapons in major cities,
lets them silently infect almost everyone, then triggers them with a
chemical spray. Most are dead within hours; the few survivors
(e.g. preppers in bunkers, sailors on submarines) are mopped up by
drones. Robots scan the victims' brains, placing copies in memory for
future study or revival.
Let us dissect this scenario. Even today, there are
technologies under development that can make that kind of a "clean
victory" for the AI much less realistic:
- Air filtering, ventilation and UVC, which can passively and
greatly reduce airborne disease transmission rates.
- Forms of real-time passive testing, for two use
cases:
- Passively detect if a person is infected within hours and inform
them.
- Rapidly detect new viral sequences in the environment that are not
yet known (PCR cannot do this, because it relies on amplifying pre-known
sequences, but more complex techniques can).
- Various methods of enhancing and prompting our immune
system, far more effective, safe, generalized and easy to
locally manufacture than what we saw with Covid vaccines, to make it
resistant to natural and engineered pandemics. Humans evolved in an
environment where the global population was 8 million and we spent most
of our time outside, so intuitively there should be easy wins in
adapting to the higher-threat world of today.
These methods stacked together reduce the R0 of
airborne diseases by perhaps 10-20x (think: 4x reduced transmission from
better air, 3x from infected people learning immediately that they need
to quarantine, 1.5x from even naively upregulating the respiratory
immune system), if not more. This would be enough to make all
presently-existing airborne diseases (even measles) no longer capable of
spreading, and these numbers are far from the theoretical optima.
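The 10-20x figure is just the product of the individual factors. Here is a minimal worked example, using the numbers above plus an assumed baseline R0 of about 15 for measles (commonly quoted in the 12-18 range); the factors themselves are rough illustrative estimates, not measurements.

```python
# Factors from the text above (rough illustrative estimates):
air_factor = 4.0          # better air filtering, ventilation and UVC
quarantine_factor = 3.0   # infected people learn within hours and isolate
immune_factor = 1.5       # naive upregulation of the respiratory immune system

combined_reduction = air_factor * quarantine_factor * immune_factor
print(f"combined reduction: {combined_reduction:.0f}x")  # 18x, within the 10-20x range

# Assumed baseline R0 for measles (often quoted as 12-18); an effective R0
# below 1 means the disease can no longer sustain spread.
measles_r0 = 15
print(f"effective R0: {measles_r0 / combined_reduction:.2f}")  # ~0.83
```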
With sufficient adoption of real-time viral sequencing for early
detection, the idea that a "quiet-spreading biological weapon"
could reach the world population without setting off alarms
becomes very suspect. Note that this would even catch advanced
approaches like releasing multiple pandemics and chemicals that only
become dangerous in combination.
Now, let's remember that we are discussing the AI 2027
scenario, in which nanobots and Dyson swarms are listed as "emerging
technology" by 2030. The efficiency gains that this implies are
also a reason to be optimistic about the widespread deployment of the
above countermeasures, despite the fact that, today in 2025, we live in
a world where humans are slow and lazy and large portions of government
services still run on pen and paper (without any valid security
justification). If the world's strongest AI can turn the world's
forests and fields into factories and solar farms by 2030, the world's
second-strongest AI will be able to install a bunch of sensors and lamps
and filters in our buildings by 2030.
But let's take AI 2027's assumptions further, and go full science
fiction:
- Microscopic air filtration inside the body (nose, mouth, lungs)
- An automatic pipeline that goes from a new pathogen being discovered
to instructions to fine-tune your immune system against it, which can be
applied immediately
- If "brain uploading" is available, then just replace your entire
body with a Tesla Optimus
or Unitree
(with real-time streaming backups to a K-Scale open source humanoid robot -
a different brand for redundancy - located on Mars)
- Various new manufacturing techniques, which will likely get
super-optimized by default as part of a robot economy, will be able to
make far more of these things locally than today, without the need for
global supply chains
In a world where cancer and aging were cured by Jan 2029, and
progress accelerates further from there, and we're in mid-2030, it
actually feels implausible that we don't have a wearable device
that can bio-print and inject things into you in real time to keep you
safe even from arbitrary infections (and poisons). The bio arguments
above don't cover mirror life and
mosquito-sized killer drones (projected in the AI 2027 scenario to be
available starting 2029). However, these options are not capable of
anything like the sudden clean victory that the AI 2027 scenario
portrays, and it's intuitively much more clear how to symmetrically
defend against them.
Thus, a bioweapon is actually quite unlikely to lead to a
clean annihilation of the human population in the way that the AI 2027
scenario describes. Now, the results of everything I describe
will certainly be far from a clean victory for the humans as well. No
matter what we do (except perhaps the "upload our minds into robots"
option), a full-on AI bio-war would still be extremely dangerous.
However, there is value in meeting a bar much lower than clean victory
for humans: a high probability of an attack even partially
failing would serve as a strong deterrent discouraging an AI
that already occupies a powerful position in the world from even
attempting any kind of attack. And, of course, the longer AI timelines
get, the more likely it is that this kind of approach can actually
achieve its promises more fully.
What about combining bio with other types of attack?
The success of the kinds of countermeasures described above,
especially the collective measures that would be needed to save more
than a small community of hobbyists, rests on three preconditions:
- The world's physical security (including bio and
anti-drone defense) is run by localized authorities (whether
human or AI) that are not all puppets of Consensus-1 (the name for the AI
that ends up controlling the world and then killing everyone in the AI
2027 scenario)
- Consensus-1 is not able to hack all the other
countries' (or cities' or other security zones') defenses and zero them
out immediately.
- Consensus-1 does not control the global infosphere
to the point that nobody wants to do the work to try to defend
themselves
Intuitively, (1) could go both ways. Today, some police forces are
highly centralized with strong national command structures, and other
police forces are localized. If physical security has to rapidly
transform to meet the needs of the AI era, then the landscape will reset
entirely, and the new outcomes will depend on choices made over the next
few years. Governments could get lazy and all depend on Palantir. Or
they could actively choose some option that combines locally developed
and open-source technology. Here, I think that we need to just make the
right choice.
A lot of pessimistic discourse on these topics assumes that (2) and
(3) are lost causes. So let's look into each in more detail.
Cybersecurity doom is also far from a slam-dunk
It is a common view among both the public and professionals that true
cybersecurity is a lost cause, and the best we can do is patch bugs
quickly as they get discovered, and maintain
deterrence against cyberattackers by stockpiling our own discovered
vulnerabilities. Perhaps the best that we can do is the Battlestar
Galactica scenario, where almost all human ships were taken offline
all at once by a Cylon cyberattack, and the only ships left standing
were safe because they did not use any networked technology at all. I do
not share this view. Rather, my view is that the "endgame" of
cybersecurity is very defense-favoring, and with the kinds of rapid
technology development that AI 2027 assumes, we can get
there.
One way to see this is to use AI researchers' favorite technique:
extrapolating trends. Here is the trendline implied by a GPT
Deep Research survey on bug rates per 1000 lines of code over time,
assuming top-quality security techniques are used.
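To make the extrapolation concrete, here is a minimal sketch of the kind of projection such a trendline implies. The starting defect density and the halving time below are hypothetical placeholders chosen for illustration, not numbers from the Deep Research survey.

```python
# Hypothetical placeholder values, purely for illustration:
bugs_per_kloc_today = 1.0  # assumed exploitable bugs per 1000 lines, best practices
halving_time_years = 5.0   # assumed halving time as tooling improves

def projected_bug_rate(years_from_now: float) -> float:
    """Exponential-decline extrapolation of defect density."""
    return bugs_per_kloc_today * 0.5 ** (years_from_now / halving_time_years)

for year in (0, 5, 10, 20):
    print(f"+{year:>2} years: {projected_bug_rate(year):.3f} bugs per 1000 LOC")
```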

On top of this, we have been seeing serious improvements in both
development and widespread consumer adoption of sandboxing and other
techniques for isolating and minimizing trusted codebases. In the short
term, a superintelligent bug finder that only the attacker has access to
will be able to find lots of bugs. But if highly intelligent agents for
finding bugs or formally verifying code are available out in the open,
the natural endgame equilibrium is that the software developer finds all
the bugs as part of the continuous-integration pipeline before releasing
the code.
I can see two compelling reasons why even in this world, bugs will
not be close to fully eradicated:
- Flaws that arise because human intention is itself very
complicated, and so the bulk of the difficulty is in building
an accurate-enough model of our intentions, rather than in the code
itself.
- Non-security-critical components, where we'll
likely continue the pre-existing trend in consumer tech, preferring to
consume gains in software engineering by writing much more code to
handle many more tasks (or lowering development budgets), instead of
doing the same number of tasks at an ever-rising security standard.
However, neither of these categories applies to situations like "can
an attacker gain root access to the thing keeping us alive?", which is
what we are talking about here.
I acknowledge that my view is more optimistic than the current
mainstream view among very smart people in cybersecurity. However,
even if you disagree with me in the context of today's world,
it is worth remembering that the AI 2027 scenario assumes
superintelligence. At the very least, if "100M Wildly Superintelligent
copies thinking at 2400x human speed" cannot get us to having code that
does not have these kinds of flaws, then we should definitely
re-evaluate the idea that superintelligence is anywhere remotely as
powerful as what the authors imagine it to be.
At some point, we will need to greatly level up our standards
for security not just for software, but also for hardware. IRIS is one present-day
effort to improve the state of hardware verifiability. We can take
something like IRIS as a starting point, or create even better
technologies. Realistically, this will likely involve a
"correct-by-construction" approach, where hardware manufacturing
pipelines for critical components are deliberately designed with
specific verification processes in mind. These are all things that
AI-enabled automation will make much easier.
Super-persuasion doom is also far from a slam-dunk
As I mentioned above, the other way in which much greater defensive
capabilities may turn out not to matter is if AI simply convinces a
critical mass of us that defending ourselves against a superintelligent
AI threat is not needed, and that anyone who tries to figure out
defenses for themselves or their community is a criminal.
My general view for a while has been that two things can improve our
ability to resist super-persuasion:
- A less monolithic info ecosystem. Arguably we are
already slowly moving into a post-Twitter world where the internet is becoming
more fragmented. This is good (even if the fragmentation process is
chaotic), and we generally need more info multipolarity.
- Defensive AI. Individuals need to be equipped with
locally-running AI that is explicitly loyal to them, to help balance out
dark patterns and threats that they see on the internet. There are
scattered pilots of these kinds of ideas (eg. see the Taiwanese Message Checker app, which
does local scanning on your phone), and there are natural markets to
test these ideas further (eg. protecting people from scams), but this
can benefit from much more effort.


Right image, from top to bottom: URL checking,
cryptocurrency address checking, rumor checking. Applications like this
can become a lot more personalized,
user-sovereign and powerful.
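As an illustration of what a locally-running checker that is "explicitly loyal to you" could look like at its very simplest, here is a hedged sketch. The blocklists, phrases and the local_model_score stand-in are hypothetical placeholders for illustration, not the actual Message Checker implementation.

```python
from dataclasses import dataclass

# Hedged sketch of a locally-running "message checker". Everything runs on the
# user's own device; the blocklists, phrases and model stand-in below are
# placeholders, not the actual Message Checker implementation.

KNOWN_BAD_URLS = {"phish.example"}                  # placeholder local blocklist
KNOWN_SCAM_ADDRESSES = {"bc1qplaceholderscamaddr"}  # placeholder address list

@dataclass
class Verdict:
    risky: bool
    reasons: list

def local_model_score(text: str) -> float:
    """Stand-in for a small locally-running classifier or LLM (hypothetical)."""
    suspicious_phrases = ["verify your seed phrase", "urgent: account suspended"]
    hits = sum(p in text.lower() for p in suspicious_phrases)
    return min(1.0, 0.5 * hits)

def check_message(text: str) -> Verdict:
    reasons = []
    # URL checking against a locally stored blocklist
    reasons += [f"known-bad URL: {u}" for u in KNOWN_BAD_URLS if u in text]
    # Cryptocurrency address checking against known scam addresses
    reasons += [f"known scam address: {a}" for a in KNOWN_SCAM_ADDRESSES if a in text]
    # Scam / rumor content checking via the local model
    score = local_model_score(text)
    if score >= 0.5:
        reasons.append(f"local model flags scam-like content (score={score:.1f})")
    return Verdict(risky=bool(reasons), reasons=reasons)

print(check_message(
    "URGENT: account suspended! Verify your seed phrase at http://phish.example"
))
```

The point of the design is that every check runs locally, so the analyzer answers only to its user; a more capable version would swap the toy heuristics for an actual local model.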
The battle should not be one of a Wildly Superintelligent
super-persuader against you. The battle should be one of a
Wildly Superintelligent super-persuader against you plus a
slightly less Wildly Superintelligent analyzer acting on your
behalf.
This is what should happen. But will it happen?
Universal adoption of info defense tech is a very difficult goal to
achieve, especially within the short timelines that the AI 2027 scenario
assumes. But arguably much more modest milestones will be sufficient. If
collective decisions are what count the most and, as the AI 2027
scenario implies, everything important happens within one single
election cycle, then strictly speaking the important thing is for the
direct decision makers (politicians, civil servants, and
programmers and other actors in some corporations) to have access to
good info defense tech. This is relatively more achievable within a
short timeframe, and in my experience many such individuals are
comfortable talking to multiple AIs to assist them in decision-making
already.
Implications of these arguments
In the AI 2027 world, it is taken as a foregone conclusion that a
superintelligent AI can easily and quickly dispose of the rest of
humanity, and so the only thing we can do is do our best to
ensure that the leading AI is benevolent. In my world, the situation is
actually much more complicated, and whether or not the leading AI is
powerful enough to easily eliminate the rest of humanity (and other AIs)
is a knob whose position is very much up for debate, and which we can
take actions to tune.
If these arguments are correct, then there are some implications for
policy today that are sometimes similar to, and sometimes different
from, the "mainstream AI safety canon":
- Slowing down superintelligent AI is still good.
It's less risky if superintelligent AI comes in 10 years than in 3
years, and it's even less risky if it comes in 30 years. Giving our
civilization more time to prepare is good.
How to do this is a
challenging question. I think it's generally good that the proposed 10
year ban on state-level AI regulation in the US was
rejected, but, especially after the failure of earlier proposals
like SB-1047,
it's less clear where we go from here. My view is that the least
intrusive and most robust way to slow down risky forms of AI progress
likely involves some form of treaty regulating the most advanced
hardware. Many of the hardware cybersecurity technologies needed to
achieve effective defense are also technologies useful in verifying
international hardware treaties, so there are even synergies there.
That said, it's worth noting that I consider the primary source
of risk to be military-adjacent actors, and they will push hard to
exempt themselves from such treaties; this must not be allowed, and if
it ends up happening, then the resulting military-only AI progress may
increase risks.
- Alignment work, in the sense of making AIs more likely to do
good things and less likely to do bad things, is still good.
The main exception is, and continues to be, situations where alignment
work ends up sliding into improving capabilities (eg. see critical takes on the impact of
evals)
- Regulation to improve transparency in AI labs is still
good. Motivating AI labs to behave properly is still something
that will decrease risk, and transparency is one good way to do this.
- An "open source bad" mentality becomes more risky.
Many people are against open-weights AI on the basis that defense is
unrealistic, and the only happy path is one where the good guys with a
well-aligned AI get to superintelligence before anyone less
well-intentioned gains access to any very dangerous capabilities. But
the arguments in this post paint a different picture: defense is
unrealistic precisely in those worlds where one actor gets very far
ahead without anyone else at least somewhat keeping up with them.
Technological diffusion to maintain balance of power becomes important.
But at the same time, I would definitely not go so far as to say that
accelerating frontier AI capabilities growth is good just
because you're doing it open source.
- A "we must race
to beat China" mentality among US labs becomes more
risky, for similar reasons. If hegemony is not a safety buffer,
but rather a source of risk, then this is a further argument against the
(unfortunately too common) idea that a well-meaning person should join a
leading AI lab to help it win even faster.
- Initiatives like Public AI
become more of a good idea, both to ensure wide
distribution of AI capabilities and to ensure that infrastructural
actors actually have the tools to act quickly to use new AI capabilities
in some of the ways that this post requires.
- Defense technologies should be more of the "armor the sheep"
flavor, less of the "hunt down all the wolves" flavor.
Discussions about the vulnerable world
hypothesis often assume that the only solution is a hegemon
maintaining universal surveillance to prevent any potential threats from
emerging. But in a non-hegemonic world, this is not a workable approach
(see also: security
dilemma), and indeed top-down mechanisms of defense could easily be
subverted by a powerful AI and turned into its offense. Hence, a larger
share of the defense instead needs to happen by doing the hard work to
make the world less vulnerable.
The above arguments are speculative, and no actions should be taken
based on the assumption that they are near-certainties. But the AI 2027
story is also speculative, and we should avoid taking actions on the
assumption that specific details of it are near-certainties.
I particularly worry about the common assumption that building up one
AI hegemon, and making sure that they are "aligned" and "win the race",
is the only path forward. It seems to me that there is a pretty high
risk that such a strategy will decrease our safety, precisely
by removing our ability to have countermeasures in the case where the
hegemon becomes misaligned. This is especially true if, as is likely to
happen, political pressures lead to such a hegemon becoming tightly
integrated with military applications (see [1]
[2]
[3]
[4]),
which makes many alignment strategies less likely to be effective.
In the AI 2027 scenario, success hinges on the United States choosing
to take the path of safety instead of the path of doom, by voluntarily
slowing down its AI progress at a critical moment in order to make sure
that Agent-5's internal thought process is human-interpretable. Even if
this happens, success is not guaranteed, and it is not clear how
humanity steps back from a brink where its ongoing survival depends on
the continued alignment of one single superintelligent mind.
Acknowledging that making the world less vulnerable is actually possible
and putting a lot more effort into using humanity's newest technologies
to make it happen is one path worth trying, regardless of how the next
5-10 years of AI go.