Designing the DIP

The development of superintelligence is the greatest threat to the continued existence of our species. To secure a good future for humanity, we need an effective plan to stop the development of superintelligence.

Unfortunately, existing plans are inadequate — they address only parts of the problem, and do so secretively, slowly, and undemocratically.

To solve this, we have developed the Direct Institutional Plan (DIP). It applies the principles outlined in our book A Narrow Path to prevent the development of superintelligence and keep humanity in control.

In designing the DIP, we have identified four essential properties of an effective plan:

  • Strategic

    • Addresses the entire problem rather than focusing on limited components

  • Public

    • Operates with transparency about both strategy and beliefs

  • Scalable

    • Effectively utilizes additional resources instead of stagnating or deteriorating with growth

  • Democratic

    • Strengthens and leverages democratic institutions instead of concentrating power

For each criterion, we show why it matters, detail how existing plans fail to meet it, and then explain how the design of the DIP ensures that it is satisfied.

Taken together, these principles outline an approach to effective plans that applies beyond addressing risks from superintelligence: an approach based on the basic principles of civic engagement, honestly and openly informing all relevant actors in the democratic process and equipping them to take their own stance on the issue.

Strategic

An effective plan is strategic: it addresses the entire problem end-to-end, rather than focusing only on a small subset. A strategic plan should, if executed successfully, solve the entire problem it seeks to address. 

Non-strategic plans typically address only part of the problem and neglect the rest. Consider AI evaluation organizations like METR, whose plans follow this pattern:

  1. Develop tests to detect dangerous AI behaviors

  2. Improve these tests

  3. At some point, hopefully, the tests successfully detect AIs showing very dangerous behavior.

  4. Share this information with some relevant entity (a government or a company), and hope that entity will take some unspecified action. End of the plan.

Even if such a plan were executed perfectly (and the astounding lack of reliability of even the most basic evaluations suggests otherwise), it would not prevent extinction risk from superintelligence.

Even if these evaluations successfully identify dangerous AI behaviors, they do not establish mechanisms to ensure that this information leads to appropriate action, nor do they identify the concrete actions that would need to be put in place to curtail the risks. Detection without enforcement is insufficient. Nothing in this plan forbids an AI company from ignoring the report and simply continuing to build superintelligence and kill everyone.

In other words, non-strategic plans assume that some part of the problem, often the trickiest and most important part, will be done by someone else. Who? Nobody knows, but someone will do it. Except in practice, if all plans look like that, no one addresses the hard parts of the problem.

The same issue plagues many other approaches in AI, such as focusing on solving mechanistic interpretability or publishing more AI governance papers.

We designed the DIP to be strategic by focusing on the entire problem. If the DIP succeeds, countries across the world will pass legislation banning superintelligence development, and an international agreement will enforce the ban. This would address the whole problem of superintelligence development at once, instead of only focusing on a small component and hoping that other plans would do the rest.

Public

An effective plan is public: it is accessible to anyone, and it allows and encourages the honest sharing of beliefs about the problem and its solution. This ensures clarity on what the plan is, who supports it, and what is being done to make it happen.

From the beginning[1], plans to deal with superintelligence pushed for keeping the prospect of superintelligence and the extent of the risks confined to a small elite. Many actors in the know have deliberately downplayed or denied extinction risks and the enormous effects of superintelligence when communicating with the general public and governments.

In short, for over a decade, many people aware that AI poses extinction risks have said different things to AI insiders than to outsiders, spreading a culture of underplaying the risks and the magnitude of the problem whenever talking to anyone outside the field, including governments and the general public.

Here are some examples of this faulty approach over the years:

  • Dario Amodei (Anthropic CEO) and Paul Christiano (Head of AI safety at the US AISI) co-wrote “Concrete Problems in AI Safety”, an influential paper that discusses accidental risks from AI in purely technical terms, without ever mentioning extinction risks. And yet both have discussed these risks at length in less public venues (see this example for Christiano).

  • LessWrong, the main internet forum for discussing extinction risks, hosted many criticisms of FLI’s Open Letter on AI risks, the first large-scale public acknowledgement of the risks from superintelligence. These criticisms focused on fear of public backlash against a clear statement of the risks, disagreement with openly involving governments and democratic institutions (which were deemed untrustworthy), and claims that such public statements might make the situation worse.

These private plans fail for two key reasons. First, the misrepresentation and dishonesty erode trust and make coordination to address the problem impossible. Second, hiding risks and plans is self-defeating: when everybody in the know agrees never to mention the risks in public, those same people become the reason the risks are not discussed, believed, or addressed. How can society address a problem that is being kept secret?

The DIP is designed to be public by nature. The plan itself is public, easy for others to replicate and coordinate with. And publicly and honestly spreading awareness about superintelligence and the solutions needed is the cornerstone of the plan.

Scalable

An effective plan is scalable: it grows and improves with additional people and resources rather than stagnating or deteriorating.

Most plans tackling risks from superintelligence simply don’t scale. When provided with more resources, they either reach a point where there is nothing more to do, or they actually start worsening the situation.

Notably, influence-focused plans, which aim to place trusted individuals in positions of power at AI companies and in governments, fail to scale. The limited number of influential positions creates competition that can become counterproductive, with organizations undermining each other despite shared goals. And the secrecy shrouding such plans directly gets in the way of any significant scaling.

When designing the DIP, we prioritized scalability by focusing on approaches that naturally compound with more people and more resources. This led us to emphasize public awareness and solution dissemination. This scales well because there is always room for more campaigning, across more branches of government, more components of the democratic process, and more jurisdictions.

The public nature of the DIP means that each victory is visible to all, creating momentum for further scaling. Each lawmaker supporting the policies, each country deciding to pass laws on superintelligence, and each individual taking a public stance makes the plan more likely to succeed.

Democratic

An effective plan works with the fundamental institutions of democracy and international collaboration, not against them. It strengthens the democratic process and improves international collaboration through open civic dialogue.

Pursuing civic engagement ensures that such a plan strengthens the ability of society to wisely react to civilizational-scale problems, like risks from superintelligence.

Unfortunately, most existing plans on AI focus instead on concentrating power in the hands of a small elite, in the hope that those few individuals will then steer the development of AI responsibly. Such approaches, even in the unlikely event that they succeed, undermine democratic decision-making and international trust by weakening existing institutions and bypassing the voices of citizens and other countries. They make society less able to solve collective problems and represent its people.

One influential undemocratic plan is that of Open Philanthropy, the main Effective Altruism (EA) donor. Holden Karnofsky, founder and leader of Open Philanthropy, describes his vision for a successful plan in this post.

First, Open Philanthropy and its associates who focus on AI believe superintelligence will be built in the near future, as articulated by Holden Karnofsky. Their emphasis is thus on ensuring that it is built by the most “responsible” actors in order to reduce catastrophic risk. This policy of ensuring that the right people hold positions of power is present both in this Holden Karnofsky post (where such actors are referred to as “cautious actors”) and in this recent essay by Will MacAskill, a key figure in EA (where they are called “responsible actors”).

Historically, the way Open Philanthropy has pursued this plan has been by placing EAs in positions of influence in key AI companies, governments, and nonprofits:

  1. The first attempt was with OpenAI, which received a grant from Open Philanthropy explicitly in exchange for a board seat for Holden Karnofsky.

  2. After their perceived influence over OpenAI waned, Karnofsky and the FTX Future Fund, headed by ex-OP officer Nick Beckstead, helped found a new AGI company, Anthropic, staffed with allied “responsible” actors: Daniela Amodei, Karnofsky’s wife, as president; Dario Amodei, Karnofsky’s brother-in-law, as CEO; and more. Recently, Karnofsky officially joined Anthropic.

  3. On the policy side, Open Philanthropy has dedicated some of its largest AI grants to establishing organizations that seed the US government with advisors. This includes CSET, a think tank founded thanks to over $80 million from Open Philanthropy, which placed multiple fellows in the Department of Commerce under President Biden, and at least one special advisor to the White House. Another such grant founded the Horizon Institute, an organization that funds the placement of its fellows in congressional offices, executive agencies, and think tanks in the US.

By its very nature, the overall Open Philanthropy plan for AGI is undemocratic: it centralizes control over the main AGI projects in the hands of a small group of “trusted” actors, where Open Philanthropy leadership is the sole arbiter of what constitutes “trustworthiness”.

The only way to participate in this plan is to surrender your resources to Open Philanthropy and hope for future rewards. Democratic institutions – and therefore the general public – are excluded from this process. This is a fundamentally undemocratic method for steering our future. At no point are citizens informed about the risks, or asked for input on what to do about them.

Like Open Philanthropy, most early believers in these risks pushed undemocratic plans. As a consequence, both the general public and governments have been left in the dark about these risks, even in countries where companies are developing superintelligence. This is a colossal failure of civic engagement.

When designing the DIP, we ensured it was democratic by working through democratic institutions instead of sidestepping them. The DIP starts with the necessary first steps: clearly and publicly stating the problem, informing the public and decision makers, proposing solutions via democratic channels, and giving citizens the power to take action themselves.

Footnotes

[1] Note that Eliezer Yudkowsky, the author of the linked early plan, has since changed his stance, and now emphasizes public awareness in his work at MIRI.
