AI as the engine, humans as the steering wheel

2025 Feb 28



Special thanks to Devansh Mehta, Davide Crapis and Julian Zawistowski for feedback and review.

If you ask people what they like about democratic structures, whether governments, workplaces, or blockchain-based DAOs, you will often hear the same arguments: they avoid concentration of power, they give their users strong guarantees because there isn't a single person who can completely change the system's direction on a whim, and they can make higher-quality decisions by gathering the perspectives and wisdom of many people.

If you ask people what they dislike about democratic structures, they will often give the same complaints: average voters are not sophisticated, because each voter only has a small chance of affecting the outcome, few voters put high-quality thought into their decisions, and you often get either low participation (making the system easy to attack) or de-facto centralization because everyone just defaults to trusting and copying the views of some influencer.

The goal of this post will be to explore a paradigm that could perhaps use AI to get us the benefits of democratic structures without the downsides: "AI as the engine, humans as the steering wheel". Humans provide only a small amount of information to the system, perhaps only a few hundred bits, but each of those bits is well-considered and very high-quality. The AI treats this data as an "objective function", and tirelessly makes a very large number of decisions in a best-effort attempt to fit those objectives. In particular, this post will explore an interesting question: can we do this without enshrining a single AI at the center, instead relying on a competitive open market that any AI (or human-AI hybrid) is free to participate in?



Table of contents

  * Why not just put a single AI in charge?
  * Futarchy
  * Distilled human judgement
  * Deep funding
  * Adding privacy
  * Benefits of engine + steering wheel designs

Why not just put a single AI in charge?

The easiest way to insert human preferences into an AI-based mechanism is to make a single AI model, and have humans feed their preferences into it somehow. There are easy ways to do this: you can just put a text file containing a list of people's instructions into the system prompt. Then you use one of many "agentic AI frameworks" to give the AI the ability to access the internet, hand it the keys to your organization's assets and social media profiles, and you're done.
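To make the "list of instructions in the system prompt" idea concrete, here is a minimal sketch. The instruction texts and the prompt framing are hypothetical; an agentic framework would pass the resulting string to the model along with its tool permissions.

```python
def build_system_prompt(instructions: list[str]) -> str:
    """Join a group's standing instructions into a single system prompt."""
    numbered = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(instructions))
    return (
        "You act on behalf of this organization. "
        "Follow these standing instructions:\n" + numbered
    )

# Hypothetical example instructions from a group's shared text file.
instructions = [
    "Fund public-goods projects that benefit the ecosystem.",
    "Never transfer more than 1% of the treasury in a single action.",
]
print(build_system_prompt(instructions))
```

The point is how little machinery this takes, which is exactly why this structure is easy to deploy but hard to make credibly neutral.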

After a few iterations, this may end up good enough for many use cases, and I fully expect that in the near future we are going to see many structures involving AIs reading instructions given by a group (or even real-time reading a group chat) and taking actions as a result.

Where this structure is not ideal is as a governing mechanism for long-lasting institutions. One valuable property for long-lasting institutions to have is credible neutrality. In my post introducing this concept, I listed four properties that are valuable for credible neutrality:

  1. Don't write specific people or specific outcomes into the mechanism
  2. Open source and publicly verifiable execution
  3. Keep it simple
  4. Don't change it too often

An LLM (or AI agent) satisfies 0/4. The model inevitably has a huge amount of specific people and outcome preferences encoded through its training process. Sometimes this leads to the AI having preferences in surprising directions, eg. see this recent research suggesting that major LLMs value lives in Pakistan far more highly than lives in the USA (!!). It can be open-weights, but that's far from open-source; we really don't know what devils are hiding in the depths of a model. It's the opposite of simple: the Kolmogorov complexity of an LLM is in the tens of billions of bits, about the same as that of all US law (federal + state + local) put together. And because of how rapidly AI is evolving, you'll have to change it every three months.

For this reason, an alternative approach that I favor exploring for many use cases is to make a simple mechanism be the rules of the game, and let AIs be the players. This is the same insight that makes markets so effective: the rules are a relatively dumb system of property rights, with edge cases decided by a court system that slowly accumulates and adjusts precedents, and all of the intelligence comes from entrepreneurs operating "at the edge".



The individual "game players" can be LLMs, swarms of LLMs interacting with each other and calling into various internet services, various AI + human combinations, and many other constructions; as a mechanism designer, you do not need to know. The ideal goal is to have a mechanism that functions as an automaton - if the goal of the mechanism is choosing what to fund, then it should feel as much as possible like Bitcoin or Ethereum block rewards.

The benefits of this approach follow from this division of labor between a simple mechanism and sophisticated players.

The goal of the steering mechanism is to provide a faithful representation of the participants' underlying goals. It only needs to provide a small amount of information, but it should be high-quality information.

You can think of the mechanism as exploiting an asymmetry between coming up with an answer and verifying the answer. This is similar to how a sudoku is difficult to solve, but it's easy to verify that a solution is correct. You (i) create an open market of players to act as "solvers", and then (ii) maintain a human-run mechanism that performs the much simpler task of verifying solutions that have been presented.
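The sudoku analogy can be made concrete: verifying a completed grid takes a few lines of code, while producing one requires search. A minimal checker sketch (the sample grid is a standard solved puzzle, included only for illustration):

```python
def is_valid_sudoku(grid):
    """Verify a completed 9x9 sudoku: every row, column, and 3x3 box
    must contain the digits 1-9 exactly once."""
    def ok(cells):
        return sorted(cells) == list(range(1, 10))
    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [
        [grid[3 * br + r][3 * bc + c] for r in range(3) for c in range(3)]
        for br in range(3) for bc in range(3)
    ]
    return all(ok(group) for group in rows + cols + boxes)

solved = [
    [5, 3, 4, 6, 7, 8, 9, 1, 2],
    [6, 7, 2, 1, 9, 5, 3, 4, 8],
    [1, 9, 8, 3, 4, 2, 5, 6, 7],
    [8, 5, 9, 7, 6, 1, 4, 2, 3],
    [4, 2, 6, 8, 5, 3, 7, 9, 1],
    [7, 1, 3, 9, 2, 4, 8, 5, 6],
    [9, 6, 1, 5, 3, 7, 2, 8, 4],
    [2, 8, 7, 4, 1, 9, 6, 3, 5],
    [3, 4, 5, 2, 8, 6, 1, 7, 9],
]
```

The "verifier" here is cheap and fully transparent; all the hard work lives on the solver side, which is exactly the asymmetry the mechanism exploits.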

Futarchy

Futarchy was originally introduced by Robin Hanson as "vote values, but bet beliefs". A voting mechanism chooses a set of goals (which can be anything, with the caveat that they need to be measurable) which get combined into a metric M. When you need to make a decision (for simplicity, let's say it's YES/NO), you set up conditional markets: you ask people to bet on (i) whether YES or NO will be chosen, (ii) value of M if YES is chosen, otherwise zero, (iii) value of M if NO is chosen, otherwise zero. Given these three variables, you can figure out if the market thinks YES or NO is more bullish for the value of M.
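The arithmetic behind "figuring out if the market thinks YES or NO is more bullish for M" is simple: a share paying M-if-YES (else zero) is worth P(YES) * E[M | YES], so dividing its price by the decision probability recovers the conditional expectation. A sketch with illustrative numbers (the prices are assumed, not from the post):

```python
def futarchy_signal(p_yes, price_m_if_yes, price_m_if_no):
    """Given (i) the market probability that YES is chosen, (ii) the price
    of a share paying M if YES (else 0), and (iii) the price of a share
    paying M if NO (else 0), recover the implied conditional expectations.

    E[M | YES] = price(M-if-YES) / P(YES), since the share pays zero on NO.
    """
    e_m_yes = price_m_if_yes / p_yes
    e_m_no = price_m_if_no / (1 - p_yes)
    return ("YES" if e_m_yes > e_m_no else "NO", e_m_yes, e_m_no)

# Assumed prices: YES trades at 0.6, M-if-YES at 72, M-if-NO at 40.
decision, e_yes, e_no = futarchy_signal(0.6, 72.0, 40.0)
# E[M | YES] = 120 and E[M | NO] = 100, so the market favors YES.
```

In practice the decision rule would also need thresholds and a fallback for thin markets, but the core computation is just this division.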



"Price of the company share" (or, for a cryptocurrency, a token) is the most commonly cited metric, because it's so easy to understand and measure, but the mechanism can support many kinds of metrics: monthly active users, median self-reported happiness of some group of constituents, some quantifiable measure of decentralization, etc.

Futarchy was originally invented in the pre-AI era. However, futarchy fits very naturally in the "sophisticated solver, easy verifier" paradigm described in the previous section, and traders in a futarchy can be AI (or human+AI combinations) too. The role of the "solvers" (prediction market traders) is to determine how each proposed plan will affect the value of a metric in the future. This is hard. The solvers make money if they are right, and lose money if they are wrong. The verifiers (the people voting on the metric, adjusting the metric if they notice that it is being "gamed" or is otherwise becoming outdated, and determining the actual value of the metric at some future time) need only answer the simpler question "what is the value of the metric now?"

Distilled human judgement

Distilled human judgement is a class of mechanisms that works as follows. There is a very large number (think: 1 million) of questions that need to be answered. Natural examples include determining how much credit each participant in a very long list deserves, or labeling each post on a social media platform as following or not following the community's rules.

You have a jury that can answer such questions, though at the cost of spending a lot of effort on each answer. You ask the jury only a small number of the questions (eg. if the total list has 1 million items, the jury perhaps only provides answers on 100 of them). You can even ask the jury indirect questions: instead of asking "what percent of total credit does Alice deserve?", you can ask "does Alice or Bob deserve more credit, and how many times more?". When designing the jury mechanism, you can reuse time-tested mechanisms from the real world like grants committees, courts (determining value of a judgement), appraisals, etc, though of course the jury participants are themselves welcome to use new-fangled AI research tools to help them come to an answer.

You then allow anyone to submit a list of numerical responses to the entire set of questions (eg. providing an estimate for how much credit each participant in the entire list deserves). Participants are encouraged to use AI to do this, though they can use any technique: AI, human-AI hybrid, AI with access to internet search and the ability to autonomously hire other human or AI workers, cybernetically enhanced monkeys, etc.

Once the full-list providers and the jurors have both submitted their answers, the full lists are checked against the jury answers, and some combination of the full lists that are most compatible with the jury answers is taken as the final answer.
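A minimal, stdlib-only sketch of this best-fit step, restricted to blending two submitted lists (all numbers are hypothetical, chosen for illustration): grid-search the convex combination that minimizes squared error against the jury's spot-check answers.

```python
def best_blend(list_a, list_b, jury_idx, jury_vals, steps=1000):
    """Find the convex combination w*A + (1-w)*B that best matches the
    jury's answers on the spot-checked subset (least squares)."""
    def err(w):
        return sum(
            (w * list_a[i] + (1 - w) * list_b[i] - v) ** 2
            for i, v in zip(jury_idx, jury_vals)
        )
    w = min((k / steps for k in range(steps + 1)), key=err)
    return w, [w * a + (1 - w) * b for a, b in zip(list_a, list_b)]

# Two hypothetical full lists of credit shares; the jury checks items 0 and 2.
A = [0.5, 0.3, 0.2]
B = [0.1, 0.5, 0.4]
w, final = best_blend(A, B, jury_idx=[0, 2], jury_vals=[0.3, 0.3])
# The blend is adopted as the final answer; w and 1-w would set the
# rewards paid to the two submitters.
```

A production version would handle more than two lists (eg. with constrained least squares) and add anti-collusion safeguards, but the structure is the same: cheap jury answers select among expensive full answers.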

The distilled human judgement mechanism is different from futarchy, but has an important structural similarity: an open market of sophisticated "solvers" does the hard work of answering every question, while a human-run jury performs the much simpler task of spot-checking a small sample of the answers.


Toy example of distilled human judgement for credit assignment, see python code here. The script asks you to be the jury, and contains some AI-generated (and human-generated) full lists pre-included in the code. The mechanism identifies the linear combination of full lists that best fits the jury answers. In this case, the winning combination is 0.199 * Claude's answer + 0.801 * Deepseek's answer; this combination matches the jury answers better than any single model does. These coefficients would also be the rewards given to the submitters.


The "humans as a steering wheel" aspect in this "defeating Sauron" example is reflected in two places. First, there is high-quality human judgement being applied on each individual question, though this is still leveraging the jury as "technocratic" evaluators of performance. Second, there is an implied voting mechanism that determines if "defeating Sauron" is even the right goal (as opposed to, say, trying to ally with him, or offering him all the territory east of some critical river as a concession for peace). There are other distilled human judgement use cases where the jury task is more directly values-laden: for example, imagine a decentralized social media platform (or sub-community) where the jury's job is to label randomly selected forum posts as following or not following the community's rules.

There are a few open design variables within the distilled human judgement paradigm: how the jury is selected and how many questions it answers, how indirect comparison questions are phrased, and how the submitted full lists are combined into a final answer.

In general, the goal is to take human judgement mechanisms that are known to be effective and bias-minimizing and have stood the test of time (eg. think of how the adversarial structure of a court system includes both the two parties to a dispute, who have high information but are biased, and a judge, who has low information but is probably unbiased), and use an open market of AIs as a reasonably high-fidelity and very low-cost predictor of these mechanisms (this is similar to how "distillation" of LLMs works).

Deep funding

Deep funding is the application of distilled human judgement to the problem of filling in the weights of edges on a graph representing "what percent of the credit for X belongs to Y?"

It's easiest to show this directly with an example:


Output of two-level deep funding example: the ideological origins of Ethereum. See python code here.


Here, the goal is to distribute the credit for philosophical contributions that led to Ethereum: each edge weight in the graph answers a question of the form "what percent of the credit for this idea belongs to each of the ideas it built upon?"

This approach is designed to work in domains where work is built on top of previous work and the structure of this is highly legible. Academia (think: citation graphs) and open source software (think: library dependencies and forking) are two natural examples.

The goal of a well-functioning deep funding system would be to create and maintain a global graph, where any funder that is interested in supporting one particular project would be able to send funds to an address representing that node, and funds would automatically propagate to its dependencies (and recursively to their dependencies etc) based on the weights on the edges of the graph.
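The recursive propagation step can be sketched in a few lines. The graph, node names, and edge weights below are hypothetical; each node forwards its edge-weight share of incoming funds to its dependencies and keeps the remainder.

```python
def propagate(amount, node, edges, payouts):
    """Send `amount` into `node`: forward each dependency its edge-weight
    share (recursively), and keep the remainder at the node itself.
    Assumes a DAG whose outgoing weights sum to at most 1 per node."""
    forwarded = 0.0
    for dep, weight in edges.get(node, {}).items():
        propagate(amount * weight, dep, edges, payouts)
        forwarded += amount * weight
    payouts[node] = payouts.get(node, 0.0) + amount - forwarded

# Hypothetical graph: project P credits library L with 30% and paper R with
# 20% of its credit; L in turn credits R with 50% of its own.
edges = {"P": {"L": 0.3, "R": 0.2}, "L": {"R": 0.5}}
payouts = {}
propagate(100.0, "P", edges, payouts)
# P keeps 50, L keeps 15, R receives 20 directly plus 15 via L = 35.
```

A funder who only cares about P just sends to P's address; the weights, maintained by the distilled-human-judgement process, do the rest.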

You could imagine a decentralized protocol using a built-in deep funding gadget to issue its token: some in-protocol decentralized governance would choose a jury, and the jury would run the deep funding mechanism, as the protocol automatically issues tokens and deposits them into the node corresponding to itself. By doing so, the protocol rewards all of its direct and indirect contributors in a programmatic way reminiscent of how Bitcoin or Ethereum block rewards rewarded one specific type of contributor (miners). By influencing the weights of the edges, the jury gets a way to continuously define what types of contributions it values. This mechanism could function as a decentralized and long-term-sustainable alternative to mining, sales or one-time airdrops.

Adding privacy

Often, making good judgements on questions like those in the examples above requires having access to private information: an organization's internal chat logs, information confidentially submitted by community members, etc. One benefit of "just using a single AI", especially for smaller-scale contexts, is that it's much more acceptable to give one AI access to the information than to make it public for everyone.

To make distilled human judgement or deep funding work in these contexts, we could try to use cryptographic techniques to securely give AIs access to private information. The idea is to use multi-party computation (MPC), fully homomorphic encryption (FHE), trusted execution environments (TEEs) or similar mechanisms to make the private information available, but only to mechanisms whose only output is a "full list submission" that gets directly put into the mechanism.

If you do this, then you would have to restrict the set of mechanisms to just being AI models (as opposed to humans or AI + human combinations, as you can't let humans see the data), and in particular models running in some specific substrate (eg. MPC, FHE, trusted hardware). A major research direction is figuring out near-term practical versions of this that are efficient enough to make sense.

Benefits of engine + steering wheel designs

Designs like this have a number of promising benefits. By far the most important one is that they allow for the construction of DAOs where human voters are in control of setting the direction, but they are not overwhelmed with an excessively large number of decisions to make. They hit the happy medium where each person doesn't have to make N decisions, but they have more power than just making one decision (how delegation typically works), and in a way that is more capable of eliciting rich preferences that are difficult to express directly.

Additionally, mechanisms like this seem to have an incentive smoothing property. What I mean here by "incentive smoothing" is a combination of two factors: diffusion, in that no single decision made by the voting mechanism has an outsized effect on any single actor's interests, and confusion, in that the connection between a vote and its consequences for any particular player is complicated and hard to compute, which makes targeted bribery far less attractive.

The terms confusion and diffusion here are taken from cryptography, where they are key properties of what makes ciphers and hash functions secure.

A good example of incentive smoothing in the real world today is the rule of law: the top level of the government does not regularly take actions of the form "give Alice's company $200M" or "fine Bob's company $100M"; rather, it passes rules that are intended to apply evenly to large sets of actors, which then get interpreted by a separate class of actors. When this works, the benefit is that it greatly reduces the gains from bribery and other forms of corruption. And when it's violated (as it often is in practice), those problems quickly become greatly magnified.

AI is clearly going to be a very large part of the future, and this will inevitably include being a large part of the future of governance. However, if you are involving AI in governance, this has obvious risks: AI has biases, it could be intentionally corrupted during the training process, and AI technology is evolving so quickly that "putting an AI in charge" may well realistically mean "putting whoever is responsible for upgrading the AI in charge". Distilled human judgement offers an alternative path forward, which lets us harness the power of AI in an open free-market way while keeping a human-run democracy in control.

Anyone interested in more deeply exploring and participating in these mechanisms today is highly encouraged to check out the currently active deep funding round at https://cryptopond.xyz/modelfactory/detail/2564617.