Possible futures of the Ethereum protocol, part 5: The Purge

2024 Oct 26


Special thanks to Justin Drake, Tim Beiko, Matt Garnett, Piper Merriam, Marius van der Wijden and Tomasz Stanczak for feedback and review

One of Ethereum's challenges is that by default, any blockchain protocol's bloat and complexity grow over time. This happens in two places:

For Ethereum to sustain itself into the long term, we need a strong counter-pressure against both of these trends, reducing complexity and bloat over time. But at the same time, we need to preserve one of the key properties that make blockchains great: their permanence. You can put an NFT, a love note in transaction calldata, or a smart contract containing a million dollars onchain, go into a cave for ten years, come out and find it still there waiting for you to read and interact with. For dapps to feel comfortable going fully decentralized and removing their upgrade keys, they need to be confident that their dependencies are not going to upgrade in a way that breaks them - especially the L1 itself.


The Purge, 2023 roadmap.


Balancing between these two needs, and minimizing or reversing bloat, complexity and decay while preserving continuity, is absolutely possible if we put our minds to it. Living organisms can do it: while most age over time, a lucky few do not. Even social systems can have extreme longevity. On a few occasions, Ethereum has already shown successes: proof of work is gone, the SELFDESTRUCT opcode is mostly gone, and beacon chain nodes already store old data for only up to six months. Figuring out this path for Ethereum in a more generalized way, and moving toward an eventual outcome that is stable for the long term, is the ultimate challenge of Ethereum's long-term scalability, technical sustainability and even security.


The Purge: key goals

History expiry

What problem does it solve?

As of the time of this writing, a fully synced Ethereum node requires roughly 1.1 terabytes of disk space for the execution client, plus another few hundred gigabytes for the consensus client. The great majority of this is history: data about historical blocks, transactions and receipts, the bulk of which are many years old. This means that the size of a node keeps increasing by hundreds of gigabytes each year, even if the gas limit does not increase at all.

What is it, and how does it work?

A key simplifying feature of the history storage problem is that because each block points to the previous block via a hash link (and other structures), having consensus on the present is enough to have consensus on history. As long as the network has consensus on the latest block, any historical block or transaction or state (account balance, nonce, code, storage) can be provided by any single actor along with a Merkle proof, and the proof allows anyone else to verify its correctness. While consensus is an N/2-of-N trust model, history is a 1-of-N trust model.
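
To make the 1-of-N idea concrete, here is a minimal sketch of checking untrusted history against a head hash that we already trust. The `Block` type, `block_hash` and `verify_history` are hypothetical names for illustration, and SHA-256 stands in for the Keccak-256-over-RLP-header hashing that Ethereum actually uses.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    number: int
    parent_hash: bytes
    payload: bytes


def block_hash(block: Block) -> bytes:
    # Stand-in hash (assumption): real clients hash the RLP header with Keccak-256.
    data = block.number.to_bytes(8, "big") + block.parent_hash + block.payload
    return hashlib.sha256(data).digest()


def verify_history(trusted_head_hash: bytes, claimed_chain: List[Block]) -> bool:
    """Check blocks (oldest first) supplied by any single untrusted peer."""
    expected = trusted_head_hash
    for block in reversed(claimed_chain):
        if block_hash(block) != expected:
            return False
        # The hash link tells us what the previous block must hash to.
        expected = block.parent_hash
    return True


def build_chain(payloads: List[bytes]) -> List[Block]:
    chain, parent = [], b"\x00" * 32
    for number, payload in enumerate(payloads):
        block = Block(number, parent, payload)
        chain.append(block)
        parent = block_hash(block)
    return chain


chain = build_chain([b"genesis", b"tx batch 1", b"tx batch 2"])
head_hash = block_hash(chain[-1])  # the only thing we need consensus on
forged = [chain[0], Block(1, chain[1].parent_hash, b"forged"), chain[2]]
print(verify_history(head_hash, chain), verify_history(head_hash, forged))
```

Only `head_hash` has to come from consensus; everything older can come from any one peer, because a forged block breaks the hash link above it.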

This opens up a lot of options for how we can store the history. One natural option is a network where each node only stores a small percentage of the data. This is how torrent networks have worked for decades: while the network altogether stores and distributes millions of files, each participant only stores and distributes a few of them. Perhaps counterintuitively, this approach does not even necessarily decrease the robustness of the data. If, by making node running more affordable, we can get to a network with 100,000 nodes, where each node stores a random 10% of the history, then each piece of data would get replicated 10,000 times - exactly the same replication factor as a 10,000-node network where each node stores everything.
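
The arithmetic here is simple enough to check directly. The sketch below uses hypothetical helper names and assumes each node picks its share of history independently and uniformly at random.

```python
import math


def expected_replicas(num_nodes: int, percent_stored: int) -> float:
    # Average number of nodes holding any given piece of history.
    return num_nodes * percent_stored / 100


def log10_prob_piece_lost(num_nodes: int, percent_stored: int) -> float:
    # log10 of the chance that no node at all holds a given piece,
    # assuming independent random choices (an idealization).
    return num_nodes * math.log10(1 - percent_stored / 100)


print(expected_replicas(100_000, 10))   # partial-storage network
print(expected_replicas(10_000, 100))   # everyone-stores-everything network
print(log10_prob_piece_lost(100_000, 10))
```

Both networks come out at the same replication factor, and under the independence assumption the chance of any one piece vanishing entirely is astronomically small.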

Today, Ethereum has already started to move away from the model of all nodes storing all history forever. Consensus blocks (ie. the parts related to proof of stake consensus) are only stored for ~6 months. Blobs are only stored for ~18 days. EIP-4444 aims to introduce a one-year storage period for historical blocks and receipts. A long-term goal is to have a harmonized period (which could be ~18 days) during which each node is responsible for storing everything, and then have a peer-to-peer network made up of Ethereum nodes storing older data in a distributed way.



Erasure codes can be used to increase robustness while keeping the replication factor the same. In fact, blobs already come erasure-coded in order to support data availability sampling. The simplest solution may well be to re-use this erasure coding, and put execution and consensus block data into blobs as well.
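
As a toy illustration of why erasure coding helps, the sketch below extends k data values to 2k values using polynomial interpolation, so that any k of the 2k pieces are enough to recover the original. This is a simplified Reed-Solomon-style code over a small prime field chosen for readability; the modulus and function names are assumptions, and the real blob encoding uses a different field and far more efficient algorithms.

```python
P = 2**31 - 1  # small prime modulus, an assumption for this toy example


def lagrange_eval(points, x):
    """Evaluate at x the unique low-degree polynomial through `points`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def extend(data):
    # Treat data[i] as the polynomial's value at x = i, then add k more evaluations.
    k = len(data)
    base = list(enumerate(data))
    return data + [lagrange_eval(base, x) for x in range(k, 2 * k)]


def recover(surviving, k):
    # surviving: dict of position -> value, with at least k entries.
    points = list(surviving.items())[:k]
    return [lagrange_eval(points, x) for x in range(k)]


data = [5, 17, 42, 99]
coded = extend(data)
surviving = {i: coded[i] for i in (1, 4, 6, 7)}  # half of the pieces are lost
print(recover(surviving, len(data)) == data)
```

With plain replication, losing both copies of a piece loses that piece. With the extension above, the data survives as long as any half of the pieces do, at the same 2x storage overhead.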

What is left to do, and what are the tradeoffs?

The main remaining work involves building out and integrating a concrete distributed solution for storing history - at least execution history, but ultimately also consensus and blobs. The easiest solutions for this are (i) to simply introduce an existing torrent library, and (ii) an Ethereum-native solution called the Portal network. Once either of these is introduced, we can turn EIP-4444 on. EIP-4444 itself does not require a hard fork, though it does require a new network protocol version. For this reason, there is value in enabling it for all clients at the same time, because otherwise there are risks of clients malfunctioning from connecting to other nodes expecting to download the full history but not actually getting it.

The main tradeoff involves how hard we try to make "ancient" historical data available. The easiest solution would be to simply stop storing ancient history tomorrow, and rely on existing archive nodes and various centralized providers for replication. This is easy, but this weakens Ethereum's position as a place to make permanent records. The harder, but safer, path is to first build out and integrate the torrent network for storing history in a distributed way. Here, there are two dimensions of "how hard we try":

  1. How hard do we try to make sure that a maximally large set of nodes really is storing all the data?
  2. How deeply do we integrate the historical storage into the protocol?

A maximally paranoid approach for (1) would involve proof of custody: actually requiring each proof of stake validator to store some percentage of history, and regularly cryptographically checking that they do so. A more moderate approach is to set a voluntary standard for what percentage of history each client stores.

For (2), a basic implementation involves just taking the work that is already done today: Portal already stores ERA files containing the entire Ethereum history. A more thorough implementation would involve actually hooking this up to the syncing process, so that if someone wanted to sync a full-history-storing node or an archive node, they could do so even if no other archive nodes existed online, by syncing straight from the Portal network.

How does it interact with other parts of the roadmap?

Reducing history storage requirements is arguably even more important than statelessness if we want to make it extremely easy to run or spin up a node: out of the 1.1 TB that a node needs to have, ~300 GB is state, and the remaining ~800 GB is history. The vision of an Ethereum node running on a smart watch and taking only a few minutes to set up is only achievable if both statelessness and EIP-4444 are implemented.

Limiting history storage also makes it more viable for newer Ethereum node implementations to only support recent versions of the protocol, which allows them to be much simpler. For example, many lines of code can be safely removed now that empty storage slots created during the 2016 DoS attacks have all been removed. Now that the switch to proof of stake is ancient history, clients can safely remove all proof-of-work-related code.

State expiry

What problem does it solve?

Even if we remove the need for clients to store history, a client's storage requirement will continue to grow, by around 50 GB per year, because of ongoing growth to the state: account balances and nonces, contract code and contract storage. Users are able to pay a one-time cost to impose a burden on present and future Ethereum clients forever.

State is much harder to "expire" than history, because the EVM is fundamentally designed around an assumption that once a state object is created, it will always be there and can be read by any transaction at any time. If we introduce statelessness, there is an argument that maybe this problem is not that bad: only a specialized class of block builders would need to actually store the state, and all other nodes (even inclusion list production!) can run statelessly. However, there is an argument that we don't want to lean on statelessness too much, and eventually we may want to expire state to keep Ethereum decentralized.

What is it, and how does it work?

Today, when you create a new state object (which can happen in one of three ways: (i) sending ETH to a new account, (ii) creating a new account with code, (iii) setting a previously-untouched storage slot), that state object is in the state forever. What we want instead, is for objects to automatically expire over time. The key challenge is doing this in a way that accomplishes three goals:

  1. Efficiency: don't require huge amounts of extra computation to run the expiry process
  2. User-friendliness: if someone goes into a cave for five years and comes back, they should not lose access to their ETH, ERC20s, NFTs, CDP positions...
  3. Developer-friendliness: developers should not have to switch to a completely unfamiliar mental model. Additionally, applications that are ossified today and do not update should continue to work reasonably well.

It's easy to solve the problem without satisfying these goals. For example, you could have each state object also store a counter for its expiry date (which could be extended by burning ETH, which could happen automatically any time it's read or written), and have a process that loops through the state to remove expired state objects. However, this introduces extra computation (and even storage requirements), and it definitely does not satisfy the user-friendliness requirement. Developers too would have a hard time reasoning about edge cases involving storage values sometimes resetting to zero. If you make the expiry timer contract-wide, this makes life technically easier for developers, but it makes the economics harder: developers would have to think about how to "pass through" the ongoing costs of storage to their users.

These are problems that the Ethereum core development community struggled with for many years, including proposals like "blockchain rent" and "regenesis". Eventually, we combined the best parts of the proposals and converged on two categories of "known least bad solutions":

Partial state expiry

Partial state expiry proposals all work along the same principle. We split the state into chunks. Everyone permanently stores the "top-level map" of which chunks are empty or nonempty. The data within each chunk is only stored if that data has been recently accessed. There is a "resurrection" mechanism where if a chunk is no longer stored, anyone can bring that data back by providing a proof of what the data was.

The main distinctions between these proposals are: (i) how do we define "recently", and (ii) how do we define "chunk"? One concrete proposal is EIP-7736, which builds upon the "stem-and-leaf" design introduced for Verkle trees (though compatible with any form of statelessness, eg. binary trees). In this design, header, code and storage slots that are adjacent to each other are stored under the same "stem". The data stored under a stem can be at most 256 * 31 = 7,936 bytes. In many cases, the entire header and code, and many key storage slots, of an account will all be stored under the same stem. If the data under a given stem is not read or written for 6 months, the data is no longer stored, and instead only a 32-byte commitment ("stub") to the data is stored. Future transactions that access that data would need to "resurrect" the data, with a proof that would be checked against the stub.
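
A minimal sketch of the expire-and-resurrect cycle for a single stem might look like the following. This is not the EIP-7736 specification. The class and function names are hypothetical, time is counted in abstract periods, a SHA-256 hash stands in for the tree commitment, and the "proof" is simply the full leaf data checked against the stub.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional

EXPIRY_PERIODS = 2  # hypothetical stand-in for "6 months without access"
MAX_LEAVES = 256


def commit(leaves: Dict[int, bytes]) -> bytes:
    # Stand-in 32-byte commitment (assumption): hash of sorted (index, value) pairs.
    h = hashlib.sha256()
    for idx in sorted(leaves):
        h.update(idx.to_bytes(2, "big") + len(leaves[idx]).to_bytes(2, "big") + leaves[idx])
    return h.digest()


@dataclass
class Stem:
    leaves: Optional[Dict[int, bytes]]  # None once expired
    last_access: int
    stub: Optional[bytes] = None


class StemStore:
    def __init__(self) -> None:
        self.stems: Dict[bytes, Stem] = {}

    def write(self, stem_key: bytes, index: int, value: bytes, now: int) -> None:
        assert 0 <= index < MAX_LEAVES
        stem = self.stems.setdefault(stem_key, Stem(leaves={}, last_access=now))
        if stem.leaves is None:
            raise KeyError("stem expired: resurrect it first")
        stem.leaves[index] = value
        stem.last_access = now

    def read(self, stem_key: bytes, index: int, now: int) -> Optional[bytes]:
        stem = self.stems[stem_key]
        if stem.leaves is None:
            raise KeyError("stem expired: resurrect it first")
        stem.last_access = now
        return stem.leaves.get(index)

    def expire_idle(self, now: int) -> None:
        # Replace the data of idle stems with a 32-byte stub.
        for stem in self.stems.values():
            if stem.leaves is not None and now - stem.last_access >= EXPIRY_PERIODS:
                stem.stub, stem.leaves = commit(stem.leaves), None

    def resurrect(self, stem_key: bytes, claimed: Dict[int, bytes], now: int) -> bool:
        stem = self.stems[stem_key]
        if stem.leaves is not None or commit(claimed) != stem.stub:
            return False
        stem.leaves, stem.stub, stem.last_access = dict(claimed), None, now
        return True


store = StemStore()
store.write(b"alice", 0, b"balance=5", now=0)
store.expire_idle(now=2)
print(store.resurrect(b"alice", {0: b"balance=9"}, now=2))  # wrong data is rejected
print(store.resurrect(b"alice", {0: b"balance=5"}, now=2))  # correct data comes back
```

The stub is what prevents someone from resurrecting data that was never there, and it is also the only permanent cost that an expired stem leaves behind.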



There are other ways to implement a similar idea. For example, if account-level granularity is not enough, we could make a scheme where each 1/2^32 fraction of the tree is governed by a similar stem-and-leaf mechanism.

This is trickier because of incentives: an attacker could force clients to permanently store a very large amount of state by putting a very large amount of data into a single subtree and sending a single transaction every year to "renew the tree". If you make the renewal cost proportional (or renewal duration inversely-proportional) to the tree size, then someone could grief another user by putting a very large amount of data into the same subtree as them. One could try to limit both problems by making the granularity dynamic based on the subtree size: for example, each consecutive 2^16 = 65536 state objects could be treated as a "group". However, these ideas are more complex; the stem-based approach is simple, and it aligns incentives, because typically all the data under a stem is related to the same application or user.

Address-period-based state expiry proposals

What if we wanted to avoid any permanent state growth at all, even 32-byte stubs? This is a hard problem because of resurrection conflicts: what if a state object gets removed, later EVM execution puts another state object in the exact same position, but then after that someone who cares about the original state object comes back and tries to recover it? With partial state expiry, the "stub" prevents new data from being created. With full state expiry, we cannot afford to store even the stub.

The address-period-based design is the best known idea for solving this. Instead of having one state tree storing the whole state, we have a constantly growing list of state trees, and any state that gets read or written gets saved in the most recent state tree. A new empty state tree gets added once per period (think: 1 year). Older state trees are frozen solid. Full nodes are only expected to store the most recent two trees. If a state object was not touched for two periods and thus falls into an expired tree, it still can be read or written to, but the transaction would need to provide a Merkle proof for it - and once it does, a copy will be saved in the latest tree again.
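
The read path of this design can be sketched in a few lines. In the toy model below, one dictionary per period stands in for one state tree, and `_proof_ok` stands in for Merkle proof verification against a frozen tree's root. All names are hypothetical, and the sketch deliberately ignores the resurrection-conflict rules for writes.

```python
from typing import Dict, List, Optional


class PeriodState:
    """Toy model: one dict per period stands in for one state tree."""

    def __init__(self) -> None:
        self.trees: List[Dict[str, bytes]] = [{}]

    def new_period(self) -> None:
        # Called once per period (think: 1 year); older trees become frozen.
        self.trees.append({})

    def _proof_ok(self, key: str, value: bytes) -> bool:
        # Stand-in for checking a Merkle proof against a frozen tree's root.
        # A real full node would keep only the roots, not the expired trees.
        for tree in reversed(self.trees[:-2]):
            if key in tree:
                return tree[key] == value
        return False

    def write(self, key: str, value: bytes) -> None:
        self.trees[-1][key] = value

    def read(self, key: str, proven_value: Optional[bytes] = None) -> bytes:
        # Anything in the two most recent trees is readable without a proof.
        for tree in reversed(self.trees[-2:]):
            if key in tree:
                self.trees[-1][key] = tree[key]  # touching it refreshes it
                return tree[key]
        if proven_value is not None and self._proof_ok(key, proven_value):
            self.trees[-1][key] = proven_value  # resurrected into the latest tree
            return proven_value
        raise LookupError("expired or unknown object: a valid proof is required")


state = PeriodState()
state.write("alice", b"5")
state.new_period()
state.new_period()                 # "alice" is now in an expired tree
print(state.read("alice", b"5"))   # readable again once a proof is supplied
print(state.read("alice"))         # and needs no proof afterwards
```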



A key idea for making this all user and developer-friendly is the concept of address periods. An address period is a number that is part of an address. A key rule is that an address with address period N can only be read or written to during or after period N (ie. when the state tree list reaches length N). If you're saving a new state object (eg. a new contract, or a new ERC20 balance), if you make sure to put the state object into a contract whose address period is either N or N-1, then you can save it immediately, without needing to provide proofs that there was nothing there before. Any additions or edits to state in older address periods, on the other hand, do require a proof.
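
The rule for creating new state can be written as a small decision function. This is only a sketch of the rule as described above, with hypothetical names; `current_period` plays the role of N.

```python
def creation_mode(address_period: int, current_period: int) -> str:
    """How a new state object under an address with the given period is handled."""
    if address_period > current_period:
        # The address period has not started yet.
        return "forbidden"
    if address_period >= current_period - 1:
        # Period N or N-1: nothing could have expired there, so no proof is needed.
        return "no-proof"
    # Older periods: must prove that nothing was there before.
    return "proof-required"


for period in (7, 6, 5, 3):
    print(period, creation_mode(period, current_period=6))
```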

This design preserves most of Ethereum's current properties, is very light on extra computation, allows applications to be written almost as they are today (ERC20s will need to be rewritten, to ensure that balances of addresses with address period N are stored in a child contract which itself has address period N), and solves the "user goes into a cave for five years" problem. However, it has one big issue: addresses need to be expanded beyond 20 bytes to fit address periods.

Address space extension

One proposal is to introduce a new 32-byte address format, which includes a version number, an address period number and an expanded hash.

0x01000000000157aE408398dF7E5f4552091A69125d5dFcb7B8C2659029395bdF

Reading left to right: the first byte (01) is a version number. The next four zeroes (0000) are intended as empty space, which could fit a shard number in the future. The following six hex digits (000001) are the address period number. The remaining 52 hex digits are a 26-byte hash.
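
Splitting the example address into its fields can be sketched as follows. The byte offsets (1-byte version, 2 reserved bytes, 3-byte address period, 26-byte hash) are inferred from the example above rather than taken from a finalized specification, and the type and function names are hypothetical.

```python
from typing import NamedTuple


class ExtendedAddress(NamedTuple):
    version: int
    reserved: bytes       # empty space, possibly a shard number in the future
    address_period: int
    hash_bytes: bytes     # 26-byte hash


def parse_extended_address(hex_address: str) -> ExtendedAddress:
    digits = hex_address[2:] if hex_address.startswith("0x") else hex_address
    raw = bytes.fromhex(digits)
    if len(raw) != 32:
        raise ValueError("expected a 32-byte address")
    return ExtendedAddress(
        version=raw[0],
        reserved=raw[1:3],
        address_period=int.from_bytes(raw[3:6], "big"),
        hash_bytes=raw[6:],
    )


addr = parse_extended_address(
    "0x01000000000157aE408398dF7E5f4552091A69125d5dFcb7B8C2659029395bdF"
)
print(addr.version, addr.address_period, len(addr.hash_bytes))
```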

The key challenge here is backwards compatibility. Existing contracts are designed around 20 byte addresses, and often use tight byte-packing techniques that explicitly assume addresses are exactly 20 bytes long. One idea for solving this involves a translation map, where old-style contracts interacting with new-style addresses would see a 20-byte hash of the new-style address. However, there are significant complexities involved in making this safe.

Address space contraction

Another approach goes the opposite direction: we immediately forbid some 2^128-sized sub-range of addresses (eg. all addresses starting with 0xffffffff), and then use that range to introduce addresses with address periods and 14-byte hashes.

0xffffffff000169125d5dFcb7B8C2659029395bdF

The key sacrifice that this approach makes is that it introduces security risks for counterfactual addresses: addresses that hold assets or permissions, but whose code has not yet been published to chain. The risk involves someone creating an address which claims to have one piece of (not-yet-published) code, but also has another valid piece of code which hashes to the same address. Computing such a collision requires 2^80 hashes today; address space contraction would reduce this number to a very accessible 2^56 hashes.
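
Both numbers follow from the birthday bound: finding a collision in an n-bit hash takes on the order of 2^(n/2) attempts. A quick check, with a hypothetical helper name:

```python
def collision_work_log2(hash_bytes: int) -> float:
    # Birthday bound: roughly 2^(n/2) hashes for an n-bit output.
    return hash_bytes * 8 / 2


print(collision_work_log2(20))  # today's 20-byte addresses
print(collision_work_log2(14))  # 14-byte hashes under address space contraction
```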

The key risk area, counterfactual addresses that are not wallets held by a single owner, is a relatively rare case today, but is likely to become more common as we enter a multi-L2 world. The only solution is to simply accept this risk, but identify all common use cases where this may be an issue, and come up with effective workarounds.

What is left to do, and what are the tradeoffs?

I see four viable paths for the future:

One important point is that the difficult issues around address space expansion and contraction will eventually have to be addressed regardless of whether or not state expiry schemes that depend on address format changes are ever implemented. Today, it takes roughly 2^80 hashes to generate an address collision, a computational load that is already feasible for extremely well-resourced actors: a GPU can do around 2^27 hashes per second, so running for a year it can compute 2^52, so all ~2^30 GPUs in the world could compute a collision in ~1/4 of a year, and FPGAs and ASICs could accelerate this further. In the future, such attacks will become open to more and more people. Hence, the actual cost of implementing full state expiry may not be as high as it seems, since we have to solve this very challenging address problem regardless.
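
The back-of-the-envelope estimate can be reproduced as follows, working in log2 to keep the numbers readable. The per-GPU hash rate and GPU count are the rough figures from the text, with the hash rate taken as per-second, and the function name is hypothetical.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # a little under 2^25


def years_to_collision(log2_work: float, log2_rate_per_gpu: float, log2_gpus: float) -> float:
    # Total hashes per year across all GPUs, expressed as a power of two.
    log2_per_year = log2_rate_per_gpu + log2_gpus + math.log2(SECONDS_PER_YEAR)
    return 2.0 ** (log2_work - log2_per_year)


# 2^80 hashes, 2^27 hashes per second per GPU, 2^30 GPUs
print(years_to_collision(80, 27, 30))
```

This comes out at roughly a quarter of a year, matching the estimate above.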

How does it interact with other parts of the roadmap?

Doing state expiry potentially makes transitions from one state tree format to another easier, because there will be no need for a transition procedure: you could simply start making new trees using a new format, and then later do a hard fork to convert the older trees. Hence, while state expiry is complex, it does have benefits in simplifying other aspects of the roadmap.

Feature cleanup

What problems does it solve?

One of the key preconditions of security, accessibility and credible neutrality is simplicity. If a protocol is beautiful and simple, it reduces the chance that there will be bugs. It increases the chance that new developers will be able to come in and work with any part of it. It's more likely to be fair and easier to defend against special interests. Unfortunately, protocols, like any social system, by default become more complex over time. If we do not want Ethereum to go into a black hole of ever-increasing complexity, we need to do one of two things: (i) stop making changes and ossify the protocol, (ii) be able to actually remove features and reduce complexity. An intermediate route, of making fewer changes to the protocol, and also removing at least a little complexity over time, is also possible. This section will talk about how we can reduce or remove complexity.

What is it, and how does it work?

There is no big single fix that can reduce protocol complexity; the inherent nature of the problem is that there are many little fixes.

One example that is mostly finished already, and can serve as a blueprint for how to handle the others, is the removal of the SELFDESTRUCT opcode. The SELFDESTRUCT opcode was the only opcode that could modify an unlimited number of storage slots within a single block, requiring clients to implement significantly more complexity to avoid DoS attacks. The opcode's original purpose was to enable voluntary state clearing, allowing the state size to decrease over time. In practice, very few ended up using it. The opcode was nerfed to only allow self-destructing accounts created in the same transaction in the Dencun hardfork. This solves the DoS issue and allows for significant simplification in client code. In the future, it likely makes sense to eventually remove the opcode completely.
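
The restricted behavior can be sketched roughly as follows. This is a toy state model with hypothetical names, not client code. It assumes, as in the Dencun change, that the balance is still transferred in every case, while the account and its storage are only deleted if the account was created in the same transaction.

```python
from typing import Dict, Set


def selfdestruct(state: Dict[str, dict], created_this_tx: Set[str],
                 account: str, beneficiary: str) -> None:
    # The balance always moves to the beneficiary.
    balance = state[account]["balance"]
    state[account]["balance"] = 0
    state.setdefault(beneficiary, {"balance": 0, "storage": {}})["balance"] += balance
    # Code and storage are only cleared for accounts created in this same
    # transaction, so the amount of state touched stays bounded.
    if account in created_this_tx:
        del state[account]


state = {
    "old": {"balance": 7, "storage": {"slot": 1}},
    "fresh": {"balance": 3, "storage": {}},
    "bob": {"balance": 0, "storage": {}},
}
selfdestruct(state, {"fresh"}, "old", "bob")    # old account survives, minus its ETH
selfdestruct(state, {"fresh"}, "fresh", "bob")  # same-transaction account is removed
print(sorted(state), state["bob"]["balance"])
```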

Some key examples of protocol simplification opportunities that have been identified so far include the following. First, some examples that are outside the EVM; these are relatively non-invasive, and thus easier to get consensus on and implement in a shorter timeframe.

Now, some examples that are inside the EVM:

What is left to do, and what are the tradeoffs?

The main tradeoff in doing this kind of feature simplification is (i) how much we simplify and how quickly vs (ii) backwards compatibility. Ethereum's value as a chain comes from it being a platform where you can deploy an application and be confident that it will still work many years from now. At the same time, it's possible to take that ideal too far, and, to paraphrase William Jennings Bryan, "crucify Ethereum on a cross of backwards compatibility". If there are only two applications in all of Ethereum that use a given feature, and one has had zero users for years and the other is almost completely unused and secures a total of $57 of value, then we should just remove the feature, and if needed pay the victims $57 out of pocket.

The broader social problem is in creating a standardized pipeline for making non-emergency backwards-compatibility-breaking changes. One way to approach this is to examine and extend existing precedents, such as the SELFDESTRUCT process. The pipeline looks something like the following:

There should be a multi-year-long pipeline between step 1 and step 4, with clear information about which items are at which step. At that point, there is a tradeoff between how vigorous and fast the feature-removal pipeline is, versus being more conservative and putting more resources into other areas of protocol development, but we are still far from the Pareto frontier.

EOF

A major set of changes that has been proposed to the EVM is the EVM Object Format (EOF). EOF introduces a large number of changes, such as banning gas observability and code observability (ie. no CODECOPY), and allowing static jumps only. The goal is to allow the EVM to be upgraded more, in a way that has stronger properties, while preserving backwards compatibility (as the pre-EOF EVM will still exist).

This has the advantage that it creates a natural path to adding new EVM features and encouraging migration to a more restrictive EVM with stronger guarantees. It has the disadvantage that it significantly increases protocol complexity, unless we can find a way to eventually deprecate and remove the old EVM. One major question is: what role does EOF play in EVM simplification proposals, especially if the goal is to reduce the complexity of the EVM as a whole?

How does it interact with other parts of the roadmap?

Many of the "improvement" proposals in the rest of the roadmap are also opportunities to do simplifications of old features. To repeat some examples from above:

A more radical approach: turn big parts of the protocol into contract code

A more radical Ethereum simplification strategy is to keep the protocol as is, but move large parts of it from being protocol features to being contract code.

The most extreme version of this would be to make the Ethereum L1 "technically" be just the beacon chain, and introduce a minimal VM (eg. RISC-V, Cairo, or something even more minimal specialized for proving systems) which allows anyone else to create their own rollup. The EVM would then turn into the first one of these rollups. This is ironically exactly the same outcome as the execution environment proposals from 2019-20, though SNARKs make it significantly more viable to actually implement.



A more moderate approach would be to keep the relationship between the beacon chain and the current Ethereum execution environment as-is, but do an in-place swap of the EVM. We could choose RISC-V, Cairo or another VM to be the new "official Ethereum VM", and then force-convert all EVM contracts into new-VM code that interprets the logic of the original code (by compiling or interpreting it). Theoretically, this could even be done with the "target VM" being a version of EOF.