Possible futures of the Ethereum protocol, part 2: The Surge

2024 Oct 17 See all posts


Possible futures of the Ethereum protocol, part 2: The Surge

Special thanks to Justin Drake, Francesco, Hsiao-wei Wang, @antonttc and Georgios Konstantopoulos

At the beginning, Ethereum had two scaling strategies in its roadmap. One (eg. see this early paper from 2015) was "sharding": instead of verifying and storing all of the transactions in the chain, each node would only need to verify and store a small fraction of the transactions. This is how any other peer-to-peer network (eg. BitTorrent) works too, so surely we could make blockchains work the same way. Another was layer 2 protocols: networks that would sit on top of Ethereum in a way that allow them to fully benefit from its security, while keeping most data and computation off the main chain. "Layer 2 protocols" meant state channels in 2015, Plasma in 2017, and then rollups in 2019. Rollups are more powerful than state channels or Plasma, but they require a large amount of on-chain data bandwidth. Fortunately, by 2019 sharding research had solved the problem of verifying "data availability" at scale. As a result, the two paths converged, and we got the rollup-centric roadmap which continues to be Ethereum's scaling strategy today.


The Surge, 2023 roadmap edition.


The rollup-centric roadmap proposes a simple division of labor: the Ethereum L1 focuses on being a robust and decentralized base layer, while L2s take on the task of helping the ecosystem scale. This is a pattern that recurs everywhere in society: the court system (L1) is not there to be ultra-fast and efficient, it's there to protect contracts and property rights, and it's up to entrepreneurs (L2) to build on top of that sturdy base layer and take humanity to (metaphorical and literal) Mars.

This year, the rollup-centric roadmap has seen important successes: Ethereum L1 data bandwidth has increased greatly with EIP-4844 blobs, and multiple EVM rollups are now at stage 1. A very heterogeneous and pluralistic implementation of sharding, where each L2 acts as a "shard" with its own internal rules and logic, is now reality. But as we have seen, taking this path has some unique challenges of its own. And so now our task is to bring the rollup-centric roadmap to completion, and solve these problems, while preserving the robustness and decentralization that makes the Ethereum L1 special.


The Surge: key goals

In this chapter

Aside: the scalability trilemma

The scalability trilemma was an idea introduced in 2017, which argued that there is a tension between three properties of a blockchain: decentralization (more specifically: low cost to run a node), scalability (more specifically: high number of transactions processed), and security (more specifically: an attacker needing to corrupt a large portion of the nodes in the whole network to make even a single transaction fail).



Notably, the trilemma is not a theorem, and the post introducing the trilemma did not come with a mathematical proof. It did give a heuristic mathematical argument: if a decentralization-friendly node (eg. consumer laptop) can verify N transactions per second, and you have a chain that processes k*N transactions per second, then either (i) each transaction is only seen by 1/k of nodes, which implies an attacker only needs to corrupt a few nodes to push a bad transaction through, or (ii) your nodes are going to be beefy and your chain not decentralized. The purpose of the post was never to show that breaking the trilemma is impossible; rather, it was to show that breaking the trilemma is hard - it requires somehow thinking outside of the box that the argument implies.

For many years, it has been common for some high-performance chains to claim that they solve the trilemma without doing anything clever at a fundamental architecture level, typically by using software engineering tricks to optimize the node. This is always misleading, and running a node in such chains always ends up far more difficult than in Ethereum. This post gets into some of the many subtleties why this is the case (and hence, why L1 client software engineering alone cannot scale Ethereum itself).

However, the combination of data availability sampling and SNARKs does solve the trilemma: it allows a client to verify that some quantity of data is available, and some number of steps of computation were carried out correctly, while downloading only a small portion of that data and running a much smaller amount of computation. SNARKs are trustless. Data availability sampling has a nuanced few-of-N trust model, but it preserves the fundamental property that non-scalable chains have, which is that even a 51% attack cannot force bad blocks to get accepted by the network.

Another way to solve the trilemma is Plasma architectures, which use clever techniques to push the responsibility to watch for data availability to the user in an incentive-compatible way. Back in 2017-2019, when all we had to scale computation was fraud proofs, Plasma was very limited in what it could safely do, but the mainstreaming of SNARKs makes Plasma architectures far more viable for a wider array of use cases than before.

Further progress in data availability sampling

What problem are we solving?

As of 2024 March 13, when the Dencun upgrade went live, the Ethereum blockchain has three ~125 kB "blobs" per 12-second slot, or ~375 kB per slot of data availability bandwidth. Assuming transaction data is published onchain directly, an ERC20 transfer is ~180 bytes, and so the maximum TPS of rollups on Ethereum is:

375000 / 12 / 180 = 173.6 TPS

If we add Ethereum's calldata (theoretical max: 30 million gas per slot / 16 gas per byte = 1,875,000 bytes per slot), this becomes 607 TPS. With PeerDAS, the plan is to increase the blob count target to 8-16, which would give us 463-926 TPS in calldata.

This is a major increase over the Ethereum L1, but it is not enough. We want much more scalability. Our medium-term target is 16 MB per slot, which if combined with improvements in rollup data compression would give us ~58,000 TPS.

What is it and how does it work?

PeerDAS is a relatively simple implementation of "1D sampling". Each blob in Ethereum is a degree-4096 polynomial over a 253-bit prime field. We broadcast "shares" of the polynomial, where each share consists of 16 evaluations at an adjacent 16 coordinates taken from a total set of 8192 coordinates. Any 4096 of the 8192 evaluations (with current proposed parameters: any 64 of the 128 possible samples) can recover the blob.



PeerDAS works by having each client listen on a small number of subnets, where the i'th subnet broadcasts the i'th sample of any blob, and additionally asks for blobs on other subnets that it needs by asking its peers in the global p2p network (who would be listening to different subnets). A more conservative version, SubnetDAS, uses only the subnet mechanism, without the additional layer of asking peers. A current proposal is for nodes participating in proof of stake to use SubnetDAS, and for other nodes (ie. "clients") to use PeerDAS.

Theoretically, we can scale 1D sampling pretty far: if we increase the blob count maximum to 256 (so, the target to 128), then we would get to our 16 MB target while data availability sampling would only cost each node 16 samples * 128 blobs * 512 bytes per sample per blob = 1 MB of data bandwidth per slot. This is just barely within our reach of tolerance: it's doable, but it would mean bandwidth-constrained clients cannot sample. We could optimize this somewhat by decreasing blob count and increasing blob size, but this would make reconstruction more expensive.

And so ultimately we want to go further, and do 2D sampling, which works by random sampling not just within blobs, but also between blobs. The linear properties of KZG commitments are used to "extend" the set of blobs in a block with a list of new "virtual blobs" that redundantly encode the same information.


2D sampling. Source: a16z crypto


Crucially, computing the extension of the commitments does not require having the blobs, so the scheme is fundamentally friendly to distributed block construction. The node actually constructing the block would only need to have the blob KZG commitments, and can themslves rely on DAS to verify the availability of the blobs. 1D DAS is also inherently friendly to distributed block construction.

What is left to do, and what are the tradeoffs?

The immediate next step is to finish the implementation and rollout of PeerDAS. From there, it's a progressive grind to keep increasing the blob count on PeerDAS while carefully watching the network and improving the software to ensure safety. At the same time, we want more academic work on formalizing PeerDAS and other versions of DAS and its interactions with issues such as fork choice rule safety.

Further into the future, we need much more work figuring out the ideal version of 2D DAS and proving its safety properties. We also want to eventually migrate away from KZG to a quantum-resistant, trusted-setup-free alternative. Currently, we do not know of candidates that are friendly to distributed block building. Even the expensive "brute force" technique of using recursive STARKs to generate proofs of validity for reconstructing rows and columns does not suffice, because while technically a STARK is O(log(n) * log(log(n)) hashes in size (with STIR), in practice a STARK is almost as big as a whole blob.

The realistic paths I see for the long term are:

We can view these along a tradeoff spectrum:



Note that this choice exists even if we decide to scale execution on L1 directly. This is because if L1 is to process lots of TPS, L1 blocks will become very big, and clients will want an efficient way to verify that they are correct, so we would have to use the same technology that powers rollups (ZK-EVM and DAS) at L1.

How does it interact with other parts of the roadmap?

The need for 2D DAS is somewhat lessened, or at least delayed, if data compression (see below) is implemented, and it's lessened even further if Plasma is widely used. DAS also poses a challenge to distributed block building protocols and mechanisms: while DAS is theoretically friendly to distributed reconstruction, this needs to be combined in practice with inclusion list proposals and their surrounding fork choice mechanics.

Data compression

What problem are we solving?

Each transaction in a rollup takes a significant amount of data space onchain: an ERC20 transfer takes about 180 bytes. Even with ideal data availability sampling, this puts a cap on scalability of layer 2 protocols. With 16 MB per slot, we get:

16000000 / 12 / 180 = 7407 TPS

What if in addition to tackling the numerator, we can also tackle the denominator, and make each transaction in a rollup take fewer bytes onchain?

What is it and how does it work?

The best explanation in my opinion is this diagram from two years ago:



The simplest gains are just zero-byte compression: replacing each long sequence of zero bytes with two bytes representing how many zero bytes there are. To go further, we take advantage of the specific properties of transactions:

What is left to do, and what are the tradeoffs?

The main thing left to do is to actually implement the above schemes. The main tradeoffs are:

How does it interact with other parts of the roadmap?

Adoption of ERC-4337, and eventually the enshrinement of parts of it in L2 EVMs, can greatly hasten the deployment of aggregation techniques. Enshrinement of parts of ERC-4337 on L1 can hasten its deployment on L2s.

Generalized Plasma

What problem are we solving?

Even with 16 MB blobs and data compression, 58,000 TPS is not necessarily enough to fully take over consumer payments, decentralized social or other high-bandwidth sectors, and this becomes especially true if we start taking privacy into account, which could drop scalability by 3-8x. For high-volume, low-value applications, one option today is a validium, which keeps data off-chain and has an interesting security model where the operator cannot steal users' funds, but they can disappear and temporarily or permanently freeze all users' funds. But we can do better.

What is it and how does it work?

Plasma is a scaling solution that involves an operator publishing blocks offchain, and putting the Merkle roots of those blocks onchain (as opposed to rollups, where the full block is put onchain). For each block, the operator sends to each user a Merkle branch proving what happened, or did not happen, to that user's assets. Users can withdraw their assets by providing a Merkle branch. Importantly, this branch does not have to be rooted in the latest state - for this reason, even if data availability fails, the user can still recover their assets by withdrawing the latest state they have that is available. If a user submits an invalid branch (eg. exiting an asset that they already sent to someone else, or the operator themselves creating an asset out of thin air), an onchain challenge mechanism can adjudicate who the asset rightfully belongs to.


A diagram of a Plasma Cash chain. Transactions spending coin i are put into the i'th position in the tree. In this example, assuming all previous trees are valid, we know that Eve currently owns coin 1, David owns coin 4 and George owns coin 6.


Early versions of Plasma were only able to handle the payments use case, and were not able to effectively generalize further. If we require each root to be verified with a SNARK, however, Plasma becomes much more powerful. Each challenge game can be simplified significantly, because we take away most possible paths for the operator to cheat. New paths also open up to allow Plasma techniques to be extended to a much more general class of assets. Finally, in the case where the operator does not cheat, users can withdraw their funds instantly, without needing to wait for a one-week challenge period.


One way (not the only way) to make an EVM plasma chain: use a ZK-SNARK to construct a parallel UTXO tree that reflects the balance changes made by the EVM, and defines a unique mapping of what is "the same coin" at different points in history. A Plasma construction can then be built on top of that.


One key insight is that the Plasma system does not need to be perfect. Even if you can only protect a subset of assets (eg. even just coins that have not moved in the past week), you've already greatly improved on the status quo of ultra-scalable EVM, which is a validium.

Another class of constructions is hybrid plasma/rollups, such as Intmax. These constructions put a very small amount of data per user onchain (eg. 5 bytes), and by doing so, get properties that are somewhere between plasma and rollups: in the Intmax case, you get a very high level of scalability and privacy, though even in the 16 MB world capacity is theoretically capped to roughly 16,000,000 / 12 / 5 = 266,667 TPS.

What is left to do, and what are the tradeoffs?

The main remaining task is to bring Plasma systems to production. As mentioned above, "plasma vs validium" is not a binary: any validium can have its safety properties improved at least a little bit by adding Plasma features into the exit mechanism. The research part is in getting optimal properties (in terms of trust requirements, and worst-case L1 gas cost, and vulnerability to DoS) for an EVM, as well as alternative application specific constructions. Additionally, the greater conceptual complexity of Plasma relative to rollups needs to be addressed directly, both through research and through construction of better generalized frameworks.

The main tradeoff in using Plasma designs is that they depend more on operators and are harder to make "based", though hybrid plasma/rollup designs can often avoid this weakness.

How does it interact with other parts of the roadmap?

The more effective Plasma solutions can be, the less pressure there is for the L1 to have a high-performance data availability functionality. Moving activity to L2 also reduces MEV pressure on L1.

Maturing L2 proof systems

What problem are we solving?

Today, most rollups are not yet actually trustless; there is a security council that has the ability to override the behavior of the (optimistic or validity) proof system. In some cases, the proof system is not even live at all, or if it is it only has an "advisory" functionality. The furthest ahead are (i) a few application-specific rollups, such as Fuel, which are trustless, and (ii) as of the time of this writing, Optimism and Arbitrum, two full-EVM rollups that have achieved a partial-trustlessness milestone known as "stage 1". The reason why rollups have not gone further is concern about bugs in the code. We need trustless rollups, and so we need to tackle this problem head on.

What is it and how does it work?

First, let us recap the "stage" system, originally introduced in this post. There are more detailed requirements, but the summary is:

The goal is to reach Stage 2. The main challenge in reaching stage 2 is getting enough confidence that the proof system actually is trustworthy enough. There are two major ways to do this:


Stylized diagram of a multi-prover, combining one optimistic proof system, one validity proof system and a security council.


What is left to do, and what are the tradeoffs?

For formal verification, a lot. We need to create a formally verified version of an entire SNARK prover of an EVM. This is an incredibly complex project, though it is one that we have already started. There is one trick that significantly simplifies the task: we can make a formally verified SNARK prover of a minimal VM, eg. RISC-V or Cairo, and then write an implementation of the EVM in that minimal VM (and formally prove its equivalence to some other EVM specification).

For multi-provers, there are two main remaining pieces. First, we need to get enough confidence in at least two different proof systems, both that they are reasonably safe individually and that if they break, they would break for different and unrelated reasons (and so they would not break at the same time). Second, we need to get a very high level of assurance in the underlying logic that merges the proof systems. This is a much smaller piece of code. There are ways to make it extremely small - just store funds in a Safe multisig contract whose signers are contracts representing individual proof systems - but this has the tradeoff of high onchain gas costs. Some balance between efficiency and safety will need to be found.

How does it interact with other parts of the roadmap?

Moving activity to L2 reduces MEV pressure on L1.

Cross-L2 interoperability improvements

What problem are we solving?

One major challenge with the L2 ecosystem today is that it is difficult for users to navigate. Furthermore, the easiest ways of doing so often re-introduce trust assumptions: centralized bridges, RPC clients, and so forth. If we are serious about the idea that L2s are part of Ethereum, we need to make using the L2 ecosystem feel like using a unified Ethereum ecosystem.


An example of pathologically bad (and even dangerous: I personally lost $100 to a chain-selection mistake here) cross-L2 UX - though this is not Polymarket's fault, cross-L2 interoperability should be the responsibility of wallets and the Ethereum standards (ERC) community. In a well-functioning Ethereum ecosystem, sending coins from L1 to L2, or from one L2 to another, should feel just like sending coins within the same L1.


What is it and how does it work?

There are many categories of cross-L2 interoperability improvements. In general, the way to come up with these is to notice that in theory, a rollup-centric Ethereum is the same thing as L1 execution sharding, and then ask where the current Ethereum L2-verse falls short of that ideal in practice. Here are a few:


How a light client can update its view of the Ethereum header chain. Once you have the header chain, you can use Merkle proofs to validate any state object. And once you have the right L1 state objects, you can use Merkle proofs (and possibly signatures, if you want to check preconfirmations) to validate any state object on L2. Helios does the former already. Extending to the latter is a standardization challenge.



A stylized diagram of how keystore wallets work.


What is left to do, and what are the tradeoffs?

Many of the examples above face standard dilemmas of when to standardize and what layers to standardize. If you standardize too early, you risk entrenching an inferior solution. If you standardize too late, you risk creating needless fragmentation. In some cases, there is both a short-term solution that has weaker properties but is easier to implement, and a long-term solution that is "ultimately right" but will take quite a few years to get there.

One way in which this section is unique, is that these tasks are not just technical problems: they are also (perhaps even primarily!) social problems. They require L2s and wallets and L1 to cooperate. Our ability to handle this problem successfully is a test of our ability to stick together as a community.

How does it interact with other parts of the roadmap?

Most of these proposals are "higher-layer" constructions, and so do not greatly affect L1 considerations. One exception is shared sequencing, which has heavy impacts on MEV.

Scaling execution on L1

What problem are we solving?

If L2s become very scalable and successful but L1 remains capable of processing only a very low volume of transactions, there are many risks to Ethereum that might arise:

  1. The economic situation of ETH the asset becomes more risky, which in turn affects long-run security of the network.
  2. Many L2s benefit from being closely tied to a highly developed financial ecosystem on L1, and if this ecosystem greatly weakens, the incentive to become an L2 (instead of being an independent L1) weakens
  3. It will take a long time before L2s have exactly the same security assurances as L1.
  4. If an L2 fails (eg. due to a malicious or disappearing operator), users would still need to go through L1 in order to recover their assets. Hence, L1 needs to be powerful enough to be able to at least occasionally actually handle a highly complex and chaotic wind-down of an L2.

For these reasons, it is valuable to continue scaling L1 itself, and making sure that it can continue to accommodate a growing number of uses.

What is it and how does it work?

The easiest way to scale is to simply increase the gas limit. However, this risks centralizing the L1, and thus weakening the other important property that makes the Ethereum L1 so powerful: its credibility as a robust base layer. There is an ongoing debate about what degree of simple gas limit increase is sustainable, and this also changes based on which other technologies get implemented to make larger blocks easier to verify (eg. history expiry, statelessness, L1 EVM validity proofs). Another important thing to keep improving is simply the efficiency of Ethereum client software, which is far more optimized today than it was five years ago. An effective L1 gas limit increase strategy would involve accelerating these verification technologies.

Another scaling strategy involves identifying specific features and types of computation that can be made cheaper without harming the decentralization of the network or its security properties. Examples of this include:

These improvements will be discussed in more detail in a future post on the Splurge.

Finally, a third strategy is native rollups (or "enshrined rollups"): essentially, creating many copies of the EVM that run in parallel, leading to a model that is equivalent to what rollups can provide, but much more natively integrated into the protocol.

What is left to do, and what are the tradeoffs?

There are three strategies for L1 scaling, which can be pursued individually or in parallel:

It's worth understanding that these are different techniques that have different tradeoffs. For example, native rollups have many of the same weaknesses in composability as regular rollups: you cannot send a single transaction that synchronously performs operations across many of them, like you can with contracts on the same L1 (or L2). Raising the gas limit takes away from other benefits that can be achieved by making the L1 easier to verify, such as increasing the portion of users that run verifying nodes, and increasing solo stakers. Making specific operations in the EVM cheaper, depending on how it's done, can increase total EVM complexity.

A big question that any L1 scaling roadmap needs to answer is: what is the ultimate vision for what belongs on L1 and what belongs on L2? Clearly, it's absurd for everything to go on L1: the potential use cases go into the hundreds of thousands of transactions per second, and that would make the L1 completely unviable to verify (unless we go the native rollup route). But we do need some guiding principle, so that we can make sure that we are not creating a situation where we increase the gas limit 10x, heavily damage the Ethereum L1's decentralization, and find that we've only gotten to a world where instead of 99% of activity being on L2, 90% of activity is on L2, and so the result otherwise looks almost the same, except for an irreversible loss of much of what makes Ethereum L1 special.


One proposed view of a "division of labor" between L1 and L2s, source.


How does it interact with other parts of the roadmap?

Bringing more users onto L1 implies improving not just scale, but also other aspects of L1. It means that more MEV will remain on L1 (as opposed to becoming a problem just for L2s), and so will be even more of a pressing need to handle it explicitly. It greatly increases the value of having fast slot times on L1. And it's also heavily dependent on verification of L1 ("the Verge") going well.