Dark Mode Toggle

Binius: highly efficient proofs over binary fields

2024 Apr 29 See all posts

Binius: highly efficient proofs over binary fields

This post is primarily intended for readers roughly familiar with 2019-era cryptography, especially SNARKs and STARKs. If you are not, I recommend reading those articles first. Special thanks to Justin Drake, Jim Posen, Benjamin Diamond and Radi Cojbasic for feedback and review.

Over the past two years, STARKs have become a crucial and irreplaceable technology for efficiently making easy-to-verify cryptographic proofs of very complicated statements (eg. proving that an Ethereum block is valid). A key reason why is small field sizes: whereas elliptic curve-based SNARKs require you to work over 256-bit integers in order to be secure enough, STARKs let you use much smaller field sizes, which are more efficient: first the Goldilocks field (64-bit integers), and then Mersenne31 and BabyBear (both 31-bit). Thanks to these efficiency gains, Plonky2, which uses Goldilocks, is hundreds of times faster at proving many kinds of computation than its predecessors.

A natural question to ask is: can we take this trend to its logical conclusion, building proof systems that run even faster by operating directly over zeroes and ones? This is exactly what Binius is trying to do, using a number of mathematical tricks that make it very different from the SNARKs and STARKs of three years ago. This post goes through the reasons why small fields make proof generation more efficient, why binary fields are uniquely powerful, and the tricks that Binius uses to make proofs over binary fields work so effectively.

Binius. By the end of this post, you should be able to understand every part of this diagram.

Recap: finite fields
Recap: arithmetization
Plonky2: from 256-bit SNARKs and STARKs to 64-bit... only STARKs
From small primes to binary
From univariate polynomials to hypercubes
Simple Binius - an example
Binary fields
Full Binius
Putting it all together
What did we not cover?

Recap: finite fields

One of the key tasks of a cryptographic proving system is to operate over huge amounts of data, while keeping the numbers small. If you can compress a statement about a large program into a mathematical equation involving a few numbers, but those numbers are as big as the original program, you have not gained anything.

To do complicated arithmetic while keeping numbers small, cryptographers generally use modular arithmetic. We pick some prime "modulus" p. The % operator means "take the remainder of": \(15\ \%\ 7 = 1\), \(53\ \%\ 10 = 3\), etc (note that the answer is always non-negative, so for example \(-1\ \%\ 10 = 9\)).

You've probably already seen modular arithmetic, in the context of adding and subtracting time (eg. what time is four hours after 9:00?). But here, we don't just add and subtract modulo some number, we also multiply, divide and take exponents.

We redefine:

\(x + y \Rightarrow (x + y)\) % \(p\)

\(x * y \Rightarrow (x * y)\) % \(p\)

\(x^y \Rightarrow (x^y)\) % \(p\)

\(x - y \Rightarrow (x - y)\) % \(p\)

\(x / y \Rightarrow (x * y ^{p-2})\) % \(p\)

The above rules are all self-consistent. For example, if \(p = 7\), then:

\(5 + 3 = 1\) (because \(8\) % \(7 = 1\))
\(1 - 3 = 5\) (because \(-2\) % \(7 = 5\))
\(2 \cdot 5 = 3\)
\(3 / 5 = 2\) (because (\(3 \cdot 5^5\)) % \(7 = 9375\) % \(7 = 2\))

A more general term for this kind of structure is a finite field. A finite field is a mathematical structure that obeys the usual laws of arithmetic, but where there's a limited number of possible values, and so each value can be represented in a fixed size.

Modular arithmetic (or prime fields) is the most common type of finite field, but there is also another type: extension fields. You've probably already seen an extension field before: the complex numbers. We "imagine" a new element, which we label \(i\), and declare that it satisfies \(i^2 = -1\). You can then take any combination of regular numbers and \(i\), and do math with it: \((3i+2) * (2i + 4) =\) \(6i^2 + 12i + 4i + 8 = 16i + 2\). We can similarly take extensions of prime fields. As we start working over fields that are smaller, extensions of prime fields become increasingly important for preserving security, and binary fields (which Binius uses) depend on extensions entirely to have practical utility.

Recap: arithmetization

The way that SNARKs and STARKs prove things about computer programs is through arithmetization: you convert a statement about a program that you want to prove, into a mathematical equation involving polynomials. A valid solution to the equation corresponds to a valid execution of the program.

To give a simple example, suppose that I computed the 100'th Fibonacci number, and I want to prove to you what it is. I create a polynomial \(F\) that encodes Fibonacci numbers: so \(F(0) = F(1) = 1\), \(F(2) = 2\), \(F(3) = 3\), \(F(4) = 5\), and so on for 100 steps. The condition that I need to prove is that \(F(x+2) = F(x) + F(x+1)\) across the range \(x = \{0, 1 ... 98\}\). I can convince you of this by giving you the quotient:

\[H(x) = \frac{F(x+2) - F(x+1) - F(x)}{Z(x)}\]

Where \(Z(x) = (x - 0) * (x - 1) * ... * (x - 98)\). If I can provide valid \(F\) and \(H\) that satisfy this equation, then \(F\) must satisfy \(F(x+2) - F(x+1) - F(x)\) across that range. If I additionally verify that \(F\) satisfies \(F(0) = F(1) = 1\), then \(F(100)\) must actually be the 100th Fibonacci number.

If you want to prove something more complicated, then you replace the "simple" relation \(F(x+2) = F(x) + F(x+1)\) with a more complicated equation, which basically says "\(F(x+1)\) is the output of initializing a virtual machine with the state \(F(x)\), and running one computational step". You can also replace the number 100 with a bigger number, eg. 100000000, to accommodate more steps.

All SNARKs and STARKs are based on this idea of using a simple equation over polynomials (or sometimes vectors and matrices) to represent a large number of relationships between individual values. Not all involve checking equivalence between adjacent computational steps in the same way as above: PLONK does not, for example, and neither does R1CS. But many of the most efficient ones do, because enforcing the same check (or the same few checks) many times makes it easier to minimize overhead.

Plonky2: from 256-bit SNARKs and STARKs to 64-bit... only STARKs

Five years ago, a reasonable summary of the different types of zero knowledge proof was as follows. There are two types of proofs: (elliptic-curve-based) SNARKs and (hash-based) STARKs. Technically, STARKs are a type of SNARK, but in practice it's common to use "SNARK" to refer to only the elliptic-curve-based variety, and "STARK" to refer to hash-based constructions. SNARKs are small, and so you can verify them very quickly and fit them onchain easily. STARKs are big, but they don't require trusted setups, and they are quantum-resistant.

STARKs work by treating the data as a polynomial, computing evaluations of that polynomial across a large number of points, and using the Merkle root of that extended data as the "polynomial commitment"

A key bit of history here is that elliptic curve-based SNARKs came into widespread use first: it took until roughly 2018 for STARKs to become efficient enough to use, thanks to FRI, and by then Zcash had already been running for over a year. Elliptic curve-based SNARKs have a key limitation: if you want to use elliptic curve-based SNARKs, then the arithmetic in these equations must be done with integers modulo the number of points on the elliptic curve. This is a big number, usually near \(2^{256}\): for example, for the bn128 curve, it's 21888242871839275222246405745257275088548364400416034343698204186575808495617. But the actual computation is using small numbers: if you think about a "real" program in your favorite language, most of the stuff it's working with is counters, indices in for loops, positions in the program, individual bits representing True or False, and other things that will almost always be only a few digits long.

Even if your "original" data is made up of "small" numbers, the proving process requires computing quotients, extensions, random linear combinations, and other transformations of the data, which lead to an equal or larger number of objects that are, on average, as large as the full size of your field. This creates a key inefficiency: to prove a computation over n small values, you have to do even more computation over n much bigger values. At first, STARKs inherited the habit of using 256-bit fields from SNARKs, and so suffered the same inefficiency.

A Reed-Solomon extension of some polynomial evaluations. Even though the original values are small, the extra values all blow up to the full size of the field (in this case \(2^{31} - 1\)).

In 2022, Plonky2 was released. Plonky2's main innovation was doing arithmetic modulo a smaller prime: \(2^{64} - 2^{32} + 1 = 18446744069414584321\). Now, each addition or multiplication can always be done in just a few instructions on a CPU, and hashing all of the data together is 4x faster than before. But this comes with a catch: this approach is STARK-only. If you try to use a SNARK, with an elliptic curve of such a small size, the elliptic curve becomes insecure.

To continue to be safe, Plonky2 also needed to introduce extension fields. A key technique in checking arithmetic equations is "sampling at a random point": if you want to check if \(H(x) * Z(x)\) actually equals \(F(x+2) - F(x+1) - F(x)\), you can pick some random coordinate \(r\), provide polynomial commitment opening proofs proving \(H(r)\), \(Z(r)\), \(F(r)\), \(F(r+1)\) and \(F(r+2)\), and then actually check if \(H(r) * Z(r)\) equals \(F(r+2) - F(r+1) - F(r)\). If the attacker can guess the coordinate ahead of time, the attacker can trick the proof system - hence why it must be random. But this also means that the coordinate must be sampled from a set large enough that the attacker cannot guess it by random chance. If the modulus is near \(2^{256}\), this is clearly the case. But with a modulus of \(2^{64} - 2^{32} + 1\), we're not quite there, and if we drop to \(2^{31} - 1\), it's definitely not the case. Trying to fake a proof two billion times until one gets lucky is absolutely within the range of an attacker's capabilities.

To stop this, we sample \(r\) from an extension field. For example, you can define \(y\) where \(y^3 = 5\), and take combinations of \(1\), \(y\) and \(y^2\). This increases the total number of coordinates back up to roughly \(2^{93}\). The bulk of the polynomials computed by the prover don't go into this extension field; they just use integers modulo \(2^{31}-1\), and so you still get all the efficiencies from using the small field. But the random point check, and the FRI computation, does dive into this larger field, in order to get the needed security.

From small primes to binary

Computers do arithmetic by representing larger numbers as sequences of zeroes and ones, and building "circuits" on top of those bits to compute things like addition and multiplication. Computers are particularly optimized for doing computation with 16-bit, 32-bit and 64-bit integers. Moduluses like \(2^{64} - 2^{32} + 1\) and \(2^{31} - 1\) are chosen not just because they fit within those bounds, but also because they align well with those bounds: you can do multiplication modulo \(2^{64} - 2^{32} + 1\) by doing regular 32-bit multiplication, and shift and copy the outputs bitwise in a few places; this article explains some of the tricks well.

What would be even better, however, is doing computation in binary directly. What if addition could be "just" XOR, with no need to worry about "carrying" the overflow from adding 1 + 1 in one bit position to the next bit position? What if multiplication could be more parallelizable in the same way? And these advantages would all come on top of being able to represent True/False values with just one bit.

Capturing these advantages of doing binary computation directly is exactly what Binius is trying to do. A table from the Binius team's zkSummit presentation shows the efficiency gains:

Despite being roughly the same "size", a 32-bit binary field operation takes 5x less computational resources than an operation over the 31-bit Mersenne field.

From univariate polynomials to hypercubes

Suppose that we are convinced by this reasoning, and want to do everything over bits (zeroes and ones). How do we actually commit to a polynomial representing a billion bits?

Here, we face two practical problems:

For a polynomial to represent a lot of values, those values need to be accessible at evaluations of the polynomial: in our Fibonacci example above, \(F(0)\), \(F(1)\) ... \(F(100)\), and in a bigger computation, the indices would go into the millions. And the field that we use needs to contain numbers going up to that size.
Proving anything about a value that we're committing to in a Merkle tree (as all STARKs do) requires Reed-Solomon encoding it: extending \(n\) values into eg. \(8n\) values, using the redundancy to prevent a malicious prover from cheating by faking one value in the middle of the computation. This also requires having a large enough field: to extend a million values to 8 million, you need 8 million different points at which to evaluate the polynomial.

A key idea in Binius is solving these two problems separately, and doing so by representing the same data in two different ways. First, the polynomial itself. Elliptic curve-based SNARKs, 2019-era STARKs, Plonky2 and other systems generally deal with polynomials over one variable: \(F(x)\). Binius, on the other hand, takes inspiration from the Spartan protocol, and works with multivariate polynomials: \(F(x_1, x_2 ... x_k)\). In fact, we represent the entire computational trace on the "hypercube" of evaluations where each \(x_i\) is either 0 or 1. For example, if we wanted to represent a sequence of Fibonacci numbers, and we were still using a field large enough to represent them, we might visualize the first sixteen of them as being something like this:

That is, \(F(0,0,0,0)\) would be 1, \(F(1,0,0,0)\) would also be 1, \(F(0,1,0,0)\) would be 2, and so forth, up until we get to \(F(1,1,1,1) = 987\). Given such a hypercube of evaluations, there is exactly one multilinear (degree-1 in each variable) polynomial that produces those evaluations. So we can think of that set of evaluations as representing the polynomial; we never actually need to bother computing the coefficients.

This example is of course just for illustration: in practice, the whole point of going to a hypercube is to let us work with individual bits. The "Binius-native" way to count Fibonacci numbers would be to use a higher-dimensional cube, using each set of eg. 16 bits to store a number. This requires some cleverness to implement integer addition on top of the bits, but with Binius it's not too difficult.

Now, we get to the erasure coding. The way STARKs work is: you take \(n\) values, Reed-Solomon extend them to a larger number of values (often \(8n\), usually between \(2n\) and \(32n\)), and then randomly select some Merkle branches from the extension and perform some kind of check on them. A hypercube has length 2 in each dimension. Hence, it's not practical to extend it directly: there's not enough "space" to sample Merkle branches from 16 values. So what do we do instead? We pretend the hypercube is a square!

Simple Binius - an example

See here for a python implementation of this protocol.

Let's go through an example, using regular integers as our field for convenience (in a real implementation this will be binary field elements). First, we take the hypercube we want to commit to, and encode it as a square:

Now, we Reed-Solomon extend the square. That is, we treat each row as being a degree-3 polynomial evaluated at x = {0, 1, 2, 3}, and evaluate the same polynomial at x = {4, 5, 6, 7}:

Notice that the numbers blow up quickly! This is why in a real implementation, we always use a finite field for this, instead of regular integers: if we used integers modulo 11, for example, the extension of the first row would just be [3, 10, 0, 6].

If you want to play around with extending and verify the numbers here for yourself, you can use my simple Reed-Solomon extension code here.

Next, we treat this extension as columns, and make a Merkle tree of the columns. The root of the Merkle tree is our commitment.

Now, let's suppose that the prover wants to prove an evaluation of this polynomial at some point \(r = \{r_0, r_1, r_2, r_3\}\). There is one nuance in Binius that makes it somewhat weaker than other polynomial commitment schemes: the prover should not know, or be able to guess, \(s\), until after they committed to the Merkle root (in other words, \(r\) should be a pseudo-random value that depends on the Merkle root). This makes the scheme useless for "database lookup" (eg. "ok you gave me the Merkle root, now prove to me \(P(0, 0, 1, 0)\)!"). But the actual zero-knowledge proof protocols that we use generally don't need "database lookup"; they simply need to check the polynomial at a random evaluation point. Hence, this restriction is okay for our purposes.

Suppose we pick \(r = \{1, 2, 3, 4\}\) (the polynomial, at this point, evaluates to \(-137\); you can confirm it with this code). Now, we get into the process of actually making the proof. We split up \(r\) into two parts: the first part \(\{1, 2\}\) representing a linear combination of columns within a row, and the second part \(\{3, 4\}\) representing a linear combination of rows. We compute a "tensor product", both for the column part:

\[\bigotimes_{i=0}^1 (1 - r_i, r_i)\]

And for the row part:

\[\bigotimes_{i=2}^3 (1 - r_i, r_i)\]

What this means is: a list of all possible products of one value from each set. In the row case, we get:

\[[(1 - r_2) * (1 - r_3), r_2 * (1 - r_3), (1 - r_2) * r_3, r_2 * r_3]\]

Using \(r = \{1, 2, 3, 4\}\) (so \(r_2 = 3\) and \(r_3 = 4\)):

\[ [(1 - 3) * (1 - 4), 3 * (1 - 4), (1 - 3) * 4, 3 * 4] \\ = [6, -9, -8, 12]\]

Now, we compute a new "row" \(t'\), by taking this linear combination of the existing rows. That is, we take:

\[\begin{matrix}[3, 1, 4, 1] * 6\ + \\ [5, 9, 2, 6] * (-9)\ + \\ [5, 3, 5, 8] * (-8)\ + \\ [9, 7, 9, 3] * 12 = \\ [41, -15, 74, -76] \end{matrix}\]

You can view what's going on here as a partial evaluation. If we were to multiply the full tensor product \(\bigotimes_{i=0}^3 (1 - r_i, r_i)\) by the full vector of all values, you would get the evaluation \(P(1, 2, 3, 4) = -137\). Here we're multiplying a partial tensor product that only uses half the evaluation coordinates, and we're reducing a grid of \(N\) values to a row of \(\sqrt{N}\) values. If you give this row to someone else, they can use the tensor product of the other half of the evaluation coordinates to complete the rest of the computation.

The prover provides the verifier with this new row, \(t'\), as well as the Merkle proofs of some randomly sampled columns. This is \(O(\sqrt{N})\) data. In our illustrative example, we'll have the prover provide just the last column; in real life, the prover would need to provide a few dozen columns to achieve adequate security.

Now, we take advantage of the linearity of Reed-Solomon codes. The key property that we use is: taking a linear combination of a Reed-Solomon extension gives the same result as a Reed-Solomon extension of a linear combination. This kind of "order independence" often happens when you have two operations that are both linear.

The verifier does exactly this. They compute the extension of \(t'\), and they compute the same linear combination of columns that the prover computed before (but only to the columns provided by the prover), and verify that these two procedures give the same answer.

In this case, extending \(t'\), and computing the same linear combination (\([6, -9, -8, 12]\)) of the column, both give the same answer: \(-10746\). This proves that the Merkle root was constructed "in good faith" (or it at least "close enough"), and it matches \(t'\): at least the great majority of the columns are compatible with each other and with \(t'\).

But the verifier still needs to check one more thing: actually check the evaluation of the polynomial at \(\{r_0 .. r_3\}\). So far, none of the verifier's steps actually depended on the value that the prover claimed. So here is how we do that check. We take the tensor product of what we labelled as the "column part" of the evaluation point:

\[\bigotimes_{i=0}^1 (1 - r_i, r_i)\]

In our example, where \(r = \{1, 2, 3, 4\}\) (so the half that chooses the column is \(\{1, 2\}\)), this equals:

\[ [(1 - 1) * (1 - 2), 1 * (1 - 2), (1 - 1) * 2, 1 * 2] \\ = [0, -1, 0, 2]\]

So now we take this linear combination of \(t'\):

\[ 0 * 41 + (-1) * (-15) + 0 * 74 + 2 * (-76) = -137 \]

Which exactly equals the answer you get if you evaluate the polynomial directly.

The above is pretty close to a complete description of the "simple" Binius protocol. This already has some interesting advantages: for example, because the data is split into rows and columns, you only need a field half the size. But this doesn't come close to realizing the full benefits of doing computation in binary. For this, we will need the full Binius protocol. But first, let's get a deeper understanding of binary fields.

Binary fields

The smallest possible field is arithmetic modulo 2, which is so small that we can write out its addition and multiplication tables:

+	0	1
0	0	1
1	1	0

*	0	1
0	0	0
1	0	1

We can make larger binary fields by taking extensions: if we start with \(F_2\) (integers modulo 2) and then define \(x\) where \(x^2 = x + 1\), we get the following addition and multiplication tables:

+	0	1	x	x+1
0	0	1	x	x+1
1	1	0	x+1	x
x	x	x+1	0	1
x+1	x+1	x	1	0

+	1	x	x+1
0	0	0	0
1	1	x	x+1
x	x	x+1	1
x+1	x+1	1	x

It turns out that we can expand the binary field to arbitrarily large sizes by repeating this construction. Unlike with complex numbers over reals, where you can add one new element \(i\), but you can't add any more (quaternions do exist, but they're mathematically weird, eg. \(ab \neq ba\)), with finite fields you can keep adding new extensions forever. Specifically, we define elements as follows:

\(x_0\) satisfies \(x_0^2 = x_0 + 1\)
\(x_1\) satisfies \(x_1^2 = x_1x_0 + 1\)
\(x_2\) satisfies \(x_2^2 = x_2x_1 + 1\)
\(x_3\) satisfies \(x_3^2 = x_3x_2 + 1\)

And so on. This is often called the tower construction, because of how each successive extension can be viewed as adding a new layer to a tower. This is not the only way to construct binary fields of arbitary size, but it has some unique advantages that Binius takes advantage of.

We can represent these numbers as a list of bits, eg. \(\texttt{1100101010001111}\). The first bit represents multiples of 1, the second bit represents multiples of \(x_0\), then subsequent bits represent multiples of: \(x_1\), \(x_1 * x_0\), \(x_2\), \(x_2 * x_0\), and so forth. This encoding is nice because you can decompose it:

\(\texttt{1100101010001111} = \texttt{11001010} + \texttt{10001111} * x_3\) \(= \texttt{1100} + \texttt{1010} * x_2 + \texttt{1000} * x_3 + \texttt{1111} * x_2x_3\) \(= \texttt{11} + \texttt{10} * x_2 + \texttt{10} * x_2x_1 + \texttt{10} * x_3 + \texttt{11} * x_2x_3 + \texttt{11} * x_1x_2x_3\) \(= 1 + x_0 + x_2 + x_2x_1 + x_3 + x_2x_3 + x_0x_2x_3 + x_1x_2x_3 + x_0x_1x_2x_3\)

This is a relatively uncommon notation, but I like representing binary field elements as integers, taking the bit representation where more-significant bits are to the right. That is, \(\texttt{1} = 1\), \(x_0 = \texttt{01} = 2\), \(1 + x_0 = \texttt{11} = 3\), \(1 + x_0 + x_2 = \texttt{11001000} = 19\), and so forth. \(\texttt{1100101010001111}\) is, in this representation, 61779.

Addition in binary fields is just XOR (and, incidentally, so is subtraction); note that this implies that \(x + x = 0\) for any \(x\). To multiply two elements \(x * y\), there's a pretty simple recursive algorithm: split each number into two halves:

\(x = L_x + R_x * x_k\) \(y = L_y + R_y * x_k\)

Then, split up the multiplication:

\(x * y = (L_x * L_y) + (L_x * R_y) * x_k + (R_x * L_y) * x_k + (R_x * R_y) * x_k^2\)

The last piece is the only slightly tricky one, because you have to apply the reduction rule, and replace \(R_x * R_y * x_k^2\) with \(R_x * R_y * (x_{k-1} * x_k + 1)\). There are more efficient ways to do multiplication, analogues of the Karatsuba algorithm and fast Fourier transforms, but I will leave it as an exercise to the interested reader to figure those out.

Division in binary fields is done by combining multiplication and inversion: \(\frac{3}{5} = 3 * \frac{1}{5}\). The "simple but slow" way to do inversion is an application of generalized Fermat's little theorem: \(\frac{1}{x} = x^{2^{2^k}-2}\) for any \(k\) where \(2^{2^k} > x\). In this case, \(\frac{1}{5} = 5^{14} = 14\), and so \(\frac{3}{5} = 3 * 14 = 9\). There is also a more complicated but more efficient inversion algorithm, which you can find here. You can use the code here to play around with binary field addition, multiplication and division yourself.

Left: addition table for four-bit binary field elements (ie. elements made up only of combinations of \(1\), \(x_0\), \(x_1\) and \(x_0x_1\)). Right: multiplication table for four-bit binary field elements.

The beautiful thing about this type of binary field is that it combines some of the best parts of "regular" integers and modular arithmetic. Like regular integers, binary field elements are unbounded: you can keep extending as far as you want. But like modular arithmetic, if you do operations over values within a certain size limit, all of your answers also stay within the same bound. For example, if you take successive powers of \(42\), you get:

\[1, 42, 199, 215, 245, 249, 180, 91...\]

And after 255 steps, you get right back to \(42^{255} = 1\). And like both regular integers and modular arithmetic, they obey the usual laws of mathematics: \(a*b = b*a\), \(a * (b+c) = a*b + a*c\), and even some strange new laws, eg. \(a^2 + b^2 = (a+b)^2\) (the usual \(2ab\) term is missing, because in a binary field, \(1 + 1 = 0\)).

And finally, binary fields work conveniently with bits: if you do math with numbers that fit into \(2^k\) bits, then all of your outputs will also fit into \(2^k\) bits. This avoids awkwardness like eg. with Ethereum's EIP-4844, where the individual "chunks" of a blob have to be numbers modulo 52435875175126190479447740508185965837690552500527637822603658699938581184513, and so encoding binary data involves throwing away a bit of space and doing extra checks at the application layer to make sure that each element is storing a value less than \(2^{248}\). It also means that binary field arithmetic is super fast on computers - both CPUs, and theoretically optimal FPGA and ASIC designs.

This all means that we can do things like the Reed-Solomon encoding that we did above, in a way that completely avoids integers "blowing up" like we saw in our example, and in a way that is extremely "native" to the kind of calculation that computers are good at. The "splitting" property of binary fields - how we were able to do \(\texttt{1100101010001111} = \texttt{11001010} + \texttt{10001111} * x_3\), and then keep splitting as little or as much as we wanted, is also crucial for enabling a lot of flexibility.

Full Binius

See here for a python implementation of this protocol.

Now, we can get to "full Binius", which adjusts "simple Binius" to (i) work over binary fields, and (ii) let us commit to individual bits. This protocol is tricky to understand, because it keeps going back and forth between different ways of looking at a matrix of bits; it certainly took me longer to understand than it usually takes me to understand a cryptographic protocol. But once you understand binary fields, the good news is that there isn't any "harder math" that Binius depends on. This is not elliptic curve pairings, where there are deeper and deeper rabbit holes of algebraic geometry to go down; here, binary fields are all you need.

Let's look again at the full diagram:

By now, you should be familiar with most of the components. The idea of "flattening" a hypercube into a grid, the idea of computing a row combination and a column combination as tensor products of the evaluation point, and the idea of checking equivalence between "Reed-Solomon extending then computing the row combination", and "computing the row combination then Reed-Solomon extending", were all in simple Binius.

What's new in "full Binius"? Basically three things:

The individual values in the hypercube, and in the square, have to be bits (0 or 1)
The extension process extends bits into more bits, by grouping bits into columns and temporarily pretending that they are larger field elements
After the row combination step, there's an element-wise "decompose into bits" step, which converts the extension back into bits

We will go through both in turn. First, the new extension procedure. A Reed-Solomon code has the fundamental limitation that if you are extending \(n\) values to \(k*n\) values, you need to be working in a field that has \(k*n\) different values that you can use as coordinates. With \(F_2\) (aka, bits), you cannot do that. And so what we do is, we "pack" adjacent \(F_2\) elements together into larger values. In the example here, we're packing two bits at a time into elements in \(\{0, 1, 2, 3\}\), because our extension only has four evaluation points and so that's enough for us. In a "real" proof, we would probably back 16 bits at a time together. We then do the Reed-Solomon code over these packed values, and unpack them again into bits.

Now, the row combination. To make "evaluate at a random point" checks cryptographically secure, we need that point to be sampled from a pretty large space, much larger than the hypercube itself. Hence, while the points within the hypercube are bits, evaluations outside the hypercube will be much larger. In our example above, the "row combination" ends up being \([11, 4, 6, 1]\).

This presents a problem: we know how to combine pairs of bits into a larger value, and then do a Reed-Solomon extension on that, but how do you do the same to pairs of much larger values?

The trick in Binius is to do it bitwise: we look at the individual bits of each value (eg. for what we labeled as "11", that's \([1, 1, 0, 1]\)), and then we extend row-wise. That is, we perform the extension procedure on the \(1\) row of each element, then on the \(x_0\) row, then on the "\(x_1\)" row, then on the \(x_0 * x_1\) row, and so forth (well, in our toy example we stop there, but in a real implementation we would go up to 128 rows (the last one being \(x_6 *\ ... *\ x_0\))).

Recapping:

We take the bits in the hypercube, and convert them into a grid
Then, we treat adjacent groups of bits on each row as larger field elements, and do arithmetic on them to Reed-Solomon extend the rows
Then, we take a row combination of each column of bits, and get a (for squares larger than 4x4, much smaller) column of bits for each row as the output
Then, we look at the output as a matrix, and treat the bits of that as rows again

Why does this work? In "normal" math, the ability to (often) do linear operations in either order and get the same result stops working if you start slicing a number up by digits. For example, if I start with the number 345, and I multiply it by 8 and then by 3, I get 8280, and if do those two operations in reverse, I also do 8280. But if I insert a "split by digit" operation in between the two steps, it breaks down: if you do 8x then 3x, you get:

\[345 \xrightarrow{\times 8} 2760 \rightarrow [2, 7, 6, 0] \xrightarrow{\times 3} [6, 21, 18, 0]\]

But if you do 3x then 8x, you get:

\[345 \xrightarrow{\times 3} 1035 \rightarrow [1, 0, 3, 5] \xrightarrow{\times 8} [8, 0, 24, 40]\]

But in binary fields built with the tower construction, this kind of thing does work. The reason why has to do with their separability: if you multiply a big value by a small value, what happens in each segment, stays in each segment. If we multiply \(\texttt{1100101010001111}\) by \(\texttt{11}\), that's the same as first decomposing \(\texttt{1100101010001111}\) into \(\texttt{11} + \texttt{10} * x_2 + \texttt{10} * x_2x_1 + \texttt{10} * x_3 + \texttt{11} * x_2x_3 + \texttt{11} * x_1x_2x_3\), and then multiplying each component by \(\texttt{11}\) separately.

Putting it all together

Generally, zero knowledge proof systems work by making statements about polynomials that simultaneously represent statements about the underlying evaluations: just like we saw in the Fibonacci example, \(F(X+2) - F(X+1) - F(X) = Z(X) * H(X)\) simultaneously checks all steps of the Fibonacci computation. We check statements about polynomials by proving evaluations at a random point: given a commitment to \(F\), you might randomly choose eg. 1892470, demand proofs of evaluations of \(F\), \(Z\) and \(H\) at that point (and \(H\) at adjacent points), check those proofs, and then check if \(F(1892472) - F(1892471) - F(1892470)\) \(= Z(1892470) * H(1892470)\). This check at a random point stands in for checking the whole polynomial: if the polynomial equation doesn't match, the chance that it matches at a specific random coordinate is tiny.

In practice, a major source of inefficiency comes from the fact that in real programs, most of the numbers we are working with are tiny: indices in for loops, True/False values, counters, and similar things. But when we "extend" the data using Reed-Solomon encoding to give it the redundancy needed to make Merkle proof-based checks safe, most of the "extra" values end up taking up the full size of a field, even if the original values are small.

To get around this, we want to make the field as small as possible. Plonky2 brought us down from 256-bit numbers to 64-bit numbers, and then Plonky3 went further to 31 bits. But even this is sub-optimal. With binary fields, we can work over individual bits. This makes the encoding "dense": if your actual underlying data has n bits, then your encoding will have n bits, and the extension will have 8 * n bits, with no extra overhead.

Now, let's look at the diagram a third time:

In Binius, we are committing to a multilinear polynomial: a hypercube \(P(x_0, x_1 ... x_k)\), where the individual evaluations \(P(0, 0 ... 0)\), \(P(0, 0 ... 1)\) up to \(P(1, 1, ... 1)\) are holding the data that we care about. To prove an evaluation at a point, we "re-interpret" the same data as a square. We then extend each row, using Reed-Solomon encoding over groups of bits, to give the data the redundancy needed for random Merkle branch queries to be secure. We then compute a random linear combination of rows, with coefficients designed so that the new combined row actually holds the evaluation that we care about. Both this newly-created row (which get re-interpreted as 128 rows of bits), and a few randomly-selected columns with Merkle branches, get passed to the verifier. This is \(O(\sqrt{N})\) data: the new row has \(O(\sqrt{N})\) size, and each of the (constant number of) columns that get passed has \(O(\sqrt{N})\) size.

The verifier then does a "row combination of the extension" (or rather, a few columns of the extension), and an "extension of the row combination", and verifies that the two match. They then compute a column combination, and check that it returns the value that the prover is claiming. And there's our proof system (or rather, the polynomial commitment scheme, which is the key building block of a proof system).

What did we not cover?

Efficient algorithms to extend the rows, which are needed to actually make the computational efficiency of the verifier \(O(\sqrt{N})\). With naive Lagrange interpolation, we can only get \(O(N^{\frac{2}{3}})\). For this, we use Fast Fourier transforms over binary fields, described here (though the exact implementation will be different, because this post uses a less efficient construction not based on recursive extension).
Arithmetization. Univariate polynomials are convenient because you can do things like \(F(X+2) - F(X+1) - F(X) = Z(X) * H(X)\) to relate adjacent steps in the computation. In a hypercube, the interpretation of "the next step" is not nearly as clean as "\(X + 1\)". You can do \(X * k\) and jump around powers of \(k\), but this jumping around behavior would sacrifice many of the key advantages of Binius. The Binius paper introduces solutions to this (eg. see Section 4.3), but this is a "deep rabbit hole" in its own right.
How to actually safely do specific-value checks. The Fibonacci example required checking key boundary conditions: \(F(0) = F(1) = 1\), and the value of \(F(100)\). But with "raw" Binius, checking at pre-known evaluation points is insecure. There are fairly simple ways to convert a known-evaluation check into an unknown-evaluation check, using what are called sum-check protocols; but we did not get into those here.
Lookup protocols, another technology which has been recently gaining usage as a way to make ultra-efficient proving systems. Binius can be combined with lookup protocols for many applications.
Going beyond square-root verification time. Square root is expensive: a Binius proof of \(2^{32}\) bits is about 11 MB long. You can remedy this using some other proof system to make a "proof of a Binius proof", thus gaining both Binius's efficiency in proving the main statement and a small proof size. Another option is the much more complicated FRI-Binius protocol, which creates a poly-logarithmic-sized proof (like regular FRI).
How Binius affects what counts as "SNARK-friendly". The basic summary is that, if you use Binius, you no longer need to care much about making computation "arithmetic-friendly": "regular" hashes are no longer more efficient than traditional arithmetic hashes, multiplication modulo \(2^{32}\) or modulo \(2^{256}\) is no longer a big headache compared to multiplication modulo \(p\), and so forth. But this is a complicated topic; lots of things change when everything is done in binary.

I expect many more improvements in binary-field-based proving techniques in the months ahead.