Circle STARK Verifier: Stwo to Solidity

A native Solidity verifier for Stwo Circle STARKs — the first post-quantum STARK verifier on Ethereum L1

Status

All 72 tests pass. The verifier is parameterized and supports multiple circuits: withdrawal (46 columns, ~20M gas) and transfer (57 columns). See also: From SNARKs to STARKs for the post-quantum migration context.

Abstract

We present a native Solidity implementation of a Circle STARK verifier compatible with the Stwo prover (v2.1.0). Circle STARKs operate over the Mersenne-31 prime field ( $p = 2^{31} - 1$ ) using the circle group $x^2 + y^2 = 1 \pmod{p}$ as the evaluation domain, replacing the traditional multiplicative-group FFT with a circle-group analogue. Our verifier is the first complete Circle STARK verifier deployed as native EVM bytecode on Ethereum L1.

The system consists of two components:

A modified Stwo prover (Rust) with a custom Keccak-256 Merkle channel replacing Blake2s, and a deterministic proof serialization format.
A Solidity verifier (7 libraries, ~1,900 lines) implementing M31/CM31/QM31 field arithmetic, circle domain evaluation, Fiat-Shamir transcript replay, Merkle decommitment, DEEP quotient accumulation, and Circle FRI verification.

The verifier is parameterized via constructor immutables, supporting multiple circuit configurations with zero runtime gas overhead. Currently deployed for the UPP withdrawal circuit (46 trace columns, 5 constraints) and the transfer circuit (57 trace columns, 6 constraints), both using 64 rows ( $2^6$ ), degree-2 constraint bound, and FRI parameters (blowup factor 2, 3 queries, 10-bit proof of work).

1. Introduction

1.1 Motivation

The Universal Private Pool (UPP) requires a zero-knowledge proof for every withdrawal: the user must demonstrate knowledge of a valid note in the pool's Merkle tree, conservation of balances, and (optionally) compliance with an Association Set Provider, without revealing which note is being spent.

Current production ZK systems (Groth16, PLONK) rely on elliptic-curve pairings over BN254 or BLS12-381. These are efficient today but are not post-quantum secure — Shor's algorithm on a sufficiently large quantum computer breaks the discrete logarithm problem underlying these curves.

STARKs (Scalable Transparent Arguments of Knowledge) rely only on hash functions for their security, making them plausibly quantum-resistant. Among STARK implementations, Stwo (StarkWare's next-generation prover) achieves remarkable performance by using:

The Mersenne-31 prime ( $2^{31}-1$ ), which admits extremely fast arithmetic via bit shifts
The circle group over M31, which provides a smooth subgroup structure of order $2^{31}$ — enabling NTT-like transforms without requiring a large multiplicative subgroup

1.2 The Challenge

No native Solidity verifier for Stwo's Circle STARK format existed. StarkNet avoids this problem entirely by wrapping Stwo proofs through Stone (a different STARK system with an existing verifier). Building a native Circle STARK verifier in Solidity requires implementing:

Component	Difficulty	Why
M31 field arithmetic + extensions	Moderate	Three-level tower: M31 → CM31 → QM31
Circle domain evaluation	Hard	Non-standard domain structure unlike multiplicative groups
Circle FRI protocol	Hard	Twin-coset folding, circle-to-line projection
DEEP quotient computation	Hard	Line equations through QM31 conjugate pairs, periodicity samples
Fiat-Shamir transcript	Moderate	Must reproduce Stwo's channel operations exactly
Proof serialization	Moderate	Bit-exact encoding between Rust and Solidity

1.3 Key Design Decision: Keccak over Blake2s

Stwo's default channel uses Blake2s for both Merkle hashing and Fiat-Shamir. Ethereum provides no Blake2s precompile (EIP-152 gives Blake2b, not Blake2s). A pure-Solidity Blake2s would cost ~3,000–5,000 gas per compression — with 3 queries × multiple Merkle paths × FRI layers, this would add millions of gas.

Instead, we implemented a custom Keccak-256 channel in the Rust prover. Stwo's Channel and MerkleChannel traits are generic, allowing us to substitute Keccak-256 while preserving all soundness guarantees. The Solidity verifier then uses the native keccak256 opcode at ~30 gas per hash — a 100× reduction.

This is sound because the channel is purely a Fiat-Shamir construction: any collision-resistant hash function suffices. Keccak-256 provides 128-bit collision resistance, matching the security level of the underlying STARK.

1.4 Results

Metric	Value
Solidity code	~1,900 lines across 7 libraries
Supported circuits	Withdraw (46 cols, 5 constraints), Transfer (57 cols, 6 constraints)
Verification gas	~20M (withdraw)
Deployment size	19,576 bytes
Test coverage	72 tests, all passing
Proof size	~5 KB calldata
Rust prover time	~2ms (release build)
Rust verifier time	~97μs

2. Mathematical Background

2.1 The M31 Field Tower

The base field is the Mersenne prime $\mathbb{F}_p$ where $p = 2^{31} - 1 = 2{,}147{,}483{,}647$ .

Mersenne reduction is remarkably simple: for a product $x = a \times b$ , we compute $x \bmod p$ as $(x \gg 31) + (x \mathbin{\&} p)$ , with at most one subtraction of $p$ . This fits naturally in EVM uint32 arithmetic.

The extension tower:

\mathbb{F}_p \hookrightarrow \mathbb{F}_{p^2} \hookrightarrow \mathbb{F}_{p^4}

CM31 (Complex M31): $\mathbb{F}_{p^2} = \mathbb{F}_p[i] / (i^2 + 1)$ . Elements are $a + bi$ with $i^2 = -1$ . This is the standard Gaussian integer construction modulo $p$ . We pack CM31 as uint64: low 32 bits = real, high 32 bits = imaginary.

QM31 (Quartic M31): $\mathbb{F}_{p^4} = \mathbb{F}_{p^2}[u] / (u^2 - (2+i))$ . Elements are $a + bu$ where $a, b \in$ CM31 and $u^2 = 2 + i$ . This is Stwo's SecureField, used for all random challenges. We pack QM31 as uint128: low 64 bits = first CM31, high 64 bits = second CM31.

The irreducible polynomial $u^2 - (2+i)$ is carefully chosen so that $2+i$ is a non-square in CM31, guaranteeing the extension is irreducible.

2.2 The M31 Circle Group

The unit circle over $\mathbb{F}_p$ :

\mathcal{C} = \{(x, y) \in \mathbb{F}_p^2 : x^2 + y^2 = 1\}

forms a group under the operation:

(x_1, y_1) \circ (x_2, y_2) = (x_1 x_2 - y_1 y_2, \; x_1 y_2 + y_1 x_2)

This is isomorphic to complex multiplication on the unit circle: if we identify $(x, y)$ with $x + iy$ , then the group operation is multiplication in $\mathbb{F}_p[i]$ restricted to norm-1 elements.

For $p = 2^{31} - 1$ , the circle group has order $p + 1 = 2^{31}$ , which is a perfect power of 2. This means the group contains subgroups of every order $2^k$ for $k \leq 31$ — exactly the smooth structure needed for FFT-like transforms.

The generator is $G = (2, 1268011823)$ , with $|G| = 2^{31}$ . The subgroup generator of order $2^k$ is:

G_k = G^{2^{31-k}}

represented as a CirclePointIndex $= 2^{31-k}$ (the scalar exponent).

2.3 Circle Domains and Cosets

Stwo uses canonical cosets as evaluation domains. For a trace of $2^n$ rows, the evaluation domain is:

D_n = \text{CanonicCoset}(n).\text{circle\_domain}()

The circle domain is constructed from a half-coset:

half_odds(n-1) = Coset::new(initial: G_{n+1}, step: G_{n-1}, log_size: n-1)

Note the index arithmetic: half_odds(n) uses subgroupGen(n+2) as the initial point and subgroupGen(n) as the step — the "+2" offset is critical and a common source of bugs.

The full circle domain of size $2^n$ consists of:

First half ( $i < 2^{n-1}$ ): points from the half-coset
Second half ( $i \geq 2^{n-1}$ ): their conjugates (negations on the circle: $(x, y) \mapsto (x, -y)$ )

Points are accessed in bit-reversed order for efficient FRI folding.

2.4 QM31 Complex Conjugation

Critical Implementation Detail

This was one of the two critical bugs found during the port. Getting this wrong produces correct-looking intermediate values that diverge subtly.

Stwo defines ComplexConjugate for QM31 as:

// In Stwo's source:
impl ComplexConjugate for QM31 {
    fn complex_conjugate(&self) -> Self {
        Self(self.0, -self.1)  // Negate entire second CM31
    }
}

This means for a QM31 element $(a_0, a_1, a_2, a_3)$ packed as $((a_0 + a_1 i) + (a_2 + a_3 i) \cdot u)$ :

\overline{q} = (a_0 + a_1 i) + (-(a_2 + a_3 i)) \cdot u = (a_0, a_1, -a_2, -a_3)

This is not element-wise CM31 conjugation (which would give $(a_0, -a_1, a_2, -a_3)$ ). It negates the entire second CM31 component as a unit. This distinction is critical for the DEEP quotient line equations.

3. System Architecture

3.1 Overview

Circle STARK Architecture: Stwo Prover (Rust) generates proof bytes consumed by the Solidity Verifier (EVM) — End-to-end architecture: Rust prover modules produce serialized proof bytes, verified by 7 Solidity libraries on-chain

3.2 Proof Generation (Rust)

The Rust prover (upp-sdk/stwo-prover/) consists of:

Module	Lines	Purpose
`withdraw/air.rs`	190	Withdrawal AIR constraints (5 constraints, 46 columns)
`withdraw/prover.rs`	~500	Withdrawal trace generation, commitment, proof orchestration
`withdraw/types.rs`	~60	Withdrawal input/output types
`transfer/air.rs`	~210	Transfer AIR constraints (6 constraints, 57 columns)
`transfer/prover.rs`	~465	Transfer trace generation, commitment, proof orchestration
`transfer/types.rs`	~130	Transfer input/output types
`keccak_channel.rs`	~470	Custom Keccak-256 channel + Merkle hasher
`serialize.rs`	~600	Proof → flat bytes for Solidity

3.3 Verification (Solidity)

The Solidity verifier (upp-sdk/src/contracts/src/stark/) consists of:

Library	Lines	Purpose
`M31Lib.sol`	311	M31, CM31, QM31 arithmetic + circle point operations
`KeccakChannel.sol`	213	Fiat-Shamir channel: mix/draw, PoW verification
`CircleDomain.sol`	188	Circle domain point evaluation, bit reversal, FRI domains
`MerkleVerifier.sol`	161	Keccak-256 Merkle tree decommitment verification
`FriVerifier.sol`	222	FRI folding: circle→line, line→line, last layer
`OodQuotients.sol`	221	DEEP quotient accumulation with periodicity (parameterized multi-mask columns)
`CircleStarkVerifier.sol`	497	Main orchestrator: 7-phase verification flow (parameterized via constructor immutables)

3.3.1 Parameterized Verifier

The verifier is parameterized via Solidity immutable constructor arguments, allowing the same contract code to support different circuit configurations:

uint32 public immutable TRACE_WIDTH;        // 46 (withdraw) or 57 (transfer)
uint32 public immutable MULTI_MASK_COL_0;   // 25 (state Merkle chain)
uint32 public immutable MULTI_MASK_COL_1;   // 29 (ASP Merkle chain)
uint32 public immutable N_TRACE_OOD_VALUES; // TRACE_WIDTH - 2 + 4

The deploy script creates separate instances for each circuit type:

CircleStarkVerifier withdrawVerifier = new CircleStarkVerifier(46, 25, 29);
CircleStarkVerifier transferVerifier = new CircleStarkVerifier(57, 25, 29);

Since Solidity inlines immutable values into the deployed bytecode, there is zero runtime gas overhead compared to hardcoded constants. Each instance has its own bytecode with the parameters baked in at deploy time.

3.4 Verification Flow

The verify() function executes 7 sequential phases:

Phase 1: Parse commitments + replay Fiat-Shamir transcript
         → Extract 3 Merkle roots, OOD point, sampled values
         → Draw FRI random coefficient (α)

Phase 2: Parse decommitments + queried values + PoW nonce
         → Read Merkle witnesses, trace/composition column values

Phase 3: Parse FRI proof
         → Read layer commitments, witness evaluations, decommitments
         → Mix each FRI commitment into channel, draw fold alphas
         → Read last-layer polynomial coefficients

Phase 3.5: Verify proof of work
           → Check nonce produces sufficient trailing zeros

Phase 4: Draw query positions
         → Sample N_QUERIES positions from channel via rejection sampling
         → Sort positions (insertion sort, small array)

Phase 5: Verify Merkle decommitments
         → Reconstruct leaf hashes from queried column values
         → Walk Merkle tree bottom-up against committed roots

Phase 6: Compute DEEP quotients
         → Build OOD + shifted points on QM31 circle
         → Accumulate quotient contributions across all columns
         → Handle periodicity samples for shifted columns

Phase 7: Verify FRI
         → Circle-to-line fold (first layer)
         → Line-to-line fold (5 inner layers)
         → Polynomial evaluation check (last layer)

4. The Withdrawal Circuit

4.1 Trace Layout

The withdrawal circuit uses 46 columns across 64 rows ( $\log_2 = 6$ ):

Columns	Name	Purpose
0–7	`owner_secret[0..7]`	Private spending key (8 × M31 limbs, 248 bits of entropy)
8	`input_amount`	Note amount being spent
9	`input_blinding`	Note randomness
10	`input_origin`	Note origin address
11	`token`	ERC-20 token address
12	`leaf_index`	Position in state Merkle tree
13–16	`owner_hash[0..3]`	Keccak hash of owner (4 × M31 limbs)
17–20	`commitment[0..3]`	Note commitment (4 × M31 limbs)
21–24	`nullifier[0..3]`	Spend nullifier (4 × M31 limbs)
25	`merkle_current`	State Merkle chain: current hash
26	`merkle_sibling`	State Merkle chain: sibling hash
27	`merkle_index`	State Merkle chain: path bit
28	`merkle_result`	State Merkle chain: computed parent
29	`asp_current`	ASP Merkle chain: current hash
30	`asp_sibling`	ASP Merkle chain: sibling hash
31	`asp_index`	ASP Merkle chain: path bit
32	`asp_result`	ASP Merkle chain: computed parent
33	`withdraw_amount`	Amount going to recipient
34	`change_amount`	Amount remaining as change note
35–38	`change_owner_hash[0..3]`	Change note owner
39	`change_blinding`	Change note randomness
40–43	`change_commitment[0..3]`	Change note commitment
44	`is_ragequit`	Flag: ragequit bypass
45	`recipient`	Withdrawal recipient address

4.2 Constraint System

Five algebraic constraints, all degree ≤ 2, enforced at every row:

C1. Balance conservation:

\text{input\_amount} - \text{withdraw\_amount} - \text{change\_amount} = 0

C2. State Merkle chain continuity:

\text{merkle\_result}[i] - \text{merkle\_current}[i+1] = 0

C3. ASP Merkle chain continuity:

\text{asp\_result}[i] - \text{asp\_current}[i+1] = 0

C4. Ragequit origin check:

\text{is\_ragequit} \times (\text{recipient} - \text{input\_origin}) = 0

C5. Binary constraint:

\text{is\_ragequit} \times (1 - \text{is\_ragequit}) = 0

Constraints C2 and C3 use mask offsets [0, 1] (reading both current and next row), which makes columns 25 and 29 "shifted" columns — a distinction that becomes critical during quotient computation.

4.3 Hash-and-Check Design

The circuit uses a hash-and-check pattern: Merkle paths and hash computations (Keccak) are precomputed outside the STARK trace and committed as witness values. The STARK constraints then verify the chain structure (result feeds into next level's current) rather than re-implementing hash functions inside the constraint system.

This is why 64 rows suffice for a 32-level Merkle tree: each row represents one level of the Merkle path, with the chain constraints (C2, C3) ensuring vertical consistency.

4A. The Transfer Circuit

The transfer circuit enables 1-input-2-output private transfers: spend one note and create two new notes that stay in the pool. No funds leave — amounts are hidden. Output notes can have different owners, enabling private payments to other users.

4A.1 Trace Layout

The transfer circuit uses 57 columns across 64 rows ( $\log_2 = 6$ ):

Columns	Name	Purpose
0–32	(same as withdrawal)	Input note (8-limb owner secret, scalars, hashes), state Merkle, ASP Merkle
33	`output1_amount`	Amount sent to recipient
34–37	`output1_owner_hash[0..3]`	Recipient's owner hash (4 × M31 limbs)
38	`output1_blinding`	Recipient note randomness
39	`output1_origin`	Recipient note origin address
40–43	`output1_commitment[0..3]`	Recipient note commitment (4 × M31 limbs)
44	`output2_amount`	Change amount back to sender
45–48	`output2_owner_hash[0..3]`	Sender's owner hash (change note)
49	`output2_blinding`	Change note randomness
50	`output2_origin`	Change note origin address
51–54	`output2_commitment[0..3]`	Change note commitment (4 × M31 limbs)
55	`is_ragequit`	Flag: ragequit bypass
56	`reserved`	Reserved (zero-padded)

Columns 0–32 are identical to the withdrawal circuit. The withdrawal-specific columns (33–45: withdraw_amount, change fields, recipient address) are replaced with two full output note structures.

4A.2 Constraint System

Six algebraic constraints, all degree $\leq$ 2, enforced at every row:

C1. Balance conservation:

\text{input\_amount} - \text{output1\_amount} - \text{output2\_amount} = 0

C2. State Merkle chain continuity: (same as withdrawal)

\text{merkle\_result}[i] - \text{merkle\_current}[i+1] = 0

C3. ASP Merkle chain continuity: (same as withdrawal)

\text{asp\_result}[i] - \text{asp\_current}[i+1] = 0

C4. Origin check (output 1):

\text{is\_ragequit} \times (\text{output1\_origin} - \text{input\_origin}) = 0

C5. Origin check (output 2):

\text{is\_ragequit} \times (\text{output2\_origin} - \text{input\_origin}) = 0

C6. Binary constraint:

\text{is\_ragequit} \times (1 - \text{is\_ragequit}) = 0

The origin constraints (C4, C5) enforce that when ragequitting, both output notes must preserve the original depositor's origin address. This prevents ragequit transfers from laundering the origin. In normal mode ( $\text{is\_ragequit} = 0$ ), output origins are unconstrained — the ASP allowlist provides the compliance guarantee instead.

Note that output owner hashes are not constrained by the circuit. They are baked into the output commitments (which are public inputs), but the circuit does not enforce that they match the sender. This is by design — it enables transfers to different recipients.

4A.3 Multi-Mask Columns

Same as the withdrawal circuit: columns 25 (state Merkle merkle_current) and 29 (ASP Merkle asp_current) use mask offsets [0, 1], making them "shifted" columns. Both circuits share these column indices, which are passed as constructor parameters to the verifier (MULTI_MASK_COL_0 = 25, MULTI_MASK_COL_1 = 29).

4A.4 Fiat-Shamir Seed

The transfer circuit binds public inputs via:

seed = keccak256(abi.encodePacked(
    nullifier,           // bytes32
    stateRoot,           // uint256
    aspRoot,             // uint256
    token,               // address (20 bytes)
    outputCommitment1,   // bytes32
    outputCommitment2    // bytes32
))

Unlike the withdrawal seed, there is no amount or recipient field — transfers don't reveal amounts and both outputs stay in the pool.

5. Field Arithmetic in Solidity

5.1 Bit-Packing Strategy

EVM words are 256 bits. We pack field elements to minimize memory operations:

Type	Packed as	Layout
M31	`uint32`	Single 31-bit value (max $2^{31}-2$ )
CM31	`uint64`	Low 32: real, High 32: imaginary
QM31	`uint128`	Low 64: first CM31, High 64: second CM31

This allows two QM31 elements per EVM word and makes pack/unpack operations simple bit shifts.

5.2 M31 Arithmetic

Addition: $(a + b) \bmod p$ . Single addition + conditional subtraction.

Multiplication: Uses the Mersenne reduction identity. For $p = 2^{31}-1$ and product $x = a \times b$ :

x \bmod p = (x \gg 31) + (x \mathbin{\&} p)

with at most one subtraction of $p$ . This avoids expensive modular reduction.

Inversion: Via Fermat's little theorem: $a^{-1} = a^{p-2} \bmod p$ . Uses binary exponentiation (~31 multiplications).

5.3 CM31 Arithmetic

Standard complex multiplication:

(a_0 + a_1 i)(b_0 + b_1 i) = (a_0 b_0 - a_1 b_1) + (a_0 b_1 + a_1 b_0)i

Inversion uses the norm: $\frac{1}{a+bi} = \frac{a-bi}{a^2+b^2}$

5.4 QM31 Arithmetic

Multiplication in $\mathbb{F}_{p^2}[u]/(u^2 - R)$ where $R = 2+i$ :

(a + bu)(c + du) = (ac + R \cdot bd) + (ad + bc)u

Inversion: $\frac{1}{a+bu} = \frac{a-bu}{a^2 - R \cdot b^2}$ where $a^2 - R \cdot b^2$ is a CM31 element inverted via cm31Inv.

See the companion page Field Arithmetic Deep Dive for exhaustive test vectors and gas benchmarks.

6. Fiat-Shamir: The Keccak Channel

6.1 Channel State

The channel maintains:

digest: a 32-byte Keccak state (initially zero)
nDraws: counter for draw operations (reset on each mix)

6.2 Mix Operations

mixRoot(root): digest = keccak256(digest ‖ root). Used when the prover commits a Merkle root.

mixFelts(felts): digest = keccak256(digest ‖ felts[0]_LE ‖ felts[1]_LE ‖ ...). Each QM31 is encoded as 4 × M31, each M31 as 4 little-endian bytes.

mixU64(value): Used for the PoW nonce.

6.3 Draw Operations

drawU32s(): Produces 8 pseudorandom uint32 values:

hash = keccak256(digest ‖ nDraws_LE ‖ 0x00)
result = parse hash as 8 × LE uint32
nDraws += 1

drawSecureFelt(): Draws 8 uint32s, checks all are in $[0, 2p)$ (rejection sampling with ~ $2^{-28}$ retry probability), reduces to M31, packs first 4 as QM31.

6.4 Proof of Work

Stwo includes a proof-of-work check to increase soundness:

prefixedDigest = keccak256(0x12345678_LE ‖ [0; 12] ‖ digest ‖ nBits_LE)
result = keccak256(prefixedDigest ‖ nonce_LE)
check: trailing_zeros(result_as_LE_u128) ≥ nBits

Our circuit uses nBits = 10, adding ~10 bits of soundness.

7. Circle FRI Protocol

FRI (Fast Reed-Solomon Interactive Oracle Proof) is the core of STARK verification. Circle FRI adapts the standard FRI protocol to circle-group domains.

7.1 Overview

The FRI protocol proves that a committed polynomial has low degree by repeatedly folding it:

First layer (Circle → Line): Fold the circle domain polynomial to a line domain
Inner layers (Line → Line): Standard FRI folding on line domains
Last layer: Check that the final polynomial equals the committed constant

For our parameters:

Input: degree $< 2^7$ polynomial on circle domain of size $2^7$ (LIFTING_LOG_SIZE = 7)
After circle fold: degree $< 2^6$ on line domain of size $2^6$
After 5 inner folds: degree $< 2^1$ on line domain of size $2^1$
Last layer: verify as degree-1 polynomial

7.2 Circle-to-Line Fold

Given evaluations $f(P)$ and $f(\bar{P})$ at conjugate pairs (where $\bar{P} = (x, -y)$ ):

f_{\text{folded}}(x) = \frac{f(P) + f(\bar{P})}{2} + \alpha \cdot \frac{f(P) - f(\bar{P})}{2y}

In our implementation, the denominator $\frac{1}{2}$ is absorbed into the twiddle factor, and we compute:

f0 = fP + fNegP;
f1 = (fP - fNegP) * twiddleInv;  // twiddleInv = 1/y
folded = f0 + alpha * f1;

The twiddle factor is $\frac{1}{y}$ where $y$ is the y-coordinate of the circle domain point (in bit-reversed order).

7.3 Line-to-Line Fold

Standard FRI folding on univariate polynomials over line domains:

f_{\text{folded}}(x^2) = \frac{f(x) + f(-x)}{2} + \alpha \cdot \frac{f(x) - f(-x)}{2x}

The twiddle factor is $\frac{1}{x}$ where $x$ is the line domain point.

7.4 FRI Domain Construction

Each FRI layer operates on a progressively smaller domain:

Layer	Type	Domain	Log Size
First	Circle → Line	`CanonicCoset(7).circle_domain()`	7
Inner 0	Line → Line	`half_odds(6)`	6
Inner 1	Line → Line	`half_odds(6).double()`	5
Inner 2	Line → Line	`half_odds(6).double().double()`	4
Inner 3	Line → Line	...	3
Inner 4	Line → Line	...	2
Last	Polynomial eval	...	1

The line domain half_odds(n) after $k$ doublings has:

initial = subgroupGen(n + 2 - k)
step = subgroupGen(n - k)
log_size = n - k

7.5 Pair Reconstruction

When the verifier has query evaluations at some positions but not their pairs, it fills in missing siblings from the FRI witness. Three cases:

Both siblings queried: Both $f(P)$ and $f(\bar{P})$ are in the query set — use directly
Only even position queried: Odd sibling comes from witness
Only odd position queried: Even sibling comes from witness

After folding each pair, the verifier checks the Merkle decommitment of both siblings against the layer's committed root.

See FRI Protocol Deep Dive for worked examples.

8. DEEP Quotient Computation

The DEEP (Domain Extension for Eliminating Pretenders) technique is where the verifier checks that the committed polynomial agrees with the out-of-domain (OOD) evaluations. This is the most mathematically intricate part of the verifier.

8.1 OOD Point Generation

The verifier draws a random $t \in$ QM31 from the Fiat-Shamir channel and maps it to a point on the QM31 circle via stereographic projection:

x = \frac{1 - t^2}{1 + t^2}, \quad y = \frac{2t}{1 + t^2}

This gives the OOD evaluation point $(x_{\text{ood}}, y_{\text{ood}}) \in \mathcal{C}(\mathbb{F}_{p^4})$ .

The shifted point is $P_{\text{shifted}} = P_{\text{ood}} \cdot G_{\text{trace}}$ , where $G_{\text{trace}}$ is the subgroup generator of order $2^{\text{LOG\_TRACE\_SIZE}}$ .

8.2 Quotient Formula

For each column sample (column value at the query point, OOD evaluation, evaluation point), the verifier computes a quotient contribution using line equations through conjugate pairs.

Given:

$v$ = OOD sample value (QM31)
$P_s$ = sample point (QM31 circle point)
$f(q)$ = queried trace value at query point $q = (q_x, q_y) \in$ M31

The line through $(P_s, v)$ and $(\bar{P_s}, \bar{v})$ is:

\ell(x, y) = a \cdot y + b

where:

a = \bar{v} - v, \quad c = \bar{y_s} - y_s, \quad b = v \cdot c - a \cdot y_s

The quotient contribution is:

Q = \alpha^k \cdot \frac{c \cdot f(q) - (a \cdot q_y + b)}{c \cdot q_x - (a \cdot y_s + b) \cdot \text{(line denominator)}}

where $\alpha$ is the random coefficient and $k$ is the accumulation index.

8.3 The Line Denominator

The denominator for the line equation, evaluated at the query point:

\text{denom} = (P_{s,x}^{\text{re}} - q_x) \cdot P_{s,y}^{\text{im}} - (P_{s,y}^{\text{re}} - q_y) \cdot P_{s,x}^{\text{im}}

where the superscripts denote the CM31 real and imaginary parts of the QM31 coordinates. This is a CM31 value, inverted once per query and reused across columns sharing the same evaluation point.

8.4 Periodicity Samples

Critical Implementation Detail

This was the second critical bug. Missing periodicity samples shifts all subsequent alpha powers, corrupting every quotient after the first shifted column.

Stwo's build_samples_with_randomness_and_periodicity adds an extra quotient contribution for every column with 2 or more mask offsets (i.e., columns that read both current and next rows).

In our circuit, columns 25 (merkle_current) and 29 (asp_current) have mask [0, 1]. For these columns, instead of the normal 1 or 2 contributions, Stwo emits 3 contributions per column:

Periodicity sample — evaluates at the shifted point using the offset-1 OOD value
Offset-0 sample — evaluates at the OOD point using the offset-0 OOD value
Offset-1 sample — evaluates at the shifted point using the offset-1 OOD value

Each contribution consumes one power of $\alpha$ .

For our specific parameters, the periodicity point coincides with the shifted point. This is because:

period_generator = CanonicCoset(lifting_log_size).step().repeated_double(column_extended_log_size)
With column_extended_log_size = lifting_log_size = 7:
- step = G_7 (subgroup generator of order $2^7$ )
- After 7 doublings: `G_7^5 = G_7^128 $, but$ |G_7| = 128$, so this is the identity
Therefore periodicity_point = shifted_point + identity = shifted_point

The implementation:

if (col == 25 || col == 29) {
    // 1. Periodicity sample (at shifted point, offset-1 value)
    uint128 shiftedVal = p.traceOodValues[s.tOodIdx + 1]; // peek ahead
    s.acc = _addQ(s.acc, s.alphaPow, qVal, shiftedVal,
                  pts.shiftedY, ctx.qy, ctx.denomShiftedInv);
    s.alphaPow = qm31Mul(s.alphaPow, p.randomCoeff);

    // 2. Offset-0 sample (at OOD point)
    s.acc = _addQ(s.acc, s.alphaPow, qVal, p.traceOodValues[s.tOodIdx++],
                  pts.oodsY, ctx.qy, ctx.denomOodsInv);
    s.alphaPow = qm31Mul(s.alphaPow, p.randomCoeff);

    // 3. Offset-1 sample (at shifted point)
    s.acc = _addQ(s.acc, s.alphaPow, qVal, p.traceOodValues[s.tOodIdx++],
                  pts.shiftedY, ctx.qy, ctx.denomShiftedInv);
    s.alphaPow = qm31Mul(s.alphaPow, p.randomCoeff);
}

The total number of alpha powers consumed is: 37 columns × 1 (single-offset) + 2 columns × 3 (shifted) + 8 composition columns × 1 = 51 contributions.

See DEEP Quotients Deep Dive for the full derivation.

9. Merkle Verification

9.1 Leaf Hashing

Leaves are column values hashed as flat little-endian bytes:

leaf_hash = keccak256(col0_LE4 ‖ col1_LE4 ‖ ... ‖ colN_LE4)

For the trace tree: 46 columns × 4 bytes = 184-byte leaf. For the composition tree: 8 columns × 4 bytes = 32-byte leaf. For FRI trees: QM31 values as 16-byte leaves.

9.2 Bottom-Up Walk

The verifier performs a bottom-up Merkle tree walk:

Start with leaf hashes at queried positions
At each level, pair siblings:
- If both siblings are in the query set, use both directly
- Otherwise, fetch the missing sibling from the witness array
Compute parent: keccak256(left ‖ right) (32 + 32 = 64 bytes)
Advance to the next level
At the root, check against the committed root

The tree height equals the domain log size (e.g., 7 for the main trace tree at LIFTING_LOG_SIZE = 7).

10. Proof Serialization

10.1 Format

All values are little-endian, read sequentially with no padding or alignment:

[Commitments]
  n_trees: u32
  per tree: hash (32 bytes)

[Sampled Values]
  n_trees: u32
  per tree:
    n_cols: u32
    per col:
      n_samples: u32
      per sample: qm31 (16 bytes = 4 × m31_LE4)

[Decommitments]
  n_trees: u32
  per tree:
    n_hashes: u32
    per hash: 32 bytes

[Queried Values]
  n_trees: u32
  per tree:
    n_cols: u32
    per col:
      n_vals: u32
      per val: m31 (4 bytes LE)

[Proof of Work]
  nonce: u64 (8 bytes LE)

[FRI Proof]
  First layer:
    commitment: 32 bytes
    n_witness: u32, per witness: qm31 (16 bytes)
    n_decomm: u32, per decomm: 32 bytes
  Inner layers:
    n_layers: u32
    per layer: same as first layer
  Last layer:
    n_coeffs: u32, per coeff: qm31 (16 bytes)

10.2 Proof Size

For our parameters (3 queries, 46+8 columns, 6 FRI layers):

Section	Approximate Size
Commitments	100 bytes
Sampled values	800 bytes
Decommitments	~800 bytes
Queried values	~700 bytes
PoW nonce	8 bytes
FRI proof	~2,500 bytes
Total	~5 KB

11. Testing and Cross-Validation

11.1 Methodology

Verification was performed through incremental cross-validation: each component was tested independently against Rust-computed reference values before integration.

The Rust prover includes a comprehensive debug transcript test (test_serialize_debug_transcript) that outputs intermediate values at every verification step. These values were hardcoded into Solidity tests for bit-exact comparison.

11.2 Test Suite

72 tests across 6 test suites:

Suite	Tests	What It Covers
M31Lib	21	Field arithmetic: M31, CM31, QM31 add/sub/mul/inv, circle point operations
KeccakChannel	7	Mix/draw operations, PoW verification, rejection sampling
CircleDomain	8	Bit reversal, subgroup generators, circle/line domain points
MerkleVerifier	6	Leaf hashing (M31, QM31), tree verification
FriVerifier	4	Circle fold, line fold, last layer polynomial eval
CircleStarkVerifier	13	Phase-by-phase verification, OOD quotients, full proof
VerifierHarness	13	Integration: 7-phase debug harness, query context, quotient cross-validation

11.3 The Full Proof Test

The definitive test is test_verifyRealProof: a complete Stwo proof generated by the Rust prover is hardcoded as hex bytes and verified end-to-end in Solidity. This exercises every code path: parsing, Fiat-Shamir, Merkle verification, quotient computation, and FRI.

forge test --match-test test_verifyRealProof
[PASS] test_verifyRealProof() (gas: 17,183,996)

11.4 Cross-Validation Process

For each verification phase, we followed this process:

Add console.log statements to the Rust prover's verification path
Run the Rust verifier, capture intermediate values
Hardcode those values in Solidity tests
Compare Solidity outputs against Rust outputs
When mismatches occur, bisect to find the divergence point

This process uncovered the two critical bugs:

QM31 conjugation: The _conjQ function was negating imaginary parts of each CM31 independently instead of negating the entire second CM31 component
Periodicity samples: Missing the extra quotient contribution for columns with 2+ mask offsets

Both bugs produced outputs that were close to correct (most columns matched) but diverged at specific points, requiring careful intermediate-value comparison to isolate.

12. Gas Analysis

12.1 Breakdown

Total verification (withdrawal circuit, 46 columns): ~20M gas

The transfer circuit (57 columns) uses the same FRI parameters and trace dimensions, so verification gas is expected to be similar (~20M gas). The additional 11 trace columns increase OOD quotient computation slightly but FRI (the dominant cost) remains unchanged.

Phase	Estimated Gas	%
Proof parsing (Phases 1-3)	~2M	12%
PoW verification	~0.1M	<1%
Merkle decommitments (Phase 5)	~3M	17%
DEEP quotients (Phase 6)	~5M	29%
FRI verification (Phase 7)	~7M	41%

FRI dominates because it involves multiple rounds of:

Bit reversal and domain point computation (expensive circle point scalar multiplication)
QM31 folding operations
Merkle verification at each layer

12.2 Optimization Opportunities

The current implementation prioritizes correctness over gas efficiency. Potential optimizations:

Precomputed twiddle tables: Store domain points instead of computing via scalar multiplication
Assembly field arithmetic: Hand-optimized M31/QM31 operations in Yul
Batch Merkle verification: Combine multiple tree walks
EIP-4844 blob data: Move proof bytes to blob space (~1 gas/byte vs ~16 gas/byte calldata)
Larger FRI fold step: Fold 2+ layers at once (requires more witness data but fewer Merkle verifications)

With these optimizations, verification gas could potentially drop to 5–8M.

12.3 Comparison

System	Proof Type	Gas	PQ-Secure	Trusted Setup
Groth16 (BN254)	Pairing SNARK	~230K	No	Yes
PLONK (BN254)	Pairing SNARK	~300K	No	Universal
Circle STARK (ours)	Hash-based	~20M	Yes	No
Stone STARK	Hash-based	~5M	Yes	No

Our verifier is more expensive than pairing-based systems, but is the only option that provides both post-quantum security and no trusted setup. The gas premium is the cost of quantum resistance.

13. Security Analysis

13.1 Soundness

The STARK achieves conjectured ~100-bit soundness from:

FRI soundness: $\frac{n_{\text{queries}}}{|\mathbb{F}|} \approx \frac{3}{2^{31}}$ per query, but boosted by the extension field (QM31 has $~2^{124}$ elements)
PoW: 10 additional bits
DEEP quotient: out-of-domain sampling over QM31

13.2 Post-Quantum Security

The proof system relies solely on:

Collision resistance of Keccak-256 (~128 bits, quantum: ~85 bits via Grover)
Algebraic properties of M31/QM31 (no DLP assumption)

No known quantum algorithm provides a super-polynomial speedup against collision-resistant hash functions, making the system plausibly post-quantum secure.

13.3 Fiat-Shamir Security

The Keccak channel produces challenges that are computationally indistinguishable from random, assuming Keccak-256 behaves as a random oracle. The channel is non-adaptive: all challenges are derived deterministically from the proof transcript.

14. Conclusion

We have demonstrated that native Circle STARK verification on Ethereum L1 is feasible. The verifier correctly processes proofs from a modified Stwo prover, achieving end-to-end verification at ~20M gas. While more expensive than pairing-based alternatives, this is the only production-ready path to post-quantum zero-knowledge proofs on Ethereum today.

The verifier is parameterized via constructor immutables, supporting multiple circuit configurations — currently the withdrawal circuit (46 columns, 5 constraints) and the transfer circuit (57 columns, 6 constraints) — with zero runtime gas overhead. This design allows new circuits to be deployed as additional instances of the same verified contract code.

The implementation required solving several non-obvious challenges unique to circle-group STARKs:

QM31 complex conjugation semantics differ from naive element-wise conjugation
Periodicity samples in DEEP quotients consume additional alpha powers for shifted columns
Circle domain point generation uses counter-intuitive index arithmetic (half_odds(n) with n+2 initial index)
FRI line domains after repeated doubling require careful tracking of initial/step indices

For the broader context of why post-quantum security matters and how this verifier fits into UPP's migration from SNARKs, see From SNARKs to STARKs.

All source code is available in the upp-sdk/src/contracts/src/stark/ and upp-sdk/stwo-prover/ directories.

References

Haböck, U., Janeček, D., Nevo, A. (2024). Circle STARKs. IACR ePrint 2024/278.
StarkWare. Stwo Prover. https://github.com/starkware-libs/stwo
Ben-Sasson, E., et al. (2018). Scalable, transparent, and post-quantum secure computational integrity. IACR ePrint 2018/046.
Ben-Sasson, E., Bentov, I., Horesh, Y., Riabzev, M. (2018). Fast Reed-Solomon Interactive Oracle Proofs of Proximity. IACR ePrint 2017/134.

Circle STARK Verifier: Stwo to Solidity

On this page