Privacy

Circle STARK Verifier: Stwo to Solidity

A native Solidity verifier for Stwo Circle STARKs — the first post-quantum STARK verifier on Ethereum L1

Status

All 72 tests pass. The verifier is parameterized and supports multiple circuits: withdrawal (46 columns, ~20M gas) and transfer (57 columns). See also: From SNARKs to STARKs for the post-quantum migration context.

Abstract

We present a native Solidity implementation of a Circle STARK verifier compatible with the Stwo prover (v2.1.0). Circle STARKs operate over the Mersenne-31 prime field (p=2311p = 2^{31} - 1) using the circle group x2+y2=1(modp)x^2 + y^2 = 1 \pmod{p} as the evaluation domain, replacing the traditional multiplicative-group FFT with a circle-group analogue. Our verifier is the first complete Circle STARK verifier deployed as native EVM bytecode on Ethereum L1.

The system consists of two components:

  1. A modified Stwo prover (Rust) with a custom Keccak-256 Merkle channel replacing Blake2s, and a deterministic proof serialization format.
  2. A Solidity verifier (7 libraries, ~1,900 lines) implementing M31/CM31/QM31 field arithmetic, circle domain evaluation, Fiat-Shamir transcript replay, Merkle decommitment, DEEP quotient accumulation, and Circle FRI verification.

The verifier is parameterized via constructor immutables, supporting multiple circuit configurations with zero runtime gas overhead. Currently deployed for the UPP withdrawal circuit (46 trace columns, 5 constraints) and the transfer circuit (57 trace columns, 6 constraints), both using 64 rows (262^6), degree-2 constraint bound, and FRI parameters (blowup factor 2, 3 queries, 10-bit proof of work).

1. Introduction

1.1 Motivation

The Universal Private Pool (UPP) requires a zero-knowledge proof for every withdrawal: the user must demonstrate knowledge of a valid note in the pool's Merkle tree, conservation of balances, and (optionally) compliance with an Association Set Provider, without revealing which note is being spent.

Current production ZK systems (Groth16, PLONK) rely on elliptic-curve pairings over BN254 or BLS12-381. These are efficient today but are not post-quantum secure — Shor's algorithm on a sufficiently large quantum computer breaks the discrete logarithm problem underlying these curves.

STARKs (Scalable Transparent Arguments of Knowledge) rely only on hash functions for their security, making them plausibly quantum-resistant. Among STARK implementations, Stwo (StarkWare's next-generation prover) achieves remarkable performance by using:

  • The Mersenne-31 prime (23112^{31}-1), which admits extremely fast arithmetic via bit shifts
  • The circle group over M31, which provides a smooth subgroup structure of order 2312^{31} — enabling NTT-like transforms without requiring a large multiplicative subgroup

1.2 The Challenge

No native Solidity verifier for Stwo's Circle STARK format existed. StarkNet avoids this problem entirely by wrapping Stwo proofs through Stone (a different STARK system with an existing verifier). Building a native Circle STARK verifier in Solidity requires implementing:

ComponentDifficultyWhy
M31 field arithmetic + extensionsModerateThree-level tower: M31 → CM31 → QM31
Circle domain evaluationHardNon-standard domain structure unlike multiplicative groups
Circle FRI protocolHardTwin-coset folding, circle-to-line projection
DEEP quotient computationHardLine equations through QM31 conjugate pairs, periodicity samples
Fiat-Shamir transcriptModerateMust reproduce Stwo's channel operations exactly
Proof serializationModerateBit-exact encoding between Rust and Solidity

1.3 Key Design Decision: Keccak over Blake2s

Stwo's default channel uses Blake2s for both Merkle hashing and Fiat-Shamir. Ethereum provides no Blake2s precompile (EIP-152 gives Blake2b, not Blake2s). A pure-Solidity Blake2s would cost ~3,000–5,000 gas per compression — with 3 queries × multiple Merkle paths × FRI layers, this would add millions of gas.

Instead, we implemented a custom Keccak-256 channel in the Rust prover. Stwo's Channel and MerkleChannel traits are generic, allowing us to substitute Keccak-256 while preserving all soundness guarantees. The Solidity verifier then uses the native keccak256 opcode at ~30 gas per hash — a 100× reduction.

This is sound because the channel is purely a Fiat-Shamir construction: any collision-resistant hash function suffices. Keccak-256 provides 128-bit collision resistance, matching the security level of the underlying STARK.

1.4 Results

MetricValue
Solidity code~1,900 lines across 7 libraries
Supported circuitsWithdraw (46 cols, 5 constraints), Transfer (57 cols, 6 constraints)
Verification gas~20M (withdraw)
Deployment size19,576 bytes
Test coverage72 tests, all passing
Proof size~5 KB calldata
Rust prover time~2ms (release build)
Rust verifier time~97μs

2. Mathematical Background

2.1 The M31 Field Tower

The base field is the Mersenne prime Fp\mathbb{F}_p where p=2311=2,147,483,647p = 2^{31} - 1 = 2{,}147{,}483{,}647.

Mersenne reduction is remarkably simple: for a product x=a×bx = a \times b, we compute xmodpx \bmod p as (x31)+(x&p)(x \gg 31) + (x \mathbin{\&} p), with at most one subtraction of pp. This fits naturally in EVM uint32 arithmetic.

The extension tower:

FpFp2Fp4\mathbb{F}_p \hookrightarrow \mathbb{F}_{p^2} \hookrightarrow \mathbb{F}_{p^4}

CM31 (Complex M31): Fp2=Fp[i]/(i2+1)\mathbb{F}_{p^2} = \mathbb{F}_p[i] / (i^2 + 1). Elements are a+bia + bi with i2=1i^2 = -1. This is the standard Gaussian integer construction modulo pp. We pack CM31 as uint64: low 32 bits = real, high 32 bits = imaginary.

QM31 (Quartic M31): Fp4=Fp2[u]/(u2(2+i))\mathbb{F}_{p^4} = \mathbb{F}_{p^2}[u] / (u^2 - (2+i)). Elements are a+bua + bu where a,ba, b \in CM31 and u2=2+iu^2 = 2 + i. This is Stwo's SecureField, used for all random challenges. We pack QM31 as uint128: low 64 bits = first CM31, high 64 bits = second CM31.

The irreducible polynomial u2(2+i)u^2 - (2+i) is carefully chosen so that 2+i2+i is a non-square in CM31, guaranteeing the extension is irreducible.

2.2 The M31 Circle Group

The unit circle over Fp\mathbb{F}_p:

C={(x,y)Fp2:x2+y2=1}\mathcal{C} = \{(x, y) \in \mathbb{F}_p^2 : x^2 + y^2 = 1\}

forms a group under the operation:

(x1,y1)(x2,y2)=(x1x2y1y2,  x1y2+y1x2)(x_1, y_1) \circ (x_2, y_2) = (x_1 x_2 - y_1 y_2, \; x_1 y_2 + y_1 x_2)

This is isomorphic to complex multiplication on the unit circle: if we identify (x,y)(x, y) with x+iyx + iy, then the group operation is multiplication in Fp[i]\mathbb{F}_p[i] restricted to norm-1 elements.

For p=2311p = 2^{31} - 1, the circle group has order p+1=231p + 1 = 2^{31}, which is a perfect power of 2. This means the group contains subgroups of every order 2k2^k for k31k \leq 31 — exactly the smooth structure needed for FFT-like transforms.

The generator is G=(2,1268011823)G = (2, 1268011823), with G=231|G| = 2^{31}. The subgroup generator of order 2k2^k is:

Gk=G231kG_k = G^{2^{31-k}}

represented as a CirclePointIndex =231k= 2^{31-k} (the scalar exponent).

2.3 Circle Domains and Cosets

Stwo uses canonical cosets as evaluation domains. For a trace of 2n2^n rows, the evaluation domain is:

Dn=CanonicCoset(n).circle_domain()D_n = \text{CanonicCoset}(n).\text{circle\_domain}()

The circle domain is constructed from a half-coset:

half_odds(n-1) = Coset::new(initial: G_{n+1}, step: G_{n-1}, log_size: n-1)

Note the index arithmetic: half_odds(n) uses subgroupGen(n+2) as the initial point and subgroupGen(n) as the step — the "+2" offset is critical and a common source of bugs.

The full circle domain of size 2n2^n consists of:

  • First half (i<2n1i < 2^{n-1}): points from the half-coset
  • Second half (i2n1i \geq 2^{n-1}): their conjugates (negations on the circle: (x,y)(x,y)(x, y) \mapsto (x, -y))

Points are accessed in bit-reversed order for efficient FRI folding.

2.4 QM31 Complex Conjugation

Critical Implementation Detail

This was one of the two critical bugs found during the port. Getting this wrong produces correct-looking intermediate values that diverge subtly.

Stwo defines ComplexConjugate for QM31 as:

// In Stwo's source:
impl ComplexConjugate for QM31 {
    fn complex_conjugate(&self) -> Self {
        Self(self.0, -self.1)  // Negate entire second CM31
    }
}

This means for a QM31 element (a0,a1,a2,a3)(a_0, a_1, a_2, a_3) packed as ((a0+a1i)+(a2+a3i)u)((a_0 + a_1 i) + (a_2 + a_3 i) \cdot u):

q=(a0+a1i)+((a2+a3i))u=(a0,a1,a2,a3)\overline{q} = (a_0 + a_1 i) + (-(a_2 + a_3 i)) \cdot u = (a_0, a_1, -a_2, -a_3)

This is not element-wise CM31 conjugation (which would give (a0,a1,a2,a3)(a_0, -a_1, a_2, -a_3)). It negates the entire second CM31 component as a unit. This distinction is critical for the DEEP quotient line equations.

3. System Architecture

3.1 Overview

Circle STARK Architecture: Stwo Prover (Rust) generates proof bytes consumed by the Solidity Verifier (EVM)
End-to-end architecture: Rust prover modules produce serialized proof bytes, verified by 7 Solidity libraries on-chain

3.2 Proof Generation (Rust)

The Rust prover (upp-sdk/stwo-prover/) consists of:

ModuleLinesPurpose
withdraw/air.rs190Withdrawal AIR constraints (5 constraints, 46 columns)
withdraw/prover.rs~500Withdrawal trace generation, commitment, proof orchestration
withdraw/types.rs~60Withdrawal input/output types
transfer/air.rs~210Transfer AIR constraints (6 constraints, 57 columns)
transfer/prover.rs~465Transfer trace generation, commitment, proof orchestration
transfer/types.rs~130Transfer input/output types
keccak_channel.rs~470Custom Keccak-256 channel + Merkle hasher
serialize.rs~600Proof → flat bytes for Solidity

3.3 Verification (Solidity)

The Solidity verifier (upp-sdk/src/contracts/src/stark/) consists of:

LibraryLinesPurpose
M31Lib.sol311M31, CM31, QM31 arithmetic + circle point operations
KeccakChannel.sol213Fiat-Shamir channel: mix/draw, PoW verification
CircleDomain.sol188Circle domain point evaluation, bit reversal, FRI domains
MerkleVerifier.sol161Keccak-256 Merkle tree decommitment verification
FriVerifier.sol222FRI folding: circle→line, line→line, last layer
OodQuotients.sol221DEEP quotient accumulation with periodicity (parameterized multi-mask columns)
CircleStarkVerifier.sol497Main orchestrator: 7-phase verification flow (parameterized via constructor immutables)

3.3.1 Parameterized Verifier

The verifier is parameterized via Solidity immutable constructor arguments, allowing the same contract code to support different circuit configurations:

uint32 public immutable TRACE_WIDTH;        // 46 (withdraw) or 57 (transfer)
uint32 public immutable MULTI_MASK_COL_0;   // 25 (state Merkle chain)
uint32 public immutable MULTI_MASK_COL_1;   // 29 (ASP Merkle chain)
uint32 public immutable N_TRACE_OOD_VALUES; // TRACE_WIDTH - 2 + 4

The deploy script creates separate instances for each circuit type:

CircleStarkVerifier withdrawVerifier = new CircleStarkVerifier(46, 25, 29);
CircleStarkVerifier transferVerifier = new CircleStarkVerifier(57, 25, 29);

Since Solidity inlines immutable values into the deployed bytecode, there is zero runtime gas overhead compared to hardcoded constants. Each instance has its own bytecode with the parameters baked in at deploy time.

3.4 Verification Flow

The verify() function executes 7 sequential phases:

Phase 1: Parse commitments + replay Fiat-Shamir transcript
         → Extract 3 Merkle roots, OOD point, sampled values
         → Draw FRI random coefficient (α)

Phase 2: Parse decommitments + queried values + PoW nonce
         → Read Merkle witnesses, trace/composition column values

Phase 3: Parse FRI proof
         → Read layer commitments, witness evaluations, decommitments
         → Mix each FRI commitment into channel, draw fold alphas
         → Read last-layer polynomial coefficients

Phase 3.5: Verify proof of work
           → Check nonce produces sufficient trailing zeros

Phase 4: Draw query positions
         → Sample N_QUERIES positions from channel via rejection sampling
         → Sort positions (insertion sort, small array)

Phase 5: Verify Merkle decommitments
         → Reconstruct leaf hashes from queried column values
         → Walk Merkle tree bottom-up against committed roots

Phase 6: Compute DEEP quotients
         → Build OOD + shifted points on QM31 circle
         → Accumulate quotient contributions across all columns
         → Handle periodicity samples for shifted columns

Phase 7: Verify FRI
         → Circle-to-line fold (first layer)
         → Line-to-line fold (5 inner layers)
         → Polynomial evaluation check (last layer)

4. The Withdrawal Circuit

4.1 Trace Layout

The withdrawal circuit uses 46 columns across 64 rows (log2=6\log_2 = 6):

ColumnsNamePurpose
0–7owner_secret[0..7]Private spending key (8 × M31 limbs, 248 bits of entropy)
8input_amountNote amount being spent
9input_blindingNote randomness
10input_originNote origin address
11tokenERC-20 token address
12leaf_indexPosition in state Merkle tree
13–16owner_hash[0..3]Keccak hash of owner (4 × M31 limbs)
17–20commitment[0..3]Note commitment (4 × M31 limbs)
21–24nullifier[0..3]Spend nullifier (4 × M31 limbs)
25merkle_currentState Merkle chain: current hash
26merkle_siblingState Merkle chain: sibling hash
27merkle_indexState Merkle chain: path bit
28merkle_resultState Merkle chain: computed parent
29asp_currentASP Merkle chain: current hash
30asp_siblingASP Merkle chain: sibling hash
31asp_indexASP Merkle chain: path bit
32asp_resultASP Merkle chain: computed parent
33withdraw_amountAmount going to recipient
34change_amountAmount remaining as change note
35–38change_owner_hash[0..3]Change note owner
39change_blindingChange note randomness
40–43change_commitment[0..3]Change note commitment
44is_ragequitFlag: ragequit bypass
45recipientWithdrawal recipient address

4.2 Constraint System

Five algebraic constraints, all degree ≤ 2, enforced at every row:

C1. Balance conservation:

input_amountwithdraw_amountchange_amount=0\text{input\_amount} - \text{withdraw\_amount} - \text{change\_amount} = 0

C2. State Merkle chain continuity:

merkle_result[i]merkle_current[i+1]=0\text{merkle\_result}[i] - \text{merkle\_current}[i+1] = 0

C3. ASP Merkle chain continuity:

asp_result[i]asp_current[i+1]=0\text{asp\_result}[i] - \text{asp\_current}[i+1] = 0

C4. Ragequit origin check:

is_ragequit×(recipientinput_origin)=0\text{is\_ragequit} \times (\text{recipient} - \text{input\_origin}) = 0

C5. Binary constraint:

is_ragequit×(1is_ragequit)=0\text{is\_ragequit} \times (1 - \text{is\_ragequit}) = 0

Constraints C2 and C3 use mask offsets [0, 1] (reading both current and next row), which makes columns 25 and 29 "shifted" columns — a distinction that becomes critical during quotient computation.

4.3 Hash-and-Check Design

The circuit uses a hash-and-check pattern: Merkle paths and hash computations (Keccak) are precomputed outside the STARK trace and committed as witness values. The STARK constraints then verify the chain structure (result feeds into next level's current) rather than re-implementing hash functions inside the constraint system.

This is why 64 rows suffice for a 32-level Merkle tree: each row represents one level of the Merkle path, with the chain constraints (C2, C3) ensuring vertical consistency.

4A. The Transfer Circuit

The transfer circuit enables 1-input-2-output private transfers: spend one note and create two new notes that stay in the pool. No funds leave — amounts are hidden. Output notes can have different owners, enabling private payments to other users.

4A.1 Trace Layout

The transfer circuit uses 57 columns across 64 rows (log2=6\log_2 = 6):

ColumnsNamePurpose
0–32(same as withdrawal)Input note (8-limb owner secret, scalars, hashes), state Merkle, ASP Merkle
33output1_amountAmount sent to recipient
34–37output1_owner_hash[0..3]Recipient's owner hash (4 × M31 limbs)
38output1_blindingRecipient note randomness
39output1_originRecipient note origin address
40–43output1_commitment[0..3]Recipient note commitment (4 × M31 limbs)
44output2_amountChange amount back to sender
45–48output2_owner_hash[0..3]Sender's owner hash (change note)
49output2_blindingChange note randomness
50output2_originChange note origin address
51–54output2_commitment[0..3]Change note commitment (4 × M31 limbs)
55is_ragequitFlag: ragequit bypass
56reservedReserved (zero-padded)

Columns 0–32 are identical to the withdrawal circuit. The withdrawal-specific columns (33–45: withdraw_amount, change fields, recipient address) are replaced with two full output note structures.

4A.2 Constraint System

Six algebraic constraints, all degree \leq 2, enforced at every row:

C1. Balance conservation:

input_amountoutput1_amountoutput2_amount=0\text{input\_amount} - \text{output1\_amount} - \text{output2\_amount} = 0

C2. State Merkle chain continuity: (same as withdrawal)

merkle_result[i]merkle_current[i+1]=0\text{merkle\_result}[i] - \text{merkle\_current}[i+1] = 0

C3. ASP Merkle chain continuity: (same as withdrawal)

asp_result[i]asp_current[i+1]=0\text{asp\_result}[i] - \text{asp\_current}[i+1] = 0

C4. Origin check (output 1):

is_ragequit×(output1_origininput_origin)=0\text{is\_ragequit} \times (\text{output1\_origin} - \text{input\_origin}) = 0

C5. Origin check (output 2):

is_ragequit×(output2_origininput_origin)=0\text{is\_ragequit} \times (\text{output2\_origin} - \text{input\_origin}) = 0

C6. Binary constraint:

is_ragequit×(1is_ragequit)=0\text{is\_ragequit} \times (1 - \text{is\_ragequit}) = 0

The origin constraints (C4, C5) enforce that when ragequitting, both output notes must preserve the original depositor's origin address. This prevents ragequit transfers from laundering the origin. In normal mode (is_ragequit=0\text{is\_ragequit} = 0), output origins are unconstrained — the ASP allowlist provides the compliance guarantee instead.

Note that output owner hashes are not constrained by the circuit. They are baked into the output commitments (which are public inputs), but the circuit does not enforce that they match the sender. This is by design — it enables transfers to different recipients.

4A.3 Multi-Mask Columns

Same as the withdrawal circuit: columns 25 (state Merkle merkle_current) and 29 (ASP Merkle asp_current) use mask offsets [0, 1], making them "shifted" columns. Both circuits share these column indices, which are passed as constructor parameters to the verifier (MULTI_MASK_COL_0 = 25, MULTI_MASK_COL_1 = 29).

4A.4 Fiat-Shamir Seed

The transfer circuit binds public inputs via:

seed = keccak256(abi.encodePacked(
    nullifier,           // bytes32
    stateRoot,           // uint256
    aspRoot,             // uint256
    token,               // address (20 bytes)
    outputCommitment1,   // bytes32
    outputCommitment2    // bytes32
))

Unlike the withdrawal seed, there is no amount or recipient field — transfers don't reveal amounts and both outputs stay in the pool.

5. Field Arithmetic in Solidity

5.1 Bit-Packing Strategy

EVM words are 256 bits. We pack field elements to minimize memory operations:

TypePacked asLayout
M31uint32Single 31-bit value (max 23122^{31}-2)
CM31uint64Low 32: real, High 32: imaginary
QM31uint128Low 64: first CM31, High 64: second CM31

This allows two QM31 elements per EVM word and makes pack/unpack operations simple bit shifts.

5.2 M31 Arithmetic

Addition: (a+b)modp(a + b) \bmod p. Single addition + conditional subtraction.

Multiplication: Uses the Mersenne reduction identity. For p=2311p = 2^{31}-1 and product x=a×bx = a \times b:

xmodp=(x31)+(x&p)x \bmod p = (x \gg 31) + (x \mathbin{\&} p)

with at most one subtraction of pp. This avoids expensive modular reduction.

Inversion: Via Fermat's little theorem: a1=ap2modpa^{-1} = a^{p-2} \bmod p. Uses binary exponentiation (~31 multiplications).

5.3 CM31 Arithmetic

Standard complex multiplication:

(a0+a1i)(b0+b1i)=(a0b0a1b1)+(a0b1+a1b0)i(a_0 + a_1 i)(b_0 + b_1 i) = (a_0 b_0 - a_1 b_1) + (a_0 b_1 + a_1 b_0)i

Inversion uses the norm: 1a+bi=abia2+b2\frac{1}{a+bi} = \frac{a-bi}{a^2+b^2}

5.4 QM31 Arithmetic

Multiplication in Fp2[u]/(u2R)\mathbb{F}_{p^2}[u]/(u^2 - R) where R=2+iR = 2+i:

(a+bu)(c+du)=(ac+Rbd)+(ad+bc)u(a + bu)(c + du) = (ac + R \cdot bd) + (ad + bc)u

Inversion: 1a+bu=abua2Rb2\frac{1}{a+bu} = \frac{a-bu}{a^2 - R \cdot b^2} where a2Rb2a^2 - R \cdot b^2 is a CM31 element inverted via cm31Inv.

See the companion page Field Arithmetic Deep Dive for exhaustive test vectors and gas benchmarks.

6. Fiat-Shamir: The Keccak Channel

6.1 Channel State

The channel maintains:

  • digest: a 32-byte Keccak state (initially zero)
  • nDraws: counter for draw operations (reset on each mix)

6.2 Mix Operations

mixRoot(root): digest = keccak256(digest ‖ root). Used when the prover commits a Merkle root.

mixFelts(felts): digest = keccak256(digest ‖ felts[0]_LE ‖ felts[1]_LE ‖ ...). Each QM31 is encoded as 4 × M31, each M31 as 4 little-endian bytes.

mixU64(value): Used for the PoW nonce.

6.3 Draw Operations

drawU32s(): Produces 8 pseudorandom uint32 values:

hash = keccak256(digest ‖ nDraws_LE ‖ 0x00)
result = parse hash as 8 × LE uint32
nDraws += 1

drawSecureFelt(): Draws 8 uint32s, checks all are in [0,2p)[0, 2p) (rejection sampling with ~2282^{-28} retry probability), reduces to M31, packs first 4 as QM31.

6.4 Proof of Work

Stwo includes a proof-of-work check to increase soundness:

prefixedDigest = keccak256(0x12345678_LE ‖ [0; 12] ‖ digest ‖ nBits_LE)
result = keccak256(prefixedDigest ‖ nonce_LE)
check: trailing_zeros(result_as_LE_u128) ≥ nBits

Our circuit uses nBits = 10, adding ~10 bits of soundness.

7. Circle FRI Protocol

FRI (Fast Reed-Solomon Interactive Oracle Proof) is the core of STARK verification. Circle FRI adapts the standard FRI protocol to circle-group domains.

7.1 Overview

The FRI protocol proves that a committed polynomial has low degree by repeatedly folding it:

  1. First layer (Circle → Line): Fold the circle domain polynomial to a line domain
  2. Inner layers (Line → Line): Standard FRI folding on line domains
  3. Last layer: Check that the final polynomial equals the committed constant

For our parameters:

  • Input: degree <27< 2^7 polynomial on circle domain of size 272^7 (LIFTING_LOG_SIZE = 7)
  • After circle fold: degree <26< 2^6 on line domain of size 262^6
  • After 5 inner folds: degree <21< 2^1 on line domain of size 212^1
  • Last layer: verify as degree-1 polynomial

7.2 Circle-to-Line Fold

Given evaluations f(P)f(P) and f(Pˉ)f(\bar{P}) at conjugate pairs (where Pˉ=(x,y)\bar{P} = (x, -y)):

ffolded(x)=f(P)+f(Pˉ)2+αf(P)f(Pˉ)2yf_{\text{folded}}(x) = \frac{f(P) + f(\bar{P})}{2} + \alpha \cdot \frac{f(P) - f(\bar{P})}{2y}

In our implementation, the denominator 12\frac{1}{2} is absorbed into the twiddle factor, and we compute:

f0 = fP + fNegP;
f1 = (fP - fNegP) * twiddleInv;  // twiddleInv = 1/y
folded = f0 + alpha * f1;

The twiddle factor is 1y\frac{1}{y} where yy is the y-coordinate of the circle domain point (in bit-reversed order).

7.3 Line-to-Line Fold

Standard FRI folding on univariate polynomials over line domains:

ffolded(x2)=f(x)+f(x)2+αf(x)f(x)2xf_{\text{folded}}(x^2) = \frac{f(x) + f(-x)}{2} + \alpha \cdot \frac{f(x) - f(-x)}{2x}

The twiddle factor is 1x\frac{1}{x} where xx is the line domain point.

7.4 FRI Domain Construction

Each FRI layer operates on a progressively smaller domain:

LayerTypeDomainLog Size
FirstCircle → LineCanonicCoset(7).circle_domain()7
Inner 0Line → Linehalf_odds(6)6
Inner 1Line → Linehalf_odds(6).double()5
Inner 2Line → Linehalf_odds(6).double().double()4
Inner 3Line → Line...3
Inner 4Line → Line...2
LastPolynomial eval...1

The line domain half_odds(n) after kk doublings has:

  • initial = subgroupGen(n + 2 - k)
  • step = subgroupGen(n - k)
  • log_size = n - k

7.5 Pair Reconstruction

When the verifier has query evaluations at some positions but not their pairs, it fills in missing siblings from the FRI witness. Three cases:

  1. Both siblings queried: Both f(P)f(P) and f(Pˉ)f(\bar{P}) are in the query set — use directly
  2. Only even position queried: Odd sibling comes from witness
  3. Only odd position queried: Even sibling comes from witness

After folding each pair, the verifier checks the Merkle decommitment of both siblings against the layer's committed root.

See FRI Protocol Deep Dive for worked examples.

8. DEEP Quotient Computation

The DEEP (Domain Extension for Eliminating Pretenders) technique is where the verifier checks that the committed polynomial agrees with the out-of-domain (OOD) evaluations. This is the most mathematically intricate part of the verifier.

8.1 OOD Point Generation

The verifier draws a random tt \in QM31 from the Fiat-Shamir channel and maps it to a point on the QM31 circle via stereographic projection:

x=1t21+t2,y=2t1+t2x = \frac{1 - t^2}{1 + t^2}, \quad y = \frac{2t}{1 + t^2}

This gives the OOD evaluation point (xood,yood)C(Fp4)(x_{\text{ood}}, y_{\text{ood}}) \in \mathcal{C}(\mathbb{F}_{p^4}).

The shifted point is Pshifted=PoodGtraceP_{\text{shifted}} = P_{\text{ood}} \cdot G_{\text{trace}}, where GtraceG_{\text{trace}} is the subgroup generator of order 2LOG_TRACE_SIZE2^{\text{LOG\_TRACE\_SIZE}}.

8.2 Quotient Formula

For each column sample (column value at the query point, OOD evaluation, evaluation point), the verifier computes a quotient contribution using line equations through conjugate pairs.

Given:

  • vv = OOD sample value (QM31)
  • PsP_s = sample point (QM31 circle point)
  • f(q)f(q) = queried trace value at query point q=(qx,qy)q = (q_x, q_y) \in M31

The line through (Ps,v)(P_s, v) and (Psˉ,vˉ)(\bar{P_s}, \bar{v}) is:

(x,y)=ay+b\ell(x, y) = a \cdot y + b

where:

a=vˉv,c=ysˉys,b=vcaysa = \bar{v} - v, \quad c = \bar{y_s} - y_s, \quad b = v \cdot c - a \cdot y_s

The quotient contribution is:

Q=αkcf(q)(aqy+b)cqx(ays+b)(line denominator)Q = \alpha^k \cdot \frac{c \cdot f(q) - (a \cdot q_y + b)}{c \cdot q_x - (a \cdot y_s + b) \cdot \text{(line denominator)}}

where α\alpha is the random coefficient and kk is the accumulation index.

8.3 The Line Denominator

The denominator for the line equation, evaluated at the query point:

denom=(Ps,xreqx)Ps,yim(Ps,yreqy)Ps,xim\text{denom} = (P_{s,x}^{\text{re}} - q_x) \cdot P_{s,y}^{\text{im}} - (P_{s,y}^{\text{re}} - q_y) \cdot P_{s,x}^{\text{im}}

where the superscripts denote the CM31 real and imaginary parts of the QM31 coordinates. This is a CM31 value, inverted once per query and reused across columns sharing the same evaluation point.

8.4 Periodicity Samples

Critical Implementation Detail

This was the second critical bug. Missing periodicity samples shifts all subsequent alpha powers, corrupting every quotient after the first shifted column.

Stwo's build_samples_with_randomness_and_periodicity adds an extra quotient contribution for every column with 2 or more mask offsets (i.e., columns that read both current and next rows).

In our circuit, columns 25 (merkle_current) and 29 (asp_current) have mask [0, 1]. For these columns, instead of the normal 1 or 2 contributions, Stwo emits 3 contributions per column:

  1. Periodicity sample — evaluates at the shifted point using the offset-1 OOD value
  2. Offset-0 sample — evaluates at the OOD point using the offset-0 OOD value
  3. Offset-1 sample — evaluates at the shifted point using the offset-1 OOD value

Each contribution consumes one power of α\alpha.

For our specific parameters, the periodicity point coincides with the shifted point. This is because:

  • period_generator = CanonicCoset(lifting_log_size).step().repeated_double(column_extended_log_size)
  • With column_extended_log_size = lifting_log_size = 7:
    • step = G_7 (subgroup generator of order 272^7)
    • After 7 doublings: `G_7^5 = G_7^128,but, but |G_7| = 128$, so this is the identity
  • Therefore periodicity_point = shifted_point + identity = shifted_point

The implementation:

if (col == 25 || col == 29) {
    // 1. Periodicity sample (at shifted point, offset-1 value)
    uint128 shiftedVal = p.traceOodValues[s.tOodIdx + 1]; // peek ahead
    s.acc = _addQ(s.acc, s.alphaPow, qVal, shiftedVal,
                  pts.shiftedY, ctx.qy, ctx.denomShiftedInv);
    s.alphaPow = qm31Mul(s.alphaPow, p.randomCoeff);

    // 2. Offset-0 sample (at OOD point)
    s.acc = _addQ(s.acc, s.alphaPow, qVal, p.traceOodValues[s.tOodIdx++],
                  pts.oodsY, ctx.qy, ctx.denomOodsInv);
    s.alphaPow = qm31Mul(s.alphaPow, p.randomCoeff);

    // 3. Offset-1 sample (at shifted point)
    s.acc = _addQ(s.acc, s.alphaPow, qVal, p.traceOodValues[s.tOodIdx++],
                  pts.shiftedY, ctx.qy, ctx.denomShiftedInv);
    s.alphaPow = qm31Mul(s.alphaPow, p.randomCoeff);
}

The total number of alpha powers consumed is: 37 columns × 1 (single-offset) + 2 columns × 3 (shifted) + 8 composition columns × 1 = 51 contributions.

See DEEP Quotients Deep Dive for the full derivation.

9. Merkle Verification

9.1 Leaf Hashing

Leaves are column values hashed as flat little-endian bytes:

leaf_hash = keccak256(col0_LE4 ‖ col1_LE4 ‖ ... ‖ colN_LE4)

For the trace tree: 46 columns × 4 bytes = 184-byte leaf. For the composition tree: 8 columns × 4 bytes = 32-byte leaf. For FRI trees: QM31 values as 16-byte leaves.

9.2 Bottom-Up Walk

The verifier performs a bottom-up Merkle tree walk:

  1. Start with leaf hashes at queried positions
  2. At each level, pair siblings:
    • If both siblings are in the query set, use both directly
    • Otherwise, fetch the missing sibling from the witness array
  3. Compute parent: keccak256(left ‖ right) (32 + 32 = 64 bytes)
  4. Advance to the next level
  5. At the root, check against the committed root

The tree height equals the domain log size (e.g., 7 for the main trace tree at LIFTING_LOG_SIZE = 7).

10. Proof Serialization

10.1 Format

All values are little-endian, read sequentially with no padding or alignment:

[Commitments]
  n_trees: u32
  per tree: hash (32 bytes)

[Sampled Values]
  n_trees: u32
  per tree:
    n_cols: u32
    per col:
      n_samples: u32
      per sample: qm31 (16 bytes = 4 × m31_LE4)

[Decommitments]
  n_trees: u32
  per tree:
    n_hashes: u32
    per hash: 32 bytes

[Queried Values]
  n_trees: u32
  per tree:
    n_cols: u32
    per col:
      n_vals: u32
      per val: m31 (4 bytes LE)

[Proof of Work]
  nonce: u64 (8 bytes LE)

[FRI Proof]
  First layer:
    commitment: 32 bytes
    n_witness: u32, per witness: qm31 (16 bytes)
    n_decomm: u32, per decomm: 32 bytes
  Inner layers:
    n_layers: u32
    per layer: same as first layer
  Last layer:
    n_coeffs: u32, per coeff: qm31 (16 bytes)

10.2 Proof Size

For our parameters (3 queries, 46+8 columns, 6 FRI layers):

SectionApproximate Size
Commitments100 bytes
Sampled values800 bytes
Decommitments~800 bytes
Queried values~700 bytes
PoW nonce8 bytes
FRI proof~2,500 bytes
Total~5 KB

11. Testing and Cross-Validation

11.1 Methodology

Verification was performed through incremental cross-validation: each component was tested independently against Rust-computed reference values before integration.

The Rust prover includes a comprehensive debug transcript test (test_serialize_debug_transcript) that outputs intermediate values at every verification step. These values were hardcoded into Solidity tests for bit-exact comparison.

11.2 Test Suite

72 tests across 6 test suites:

SuiteTestsWhat It Covers
M31Lib21Field arithmetic: M31, CM31, QM31 add/sub/mul/inv, circle point operations
KeccakChannel7Mix/draw operations, PoW verification, rejection sampling
CircleDomain8Bit reversal, subgroup generators, circle/line domain points
MerkleVerifier6Leaf hashing (M31, QM31), tree verification
FriVerifier4Circle fold, line fold, last layer polynomial eval
CircleStarkVerifier13Phase-by-phase verification, OOD quotients, full proof
VerifierHarness13Integration: 7-phase debug harness, query context, quotient cross-validation

11.3 The Full Proof Test

The definitive test is test_verifyRealProof: a complete Stwo proof generated by the Rust prover is hardcoded as hex bytes and verified end-to-end in Solidity. This exercises every code path: parsing, Fiat-Shamir, Merkle verification, quotient computation, and FRI.

forge test --match-test test_verifyRealProof
[PASS] test_verifyRealProof() (gas: 17,183,996)

11.4 Cross-Validation Process

For each verification phase, we followed this process:

  1. Add console.log statements to the Rust prover's verification path
  2. Run the Rust verifier, capture intermediate values
  3. Hardcode those values in Solidity tests
  4. Compare Solidity outputs against Rust outputs
  5. When mismatches occur, bisect to find the divergence point

This process uncovered the two critical bugs:

  • QM31 conjugation: The _conjQ function was negating imaginary parts of each CM31 independently instead of negating the entire second CM31 component
  • Periodicity samples: Missing the extra quotient contribution for columns with 2+ mask offsets

Both bugs produced outputs that were close to correct (most columns matched) but diverged at specific points, requiring careful intermediate-value comparison to isolate.

12. Gas Analysis

12.1 Breakdown

Total verification (withdrawal circuit, 46 columns): ~20M gas

The transfer circuit (57 columns) uses the same FRI parameters and trace dimensions, so verification gas is expected to be similar (~20M gas). The additional 11 trace columns increase OOD quotient computation slightly but FRI (the dominant cost) remains unchanged.

PhaseEstimated Gas%
Proof parsing (Phases 1-3)~2M12%
PoW verification~0.1M<1%
Merkle decommitments (Phase 5)~3M17%
DEEP quotients (Phase 6)~5M29%
FRI verification (Phase 7)~7M41%

FRI dominates because it involves multiple rounds of:

  • Bit reversal and domain point computation (expensive circle point scalar multiplication)
  • QM31 folding operations
  • Merkle verification at each layer

12.2 Optimization Opportunities

The current implementation prioritizes correctness over gas efficiency. Potential optimizations:

  1. Precomputed twiddle tables: Store domain points instead of computing via scalar multiplication
  2. Assembly field arithmetic: Hand-optimized M31/QM31 operations in Yul
  3. Batch Merkle verification: Combine multiple tree walks
  4. EIP-4844 blob data: Move proof bytes to blob space (~1 gas/byte vs ~16 gas/byte calldata)
  5. Larger FRI fold step: Fold 2+ layers at once (requires more witness data but fewer Merkle verifications)

With these optimizations, verification gas could potentially drop to 5–8M.

12.3 Comparison

SystemProof TypeGasPQ-SecureTrusted Setup
Groth16 (BN254)Pairing SNARK~230KNoYes
PLONK (BN254)Pairing SNARK~300KNoUniversal
Circle STARK (ours)Hash-based~20MYesNo
Stone STARKHash-based~5MYesNo

Our verifier is more expensive than pairing-based systems, but is the only option that provides both post-quantum security and no trusted setup. The gas premium is the cost of quantum resistance.

13. Security Analysis

13.1 Soundness

The STARK achieves conjectured ~100-bit soundness from:

  • FRI soundness: nqueriesF3231\frac{n_{\text{queries}}}{|\mathbb{F}|} \approx \frac{3}{2^{31}} per query, but boosted by the extension field (QM31 has  2124~2^{124} elements)
  • PoW: 10 additional bits
  • DEEP quotient: out-of-domain sampling over QM31

13.2 Post-Quantum Security

The proof system relies solely on:

  • Collision resistance of Keccak-256 (~128 bits, quantum: ~85 bits via Grover)
  • Algebraic properties of M31/QM31 (no DLP assumption)

No known quantum algorithm provides a super-polynomial speedup against collision-resistant hash functions, making the system plausibly post-quantum secure.

13.3 Fiat-Shamir Security

The Keccak channel produces challenges that are computationally indistinguishable from random, assuming Keccak-256 behaves as a random oracle. The channel is non-adaptive: all challenges are derived deterministically from the proof transcript.

14. Conclusion

We have demonstrated that native Circle STARK verification on Ethereum L1 is feasible. The verifier correctly processes proofs from a modified Stwo prover, achieving end-to-end verification at ~20M gas. While more expensive than pairing-based alternatives, this is the only production-ready path to post-quantum zero-knowledge proofs on Ethereum today.

The verifier is parameterized via constructor immutables, supporting multiple circuit configurations — currently the withdrawal circuit (46 columns, 5 constraints) and the transfer circuit (57 columns, 6 constraints) — with zero runtime gas overhead. This design allows new circuits to be deployed as additional instances of the same verified contract code.

The implementation required solving several non-obvious challenges unique to circle-group STARKs:

  • QM31 complex conjugation semantics differ from naive element-wise conjugation
  • Periodicity samples in DEEP quotients consume additional alpha powers for shifted columns
  • Circle domain point generation uses counter-intuitive index arithmetic (half_odds(n) with n+2 initial index)
  • FRI line domains after repeated doubling require careful tracking of initial/step indices

For the broader context of why post-quantum security matters and how this verifier fits into UPP's migration from SNARKs, see From SNARKs to STARKs.

All source code is available in the upp-sdk/src/contracts/src/stark/ and upp-sdk/stwo-prover/ directories.

References

  1. Haböck, U., Janeček, D., Nevo, A. (2024). Circle STARKs. IACR ePrint 2024/278.
  2. StarkWare. Stwo Prover. https://github.com/starkware-libs/stwo
  3. Ben-Sasson, E., et al. (2018). Scalable, transparent, and post-quantum secure computational integrity. IACR ePrint 2018/046.
  4. Ben-Sasson, E., Bentov, I., Horesh, Y., Riabzev, M. (2018). Fast Reed-Solomon Interactive Oracle Proofs of Proximity. IACR ePrint 2017/134.

On this page