Circle STARK Verifier: Stwo to Solidity
A native Solidity verifier for Stwo Circle STARKs — the first post-quantum STARK verifier on Ethereum L1
Status
All 72 tests pass. The verifier is parameterized and supports multiple circuits: withdrawal (46 columns, ~20M gas) and transfer (57 columns). See also: From SNARKs to STARKs for the post-quantum migration context.
Abstract
We present a native Solidity implementation of a Circle STARK verifier compatible with the Stwo prover (v2.1.0). Circle STARKs operate over the Mersenne-31 prime field () using the circle group as the evaluation domain, replacing the traditional multiplicative-group FFT with a circle-group analogue. Our verifier is the first complete Circle STARK verifier deployed as native EVM bytecode on Ethereum L1.
The system consists of two components:
- A modified Stwo prover (Rust) with a custom Keccak-256 Merkle channel replacing Blake2s, and a deterministic proof serialization format.
- A Solidity verifier (7 libraries, ~1,900 lines) implementing M31/CM31/QM31 field arithmetic, circle domain evaluation, Fiat-Shamir transcript replay, Merkle decommitment, DEEP quotient accumulation, and Circle FRI verification.
The verifier is parameterized via constructor immutables, supporting multiple circuit configurations with zero runtime gas overhead. Currently deployed for the UPP withdrawal circuit (46 trace columns, 5 constraints) and the transfer circuit (57 trace columns, 6 constraints), both using 64 rows (), degree-2 constraint bound, and FRI parameters (blowup factor 2, 3 queries, 10-bit proof of work).
1. Introduction
1.1 Motivation
The Universal Private Pool (UPP) requires a zero-knowledge proof for every withdrawal: the user must demonstrate knowledge of a valid note in the pool's Merkle tree, conservation of balances, and (optionally) compliance with an Association Set Provider, without revealing which note is being spent.
Current production ZK systems (Groth16, PLONK) rely on elliptic-curve pairings over BN254 or BLS12-381. These are efficient today but are not post-quantum secure — Shor's algorithm on a sufficiently large quantum computer breaks the discrete logarithm problem underlying these curves.
STARKs (Scalable Transparent Arguments of Knowledge) rely only on hash functions for their security, making them plausibly quantum-resistant. Among STARK implementations, Stwo (StarkWare's next-generation prover) achieves remarkable performance by using:
- The Mersenne-31 prime (), which admits extremely fast arithmetic via bit shifts
- The circle group over M31, which provides a smooth subgroup structure of order — enabling NTT-like transforms without requiring a large multiplicative subgroup
1.2 The Challenge
No native Solidity verifier for Stwo's Circle STARK format existed. StarkNet avoids this problem entirely by wrapping Stwo proofs through Stone (a different STARK system with an existing verifier). Building a native Circle STARK verifier in Solidity requires implementing:
| Component | Difficulty | Why |
|---|---|---|
| M31 field arithmetic + extensions | Moderate | Three-level tower: M31 → CM31 → QM31 |
| Circle domain evaluation | Hard | Non-standard domain structure unlike multiplicative groups |
| Circle FRI protocol | Hard | Twin-coset folding, circle-to-line projection |
| DEEP quotient computation | Hard | Line equations through QM31 conjugate pairs, periodicity samples |
| Fiat-Shamir transcript | Moderate | Must reproduce Stwo's channel operations exactly |
| Proof serialization | Moderate | Bit-exact encoding between Rust and Solidity |
1.3 Key Design Decision: Keccak over Blake2s
Stwo's default channel uses Blake2s for both Merkle hashing and Fiat-Shamir. Ethereum provides no Blake2s precompile (EIP-152 gives Blake2b, not Blake2s). A pure-Solidity Blake2s would cost ~3,000–5,000 gas per compression — with 3 queries × multiple Merkle paths × FRI layers, this would add millions of gas.
Instead, we implemented a custom Keccak-256 channel in the Rust prover. Stwo's Channel and MerkleChannel traits are generic, allowing us to substitute Keccak-256 while preserving all soundness guarantees. The Solidity verifier then uses the native keccak256 opcode at ~30 gas per hash — a 100× reduction.
This is sound because the channel is purely a Fiat-Shamir construction: any collision-resistant hash function suffices. Keccak-256 provides 128-bit collision resistance, matching the security level of the underlying STARK.
1.4 Results
| Metric | Value |
|---|---|
| Solidity code | ~1,900 lines across 7 libraries |
| Supported circuits | Withdraw (46 cols, 5 constraints), Transfer (57 cols, 6 constraints) |
| Verification gas | ~20M (withdraw) |
| Deployment size | 19,576 bytes |
| Test coverage | 72 tests, all passing |
| Proof size | ~5 KB calldata |
| Rust prover time | ~2ms (release build) |
| Rust verifier time | ~97μs |
2. Mathematical Background
2.1 The M31 Field Tower
The base field is the Mersenne prime where .
Mersenne reduction is remarkably simple: for a product , we compute as , with at most one subtraction of . This fits naturally in EVM uint32 arithmetic.
The extension tower:
CM31 (Complex M31): . Elements are with . This is the standard Gaussian integer construction modulo . We pack CM31 as uint64: low 32 bits = real, high 32 bits = imaginary.
QM31 (Quartic M31): . Elements are where CM31 and . This is Stwo's SecureField, used for all random challenges. We pack QM31 as uint128: low 64 bits = first CM31, high 64 bits = second CM31.
The irreducible polynomial is carefully chosen so that is a non-square in CM31, guaranteeing the extension is irreducible.
2.2 The M31 Circle Group
The unit circle over :
forms a group under the operation:
This is isomorphic to complex multiplication on the unit circle: if we identify with , then the group operation is multiplication in restricted to norm-1 elements.
For , the circle group has order , which is a perfect power of 2. This means the group contains subgroups of every order for — exactly the smooth structure needed for FFT-like transforms.
The generator is , with . The subgroup generator of order is:
represented as a CirclePointIndex (the scalar exponent).
2.3 Circle Domains and Cosets
Stwo uses canonical cosets as evaluation domains. For a trace of rows, the evaluation domain is:
The circle domain is constructed from a half-coset:
half_odds(n-1) = Coset::new(initial: G_{n+1}, step: G_{n-1}, log_size: n-1)Note the index arithmetic: half_odds(n) uses subgroupGen(n+2) as the initial point and subgroupGen(n) as the step — the "+2" offset is critical and a common source of bugs.
The full circle domain of size consists of:
- First half (): points from the half-coset
- Second half (): their conjugates (negations on the circle: )
Points are accessed in bit-reversed order for efficient FRI folding.
2.4 QM31 Complex Conjugation
Critical Implementation Detail
This was one of the two critical bugs found during the port. Getting this wrong produces correct-looking intermediate values that diverge subtly.
Stwo defines ComplexConjugate for QM31 as:
// In Stwo's source:
impl ComplexConjugate for QM31 {
fn complex_conjugate(&self) -> Self {
Self(self.0, -self.1) // Negate entire second CM31
}
}This means for a QM31 element packed as :
This is not element-wise CM31 conjugation (which would give ). It negates the entire second CM31 component as a unit. This distinction is critical for the DEEP quotient line equations.
3. System Architecture
3.1 Overview

3.2 Proof Generation (Rust)
The Rust prover (upp-sdk/stwo-prover/) consists of:
| Module | Lines | Purpose |
|---|---|---|
withdraw/air.rs | 190 | Withdrawal AIR constraints (5 constraints, 46 columns) |
withdraw/prover.rs | ~500 | Withdrawal trace generation, commitment, proof orchestration |
withdraw/types.rs | ~60 | Withdrawal input/output types |
transfer/air.rs | ~210 | Transfer AIR constraints (6 constraints, 57 columns) |
transfer/prover.rs | ~465 | Transfer trace generation, commitment, proof orchestration |
transfer/types.rs | ~130 | Transfer input/output types |
keccak_channel.rs | ~470 | Custom Keccak-256 channel + Merkle hasher |
serialize.rs | ~600 | Proof → flat bytes for Solidity |
3.3 Verification (Solidity)
The Solidity verifier (upp-sdk/src/contracts/src/stark/) consists of:
| Library | Lines | Purpose |
|---|---|---|
M31Lib.sol | 311 | M31, CM31, QM31 arithmetic + circle point operations |
KeccakChannel.sol | 213 | Fiat-Shamir channel: mix/draw, PoW verification |
CircleDomain.sol | 188 | Circle domain point evaluation, bit reversal, FRI domains |
MerkleVerifier.sol | 161 | Keccak-256 Merkle tree decommitment verification |
FriVerifier.sol | 222 | FRI folding: circle→line, line→line, last layer |
OodQuotients.sol | 221 | DEEP quotient accumulation with periodicity (parameterized multi-mask columns) |
CircleStarkVerifier.sol | 497 | Main orchestrator: 7-phase verification flow (parameterized via constructor immutables) |
3.3.1 Parameterized Verifier
The verifier is parameterized via Solidity immutable constructor arguments, allowing the same contract code to support different circuit configurations:
uint32 public immutable TRACE_WIDTH; // 46 (withdraw) or 57 (transfer)
uint32 public immutable MULTI_MASK_COL_0; // 25 (state Merkle chain)
uint32 public immutable MULTI_MASK_COL_1; // 29 (ASP Merkle chain)
uint32 public immutable N_TRACE_OOD_VALUES; // TRACE_WIDTH - 2 + 4The deploy script creates separate instances for each circuit type:
CircleStarkVerifier withdrawVerifier = new CircleStarkVerifier(46, 25, 29);
CircleStarkVerifier transferVerifier = new CircleStarkVerifier(57, 25, 29);Since Solidity inlines immutable values into the deployed bytecode, there is zero runtime gas overhead compared to hardcoded constants. Each instance has its own bytecode with the parameters baked in at deploy time.
3.4 Verification Flow
The verify() function executes 7 sequential phases:
Phase 1: Parse commitments + replay Fiat-Shamir transcript
→ Extract 3 Merkle roots, OOD point, sampled values
→ Draw FRI random coefficient (α)
Phase 2: Parse decommitments + queried values + PoW nonce
→ Read Merkle witnesses, trace/composition column values
Phase 3: Parse FRI proof
→ Read layer commitments, witness evaluations, decommitments
→ Mix each FRI commitment into channel, draw fold alphas
→ Read last-layer polynomial coefficients
Phase 3.5: Verify proof of work
→ Check nonce produces sufficient trailing zeros
Phase 4: Draw query positions
→ Sample N_QUERIES positions from channel via rejection sampling
→ Sort positions (insertion sort, small array)
Phase 5: Verify Merkle decommitments
→ Reconstruct leaf hashes from queried column values
→ Walk Merkle tree bottom-up against committed roots
Phase 6: Compute DEEP quotients
→ Build OOD + shifted points on QM31 circle
→ Accumulate quotient contributions across all columns
→ Handle periodicity samples for shifted columns
Phase 7: Verify FRI
→ Circle-to-line fold (first layer)
→ Line-to-line fold (5 inner layers)
→ Polynomial evaluation check (last layer)4. The Withdrawal Circuit
4.1 Trace Layout
The withdrawal circuit uses 46 columns across 64 rows ():
| Columns | Name | Purpose |
|---|---|---|
| 0–7 | owner_secret[0..7] | Private spending key (8 × M31 limbs, 248 bits of entropy) |
| 8 | input_amount | Note amount being spent |
| 9 | input_blinding | Note randomness |
| 10 | input_origin | Note origin address |
| 11 | token | ERC-20 token address |
| 12 | leaf_index | Position in state Merkle tree |
| 13–16 | owner_hash[0..3] | Keccak hash of owner (4 × M31 limbs) |
| 17–20 | commitment[0..3] | Note commitment (4 × M31 limbs) |
| 21–24 | nullifier[0..3] | Spend nullifier (4 × M31 limbs) |
| 25 | merkle_current | State Merkle chain: current hash |
| 26 | merkle_sibling | State Merkle chain: sibling hash |
| 27 | merkle_index | State Merkle chain: path bit |
| 28 | merkle_result | State Merkle chain: computed parent |
| 29 | asp_current | ASP Merkle chain: current hash |
| 30 | asp_sibling | ASP Merkle chain: sibling hash |
| 31 | asp_index | ASP Merkle chain: path bit |
| 32 | asp_result | ASP Merkle chain: computed parent |
| 33 | withdraw_amount | Amount going to recipient |
| 34 | change_amount | Amount remaining as change note |
| 35–38 | change_owner_hash[0..3] | Change note owner |
| 39 | change_blinding | Change note randomness |
| 40–43 | change_commitment[0..3] | Change note commitment |
| 44 | is_ragequit | Flag: ragequit bypass |
| 45 | recipient | Withdrawal recipient address |
4.2 Constraint System
Five algebraic constraints, all degree ≤ 2, enforced at every row:
C1. Balance conservation:
C2. State Merkle chain continuity:
C3. ASP Merkle chain continuity:
C4. Ragequit origin check:
C5. Binary constraint:
Constraints C2 and C3 use mask offsets [0, 1] (reading both current and next row), which makes columns 25 and 29 "shifted" columns — a distinction that becomes critical during quotient computation.
4.3 Hash-and-Check Design
The circuit uses a hash-and-check pattern: Merkle paths and hash computations (Keccak) are precomputed outside the STARK trace and committed as witness values. The STARK constraints then verify the chain structure (result feeds into next level's current) rather than re-implementing hash functions inside the constraint system.
This is why 64 rows suffice for a 32-level Merkle tree: each row represents one level of the Merkle path, with the chain constraints (C2, C3) ensuring vertical consistency.
4A. The Transfer Circuit
The transfer circuit enables 1-input-2-output private transfers: spend one note and create two new notes that stay in the pool. No funds leave — amounts are hidden. Output notes can have different owners, enabling private payments to other users.
4A.1 Trace Layout
The transfer circuit uses 57 columns across 64 rows ():
| Columns | Name | Purpose |
|---|---|---|
| 0–32 | (same as withdrawal) | Input note (8-limb owner secret, scalars, hashes), state Merkle, ASP Merkle |
| 33 | output1_amount | Amount sent to recipient |
| 34–37 | output1_owner_hash[0..3] | Recipient's owner hash (4 × M31 limbs) |
| 38 | output1_blinding | Recipient note randomness |
| 39 | output1_origin | Recipient note origin address |
| 40–43 | output1_commitment[0..3] | Recipient note commitment (4 × M31 limbs) |
| 44 | output2_amount | Change amount back to sender |
| 45–48 | output2_owner_hash[0..3] | Sender's owner hash (change note) |
| 49 | output2_blinding | Change note randomness |
| 50 | output2_origin | Change note origin address |
| 51–54 | output2_commitment[0..3] | Change note commitment (4 × M31 limbs) |
| 55 | is_ragequit | Flag: ragequit bypass |
| 56 | reserved | Reserved (zero-padded) |
Columns 0–32 are identical to the withdrawal circuit. The withdrawal-specific columns (33–45: withdraw_amount, change fields, recipient address) are replaced with two full output note structures.
4A.2 Constraint System
Six algebraic constraints, all degree 2, enforced at every row:
C1. Balance conservation:
C2. State Merkle chain continuity: (same as withdrawal)
C3. ASP Merkle chain continuity: (same as withdrawal)
C4. Origin check (output 1):
C5. Origin check (output 2):
C6. Binary constraint:
The origin constraints (C4, C5) enforce that when ragequitting, both output notes must preserve the original depositor's origin address. This prevents ragequit transfers from laundering the origin. In normal mode (), output origins are unconstrained — the ASP allowlist provides the compliance guarantee instead.
Note that output owner hashes are not constrained by the circuit. They are baked into the output commitments (which are public inputs), but the circuit does not enforce that they match the sender. This is by design — it enables transfers to different recipients.
4A.3 Multi-Mask Columns
Same as the withdrawal circuit: columns 25 (state Merkle merkle_current) and 29 (ASP Merkle asp_current) use mask offsets [0, 1], making them "shifted" columns. Both circuits share these column indices, which are passed as constructor parameters to the verifier (MULTI_MASK_COL_0 = 25, MULTI_MASK_COL_1 = 29).
4A.4 Fiat-Shamir Seed
The transfer circuit binds public inputs via:
seed = keccak256(abi.encodePacked(
nullifier, // bytes32
stateRoot, // uint256
aspRoot, // uint256
token, // address (20 bytes)
outputCommitment1, // bytes32
outputCommitment2 // bytes32
))Unlike the withdrawal seed, there is no amount or recipient field — transfers don't reveal amounts and both outputs stay in the pool.
5. Field Arithmetic in Solidity
5.1 Bit-Packing Strategy
EVM words are 256 bits. We pack field elements to minimize memory operations:
| Type | Packed as | Layout |
|---|---|---|
| M31 | uint32 | Single 31-bit value (max ) |
| CM31 | uint64 | Low 32: real, High 32: imaginary |
| QM31 | uint128 | Low 64: first CM31, High 64: second CM31 |
This allows two QM31 elements per EVM word and makes pack/unpack operations simple bit shifts.
5.2 M31 Arithmetic
Addition: . Single addition + conditional subtraction.
Multiplication: Uses the Mersenne reduction identity. For and product :
with at most one subtraction of . This avoids expensive modular reduction.
Inversion: Via Fermat's little theorem: . Uses binary exponentiation (~31 multiplications).
5.3 CM31 Arithmetic
Standard complex multiplication:
Inversion uses the norm:
5.4 QM31 Arithmetic
Multiplication in where :
Inversion: where is a CM31 element inverted via cm31Inv.
See the companion page Field Arithmetic Deep Dive for exhaustive test vectors and gas benchmarks.
6. Fiat-Shamir: The Keccak Channel
6.1 Channel State
The channel maintains:
digest: a 32-byte Keccak state (initially zero)nDraws: counter for draw operations (reset on each mix)
6.2 Mix Operations
mixRoot(root): digest = keccak256(digest ‖ root). Used when the prover commits a Merkle root.
mixFelts(felts): digest = keccak256(digest ‖ felts[0]_LE ‖ felts[1]_LE ‖ ...). Each QM31 is encoded as 4 × M31, each M31 as 4 little-endian bytes.
mixU64(value): Used for the PoW nonce.
6.3 Draw Operations
drawU32s(): Produces 8 pseudorandom uint32 values:
hash = keccak256(digest ‖ nDraws_LE ‖ 0x00)
result = parse hash as 8 × LE uint32
nDraws += 1drawSecureFelt(): Draws 8 uint32s, checks all are in (rejection sampling with ~ retry probability), reduces to M31, packs first 4 as QM31.
6.4 Proof of Work
Stwo includes a proof-of-work check to increase soundness:
prefixedDigest = keccak256(0x12345678_LE ‖ [0; 12] ‖ digest ‖ nBits_LE)
result = keccak256(prefixedDigest ‖ nonce_LE)
check: trailing_zeros(result_as_LE_u128) ≥ nBitsOur circuit uses nBits = 10, adding ~10 bits of soundness.
7. Circle FRI Protocol
FRI (Fast Reed-Solomon Interactive Oracle Proof) is the core of STARK verification. Circle FRI adapts the standard FRI protocol to circle-group domains.
7.1 Overview
The FRI protocol proves that a committed polynomial has low degree by repeatedly folding it:
- First layer (Circle → Line): Fold the circle domain polynomial to a line domain
- Inner layers (Line → Line): Standard FRI folding on line domains
- Last layer: Check that the final polynomial equals the committed constant
For our parameters:
- Input: degree polynomial on circle domain of size (LIFTING_LOG_SIZE = 7)
- After circle fold: degree on line domain of size
- After 5 inner folds: degree on line domain of size
- Last layer: verify as degree-1 polynomial
7.2 Circle-to-Line Fold
Given evaluations and at conjugate pairs (where ):
In our implementation, the denominator is absorbed into the twiddle factor, and we compute:
f0 = fP + fNegP;
f1 = (fP - fNegP) * twiddleInv; // twiddleInv = 1/y
folded = f0 + alpha * f1;The twiddle factor is where is the y-coordinate of the circle domain point (in bit-reversed order).
7.3 Line-to-Line Fold
Standard FRI folding on univariate polynomials over line domains:
The twiddle factor is where is the line domain point.
7.4 FRI Domain Construction
Each FRI layer operates on a progressively smaller domain:
| Layer | Type | Domain | Log Size |
|---|---|---|---|
| First | Circle → Line | CanonicCoset(7).circle_domain() | 7 |
| Inner 0 | Line → Line | half_odds(6) | 6 |
| Inner 1 | Line → Line | half_odds(6).double() | 5 |
| Inner 2 | Line → Line | half_odds(6).double().double() | 4 |
| Inner 3 | Line → Line | ... | 3 |
| Inner 4 | Line → Line | ... | 2 |
| Last | Polynomial eval | ... | 1 |
The line domain half_odds(n) after doublings has:
initial = subgroupGen(n + 2 - k)step = subgroupGen(n - k)log_size = n - k
7.5 Pair Reconstruction
When the verifier has query evaluations at some positions but not their pairs, it fills in missing siblings from the FRI witness. Three cases:
- Both siblings queried: Both and are in the query set — use directly
- Only even position queried: Odd sibling comes from witness
- Only odd position queried: Even sibling comes from witness
After folding each pair, the verifier checks the Merkle decommitment of both siblings against the layer's committed root.
See FRI Protocol Deep Dive for worked examples.
8. DEEP Quotient Computation
The DEEP (Domain Extension for Eliminating Pretenders) technique is where the verifier checks that the committed polynomial agrees with the out-of-domain (OOD) evaluations. This is the most mathematically intricate part of the verifier.
8.1 OOD Point Generation
The verifier draws a random QM31 from the Fiat-Shamir channel and maps it to a point on the QM31 circle via stereographic projection:
This gives the OOD evaluation point .
The shifted point is , where is the subgroup generator of order .
8.2 Quotient Formula
For each column sample (column value at the query point, OOD evaluation, evaluation point), the verifier computes a quotient contribution using line equations through conjugate pairs.
Given:
- = OOD sample value (QM31)
- = sample point (QM31 circle point)
- = queried trace value at query point M31
The line through and is:
where:
The quotient contribution is:
where is the random coefficient and is the accumulation index.
8.3 The Line Denominator
The denominator for the line equation, evaluated at the query point:
where the superscripts denote the CM31 real and imaginary parts of the QM31 coordinates. This is a CM31 value, inverted once per query and reused across columns sharing the same evaluation point.
8.4 Periodicity Samples
Critical Implementation Detail
This was the second critical bug. Missing periodicity samples shifts all subsequent alpha powers, corrupting every quotient after the first shifted column.
Stwo's build_samples_with_randomness_and_periodicity adds an extra quotient contribution for every column with 2 or more mask offsets (i.e., columns that read both current and next rows).
In our circuit, columns 25 (merkle_current) and 29 (asp_current) have mask [0, 1]. For these columns, instead of the normal 1 or 2 contributions, Stwo emits 3 contributions per column:
- Periodicity sample — evaluates at the shifted point using the offset-1 OOD value
- Offset-0 sample — evaluates at the OOD point using the offset-0 OOD value
- Offset-1 sample — evaluates at the shifted point using the offset-1 OOD value
Each contribution consumes one power of .
For our specific parameters, the periodicity point coincides with the shifted point. This is because:
period_generator = CanonicCoset(lifting_log_size).step().repeated_double(column_extended_log_size)- With
column_extended_log_size = lifting_log_size = 7:step = G_7(subgroup generator of order )- After 7 doublings: `G_7^5 = G_7^128|G_7| = 128$, so this is the identity
- Therefore
periodicity_point = shifted_point + identity = shifted_point
The implementation:
if (col == 25 || col == 29) {
// 1. Periodicity sample (at shifted point, offset-1 value)
uint128 shiftedVal = p.traceOodValues[s.tOodIdx + 1]; // peek ahead
s.acc = _addQ(s.acc, s.alphaPow, qVal, shiftedVal,
pts.shiftedY, ctx.qy, ctx.denomShiftedInv);
s.alphaPow = qm31Mul(s.alphaPow, p.randomCoeff);
// 2. Offset-0 sample (at OOD point)
s.acc = _addQ(s.acc, s.alphaPow, qVal, p.traceOodValues[s.tOodIdx++],
pts.oodsY, ctx.qy, ctx.denomOodsInv);
s.alphaPow = qm31Mul(s.alphaPow, p.randomCoeff);
// 3. Offset-1 sample (at shifted point)
s.acc = _addQ(s.acc, s.alphaPow, qVal, p.traceOodValues[s.tOodIdx++],
pts.shiftedY, ctx.qy, ctx.denomShiftedInv);
s.alphaPow = qm31Mul(s.alphaPow, p.randomCoeff);
}The total number of alpha powers consumed is: 37 columns × 1 (single-offset) + 2 columns × 3 (shifted) + 8 composition columns × 1 = 51 contributions.
See DEEP Quotients Deep Dive for the full derivation.
9. Merkle Verification
9.1 Leaf Hashing
Leaves are column values hashed as flat little-endian bytes:
leaf_hash = keccak256(col0_LE4 ‖ col1_LE4 ‖ ... ‖ colN_LE4)For the trace tree: 46 columns × 4 bytes = 184-byte leaf. For the composition tree: 8 columns × 4 bytes = 32-byte leaf. For FRI trees: QM31 values as 16-byte leaves.
9.2 Bottom-Up Walk
The verifier performs a bottom-up Merkle tree walk:
- Start with leaf hashes at queried positions
- At each level, pair siblings:
- If both siblings are in the query set, use both directly
- Otherwise, fetch the missing sibling from the witness array
- Compute parent:
keccak256(left ‖ right)(32 + 32 = 64 bytes) - Advance to the next level
- At the root, check against the committed root
The tree height equals the domain log size (e.g., 7 for the main trace tree at LIFTING_LOG_SIZE = 7).
10. Proof Serialization
10.1 Format
All values are little-endian, read sequentially with no padding or alignment:
[Commitments]
n_trees: u32
per tree: hash (32 bytes)
[Sampled Values]
n_trees: u32
per tree:
n_cols: u32
per col:
n_samples: u32
per sample: qm31 (16 bytes = 4 × m31_LE4)
[Decommitments]
n_trees: u32
per tree:
n_hashes: u32
per hash: 32 bytes
[Queried Values]
n_trees: u32
per tree:
n_cols: u32
per col:
n_vals: u32
per val: m31 (4 bytes LE)
[Proof of Work]
nonce: u64 (8 bytes LE)
[FRI Proof]
First layer:
commitment: 32 bytes
n_witness: u32, per witness: qm31 (16 bytes)
n_decomm: u32, per decomm: 32 bytes
Inner layers:
n_layers: u32
per layer: same as first layer
Last layer:
n_coeffs: u32, per coeff: qm31 (16 bytes)10.2 Proof Size
For our parameters (3 queries, 46+8 columns, 6 FRI layers):
| Section | Approximate Size |
|---|---|
| Commitments | 100 bytes |
| Sampled values | 800 bytes |
| Decommitments | ~800 bytes |
| Queried values | ~700 bytes |
| PoW nonce | 8 bytes |
| FRI proof | ~2,500 bytes |
| Total | ~5 KB |
11. Testing and Cross-Validation
11.1 Methodology
Verification was performed through incremental cross-validation: each component was tested independently against Rust-computed reference values before integration.
The Rust prover includes a comprehensive debug transcript test (test_serialize_debug_transcript) that outputs intermediate values at every verification step. These values were hardcoded into Solidity tests for bit-exact comparison.
11.2 Test Suite
72 tests across 6 test suites:
| Suite | Tests | What It Covers |
|---|---|---|
| M31Lib | 21 | Field arithmetic: M31, CM31, QM31 add/sub/mul/inv, circle point operations |
| KeccakChannel | 7 | Mix/draw operations, PoW verification, rejection sampling |
| CircleDomain | 8 | Bit reversal, subgroup generators, circle/line domain points |
| MerkleVerifier | 6 | Leaf hashing (M31, QM31), tree verification |
| FriVerifier | 4 | Circle fold, line fold, last layer polynomial eval |
| CircleStarkVerifier | 13 | Phase-by-phase verification, OOD quotients, full proof |
| VerifierHarness | 13 | Integration: 7-phase debug harness, query context, quotient cross-validation |
11.3 The Full Proof Test
The definitive test is test_verifyRealProof: a complete Stwo proof generated by the Rust prover is hardcoded as hex bytes and verified end-to-end in Solidity. This exercises every code path: parsing, Fiat-Shamir, Merkle verification, quotient computation, and FRI.
forge test --match-test test_verifyRealProof
[PASS] test_verifyRealProof() (gas: 17,183,996)11.4 Cross-Validation Process
For each verification phase, we followed this process:
- Add
console.logstatements to the Rust prover's verification path - Run the Rust verifier, capture intermediate values
- Hardcode those values in Solidity tests
- Compare Solidity outputs against Rust outputs
- When mismatches occur, bisect to find the divergence point
This process uncovered the two critical bugs:
- QM31 conjugation: The
_conjQfunction was negating imaginary parts of each CM31 independently instead of negating the entire second CM31 component - Periodicity samples: Missing the extra quotient contribution for columns with 2+ mask offsets
Both bugs produced outputs that were close to correct (most columns matched) but diverged at specific points, requiring careful intermediate-value comparison to isolate.
12. Gas Analysis
12.1 Breakdown
Total verification (withdrawal circuit, 46 columns): ~20M gas
The transfer circuit (57 columns) uses the same FRI parameters and trace dimensions, so verification gas is expected to be similar (~20M gas). The additional 11 trace columns increase OOD quotient computation slightly but FRI (the dominant cost) remains unchanged.
| Phase | Estimated Gas | % |
|---|---|---|
| Proof parsing (Phases 1-3) | ~2M | 12% |
| PoW verification | ~0.1M | <1% |
| Merkle decommitments (Phase 5) | ~3M | 17% |
| DEEP quotients (Phase 6) | ~5M | 29% |
| FRI verification (Phase 7) | ~7M | 41% |
FRI dominates because it involves multiple rounds of:
- Bit reversal and domain point computation (expensive circle point scalar multiplication)
- QM31 folding operations
- Merkle verification at each layer
12.2 Optimization Opportunities
The current implementation prioritizes correctness over gas efficiency. Potential optimizations:
- Precomputed twiddle tables: Store domain points instead of computing via scalar multiplication
- Assembly field arithmetic: Hand-optimized M31/QM31 operations in Yul
- Batch Merkle verification: Combine multiple tree walks
- EIP-4844 blob data: Move proof bytes to blob space (~1 gas/byte vs ~16 gas/byte calldata)
- Larger FRI fold step: Fold 2+ layers at once (requires more witness data but fewer Merkle verifications)
With these optimizations, verification gas could potentially drop to 5–8M.
12.3 Comparison
| System | Proof Type | Gas | PQ-Secure | Trusted Setup |
|---|---|---|---|---|
| Groth16 (BN254) | Pairing SNARK | ~230K | No | Yes |
| PLONK (BN254) | Pairing SNARK | ~300K | No | Universal |
| Circle STARK (ours) | Hash-based | ~20M | Yes | No |
| Stone STARK | Hash-based | ~5M | Yes | No |
Our verifier is more expensive than pairing-based systems, but is the only option that provides both post-quantum security and no trusted setup. The gas premium is the cost of quantum resistance.
13. Security Analysis
13.1 Soundness
The STARK achieves conjectured ~100-bit soundness from:
- FRI soundness: per query, but boosted by the extension field (QM31 has elements)
- PoW: 10 additional bits
- DEEP quotient: out-of-domain sampling over QM31
13.2 Post-Quantum Security
The proof system relies solely on:
- Collision resistance of Keccak-256 (~128 bits, quantum: ~85 bits via Grover)
- Algebraic properties of M31/QM31 (no DLP assumption)
No known quantum algorithm provides a super-polynomial speedup against collision-resistant hash functions, making the system plausibly post-quantum secure.
13.3 Fiat-Shamir Security
The Keccak channel produces challenges that are computationally indistinguishable from random, assuming Keccak-256 behaves as a random oracle. The channel is non-adaptive: all challenges are derived deterministically from the proof transcript.
14. Conclusion
We have demonstrated that native Circle STARK verification on Ethereum L1 is feasible. The verifier correctly processes proofs from a modified Stwo prover, achieving end-to-end verification at ~20M gas. While more expensive than pairing-based alternatives, this is the only production-ready path to post-quantum zero-knowledge proofs on Ethereum today.
The verifier is parameterized via constructor immutables, supporting multiple circuit configurations — currently the withdrawal circuit (46 columns, 5 constraints) and the transfer circuit (57 columns, 6 constraints) — with zero runtime gas overhead. This design allows new circuits to be deployed as additional instances of the same verified contract code.
The implementation required solving several non-obvious challenges unique to circle-group STARKs:
- QM31 complex conjugation semantics differ from naive element-wise conjugation
- Periodicity samples in DEEP quotients consume additional alpha powers for shifted columns
- Circle domain point generation uses counter-intuitive index arithmetic (
half_odds(n)withn+2initial index) - FRI line domains after repeated doubling require careful tracking of initial/step indices
For the broader context of why post-quantum security matters and how this verifier fits into UPP's migration from SNARKs, see From SNARKs to STARKs.
All source code is available in the upp-sdk/src/contracts/src/stark/ and upp-sdk/stwo-prover/ directories.
References
- Haböck, U., Janeček, D., Nevo, A. (2024). Circle STARKs. IACR ePrint 2024/278.
- StarkWare. Stwo Prover. https://github.com/starkware-libs/stwo
- Ben-Sasson, E., et al. (2018). Scalable, transparent, and post-quantum secure computational integrity. IACR ePrint 2018/046.
- Ben-Sasson, E., Bentov, I., Horesh, Y., Riabzev, M. (2018). Fast Reed-Solomon Interactive Oracle Proofs of Proximity. IACR ePrint 2017/134.