Experiment 181: Single-stream long-payload hash

Date: 2026-06-17

Status: Rejected

Direction: long-text-stream-hashing

Benchmark Run: focused harness only (benchmark/experiments/single_stream_long_payload_hash.dart); no release-suite run because the native candidate was reverted.

Problem

Exp 110 accepted the 8-byte FNV byte-stream fold after the new 4 KB long-text unchanged-fanout row showed a clear win over byte-at-a-time hashing. Exp 173 then tested a 16-byte unrolled fold against a 32 KB long-text workload and rejected it: the candidate measured +4.5% and +12.1% versus the 8-byte body across order-flipped passes.
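The difference between byte-at-a-time hashing and the 8-byte fold is easiest to see side by side. This is a minimal sketch, not the resqlite code. It assumes the standard 64-bit FNV-1a offset basis and prime, because the values of RESQLITE_FNV_PRIME and RESQLITE_FNV_MASK are not shown in this log, and it omits the mask. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed constants: standard 64-bit FNV-1a offset basis and prime.
 * resqlite's RESQLITE_FNV_PRIME / RESQLITE_FNV_MASK may differ. */
#define SKETCH_FNV_OFFSET 0xcbf29ce484222325ULL
#define SKETCH_FNV_PRIME  0x100000001b3ULL

/* Byte-at-a-time FNV-1a: one xor/multiply step per input byte. */
uint64_t fnv_bytes_1x(uint64_t h, const uint8_t *b, size_t len) {
    for (size_t i = 0; i < len; i++) {
        h ^= b[i];
        h *= SKETCH_FNV_PRIME;
    }
    return h;
}

/* Hypothetical 8-byte word fold: one xor/multiply step per 8-byte word,
 * then bytes one at a time for the tail. memcpy keeps the load
 * alignment-safe; the word value depends on host endianness. */
uint64_t fnv_fold_8(uint64_t h, const uint8_t *b, size_t len) {
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, b + i, 8);
        h ^= w;
        h *= SKETCH_FNV_PRIME;
    }
    for (; i < len; i++) {
        h ^= b[i];
        h *= SKETCH_FNV_PRIME;
    }
    return h;
}
```

On a 4 KB payload, the byte loop performs 4096 dependent multiplies and the word fold performs 512. The two functions agree on inputs shorter than 8 bytes, where both take the byte path, but they hash longer inputs differently.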

The remaining uncertainty was whether exp 173 hid hash-loop overhead by parallelizing eight unchanged streams across the normal reader pool. The signal map left one explicit candidate: build a single-stream long-payload benchmark that bypasses reader-pool parallelism, then use it to decide whether the 16-byte fold has any isolated public-API workload signal.

Hypothesis

If the pool-of-4 fanout was masking the loop-control cost, a one-reader, one-unchanged-stream workload should make the byte-stream fold a larger share of the measured time. Under that shape, the exp 173 16-byte fold might finally show a stable win over the exp 110 8-byte body.

Acceptance criterion: the 16-byte fold must improve the focused single-stream harness in both passes of an order-flipped A/B pair. Reject if the medians overlap or the effect changes sign, since either outcome means the public stream path still cannot see the loop unroll.
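The acceptance rule can be written as a small check. This sketch rests on one interpretation: "medians overlap" is read as the range of candidate medians across passes overlapping the range of baseline medians. `ab_pass`, `pct_delta`, and `accept_candidate` are hypothetical names.

```c
#include <stdbool.h>

/* One order-flipped pass: median timing (ms) for each side. */
typedef struct {
    double baseline_ms;
    double candidate_ms;
} ab_pass;

/* Percent change of candidate vs baseline; negative means the candidate is faster. */
double pct_delta(ab_pass p) {
    return (p.candidate_ms - p.baseline_ms) / p.baseline_ms * 100.0;
}

/* Accept only if every pass improves (no sign change) and the slowest
 * candidate median still beats the fastest baseline median (no overlap).
 * Requires n >= 1. */
bool accept_candidate(const ab_pass *passes, int n) {
    double worst_candidate = passes[0].candidate_ms;
    double best_baseline = passes[0].baseline_ms;
    for (int i = 0; i < n; i++) {
        if (pct_delta(passes[i]) >= 0.0) {
            return false; /* no improvement, or the sign flipped */
        }
        if (passes[i].candidate_ms > worst_candidate) worst_candidate = passes[i].candidate_ms;
        if (passes[i].baseline_ms < best_baseline) best_baseline = passes[i].baseline_ms;
    }
    return worst_candidate < best_baseline;
}
```

Applied to the medians in Results (2.771 vs 2.763 ms, then 2.777 vs 2.792 ms), the second pass has a positive delta, so the check rejects.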

Approach

Added benchmark/experiments/single_stream_long_payload_hash.dart.

The harness uses internal runtime pieces so it can force a one-reader stream engine without changing the public API:

64 KB BLOB, about 8 MB hashed serially per invalidation;

second emission, which can only happen after the one reader finishes the long unchanged hash pass.
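The workload size sets how much loop control the unroll can remove. This arithmetic sketch assumes the "about 8 MB" is 128 BLOBs of 64 KiB (8 MiB). The BLOB count is not stated in this log, and the names are hypothetical.

```c
#include <stddef.h>

/* Assumed workload shape: 128 BLOBs x 64 KiB = 8 MiB per invalidation. */
#define SKETCH_BLOB_BYTES ((size_t)64 * 1024)
#define SKETCH_BLOB_COUNT ((size_t)128)

size_t bytes_per_invalidation(void) {
    return SKETCH_BLOB_BYTES * SKETCH_BLOB_COUNT;
}

/* Loop iterations for a fold consuming `width` bytes per iteration
 * (64 KiB is a multiple of 16, so neither body leaves a tail). */
size_t loop_iterations(size_t bytes, size_t width) {
    return bytes / width;
}

/* Serial xor/multiply steps: one per 8-byte word for BOTH bodies,
 * because the 16-byte body still does two steps per iteration. */
size_t multiply_steps(size_t bytes) {
    return bytes / 8;
}
```

Under that assumption, the unroll removes about half a million loop iterations per invalidation and leaves all of the roughly one million dependent multiplies in place. One plausible reading of a null result is that this multiply chain, not loop control, bounds the fold.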

Then re-tested the exp 173 16-byte candidate in fnv_combine_bytes:

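```c
/* Exp 173 candidate: fold two 8-byte words per loop iteration. */
for (; i + 16 <= len; i += 16) {
  uint64_t w0, w1;
  memcpy(&w0, b + i, 8);      /* alignment-safe load of the first word */
  memcpy(&w1, b + i + 8, 8);  /* and of the second word */
  h ^= w0;
  h = (h * RESQLITE_FNV_PRIME) & RESQLITE_FNV_MASK;
  h ^= w1;
  h = (h * RESQLITE_FNV_PRIME) & RESQLITE_FNV_MASK;
}
```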

The candidate preserves the same serial xor/multiply sequence as the 8-byte body; it only halves loop-control overhead. Native code was reverted after the measurement.
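The claim that the unroll changes loop control but not the hash can be checked directly. This sketch makes three assumptions:

- The log shows only the 16-byte loop, so the sketch assumes the existing 8-byte loop and a byte-at-a-time tail still run after it.
- The prime and mask are stand-ins for the resqlite values.
- `fold8` and `fold16` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for RESQLITE_FNV_PRIME / RESQLITE_FNV_MASK (real values not shown). */
#define SKETCH_PRIME 0x100000001b3ULL
#define SKETCH_MASK  0xFFFFFFFFFFFFFFFFULL

/* Assumed tail handling: remaining bytes one at a time. */
static uint64_t fold_tail(uint64_t h, const uint8_t *b, size_t i, size_t len) {
    for (; i < len; i++) {
        h ^= b[i];
        h = (h * SKETCH_PRIME) & SKETCH_MASK;
    }
    return h;
}

/* One 8-byte word per iteration (the exp 110 shape). */
uint64_t fold8(uint64_t h, const uint8_t *b, size_t len) {
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, b + i, 8);
        h ^= w;
        h = (h * SKETCH_PRIME) & SKETCH_MASK;
    }
    return fold_tail(h, b, i, len);
}

/* Two words per iteration (the exp 173 shape), then at most one leftover word. */
uint64_t fold16(uint64_t h, const uint8_t *b, size_t len) {
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        uint64_t w0, w1;
        memcpy(&w0, b + i, 8);
        memcpy(&w1, b + i + 8, 8);
        h ^= w0;
        h = (h * SKETCH_PRIME) & SKETCH_MASK;
        h ^= w1;
        h = (h * SKETCH_PRIME) & SKETCH_MASK;
    }
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, b + i, 8);
        h ^= w;
        h = (h * SKETCH_PRIME) & SKETCH_MASK;
    }
    return fold_tail(h, b, i, len);
}
```

Both bodies apply identical steps to the same words in the same order. The outputs therefore match for every length and any mask value, and the A/B comparison isolates loop control alone.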

Results

Focused harness, 2 warmup + 9 measured rounds per side.

| Pass | Order | Baseline 8-byte median | Candidate 16-byte median | Delta |
|---|---|---|---|---|
| 1 | baseline first | 2.771 ms | 2.763 ms | -0.3% |
| 2 | candidate first | 2.777 ms | 2.792 ms | +0.5% |

Measured ranges:

| Side | Median | p90 | Min | Max |
|---|---|---|---|---|
| Baseline, pass 1 | 2.771 ms | 3.150 ms | 2.636 ms | 3.150 ms |
| Candidate, pass 1 | 2.763 ms | 3.066 ms | 2.659 ms | 3.066 ms |
| Candidate, pass 2 | 2.792 ms | 3.015 ms | 2.642 ms | 3.015 ms |
| Baseline, pass 2 | 2.777 ms | 2.974 ms | 2.655 ms | 2.974 ms |
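In every row, p90 equals max. A nearest-rank percentile over 9 measured rounds produces exactly that: ceil(0.9 × 9) = 9, the largest sample. Whether the harness actually uses nearest-rank is an assumption. The hypothetical `nearest_rank` below shows the arithmetic.

```c
#include <stddef.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile for integer percent p in 1..100: sort ascending,
 * then return the ceil(p * n / 100)-th smallest sample. Sorts in place. */
double nearest_rank(double *samples, size_t n, unsigned p) {
    qsort(samples, n, sizeof samples[0], cmp_double);
    size_t rank = (p * n + 99) / 100; /* integer ceil(p * n / 100) */
    if (rank < 1) rank = 1;
    return samples[rank - 1];
}
```

Read this way, each p90 value is simply the slowest of the 9 rounds, so the p90 column adds nothing beyond the max.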

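The candidate is indistinguishable from the baseline: the two deltas (-0.3% and +0.5%) change sign between passes, and every side's min-to-max range overlaps the others. Removing reader-pool parallelism did not make the 16-byte loop body visible.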

Decision

Rejected. The single-stream workload tested the last open candidate and refuted the premise that reader-pool parallelism was hiding a mergeable 16-byte FNV win.

The 8-byte fold from exp 110 remains the right implementation. The 16-byte body was reverted from native/resqlite.c; the harness is retained so a future runner can recheck this path if a production profile makes long-payload unchanged hashing hot again.

Future Notes

single-stream public stream workloads. Exp 173 and exp 181 now both show no stable win for the 16-byte body.

direct resqlite_query_hash microbenchmark or production profile that splits SQLite value access, hashing, reader dispatch, and reply delivery.

measurement tool, not a proposed public Database.open option.
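A direct microbenchmark that separates hashing from SQLite value access and dispatch could start from something like the following. The signature of resqlite_query_hash is not shown in this log, so the sketch times a hypothetical standalone `bench_fold8` with stand-in FNV constants rather than the real entry point.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-ins; not resqlite's constants or entry point. */
#define BENCH_FNV_OFFSET 0xcbf29ce484222325ULL
#define BENCH_FNV_PRIME  0x100000001b3ULL

/* The 8-byte fold under test, with a byte-at-a-time tail. */
uint64_t bench_fold8(uint64_t h, const uint8_t *b, size_t len) {
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, b + i, 8);
        h ^= w;
        h *= BENCH_FNV_PRIME;
    }
    for (; i < len; i++) {
        h ^= b[i];
        h *= BENCH_FNV_PRIME;
    }
    return h;
}

/* Hashes buf `rounds` times, chaining each pass on the previous hash, and
 * returns average CPU seconds per pass (clock() measures processor time).
 * The final hash is stored in *sink so the loop is not dead code.
 * rounds must be >= 1. */
double time_fold8(const uint8_t *buf, size_t len, int rounds, uint64_t *sink) {
    uint64_t h = BENCH_FNV_OFFSET;
    clock_t start = clock();
    for (int r = 0; r < rounds; r++) {
        h = bench_fold8(h, buf, len);
    }
    clock_t end = clock();
    *sink = h;
    return (double)(end - start) / CLOCKS_PER_SEC / rounds;
}
```

Swapping a 16-byte body into `bench_fold8` and alternating run order would reproduce the A/B shape without any reader dispatch or reply delivery in the measurement.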

Validation