Experiment 075: Native-buffered hash for selectIfChanged

Date: 2026-04-16

Status: Accepted

Problem

Stream re-queries via selectIfChanged go through the full Dart decode

path (allocate values list, decode every cell into a Dart String /

Uint8List, build RowSchema, return a ResultSet) then hash the values.

When the hash matches lastResultHash, all of that decoded material

is discarded — the stream emits nothing.

In dashboard apps with multiple streams on the same table, most writes

change at most one stream's result set. Every other stream's re-query

pays the full decode cost for a result that gets thrown away.

Hypothesis

Move the hash into native code, hashing raw cell bytes during

sqlite3_step without materializing any Dart objects. If the hash

matches the cached value, return the "unchanged" sentinel immediately.

Only on mismatch do we re-step the statement and build Dart objects.

Trade-off: the "changed" path pays two sqlite3_step passes (hash pass

+ decode pass). The expectation is that unchanged re-queries dominate

in realistic reactive workloads, so the net impact is positive.

Approach

C side

Hashing lives entirely in C. Two helpers and one exported entry point:

and BLOB payloads.

the bound statement to completion, folds every cell's type tag +

value bytes + length into an FNV-1a accumulator, and returns the

final hash. Calls sqlite3_reset at both ends so the caller can

invoke this on a freshly-bound stmt or on one just drained by

decodeQuery — same function, both use cases.

Per-cell fold rules (both paths must agree bit-for-bit with the

re-query path, because they do):

Dart side

```

stmt = stmt_acquire_on(...) // binds params, reset

new_hash = resqlite_query_hash(stmt) // pass 1: hash in C

if (new_hash == last_hash) return (new_hash, null)

raw = decodeQuery(stmt, sql) // pass 2: step + decode

return (new_hash, raw) // reuse pass-1 hash

```

Same hash value from pass 1 is reused as the new baseline — no need

to recompute during decode.

```

raw = decodeQuery(stmt, sql) // step + decode

hash = resqlite_query_hash(stmt) // step + hash (replay)

return (raw, readTables, hash)

```

Pays one extra sqlite3_step walk through the same read-only query

once per stream subscription. Amortized across every subsequent

re-query of that stream (typically hundreds to thousands over the

stream's lifetime), the cost is negligible.

emission, removing the Dart-side _hashResult function and its

result_hash.dart module (both now dead code — hashes are C-computed

everywhere).

Hash domain consistency

Only one hash implementation exists (resqlite_query_hash). Initial

query and re-query both call it on the same stmt, so the baseline and

every comparison hash live in the exact same domain by construction.

No possibility of drift between paths.

Benchmark

Needed a new benchmark to measure the target path: none of the

existing streaming benchmarks exercise unchanged re-queries. Added

Unchanged Fanout Throughput (1 canary + 10 unchanged streams) in

benchmark/suites/streaming.dart:

stream registry doesn't dedupe them. All select ~1000 rows and

never change (writes target id > 1000).

a signal used for iteration synchronization.

processes them in waves. The canary's emission bounds iteration

time.

dominates noise.

Also verified the change correctness against the existing probe

tool/experiments/hash_probe.dart.

Results

A/B with 053 library changes stashed in/out to control for thermal

drift:

BenchmarkBaseline (ms)exp053 (ms)Δ
Unchanged Fanout Throughput0.440.27−39%
Invalidation Latency (changed)0.050.05within noise
Fan-out (10 streams, changed)0.180.22within noise
Stream Churn (100 cycles)2.002.18within noise

All 129 existing tests pass plus 3 new DDL-invalidation tests from

exp 051.

The "changed" path's 2-pass cost does not show up as a measurable

regression on any existing streaming benchmark. The 2-pass overhead

(step+hash, then step+decode+hash) is bounded above by the decode

cost of one pass — small compared to the dispatch + isolate costs

that dominate.

Why the earlier version of this idea (exp 072) failed

Exp 072 tried to swap FNV-1a for xxhash64 on already-hashed 64-bit

values. On pre-hashed inputs, FNV's simpler xor+multiply beats

xxhash's multi-step merge round. 072 regressed stream paths by

+45-75%.

The insight 053 rescues: xxhash/FNV choice matters for byte-stream

hashing. In 053 we hash raw cell bytes in C — the scenario where

the hash function's throughput actually matters. Here FNV is fine

because we control the implementation and can avoid Dart↔C

marshaling, so the algorithmic constant factor is not the

bottleneck.

Decision

Accepted. Large win (−39%) on the target path, zero measurable

regression on any other path. Hash code consolidates to a single C

implementation; the main-isolate _hashResult becomes dead code and

is removed.

Follow-ups

the changed-path 2-step cost pathologically.

same memcpy-vs-FFI-savings wall as exp 018 and was rejected

separately. A future experiment could combine 053's hashing with

selective batching only when text/blob content is small (exploit

the fact that pure-numeric rows don't need the memcpy).