Experiment 104: Re-eval of exp 094 (dirty/read table string reuse) under A11c fan-out

Date: 2026-04-25

Status: Rejected

Problem

Exp 094 (Apr 23, 2026) made read_set_add and dirty_set_add keep the
existing strdup buffer when a reused slot already held the same table
name, eliminating one free/strdup pair per stable-table capture cycle.

It was rejected with: *"Focused dispatch was effectively flat and the
full suite produced no wins; native branch/lifetime complexity is not
justified."* The full release comparison reported 0 wins, 14
regressions, 139 neutral.

094's rejection happened before A11c existed. [A11c
(Many-Streams Writer Throughput)](../benchmark/suites/many_streams_writer_throughput.dart)
landed two days later (Apr 25) on #39. It is structurally a
focused-dispatch suite under fan-out: 50 streams × 500 writes ×
⌈50/4⌉ pool round-trips means thousands of stream-side read_set_add
calls and writer-side dirty_set_add calls per timed loop, all hitting
the same handful of table names repeatedly.

Exp 105 — also rejected, also Apr 25 — explicitly identified A11c as
the right test bed for re-evaluating this class:

> The real lever for A11c is batching the per-write fan-out itself,
> not parallelizing it harder. Re-query batching (revisit
> exp 071 / 093 / 094 under A11c) and column-tracking dispatch
> elision (exp 052) both remain on the table.

This experiment is the 094 leg of that revisit. It pattern-matches
exp 065 — an explicit re-eval of an earlier rejection under a
workload that didn't exist when the original verdict was rendered.

Hypothesis

094's target — one free + one strdup per stable-table capture — is
real but small (~hundreds of ns per pair). On the original suite the
amplification factor was ≈1 capture per query, so the ceiling was
below MDE. A11c amplifies it: at N=50 streams × 500 writes, the
writer's dirty-table emit fires 50 × 500 = 25,000 times in the timed
loop, and the reader pool replays each subscribed stream's
selectIfChanged against a fresh capture — multiplying the
read_set_add traffic per write by ⌈50/4⌉ = 13 pool round-trips.

If the per-pair cost lifts above the noise floor under that
multiplier, A11c writer throughput should rise materially without
regressing anything else (the change is local to the slot-reuse
branch, and its semantic invariant — that names returned to Dart stay
valid until the next capture cycle — is preserved).

Approach

The implementation is a literal port of 094's idea to current
native/resqlite.c (the file has evolved since 094 was authored:
stmt_cache_entry_set_read_tables, read_set_load_from_cache_entry,
and the surrounding allocator are post-094 additions, but the two
target functions are line-for-line the same as when 094 was written —
no merge/cherry-pick conflicts, no semantic drift).

094 did not have an archive/exp-094 tag of its own implementation —
the existing archive/exp-094 tag is for an unrelated renumbered 094
(skip sqlite3_column_count FFI on cache hit, also rejected). So this
is a port of the idea from 094's writeup, not a cherry-pick:

```c
// read_set_add — A11c-relevant on stream-side reads
if (s->count < s->allocated) {
    if (strcmp(s->names[s->count], table_name) == 0) {
        s->count++;
        return;        // reuse strdup buffer; no free, no strdup
    }
    free(s->names[s->count]);
}
s->names[s->count] = strdup(table_name);

// dirty_set_add — A11c-relevant on writer-side dirty emit
//   (same shape, RESQLITE_MAX_DIRTY_TABLES)
```

Tested locally; reverted before commit since the result was below the
noise floor (see Results). The implementation sketch above is
preserved here for any future cherry-pick if measurement conditions
shift.

Results

Artifacts:

*(reused — same hardware, same day, same --include-slow invocation;
pre-dates the candidate by minutes)*

The harness picked 2026-04-25T19-47-54-exp105-reader-pool-8.md (the
exp 105 cap=8 rejected candidate) as its automatic comparison anchor
because it was the most recent file in benchmark/results/, not the
cap=4 baseline that's actually relevant. The deltas reported in the
candidate file's "Comparison vs Previous Run" table compare against
cap=8 and are therefore "less bad than the rejected exp 105
candidate" rather than "vs current main." The numbers below are
computed manually against the cap=4 baseline, which is the correct
anchor.

A11c writer throughput (50 streams × 500 writes, primary signal)

| Scenario | Baseline (ms) | Candidate (ms) | Baseline w/s | Candidate w/s | Δ w/s |
|---|---|---|---|---|---|
| No-streams baseline | 9.98 | 13.80 | 50,110 | 36,232 | −28% (noisy: baseline CV 30%, candidate CV 30%) |
| Disjoint (50 streams) | 126.39 | 121.36 | 3,956 | 4,120 | +4% (within ±16% MDE_ci) |
| Overlap (50 streams) | 111.69 | 114.08 | 4,477 | 4,383 | −2% (within ±20% MDE_ci) |

The two stream-fan-out scenarios — the actual amplification target —
moved by a few percent in opposite directions, both well inside the
candidate's bootstrap CI. The no-streams baseline's −28% w/s looks
large in isolation but lives in a high-variance row (CV 30% on both
sides) and is not corroborated by any other write or read benchmark
showing a similar regression direction; without repeats it is
indistinguishable from run-to-run drift.

Suite-level summary

8 wins, 10 regressions, 152 neutral in the harness's automatic
comparison vs the cap=8 anchor. After back-out: the wins are
"recovered from cap=8 catastrophe" rather than "improved over cap=4"
(e.g. High-Cardinality Stream Fan-out −47% and *4× Concurrent Reads
−69%* are both reverting cap=8's regressions, not new gains). None of
the 10 regressions points at an isolated change driven by the
fast-path itself; they pattern-match run-to-run drift on small-row
schema-shape and scaling rows.

Concurrent reads (1× / 2× / 4× / 8×) vs cap=4 baseline

094's original hypothesis touched the read-set capture path. Cross-check:

| Concurrency | Baseline (ms) | Candidate (ms) | Δ |
|---|---|---|---|
| 1× | 0.29 | 0.31 | +7% (within 10% noise) |
| 2× | 0.37 | 0.33 | −11% (within 10% noise) |
| 4× | 0.41 | 0.45 | +10% (within 10% noise) |
| 8× | 0.76 | 0.84 | +11% (within 16% CV) |

No directional signal — three rows trend slightly slower, one
slightly faster. Per-cycle savings on read_set_add are not lifting
the read path at any concurrency level.

Decision

Rejected. Even under A11c-shaped fan-out — the maximum amplification
we currently have for read_set_add / dirty_set_add traffic — the 094
fast-path's deltas are inside the noise band on every workload it
should affect.

This empirically validates 094's original rejection rationale
("focused dispatch was effectively flat") under stronger
amplification: the per-cycle free/strdup pair is real but lives below
the harness's measurement floor. The native branch + lifetime
complexity is still not justified.

Pattern-matches exp 102 (savepoint string cache, also rejected on
"theoretically removable, practically below noise") and the broader
late-2026 cluster of rejected micro-shaves on the dispatch path
(071, 076, 081, 093, 094, 095, 099, 102). The codebase has chased
this lever to the floor; further per-call savings on read-set /
dirty-set capture should wait for a structural change that
significantly increases per-write traffic — A11c is already the
heaviest workload available and it isn't enough.

Code reverted before commit. Writeup keeps the implementation sketch
in Approach for cherry-pick if a future workload (e.g.
many-streams-many-tables) ever amplifies the path further.