Experiment 148: Reader reply batching

Date: 2026-06-08

Status: Rejected

Direction: stream-rerun-dispatch

Benchmark Run: Tracelite A/B experiment, exp-148-reader-reply-batching

Problem

Experiment 136 made reader-completion churn look like the next bounded stream
implementation target. On A11c overlap, the main-isolate reader worker port
handler chain accounted for 28.57% of total wall (burst + drain), at about
18 us per callback across 4,228 callbacks per burst. Subscriber fanout was
less than 1% of that chain, so the plausible target was not StreamEntry.emit;
it was the repeated reader reply / Future resolution / _requery continuation
for streams that usually short-circuit as unchanged.
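As a rough cross-check of those figures, the per-callback cost and the callback count imply the absolute time the handler chain takes per burst. The sketch below only multiplies the two quoted numbers; since the source gives the per-callback cost as "about 18 us", the result is an estimate, not a measurement.

```python
# Figures quoted from experiment 136 (A11c overlap).
# "about 18 us" is approximate in the writeup, so treat the product as an estimate.
US_PER_CALLBACK = 18
CALLBACKS_PER_BURST = 4_228

chain_us = US_PER_CALLBACK * CALLBACKS_PER_BURST
chain_ms = chain_us / 1000

print(f"handler chain: ~{chain_ms:.1f} ms per burst")
```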

The question for this run was whether collapsing multiple stream re-query
replies into one reader-worker response would turn that profile signal into an
end-to-end win on the stream rerun dispatch suite.

Hypothesis

Batching stream re-query work inside a reader worker should reduce the number
of main-isolate completion handlers when many dirty streams hash as unchanged.
That should improve A11c overlap or many-stream writer throughput, or at least
stay neutral on keyed-PK subscriptions.

Reject if the candidate only improves the profile counter but fails to clear
the Tracelite primary gate, or if keyed-PK / disjoint-shaped work regresses.

Approach

Created two resqlite worktrees from origin/main at
8c783e787485eaace3a77612aaab2f9528c53b39:

Candidate commit:

Candidate patch:

reader worker protocol

eight stream entries, preserving the single-entry path for unbatched work

requeue behavior and per-entry error propagation
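The general shape of reply batching can be sketched as follows. This is not the candidate patch: `plan_requests` and the `"single"` / `"batch"` labels are hypothetical, and the cap of eight entries is an assumption read from the patch summary above. The sketch groups dirty stream IDs into requests and keeps a lone entry on the single-entry path.

```python
from typing import Iterator, Sequence

MAX_BATCH = 8  # assumed cap, taken from the patch summary; not verified against the code


def plan_requests(dirty_stream_ids: Sequence[int],
                  max_batch: int = MAX_BATCH) -> Iterator[tuple[str, list[int]]]:
    """Yield ("single", [id]) or ("batch", [ids...]) reader requests."""
    ids = list(dirty_stream_ids)
    for start in range(0, len(ids), max_batch):
        chunk = ids[start:start + max_batch]
        # A lone entry stays on the existing single-entry path.
        kind = "single" if len(chunk) == 1 else "batch"
        yield kind, chunk


requests = list(plan_requests(range(17)))
# In this model each request produces one reply, so 17 dirty streams
# become 3 replies instead of 17.
print(len(requests), [kind for kind, _ in requests])
```

In this simplified model the number of replies falls by up to the batch size. The real reduction depends on how dirty streams are distributed across reader workers.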

Before running the formal suite, a profile smoke compared exp 136's completion
counter harness on baseline vs candidate:

| workload | version | total_ms | completion_us | completion_count | completion_us / total |
| --- | --- | --- | --- | --- | --- |
| A11c overlap | baseline | 336.40 | 109,562 | 4,527 | 32.57% |
| A11c overlap | candidate | 288.30 | 55,649 | 1,425 | 19.30% |
| keyed PK subscriptions | baseline | 450.55 | 17,919 | 1,171 | 3.98% |
| keyed PK subscriptions | candidate | 446.73 | 15,708 | 715 | 3.52% |

That confirmed the implementation changed the intended mechanism, so the full
Tracelite A/B run was worth doing.
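The size of the mechanism change can be read off the profile smoke rows directly. The sketch below recomputes the completion share of total wall time and the relative change in each column. The row values are copied from the table; the helper names are illustrative only.

```python
# Rows copied from the profile smoke table:
# (total_ms, completion_us, completion_count)
rows = {
    ("A11c overlap", "baseline"): (336.40, 109_562, 4_527),
    ("A11c overlap", "candidate"): (288.30, 55_649, 1_425),
    ("keyed PK subscriptions", "baseline"): (450.55, 17_919, 1_171),
    ("keyed PK subscriptions", "candidate"): (446.73, 15_708, 715),
}


def share(total_ms: float, completion_us: float) -> float:
    """completion_us as a fraction of total wall time."""
    return completion_us / (total_ms * 1000)


def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


for workload in ("A11c overlap", "keyed PK subscriptions"):
    b = rows[(workload, "baseline")]
    c = rows[(workload, "candidate")]
    print(workload)
    print(f"  share:            {share(b[0], b[1]):.2%} -> {share(c[0], c[1]):.2%}")
    print(f"  completion_count: {pct_change(b[2], c[2]):+.1f}%")
    print(f"  completion_us:    {pct_change(b[1], c[1]):+.1f}%")
    print(f"  total_ms:         {pct_change(b[0], c[0]):+.1f}%")
```

On A11c overlap the completion count falls by roughly two thirds and the completion time by about half. These are single smoke runs, so they show that the mechanism changed, not that end-to-end time improved.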

Ran the integrated Tracelite workflow with pinned Tracelite
a2bf3648836fcf680d0aceccb18c2b31a2109586:

    dart run benchmark/run_tracelite_experiment.dart \
      --dart=/usr/local/bin/dart \
      --tracelite-root=/Users/dan/Coding/tracelite \
      --baseline-root=/Users/dan/.codex/worktrees/resqlite-exp148-baseline \
      --candidate-root=/Users/dan/.codex/worktrees/resqlite-exp148-reader-reply-batching \
      --label=exp-148-reader-reply-batching \
      --direction=stream-rerun-dispatch \
      --runs=2 \
      --min-repetitions=5 \
      --max-repetitions=12 \
      --out-dir=build/tracelite-experiments/exp-148-reader-reply-batching

Artifacts:

Results

The wrapper collected clean baseline and candidate histories:

| step | status |
| --- | --- |
| baseline suite history | ok |
| candidate suite history | ok |
| graph data export | valid |
| decision artifact | inconclusive |

Tracelite decision policy:

| field | value |
| --- | --- |
| expectation | improvement |
| primary threshold | 28.0% |
| max guardrail regression | 21.0% |
| max CV | 21.0% |

Decision comparisons:

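| role | scenario | peer | metric | baseline | candidate | change | max CV | p | status | effect |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| primary | high-cardinality-fanout | resqlite | measured_elapsed_ns | 368 ms | 387 ms | +5.18% | 10.1% | 0.315 | neutral | inconclusive |
| primary | keyed-pk-subscriptions | resqlite | measured_elapsed_ns | 301 ms | 341 ms | +13.5% | 13.5% | 0.000694 | neutral | inconclusive |
| primary | many-streams-writer-throughput | resqlite | measured_elapsed_ns | 592 ms | 611 ms | +3.28% | 5.14% | 0.165 | neutral | inconclusive |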
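One way to read the decision policy against these rows is as a simple threshold gate on a lower-is-better metric. The sketch below is an assumed model, not Tracelite's actual decision logic: the `Policy` and `classify` names, the `"noisy"` label, and the ordering of the checks are all hypothetical. Only the threshold values and the observed changes come from the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Values copied from the decision policy table.
    primary_threshold_pct: float = 28.0
    max_guardrail_regression_pct: float = 21.0
    max_cv_pct: float = 21.0


def classify(change_pct: float, max_cv_pct: float,
             policy: Policy = Policy()) -> str:
    """Classify a change in a lower-is-better metric (assumed semantics)."""
    if max_cv_pct > policy.max_cv_pct:
        return "noisy"
    if change_pct <= -policy.primary_threshold_pct:
        return "improved"
    if change_pct >= policy.max_guardrail_regression_pct:
        return "regressed"
    return "neutral"


# (change %, max CV %) copied from the decision comparisons.
comparisons = {
    "high-cardinality-fanout": (5.18, 10.1),
    "keyed-pk-subscriptions": (13.5, 13.5),
    "many-streams-writer-throughput": (3.28, 5.14),
}

for name, (change, cv) in comparisons.items():
    print(f"{name}: {classify(change, cv)}")
```

Under this model all three lanes land in the neutral band, which matches the reported statuses: none is faster by the primary threshold, and the +13.5% on keyed-PK stays below the guardrail limit.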

Decision insights:

| severity | finding | detail |
| --- | --- | --- |
| warning | Decision is inconclusive | Evidence is not strong enough for a production decision. |
| warning | Primary metric did not clear | high-cardinality-fanout changed by +5.18% with neutral status; 95% CI -10.97 ms..49.09 ms. |
| warning | Primary metric did not clear | keyed-pk-subscriptions changed by +13.5% with neutral status; 95% CI 13.87 ms..67.34 ms. |
| warning | Primary metric did not clear | many-streams-writer-throughput changed by +3.28% with neutral status; 95% CI -7.66 ms..46.53 ms. |
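The confidence intervals carry more information than the neutral status alone. The sketch below checks which intervals exclude zero and compares each interval midpoint against the reported percentage change. It assumes the intervals are symmetric around the point estimate and uses the baseline values as rounded to whole milliseconds in the comparison table, so small differences from the reported figures are expected.

```python
# (baseline_ms, ci_low_ms, ci_high_ms) copied from the tables above.
lanes = {
    "high-cardinality-fanout": (368, -10.97, 49.09),
    "keyed-pk-subscriptions": (301, 13.87, 67.34),
    "many-streams-writer-throughput": (592, -7.66, 46.53),
}


def excludes_zero(low: float, high: float) -> bool:
    """True when the whole interval lies on one side of zero."""
    return low > 0 or high < 0


def midpoint_pct(baseline_ms: float, low: float, high: float) -> float:
    """CI midpoint as a percentage of baseline (assumes a symmetric interval)."""
    return (low + high) / 2 / baseline_ms * 100


for name, (base, low, high) in lanes.items():
    print(f"{name}: excludes zero = {excludes_zero(low, high)}, "
          f"midpoint = {midpoint_pct(base, low, high):+.2f}%")
```

Only the keyed-PK interval excludes zero, and it lies entirely on the slower side. The other two lanes are statistically indistinguishable from no change.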

The decision step exited with code 65 because --expect=improvement was not
met. The artifacts were still preserved and are the source of this writeup.
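A caller that wraps the decision step may want to separate "expectation not met" from a genuine tool failure, since the artifacts remain usable in the first case. The sketch below is a minimal illustration: only exit code 65 and its meaning come from this run. `interpret_exit`, its labels, and the treatment of other codes are hypothetical, and the child process is a stand-in, not the real workflow.

```python
import subprocess
import sys

EXPECTATION_NOT_MET = 65  # exit code reported above for the unmet expectation


def interpret_exit(code: int) -> str:
    """Map a decision-step exit code to a coarse outcome (assumed mapping)."""
    if code == 0:
        return "expectation met"
    if code == EXPECTATION_NOT_MET:
        return "expectation not met (artifacts still usable)"
    return "tool failure"


# Stand-in child process that exits 65, in place of the real decision step.
result = subprocess.run([sys.executable, "-c", "import sys; sys.exit(65)"])
print(interpret_exit(result.returncode))
```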

Decision

Reject.

The mechanism worked, but the product-level result did not. The candidate
reduced completion callback count and completion counter wall in the profile
smoke, yet the formal Tracelite suite did not show a measured-elapsed
improvement on any lane. The keyed-PK subscription lane was slower by 13.5%,
with a 95% confidence interval entirely above zero, even though it stayed
inside the configured guardrail threshold.

Do not merge the reader-worker batch protocol or _requeryBatch(...) shape.
The extra protocol surface and stream-engine complexity are not justified by a
counter-only win.

Future stream-dispatch work should move past plain reader-reply batching and
split the residual writer/request bucket from exp 147 more narrowly: dirty-set
harvest, writer reply send, main-isolate request resolution, and drain-time
coordination are better next targets than another worker-side reply batch.

Workflow Notes

This was a useful stress test for the Tracelite experiment workflow under a
real implementation attempt:

mistaken for a mergeable performance win.

graph data, and decision insights even when the expectation failed.

Validation