Experiment 146: Lower batch packing threshold

Date: 2026-06-08

Status: Rejected

Direction: parameter-encoding-and-binding

Benchmark Run: Tracelite A/B experiment, exp-146-lower-batch-pack-threshold

Problem

Experiments 125 and 126 justified direct text payload packing for large wide
batches. The live guard keeps the ASCII fast path narrow: it applies only when
paramCount >= 8 and totalParamCount >= 8192.

The remaining question was whether that guard is too conservative. If direct
ASCII packing is cheap enough, lowering the threshold could bring the allocation
win to smaller generated-statement batches without waiting for very wide or
very large rows.
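To make the allocation difference concrete, here is a minimal sketch of the two packing styles. It is not resqlite's code: `packAsciiInto` and `packUtf8Into` are hypothetical names, and the sketch assumes the caller has already sized the output buffer.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical fast path: copies [text] into [out] starting at [offset],
/// one byte per UTF-16 code unit, as long as every unit is ASCII.
/// Returns the number of bytes written, or -1 at the first non-ASCII unit
/// (bytes written before that point are left for the caller to overwrite).
int packAsciiInto(String text, Uint8List out, int offset) {
  for (var i = 0; i < text.length; i++) {
    final unit = text.codeUnitAt(i);
    if (unit >= 0x80) return -1; // not ASCII, caller must fall back
    out[offset + i] = unit;
  }
  return text.length;
}

/// Hypothetical generic path: encodes into a temporary UTF-8 list first,
/// then copies that list into [out]. The temporary list is the allocation
/// the fast path is meant to avoid.
int packUtf8Into(String text, Uint8List out, int offset) {
  final encoded = utf8.encode(text); // temporary allocation per string
  out.setRange(offset, offset + encoded.length, encoded);
  return encoded.length;
}
```

A caller would try `packAsciiInto` first and fall back to `packUtf8Into` when it returns -1. The fast path trades one temporary list per string for a per-code-unit range check, which is why its benefit depends on batch shape.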

Hypothesis

Lowering the ASCII batch-packing guard to paramCount >= 2 and
totalParamCount >= 64 should improve the narrow batch-insert lane, or at
least stay neutral, because it removes temporary UTF-8 list allocation from a
larger set of batch writes.
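The effect of the lower guard can be sketched as a predicate over batch shape. The threshold values are the ones in this experiment's candidate patch. The `AsciiBatchGuard` class is hypothetical, and treating totalParamCount as paramCount times the number of rows is an assumption about how resqlite counts it.

```dart
/// Hypothetical model of the batch-packing guard. Both minimums must be
/// met for a batch to take the ASCII fast path.
class AsciiBatchGuard {
  const AsciiBatchGuard(this.minParamCount, this.minTotalParamCount);

  final int minParamCount;
  final int minTotalParamCount;

  bool admits({required int paramCount, required int rowCount}) {
    // Assumption: the total is parameters per row times rows in the batch.
    final totalParamCount = paramCount * rowCount;
    return paramCount >= minParamCount &&
        totalParamCount >= minTotalParamCount;
  }
}

// Thresholds before and after the candidate patch.
const baselineGuard = AsciiBatchGuard(8, 8192);
const candidateGuard = AsciiBatchGuard(2, 64);
```

Under these assumptions, a batch of 100 rows with 3 parameters each (300 total) is rejected by `baselineGuard` and admitted by `candidateGuard`, while a batch of 1024 rows with 8 parameters each (8192 total) is admitted by both.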

Reject if Tracelite cannot show a primary improvement on resqlite
narrow-batch-insert, if guardrails are too noisy to support the change, or if
the result merely broadens a fast path without measurable benefit.

Approach

Created two resqlite worktrees, one baseline and one candidate, from
origin/main at 423a74eda5d05c2d7fc6f0ba70fd978fb6c345d0.

Candidate patch:

    -const int _asciiBatchMinParamCount = 8;
    -const int _asciiBatchMinTotalParamCount = 8192;
    +const int _asciiBatchMinParamCount = 2;
    +const int _asciiBatchMinTotalParamCount = 64;

Ran the integrated Tracelite A/B workflow with pinned Tracelite
a2bf3648836fcf680d0aceccb18c2b31a2109586 and ARM64 Dart:

    /Users/dan/Coding/flutter_arm64/bin/cache/dart-sdk/bin/dart run \
      benchmark/run_tracelite_experiment.dart \
      --dart=/Users/dan/Coding/flutter_arm64/bin/cache/dart-sdk/bin/dart \
      --tracelite-root=/Users/dan/Coding/tracelite \
      --baseline-root=/Users/dan/.codex/worktrees/resqlite-exp146-baseline \
      --candidate-root=/Users/dan/.codex/worktrees/resqlite-exp146-candidate \
      --label=exp-146-lower-batch-pack-threshold \
      --direction=parameter-encoding-and-binding \
      --runs=3 \
      --min-repetitions=7 \
      --max-repetitions=30 \
      --out-dir=build/tracelite-experiments/exp-146-lower-batch-pack-threshold

Before the final run, stale multi-day Dart test processes from an unrelated
dune_core checkout were terminated because they were consuming significant
CPU and would have polluted local benchmark stability.

Artifacts:

Results

The integrated wrapper completed clean baseline and candidate collection:

| step | status |
| --- | --- |
| baseline suite history | ok |
| candidate suite history | ok |
| decision artifact | inconclusive |

Tracelite decision policy:

| field | value |
| --- | --- |
| expectation | improvement |
| primary threshold | 24.0% |
| max guardrail regression | 18.0% |
| max CV | 18.0% |
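One plausible reading of how these policy fields combine is sketched here. This is not Tracelite's implementation: the `DecisionPolicy` class, the `ComparisonStatus` names, the order of the checks, and the convention that a negative change means faster elapsed time are all assumptions made for illustration.

```dart
enum ComparisonStatus { improved, neutral, regressed, tooNoisy }

/// Hypothetical model of the decision policy. All values are percentages.
class DecisionPolicy {
  const DecisionPolicy({
    required this.primaryThresholdPct,
    required this.maxGuardrailRegressionPct,
    required this.maxCvPct,
  });

  final double primaryThresholdPct;
  final double maxGuardrailRegressionPct;
  final double maxCvPct;

  /// Primary metric: noise is checked first, then the change must clear
  /// the threshold. For elapsed time, a negative change is an improvement.
  ComparisonStatus classifyPrimary(double changePct, double cvPct) {
    if (cvPct > maxCvPct) return ComparisonStatus.tooNoisy;
    if (changePct <= -primaryThresholdPct) return ComparisonStatus.improved;
    if (changePct >= primaryThresholdPct) return ComparisonStatus.regressed;
    return ComparisonStatus.neutral;
  }

  /// Guardrail: only a slowdown beyond the allowed regression fails.
  ComparisonStatus classifyGuardrail(double changePct, double cvPct) {
    if (cvPct > maxCvPct) return ComparisonStatus.tooNoisy;
    if (changePct > maxGuardrailRegressionPct) {
      return ComparisonStatus.regressed;
    }
    return ComparisonStatus.neutral;
  }
}

// Policy values reported for this run.
const policy = DecisionPolicy(
  primaryThresholdPct: 24.0,
  maxGuardrailRegressionPct: 18.0,
  maxCvPct: 18.0,
);
```

With this run's figures, `policy.classifyPrimary(1.45, 13.2)` yields `neutral` and `policy.classifyGuardrail(-17.7, 64.9)` yields `tooNoisy`, which agrees with the statuses Tracelite reported. The sketch ignores the p-value, which Tracelite may also take into account.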

Decision comparisons:

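| role | scenario | peer | metric | baseline | candidate | change | max CV | p | status | effect |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| primary | narrow-batch-insert | resqlite | measured_elapsed_ns | 11.45 ms | 11.62 ms | +1.45% | 13.2% | 0.513 | neutral | inconclusive |
| guardrail | narrow-batch-insert | sqlite_async | measured_elapsed_ns | 14.93 ms | 12.28 ms | -17.7% | 64.9% | 0.615 | too_noisy | inconclusive |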
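The change column can be sanity-checked from the baseline and candidate values. The helper here is a hypothetical sketch, not part of Tracelite.

```dart
/// Percent change from [baseline] to [candidate]. For an elapsed-time
/// metric, a negative result means the candidate was faster.
double relativeChangePct(double baseline, double candidate) =>
    (candidate - baseline) / baseline * 100.0;
```

Applied to the rounded figures, `relativeChangePct(11.45, 11.62)` is about +1.48 and `relativeChangePct(14.93, 12.28)` is about -17.75. The guardrail matches the reported -17.7%. The primary differs slightly from the reported +1.45%, presumably because Tracelite computes the change from unrounded nanosecond values or a different estimator than the displayed milliseconds.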

Decision insights:

| severity | finding | detail |
| --- | --- | --- |
| warning | Decision is inconclusive | Evidence is not strong enough for a production decision. |
| warning | Guardrails are inconclusive | One guardrail comparison needs cleaner or repeated evidence. |
| warning | Primary metric did not clear | resqlite changed by +1.45% with status neutral; 95% CI -0.81 ms..1.14 ms. |

The graph-data export validated and produced decision-level data:

| dataset | rows |
| --- | --- |
| scenario_series | 1680 |
| peer_summary | 12 |
| decision_summary | 1 |
| decision_comparisons | 2 |

Decision

Reject.

The result does not justify lowering the ASCII batch-packing threshold. The
primary resqlite lane is neutral, not an improvement, and the guardrail is too
noisy to add confidence. Since the candidate only broadens an optimization
guard, there is no correctness or maintainability reason to carry it without a
clear timing win.

Keep the current large-wide-batch guard from Experiment 125. Small and narrow
batch writes should stay on the generic path unless a future workload shows
that parameter encoding is a material part of wall time.

Workflow Notes

This experiment also validated the new Tracelite A/B workflow:

to the baseline worktree, then the candidate worktree, and restored the
original override afterwards.

calibration does not prevent collecting both sides. The decision step remains
the gate that reports accepted, rejected, or inconclusive evidence.

the visualizer gets all repeated-run samples instead of a single suite
manifest.