Experiment 171: Resolved runtime cache for Database hot paths

Date: 2026-06-14

Status: Rejected

Direction: stream-rerun-dispatch

Problem

Exp 159 attacked fixed per-round-trip overhead inside Writer._request by

caching _sendPort and removing the await _workerPort.future microtask

hop. It left two structural notes:

event-loop scheduling in both isolates).*

a different transport.*

Every public Database method (select, selectBytes, execute,

executeBatch, transaction) still does

final ... = await _runtime; at the top. _runtime is a single Future

created in the constructor and completed once reader pool + writer have

spawned. After that point every call awaits an already-resolved Future,

which still schedules its continuation as a microtask. That is one

microtask hop above the writer/reader call, on every Database method

call.
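The hop described above can be observed in isolation. The following is a minimal sketch, not code from this repository: `hopOrder` is a hypothetical helper that queues a microtask and then awaits a Future that is already known to be completed. Because `await` always suspends in Dart, the continuation should land behind the microtask that was queued first.

```dart
import 'dart:async';

/// Records the order in which a queued microtask and the continuation of an
/// `await` on an already-completed Future run. Hypothetical helper for
/// illustration only.
Future<List<String>> hopOrder() async {
  final log = <String>[];
  final ready = Future<int>.value(1);
  await ready; // after this line, `ready` is definitely completed

  scheduleMicrotask(() => log.add('queued microtask'));
  await ready; // still suspends: the continuation is queued as a microtask
  log.add('continuation after await');
  return log;
}

Future<void> main() async {
  print(await hopOrder());
}
```

If the continuation ran synchronously, its log entry would appear first; seeing it second is the extra hop that every post-open Database call pays.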

If exp 159's residual is really port-wake + event-loop scheduling, the

same pattern at the layer above should chip another hop off every call.

Whether it actually moves the focused benchmark is the question.

Hypothesis

Mirror exp 159's _sendPort cache one layer up: populate a

sync-readable _resolvedRuntime field inside _runtime's body the

moment its spawn awaits complete, and let post-open callers read the

field directly instead of awaiting an already-resolved Future. Removes

one microtask hop per call. At ~1–2 µs per hop the upper bound on

sequential-awaited (2000 writes) is ~2–4 ms out of a ~32 ms median:

roughly 6–12% theoretical, comfortably above the focused-benchmark

noise band exp 159 used to accept its win.
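Written out, the upper bound is plain arithmetic on the figures quoted above. The symbols $N$ (number of sequential writes) and $t_{\text{hop}}$ (cost of one microtask hop) are introduced here for illustration, and the 1–2 µs per-hop cost is the estimate stated in the hypothesis, not a separate measurement.

```latex
N \cdot t_{\text{hop}} = 2000 \times (1\text{ to }2\ \mu\text{s}) = 2\text{ to }4\ \text{ms},
\qquad
\frac{2\ \text{ms}}{32\ \text{ms}} \approx 6.3\%,
\quad
\frac{4\ \text{ms}}{32\ \text{ms}} = 12.5\%
```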

Approach

lib/src/database.dart:

ReaderPool.spawn and Writer.spawn complete, assign the runtime

record to _resolvedRuntime before return runtime.

final ... = _resolvedRuntime ?? await _runtime;. Five call sites:

select, selectBytes, execute, executeBatch, transaction.

called rarely and benefit nothing.

one-shot stream wraps the future, not an await, and is invoked

once per stream creation.

Safety: _resolvedRuntime is set inside the same microtask that

completes _runtime, so callers awaiting _runtime and callers reading

the cache see the same runtime record. There is no "partially

initialized" window observable from main isolate code.
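A minimal sketch of the pattern described in this section, under stated assumptions: `SketchDatabase` and `SketchRuntime` are hypothetical stand-ins, and the `Future.delayed` call substitutes for the real `ReaderPool.spawn` and `Writer.spawn` awaits. It is meant to show the shape of the cache and the null-coalesce read, not the actual contents of lib/src/database.dart.

```dart
import 'dart:async';

/// Hypothetical stand-in for the record holding the reader pool and writer.
class SketchRuntime {
  SketchRuntime(this.id);
  final int id;
}

/// Hypothetical stand-in for Database, reduced to the caching pattern.
class SketchDatabase {
  SketchDatabase() {
    _runtime = _open();
  }

  late final Future<SketchRuntime> _runtime;
  SketchRuntime? _resolvedRuntime;

  Future<SketchRuntime> _open() async {
    // Assumption: stands in for awaiting the reader pool and writer spawns.
    await Future<void>.delayed(Duration.zero);
    final runtime = SketchRuntime(1);
    _resolvedRuntime = runtime; // publish the cache before `return runtime`
    return runtime;
  }

  /// Shape of a public method after the change. Once the cache is set, `??`
  /// short-circuits and the `await` expression is never evaluated.
  Future<SketchRuntime> execute() async {
    final runtime = _resolvedRuntime ?? await _runtime;
    return runtime;
  }
}

Future<void> main() async {
  final db = SketchDatabase();
  final viaAwait = await db.execute(); // cache still null: awaits _runtime
  final viaCache = await db.execute(); // cache populated: sync field read
  print(identical(viaAwait, viaCache));
}
```

Both paths hand back the same record, which is the property the safety argument relies on. Note that the method is still `async`, so the caller's own `await` on its result is unaffected; only the internal hop on `_runtime` is removed.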

No public API change. No new test scaffolding. No temporary

instrumentation — the implementation is pure overhead removal.

Results

Focused benchmark: dart run benchmark/experiments/writer_pipelining.dart,

two passes, 7 rounds each, order flipped between passes (per the JOURNAL

"phase-ordered A/B" lesson).

| pass | shape | baseline | candidate | delta |
|---|---|---|---|---|
| 1 | sequential-awaited (2000) | 32.169 ms | 31.425 ms | −2.3% |
| 1 | concurrent-burst (10×200) | 24.465 ms | 25.818 ms | +5.5% |
| 1 | transaction-guardrail (50×10) | 4.587 ms | 4.244 ms | −7.5% |
| 2 | sequential-awaited (2000) | 31.794 ms | 32.603 ms | +2.5% |
| 2 | concurrent-burst (10×200) | 24.089 ms | 25.138 ms | +4.4% |
| 2 | transaction-guardrail (50×10) | 4.093 ms | 4.342 ms | +6.1% |

Sequential-awaited flips sign between passes (−2.3% / +2.5%);
transaction-guardrail also flips sign, at similar magnitude (−7.5% /
+6.1%); concurrent-burst is consistently a small positive (+5.5% /
+4.4%) but inside the round-to-round spread on the same lane.

Excluding round-0 warmup outliers (each shape has a ~2× first-round

spike from cold spawn) does not change the picture: sequential-awaited

moves to −1.5% / +1.7%, same alternating-sign pattern.
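The delta column and the sign-flip reading can be rechecked mechanically. The sketch below assumes delta is defined as (candidate − baseline) / baseline, which matches the reported percentages; `deltaPercent` and `flipsSign` are hypothetical helpers, and the numbers are copied from the results table.

```dart
/// Relative change of candidate versus baseline, in percent.
double deltaPercent(double baselineMs, double candidateMs) =>
    (candidateMs - baselineMs) / baselineMs * 100;

/// True when the two passes disagree on the direction of the change.
bool flipsSign(double pass1, double pass2) => pass1.sign != pass2.sign;

void main() {
  // Per shape: [baseline ms, candidate ms] for pass 1, then for pass 2.
  const results = <String, List<List<double>>>{
    'sequential-awaited (2000)': [
      [32.169, 31.425],
      [31.794, 32.603],
    ],
    'concurrent-burst (10x200)': [
      [24.465, 25.818],
      [24.089, 25.138],
    ],
    'transaction-guardrail (50x10)': [
      [4.587, 4.244],
      [4.093, 4.342],
    ],
  };

  for (final entry in results.entries) {
    final d1 = deltaPercent(entry.value[0][0], entry.value[0][1]);
    final d2 = deltaPercent(entry.value[1][0], entry.value[1][1]);
    print('${entry.key}: ${d1.toStringAsFixed(1)}% / '
        '${d2.toStringAsFixed(1)}% flips sign: ${flipsSign(d1, d2)}');
  }
}
```

A shape whose delta changes direction when the run order is flipped is more plausibly tracking phase order or jitter than the candidate change itself.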

`dart analyze lib`: clean. `dart test test/database_test.dart
test/stream_test.dart`: 75/75 pass.

Decision

Rejected — below the focused-benchmark noise floor across two

order-flipped passes. The theoretical 6–12% headroom did not show up

in measured wall time; the deltas alternate sign between passes at the

same magnitude as round-to-round variance on a single phase. That is

the signature of "the optimization is at or below the floor of this

harness," not "the optimization regressed."

No runtime code kept. The implementation pattern (sync field plus null-

coalesce on await) is preserved here as evidence so a future runner can

tell whether their new evidence changes the calculus.

Future Notes

the writer round-trip (~15 µs in the same harness). Saving one hop

per call is ~2–4 ms across 2000 sequential writes — at or below this

harness's per-round spread. Reopen if a future workload measures the

sequential-write floor cleanly enough to resolve sub-1µs/call deltas

(e.g., a 10k-write lane with tighter inter-round variance, or a

Tracelite scenario whose primary metric tracks per-call dispatch

cost).

reduction (exp 148, exp 151, exp 145): callback / scheduling counters

may move but measured-elapsed does not. Once the per-call overhead

drops below ~2 µs the focused harness can't separate it from machine

jitter.

(group commit) or a different transport" remains the correct

framing: the round-trip floor is the floor, and trimming individual

microtask hops at the Database layer above the writer does not move

it. The next candidate in this direction should be group-commit-

shaped, not hop-shaped.

can resolve sub-1µs per-call deltas on real workloads. Repeating the

measurement against the same harness will reproduce the same

alternating-sign noise.