Experiment 083: Stream rerun pre-dispatch queue

Date: 2026-04-20

Status: In Review

PR: #25

Problem

High-fan-out stream scenarios (A11, A11b) were still spending a large

amount of time on reruns that eventually finished stale.

The new stream timing counters showed the cost was not in worker
execution itself: it was mostly reader-pool wait time. Under bursty
writes, many

different streams each contributed one rerun, and those reruns sat inside

the generic ReaderPool queue long enough to become stale before they

mattered.

Hypothesis

If reruns are coalesced before they enter ReaderPool, instead of
after they are already waiting for a reader, then stale reruns should be
dropped before they ever occupy a reader slot, without changing observable
stream semantics.

Approach

Added a bounded pre-dispatch rerun queue in StreamEngine.

Instead of immediately dispatching every rerun request to ReaderPool,
requests first land in this queue, where requests for the same stream are
coalesced and any entry superseded by a newer writeGen is dropped before a
reader is ever consumed.
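As a sketch of the idea only (Python, with hypothetical names; the real StreamEngine/ReaderPool API is not shown in this log), a bounded pre-dispatch queue that coalesces per stream and drops writeGen-stale entries at drain time might look like:

```python
from collections import OrderedDict


class PreDispatchRerunQueue:
    """Bounded queue that coalesces rerun requests per stream before
    they reach the reader pool. Illustrative sketch, not the real code."""

    def __init__(self, capacity=256):
        self.capacity = capacity
        self._pending = OrderedDict()  # stream_id -> latest known write_gen

    def enqueue(self, stream_id, write_gen):
        """Coalesce: a newer request for the same stream replaces the
        older one instead of occupying a second slot."""
        if stream_id in self._pending:
            self._pending[stream_id] = max(self._pending[stream_id], write_gen)
            return
        if len(self._pending) >= self.capacity:
            # Bounded: one possible overflow policy is to refuse and let the
            # caller dispatch immediately instead.
            raise OverflowError("pre-dispatch queue full")
        self._pending[stream_id] = write_gen

    def drain(self, current_gen_of):
        """Hand entries to the reader pool, dropping any that went stale
        (a newer write generation has landed since the request was queued)."""
        fresh = [
            (stream_id, gen)
            for stream_id, gen in self._pending.items()
            if current_gen_of(stream_id) == gen
        ]
        self._pending.clear()
        return fresh
```

The key property is that stale work is discarded before a reader slot is consumed, rather than after the rerun has already waited in the pool queue.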

Supporting observability was added so the decision could be based on
measured queue wait, worker execution, and completion time rather than on
wall-clock time alone.
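The three-way split those counters provide can be sketched as follows (Python, hypothetical field names; the log only states that queue wait, worker execution, and completion are measured separately):

```python
from dataclasses import dataclass


@dataclass
class RerunTimings:
    """Timing counters for one rerun (illustrative names, not the real
    fields). Timestamps are monotonic nanoseconds, e.g. time.monotonic_ns()."""
    enqueued_ns: int = 0   # rerun request entered the queue
    started_ns: int = 0    # a pool reader picked it up
    finished_ns: int = 0   # worker execution finished
    completed_ns: int = 0  # result handed back / applied

    @property
    def pool_wait_us(self) -> float:
        # Time spent queued before any reader was available.
        return (self.started_ns - self.enqueued_ns) / 1_000

    @property
    def exec_us(self) -> float:
        # Actual worker execution time.
        return (self.finished_ns - self.started_ns) / 1_000

    @property
    def completion_us(self) -> float:
        # Time from worker finish to result completion.
        return (self.completed_ns - self.finished_ns) / 1_000
```

Separating pool wait from execution is what lets the results below attribute the cost to queued reruns waiting on readers rather than to the rerun work itself.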

Results

Scenario profiler: direct bottleneck hit

Compared to the pre-queue observability baseline:

| Scenario | Metric            | Baseline | Pre-dispatch queue |
|----------|-------------------|----------|--------------------|
| A11      | reruns started    | 1024     | 737                |
| A11      | stale reruns      | 968      | 553                |
| A11      | pool wait / rerun | 2155 us  | 0.3 us             |
| A11b     | reruns started    | 1123     | 705                |
| A11b     | stale reruns      | 1020     | 552                |
| A11b     | pool wait / rerun | 3880 us  | 0.1 us             |

Worker execution stayed small in both cases (roughly 44-46 us per rerun),

which confirms the optimization is hitting the real bottleneck: queued

reruns waiting on readers.

Real suite sections: broad behavior stays in band

Three alternating baseline/candidate pairs were run. Median summary:

| Scenario               | Baseline   | Candidate  |
|------------------------|------------|------------|
| A6 Feed Reactive       | 111.991 ms | 111.826 ms |
| A11 Keyed PK           | 225.37 ms  | 217.35 ms  |
| A11b High-card fan-out | 427.35 ms  | 229.49 ms  |
| A7 bulk burst          | 56.59 ms   | 54.38 ms   |
| A7 merge rounds        | 3.02 ms    | 3.17 ms    |

Interpretation:

A11b High-card fan-out shows the expected collapse (427.35 ms to
229.49 ms, roughly a 46% reduction), A11 improves modestly, and the
remaining scenarios stay within run-to-run noise, so the queue does not
regress non-fan-out paths.

Primary Metrics

Guardrail Metrics

Decision

Keep this in review.

This is the first stream scheduler change that directly attacked the

measured bottleneck instead of changing timing heuristics around it. The

important learning is: earlier timing-heuristic changes added delays after
reruns were already in flight, while this queue keeps stale reruns from
ever occupying a reader, which is where the measured wait actually was.

The remaining review question is code complexity versus benefit, not

whether the optimization is hitting the right layer.