Experiment 182: Skip preupdate dependency tracking when no streams active
Date: 2026-06-17
Status: Rejected
Direction: stream-rerun-dispatch
Archive: archive/exp-182
Benchmark Run: Focused dep_tracking_skip.dart (two order-flipped passes) + release-suite single-pass A/B + Tracelite stream-rerun-dispatch decision; see Results.
Problem
After exp 159 cleared the writer reply/request path, exp 147's residual
writer/request bucket on stream-active workloads stayed the largest
remaining slice. The recent rejection cluster (exp 170 Mutex.tryLock +
non-async Writer.execute, exp 171 cached _resolvedRuntime, exp 151
Completer<T>.sync() writer response) attacked that bucket by trimming
microtask hops; all three sat at or below the focused-harness noise floor.
The JOURNAL entry ["Mirroring a rejected experiment on the symmetric path
does not reopen its rejection"](JOURNAL.md) calls for a mechanism change,
not another scheduling tweak.
Looking elsewhere in the per-write path, the writer always populates a
dirty table / column set via SQLite's preupdate_hook
(`native/resqlite.c:preupdate_hook → dirty_set_add + dirty_columns_add_for_active_stmt`)
on every row mutation, then harvests it back to Dart at reply via
getDirtyTableDependencies (two FFI calls + toDartString per dirty
entry + TableDependency construction). When no streams are registered,
StreamEngine.onDependencyChanges short-circuits with _entries.isEmpty
— the harvest result is consumed by an immediate return. The accumulation
work in C and the marshalling work in Dart are then both wasted.
Hypothesis
Toggling preupdate_hook off at the C level (and skipping the reply
harvest) when _streamEngine.length == 0 at request-send time should
reduce per-row preupdate cost on wide / batch writes and the per-write
harvest cost on single-row writes, without affecting workloads that have
streams active. Acceptance criterion (set before running): Single Inserts
and Concurrent Single Inserts move at least 3% on the release suite with
neutral keyed-PK / A11c / many-streams measured-elapsed under Tracelite.
Approach
- Native (native/resqlite.c, native/resqlite.h). Added a
track_dirty field to resqlite_db (default 1) and a fast
`if (!sdb->track_dirty) return;` at the top of preupdate_hook so the
per-row dirty_set_add + dirty_columns_add_for_active_stmt work is
skipped wholesale when off. A new FFI export,
resqlite_set_track_dirty(db, enabled), lets the writer isolate flip
the flag.
- Bindings (lib/src/native/resqlite_bindings.dart). FFI binding
resqliteSetTrackDirty(db, int), mirrored from the header.
- Request shape (lib/src/writer/write_worker.dart). Added a
tracksDirty: bool field to ExecuteRequest, BatchRequest,
CommitRequest, plus a new DrainRequest no-op barrier. The writer
isolate caches the native flag in _WriterState.nativeTrackEnabled
and uses _ensureTrackDirty(state, enabled) to call the FFI only on
transitions.
- Handler gating.
  - _handleExecute / _handleBatch at txDepth == 0: track and harvest
    follow the request flag. When off, the harvest call is replaced with
    TableDependencies.none (the C dirty set was never populated).
  - In-transaction handlers always force track on (BEGIN is the natural
    enable point) because the harvest decision must wait for COMMIT-time
    stream presence.
  - _handleCommit at newDepth == 0: harvests if the flag is set,
    otherwise calls discardDirtyTableDependencies to drain the
    accumulated state without marshalling it back.
- Send-side gate (lib/src/writer/writer.dart). Writer.execute,
executeBatch, and the commit half of transaction read
_streamEngine.length > 0 at send time. On a false reading they flip
_sentUntracked = true.
- Race fence (lib/src/stream_engine.dart + lib/src/database.dart).
The writer exposes drainIfUntracked which sends a DrainRequest
only when _sentUntracked == true (the no-op message lands behind
all in-flight requests in the worker's FIFO; its reply implies they
all committed). StreamEngine calls it from _createStream's
Future.sync body right before the initial reader-pool query, so a
newly registered stream cannot race against an in-flight write that
was sent with tracksDirty = false.
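
The race fence in the last bullet relies only on FIFO ordering: a no-op queued behind earlier requests cannot be answered before them. The toy model below is a sketch of that argument in C, not the Dart implementation. Every name in it (`writer`, `writer_send_write`, `writer_drain_if_untracked`, `worker_run_until_drain`) is hypothetical, and clearing the flag once the barrier is queued is an assumption the source does not state.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not the real writer API. */
enum req_kind { REQ_WRITE, REQ_DRAIN };

#define FIFO_CAP 16

typedef struct {
    enum req_kind fifo[FIFO_CAP]; /* worker inbox, strictly first-in first-out */
    size_t head;                  /* next request the worker will process */
    size_t tail;                  /* next free slot on the send side */
    int sent_untracked;           /* plays the role of _sentUntracked */
    int committed;                /* writes the worker has finished */
} writer;

void writer_init(writer *w) {
    assert(w != NULL);
    w->head = 0;
    w->tail = 0;
    w->sent_untracked = 0;
    w->committed = 0;
}

/* Send a write; remember when it went out without dirty tracking. */
int writer_send_write(writer *w, int has_streams) {
    if (w->tail == FIFO_CAP) return 0;   /* inbox full in this toy model */
    w->fifo[w->tail++] = REQ_WRITE;
    if (!has_streams) w->sent_untracked = 1;
    return 1;
}

/* Queue the no-op barrier only if an untracked write may be in flight. */
int writer_drain_if_untracked(writer *w) {
    if (!w->sent_untracked) return 0;    /* nothing to fence */
    if (w->tail == FIFO_CAP) return 0;   /* inbox full in this toy model */
    w->fifo[w->tail++] = REQ_DRAIN;
    w->sent_untracked = 0;               /* assumption: cleared once fenced */
    return 1;
}

/* Worker loop: returns how many writes had committed when the barrier
 * was reached, or -1 if the inbox emptied without seeing a barrier. */
int worker_run_until_drain(writer *w) {
    while (w->head < w->tail) {
        enum req_kind k = w->fifo[w->head++];
        if (k == REQ_DRAIN) return w->committed;
        w->committed++;
    }
    return -1;
}
```

In this model, three untracked writes followed by `writer_drain_if_untracked` make `worker_run_until_drain` report three commits at the barrier. That is the property the initial stream query depends on. When no untracked write was sent, the barrier is skipped entirely.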
No public API change; native default keeps the existing behavior so any
caller that does not opt into the new flag (e.g., the older binding
prebuilds) gets identical semantics.
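
As a rough illustration of the native gate and the transition-only flip, the following C sketch models both with stand-in types. It is not the code in native/resqlite.c. The struct layouts, the `dirty_adds` counter (standing in for the dirty_set_add + dirty_columns_add_for_active_stmt work), the simplified hook signature, and the `writer_state` / `ensure_track_dirty` names are assumptions made for the example. SQLite's real preupdate callback takes more parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for resqlite_db; only the fields needed here. */
typedef struct {
    int track_dirty;  /* 1 = accumulate dirty state (default), 0 = skip */
    int dirty_adds;   /* counts the per-row accumulation work performed */
} resqlite_db;

void resqlite_db_init(resqlite_db *db) {
    assert(db != NULL);
    db->track_dirty = 1;  /* default keeps the pre-experiment behavior */
    db->dirty_adds = 0;
}

/* Plays the role of the resqlite_set_track_dirty(db, enabled) export. */
void resqlite_set_track_dirty(resqlite_db *db, int enabled) {
    assert(db != NULL);
    db->track_dirty = enabled ? 1 : 0;
}

/* Simplified per-row callback: one early return skips all dirty work. */
void preupdate_hook(void *ctx, int op, const char *table) {
    resqlite_db *sdb = (resqlite_db *)ctx;
    (void)op;
    (void)table;
    if (!sdb->track_dirty) return;
    sdb->dirty_adds++;
}

/* Writer-side mirror of the native flag, so the setter is called only
 * when the requested state differs from the cached one. */
typedef struct {
    resqlite_db *db;
    int native_track_enabled;
    int ffi_calls;  /* how many times the setter was actually invoked */
} writer_state;

void writer_state_init(writer_state *st, resqlite_db *db) {
    assert(st != NULL && db != NULL);
    st->db = db;
    st->native_track_enabled = db->track_dirty;
    st->ffi_calls = 0;
}

void ensure_track_dirty(writer_state *st, int enabled) {
    enabled = enabled ? 1 : 0;
    if (st->native_track_enabled == enabled) return;  /* no transition */
    resqlite_set_track_dirty(st->db, enabled);
    st->native_track_enabled = enabled;
    st->ffi_calls++;
}
```

Repeated requests with the same flag value cost one integer comparison on the writer side, and a row mutation with tracking off costs one branch in the hook. Both are per-call checks that remain on the path even when tracking stays on.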
dart test test/ — 303 passed, including the stream suite
(stream_test.dart, stream_invalidation_coalescing_test.dart,
stream_dependency_shapes_test.dart, stream_cache_hit_reliability_test.dart,
stream_trigger_cascade_test.dart) and the writer / transaction
suites.
Results
Focused benchmark (benchmark/experiments/dep_tracking_skip.dart, 9 rounds + 3 warmup, two order-flipped passes)
Pass 1 (baseline first):
| Lane | Baseline | Candidate | Δ |
|---|---|---|---|
| sequential-awaited (2000 writes, no streams) | 28.065 ms | 27.006 ms | −3.8% |
| wide-batch-no-streams (10000 rows × 20 params) | 12.104 ms | 11.816 ms | −2.4% |
| tx-loop-no-streams (50 tx × 100 writes) | 21.950 ms | 22.856 ms | +4.1% |
| with-streams guardrail (1000 writes, 1 stream) | 19.426 ms | 19.942 ms | +2.7% |
Pass 2 (candidate first):
| Lane | Candidate | Baseline | Δ (cand vs base) |
|---|---|---|---|
| sequential-awaited (2000 writes, no streams) | 26.977 ms | 28.481 ms | −5.3% |
| wide-batch-no-streams (10000 rows × 20 params) | 12.258 ms | 13.032 ms | −5.9% |
| tx-loop-no-streams (50 tx × 100 writes) | 22.834 ms | 21.814 ms | +4.7% |
| with-streams guardrail (1000 writes, 1 stream) | 19.524 ms | 18.820 ms | +3.7% |
The two no-stream non-tx lanes agree across both order-flipped passes —
real wins. The with-streams guardrail also agrees across passes in the
same direction (+2.7% / +3.7%), and benchmark/ab_drift_check.dart
classifies that signature as reproduced rather than drift — the
optimization carries measurable per-write overhead even when tracking is
on. The tx-loop lane is noisy (35–40% within-pass spread on both sides);
both passes lean +4% but cannot be distinguished from variance.
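
For reference, the Δ columns above are consistent with (candidate − baseline) / baseline. The short C sketch below reproduces that arithmetic along with a sign-agreement check across the two passes. The helper names are hypothetical, and the check is deliberately minimal: agreement in direction is necessary but not sufficient, as the tx-loop lane shows (both passes agree in sign, yet the within-pass spread is far larger than the delta).

```c
#include <assert.h>

/* Percent change of the candidate relative to the baseline. */
double pct_delta(double baseline_ms, double candidate_ms) {
    assert(baseline_ms > 0.0);
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0;
}

/* 1 when both passes moved the same way (both faster or both slower). */
int same_direction(double pass1_pct, double pass2_pct) {
    return (pass1_pct < 0.0 && pass2_pct < 0.0) ||
           (pass1_pct > 0.0 && pass2_pct > 0.0);
}
```

Feeding in the sequential-awaited lane (28.065 → 27.006 ms and 28.481 → 26.977 ms) gives roughly −3.8% and −5.3%, which agree in direction. The with-streams guardrail lane (19.426 → 19.942 ms and 18.820 → 19.524 ms) gives roughly +2.7% and +3.7%.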
Release-suite A/B (run_release.dart, single pass each side)
| Lane | Baseline | Candidate | Δ |
|---|---|---|---|
| Single Inserts (100 sequential) — resqlite | 1.54 ms | 1.46 ms | −5.2% (within MDE) |
| Concurrent Single Inserts (100 concurrent) | 1.04 ms | 1.00 ms | −3.8% (within MDE) |
| Batch Insert (1000 rows) | 0.41 ms | 0.39 ms | −4.9% (within MDE) |
| Batch Insert (10000 rows) | 4.00 ms | 3.73 ms | −6.8% (within MDE) |
| Wide Batch Insert (10000 rows × 20 params) | 12.80 ms | 12.60 ms | −1.6% (within MDE) |
| Batched Write Inside Transaction (100 rows / tx.execute loop) | 0.41 ms | 0.56 ms | +37% (single-pass noise at sub-1ms) |
Overall: 5 wins / 6 regressions / 156 neutral. Every flagged regression
is on a sub-1ms metric where the per-benchmark MDE (~±0.06 ms) is half
the absolute swing the metric takes between adjacent runs on this
machine. Without a second order-flipped pass these regressions cannot be
distinguished from per-run drift — same shape as the exp 144 single-pass
swing (19/18/124 vs 30/2/129 across reruns on a vendoring-only change).
Tracelite decision (run_tracelite_experiment.dart --direction=stream-rerun-dispatch --runs=3)
Decision: inconclusive. Primary metric (measured_elapsed_ns on
resqlite):
| Scenario | Baseline | Candidate | Δ | Max CV | Verdict |
|---|---|---|---|---|---|
| high-cardinality-fanout | 332 ms | 334 ms | +0.55% | 0.48% | neutral → pass |
| keyed-pk-subscriptions | 239 ms | 236 ms | −1.16% | 10.2% | too_noisy → inconclusive |
| many-streams-writer-throughput | 552 ms | 553 ms | +0.17% | 0.49% | neutral → pass |
Stream measured-elapsed is neutral on the two stable primaries and too
noisy on keyed-pk-subscriptions. But the warmup guardrail tells a
different story:
| Scenario | Baseline | Candidate | Δ | Max CV | Verdict |
|---|---|---|---|---|---|
| high-cardinality-fanout warmup | 25.7 ms | 29.0 ms | +12.4% | 13.1% | too_noisy |
| keyed-pk-subscriptions warmup | 19.6 ms | 26.6 ms | +35.9% | 11.7% | too_noisy |
| many-streams-writer-throughput warmup | 25.9 ms | 31.3 ms | +21.0% | 8.63% | too_noisy |
All three stream-direction warmups regressed by 12–36% in the same
direction. too_noisy removes the gate signal but the direction
matches the focused benchmark's with-streams regression: the candidate
adds measurable per-write overhead when streams are active, and the
warmup phase — where streams are being registered and the
drainIfUntracked barrier fires once per _entries-empty-to-nonempty
transition — concentrates it.
Decision
Rejected. The optimization works as designed on the no-stream write
shapes it targeted:
- ~4–5% on Single Inserts release lane (sub-MDE on the public guard, but
matched in the focused dep_tracking_skip.dart sequential lane at
−3.8% / −5.3% across order-flipped passes).
- ~2–6% on no-stream wide batch (−2.4% / −5.9% in the focused lane; the
wider the row, the more preupdate fires the C-level flag skips per batch).
But the per-call gate it adds on the stream-active path costs back what
it earns:
- ~3% slower on the focused with-streams guardrail lane across both
order-flipped passes (reproduced, not drift).
- Tracelite stream-direction warmup elapsed +12–36% across all three
scenarios in the same direction (gated too_noisy, but the matched
direction is the same signal the focused harness shows).
Resqlite's primary use case is reactive streams, so the workload mix in
which the candidate helps (write-heavy without active streams) is
narrower than the one it slows (any window of writes with at least one
active stream). Net unfavorable on the realistic mix. The implementation
also brings complexity that has no upside when tracking is on: a
tracksDirty field threaded through three request types, a new DrainRequest
barrier, a Dart-side _sentUntracked flag, and a writer-isolate mirror
of the native flag with FFI-call elision.
Acceptance criterion did not clear: the Tracelite primary gate stayed
inconclusive (the keyed-pk-subscriptions primary was too noisy, and all
three stream warmups regressed beyond the 5% noise gate even
though too_noisy precluded a hard fail).
Would reopen if a real workload shows that write throughput without
active streams is on a hot path the library should care about. At that
point the focused dep_tracking_skip.dart harness gives a clean signal
for the no-stream lanes, and the with-streams overhead would need a
different mitigation (e.g., reading the streamEngine flag once per request
build via a cached bool hasStreams on the writer to amortize the check, or
restricting the gate to single-row standalone writes where the per-write
overhead is smallest).
Future Notes
- archive/exp-182 keeps the full implementation reachable for
cherry-pick; the per-row preupdate skip is the load-bearing piece for
any future revisit.
- The DrainRequest no-op barrier is also useful on its own as a
test-side fence (e.g., "drain the writer FIFO so the next assertion
sees post-commit state"). If a future experiment needs it, factor it
out of this archive separately.
- The track_dirty field and resqlite_set_track_dirty FFI are sound;
if a follow-up needs to selectively disable the preupdate hook (e.g.,
for an admin-only bulk-load API that opts out of stream invalidation),
the C-side mechanism here is the starting point.