Experiment 178: Missing-benchmark-run declaration guard

Date: 2026-06-16

Status: In Review

Direction: measurement-system

Benchmark Run: none (CI guardrail / methodology tooling; no runtime code, structural counter evidence below)

Problem

The resqlite-experiment contract ties every chartable experiment to a benchmark
result file: docs/experiments/history.json is built by mapping each
experiment to a benchmark/results/<ISO-timestamp>-<label>.md run, and a chart
point appears only when the experiment doc's Date: matches at least one
run's filename-timestamp date. The skill spells out the failure mode directly:

> Drop the result file and the experiment is invisible on the chart (though it
> still shows in the text list).
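A minimal Dart sketch of the date-matching rule, to make the failure mode concrete. It is not the generator's code: resultFileDate and hasChartPoint are hypothetical names, and the sketch assumes only that each result filename begins with a YYYY-MM-DD date, as an ISO-timestamp prefix does.

```dart
// Assumption: result filenames start with the run's date in YYYY-MM-DD form,
// as an ISO-8601 timestamp prefix does (e.g. "2026-06-16T10-30-00Z-candidate.md").
final RegExp _leadingDate = RegExp(r'^(\d{4}-\d{2}-\d{2})');

/// Hypothetical helper: the calendar date encoded in a result filename,
/// or null if the name does not start with one.
String? resultFileDate(String fileName) =>
    _leadingDate.firstMatch(fileName)?.group(1);

/// Hypothetical helper: true when at least one run's filename date equals the
/// experiment doc's Date: value, i.e. the experiment would get a chart point.
bool hasChartPoint(String docDate, Iterable<String> resultFileNames) =>
    resultFileNames.any((name) => resultFileDate(name) == docDate);
```

Under this rule, a run whose filename date differs from the doc's Date: by even one day returns false, so the experiment silently loses its chart point; that is the date-mismatch case the guard is meant to surface.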

generate_history.dart already guards the wrong-file version of this: when an
Accepted experiment's chart slot points at a baseline-shaped run while a
candidate-shaped run also exists for the same date,
_assertAcceptedExperimentsLinkToCandidates fails the build (the exp-109 chart
mix-up). The repo also has the intended opt-out signal for experiments that
genuinely have no release run: a Benchmark Run: header reading none, n/a,
not applicable, or tracelite, recognised by
_skipsReleaseBenchmarkRunMapping. Tracelite A/B and focused-harness
experiments use it (exp 149, 169, 174, 177).
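The opt-out recognition can be sketched the same way. This assumes the header is a single Benchmark Run: line whose value starts with one of the four listed values, optionally followed by free text such as this document's parenthetical; declaresNoReleaseRun is a hypothetical stand-in, and the repo's _skipsReleaseBenchmarkRunMapping may parse the header differently.

```dart
// Assumption: the opt-out is a single "Benchmark Run:" line whose value starts
// with none, n/a, not applicable or tracelite, optionally followed by free text
// such as "(CI guardrail / methodology tooling ...)".
final RegExp _benchmarkRunHeader =
    RegExp(r'^Benchmark Run:[ \t]*(.*)$', multiLine: true);

// The trailing \b stops e.g. "nonexistent" from counting as "none".
final RegExp _optOutValue =
    RegExp(r'^(none|n/a|not applicable|tracelite)\b', caseSensitive: false);

/// Hypothetical stand-in for _skipsReleaseBenchmarkRunMapping.
bool declaresNoReleaseRun(String docText) {
  final header = _benchmarkRunHeader.firstMatch(docText);
  if (header == null) return false; // no header at all: not an opt-out
  final value = header.group(1)!.trim();
  return _optOutValue.hasMatch(value);
}
```

Anchoring the match at the start of the value and requiring a word boundary keeps real run references and look-alike words from being read as an opt-out.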

What is not guarded is the most common silent case: a chartable experiment
whose linker found no run at all and which never declared that absence.
That is exactly what happens when a runner forgets to commit the result file, or
commits it with a timestamp whose date does not match the doc's Date:. The
experiment then drops off the chart with no error anywhere in CI, which makes it
indistinguishable from a deliberate "no release run".

Structural evidence the gap is live

Running the generator on current main and tallying experiments whose linker
produced a null benchmark run, by status:

| Status | Null-run count | Declares Benchmark Run:? |
|---|---|---|
| accepted | 7 | 0 of 7 |
| in_review | 16 | 5 of 16 (121, 143, 147, 149, 169), plus 174/177 (focused/none) |
| rejected | 25 | out of scope (see below) |

Of the 23 chartable (accepted + in_review) experiments with no linked run, only
~5 declare the opt-out header; the other ~17 are silently unmapped. Several
predate the per-experiment result-file convention (003–038), but several recent
ones do not (116, 118, 119, 125, 126, 136, 161, 172): they were measured with
Tracelite or focused harnesses and simply never declared it. Today, a forgotten
result file produces the same null, with no signal that anything is wrong.
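The tally in the table above is a group-by over the linker's per-experiment results. A sketch, assuming a simplified, hypothetical ExperimentRow record (the generator's real data model is not shown in this write-up) and using Dart 3 records for each (null-run count, declared count) pair:

```dart
/// Hypothetical, simplified view of one experiment after run linking.
class ExperimentRow {
  const ExperimentRow(this.number, this.status,
      {required this.hasRun, required this.optedOut});

  final int number;
  final String status; // 'accepted', 'in_review' or 'rejected'
  final bool hasRun; // did the linker find a benchmark/results run?
  final bool optedOut; // does the doc declare a Benchmark Run: opt-out?
}

/// Per status: (experiments with a null run, how many of those opted out).
Map<String, (int, int)> tallyNullRuns(Iterable<ExperimentRow> rows) {
  final tally = <String, (int, int)>{};
  for (final row in rows) {
    if (row.hasRun) continue; // only null-run experiments are counted
    final (nullRuns, declared) = tally[row.status] ?? (0, 0);
    tally[row.status] = (nullRuns + 1, declared + (row.optedOut ? 1 : 0));
  }
  return tally;
}
```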

Hypothesis

The "no run AND no declaration" case is a small, deterministic, structural rule.
Encoding it as a build-time guard, alongside the existing
_assertAcceptedExperimentsLinkToCandidates, turns the silent
chart-invisibility failure into a loud CI error for new work, without any
runtime behavior change and without depending on noisy wall-time numbers.

Approach

opted out via Benchmark Run: (it already collected them into
skipRunMappingIds; previously the set was discarded after mapping).

optOuts, {cutoff}): returns one issue line per accepted/in-review
experiment whose number is >= cutoff, whose linker found no run, and which
is not in the opt-out set. Pure and parameterised, so it is unit-testable
without the exit(1) side effect.

exit(1)s, mirroring the existing baseline-link assertion. Wired into
buildHistoryData right after _assertAcceptedExperimentsLinkToCandidates,
so it runs in both generate_history.dart and the CI
check_generated_data.dart job.

grandfathered (same pattern as signals.json's
experimentEntriesRequiredFrom: 110), so no retroactive edits to the ~17 older
docs are required. The cutoff is documented to change only alongside a
backfill pass.

experiments commonly and legitimately carry no release run, so they are not
in scope.

group with six tests: fires on a >= cutoff chartable experiment with no run
and no opt-out; stays silent when the opt-out header is declared, when a run
is linked, when below cutoff, and for rejected experiments; plus a regression
anchor that runs the real experiments/ tree through the detector and asserts
it stays clean at the shipped cutoff.
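The detector and its exit(1) wrapper described above can be sketched as follows. Everything named here is hypothetical: ExperimentEntry, findUndeclaredMissingRuns, assertBenchmarkRunsDeclared and the int-keyed opt-out set stand in for whatever generate_history.dart actually uses. Only the filtering rule (chartable status, number >= cutoff, no linked run, not opted out) and the cutoff of 178 come from this write-up.

```dart
import 'dart:io';

/// Hypothetical, simplified view of one experiment as the generator sees it.
class ExperimentEntry {
  const ExperimentEntry(this.number, this.status, {this.benchmarkRunId});

  final int number;
  final String status; // 'accepted', 'in_review' or 'rejected'
  final String? benchmarkRunId; // null when the linker found no run
}

/// Hypothetical stand-in for _benchmarkRunDeclarationCutoff: lower numbers
/// are grandfathered.
const int benchmarkRunDeclarationCutoff = 178;

/// Only these statuses are chartable; rejected experiments are out of scope.
const Set<String> chartableStatuses = {'accepted', 'in_review'};

/// Pure detector: one issue line per chartable experiment at or above [cutoff]
/// that has no linked run and is not in the opt-out set. No side effects.
List<String> findUndeclaredMissingRuns(
  Iterable<ExperimentEntry> experiments,
  Set<int> optOuts, {
  int cutoff = benchmarkRunDeclarationCutoff,
}) =>
    [
      for (final e in experiments)
        if (chartableStatuses.contains(e.status) &&
            e.number >= cutoff &&
            e.benchmarkRunId == null &&
            !optOuts.contains(e.number))
          'experiment ${e.number} (${e.status}): no benchmark run linked and '
              'no Benchmark Run: opt-out declared',
    ];

/// Side-effecting wrapper: report every issue on stderr, then fail the build.
void assertBenchmarkRunsDeclared(
    Iterable<ExperimentEntry> experiments, Set<int> optOuts) {
  final issues = findUndeclaredMissingRuns(experiments, optOuts);
  if (issues.isEmpty) return;
  for (final issue in issues) {
    stderr.writeln(issue);
  }
  exit(1);
}
```

Keeping the filter pure is what lets unit tests exercise every branch directly; only the wrapper touches stderr and the process exit code.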

This is methodology tooling in the class of exp 161 / 169 / 177: it changes no
runtime code (lib/, native/, hook/ untouched) and is not a
measurement-that-unlocks-an-implementation, so the paired-run carry rule does
not apply; the deliverable is the guard itself.

Results

Structural / behavioral evidence (machine load is irrelevant because no
wall-time claim is made):

| Check | Result |
|---|---|
| Guard fires on synthetic missing-run exp >= 178 | yes (1 issue) |
| Guard silent when Benchmark Run: opt-out declared | yes |
| Guard silent when a run is linked | yes |
| Guard silent below cutoff (177, 116) | yes |
| Guard silent for rejected experiments | yes |
| Live experiments/ tree at cutoff 178 | clean (no false positive) |
| dart run benchmark/generate_history.dart | history.json is current (unchanged) |
| dart run benchmark/check_generated_data.dart | up to date |
| dart run benchmark/check_experiment_signals.dart | valid |
| dart test test/benchmark_pipeline_test.dart | 14/14 pass |

The generated history.json is byte-for-byte unchanged: the guard adds a check,
not a field, and is a no-op on the current tree (the only chartable experiment
>= 178 is this one, which declares Benchmark Run: none, and all pre-178
experiments are grandfathered).

Decision

Accepted (In Review). A bounded, no-runtime-code CI guardrail that converts
a documented silent failure (forgotten result file or date mismatch →
invisible chart point, no error) into a loud build failure for new experiments,
reusing the repo's existing Benchmark Run: opt-out signal as the escape
hatch. Would reopen or extend if a backfill pass annotates the ~17 pre-178
silently unmapped chartable experiments (the cutoff could then drop toward the
start of the result-file convention), or if rejected experiments later need
chart coverage (the status scope would then widen).

Future Notes

the older unmapped docs with Benchmark Run: (or commit their missing
result files) and lower _benchmarkRunDeclarationCutoff in the same change.

experiments page ever charts rejected experiments that ship a result file,
revisit the scope rather than the cutoff.