Experiment 143: Tracelite profile insight audit

Date: 2026-06-08

Status: In Review

Direction: measurement-system

Benchmark Run: None

Problem

Resqlite now routes its preferred profile workflow through Tracelite, but a
scheduled experimenter still needs to know whether that path produces more than
a trace file. The useful question for this pass was not "can Tracelite run?" It
was whether the current Tracelite artifacts make performance characteristics
clear enough to guide future optimization work.

Hypothesis

A pinned Tracelite profile run should provide decision-useful structure that a
release peer comparison does not: dispatch floors, floor-subtracted work,
operation tails, memory deltas, SQLite diagnostics, allocation counters, source
provenance, and graph data. If the generated insight layer is strong enough, a
future runner should be able to read insights.md before opening raw JSON.

Approach

Ran the canonical profile wrapper twice on current origin/main with the pinned
Tracelite checkout and ARM64 Dart runtime:

    /Users/dan/Coding/flutter_arm64/bin/cache/dart-sdk/bin/dart run \
      benchmark/profile/run_tracelite_profile.dart \
      --tracelite-root=/Users/dan/Coding/tracelite \
      --dart=/Users/dan/Coding/flutter_arm64/bin/cache/dart-sdk/bin/dart \
      --label=exp-143-profile-baseline \
      --out-dir=build/tracelite-profile/exp-143-profile-baseline

    /Users/dan/Coding/flutter_arm64/bin/cache/dart-sdk/bin/dart run \
      benchmark/profile/run_tracelite_profile.dart \
      --tracelite-root=/Users/dan/Coding/tracelite \
      --dart=/Users/dan/Coding/flutter_arm64/bin/cache/dart-sdk/bin/dart \
      --label=exp-143-profile-repeat \
      --out-dir=build/tracelite-profile/exp-143-profile-repeat \
      --no-graph-data

The first run exported graph data and validated it. The repeat skipped graph
export and was used only to check whether the decomposition was stable.

Raw trace regions remain in build/ and are not committed. The aggregate record
is committed at
benchmark/profile/results/exp-143-tracelite-profile-insights.md.

Results

Tracelite profile artifacts:

The full baseline graph export validated successfully and produced:

| dataset | rows |
|---|---|
| workload_summary | 4 |
| workload_operations | 41 |
| workload_memory | 132 |
| workload_fanout | 0 |

The headline profile numbers:

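| run | workload | op | p50 us | p99 us | max us | work us | rss delta MB | wal delta B | rows decoded | cells decoded |
|---|---|---|---|---|---|---|---|---|---|---|
| baseline | noop | select | 12 | 93 | 459 | - | 1.531 | 0 | 10000 | 10000 |
| baseline | single_insert | execute | 20 | 170 | 7829 | 4 | 14.547 | 1713920 | 0 | 0 |
| baseline | point_query | select | 11 | 78 | 2048 | 0 | 18.297 | 0 | 50000 | 300000 |
| baseline | merge_rounds | executeBatch | 93 | 890 | 4136 | 77 | 0.485 | 8240 | 0 | 0 |
| repeat | noop | select | 12 | 103 | 1416 | - | 2.907 | 0 | 10000 | 10000 |
| repeat | single_insert | execute | 21 | 106 | 4455 | 5 | 12.391 | 1713920 | 0 | 0 |
| repeat | point_query | select | 13 | 50 | 1402 | 1 | 20.281 | 0 | 50000 | 300000 |
| repeat | merge_rounds | executeBatch | 93 | 510 | 912 | 77 | 0.079 | 8240 | 0 | 0 |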

Noop floors were stable in both runs:

| run | reader floor us | writer floor us |
|---|---|---|
| baseline | 12 | 16 |
| repeat | 12 | 16 |

insights.md was much thinner than the structured data. It reported only:

| severity | finding | detail |
|---|---|---|
| good | Workload summaries loaded | 4 workload(s) are available for inspection. |

Analysis

Tracelite did demonstrate value, but most of that value is currently in the
structured artifacts rather than the generated prose.

The dispatch-floor split is immediately useful. Point queries sit within 1 us
of the 12 us reader floor (11 us and 13 us p50), with 0-1 us of
floor-subtracted work. That argues against more point-query SQL or decode
micro-optimization as the next target; the remaining median cost is
dispatch-shaped.

Merge rounds show the opposite shape. Their p50 is stable at 93 us, with 77 us
of floor-subtracted work in both runs. That is the clearest current target for
batch encoding, parameter packing, or SQLite step-path analysis.

Single inserts sit near the writer floor at the median: 20-21 us p50 against a
16 us writer floor. Tracelite also surfaces the storage side effect that wall
time alone would hide: WAL growth is stable at 1,713,920 bytes across runs.
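The floor-subtracted figures quoted above can be reproduced from the headline table and the noop floors. The sketch below is only an illustration, not Tracelite's implementation. It assumes that select operations are compared against the reader floor and execute/executeBatch against the writer floor, and that work is clamped at zero when a median lands below the floor (as the 11 us baseline point-query median does).

```python
# Dispatch floors (us) from the noop table; identical in both runs.
FLOOR_US = {"reader": 12, "writer": 16}

# Assumption: select ops are measured against the reader floor and
# execute/executeBatch ops against the writer floor.
OP_PATH = {"select": "reader", "execute": "writer", "executeBatch": "writer"}

# (workload, op, baseline p50 us, repeat p50 us) from the headline table.
P50_US = [
    ("single_insert", "execute", 20, 21),
    ("point_query", "select", 11, 13),
    ("merge_rounds", "executeBatch", 93, 93),
]


def floor_subtracted_work(p50_us: int, floor_us: int) -> int:
    """Median time left after removing the dispatch floor.

    Clamping at zero is an assumption, chosen so a median below the
    floor reports 0 us of work rather than a negative value.
    """
    return max(0, p50_us - floor_us)


def work_by_workload() -> dict:
    """Map each workload to its (baseline, repeat) work in us."""
    result = {}
    for workload, op, baseline, repeat in P50_US:
        floor = FLOOR_US[OP_PATH[op]]
        result[workload] = (
            floor_subtracted_work(baseline, floor),
            floor_subtracted_work(repeat, floor),
        )
    return result


if __name__ == "__main__":
    for workload, (baseline, repeat) in work_by_workload().items():
        print(f"{workload}: baseline={baseline} us, repeat={repeat} us")
```

Under those assumptions the computed values match the work us column for all three non-noop workloads in both runs.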

Point queries show why memory diagnostics matter. The median wall time says
"dispatch-bound," but the profile still records 50,000 rows and 300,000 cells
decoded plus roughly 18-20 MB RSS delta. That is useful signal for future
allocation-focused work where time alone would miss the cost.
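One rough way to put the point-query memory signal on a per-unit scale is to divide the RSS delta by the decoded cell count. This is a back-of-envelope sketch only: it assumes MB means 10^6 bytes, and RSS delta is a process-level figure that includes more than decoded cells, so the result is a coarse upper bound rather than an allocation measurement.

```python
# Assumption: the profile reports MB as 10**6 bytes (not MiB).
BYTES_PER_MB = 1_000_000


def rss_bytes_per_cell(rss_delta_mb: float, cells_decoded: int) -> float:
    """RSS growth divided by decoded cells; a coarse upper bound only."""
    if cells_decoded <= 0:
        raise ValueError("cells_decoded must be positive")
    return rss_delta_mb * BYTES_PER_MB / cells_decoded


if __name__ == "__main__":
    # point_query rows from the headline table: (run, rss delta MB, cells).
    for run, rss_mb, cells in [
        ("baseline", 18.297, 300_000),
        ("repeat", 20.281, 300_000),
    ]:
        print(f"{run}: {rss_bytes_per_cell(rss_mb, cells):.1f} bytes/cell")
```

Both runs land in the range of roughly 60 to 70 bytes of RSS growth per decoded cell, which is the kind of figure an allocation-focused experiment could track across changes.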

The repeat also confirms a known methodology caveat: tails are noisy. p99 and
max moved substantially between the two runs (merge-rounds p99 went from 890 us
to 510 us) while p50, dispatch floors, and work medians stayed stable. A future
p99 claim should use a multi-run A/B comparison.
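The gap between median stability and tail stability can be quantified with a simple run-to-run relative spread. The sketch below uses the p50 and p99 values from the headline table; the spread metric (absolute difference over the two-run mean) is an illustrative choice, not something Tracelite reports.

```python
# (baseline, repeat) values in us, copied from the headline table.
RUNS_US = {
    "single_insert": {"p50": (20, 21), "p99": (170, 106)},
    "point_query": {"p50": (11, 13), "p99": (78, 50)},
    "merge_rounds": {"p50": (93, 93), "p99": (890, 510)},
}


def relative_spread(a: float, b: float) -> float:
    """Absolute difference between two runs divided by their mean."""
    mean = (a + b) / 2
    return abs(a - b) / mean if mean else 0.0


def spreads() -> dict:
    """Map each workload to its p50 and p99 run-to-run spread."""
    return {
        workload: {
            metric: relative_spread(*values)
            for metric, values in metrics.items()
        }
        for workload, metrics in RUNS_US.items()
    }


if __name__ == "__main__":
    for workload, s in spreads().items():
        print(f"{workload}: p50 {s['p50']:.0%}, p99 {s['p99']:.0%}")
```

For every workload here the p99 spread is several times the p50 spread, which is why a single pair of runs supports median claims but not tail claims.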

Decision

Accept for review - measurement.

The Tracelite profile workflow is worth keeping as the default experiment path.
It captures the right low-level facts in one pinned, provenance-recorded run and
exports graph data that can feed the docs/dashboard path.

The follow-up is not another legacy profile harness. It is a Tracelite
interpretation improvement: tracelite explain should emit workload-summary
rules for dispatch-bound, work-bound, memory-heavy, and tail-noisy workloads so
future runners do not need to reverse-engineer those conclusions from JSON.
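To make the proposed follow-up concrete, here is a minimal sketch of what such workload-summary rules could look like. Everything in it is hypothetical: the `WorkloadSummary` shape, the `classify` function, and all three thresholds are illustrative choices, not Tracelite's schema or existing behavior.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WorkloadSummary:
    """Hypothetical per-workload input; not Tracelite's actual schema."""
    name: str
    p50_us: float
    floor_us: float
    rss_delta_mb: float
    p99_us_by_run: Tuple[float, ...]


def classify(
    s: WorkloadSummary,
    work_share: float = 0.5,   # hypothetical: work >= 50% of p50
    rss_mb: float = 10.0,      # hypothetical memory-heavy cutoff
    tail_ratio: float = 1.25,  # hypothetical max/min p99 across runs
) -> List[str]:
    """Return rule labels for one workload summary."""
    work_us = max(0.0, s.p50_us - s.floor_us)
    if work_us >= work_share * s.p50_us:
        labels = ["work-bound"]
    else:
        labels = ["dispatch-bound"]
    if s.rss_delta_mb >= rss_mb:
        labels.append("memory-heavy")
    lo, hi = min(s.p99_us_by_run), max(s.p99_us_by_run)
    if lo > 0 and hi / lo >= tail_ratio:
        labels.append("tail-noisy")
    return labels


if __name__ == "__main__":
    # Baseline medians and RSS deltas, plus p99 from both runs.
    summaries = [
        WorkloadSummary("point_query", 11, 12, 18.297, (78, 50)),
        WorkloadSummary("merge_rounds", 93, 16, 0.485, (890, 510)),
    ]
    for s in summaries:
        print(s.name, classify(s))
```

With these placeholder thresholds, point queries come out dispatch-bound, memory-heavy, and tail-noisy, while merge rounds come out work-bound and tail-noisy, matching the conclusions drawn by hand in the analysis. Real thresholds would need to be calibrated against more runs.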

Validation