Experiment 054: Profile-Guided Optimization (PGO)
Date: 2026-04-15
Status: Rejected (infrastructure limitation on macOS)
Problem
The -O3 compiler flag makes optimization decisions based on static heuristics (code structure, loop analysis). PGO uses actual execution profiles to make better decisions: branch prediction hints, hot/cold code layout, function ordering, and inlining thresholds. Published benchmarks show 10-15% speedup for SQLite workloads (ClickBench: -20%, Speedtest1: -12%).
Hypothesis
A two-pass build — instrument, run benchmarks, recompile with profile — should yield 10-15% improvement across all query paths by optimizing the code layout for actual access patterns in resqlite's workload (WAL reads, batch row stepping, JSON serialization).
Approach
Attempted a manual two-pass build:
- Phase 1: Compiled libresqlite.dylib with -fprofile-generate=.pgo using Clang directly (bypassing native_toolchain_c). Successfully produced an instrumented dylib.
- Phase 2: Swapped the instrumented dylib into the Dart build cache and ran the full benchmark suite with LLVM_PROFILE_FILE set to capture profile data.
- Phase 3 (never reached): Merge .profraw files with llvm-profdata merge and recompile with -fprofile-use.
Why It Failed
The LLVM profiling runtime registers __llvm_profile_write_file() via atexit() to flush profile data when the process exits. However, when the instrumented code is in a shared library (libresqlite.dylib) loaded by a host process (the Dart VM), the atexit handler in the dylib does not fire reliably on macOS:
- The Dart VM may dlclose() the library before exit, destroying the runtime state
- The VM may call _exit() instead of exit(), bypassing atexit handlers entirely
- On macOS, atexit handlers in dylibs have platform-specific ordering issues
No .profraw files were generated despite the instrumented dylib running successfully and producing correct benchmark results.
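The second hypothesis above (termination via _exit()) can be checked in isolation. The following is a minimal POSIX sketch, not a reproduction of the Dart VM's behavior, which this experiment did not confirm. The names handler_ran and fake_profile_flush are hypothetical; fake_profile_flush stands in for the flush routine that the profiling runtime registers with atexit().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int flush_fd = -1;

/* Hypothetical stand-in for the profile flush registered via atexit(). */
static void fake_profile_flush(void) {
    static const char msg[] = "flushed";
    ssize_t written = write(flush_fd, msg, sizeof msg - 1);
    (void)written;
}

/* Forks a child that registers the handler, then terminates with exit()
 * or _exit(). Returns 1 if the handler ran, 0 if it did not, and -1 if
 * the pipe or fork could not be set up. */
int handler_ran(int use_underscore_exit) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    fflush(NULL); /* keep the child from re-flushing the parent's stdio buffers */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);
        flush_fd = fds[1];
        if (atexit(fake_profile_flush) != 0) _exit(2);
        if (use_underscore_exit) _exit(0); /* skips atexit handlers */
        exit(0);                           /* runs atexit handlers */
    }

    close(fds[1]);
    char buf[16] = {0};
    ssize_t n = read(fds[0], buf, sizeof buf - 1); /* EOF if nothing was written */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n > 0 && strcmp(buf, "flushed") == 0;
}
```

On a POSIX system, handler_ran(0) returns 1 and handler_ran(1) returns 0: a host process that terminates with _exit() never gives an atexit-registered flush a chance to run, whatever the library does.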
Workarounds investigated:
- LLVM_PROFILE_FILE environment variable: set explicitly, still no output
- Searched the entire filesystem for .profraw files: none found
- Adding __attribute__((destructor)) to call __llvm_profile_write_file(): would require modifying the LLVM runtime or adding a manual FFI binding, too invasive
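For reference, the destructor workaround listed above might look roughly like the sketch below. It was not built or run as part of this experiment. The macro RESQLITE_PGO_INSTRUMENT and the functions resqlite_flush_profile and resqlite_profile_on_unload are hypothetical names; only __llvm_profile_write_file() comes from the LLVM profiling runtime. The guard macro is assumed to be defined only in the instrumented build, so a normal build does not reference the runtime symbol.

```c
/* Assumption: the instrumented build passes -DRESQLITE_PGO_INSTRUMENT
 * alongside -fprofile-generate, so the runtime symbol below is linked in. */
#ifdef RESQLITE_PGO_INSTRUMENT
int __llvm_profile_write_file(void); /* LLVM profiling runtime; returns 0 on success */
#endif

/* Flushes profile counters to the configured .profraw file.
 * Returns the runtime's result in instrumented builds, or -1 when the
 * library was built without instrumentation. Exporting this function
 * would also let the host call it explicitly through an FFI binding. */
int resqlite_flush_profile(void) {
#ifdef RESQLITE_PGO_INSTRUMENT
    return __llvm_profile_write_file();
#else
    return -1;
#endif
}

#ifdef RESQLITE_PGO_INSTRUMENT
/* Runs when the library is unloaded or the process exits normally. */
__attribute__((destructor))
static void resqlite_profile_on_unload(void) {
    (void)resqlite_flush_profile();
}
#endif
```

A destructor does not help if the host terminates with _exit(), since no library teardown code runs in that case. An explicit call to the exported function before shutdown would be the more reliable variant, at the cost of the manual FFI binding noted above.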
Decision
Rejected due to infrastructure limitation. PGO is theoretically sound and likely delivers the expected 10-15% speedup, but cannot be implemented in the current macOS + Dart FFI + native_toolchain_c build setup without a custom CI pipeline.
Future path: PGO could work in a CI pipeline that:
- Builds a standalone C benchmark binary (not a Dart-loaded dylib)
- Runs it to generate profraw files
- Merges profiles and commits the .profdata to the repo
- The build hook uses -fprofile-use pointing to the committed profdata
This is standard practice (Chromium, Firefox check in profdata), but requires CI infrastructure beyond the current project setup.