Experiment 054: Profile-Guided Optimization (PGO)

Date: 2026-04-15

Status: Rejected (infrastructure limitation on macOS)

Problem

At -O3, the compiler makes optimization decisions from static heuristics (code structure, loop analysis). PGO feeds actual execution profiles into those decisions: branch prediction hints, hot/cold code layout, function ordering, and inlining thresholds. Published benchmarks report speedups of roughly 10-15% for SQLite workloads, with reported runtime changes of -20% on ClickBench and -12% on Speedtest1.

Hypothesis

A two-pass build (instrument, run benchmarks, recompile with the profile) should yield a 10-15% improvement across all query paths by optimizing code layout for the actual access patterns in resqlite's workload (WAL reads, batch row stepping, JSON serialization).

Approach

Attempted a manual two-pass build:

  1. Phase 1: Compiled libresqlite.dylib with -fprofile-generate=.pgo using Clang directly (bypassing native_toolchain_c). Successfully produced an instrumented dylib.
  2. Phase 2: Swapped the instrumented dylib into the Dart build cache and ran the full benchmark suite with LLVM_PROFILE_FILE set to capture profile data.
  3. Phase 3 (never reached): Merge .profraw files with llvm-profdata merge and recompile with -fprofile-use.

Why It Failed

The LLVM profiling runtime registers __llvm_profile_write_file() via atexit() to flush profile data when the process exits. However, when the instrumented code lives in a shared library (libresqlite.dylib) loaded by a host process (the Dart VM), the atexit handler in the dylib does not fire reliably on macOS.

No .profraw files were generated despite the instrumented dylib running successfully and producing correct benchmark results.
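
One commonly used way to avoid depending on atexit() is to flush the profile explicitly. The sketch below is a minimal illustration under assumptions, not the approach taken in this experiment: resqlite_pgo_enabled, resqlite_pgo_flush, and the RESQLITE_PGO macro are hypothetical names, and the build is assumed to pass -DRESQLITE_PGO=1 alongside -fprofile-generate. __llvm_profile_write_file() is the same runtime function named above. Nothing here confirms that the call succeeds when made from the Dart VM on macOS.

```c
/* Hypothetical build switch: assumed to be defined (-DRESQLITE_PGO=1) only
   for the instrumented pass, so the normal build never references the
   profiling runtime. */
#ifdef RESQLITE_PGO
/* Provided by the LLVM profiling runtime in instrumented builds.
   Returns 0 on success and a nonzero value otherwise. */
int __llvm_profile_write_file(void);
#endif

/* Hypothetical export: reports whether this build carries instrumentation. */
int resqlite_pgo_enabled(void) {
#ifdef RESQLITE_PGO
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical export: writes the profile now instead of waiting for the
   atexit handler. Returns 0 if the profile was written or if there is
   nothing to write (uninstrumented build), nonzero on a runtime error. */
int resqlite_pgo_flush(void) {
#ifdef RESQLITE_PGO
    return __llvm_profile_write_file();
#else
    return 0;
#endif
}
```

Under these assumptions the host would look up resqlite_pgo_flush through FFI and call it once after the last benchmark, before the process shuts down.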

Workarounds investigated:

Decision

Rejected due to an infrastructure limitation. PGO is theoretically sound and may well deliver the expected 10-15% speedup, but that remains unmeasured because no profile was ever collected. It cannot be implemented in the current macOS + Dart FFI + native_toolchain_c build setup without a custom CI pipeline.

Future path: PGO could work in a CI pipeline that:

  1. Builds a standalone C benchmark binary (not a Dart-loaded dylib)
  2. Runs it to generate .profraw files
  3. Merges the profiles and commits the resulting .profdata to the repo
  4. Has the build hook compile with -fprofile-use pointing at the committed .profdata
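
Step 1 might look roughly like the following driver. This is a sketch under assumptions: workload_fn, run_workload, placeholder_workload, and benchmark_main are hypothetical names, and the placeholder workload only increments a counter where a real driver would call into resqlite. The point is that the process returns from main() normally, which is the path on which atexit() handlers such as the profile writer run.

```c
#include <stdio.h>
#include <stdlib.h>

/* A workload is any function that exercises the library once.
   It returns 0 on success and nonzero on failure. */
typedef int (*workload_fn)(void *ctx);

/* Runs the workload up to `iterations` times, stopping at the first failure.
   Returns the number of iterations that succeeded. */
int run_workload(workload_fn fn, void *ctx, int iterations) {
    int done = 0;
    for (int i = 0; i < iterations; i++) {
        if (fn(ctx) != 0) {
            break;
        }
        done++;
    }
    return done;
}

/* Placeholder workload (assumption): stands in for real resqlite calls
   such as opening a database and stepping through a query. */
int placeholder_workload(void *ctx) {
    long *counter = ctx;
    *counter += 1;
    return 0;
}

/* Body of the standalone binary. It reports progress and hands back an
   exit status for main() to return, rather than terminating the process
   itself. */
int benchmark_main(int iterations) {
    long counter = 0;
    int done = run_workload(placeholder_workload, &counter, iterations);
    printf("completed %d/%d iterations\n", done, iterations);
    return done == iterations ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A real binary would wrap this as int main(void) { return benchmark_main(1000); } and be compiled with -fprofile-generate; the iteration count is arbitrary.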

Building from a previously collected profile is established practice (Chromium and Firefox both ship PGO builds driven by profiles gathered in CI), but it requires CI infrastructure beyond the current project setup.