Experiment 093: Alias cache entry's read tables instead of copying
Date: 2026-04-22
Status: Rejected
Archive: archive/exp-093
Problem
On every reader query, resqlite_get_read_tables copied the read-table
list out of a per-reader resqlite_read_set scratch buffer. That scratch
was populated on each query by read_set_load_from_cache_entry, which
called read_set_reset (freeing every previously-owned string) and then
read_set_add (which strdups each table name back in) for every
dependency of the cached statement.
For a statement whose authorizer captured N read tables, that meant N
free + N strdup + N memcpy of the captured strings on every
reader query — even on cache hits where the table list on the cache
entry was already authoritative and immutable.
Hypothesis
Eliminate the per-query strdup/free churn by having
resqlite_get_read_tables alias directly into the cache entry's
read_tables array. The cache entry's string storage lives for the
cache's lifetime (invalidated as a unit on DDL), so as long as we stash
a pointer to the currently-active entry on the reader we can serve the
lookup with zero string operations — just a pointer follow + copy
of N char* values.
Expected savings: 2 × N × (strdup cost) per query, on the order of a few
hundred nanoseconds for a 3–4 table statement. Not large by itself, but
the path runs on every stream registration / initial query, and the
target is "cheap-check-first" territory, in line with exp 077.
Approach
Added a resqlite_cached_stmt* current_read_entry field to
resqlite_reader. get_or_prepare_reader publishes the entry pointer
on both the cache-hit path (after sqlite3_reset) and the cache-miss
path (after insertion). Fresh-prepare failure clears the alias to NULL
so stale pointers can't leak out.
resqlite_get_read_tables now reads
readers[reader_id].current_read_entry->read_tables directly, dropping
the read_set_load_from_cache_entry call from the hot path entirely.
Safety argument: a reader is held exclusively for the duration of a
query (pool-assigned dedicated slot, exp 030). Between the
get_or_prepare_reader return and the
resqlite_get_read_tables/release sequence there is no opportunity for
the cache to mutate on the same reader, so the aliased pointer is valid
for as long as it's read.
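The aliasing scheme can be sketched like this. Again a minimal illustration under assumed shapes, not the real API: publish_entry stands in for the publishing points in get_or_prepare_reader, and get_read_tables stands in for resqlite_get_read_tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shapes mirroring the description above. */
typedef struct {
    const char **read_tables;  /* strings live for the cache's lifetime */
    int          n_read_tables;
} resqlite_cached_stmt;

typedef struct {
    /* Published on both the cache-hit and cache-miss paths;
     * cleared to NULL on fresh-prepare failure. */
    resqlite_cached_stmt *current_read_entry;
} resqlite_reader;

static void publish_entry(resqlite_reader *r, resqlite_cached_stmt *e) {
    r->current_read_entry = e;
}

/* Hot path: alias into the cache entry. Zero string operations,
 * just a pointer follow plus a copy of N char* values. Valid only
 * because the reader is held exclusively for the query's duration. */
static int get_read_tables(const resqlite_reader *r,
                           const char **out, int out_cap) {
    const resqlite_cached_stmt *e = r->current_read_entry;
    if (!e) return -1;
    int n = e->n_read_tables < out_cap ? e->n_read_tables : out_cap;
    for (int i = 0; i < n; i++) out[i] = e->read_tables[i]; /* pointer copy */
    return n;
}
```

The returned pointers alias entry-owned storage, so the caller must finish with them before releasing the reader slot.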
Results
Three-run profile-mode A/B comparison (candidate vs. baseline), medians
of percentiles across runs. Raw JSONs are gitignored; the aggregate is
at benchmark/profile/results/exp-093-aggregate.md.
| workload / metric | baseline | candidate | Δ% | baseline CV |
|---|---|---|---|---|
| merge_rounds executeBatch p50 | 106 μs | 109 μs | +2.8% | 2.4% |
| merge_rounds executeBatch p99 | 280 μs | 615 μs | +119.6% | 30.1% |
| merge_rounds executeBatch max | 816 μs | 1726 μs | +111.5% | 7.0% |
| noop execute p99 | 69 μs | 96 μs | +39.1% | 14.5% |
| noop execute max | 569 μs | 936 μs | +64.5% | 16.4% |
| noop select p99 | 53 μs | 71 μs | +34.0% | 19.3% |
| point_query select p99 | 33 μs | 38 μs | +15.2% | 14.6% |
| point_query select max | 1179 μs | 2732 μs | +131.7% | 92.7% |
| single_insert execute p99 | 77 μs | 130 μs | +68.8% | 14.4% |
| single_insert execute max | 1379 μs | 1612 μs | +16.9% | 13.4% |
| work medians (all workloads) | — | — | +0–3 μs | — |
Full per-run values are in the aggregate file.
Decision
Rejected, on two grounds:
- Work medians are unchanged. Every workload's "work"
measurement (C-side query execution) is within 0–3 μs of baseline —
i.e., within the noise floor. The change saves the strdup/free
pair, but that saving does not show up against everything else the
work line measures.
- Tail metrics regressed, but the regression pattern is not
attributable to the change. noop workloads regressed on p99 and
max just as much as point_query and single_insert, even though
noop execute never runs a reader query and therefore never touches
read_set_load_from_cache_entry. That strongly suggests the pattern
is run-to-run variance (each side had only 3 samples; baseline CV on
point_query max was 92.7%) rather than a real regression from
the change.
Combining the two: the change's hoped-for savings are below the
measurement floor, and there is no signal in the data that justifies
shipping it. Same class as exp 076 (pre-bound stmt cache rejected in
pre-implementation analysis: "bind is ~0.3 % of re-query wall time").
The strdup/free pair per query is real work, but at our query
rates it's just not visible.
What this tells us
- The cache-hit read-table path is not the place to look for wins.
resqlite_get_read_tables is called at most once per stream-
registered query (not on plain reads, not on re-queries — exp 075's
two-pass hash skips dep lookup on unchanged results), and even when
it runs, N is typically 1–3. The absolute savings ceiling is tiny.
- For future optimizations in this area, three profile runs per side
  are too few to resolve effects of this magnitude. A tail-metrics
  claim needs ≥ 7 runs per side, or a focused microbenchmark of the
  strdup/free pair itself.
Archive
The implementation (including the current_read_entry field, the
safety-comment block, and the get_or_prepare_reader publishing
points) is tagged archive/exp-093 for future re-evaluation if the
calculus changes (many-tables queries becoming common, or a broader
cache-entry aliasing effort that would share this infrastructure).