Experiment 198: Direct-to-buffer integer and float JSON formatting

Date: 2026-06-24

Status: In Review

Direction: result-transfer-shape

Benchmark Run: Focused A/B (benchmark/experiments/select_bytes_int_heavy.dart and benchmark/experiments/select_bytes_real_int_fastpath.dart), order-flipped pair on a quiet box; release-suite single-pass A/B captured as a no-regression smoke (release lanes are not integer- or integer-real-heavy enough to register the focused signal, and the flagged single-pass rows live on writer-pipelining / scaling / re-emit paths this change cannot mechanically touch — same exp 159 / exp 177 phase-drift signature as exp 192 / exp 194).

Problem

After exp 192 collapsed the fast_i64_to_str digit

loop and exp 194 routed integer-valued REAL

through that same path, every INTEGER and integer-valued FLOAT cell in

write_json_to_buf still pays the same boilerplate around the formatter

call:

    case SQLITE_INTEGER: {
        char num[24];
        int num_len = fast_i64_to_str(sqlite3_column_int64(stmt, i), num);
        JSON_CHECK(buf_write_str(b, num, num_len));   // memcpy stack → b
        break;
    }

The stack scratch buffer is filled by the formatter and then immediately

memcpy'd into b->data + b->len via buf_write_str → buf_write → memcpy.

On a 10k row × 20 INTEGER column query that is

200,000 short memcpys of ≤20 bytes each, plus a non-inlineable

buf_write call that has to re-check capacity even though the formatter

already capped its write at ≤24 bytes.

fast_i64_to_str already takes a raw char*, and

fast_double_to_json_num already takes a char* + size_t; neither

inspects the destination beyond writing into it. They can write straight

into the output buffer if the caller guarantees capacity.

Hypothesis

Pre-reserving the maximum cell length (buf_ensure(b, 24) for INTEGER,

33 for FLOAT — one extra byte covers the snprintf NUL terminator on the

fractional fallback) and pointing the formatter at b->data + b->len

directly should remove one short memcpy per integer or float cell.

The signal should reproduce on the same int-heavy lanes that drove

exp 192 (select_bytes_int_heavy.dart) and on exp 194's integer-real

lanes (select_bytes_real_int_fastpath.dart), while the fractional

REAL fallback — dominated by snprintf("%.17g") — stays inside the

noise floor.
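The reservation sizes in the hypothesis can be sanity-checked with a small probe. This is only a sketch: the function names are hypothetical, it uses the C library's own formatting rather than the project's formatters, and it samples a few extreme doubles rather than proving a bound.

```c
#include <assert.h>
#include <float.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Length of the longest decimal rendering of an int64, sign included. */
static int sketch_max_i64_len(void) {
    return snprintf(NULL, 0, "%" PRId64, INT64_MIN);
}

/* Longest "%.17g" rendering among a few extreme finite doubles.
 * This is a sample of likely worst cases, not an exhaustive search. */
static int sketch_max_g17_len(void) {
    const double probes[] = { -DBL_MAX, -DBL_MIN, -0.1 };
    int longest = 0;
    for (size_t i = 0; i < sizeof probes / sizeof probes[0]; i++) {
        int n = snprintf(NULL, 0, "%.17g", probes[i]);
        if (n > longest) longest = n;
    }
    return longest;
}
```

If the probe holds, the longest integer needs 20 bytes (inside the 24-byte reservation), and the longest sampled float plus its NUL terminator fits inside the 33-byte reservation.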

Approach

In native/resqlite.c:

An INTEGER helper that calls buf_ensure(b, 24), then calls fast_i64_to_str with the

destination set to (char*)(b->data + b->len), then advances

b->len by the returned byte count. fast_i64_to_str never

writes a NUL terminator, so 24 bytes (at most 19 digits plus a sign, with 4 bytes of slack)

is sufficient.

A FLOAT helper that calls buf_ensure(b, 33) and then calls fast_double_to_json_num(..., 33)

directly into b->data + b->len. The extra byte covers the

snprintf fractional fallback's NUL terminator, which lands inside

the buffer but is not counted toward b->len.

The SQLITE_INTEGER and SQLITE_FLOAT case bodies in write_json_to_buf are replaced with single JSON_CHECK calls to the new helpers.

Bit-identical output: same formatter, same digits, same NUL handling

(the snprintf fallback's NUL still lands in scratch space below

b->cap).

Stack scratch arrays, the intermediate int num_len locals, and the

buf_write_str indirection on this path are gone.
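The integer half of the steps above can be sketched as follows. Everything prefixed with sketch_ is hypothetical: the buffer struct, the growth policy, and the digit formatter are stand-ins for the project's real b, buf_ensure, and fast_i64_to_str, whose exact definitions are not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the growable output buffer. */
typedef struct {
    unsigned char *data;
    size_t len;
    size_t cap;
} sketch_buf;

/* Ensure at least `extra` free bytes past len; returns 0 on success. */
static int sketch_buf_ensure(sketch_buf *b, size_t extra) {
    if (b->cap - b->len >= extra) return 0;
    size_t new_cap = b->cap ? b->cap : 64;
    while (new_cap - b->len < extra) new_cap *= 2;
    unsigned char *p = realloc(b->data, new_cap);
    if (!p) return -1;
    b->data = p;
    b->cap = new_cap;
    return 0;
}

/* Stand-in formatter: writes decimal digits, no NUL; returns byte count. */
static int sketch_i64_to_str(int64_t v, char *out) {
    char tmp[20];
    int n = 0, len = 0;
    /* Negate in unsigned space so INT64_MIN does not overflow. */
    uint64_t u = v < 0 ? 0 - (uint64_t)v : (uint64_t)v;
    do { tmp[n++] = (char)('0' + u % 10); u /= 10; } while (u);
    if (v < 0) out[len++] = '-';
    while (n) out[len++] = tmp[--n];
    return len;
}

/* Reserve once, format straight into the buffer tail, then advance len. */
static int sketch_write_i64_direct(sketch_buf *b, int64_t v) {
    if (sketch_buf_ensure(b, 24) != 0) return -1;
    int n = sketch_i64_to_str(v, (char *)(b->data + b->len));
    assert(n > 0 && n <= 20);   /* sign + at most 19 digits */
    b->len += (size_t)n;
    return 0;
}
```

Starting from a zero-initialized sketch_buf, successive calls append the digits back to back with no stack scratch array and no intermediate memcpy; the caller frees data when done.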

No public API change. No new const data. The new helpers shave one

memcpy and one function-call boundary per integer or integer-real

cell; the buf_ensure cost is identical to what buf_write would have

paid anyway. The existing int extremes and real integer-valued

selectBytes tests in test/database_test.dart cover the

correctness-preserving boundary cases (0, ±1, ±999, ±10000, ±1234567890,

LLONG_MIN, LLONG_MAX, integer-valued REAL through ±max_exact_int).
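The float side, with its one-byte NUL slack, might look like the sketch below. It is an assumption-laden illustration: the sketch_ names are hypothetical, the formatter stand-in only models the snprintf("%.17g") fractional fallback (the integer-valued fast path is omitted), and the real fast_double_to_json_num signature may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the growable output buffer. */
typedef struct {
    unsigned char *data;
    size_t len;
    size_t cap;
} sketch_buf;

/* Ensure at least `extra` free bytes past len; returns 0 on success. */
static int sketch_buf_ensure(sketch_buf *b, size_t extra) {
    if (b->cap - b->len >= extra) return 0;
    size_t new_cap = b->cap ? b->cap : 64;
    while (new_cap - b->len < extra) new_cap *= 2;
    unsigned char *p = realloc(b->data, new_cap);
    if (!p) return -1;
    b->data = p;
    b->cap = new_cap;
    return 0;
}

/* Stand-in for the fractional fallback: "%.17g" into dst, NUL included. */
static int sketch_double_to_json_num(double v, char *dst, size_t dst_size) {
    int n = snprintf(dst, dst_size, "%.17g", v);
    if (n < 0 || (size_t)n >= dst_size) return -1;
    return n;
}

/* Reserve 33 bytes: up to 32 characters plus the terminator snprintf writes. */
static int sketch_write_double_direct(sketch_buf *b, double v) {
    if (sketch_buf_ensure(b, 33) != 0) return -1;
    char *dst = (char *)(b->data + b->len);
    int n = sketch_double_to_json_num(v, dst, 33);
    if (n < 0) return -1;
    /* The NUL sits in reserved slack below cap and is not counted in len. */
    assert(dst[n] == '\0' && b->len + (size_t)n < b->cap);
    b->len += (size_t)n;
    return 0;
}
```

Because len advances only by the character count, the next write starts on top of the previous terminator, so no NUL byte ends up in the emitted JSON.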

Results

Two order-flipped passes on each focused harness, median of 6 rounds

per lane. Same-machine quiet box.

select_bytes_int_heavy.dart (exp 192's harness)

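| Lane | Base P1 | Base P2 | Cand P1 | Cand P2 | Δ P1 | Δ P2 |
|---|---|---|---|---|---|---|
| 10k × 8 small ints | 3035 | 2998 | 2817 | 2775 | −7.2 % | −7.4 % |
| 10k × 20 small ints | 6557 | 6527 | 6392 | 6080 | −2.5 % | −6.8 % |
| 10k × 20 big ints (~18 digits) | 8350 | 8361 | 7624 | 7572 | −8.7 % | −9.4 % |
| 10k × 8 mixed (4 int + 2 text + 2 real) | 8983 | 9040 | 8830 | 9211 | −1.7 % | +1.9 % |
| 1k × 2 ints | 116 | 115 | 105 | 106 | −9.5 % | −7.8 % |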

All values µs/query median. The mixed-row guard is dominated by text and

real cells (~75 % of the per-row work); its split sign across passes is

sub-2 % phase noise of the kind exp 177 catalogued, not a real regression.

The small-magnitude (1k × 2) lane reproduces the per-cell win at

sub-millisecond scale.
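The Δ columns are consistent with a plain relative change of the candidate median against the baseline median from the same pass. A minimal sketch of that arithmetic, with a hypothetical function name:

```c
#include <assert.h>

/* Relative change of candidate vs. baseline in percent (negative = faster). */
static double sketch_pct_delta(double base_us, double cand_us) {
    return (cand_us - base_us) / base_us * 100.0;
}
```

For the first lane in pass 1, sketch_pct_delta(3035, 2817) is roughly −7.18, which rounds to the tabulated −7.2 %; for the mixed lane in pass 2, sketch_pct_delta(9040, 9211) is roughly +1.89.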

select_bytes_real_int_fastpath.dart (exp 194's harness)

| Lane | Base P1 | Base P2 | Cand P1 | Cand P2 | Δ P1 | Δ P2 |
|---|---|---|---|---|---|---|
| 10k × 8 integral reals | 3252 | 3280 | 2972 | 2990 | −8.6 % | −8.8 % |
| 10k × 20 integral reals | 6835 | 6835 | 6323 | 6318 | −7.5 % | −7.6 % |
| 10k × 20 fractional reals | 68909 | 69272 | 69344 | 68287 | +0.6 % | −1.4 % |
| 10k × 8 mixed (4 int-real + 2 frac-real + 2 text) | 9593 | 9531 | 9346 | 9425 | −2.6 % | −1.1 % |
| 1k × 2 integral reals | 122 | 122 | 111 | 114 | −9.0 % | −6.6 % |

Integer-via-REAL inherits the integer-side win cleanly (−6.6 % to −9.0 %).

Fractional REAL stays inside ±1.5 % across the flip — the snprintf

%.17g call dwarfs the saved memcpy, so the helper's only effect on

that path is to remove the intermediate num_len round trip.

Release-suite single-pass A/B + flip

Baseline: benchmark/results/2026-06-24T07-27-32-baseline-for-exp198.md.

Candidate: benchmark/results/2026-06-24T07-30-24-exp198-direct-buf-int-json.md.

Flagged rows are dominated by single-pass noise on lanes the change

cannot mechanically touch: Single Inserts (100 sequential) +12 % and

Disjoint/Overlap column re-emit counters live on the writer-pipelining

and stream-dispatch paths (exp 159 / exp 177 territory); `Large payload

(~650 KB) selectBytes` +21 % is a +0.05 ms swing on a 0.23 ms metric

whose payload is one large TEXT-heavy row (no integer cells); the only

mechanical-path-adjacent flag is `Select → JSON Bytes / 100 rows /

resqlite + jsonEncode` −11 % which is consistent with the focused

signal. No integer-heavy release lane crosses the per-benchmark MDE in

both directions; the broader spread is the phase-drift signature

exp 192 and exp 194 also produced in their single-pass runs.

Decision

In Review (candidate-accepted at the local level). Two order-flipped

focused passes both clear the per-benchmark MDE on the integer and

integer-real lanes (−6.6 % to −9.5 %; the 10k × 20 small-int lane is the outlier at −2.5 % in pass 1 and −6.8 % in pass 2), the fractional-REAL guard stays

inside ±1.5 %, the mixed-shape guards stay between −2.6 % and +1.9 %, and the

selectBytes int-extremes + real-integer-valued tests in

test/database_test.dart continue to pass against the candidate

without modification. The change is ~20 lines of additive C, no new

const data, and no public API surface.

Why kept

The focused signal is structurally what the diff predicts: one fewer

short memcpy and one fewer function-call boundary per integer or

integer-real cell, with no other code paths perturbed. The integer and

integer-real wins extend the encoder line that

exp 023, exp 192, and

exp 194 opened, and the helpers are

mechanically reusable by any future C-side JSON writer that wants the

same direct-write pattern.

What this leaves on the table

The fractional REAL path is still snprintf("%.17g"); a hand-rolled

Grisu2/Ryu would attack that, but it is a much larger change with

correctness audit cost and is out of scope here. The SQLITE_TEXT and

SQLITE_BLOB paths already write to the output buffer via

json_write_string / json_write_base64, which already do their own

buf_ensure and direct-write; no analogous helper would help them.

Operational notes

native/resqlite.c: SQLITE_INTEGER and SQLITE_FLOAT case bodies simplified.

test/database_test.dart: existing selectBytes tests cover correctness (LLONG_MIN, LLONG_MAX, integer-valued REAL

through ±max_exact_int, fractional REAL, negative zero — all

preserved).