Experiment 043: SWAR escape scanning + escape lookup table
Date: 2026-04-15
Status: Accepted
Problem
json_write_string in resqlite.c scans each byte individually through 8 if/else if comparisons to check for JSON-escapable characters (", \, \b, \f, \n, \r, \t, control chars < 0x20). For the overwhelmingly common case where no escape is needed, this is 8 branches per byte — 8000 branches per 1000-character string.
Hypothesis
Two complementary optimizations:
- SWAR (SIMD Within A Register): Load 8 bytes into a
uint64_tand check all escape conditions with ~10 bitwise ops. If the result is zero, skip all 8 bytes at once. This eliminates branch overhead for the common no-escape case (most strings contain no special characters).
- Escape lookup table: Replace the 8-way if/else chain with a 256-byte static lookup table indexed by character value. Maps each byte to its escape length (0 = safe, 2 = named escape, 6 =
\uXXXX). Reduces the per-escape-character branch chain from 8 comparisons to a single table lookup.
These stack naturally: SWAR handles bulk safe spans, the lookup table handles the rare escape characters efficiently.
Approach
// SWAR: check 8 bytes at once for escape-needing characters. while (i + 8 <= len) { uint64_t word; memcpy(&word, s + i, 8); // Check for bytes < 0x20 (control chars) uint64_t below_space = (word - 0x2020202020202020ULL) & ~word & 0x8080808080808080ULL; // Check for '"' (0x22) and '\\' (0x5C) via XOR + zero-detect uint64_t has_quote = ...; uint64_t has_bslash = ...; if ((below_space | has_quote | has_bslash) == 0) { i += 8; continue; } break; // Fall through to byte-by-byte } // Byte-by-byte with lookup table for (; i < len; i++) { unsigned char elen = json_esc_len[c]; // 0, 2, or 6 if (__builtin_expect(elen == 0, 1)) continue; // ... flush span, write escape } The SWAR technique is inspired by simdjson but uses no SIMD intrinsics — pure portable C that works on all platforms (ARM, x86, etc.).
Results
22 wins, 0 regressions — the strongest result of any experiment in this batch.
| Benchmark | Baseline (ms) | SWAR+LUT (ms) | Delta |
|---|---|---|---|
| selectBytes 1000 rows | 0.51 | 0.35 | -31% |
| selectBytes 10000 rows | 5.70 | 4.14 | -27% |
| Text-heavy schema (1000 rows) | 0.67 | 0.58 | -13% |
| Concurrent reads 8x | 0.77 | 0.68 | -12% |
| Parameterized queries | 15.89 | 13.90 | -13% |
The selectBytes improvements are the direct signal — 27-31% faster JSON serialization at scale. The SWAR fast-path eliminates branch overhead for the vast majority of string bytes (typical text has no JSON-special characters), while the lookup table makes the rare escape case branchless.
Decision
Accepted. 22 wins, zero regressions, and a 27-31% improvement on the targeted path (selectBytes at scale). The code is pure portable C with no platform-specific intrinsics. The SWAR approach is well-established (simdjson, yyjson) and the lookup table is a standard optimization.
Combined with experiment 041 (Ryu), these two changes halve selectBytes time: 5.70ms → 3.19ms at 10k rows (-44%).