Experiment 010: ASCII Fast-Path for String Decoding
Date: 2026-04-06
Status: Rejected
Problem
utf8.decode is called for every text column value. Most database strings are ASCII. Could an ASCII fast-path (String.fromCharCodes for all-ASCII bytes) avoid the UTF-8 validation overhead?
Hypothesis
For ASCII-only strings (all bytes < 128), String.fromCharCodes would be faster than utf8.decode because it skips UTF-8 validation. A byte scan to check for non-ASCII bytes would be cheap for short strings.
What We Tested
Micro-benchmark comparing utf8.decode vs a fast-path that scans for non-ASCII bytes and uses String.fromCharCodes for ASCII, falling back to utf8.decode for non-ASCII.
Results
| String type | utf8.decode | Fast-path | Winner |
|---|---|---|---|
| Short ASCII (10 bytes) | 50 ns | 20 ns | Fast-path (60% faster) |
| Medium ASCII (50 bytes) | 55 ns | 37 ns | Fast-path (33% faster) |
| Long ASCII (100 bytes) | 84 ns | 81 ns | Tie |
| Unicode (emoji) | 44 ns | 53 ns | utf8.decode (20% slower) |
| Unicode (CJK) | 72 ns | 80 ns | utf8.decode (11% slower) |
The fast-path wins for short ASCII strings but the advantage shrinks with length. For non-ASCII strings, it's strictly worse (double scan — first for ASCII check, then utf8.decode).
Aggregate Impact
At 5,000 rows with ~3,000 text values: estimated savings of ~0.27ms (18ns per string × 15,000 strings). Measured in practice: within noise. The utf8.decode implementation in the Dart VM is already highly optimized C code.
Why Rejected
- Marginal aggregate improvement (~0.27ms) not worth the code complexity
- Slower for non-ASCII strings (double scan)
utf8.decodeis implemented in native C inside the Dart VM — hard to beat with Dart-level code- The byte scan loop is itself Dart code with per-byte overhead
Key lesson: The Dart VM's native implementations of common operations (utf8.decode, SendPort.send) are highly optimized C++. Dart-level alternatives rarely win.