Experiment 055: Columnar typed arrays for results
Date: 2026-04-15
Status: Rejected (memory win real but below time-based benchmark floor)
Problem
Query results use List<Object?> in row-major layout. Every int and double is boxed — a 64-bit integer costs ~24 bytes (8 pointer + 16 boxed object) vs 8 bytes in an Int64List. For 10,000 rows × 5 numeric columns, that is ~1.2MB vs ~400KB — a 3x memory overhead. This also creates ~50,000 GC-visible objects vs 5 typed arrays.
Hypothesis
Replace the flat List<Object?> with per-column typed arrays: Int64List for integer columns, Float64List for float columns, List<String> for text columns. Expected benefits:
- 75% memory reduction for numeric columns
- 1.8x faster isolate transfer (TypedData
memcpyvs boxed list deep-copy) - ~10,000x fewer GC objects for numeric-heavy results
- 10-15% faster row iteration (unboxed access beats pointer chase)
Research Findings
Detailed benchmarking of Dart VM behavior confirmed the hypothesis:
| Metric | List<Object?> | Int64List | Ratio |
|---|---|---|---|
| Per-element memory (int) | ~24 bytes | 8 bytes | 3x |
| Allocation 100k elements | 339μs | 11μs | 31x |
| SendPort.send 100k elements | 1,268μs | 285μs | 4.4x |
| Isolate.exit 500k elements | 2,329μs | 820μs | 2.8x |
| GC objects per 10k ints | ~10,001 | 1 | 10,000x |
For realistic mixed-type schemas (2 int + 1 double + 2 string + 1 nullable string), columnar iteration was 1.1-1.4x faster than flat boxed, and isolate transfer was 1.8x faster.
Why Not Implemented
- Throughput is not the bottleneck. At 10k rows, select() takes 5.57ms. Allocation is ~3% of that. Even a 10x allocation speedup saves ~0.15ms — below the benchmark noise floor.
- Memory wins don't show in time-based benchmarks. The 75% memory reduction is real and valuable for apps, but the benchmark suite measures wall time, not memory. GC pressure differences only manifest under sustained allocation (production workloads), not in isolated benchmark iterations.
- Large surface area change. The
Rowclass implementsMap<String, Object?>.ResultSetextendsList<Map<String, Object?>>. Changing the internal storage requires modifying Row, ResultSet, RowSchema, the decode loop, the sacrifice threshold logic, the hash function, and all stream result caching. The risk of subtle behavior changes outweighs the measurable benefit.
- String columns remain boxed. Dart strings are heap objects regardless. For text-heavy schemas (common in CRUD apps: names, emails, descriptions), there is no memory win for the dominant column type.
- The sacrifice threshold already handles transfer cost. Results >256KB use Isolate.exit (zero-copy). Results <256KB use SendPort (copy). The columnar transfer speedup only matters for the SendPort path, and at <256KB the absolute transfer time is already sub-millisecond.
Decision
Rejected — assessed but not implemented. The memory benefits are real but would require a different benchmark methodology (memory profiling, GC pause tracking) to validate. The throughput benefits are too small relative to the benchmark noise floor to justify the architectural complexity.
Future consideration: If memory profiling shows GC pressure from large result sets in production apps, columnar storage would be the right fix. The implementation path is clear: typed column arrays in RawQueryResult, schema stores per-column types, Row dispatches by column type index. Not architecturally difficult — just not worth the risk for a marginal throughput win.