Swarmgram Notes

Validation notes

Short memos on how we evaluate Lewsearch: aggregate benchmarks, protocols, and failure modes. Not a substitute for a full academic paper. Customer-facing tables live on lewsearch.com/methodology.

Benchmarks · May 25, 2026 · 6 min read

460-Question Benchmark Overview

How Lewsearch synthetic panels are evaluated against independently fielded survey instruments from Texas, California, and national tri-metro sources.


Methods · May 25, 2026 · 5 min read

Five-Fold Cross-Validation Protocol

Why every Lewsearch accuracy number is out-of-sample, and how we prevent calibration leakage across the benchmark pool.


Benchmarks · May 25, 2026 · 4 min read

Pre-Registered Held-Out Evaluation

The strictest generalization test: questions sourced after training freeze, scored under pre-specified exclusion rules.


Policy · May 25, 2026 · 5 min read

What We Publish (and What We Withhold)

Building credibility without publishing the full internal roadmap. Public validation boundaries for Lewsearch.


Forecasts · May 25, 2026 · 4 min read

Pre-Registered Forward Forecasts (May 2026)

Frozen predictions on Pew AI Trends and Pew Top Problems benchmarks before ground truth is published.
