Benchmarks · May 25, 2026 · 6 min read

460-Question Benchmark Overview

How Lewsearch synthetic panels are evaluated against independently fielded survey instruments from Texas, California, and national tri-metro sources.

Lewsearch is evaluated against 460 independently fielded survey questions drawn from three source families plus a held-out batch:

University of Texas / Texas Politics Project (UT/TPP): 120 items
Public Policy Institute of California (PPIC): 175 items
Tri-metropolitan legacy set (pooled Pew, Gallup, and official canvass sources): 151 items
Pre-registered held-out batch, sourced after training and calibration were frozen: 14 scored items after automated exclusions

Every published error rate is out-of-sample: calibration is fit on held-out folds, and each item is scored on data the calibrator never saw during fitting. Pooled calibrated mean absolute error (MAE) is 7.47 percentage points on marginal response proportions, measured on the ex-electoral subset (418 of the 460 questions); the remaining 42 electoral questions are reported separately.
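The out-of-sample scoring described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Lewsearch's pipeline: per-question MAE is assumed to average absolute differences across response options, items are assumed to be binary, and `fit_shift` is a hypothetical single additive shift standing in for the undisclosed calibrator. What it shows is the cross-fitting structure: each item's calibrated error comes from a calibrator fit only on the other folds.

```python
from statistics import mean


def question_mae(synthetic, observed):
    """MAE in percentage points between two marginal response distributions
    (lists of proportions over the same response options)."""
    if len(synthetic) != len(observed):
        raise ValueError("distributions must cover the same response options")
    return 100 * mean(abs(s - o) for s, o in zip(synthetic, observed))


def fit_shift(train):
    """Hypothetical calibrator for binary items: the mean gap between the
    observed and synthetic 'yes' share across the training pairs."""
    return mean(obs - syn for syn, obs in train)


def apply_shift(shift, syn):
    """Shift a synthetic 'yes' share, clip it to [0, 1], return both options."""
    yes = min(1.0, max(0.0, syn + shift))
    return [yes, 1.0 - yes]


def cross_fitted_mae(items, k=5):
    """Score every item with a calibrator fit only on the other k-1 folds.

    items: list of (synthetic_yes_share, observed_yes_share) pairs.
    """
    if not 2 <= k <= len(items):
        raise ValueError("need 2 <= k <= number of items")
    folds = [items[i::k] for i in range(k)]
    errors = []
    for i, held_out in enumerate(folds):
        # Training data excludes the held-out fold entirely.
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        shift = fit_shift(train)
        for syn, obs in held_out:
            errors.append(question_mae(apply_shift(shift, syn), [obs, 1.0 - obs]))
    return mean(errors)
```

With a constant synthetic bias (for instance, every synthetic share sitting 10 points below the observed share), the uncalibrated MAE is 10 percentage points, while the cross-fitted shift removes the bias on each held-out item without ever seeing it.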

Panel-level results (ex-electoral subsets)

Texas (UT/TPP): 4.88 percentage points calibrated MAE on the 97 ex-electoral items (of 120 total)
California (PPIC): 7.77 percentage points on the 148 ex-electoral items (of 175 total)
Tri-metro legacy set: 7.86 percentage points on all 151 items
Best subdomain: Texas political approval, 3.43 percentage points (n=18)
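Assuming the pooled figure is a simple mean over scored questions, a pooled MAE across disjoint subsets equals the item-weighted mean of the subset MAEs, so larger subsets carry proportionally more weight and a small subdomain such as n=18 moves the pooled number little. A minimal sketch; the helper name and the example inputs are hypothetical:

```python
def pooled_from_subsets(subsets):
    """Item-weighted pooled MAE from (mae, n_items) pairs for disjoint subsets."""
    total = sum(n for _, n in subsets)
    if total == 0:
        raise ValueError("no scored items")
    return sum(mae * n for mae, n in subsets) / total


# Made-up subsets: 4.0 points on 10 items and 8.0 points on 30 items.
print(pooled_from_subsets([(4.0, 10), (8.0, 30)]))  # 7.0
```

The weighted result (7.0) differs from the unweighted mean of the two subset MAEs (6.0), which is why subset figures should not be averaged directly.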

What this is and is not

These numbers describe the accuracy of marginal response distributions on real survey items, each compared against the instrument's results as of its field date. They do not claim joint-distribution fidelity, individual-level prediction, or replacement of probability-sample polling for high-stakes inference.
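The gap between marginal and joint accuracy can be made concrete with a toy example (hypothetical numbers, not Lewsearch data): two joint distributions over a pair of yes/no questions can share identical marginals while differing sharply in how answers co-occur, so a perfect marginal score says nothing on its own about joint fidelity.

```python
from itertools import product

# Joint tables map (answer_to_q1, answer_to_q2) -> probability.
independent = {cell: 0.25 for cell in product([True, False], repeat=2)}
correlated = {(True, True): 0.5, (True, False): 0.0,
              (False, True): 0.0, (False, False): 0.5}


def marginals(joint):
    """Return the 'yes' share for each question implied by a joint table."""
    q1 = sum(p for (a1, _), p in joint.items() if a1)
    q2 = sum(p for (_, a2), p in joint.items() if a2)
    return q1, q2


def total_variation(p, q):
    """Total variation distance between two joint tables over the same cells."""
    return 0.5 * sum(abs(p[cell] - q[cell]) for cell in p)


print(marginals(independent), marginals(correlated))  # (0.5, 0.5) (0.5, 0.5)
print(total_variation(independent, correlated))       # 0.5
```

Both tables give each question a 50% yes share, yet in one the answers are independent and in the other they always agree, leaving the joint distributions 0.5 apart in total variation.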

Full item-level results and proprietary system internals are not published. See our transparency boundary note for what we disclose publicly versus under NDA.

Product validation

Customer-facing benchmark tables and live predictions are available at lewsearch.com/methodology. Due diligence materials are available on request.