The right model for the right prompt.
Every time.
0.878 correlation with real model performance · measured across 12 model families
Summary and drivers appear after analysis.