How we judge whether an alpha actually works
We generate alpha, run it through a fixed evaluation pipeline, and publish only what survives. The right yardstick depends on what a strategy actually exploits, so the two families of alpha we trade are graded on two different scorecards. Both scorecards grade net of cost, out-of-sample and deflated for multiple testing, and no candidate receives capital until its capacity, attribution, robustness and monitoring are documented.
Cross-sectional alpha
Ranks a large universe at each date; the edge is relative — which names beat which. Graded on cross-sectional IC, monotone quantile spreads, a simple long/short book, and overlays for turnover, capacity, crowding and multiple testing.
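As a rough illustration of the first two items on that scorecard, the sketch below computes a single-date rank IC (the Spearman correlation between scores and forward returns) and the mean forward return per score bucket. It is a minimal sketch, not the guide's actual pipeline: the function names, the three-bucket split and the toy numbers are all assumptions made for the example.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def pearson(x, y):
    """Plain Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(scores, fwd_returns):
    """Cross-sectional rank IC for one date: Pearson correlation of ranks."""
    return pearson(ranks(scores), ranks(fwd_returns))


def quantile_means(scores, fwd_returns, n_buckets=3):
    """Mean forward return per score bucket, lowest scores first.

    Assumes at least n_buckets names; the last bucket absorbs any remainder.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = len(order) // n_buckets
    result = []
    for b in range(n_buckets):
        start = b * size
        end = (b + 1) * size if b < n_buckets - 1 else len(order)
        result.append(mean(fwd_returns[i] for i in order[start:end]))
    return result


# Hypothetical one-date cross-section of six names.
scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.2]
fwd_returns = [0.03, -0.02, 0.01, 0.02, -0.01, 0.00]

ic = rank_ic(scores, fwd_returns)
buckets = quantile_means(scores, fwd_returns)
long_short_spread = buckets[-1] - buckets[0]  # top bucket minus bottom bucket
is_monotone = all(a <= b for a, b in zip(buckets, buckets[1:]))
```

In practice the IC would be averaged over many rebalance dates and the spread measured net of cost; a single date, as here, only shows the mechanics.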
Time-series forecast alpha
Forecasts each asset's own future return, traded as a concentrated long/short book of a handful of names. The signal is the position — so the grade is the deflated, net-of-cost out-of-sample P&L of that book.
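To make "the signal is the position" concrete, here is a minimal sketch for one asset: the position is the sign of the forecast, and each period's P&L is position times realized return minus a cost proportional to the change in position. The sign rule, the linear cost model, the `cost_per_unit_turnover` parameter and the toy numbers are assumptions for illustration, and the sketch deliberately omits the deflation step.

```python
def net_pnl(forecasts, returns, cost_per_unit_turnover=0.001):
    """Per-period net P&L of a sign-of-forecast position in one asset.

    forecasts[t] is assumed to be known before returns[t] is realized.
    The cost model (linear in position change) is a hypothetical placeholder.
    """
    pnl = []
    prev_pos = 0.0  # start flat
    for forecast, ret in zip(forecasts, returns):
        if forecast > 0:
            pos = 1.0
        elif forecast < 0:
            pos = -1.0
        else:
            pos = 0.0
        cost = cost_per_unit_turnover * abs(pos - prev_pos)
        pnl.append(pos * ret - cost)
        prev_pos = pos
    return pnl


# Hypothetical four-period example: long, long, short, short.
forecasts = [0.5, 0.2, -0.1, -0.3]
returns = [0.01, -0.005, -0.02, 0.004]

net = net_pnl(forecasts, returns)
gross = net_pnl(forecasts, returns, cost_per_unit_turnover=0.0)
total_net = sum(net)
total_gross = sum(gross)
```

Flipping from long to short counts as two units of turnover, so the cost drag grows with how often the forecast changes sign.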
◆ The golden rule
Measure the quantity the strategy harvests, then haircut it as an investor would. A ranker earns from ordering names correctly at a point in time; a forecaster earns from calling each asset's direction over time. Apply one family's scorecard to the other and you get noise dressed as a number. The only fair comparison is after each candidate clears its native scorecard, net of costs, capacity, deflation and a locked out-of-sample review.
What this guide covers
The first two sections are open to everyone; the rest unlock when your account is activated.
Foundations
Two families of alpha, two scorecards
An alpha starts as one number — a score per asset per rebalance date. It is not yet a trade, not yet a portfolio, not yet a capacity claim. How we turn that score into a verdict splits cleanly by what the signal exploits.
Left: the ranker's skill is “did high scores out-rank low scores today?” Right: the timer's skill is “did my forecast call each asset's own path?” Different questions, different scorecards.
◆ Why the distinction is load-bearing
The Fundamental Law of active management, $\mathrm{IR} \approx \mathrm{IC} \times \sqrt{\mathrm{breadth}}$, makes the failure precise. For a ranker, breadth = names and IC = cross-sectional rank correlation. For a timer, breadth = independent time periods and the relevant correlation is over time, and effective breadth collapses under serial and cross-asset correlation. Apply the wrong one and the headline number is meaningless.
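The Fundamental Law is usually written IR ≈ IC × √breadth. The sketch below shows how quickly the implied information ratio falls once breadth is adjusted for correlation. The adjustment used here, N_eff = N / (1 + (N − 1)ρ) for N bets with average pairwise correlation ρ, is a common textbook approximation and an assumption of this example, not necessarily the formula the guide itself uses. The function names and numbers are likewise illustrative.

```python
import math


def information_ratio(ic, breadth):
    """Fundamental Law approximation: IR = IC * sqrt(breadth)."""
    return ic * math.sqrt(breadth)


def effective_breadth(n_bets, avg_corr):
    """Equicorrelation approximation (assumed): N / (1 + (N - 1) * rho)."""
    return n_bets / (1.0 + (n_bets - 1) * avg_corr)


# Hypothetical inputs: IC of 0.05 over 400 nominal bets.
ic = 0.05
nominal = 400

ir_independent = information_ratio(ic, nominal)       # bets treated as independent
n_eff = effective_breadth(nominal, avg_corr=0.10)     # roughly 9.8 effective bets
ir_correlated = information_ratio(ic, n_eff)          # far below the nominal IR
```

With these numbers a modest average correlation of 0.10 shrinks 400 nominal bets to fewer than ten effective ones, which is why counting breadth on the wrong axis produces a misleading headline figure.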
Activate to continue
See exactly how every alpha is judged
You've seen the two families and why each needs its own scorecard. The full evaluation framework — the gates, the metrics, and the interactive labs — unlocks once your account is activated.
- The PM decision standard and the eight review gates
- The full cross-sectional scorecard — rank IC, quantile spreads and thresholds
- Six interactive labs: IC, deflated Sharpe, cost, breadth, expectancy and annualization
- The three-tier time-series verdict and forecast-quality diagnostics
- Costs, capacity curves and the frequency rulebook, plus the full glossary

