Quant Research · Institutional Evaluation Standard

How we judge whether an alpha actually works

We generate alpha, run it through a fixed evaluation pipeline, and publish only what survives. The right yardstick depends on what a strategy actually exploits, so the two families of alpha we trade are graded on two different scorecards. Both scorecards are net of cost, out-of-sample and deflated for multiple testing, and no candidate receives capital until its capacity, attribution, robustness and monitoring are documented.


Cross-sectional alpha

Ranks a large universe at each date; the edge is relative — which names beat which. Graded on cross-sectional IC, monotone quantile spreads, a simple long/short book, and overlays for turnover, capacity, crowding and multiple testing.
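
As a minimal sketch of the first item on that scorecard, the cross-sectional rank IC for a single date can be computed as the Spearman correlation between scores and next-period returns across names. The function names (`average_ranks`, `pearson`, `rank_ic`) and the five-name toy data are illustrative assumptions, not the pipeline's actual implementation.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def rank_ic(scores, forward_returns):
    """Spearman rank correlation across names on one rebalance date."""
    return pearson(average_ranks(scores), average_ranks(forward_returns))

# Hypothetical data: one date, five names.
scores = [0.9, 0.1, 0.5, 0.7, 0.3]
forward_returns = [0.02, -0.01, 0.00, 0.03, -0.02]
ic = rank_ic(scores, forward_returns)
```

In practice the per-date IC would be averaged over many rebalance dates and judged against its own variability; a single date, as here, says very little.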

Time-series forecast alpha

Forecasts each asset's own future return, traded as a concentrated long/short book of a handful of names. The signal is the position — so the grade is the deflated, net-of-cost out-of-sample P&L of that book.
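
To make "net-of-cost P&L of that book" concrete, here is a rough sketch that turns a sequence of positions into per-period net P&L. It assumes a linear cost proportional to turnover, positions fixed before each period's return is realised, and a book that starts flat; `net_pnl`, `cost_bps` and the two-asset data are hypothetical. The deflation step for multiple testing is not shown.

```python
def net_pnl(positions, returns, cost_bps=10.0):
    """Per-period net P&L of a long/short book.

    positions[t][i]: weight held in asset i over period t (set before the period).
    returns[t][i]:   realised return of asset i over period t.
    cost_bps:        assumed linear cost per unit of turnover, in basis points.
    """
    pnl = []
    prev = [0.0] * len(positions[0])  # assume the book starts flat
    for weights, rets in zip(positions, returns):
        gross = sum(w * r for w, r in zip(weights, rets))
        turnover = sum(abs(w - p) for w, p in zip(weights, prev))
        pnl.append(gross - turnover * cost_bps / 1e4)
        prev = weights
    return pnl

# Hypothetical two-asset book over three periods; the book flips in period 3.
positions = [[0.5, -0.5], [0.5, -0.5], [-0.5, 0.5]]
returns = [[0.01, -0.01], [0.00, 0.02], [-0.02, 0.01]]
pnl = net_pnl(positions, returns, cost_bps=10.0)
```

The first and third periods pay for trading (entering the book, then reversing it), while the second holds the same weights and pays nothing, which is why turnover belongs in the grade rather than in a footnote.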

◆ The golden rule

Measure the quantity the strategy harvests, then haircut it as an investor would. A ranker earns from ordering names correctly at a point in time; a forecaster earns from calling each asset's direction over time. Apply one family's scorecard to the other and you get noise dressed as a number. The only fair comparison comes after each candidate clears its native scorecard: net of costs, within capacity, deflated for multiple testing, and confirmed in a locked out-of-sample review.


Foundations

Two families of alpha, two scorecards

An alpha starts as one number — a score per asset per rebalance date. It is not yet a trade, not yet a portfolio, not yet a capacity claim. How we turn that score into a verdict splits cleanly by what the signal exploits.

Left: the ranker's skill is “did high scores out-rank low scores today?” Right: the timer's skill is “did my forecast call each asset's own path?” Different questions, different scorecards.

◆ Why the distinction is load-bearing

The Fundamental Law of Active Management, $\mathrm{IR} \approx \mathrm{IC} \times \sqrt{\text{breadth}}$, makes the failure precise. For a ranker, breadth = names and IC = cross-sectional rank correlation. For a timer, breadth = independent time periods and the relevant correlation is over time; effective breadth then collapses under serial and cross-asset correlation. Apply the wrong one and the headline number is meaningless.
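
A small numerical sketch shows how much the breadth assumption matters. It uses the Fundamental Law in its usual form, IR ≈ IC × √breadth, and, as an assumption of this example rather than a formula taken from the guide, the common equal-correlation approximation for effective breadth, N / (1 + (N − 1)ρ). The function names and inputs are hypothetical.

```python
import math

def information_ratio(ic, breadth):
    """Fundamental Law approximation: IR = IC * sqrt(breadth)."""
    return ic * math.sqrt(breadth)

def effective_breadth(n, avg_corr):
    """Equal-correlation approximation (an assumption of this sketch):
    n nominal bets with average pairwise correlation avg_corr behave
    like n / (1 + (n - 1) * avg_corr) independent bets."""
    return n / (1.0 + (n - 1) * avg_corr)

# Hypothetical timer: IC of 0.05 over 252 nominal daily bets.
ir_naive = information_ratio(0.05, 252)   # treats all 252 bets as independent
n_eff = effective_breadth(252, 0.10)      # average correlation of 0.10 between bets
ir_effective = information_ratio(0.05, n_eff)
```

With these inputs the 252 nominal bets shrink to fewer than ten effective ones, and the implied IR falls from roughly 0.79 to roughly 0.16, so the headline number depends far more on the breadth assumption than on the IC itself.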

