Judge Engine

Judge Engine defined: the conservative evidence gate that reviews quant strategy candidates using the Deflated Sharpe Ratio (DSR), the Probability of Backtest Overfitting (PBO), walk-forward validation, costs, data lineage, and human approval.

The Judge Engine is Corrai's conservative promotion layer. It reviews a strategy candidate after the research workflow has produced evidence. Its job is not to create more variants or improve the backtest. Its job is to decide whether the candidate survives the required validation gates.

Related search phrases include quant strategy validation engine, AI backtest judge, strategy approval workflow, alpha evidence gate, and automated backtest review.

Why a Judge exists

Research systems are naturally optimistic. Researchers want ideas to work. AI agents can generate plausible explanations. Backtests can reward repeated searching. A Judge exists to counter that pressure.

The Judge is intentionally skeptical. It asks whether the candidate's evidence is strong enough after accounting for the known ways backtests mislead:

selection bias and multiple testing
look-ahead leakage
missing costs and same-bar fills
regime concentration
fragile parameters
incomplete data lineage
missing human review
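
To make the multiple-testing point concrete, here is a minimal sketch of a Deflated Sharpe Ratio calculation in the commonly cited Bailey and López de Prado formulation. It is an illustration, not Corrai's implementation. The function names and the sample numbers are assumptions. The Sharpe ratio is per period (not annualized), kurtosis is the raw value (3 for normal returns), and sharpe_variance is the variance of Sharpe estimates across the registered trials.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()        # standard normal, mean 0 and sigma 1


def expected_max_sharpe(n_trials: int, sharpe_variance: float) -> float:
    """Approximate Sharpe ratio that the best of n_trials skill-less trials would show."""
    if n_trials < 2:
        return 0.0  # a single trial involves no selection, so nothing to deflate
    z1 = _STD_NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
    z2 = _STD_NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1.0 - EULER_GAMMA) * z1 + EULER_GAMMA * z2)


def deflated_sharpe_ratio(
    sharpe: float,           # per-period Sharpe ratio of the selected candidate
    n_obs: int,              # number of return observations
    skew: float,             # skewness of returns
    kurtosis: float,         # raw kurtosis of returns (3.0 for a normal distribution)
    n_trials: int,           # registered trial count
    sharpe_variance: float,  # variance of Sharpe ratios across trials
) -> float:
    """Probability that the true Sharpe ratio exceeds the selection-adjusted benchmark."""
    benchmark = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1.0 - skew * sharpe + (kurtosis - 1.0) / 4.0 * sharpe ** 2)
    z = (sharpe - benchmark) * math.sqrt(n_obs - 1) / denom
    return _STD_NORMAL.cdf(z)


# Same candidate, different registered trial counts (illustrative numbers only).
dsr_single = deflated_sharpe_ratio(0.10, 1250, 0.0, 3.0, n_trials=1, sharpe_variance=0.0025)
dsr_searched = deflated_sharpe_ratio(0.10, 1250, 0.0, 3.0, n_trials=100, sharpe_variance=0.0025)
```

The same backtest scores well as a single registered trial and poorly as the best of a hundred, which is why the full trial history matters to the verdict.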

For the broader failure modes, see Why Backtests Lie.

What the Judge reads

A Judge verdict should be based on an evidence package, not a summary chart. The package includes:

hypothesis and registered run id
full trial history
data source and point-in-time lineage
workflow graph and parameter set
cost and execution model
walk-forward and purged validation results
DSR and PBO diagnostics
risk, turnover, drawdown, and regime checks
reviewer notes and signoff status

If any of these pieces is missing, the Judge can block promotion on the grounds that the evidence is incomplete.
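
The completeness rule can be sketched as a simple check over the package. This is a hypothetical data structure for illustration: the class name, field names, and types are assumptions that mirror the list above, not Corrai's actual schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class EvidencePackage:
    """Hypothetical evidence package; every field defaults to 'not supplied'."""
    hypothesis: Optional[str] = None
    run_id: Optional[str] = None
    trial_history: Optional[list] = None
    data_lineage: Optional[dict] = None
    workflow_graph: Optional[dict] = None
    parameters: Optional[dict] = None
    cost_model: Optional[dict] = None
    validation_results: Optional[dict] = None
    dsr: Optional[float] = None
    pbo: Optional[float] = None
    risk_checks: Optional[dict] = None
    reviewer_signoff: Optional[str] = None


def missing_pieces(package: EvidencePackage) -> List[str]:
    """Return the names of fields that are absent or empty."""
    missing = []
    for f in fields(package):
        value = getattr(package, f.name)
        # None means not supplied; an empty string, list, or dict counts as missing too.
        is_empty = hasattr(value, "__len__") and len(value) == 0
        if value is None or is_empty:
            missing.append(f.name)
    return missing


package = EvidencePackage(hypothesis="momentum decays after earnings", run_id="run-001",
                          trial_history=[], dsr=0.0)
gaps = missing_pieces(package)
```

Here gaps lists every field that was left out, including the empty trial history. A numeric diagnostic of 0.0 still counts as present, because a bad value is evidence and a missing value is not.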

Verdicts are explanations

A useful Judge does not return only pass or fail. It also explains the blocking reason. Examples:

blocked because DSR is below threshold after adjusting for the registered trial count
blocked because walk-forward performance is concentrated in one window
blocked because the data uses revised values without availability timestamps
blocked because the strategy survives gross but fails after declared costs
blocked because human review is missing

These verdicts help agents and researchers decide what to do next. Sometimes the next action is to repair a data issue. Sometimes it is to reduce the search scope. Often it is to abandon the idea.
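
One way to picture a verdict that carries its own explanation is a result object holding a list of blocking reasons. The sketch below is an assumption-laden illustration: the class names, the gate inputs, and all three thresholds are hypothetical and are not Corrai's actual gates or values.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds, chosen only for illustration.
MIN_DSR = 0.95          # minimum acceptable Deflated Sharpe Ratio
MAX_WINDOW_SHARE = 0.5  # maximum share of total profit from one walk-forward window
MIN_NET_SHARPE = 0.0    # the strategy must stay above this after declared costs


@dataclass(frozen=True)
class GateInputs:
    dsr: float
    window_net_pnl: List[float]  # net profit per walk-forward window
    gross_sharpe: float
    net_sharpe: float
    has_availability_timestamps: bool
    human_signoff: bool


@dataclass(frozen=True)
class Verdict:
    promoted: bool
    blocking_reasons: List[str]


def judge(inputs: GateInputs) -> Verdict:
    """Run every gate and collect all blocking reasons, not just the first."""
    reasons: List[str] = []
    if inputs.dsr < MIN_DSR:
        reasons.append(f"DSR {inputs.dsr:.2f} is below threshold {MIN_DSR:.2f}")
    profits = [p for p in inputs.window_net_pnl if p > 0]
    if not profits:
        reasons.append("no walk-forward window is profitable after costs")
    elif max(profits) / sum(profits) > MAX_WINDOW_SHARE:
        reasons.append("walk-forward performance is concentrated in one window")
    if not inputs.has_availability_timestamps:
        reasons.append("data uses revised values without availability timestamps")
    if inputs.net_sharpe <= MIN_NET_SHARPE:
        if inputs.gross_sharpe > MIN_NET_SHARPE:
            reasons.append("strategy survives gross but fails after declared costs")
        else:
            reasons.append("strategy fails even before costs")
    if not inputs.human_signoff:
        reasons.append("human review is missing")
    return Verdict(promoted=not reasons, blocking_reasons=reasons)


weak = GateInputs(dsr=0.40, window_net_pnl=[5.0, 0.1, -0.2, 0.1], gross_sharpe=0.6,
                  net_sharpe=-0.1, has_availability_timestamps=False, human_signoff=False)
for reason in judge(weak).blocking_reasons:
    print("blocked because", reason)
```

Collecting every reason instead of stopping at the first failure gives the researcher a full picture in a single review, which makes it easier to tell a repairable data issue from an idea that should be abandoned.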

Judge versus optimizer

The Judge is not an optimizer. If the Judge adjusts parameters until the result passes, it becomes part of the search and destroys its own independence. Corrai keeps generation and judgment separate: agents and Alpha Canvas produce candidates; the Judge reviews them.
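
A minimal sketch of how that separation might be enforced in code: the candidate is frozen before review, so the reviewing function can read it but cannot tune it. The Candidate class, freeze_candidate, and review are hypothetical names used only for illustration, and the single pass condition stands in for the real gates.

```python
from dataclasses import dataclass, FrozenInstanceError
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate as handed from generation to judgment."""
    run_id: str
    parameters: Mapping[str, float]  # read-only view of the parameter set
    net_sharpe: float


def freeze_candidate(run_id: str, parameters: dict, net_sharpe: float) -> Candidate:
    # Copy the parameters and wrap them in a read-only proxy so that
    # later changes by the generator or the reviewer cannot leak in.
    return Candidate(run_id, MappingProxyType(dict(parameters)), net_sharpe)


def review(candidate: Candidate) -> bool:
    """Read-only review: returns a decision and never returns a modified candidate."""
    return candidate.net_sharpe > 0.0


candidate = freeze_candidate("run-001", {"lookback": 20.0}, net_sharpe=0.8)
decision = review(candidate)

try:
    candidate.parameters["lookback"] = 5.0  # an attempt to tune during review
except TypeError:
    print("parameters are read-only during review")
```

If a candidate fails, the only way forward in this design is to register a new run through the generation side, which adds to the trial count the Judge sees on the next review.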

This separation is central to evidence-based alpha validation. It lets the system support fast AI research without turning validation into another overfitting loop.