Shadow: Parallel Truth Evaluation

Compare legacy and candidate implementations side by side before rollout. Never trust a refactor until the new code proves itself.

Run	Legacy vs Candidate	Samples	Match	Critical	Latency	Verdict
shd_a14e226d	isc-scoring vs isc-scoring-v2	50	96%	0%	2ms → 3ms	SAFE
shd_b29f334e	file-scoring vs file-scoring-weighted	120	72%	8%	15ms → 12ms	HOLD
shd_c33d445f	assembly-pipeline vs assembly-pipeline-v3	200	45%	22%	45ms → 120ms	UNSAFE

How Shadow Works

1. Run both paths

Execute legacy and candidate on identical inputs, with a per-sample timeout.
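As a rough illustration of this step, the sketch below feeds each sample to both implementations and bounds every call with a timeout. It is not Shadow's actual runner: the names `PathResult`, `run_path`, and `run_shadow` are hypothetical, and a thread pool is only one possible way to enforce the limit.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class PathResult:
    """Outcome of one implementation on one sample (hypothetical shape)."""
    output: Any = None
    latency_ms: float = 0.0
    error: Optional[str] = None  # "timeout" or the repr of a raised exception


def _timed(fn: Callable[[Any], Any], sample: Any) -> Tuple[Any, float]:
    # Measure wall-clock time of a single call in milliseconds.
    start = time.perf_counter()
    out = fn(sample)
    return out, (time.perf_counter() - start) * 1000.0


def run_path(fn, sample, timeout_s: float, pool: ThreadPoolExecutor) -> PathResult:
    future = pool.submit(_timed, fn, sample)
    try:
        out, ms = future.result(timeout=timeout_s)
        return PathResult(output=out, latency_ms=ms)
    except FutureTimeout:
        # The worker thread is not killed; it keeps running in the background.
        # Hard cancellation would need a process-based runner instead.
        return PathResult(error="timeout", latency_ms=timeout_s * 1000.0)
    except Exception as exc:  # the implementation itself raised
        return PathResult(error=repr(exc))


def run_shadow(legacy, candidate, samples, timeout_s: float = 1.0
               ) -> List[Tuple[PathResult, PathResult]]:
    """Run both paths on the same inputs, one sample at a time."""
    pairs = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for sample in samples:
            pairs.append((
                run_path(legacy, sample, timeout_s, pool),
                run_path(candidate, sample, timeout_s, pool),
            ))
    return pairs
```

Recording a timeout as a result, rather than raising, keeps one slow sample from aborting the whole run and lets the later comparison count it as a mismatch.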

2. Compare deterministically

Five comparators: output, sufficiency, ranking, normalization, and latency.
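The page names the five comparators but does not define them, so the sketch below only illustrates the general shape: each comparator is a pure function of one legacy/candidate pair, which is what makes the comparison deterministic. The result shape (ranked `(item_id, score)` pairs), the `Pair` type, and every rule inside the functions are assumptions made for the example, not Shadow's real semantics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical result shape: a ranked list of (item_id, score) pairs.
Result = List[Tuple[str, float]]


@dataclass(frozen=True)
class Pair:
    legacy: Result
    candidate: Result
    legacy_ms: float
    candidate_ms: float


def output(p: Pair) -> bool:
    # Assumed rule: results are exactly equal.
    return p.legacy == p.candidate


def sufficiency(p: Pair) -> bool:
    # Assumed rule: the candidate returns at least as many items as legacy.
    return len(p.candidate) >= len(p.legacy)


def ranking(p: Pair) -> bool:
    # Assumed rule: item ids appear in the same order.
    return [i for i, _ in p.legacy] == [i for i, _ in p.candidate]


def _normalize(r: Result) -> Result:
    # Order-insensitive, case-insensitive view with rounded scores.
    return sorted((i.strip().lower(), round(s, 6)) for i, s in r)


def normalization(p: Pair) -> bool:
    # Assumed rule: results are equal once normalized.
    return _normalize(p.legacy) == _normalize(p.candidate)


def latency(p: Pair, max_ratio: float = 2.0) -> bool:
    # Assumed rule: candidate is at most max_ratio times slower than legacy.
    return p.candidate_ms <= p.legacy_ms * max_ratio


COMPARATORS: Dict[str, Callable[[Pair], bool]] = {
    "output": output,
    "sufficiency": sufficiency,
    "ranking": ranking,
    "normalization": normalization,
    "latency": latency,
}


def compare(p: Pair) -> Dict[str, bool]:
    """Apply every comparator; same input always yields the same report."""
    return {name: fn(p) for name, fn in COMPARATORS.items()}
```

Because no comparator reads a clock, random source, or external state, re-running the comparison on stored results reproduces the same report.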

3. Gate the rollout

CI-grade thresholds decide the verdict. Exit code 0 = safe. Exit code 10 = blocked.
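A gate of this kind could be sketched as follows. Only the exit codes (0 and 10) and the verdict names (SAFE, HOLD, UNSAFE) come from this page. The threshold values are assumptions chosen so that the three example runs in the table get the verdicts shown there, and treating HOLD as blocked is likewise an assumption; `Thresholds`, `verdict`, and `exit_code` are hypothetical names.

```python
from dataclasses import dataclass

EXIT_SAFE = 0      # from the page: exit 0 = safe
EXIT_BLOCKED = 10  # from the page: exit 10 = blocked


@dataclass(frozen=True)
class Thresholds:
    # Assumed values, picked only to reproduce the example table.
    safe_min_match: float = 0.95        # SAFE needs at least this match rate
    safe_max_critical: float = 0.0      # ...and no more than this critical rate
    unsafe_below_match: float = 0.50    # UNSAFE if match rate drops below this
    unsafe_above_critical: float = 0.10  # ...or critical rate exceeds this


def verdict(match_rate: float, critical_rate: float,
            t: Thresholds = Thresholds()) -> str:
    """Map aggregate rates (fractions in [0, 1]) to SAFE, HOLD, or UNSAFE."""
    if match_rate >= t.safe_min_match and critical_rate <= t.safe_max_critical:
        return "SAFE"
    if match_rate < t.unsafe_below_match or critical_rate > t.unsafe_above_critical:
        return "UNSAFE"
    return "HOLD"


def exit_code(v: str) -> int:
    # Assumption: anything other than SAFE blocks the rollout.
    return EXIT_SAFE if v == "SAFE" else EXIT_BLOCKED
```

In a CI job, a wrapper script would pass `exit_code(verdict(...))` to `sys.exit`, so the pipeline step fails whenever the rollout is blocked.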