Shadow: Parallel Truth Evaluation
Compare legacy and candidate implementations side by side before rollout. Never trust a refactor until the new code proves itself.
| Run | Legacy vs Candidate | Samples | Match | Critical | Latency (legacy → candidate) | Verdict |
|---|---|---|---|---|---|---|
| shd_a14e226d | isc-scoring vs isc-scoring-v2 | 50 | 96% | 0% | 2ms → 3ms | SAFE |
| shd_b29f334e | file-scoring vs file-scoring-weighted | 120 | 72% | 8% | 15ms → 12ms | HOLD |
| shd_c33d445f | assembly-pipeline vs assembly-pipeline-v3 | 200 | 45% | 22% | 45ms → 120ms | UNSAFE |
How Shadow Works
1. Run both paths
Execute legacy and candidate on identical inputs with per-sample timeout.
2. Compare deterministically
Five comparators: output, sufficiency, ranking, normalization, latency.
3. Gate the rollout
CI-grade thresholds. Exit 0 = safe. Exit 10 = blocked.
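
The three steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not Shadow's actual implementation: the names `shadow_run`, `gate`, and `ShadowReport`, and the `min_match_rate` threshold of 0.95, are all hypothetical. Only the output comparator is modelled (plain equality); the other four comparators are left out. The exit codes 0 and 10 are the ones stated above.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass
from typing import Any, Callable, Iterable

EXIT_SAFE = 0      # from the text: exit 0 = safe
EXIT_BLOCKED = 10  # from the text: exit 10 = blocked


@dataclass
class ShadowReport:
    """Hypothetical summary of one shadow run."""
    samples: int
    matches: int
    timeouts: int

    @property
    def match_rate(self) -> float:
        return self.matches / self.samples if self.samples else 0.0


def shadow_run(
    legacy: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    samples: Iterable[Any],
    timeout_s: float = 1.0,
) -> ShadowReport:
    """Step 1 and 2: run both paths on identical inputs, then compare outputs."""
    total = matches = timeouts = 0
    # Note: a thread-based timeout stops waiting but cannot cancel the work,
    # so a hung path keeps its worker busy. A real runner may need processes.
    with ThreadPoolExecutor(max_workers=2) as pool:
        for sample in samples:
            total += 1
            old_future = pool.submit(legacy, sample)
            new_future = pool.submit(candidate, sample)
            try:
                # Each path gets up to timeout_s to produce a result.
                old = old_future.result(timeout=timeout_s)
                new = new_future.result(timeout=timeout_s)
            except FutureTimeout:
                timeouts += 1
                continue
            except Exception:
                continue  # a crash on either path counts as a mismatch
            if old == new:  # output comparator only (assumed: plain equality)
                matches += 1
    return ShadowReport(samples=total, matches=matches, timeouts=timeouts)


def gate(report: ShadowReport, min_match_rate: float = 0.95) -> int:
    """Step 3: turn the report into a CI exit code (threshold is an assumption)."""
    return EXIT_SAFE if report.match_rate >= min_match_rate else EXIT_BLOCKED
```

In a CI job, a wrapper script could end with `sys.exit(gate(report))` so that a blocked rollout fails the pipeline step.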