| Run | Date | Test | claude-haiku-4-5-20251001 | claude-sonnet-4-6 | claude-opus-4-6 |
| --- | --- | --- | --- | --- | --- |
| 23944013461 | 2026-04-03 11:20 | dev-record/full_workflow | PASS, ABILITY: 31.6%; $0.1434 · 93s · 15t | PASS, ABILITY: 31.6%; $0.3628 · 184s · 24t | PASS, ABILITY: 31.6%; $0.3260 · 98s · 23t |
| | | generator-coding/library_generator-baseline | PASS, ABILITY: 100.0%; $0.0824 · 40s · 8t | PASS, ABILITY: 100.0%; $0.3368 · 114s · 7t | PASS, ABILITY: 100.0%; $0.1647 · 45s · 7t |
| | | generator-coding/library_generator-with_skill | PASS, ABILITY: 100.0%; $0.4010 · 163s · 36t | PASS, ABILITY: 100.0%; $1.5305 · 464s · 38t | PASS, ABILITY: 100.0%; $0.6425 · 152s · 27t |
| | | review-skill/review_finds_seeded_issues | PASS, ABILITY: 100.0%; $0.0367 · 37s · 6t | PASS, ABILITY: 100.0%; $0.1222 · 83s · 7t | PASS, ABILITY: 100.0%; $0.1745 · 73s · 11t |
| | | review-steps/review_preserves_vocabulary | PASS, ABILITY: 100.0%; $0.0191 · 13s · 4t | PASS, ABILITY: 100.0%; $0.1601 · 54s · 9t | PASS, ABILITY: 100.0%; $0.1058 · 36s · 9t |
| | | review-skill/review_skill | — | — | — |
| | | review-steps/review | — | — | — |
| | | total | $0.6826 · 346s · 69t | $2.5124 · 900s · 85t | $1.4135 · 403s · 77t |
| 23685733627 | 2026-03-28 13:06 | dev-record/full_workflow | $0.1965 · 134s · 20t | $0.2246 · 122s · 18t | $0.3099 · 120s · 19t |
| | | generator-coding/library_generator-baseline | — | — | — |
| | | generator-coding/library_generator-with_skill | — | — | — |
| | | review-skill/review_finds_seeded_issues | — | — | — |
| | | review-steps/review_preserves_vocabulary | — | — | — |
| | | review-skill/review_skill | $0.0368 · 20s · 9t | $0.0881 · 70s · 6t | $0.1933 · 81s · 12t |
| | | review-steps/review | $0.0422 · 31s · 9t | $0.0305 · 13s · 4t | $0.1314 · 44s · 9t |
| | | total | $0.2755 · 186s · 38t | $0.3432 · 205s · 28t | $0.6346 · 245s · 40t |
| 23567439585 | 2026-03-25 22:37 | dev-record/full_workflow | $0.1692 · 118s · 17t | $0.2762 · 130s · 18t | $0.3035 · 101s · 22t |
| | | generator-coding/library_generator-baseline | — | — | — |
| | | generator-coding/library_generator-with_skill | — | — | — |
| | | review-skill/review_finds_seeded_issues | — | — | — |
| | | review-steps/review_preserves_vocabulary | — | — | — |
| | | review-skill/review_skill | $0.0407 · 40s · 8t | $0.1258 · 88s · 9t | $0.1320 · 65s · 8t |
| | | review-steps/review | $0.0433 · 31s · 8t | $0.1216 · 56s · 11t | $0.1277 · 41s · 9t |
| | | total | $0.2532 · 188s · 33t | $0.5236 · 274s · 38t | $0.5632 · 207s · 39t |
| 23401521620 | 2026-03-22 11:02 | dev-record/full_workflow | $0.1765 · 142s · 20t | $0.2539 · 102s · 20t | $0.3033 · 100s · 20t |
| | | generator-coding/library_generator-baseline | — | — | — |
| | | generator-coding/library_generator-with_skill | — | — | — |
| | | review-skill/review_finds_seeded_issues | — | — | — |
| | | review-steps/review_preserves_vocabulary | — | — | — |
| | | review-skill/review_skill | $0.0295 · 18s · 8t | $0.1390 · 114s · 8t | $0.1722 · 74s · 9t |
| | | review-steps/review | $0.0377 · 27s · 8t | $0.1326 · 70s · 10t | $0.1300 · 48s · 9t |
| | | total | $0.2437 · 188s · 36t | $0.5255 · 286s · 38t | $0.6055 · 223s · 38t |
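
The `total` rows appear to be column-wise sums of the per-test cells. The sketch below checks this for one column, assuming every populated cell ends in the `$cost · seconds · turns` pattern shown above and that `—` marks a test that did not run. `Cell`, `parse_cell`, and `sum_cells` are hypothetical names introduced for illustration; they are not part of the test harness.

```python
import re
from typing import Iterable, NamedTuple, Optional


class Cell(NamedTuple):
    cost: float    # dollars
    seconds: int   # wall-clock time
    turns: int     # count shown with the "t" suffix


# Matches the trailing "$0.1434 · 93s · 15t" part of a cell; any
# "PASS, ABILITY: ...;" prefix is ignored because search() is used.
CELL_RE = re.compile(r"\$(\d+\.\d+)\s*·\s*(\d+)s\s*·\s*(\d+)t")


def parse_cell(text: str) -> Optional[Cell]:
    """Return the parsed metrics, or None for an empty ("—") cell."""
    match = CELL_RE.search(text)
    if match is None:
        return None
    return Cell(float(match.group(1)), int(match.group(2)), int(match.group(3)))


def sum_cells(cells: Iterable[str]) -> Cell:
    """Sum cost, seconds, and turns over all populated cells."""
    parsed = [c for c in map(parse_cell, cells) if c is not None]
    return Cell(
        round(sum(c.cost for c in parsed), 4),
        sum(c.seconds for c in parsed),
        sum(c.turns for c in parsed),
    )


# claude-haiku-4-5-20251001 column of run 23944013461, copied from the table.
haiku_column = [
    "PASS, ABILITY: 31.6%; $0.1434 · 93s · 15t",
    "PASS, ABILITY: 100.0%; $0.0824 · 40s · 8t",
    "PASS, ABILITY: 100.0%; $0.4010 · 163s · 36t",
    "PASS, ABILITY: 100.0%; $0.0367 · 37s · 6t",
    "PASS, ABILITY: 100.0%; $0.0191 · 13s · 4t",
    "—",
    "—",
]

print(sum_cells(haiku_column))  # Cell(cost=0.6826, seconds=346, turns=69)
```

For this column the result matches the reported total of `$0.6826 · 346s · 69t`. In some other columns the reported time differs from the summed seconds by one (for example 900s reported against a sum of 899s for claude-sonnet-4-6 in the same run), presumably because per-test times are rounded before display, so a check on seconds should allow a small tolerance.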