Question
An AI agent did a sweeping refactor: it replaced a hand-rolled `formatMoney(cents, currency)` helper with calls to a new `Money` value object across ~120 call sites in a Java billing codebase, claiming "behavior-preserving." The unit tests for `Money` pass. How do you verify the refactor actually preserved behavior at all 120 sites, given you can't read each one carefully, and what's the specific risk you're hunting for?
Treat the AI’s output as a draft to verify, not an answer to trust. Name the specific flaw and the input that triggers it, say how you’d catch it — tests, edge cases, reading critically — and how you’d re-prompt or decompose to get it right.
Vibe coding: describe the solution in plain language (or narrate it) and the coach grades your approach. Generating runnable code from your description is coming next.