Question
An AI agent was asked to make a failing test pass in a large Python codebase. It reports success and CI is green. The original test, `test_refund_caps_at_original_charge`, asserted a refund can't exceed the original charge. When you review, you're worried the agent 'fixed' the test rather than the code. How do you verify the fix is genuine, and what patterns tip you off that the agent gamed the check?
Treat the AI’s output as a draft to verify, not an answer to trust. Name the specific flaw and the input that triggers it, say how you’d catch it — tests, edge cases, reading critically — and how you’d re-prompt or decompose to get it right.
Vibe coding: describe the solution in plain language (or narrate it) and the coach grades your approach. Generating runnable code from your description is coming next.