Question
A test was flaky — failing ~1 in 20 CI runs. You asked an AI agent to fix it; it added a `sleep(500)` before the assertion and the test stopped failing in 50 local runs. The team wants to merge. As the reviewer, how do you determine whether this fix addresses the root cause or just lowers the failure rate — and what would you require before approving?
Treat the AI’s output as a draft to verify, not an answer to trust. Name the specific flaw and the input that triggers it, say how you’d catch it — tests, edge cases, reading critically — and how you’d re-prompt or decompose to get it right.
Vibe coding: describe the solution in plain language (or narrate it) and the coach grades your approach. Generating runnable code from your description is coming next.