On-call · Hard · oc-g449

Subject: GC pauses · Level: Senior–Staff · ~40 min · Common in: Algorithms & data structures interviews · Industries: Technology, Software development

Question

A JVM cache/search service runs G1GC with a large 64 GB heap that holds a big in-memory index plus a request-scoped working set. Young and mixed GC pauses are fine (under 20 ms). But a few times an hour, p99 latency spikes to 2-4 s, and the GC logs show either a Full GC or a long concurrent cycle followed by a Full GC, often preceded by 'to-space exhausted' / 'Humongous Allocation' messages and a climbing 'Humongous regions' count. The service occasionally builds large byte[] buffers (multi-MB serialized result pages) per request. Heap usage is high but not growing without bound (no leak). How do you triage and mitigate?

What a strong answer looks like

Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
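
As one illustration of a mitigation a candidate might propose, the sketch below builds a result page in fixed-size chunks instead of one contiguous byte[]. It rests on the documented G1 behavior that an object of roughly half a region or more is allocated as a humongous object, with the region size either chosen ergonomically or set with -XX:G1HeapRegionSize. The class name ChunkedBuffer, the 8 MB region size, and the 1 MB chunk size are hypothetical values chosen for illustration; they are not taken from the service in the question, and the exact boundary (and the array header) should be checked against the JVM in use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: accumulate bytes in small fixed-size chunks. */
public final class ChunkedBuffer {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private int posInLast;   // bytes already used in the last chunk
    private long total;      // total bytes written so far

    public ChunkedBuffer(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /**
     * Approximate G1 rule: an object of about half a region or more is humongous.
     * The exact boundary is a JVM implementation detail, so keep chunks well below it.
     */
    public static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    /** Appends len bytes from src, allocating a new chunk whenever the last one is full. */
    public void write(byte[] src, int off, int len) {
        while (len > 0) {
            if (chunks.isEmpty() || posInLast == chunkSize) {
                chunks.add(new byte[chunkSize]);
                posInLast = 0;
            }
            int n = Math.min(len, chunkSize - posInLast);
            System.arraycopy(src, off, chunks.get(chunks.size() - 1), posInLast, n);
            posInLast += n;
            off += n;
            len -= n;
            total += n;
        }
    }

    public int chunkCount() {
        return chunks.size();
    }

    public long size() {
        return total;
    }

    public static void main(String[] args) {
        long region = 8L << 20;                 // assumed 8 MB region size
        int chunk = 1 << 20;                    // assumed 1 MB chunk size
        ChunkedBuffer page = new ChunkedBuffer(chunk);
        byte[] piece = new byte[64 << 10];      // serialize in 64 KB pieces
        for (int i = 0; i < 80; i++) {          // 80 * 64 KB = 5 MB in total
            page.write(piece, 0, piece.length);
        }
        System.out.println("5 MB contiguous array humongous? " + isHumongous(5L << 20, region));
        System.out.println("1 MB chunk humongous? " + isHumongous(chunk, region));
        System.out.println("chunks=" + page.chunkCount() + " bytes=" + page.size());
    }
}
```

The trade-off is more small allocations and a non-contiguous buffer, in exchange for keeping each allocation in ordinary regions that young collections can reclaim. Whether this helps in a given incident still has to be confirmed from the GC logs, for example by checking that the 'Humongous regions' count stops climbing.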

