Question
Long-lived connections from clients to a streaming/notification service silently die after a few minutes of inactivity: the client believes the connection is still open and waits indefinitely for the next push, but pushes never arrive and no error is raised until much later. Only clients behind certain NAT gateways or cloud egress paths are affected; clients on other paths are fine. Server-side captures show the server sending a push after a quiet period and never receiving an ACK; the client never receives the push. Neither an application-level keepalive nor TCP keepalive is configured, and a network change introduced a new NAT/stateful gateway into the affected path. Triage and fix.
Stop the bleeding first (mitigate), then form hypotheses from real signals. Separate root cause from symptom, communicate status as you go, and close with what prevents a repeat.
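
Since the question notes that no TCP keepalive is configured, one plausible mitigation is to enable it on the long-lived sockets so that an idle connection still produces periodic traffic. The sketch below is illustrative only: the function names and the 60 s / 15 s / 4-probe values are assumptions, not values from the question, and the idle time would need to be set below the new gateway's idle timeout, which is not stated here. The per-socket tuning options are platform-specific, so the code sets only those the platform exposes.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=15, probes=4):
    """Enable TCP keepalive on sock (hypothetical helper, assumed values).

    idle_s:     seconds of silence before the first probe is sent
    interval_s: seconds between unanswered probes
    probes:     unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Idle time: Linux exposes TCP_KEEPIDLE; macOS names it TCP_KEEPALIVE.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)

    # Probe interval and probe count, where the platform exposes them.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


def approx_detection_seconds(idle_s, interval_s, probes):
    """Rough upper bound on how long a dead peer goes unnoticed."""
    return idle_s + interval_s * probes


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    print("approx. detection time (s):", approx_detection_seconds(60, 15, 4))
    s.close()
```

Keepalive probes serve two purposes in this scenario: they give a stateful gateway traffic to refresh its idle timer, and they let each endpoint detect a dead path within roughly the idle time plus the probe budget instead of waiting indefinitely. An application-level heartbeat would be a reasonable complement where socket options cannot be controlled on both ends.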