Put a small proxy beside every service so the network — retries, mTLS, routing, metrics — is handled outside your app code.
In a microservice fleet, every service needs the same network plumbing: encrypt traffic, retry a flaky peer, balance across replicas, emit metrics. Writing that into each app — in each language — is repetitive and drifts out of sync.
A service mesh moves it out of the app. A sidecar proxy (typically Envoy) runs next to each service instance. Calls never go app→app directly; they go app→local sidecar→remote sidecar→app. The sidecars (the data plane) do mTLS, load balancing, and retries; a control plane tells them how.
The app code makes only one plain call to payments. Everything else (encryption, the upstream choice, the retry) happens in the sidecars, invisible to both apps.
Each pod runs the app and a sidecar side by side. Inbound and outbound traffic is transparently redirected (via iptables or eBPF) into the local sidecar, so the app makes an ordinary http://payments/charge call and never knows a proxy exists. The outbound sidecar resolves payments to a set of healthy endpoints, opens an mTLS connection to the chosen peer’s sidecar — both sides present and verify certificates — load-balances, and retries on failure. The remote sidecar terminates mTLS and forwards over loopback to its app.
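The outbound half of that path can be sketched in a few lines. This is a toy model, not Envoy's implementation: `Endpoint`, `OutboundSidecar`, and `ConnectionReset` are hypothetical names, the mTLS handshake is reduced to a method call, and the retry simply moves the round-robin cursor to the next endpoint.

```python
from dataclasses import dataclass, field


class ConnectionReset(Exception):
    """Raised by a hypothetical endpoint that drops the connection."""


@dataclass
class Endpoint:
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        # Stand-in for "mTLS handshake, then forward to the remote app".
        if not self.healthy:
            raise ConnectionReset(self.name)
        return f"200 OK from {self.name}"


@dataclass
class OutboundSidecar:
    endpoints: list[Endpoint]
    num_retries: int = 2
    _next: int = 0  # round-robin cursor
    attempts: list[str] = field(default_factory=list)

    def _pick(self) -> Endpoint:
        endpoint = self.endpoints[self._next % len(self.endpoints)]
        self._next += 1
        return endpoint

    def call(self, request: str) -> str:
        last_error = None
        # One initial attempt plus up to num_retries retries.
        for _ in range(1 + self.num_retries):
            endpoint = self._pick()
            self.attempts.append(endpoint.name)
            try:
                return endpoint.handle(request)
            except ConnectionReset as err:
                last_error = err
        raise RuntimeError("all attempts failed") from last_error


# The app's single plain call; the sidecar hides the failed attempt.
sidecar = OutboundSidecar([Endpoint("B1", healthy=False), Endpoint("B2")])
response = sidecar.call("POST /charge")
print(response)          # 200 OK from B2
print(sidecar.attempts)  # ['B1', 'B2']
```

The caller sees only the final response; the failed attempt against B1 is visible only in the sidecar's own bookkeeping, which is the property the mesh relies on.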
The control plane (e.g. Istiod) pushes config to every sidecar. A route/retry policy looks like:
    # control plane → every sidecar (Envoy) for the "payments" service
    route:
      destination: payments            # logical name, not an IP
      load_balancer: round_robin       # spread across healthy endpoints
      retry_policy:
        retry_on: connect-failure,5xx,reset
        num_retries: 2                 # try up to 2 other endpoints
        per_try_timeout: 250ms
      tls:
        mode: ISTIO_MUTUAL             # mTLS: both proxies present a cert
        # certs are issued + auto-rotated by the control plane (SPIFFE identity)
      outlier_detection:               # eject an endpoint that keeps failing
        consecutive_5xx: 5
        base_ejection_time: 30s
None of this lives in the application. Change the retry count or turn on mTLS and the apps redeploy nothing — the control plane reconfigures the sidecars in place.
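To illustrate "reconfigured in place", here is a minimal sketch of a control plane fanning a policy out to registered sidecars. All class names (`RoutePolicy`, `Sidecar`, `ControlPlane`) are hypothetical, and the push is a plain method call rather than the streaming configuration protocol a real mesh uses; the point is only that the policy object changes while nothing on the app side is rebuilt.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class RoutePolicy:
    """Illustrative subset of the policy fields shown above."""
    num_retries: int = 2
    per_try_timeout_ms: int = 250
    mtls: bool = True


@dataclass
class Sidecar:
    pod: str
    policy: RoutePolicy = field(default_factory=RoutePolicy)

    def apply(self, policy: RoutePolicy) -> None:
        # Swap config in place; the app process next to it is untouched.
        self.policy = policy


@dataclass
class ControlPlane:
    policy: RoutePolicy = field(default_factory=RoutePolicy)
    sidecars: list[Sidecar] = field(default_factory=list)

    def register(self, sidecar: Sidecar) -> None:
        # New sidecars receive the current policy as soon as they join.
        self.sidecars.append(sidecar)
        sidecar.apply(self.policy)

    def push(self, policy: RoutePolicy) -> None:
        # Record the new policy and fan it out to every known sidecar.
        self.policy = policy
        for sidecar in self.sidecars:
            sidecar.apply(policy)


mesh = ControlPlane()
for pod in ("orders-1", "payments-b1", "payments-b2"):
    mesh.register(Sidecar(pod))

# Raise the retry count fleet-wide with one push; no app is redeployed.
mesh.push(replace(mesh.policy, num_retries=3))
print([(s.pod, s.policy.num_retries) for s in mesh.sidecars])
```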
| Concern | With a mesh | Without (in-app) |
|---|---|---|
| Latency | Extra hop in + out of each sidecar (often sub-millisecond, but real) | Direct app→app, no proxy hop |
| Resource cost | One sidecar per pod: extra CPU + memory, fleet-wide | No per-pod proxy overhead |
| mTLS | Uniform, auto-rotated certs, on by default | Hand-rolled per service / language; easy to skip |
| Retries & LB | Declarative policy, consistent everywhere | Re-implemented in every client library |
| Observability | Golden metrics + tracing headers without app work | Each app instruments itself |
| Operational load | A whole control plane + sidecars to run and upgrade | Fewer moving parts to operate |
The bargain: you accept an extra hop and per-pod overhead to get uniform mTLS, retries, load balancing, and metrics without touching app code or libraries. That pays off most when you have many services in many languages.
Orders calls payments to charge a card. Payments runs two replicas, B1 and B2. During a deploy, B1 is briefly flapping and resets connections.
    orders app:   POST http://payments/charge                # one plain call, no TLS code
    sidecar A:    resolve "payments" → [B1, B2]
                  pick B1 (round robin)
                  mTLS handshake with sidecar B1 … RESET     # B1 is flapping
                  retry_policy fires: num_retries=2
                  pick B2 (next healthy endpoint)
                  mTLS handshake with sidecar B2 … OK        # certs verified
                  send encrypted request → sidecar B2
    sidecar B2:   terminate mTLS, forward over loopback → payments app
    payments app: charge card → 200 OK
                  response retraces the path back to orders
    orders app:   got 200 OK in ~30ms                        # never saw the reset, retry, or TLS
If B1 keeps failing and reaches the consecutive_5xx threshold of 5, outlier detection ejects it for 30s so later calls skip it entirely. The orders team changed nothing and shipped no client-retry code; the mesh absorbed the flap.
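The ejection rule can be sketched as a small counter per endpoint. This is a simplified model under stated assumptions: `OutlierDetector` is a hypothetical class, time is passed in explicitly as seconds, and the ejection lasts a fixed `base_ejection_time` (Envoy itself lengthens the ejection each time the same host is ejected again).

```python
from dataclasses import dataclass


@dataclass
class OutlierDetector:
    consecutive_5xx: int = 5          # failures in a row before ejection
    base_ejection_time: float = 30.0  # seconds the endpoint is skipped
    failures: int = 0
    ejected_until: float = 0.0

    def record(self, ok: bool, now: float) -> None:
        if ok:
            # Any success breaks the streak.
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.consecutive_5xx:
            self.ejected_until = now + self.base_ejection_time
            self.failures = 0

    def is_ejected(self, now: float) -> bool:
        return now < self.ejected_until


b1 = OutlierDetector()
for t in range(4):
    b1.record(ok=False, now=float(t))
print(b1.is_ejected(now=4.0))   # False: only four failures so far

b1.record(ok=False, now=4.0)    # fifth consecutive failure
print(b1.is_ejected(now=10.0))  # True: ejected until t = 34
print(b1.is_ejected(now=34.0))  # False: eligible for traffic again
```

A load balancer would consult `is_ejected` before picking an endpoint, which is how later calls skip a flapping replica without any change in the caller.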
Where does the mutual-TLS encryption actually happen in a sidecar mesh?
The upstream endpoint B1 resets the connection. With a retry policy in the mesh, what does the calling app observe?