Hi golang-nuts, I've got a service that is seeing some confusing GC behavior. The behavior manifests as long request latencies when the GC runs. At first I thought this would just be stop-the-world behavior from the GC, but as I dig into it more, it's clear that the STW phase is actually quite short. It's actually that the concurrent phase is completely using all cores at 100% utilization while it runs. Because there's no CPU left to run the service itself, the service is unavailable during this window, which seems to be a few hundred ms. I'm not especially excited about the service being unavailable for this length of time, to say the least. It's worth noting that > 100ms is the worst case, maybe one out of every 10 GC runs, but all of them spike the CPU pretty hard for at least 10ms.
I've tried adjusting GOGC to a smaller number, but this just makes the cpu spikes come more frequently. I've been poking around pprof and it looks like the service has roughly 20M long-lived objects and ~80M garbage objects on each GC run. If it matters, this program has a pretty large LRU cache implemented with a map and a container/list. It's also got quite a few mutexes, though I'm not sure these meaningfully interact with Go's GC in any way. 1) Is this common behavior for Go's GC? 2) Is there possibly something about how the service allocates objects that's causing undue work, say perhaps an unexpected behavior during the mark phase? 3) Is it possible to convince Go to spread its concurrent phase out over more time? From the gc trace, it appears to be doing about 2s of work (wall clock), but it runs every 120s, so it feels like this work could be spread out in a way that doesn't interfere with the service. I'm compiled with Go 1.10.3, 4 cores. Here's a sample trace from gctrace GC forced pacer: assist ratio=+8.611156e-002 (scan 960 MB in 15690->21268 MB) workers=1++0.000000e+000 pacer: H_m_prev=11151043456 h_t=+9.500000e-001 H_T=21744534739 h_a=+4.765627e-001 H_a=16465215016 h_g=+1.000000e+000 H_g=22302086912 u_a=+9.186874e-001 u_g=+3.000000e-001 W_a=347113784 goalΔ=+5.000000e-002 actualΔ=-4.734373e-001 u_a/u_g=+3.062291e+000 gc 36 @1197.776s 1%: 0.59+2210+0.29 ms clock, 2.3+5915/2192/0+1.1 ms cpu, 15690->15702->10671 MB, 21268 MB goal, 4 P scvg7: inuse: 15834, idle: 107, sys: 15942, released: 0, consumed: 15942 (MB) pacer: sweep done at heap size 10851MB; allocated 179MB during sweep; swept 2026864 pages at +1.906868e-004 pages/byte Thank you, and thanks for all the great work on Go! -Brian -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
