[go-nuts] Re: 10x latency spikes during GC alloc assist phase

m.kevac via golang-nuts Thu, 27 Jul 2017 08:07:07 -0700

Hi. We have made a simple one line patch to Go that turns off GC assist. 
Unfortunately, it was the only way. I feel your pain.


$ git diff
diff --git a/src/runtime/malloc.go b/src/runtime/malloc.go
index 8850659748..a2c8697bf5 100644
--- a/src/runtime/malloc.go
+++ b/src/runtime/malloc.go
@@ -602,7 +602,7 @@ func mallocgc(size uintptr, typ *_type, needzero bool) 
unsafe.Pointer {
                }
                // Charge the allocation against the G. We'll account
                // for internal fragmentation at the end of mallocgc.
-               assistG.gcAssistBytes -= int64(size)
+               assistG.gcAssistBytes = 1024
 
                if assistG.gcAssistBytes < 0 {
                        // This G is in debt. Assist the GC to correct


Our CPU usage went down 
6x: https://twitter.com/mkevac/status/882289083132903424
Latency is now 
negligible: https://twitter.com/mkevac/status/882289349328596992 
https://twitter.com/mkevac/status/882289514437332992
And memory usage increase is not so 
big: https://twitter.com/mkevac/status/882289742901018625

On Tuesday, July 25, 2017 at 3:44:10 AM UTC+3, [email protected] wrote:
>
> Hi,
>
> We are experiencing a problem that I believe may be related to issue 14812 
> <https://github.com/golang/go/issues/14812> but I wanted to ask here 
> before adding to that case or filing a new issue. Of course, we’d also 
> greatly appreciate any advice about how to make our program performant.
>
> Here is what we observe: at Laserlike one of our core user-facing services 
> (the “leaf”) typically responds to a particular rpc in <400ms.  During GC 
> we see spikes in latency to >5s on some simple requests.  The stop the 
> world periods are short, so the GC spikes appear to happen at other times.
>
> We have been unable to replicate this in a simple program, but we did run 
> our code in a test mode that repros it.  In our test environment the server 
> loads ~10 GB of persistent data (never deleted so doesn’t really need to be 
> GCed), and we ask for 8 processors.  We are running go version 1.8.3 on 
> kubernetes on GCP machine of type n1-highmem-64.  To create the problem we 
> send the server a single request with >500 tasks..
>
>
> This google drive folder has leaf-logs-full.redacted.txt as well as other 
> performance tooling files 
> <https://drive.google.com/drive/folders/0BypqYg6r4ebqb0lQNnM5eWtISms?usp=sharing>.
>  
>  A snippet from that log here shows normal query responses and timing:
>
> I0719 22:50:22.467367 leaf.go:363] Worker #5 done search for '[redacted]', 
> took 0.013 seconds
>
> I0719 22:50:22.467406 leaf.go:225] Worker #5 starting search for 
> '[redacted]'
>
> I0719 22:50:22.467486 leaf.go:363] Worker #6 done search for '[redacted]', 
> took 0.001 seconds
>
> I0719 22:50:22.467520 leaf.go:225] Worker #6 starting search for 
> '[redacted]'
>
> I0719 22:50:22.468050 leaf.go:363] Worker #9 done search for '[redacted]', 
> took 0.005 seconds
>
>
> We have observed that if a GC cycle happens to start while serving traffic 
> (which is often) AND there is a large amount of time spent in assist, then 
> our serving latency skyrockets by 10x.  In the log the slowdown commences 
> roughly when pacer assist starts at I0719 22:50:31.079283 and then reverts 
> to normal latencies shortly after the gc cycle completes at I0719 
> 22:50:36.806085.
>
> Below I copy parts of the log where we see latencies of up to 729ms on 
> tasks.  I also bold the line that shows 32929ms spent on alloc gc assist.
>
> We have captured an attached cpu profile during this time which seems to 
> confirm a large amount of time spent in runtime.gcAssistAlloc.func1.
>
>
> Pardon our ignorance about GC in golang, but our hypothesis about what may 
> be going wrong is that our large in-memory data structures are causing gc 
> to often go into assist mode, and that for reasons we don’t understand 
> malloc becomes expensive in that mode.  Since we also create ~100k new data 
> objects when processing user requests, we are guessing those allocs become 
> very slow.  Another piece of evidence for this hypothesis is that we have 
> another (prototype) implementation of this kind of service that makes more 
> use of object pools and doesn’t seem to have as much of slowdown during GC.
>
> Note on large in-memory data-structures:
>
> The principal data structures can be thought of as:
>
> Map[uint64][]byte (about 10M map entries, the slice lengths between 5K to 
> 50K) (around ~10G total memory usage) 
>
> Map[uint64][]uint64 (about 50M map entries, the slice lengths vary between 
> 10 and 100K, in a zipfian distribution, about 3G total memory usage)
>
> These data structures mostly stay as is over the life of the program.
>
> We are trying to figure out how to solve this so would appreciate any 
> advice. An engineer on our team wrote up the following ideas, none of which 
> are that great:
>
>    1. 
>    
>    Figure out a simple way to prevent our goroutines slowing down during 
>    GC.  I had some hopes LockOSThread() could be made to work, but it didn't 
>    seem to help in my experiments.  I'm not ruling this solution out 
> entirely, 
>    but if it's the write barriers that are the main problem, I don't have 
> much 
>    hope.
>    2. 
>    
>    Run at least 2 replicas of all our servers.  Manage their GC cycles 
>    ourselves, synchronized so that at most one replica is in GC at any given 
>    time.  The clients should either send all requests to both replicas (and 
>    cancel when one replies), or use some more complicated Kubernetes and 
>    client logic so a GCing replica is never sent requests.  This is the 
>    simplest solution that is likely to work, and doesn't require us to change 
>    our habits too much.  It just costs more. :)
>    3. 
>    
>    Refactor our current servers and program future servers with the 
>    explicit goal of reducing GC burden.  Give our servers more memory and 
>    increase SetGCPercent() so that Garbage Collection happens less 
> frequently. 
>     Use simpler data structures with fewer pointers to reduce the length of 
>    time the GC cycle lasts when it does run.  This isn't a perfect solution, 
>    because the GC will still hit us on occasion, but it reduces the harm.
>    4. 
>    
>    Throw out much of the convenience of Go and write our own 
>    allocators/shared pointers that index into a flat byte buffer, making no 
>    use of the GC.  (Either do this for particular objects, or just write a 
>    general solution.)  Make GC cycles so infrequent that we can either turn 
> GC 
>    off completely without going OOM, or at least run it only once per day, 
>    during off hours.  This seems very hard, and we'll be fighting with Go a 
>    lot.
>    5. 
>    
>    Implement the core serving data structures of our service in C++, 
>    paired off with another server written in Go that manages populating that 
>    server’s data -- the Go server would manage the data in the C++ server via 
>    rpc, and query it over rpc. 
>    6. 
>    
>    The nuclear option: decide that Go is simply not a suitable language 
>    for our serving stack.  Rewrite major servers in C++.
>    
>
>
> Right now #5 is our (terrible but only feasible-seeming) choice.
>
> Appreciate any advice!
>
> Steve Baker
>
>
>
>
> Logs with gc log lines:
>
>
> pacer: assist ratio=+3.155887e+001 (scan 2669 MB in 10072->10157 MB) 
> workers=2+0
>
> I0719 22:50:31.097707 leaf.go:363] Worker #3 done search for '[redacted]', 
> took 0.017 seconds
>
> I0719 22:50:31.098036 leaf.go:225] Worker #3 starting search for 
> '[redacted]'
>
> I0719 22:50:31.133959 leaf.go:363] Worker #6 done search for '[redacted]', 
> took 0.069 seconds
>
> I0719 22:50:31.134300 leaf.go:225] Worker #6 starting search for 
> '[redacted]'
>
> I0719 22:50:31.142191 leaf.go:363] Worker #9 done search for '[redacted]', 
> took 0.066 seconds
>
> I0719 22:50:31.142275 leaf.go:225] Worker #9 starting search for 
> '[redacted]'
>
> I0719 22:50:31.173921 leaf.go:363] Worker #10 done search for 
> '[redacted]', took 0.098 seconds
>
> I0719 22:50:31.174009 leaf.go:225] Worker #10 starting search for 
> '[redacted]'
>
> I0719 22:50:31.328796 leaf.go:363] Worker #15 done search for 
> '[redacted]', took 0.260 seconds
>
> I0719 22:50:31.330194 leaf.go:225] Worker #15 starting search for 
> '[redacted]'
>
> I0719 22:50:31.395579 leaf.go:363] Worker #11 done search for 
> '[redacted]', took 0.374 seconds
>
>
> <SNIP>
>
> I0719 22:50:36.317004 leaf.go:363] Worker #5 done search for '[redacted]', 
> took 0.729 seconds
>
> I0719 22:50:36.317190 leaf.go:225] Worker #5 starting search for 
> '[redacted]'
>
> I0719 22:50:36.370191 leaf.go:363] Worker #15 done search for 
> '[redacted]', took 0.192 seconds
>
> I0719 22:50:36.371446 leaf.go:225] Worker #15 starting search for 
> '[redacted]'
>
> I0719 22:50:36.421953 leaf.go:363] Worker #7 done search for '[redacted]', 
> took 0.116 seconds
>
> I0719 22:50:36.422092 leaf.go:225] Worker #7 starting search for 
> '[redacted]'
>
> I0719 22:50:36.570008 leaf.go:363] Worker #12 done search for 
> '[redacted]', took 0.778 seconds
>
> I0719 22:50:36.572970 leaf.go:225] Worker #12 starting search for 
> '[redacted]'
>
> I0719 22:50:36.573571 leaf.go:363] Worker #6 done search for '[redacted]', 
> took 0.710 seconds
>
> I0719 22:50:36.573721 leaf.go:225] Worker #6 starting search for 
> '[redacted]'
>
> pacer: H_m_prev=8875421544 h_t=+1.900000e-001 H_T=10561751637 
> h_a=+1.983366e-001 H_a=10635742360 h_g=+2.000000e-001 H_g=10650505852 
> u_a=+9.735779e-001 u_g=+2.500000e-001 W_a=2354135120 goalΔ=+1.000000e-002 
> actualΔ=+8.336587e-003 u_a/u_g=+3.894312e+000
>
> gc 1072 @4337.737s 39%: 0.37+5688+0.32 ms clock, 2.2+32929/11310/21+1.9 ms 
> cpu, 10072->10143->8661 MB, 10157 MB goal, 8 P
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

[go-nuts] Re: 10x latency spikes during GC alloc assist phase

Reply via email to