On Sat, Oct 24, 2020 at 7:30 PM JuanPablo AJ <[email protected]> wrote:


> I have some doubts related to the HTTP client.
>

First, if you have unexplained efficiency concerns in a program, you should
profile and instrument. Make the system tell you what is happening rather
than making guesses as to why. With that said, I have some hunches and
experiments you might want to try out.

When you perform a load test, you have a SUT, or system-under-test. That is
the whole system, including infrastructure around it. It can be a single
program, or a cluster of machines. You also have a load generator, which
generates load on your SUT in order to test different aspects of the SUT:
bandwidth usage, latency in response, capacity limits, resource limits,
etc[1]. Your goal is to figure out if the data you are seeing are within an
acceptable range for your use case, or if you have to work more on the
system to make it fall within the acceptable window.

Your test is about RTT latency of requests. This will become important.

One particular problem in your test is that the load generator and the SUT
run in the same environment. If the test is simple and you are trying to
stress the system maximally, chances are that the load generator impacts
the SUT. That means the latency will rise due to time sharing in the
operating system.

Second, when measuring latency you should look out for the problem Gil Tene
has dubbed "coordinated omission". In CO, the problem is that the load
generator and the SUT cooperate to deliver the wrong latency
numbers. This is especially true if you just fire as many requests as
possible on 50 connections. Under an overload situation, the system will
suffer in latency since that is the only way the system can alleviate
pressure. The problem with CO is that a server can decide to park a couple
of requests and handle the other requests as fast as possible. This can
lead to a high number of requests on the active connections, while the
stalled connections become noise in the statistics. You can look up Tene's
`wrk2` project, but I think the ideas were baked back into Will
Glozer's wrk at a later point in time (memory eludes me).

The third point is about the sensitivity of your tests: when you measure
things at the millisecond, microsecond or nanosecond range, your test
becomes far more susceptible to foreign impact. You can generally use
statistical bootstrapping to measure the impact this has on test variance,
which I've done in the past. You start finding all kinds of interesting
corner cases that perturb your benchmarks. Among the more surprising ones:

* CPU Scaling governors
* Turbo boosting: a single busy core can run at a higher clock frequency
than several busy cores can. Go's GC runs on multiple cores, so even for a
single-core program, this might have an effect
* CPU heat. Laptop CPUs have miserable thermal cooling compared to a server
or desktop. They can run fast in small bursts, but not for longer stretches
* Someone using the computer while doing the benchmark
* An open browser window which runs some Javascript in the background
* An open electron app with a rendering of a .gif or .webm file
* Playing music while performing the benchmark, yielding CPU power to the
MP3, Vorbis or AAC decoder
* Amount of incoming network traffic to process for a benchmark that has
nothing to do with the network

Finally, asynchronous goroutines are still work the program needs to
execute. They aren't free. So as the system is stressed with a higher load
you run up against the capacity limit, thus incurring slower response
times. In the case where you perform requests in the background to another
HTTP server, you are taking a slice of the available resources. You are
also generating as much work internally as is coming in externally. In a
real world server, this is usually a bad idea and you must put a resource
limit in place. Otherwise an aggressive client can overwhelm your server.
The trick is to slow the caller down by *not* responding right away if you
are overloaded internally.

You should check your kernel. When you perform a large number of requests
on the same machine, you can run into limits on the number of TCP source
ports if they are rotated too fast. It is a common problem when the load
generator and SUT are on the same host.

You should check your HTTP client configuration as well. One way to avoid
the above problem is to maximize connection reuse, but then you risk
head-of-line blocking on the connections, even (or perhaps even more so) in
the HTTP/2 case.

But above all: instrument, profile, observe. Nothing beats data and plots.

[1] SLI, SLOs etc. A good starting point is
https://landing.google.com/sre/sre-book/chapters/service-level-objectives/
but that book is worth it for a full read.
https://landing.google.com/sre/books/ too!

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAGrdgiXFet9iZCjT5CDwrb9FX_-n%3DrhP1oMXoiokbTJ12GoUzg%40mail.gmail.com.