An update.
The VM on which I am running has 40 vCPUs.
I changed my GOMAXPROCS to a lower value of 8.
And also changed my client to use 90 HTTP/2 connections towards the HTTP/2
server.
With this setup, I can now achieve about 9000 Txns per second.
However, if I go any higher than that, my client starts throwing the
"context deadline exceeded" errors. I am sure that the responses are
received by the client well within the timeout of 7 seconds.
When I observe the CPU utilization of my client process, it only uses about
300% vCPU i.e. 3 out of 40 vCPUs.
Still, I don't understand why I cannot achieve higher throughput than 9000
per second.
Here is a sample output from the *atop* utility. As seen below, only 257%
shows as CPU utilization.
Is this because goroutines are not getting context switched onto a logical
processor or thread. I thought if there is so much CPU available, then
context switching should be very fast and not an issue at all.
Thus leading to timeouts ?
ATOP - myserver 2023/05/05 22:16:19
y--------P----
10s elapsed
PRC | sys 6.63s | user 21.56s | | #proc 543 | #trun
4 | #tslpi 639 | | #tslpu 0 | #zombie 0 | clones
11 | | no procacct |
CPU | sys 61% | user 213% | irq 4% | idle 3724% | wait
0% | steal 0% | guest 0% | | ipc notavail | cycl
unknown | curf 3.00GHz | curscal ?% |
CPL | avg1 2.28 | avg5 2.40 | | avg15 2.45 |
| | csw 656705 | intr 515996 | |
| numcpu 40 | |
MEM | tot 503.7G | free 468.6G | cache 26.9G | dirty 0.1M | buff
1.0G | slab 1.4G | slrec 1.2G | shmem 4.1G | shrss 117.0M | vmbal
0.0M | hptot 0.0M | hpuse 0.0M |
SWP | tot 4.0G | free 4.0G | | |
| | | | |
| vmcom 6.0G | vmlim 255.8G |
LVM | g_vd0-lv_var | busy 0% | read 0 | | write
158 | KiB/r 0 | KiB/w 4 | MBr/s 0.0 | MBw/s 0.1 |
| avq 5.00 | avio 0.01 ms |
DSK | sda | busy 0% | read 0 | | write
122 | KiB/r 0 | KiB/w 5 | MBr/s 0.0 | MBw/s 0.1 |
| avq 2.00 | avio 0.01 ms |
NET | transport | tcpi 181070 | tcpo 289884 | udpi 18 | udpo
18 | tcpao 0 | tcppo 26 | tcprs 556 | tcpie 0 | tcpor
1 | udpnp 0 | udpie 0 |
NET | network | ipi 181092 | ipo 201839 | | ipfrw
0 | deliv 181091 | | | |
| icmpi 0 | icmpo 0 |
NET | bond1 0% | pcki 181057 | pcko 290688 | sp 20 Gbps | si 16
Mbps | so 190 Mbps | coll 0 | mlti 0 | erri 0 | erro
0 | drpi 0 | drpo 0 |
NET | net2 0% | pcki 181056 | pcko 290688 | sp 20 Gbps | si 16
Mbps | so 190 Mbps | coll 0 | mlti 0 | erri 0 | erro
0 | drpi 0 | drpo 0 |
PID TID SYSCPU USRCPU VGROW RGROW
RUID EUID ST EXC THR S
CPUNR CPU CMD 1/1
40753 - 4.08s 21.52s 0K -11.7M
mtx mtx -- - 15 S
20 *257% 5gclient*
40753 40753 0.00s 0.00s 0K -11.7M
mtx mtx -- - 1 S
20 0% 5gclient
40753 40754 0.00s 0.00s 0K -11.7M
mtx mtx -- - 1 S
5 0% 5gclient
40753 40755 0.00s 0.00s 0K -11.7M
mtx mtx -- - 1 S
3 0% 5gclient
40753 40756 0.00s 0.00s 0K -11.7M
mtx mtx -- - 1 S
21 0% 5gclient
40753 40757 0.45s 2.35s 0K -11.7M
mtx mtx -- - 1 S
6 28% 5gclient
40753 40758 0.44s 2.31s 0K -11.7M
mtx mtx -- - 1 S
20 28% 5gclient
40753 40759 0.44s 2.52s 0K -11.7M
mtx mtx -- - 1 S
0 30% 5gclient
40753 40760 0.36s 1.80s 0K -11.7M
mtx mtx -- - 1 S
28 22% 5gclient
40753 40761 0.41s 2.04s 0K -11.7M
mtx mtx -- - 1 S
9 25% 5gclient
40753 40762 0.48s 2.46s 0K -11.7M
mtx mtx -- - 1 R
1 30% 5gclient
40753 40763 0.00s 0.00s 0K -11.7M
mtx mtx -- - 1 S
23 0% 5gclient
40753 40764 0.45s 2.48s 0K -11.7M
mtx mtx -- - 1 S
2 29% 5gclient
40753 40765 0.21s 1.17s 0K -11.7M
mtx mtx -- - 1 S
1 14% 5gclient
40753 40766 0.45s 2.46s 0K -11.7M
mtx mtx -- - 1 S
5 29% 5gclient
40753 41140 0.39s 1.90s 0K -11.7M
mtx mtx -- - 1 R
29 23% 5gclient
On Friday, 5 May 2023 at 08:33:12 UTC+10 envee wrote:
> Hi All,
> I have an application which sends HTTP requests at a rate of say 6000 per
> second. In addition, my application also has about 100K goroutines and
> these 6000 requests are basically issued by a subset of these goroutines.
>
> When I increase the rate to more than 6000, I get the error :
> "context deadline exceeded (Client.Timeout exceeded while awaiting
> headers)"
>
> From a PCAP trace, I have seen that the server does respond within the
> timeout (which I've set to 7s), but the application is not able to probably
> finish reading the response.
>
> From the code of the HTTP client, I see that there is a timer/ticker which
> is run in the background for every request.
>
> My understanding is that the goroutine which issued the HTTP request was
> not able to get scheduled on a Processor/Thread and hence timed out.
>
> Possibly due to multiple goroutines becoming Runnable, but not being
> scheduled ?
>
> Is my understanding of why I see timeouts correct ?
>
> If a goroutine starts a timer/ticker, then if it gets taken off a P as it
> is waiting for a Network I/O response, will the timer also pause when it is
> removed from a P by the scheduler ?
>
--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/golang-nuts/593823f5-1be6-4d60-9341-6fbc3da82c55n%40googlegroups.com.