** Description changed:

[ Impact ]

Creating a TERMINATED_HTTPS listener in an amphora with >=32GB of memory triggers the OOM-killer during listener startup (and during any subsequent `systemctl reload` of haproxy in the amphora).

```
os loadbalancer listener create --name thttps_xxlarge --protocol TERMINATED_HTTPS --protocol-port 443 --default-tls-container-ref <URL> --wait xxlarge1
```

This was originally reported in a Caracal cloud using an Ubuntu 22.04 Amphora image. I've been able to reproduce it reliably in my lab using the latest devstack and an Ubuntu 24.04 Amphora image.

+ Workaround: set a higher connection limit on one listener, scaled from
+ the 50000 default in proportion to the memory on the system. For an
+ amphora with 32GiB of RAM, use --connection-limit 200000 for one
+ listener.

[ Root Cause ]

Commit 454cff5 (in Zed+ IIUC) introduced the use of haproxy's `tune.ssl.cachesize` for TERMINATED_HTTPS listeners [1][2]. The commit does not make clear that during a reload of haproxy (SIGUSR2), the old worker process stays running until the new worker process is ready [3][4]. This means that two TLS session caches are allocated and held simultaneously during a reload of the service [5].

For small Amphorae, this works fine. The default connection limit is 50000, which takes a large enough chunk out of the 50% allocation that the new haproxy worker has room to allocate its cache and coexist with the old worker for some time.

However, as the available memory in the system increases, the memory consumed by the session cache approaches 50%, and the worker's total memory usage rises beyond 50%, because something else in the worker also uses memory in proportion to the configured cachesize.
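The overlap of old and new workers can be put into rough numbers. The following is only a back-of-the-envelope sketch, not Octavia's sizing code: `reload_peak_share`, `cache_share` and `overhead_ratio` are hypothetical names introduced here, and the model ignores the worker's fixed baseline and the memory reserved for connections.

```python
def reload_peak_share(cache_share: float, overhead_ratio: float = 0.0) -> float:
    """Share of RAM held while the old and new haproxy workers coexist.

    Hypothetical model: each worker holds a TLS session cache sized at
    cache_share of RAM, plus extra memory proportional to that cache
    (overhead_ratio). Both parameters are assumptions for illustration.
    """
    workers_during_reload = 2  # the old worker stays until the new one is ready
    per_worker = cache_share * (1 + overhead_ratio)
    return workers_during_reload * per_worker


# A cache sized at half of RAM leaves no headroom during a reload,
# even before any per-cache overhead is counted.
print(f"{reload_peak_share(1 / 2):.2f}")
# A cache sized at a third of RAM leaves room for both workers.
print(f"{reload_peak_share(1 / 3):.2f}")
```

Any nonzero `overhead_ratio` pushes the first case above the total RAM, which fits the OOM-killer behaviour reported for large amphorae.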
I tested 10 nonzero values of tune.ssl.cachesize (plus a baseline of 0) in an amphora with 32GiB of RAM, reloading the haproxy service each time:

- vsz is the value (in KiB) reported by `ps -ax -o pid,vsz,rss,uss,pmem,args | grep haproxy`
- tune.ssl.cachesize_MiB is tune.ssl.cachesize multiplied by 200 bytes per block, expressed in MiB
- overhead is `vsz_MiB - tune.ssl.cachesize_MiB - 261`, where 261 MiB is the vsz with the cache disabled
- overhead% is `floor((overhead / tune.ssl.cachesize_MiB) * 100)`

tune.ssl.cachesize | tune.ssl.cachesize_MiB |      vsz | vsz_MiB | overhead | overhead%
                 0 |                      0 |   267416 |     261 |        0 |        0%
           7741606 |                   1476 |  2142472 |    2092 |      355 |       24%
          15483212 |                   2953 |  4017260 |    3923 |      709 |       24%
          23224818 |                   4429 |  5892180 |    5754 |     1064 |       24%
          30966424 |                   5906 |  7767100 |    7585 |     1418 |       24%
          38708030 |                   7382 |  9642020 |    9416 |     1773 |       24%
          46449636 |                   8859 | 11516940 |   11247 |     2127 |       24%
          54191242 |                  10336 | 13391860 |   13077 |     2480 |       23%
          61932848 |                  11812 | 15266780 |   14908 |     2835 |       24%
          69674454 |                  13289 | 17141700 |   16739 |     3189 |       23%
          77416060 |                  14765 | 19016744 |   18571 |     3545 |       24%

Note that this listener was not configured with a pool, so there was no load on the system when I gathered this data.

As shown, haproxy consumes additional memory proportional to the size of the TLS session cache. The allocation for the cache occurs at [6], referring to [7]. I verified the documentation's assertion that each tune.ssl.cachesize block is approximately 200 bytes on amd64; sizeof(struct shared_block) is 48 bytes on the same hardware [8], and 48 / 200 = 24% is consistent with the overhead measured above.

Octavia should allocate closer to 1/3 than 1/2 of the available memory for the TLS session cache.
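The derived columns can be rechecked from the raw values. This is a minimal sketch, assuming roughly 200 bytes per cache block and vsz reported in KiB; the helper name `derive_row` and the constant names are mine, not taken from haproxy or Octavia.

```python
BLOCK_BYTES = 200        # approx. bytes per cache block on amd64, per the haproxy docs
SHARED_BLOCK_BYTES = 48  # sizeof(struct shared_block) as reported in this bug
BASELINE_MIB = 261       # worker vsz with tune.ssl.cachesize set to 0


def derive_row(blocks: int, vsz_kib: int) -> tuple[int, int, int, int]:
    """Return (cachesize_MiB, vsz_MiB, overhead_MiB, overhead_pct) for one sample."""
    cache_mib = blocks * BLOCK_BYTES // 2**20
    vsz_mib = vsz_kib // 1024
    overhead = vsz_mib - cache_mib - BASELINE_MIB
    return cache_mib, vsz_mib, overhead, overhead * 100 // cache_mib


# Three (tune.ssl.cachesize, vsz) pairs copied from the table.
for blocks, vsz_kib in [(7741606, 2142472), (54191242, 13391860), (77416060, 19016744)]:
    print(blocks, *derive_row(blocks, vsz_kib))

# Ratio of the per-block header to the documented block size.
print(SHARED_BLOCK_BYTES / BLOCK_BYTES)
```

The integer division mirrors the floor used for overhead%, which is why some rows show 23% rather than 24%.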
I'll test and propose a patch against master shortly.

[1] https://opendev.org/openstack/octavia/commit/454cff587ed10b5e504da93b074b77cb85055b13
[2] https://www.haproxy.com/documentation/haproxy-configuration-manual/new/2-8r1/#section-3.2.-tunesslcachesize
[3] https://github.com/haproxy/haproxy/issues/217#issuecomment-544515990
[4] https://manpages.ubuntu.com/manpages/jammy/en/man1/haproxy.1.html
[5] https://opendev.org/openstack/octavia/src/branch/master/octavia/amphorae/backends/agent/api_server/templates/systemd.conf.j2
[6] https://git.launchpad.net/ubuntu/+source/haproxy/tree/src/ssl_sock.c?h=applied/ubuntu/noble-devel#n5346
[7] https://git.launchpad.net/ubuntu/+source/haproxy/tree/src/shctx.c?h=applied/ubuntu/noble-devel#n300
[8] https://git.launchpad.net/ubuntu/+source/haproxy/tree/include/haproxy/shctx-t.h?h=applied/ubuntu/noble-devel#n38
-- 
https://bugs.launchpad.net/bugs/2119987

Title:
  haproxy reload triggers OOM-killer for TERMINATED_HTTPS loadbalancers
