Hi Étienne,

On 03-12-2023 18:34, Étienne Mollier wrote:
1) why does it now suddenly (nearly always) fail across the
board on arm64 (in Debian; Ubuntu still seems fine), without any changes to
the infrastructure that I know of?

I'm afraid I'm not sure why shasta is eating up more
memory on the arm64 hosts of the CI infrastructure.  What I can see from
my end is that the test needs to map roughly 8 GiB of anonymous
memory to do its job.

8 GiB... that's quite a lot, considering that's how much RAM these hosts have (https://wiki.debian.org/ContinuousIntegration/WorkerSpecs).

Except that this is already
the case for shasta in bookworm running on the bookworm kernel, so
that doesn't look like a regression per se.

Weird.

Could you perhaps double-check the memory settings on the CI
hosts, just to make sure that the swap didn't drop off
the machines?

  ci-worker-arm64-04: -rw------- 1 root root 3.9G May 27  2022 /swap
  ci-worker-arm64-02: -rw------- 1 root root 3.9G May 27  2022 /swap
  ci-worker-arm64-06: -rw------- 1 root root 3.9G May 26  2022 /swap
  ci-worker-arm64-03: -rw------- 1 root root 3.9G May 27  2022 /swap
  ci-worker-arm64-05: -rw------- 1 root root 3.9G May 27  2022 /swap
  ci-worker-arm64-11: -rw------- 1 root root 3.9G May 27  2022 /swap
  ci-worker-arm64-07: -rw------- 1 root root 3.9G May 27  2022 /swap
  ci-worker-arm64-08: -rw------- 1 root root 3.9G May 27  2022 /swap
  ci-worker-arm64-09: -rw------- 1 root root 3.9G May 27  2022 /swap
  ci-worker-arm64-10: -rw------- 1 root root 3.9G May 27  2022 /swap
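
Note that a listing like this only shows that the swap files exist, not that they are enabled. A minimal sketch of how the active swap could be checked from the contents of /proc/swaps follows. The function names are made up for illustration, and it assumes the usual five-column layout with sizes in KiB.

```python
def parse_swaps(text):
    """Parse the contents of /proc/swaps into a list of dicts (sizes in KiB)."""
    entries = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        if not line.strip():
            continue
        # Split from the right so that only the last four columns are cut off.
        name, kind, size, used, priority = line.rsplit(None, 4)
        entries.append({
            "name": name.strip(),
            "type": kind,
            "size_kib": int(size),
            "used_kib": int(used),
            "priority": int(priority),
        })
    return entries


def active_swap_gib(text):
    """Total size of all active swap areas, in GiB."""
    return sum(entry["size_kib"] for entry in parse_swaps(text)) / (1024 * 1024)
```

On a worker, the input would be the text read from /proc/swaps. An empty table (header only) gives 0, which would mean the swap file is present on disk but not in use.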

Or maybe check for inconsistencies in the memory overcommit
settings?

The vm.overcommit_kbytes, vm.overcommit_memory and vm.overcommit_ratio sysctls are 0, 0 and 50 across all our hosts.
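
For context, a rough sketch of how these three values fit together, assuming they are the vm.overcommit_kbytes, vm.overcommit_memory and vm.overcommit_ratio sysctls and going by the kernel's documented behaviour. The helper names are hypothetical and hugepages are ignored.

```python
# Policy selected by vm.overcommit_memory.
OVERCOMMIT_MODES = {0: "heuristic", 1: "always", 2: "strict"}


def describe_overcommit(mode):
    """Return a short name for a vm.overcommit_memory value."""
    return OVERCOMMIT_MODES[mode]


def commit_limit_kib(mem_total_kib, swap_total_kib, overcommit_kbytes, overcommit_ratio):
    """Approximate CommitLimit in KiB (hugepages ignored).

    A non-zero overcommit_kbytes takes precedence over overcommit_ratio.
    """
    if overcommit_kbytes:
        allowed_ram = overcommit_kbytes
    else:
        allowed_ram = mem_total_kib * overcommit_ratio // 100
    return allowed_ram + swap_total_kib
```

With overcommit_memory at 0 the kernel is in its heuristic mode, so this limit is computed but not enforced. The ratio of 50 would only start to matter in mode 2.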

Currently readable test logs suggest that:

   * ci-worker-arm64-10 met memory requirements in November,
   * ci-worker-arm64-07 did not meet requirements in October,
   * ci-worker-arm64-08 did not meet requirements in October,
   * ci-worker-arm64-03 did not meet requirements in October.

Those hosts should be equivalent. Be aware, though, that tests don't run in isolation. On our arm64 hosts, one other test might be running at the same time. So what's *available* might not be constant over time.
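
Since the available memory depends on whatever else is running, a snapshot taken right before the test starts may be more telling than the host specs. A minimal sketch, assuming the standard MemAvailable and SwapFree fields of /proc/meminfo (both in kB). The function names are hypothetical, and the sum is only a rough upper bound on what a new process could obtain.

```python
def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # drop the optional "kB" unit
    return values


def headroom_gib(text):
    """MemAvailable plus SwapFree, converted from kB to GiB."""
    info = parse_meminfo(text)
    return (info["MemAvailable"] + info["SwapFree"]) / (1024 * 1024)
```

Logging this value at the start of each run would show whether the failing runs coincide with a second test occupying the host.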

Paul
