Package: golang-1.24-go
Version: 1.24.2-1
Severity: grave
Tags: upstream
Justification: causes non-serious data loss

Since updating to trixie last Friday, we have been experiencing
random failures with both podman and docker.io, either while pulling
images or while running containers.

This is due to https://github.com/golang/go/issues/73141.

We applied the patch at
https://go-review.googlesource.com/c/go/+/662496 locally and rebuilt
golang-go, podman and docker.io; this fixed the problem.

Some other reports upstream:

* https://github.com/moby/moby/issues/49513
* https://github.com/NixOS/nixpkgs/issues/392815
* https://bugzilla.suse.com/show_bug.cgi?id=1240764

We believe all these bugs share the same root cause. We used
systemd-coredump to inspect the stack traces of the crashes, and they
match line by line what is shown in those bug reports.

For instance, in the logs, we see for docker:

Apr 16 07:00:34 de013-sh0011 gitlab-runner[1134]: WARNING: Failed to pull image with policy "always": unexpected EOF (manager.go:254:5s)  job=9735131137 projec>
Apr 16 07:00:34 de013-sh0011 gitlab-runner[1134]: Attempt #2: Trying "if-not-present" pull policy     job=9735131137 project=17731405 runner=UiZzAk_dx
Apr 16 07:00:34 de013-sh0011 systemd[1]: docker.service: Main process exited, code=dumped, status=11/SEGV
Apr 16 07:00:34 de013-sh0011 systemd[1]: docker.service: Failed with result 'core-dump'.
Apr 16 07:00:34 de013-sh0011 systemd[1]: docker.service: Consumed 2min 8.106s CPU time, 4.1G memory peak, 4K memory swap peak.
Apr 16 07:00:37 de013-sh0011 systemd[1]: docker.service: Scheduled restart job, restart counter is at 6.

And for podman:

Apr 16 07:48:18 de013-sh0011 gitlab-runner[1134]: WARNING: Job failed: failed to pull image "artifacts/charging-oci-bender.de-staging-gruenberg/be>
Apr 16 07:48:18 de013-sh0011 gitlab-runner[1134]:   duration_s=47.391613933 job=9735395252 project=17731405 runner=t2_zpVfGn
Apr 16 07:48:18 de013-sh0011 systemd[1353]: podman.service: Main process exited, code=killed, status=11/SEGV
Apr 16 07:48:18 de013-sh0011 systemd[1353]: podman.service: Failed with result 'signal'.
Apr 16 07:48:18 de013-sh0011 systemd[1353]: podman.service: Unit process 144474 (pasta.avx2) remains running after unit stopped.
Apr 16 07:48:18 de013-sh0011 systemd[1353]: podman.service: Consumed 1min 6.928s CPU time, 4.6G memory peak.
Apr 16 07:48:18 de013-sh0011 systemd[1353]: podman.service: Found left-over process 144474 (pasta.avx2) in control group while starting unit. Ignoring.
Apr 16 07:48:18 de013-sh0011 systemd[1353]: podman.service: This usually indicates unclean termination of a previous run, or service implementation deficiencie>
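
To gauge how often the services crash, the "Main process exited"
entries shown above can be counted per unit. The following is only a
rough sketch: it assumes one journal entry per line (as journalctl
prints them, unwrapped), and the function name count_segv_crashes is
our own invention, not part of any tool.

```python
import re
from collections import Counter

# Matches systemd entries such as:
#   docker.service: Main process exited, code=dumped, status=11/SEGV
#   podman.service: Main process exited, code=killed, status=11/SEGV
CRASH_RE = re.compile(
    r"(?P<unit>\S+\.service): Main process exited,\s+"
    r"code=(?P<code>\w+), status=11/SEGV"
)


def count_segv_crashes(lines):
    """Count SIGSEGV exits per systemd unit in an iterable of log lines.

    Hypothetical helper: it only recognises the message format seen in
    the logs above and ignores every other line.
    """
    counts = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            counts[match.group("unit")] += 1
    return counts
```

The function accepts any iterable of strings, so it can be fed the
lines of saved journalctl output or sys.stdin.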

Additionally, for podman, leftover files in
/var/run/user/<UID>/containers/networks/rootless-netns
then prevent all further containers from starting.

The only remedy is either a reboot, or manually removing the stale
files under /var/run and restarting the podman service.
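
For illustration, the manual cleanup could be scripted roughly as
follows. This is a sketch under several assumptions: that removing the
whole rootless-netns directory is acceptable, that nothing is still
mounted inside it, and that podman runs as a systemd user unit named
podman.service (as the systemd[1353] entries suggest). The function
names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def rootless_netns_dir(uid, runtime_root="/var/run/user"):
    """Build the path of the rootless-netns directory for a given UID."""
    return Path(runtime_root) / str(uid) / "containers" / "networks" / "rootless-netns"


def remove_stale_netns(uid, runtime_root="/var/run/user"):
    """Delete the rootless-netns directory; return True if it existed.

    Assumption: the whole directory is stale and nothing is mounted in it.
    """
    target = rootless_netns_dir(uid, runtime_root)
    if not target.exists():
        return False
    shutil.rmtree(target)
    return True


def restart_podman_user_service():
    """Restart podman, assuming it is a systemd user unit (podman.service)."""
    subprocess.run(["systemctl", "--user", "restart", "podman.service"], check=True)
```

Running remove_stale_netns(os.getuid()) followed by
restart_podman_user_service() as the affected user would mirror the
manual steps; we have not verified this on every setup.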

This happens multiple times per day.

Please be so kind as to include the patch from upstream in the Go
compiler, and then trigger a binary rebuild of dependent packages.

* podman version: 5.4.2+ds1-1
* docker.io version: 26.1.5+dfsg1-9+b2
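
After a rebuild, one way to check which toolchain a binary was built
with is "go version <file>", which prints the Go version embedded in
the executable. The sketch below wraps that command; the helper names
are hypothetical, and the paths to pass in (for example
/usr/bin/podman or /usr/bin/dockerd) are assumptions about where the
binaries are installed.

```python
import subprocess


def parse_go_version_line(line):
    """Split a 'path: goX.Y.Z' line from `go version <file>` into (path, version)."""
    path, sep, version = line.strip().rpartition(": ")
    if not sep or not version.startswith("go"):
        raise ValueError(f"unexpected output: {line!r}")
    return path, version


def go_version_of(binary):
    """Return the Go version string embedded in a single binary.

    Requires the `go` tool in PATH; raises CalledProcessError if it fails.
    """
    result = subprocess.run(
        ["go", "version", binary],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_go_version_line(result.stdout)[1]
```

Comparing the returned string before and after the rebuild shows
whether a package has actually picked up the patched compiler, provided
the patched build reports a distinguishable version.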

Thanks!

-- System Information:
Debian Release: trixie/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 6.12.21-amd64 (SMP w/192 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages golang-1.24-go depends on:
pn  golang-1.24-src  <none>

Versions of packages golang-1.24-go recommends:
ii  g++        4:14.2.0-1
ii  gcc        4:14.2.0-1
ii  libc6-dev  2.41-6
pn  pkgconf    <none>

Versions of packages golang-1.24-go suggests:
pn  bzr | brz        <none>
ii  ca-certificates  20241223
ii  git              1:2.47.2-0.1
pn  mercurial        <none>
pn  subversion       <none>
