On Mon, 11 Nov 2024 11:22:26 +0100 Uwe Kleine-König wrote:

[...]
> Hello,

Hi Uwe,
thanks for your followup.

>
> On Thu, Oct 31, 2024 at 07:53:52PM +0100, Francesco Poli (wintermute) wrote:
[...]
> > I filed this bug report against the Debian Linux kernel, in order
> > to warn other users about this issue, and in order to ask the Debian
> > Kernel Team to investigate the issue and/or to forward the bug report
> > to the relevant upstream Linux kernel maintainers.
> >
> > Please do not reassign to package opensm with the intention of
> > merging with bug [#1085300], unless you know for sure that the
> > issue is in opensm and you know how to fix it.
>
> Please do not report multiple bugs for the same issue. The right(er)
> thing to do is to make use of "affects". Now there are three bug reports
> (2 for Debian and one upstream) and someone being aware of only one (or
> two) of them, might miss some action which results in duplicate work.

You are right, the "affects" field is the most appropriate means to
show that a bug report against a given package also affects other
packages.
However, in this case, the lack of replies from opensm maintainers made
me doubtful about the best possible course of action.
Sorry about that.

>
> > Please help, I would very much like to run the head node with
> > an up-to-date kernel!
>
> This is hard to act on without further input. Some questions to debug
> this:
>
> I guess the kernel provides a directory "/sys/class/infiniband_mad". Do
> its contents look different on 6.10.x and 6.11.x?

I will look into this as soon as I can reboot the cluster head node.

>
> Can you please bisect the problem?
[...]

I have to find a time window where I can perform multiple reboots,
which can result in a non-working InfiniBand network...
It won't be easy, since the cluster has entered production and users
keep launching jobs.

Anyway, what I have done so far is: I have tried and rebuilt a Linux
kernel image Debian package, following your instructions.
After some failed attempts (due to missing dependencies and/or required
tools), I think I succeeded, but I had to reply to a number of
questions during the procedure: I have always replied with the default
answer (by hitting [Enter]), I hope that was the right thing to do!

Before I go on and try to install the resulting Debian package, could
you please review the transcript of what I did (see the attached file)?

Please bear with me, some of the questions were really obscure to me
and I am not really familiar with the procedure: I think that the last
time I rebuilt a Linux kernel image Debian package was some 15 years
ago (I was still using the now-obsolete [kernel-package]!).

[kernel-package]: <https://tracker.debian.org/pkg/kernel-package>

Thanks for your time and for the help you are providing.

-- 
 http://www.inventati.org/frx/
 There's not a second to spare! To the laboratory!
..................................................... Francesco Poli .
 GnuPG key fpr == CA01 1147 9CD2 EFDF FB82 3925 3E1C 27E1 1F69 BFFE
git_bisect.txt.gz
Description: application/gzip