Hi Thomas,
On 3/17/25 16:50, Thomas Lange wrote:
> In setup-storage, we call
>   udevadm settle --timeout=10
> before we determine the list of disks like this:
>   # read the sizes and partition tables of all disks listed in $FAI::disks
>   &FAI::get_current_disks;
> Maybe the timeout is too short for your hardware?

I have checked with our vendor: the next-to-last step of their burn-in
testing is running
dd if=/dev/zero of=/dev/$h bs=1M count=4096
where $h is taken from
lsblk -ido Name,Type,Model | grep disk | cut -d " " -f 1
followed by running `sync` just before powering the machine off via sysrq.
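
To make the sequence concrete, here is a rough sketch of how I understand
those last burn-in steps. This is my reconstruction, not the vendor's actual
script: the function names and the DRY_RUN switch are my own assumptions, and
by default it only prints what it would do, since the real commands
overwrite the first 4 GiB of every disk.

```shell
#!/bin/sh
# Sketch of the vendor's final burn-in steps as described above.
# Assumption: DRY_RUN and all function names are hypothetical, not the vendor's.
# DRY_RUN defaults to 1 so that nothing is written unless explicitly disabled.
DRY_RUN="${DRY_RUN:-1}"

# Print the names of all whole disks, one per line
# (same pipeline as quoted from the vendor).
list_disks() {
    lsblk -ido Name,Type,Model | grep disk | cut -d " " -f 1
}

# Either print a command (dry run) or execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Zero the first 4 GiB of every disk, then flush dirty buffers.
burn_in_wipe() {
    for h in $(list_disks); do
        run dd if=/dev/zero of="/dev/$h" bs=1M count=4096
    done
    run sync
}
```

Calling `burn_in_wipe` with the default DRY_RUN=1 just lists the dd and sync
commands; the power-off via sysrq that follows is deliberately left out.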
So maybe the power-off happened too quickly and the NVMe still had some
writing to do even though the `sync` call had already returned? Frankly,
I don't know.
So far, I have failed to reproduce the problem myself on one of the
nodes where this happened. Given that it affected fewer than 5% of the
delivered nodes, I would for now attribute this issue to the burn-in
tests and not to fai-client. What do you think?
So feel free to close this, and sorry for the noise!
Cheers
Carsten