On Mon, 23 Jun 2025 09:36:05 -0400 Peter Xu <pet...@redhat.com> wrote:
> On Mon, Jun 23, 2025 at 02:51:46PM +0200, Igor Mammedov wrote:
> > On Fri, 20 Jun 2025 12:53:06 -0400
> > Peter Xu <pet...@redhat.com> wrote:
> >
> > > On Fri, Jun 20, 2025 at 05:14:16PM +0200, Igor Mammedov wrote:
> > > > This patch brings back Jan's idea [1] of BQL-free IO access,
> > > > with the twist that it whitelists read access only.
> > > >
> > > > (as BQL-free write access in [1] used to cause issues [2]
> > > > and still does (Windows crash) if the write path is not lock protected)
> > >
> > > Can we add some explanation on why it would fail on lockless writes?
> > >
> > > I saw that acpi_pm_tmr_write() is a no-op, so I don't yet understand what
> > > raced, and also why the guest writes to it at all..
> >
> > The root cause wasn't diagnosed back then, and I haven't been able to
> > reproduce it either. So I erred on the side of caution and
> > implemented RO only.
>
> Ah OK, I got the feeling it could still be reproduced from the above-mentioned
> "still does (Windows crash) if write ...".

That is a leftover from experiments with a lockless split irqchip. We need
split irqchip for more than 255 vCPUs, and then we are back to BQL contention,
as every IO exit also triggers taking the BQL for the non-in-kernel irqchip.
So this series addresses the unbootable Windows issue only up to 255 vCPUs.
If I manage to make the split irqchip checks lockless, that will be a separate
series on top.

>
> >
> > Theoretically write should be fine too, but I don't have
> > an idea how to test that.
>
> Then the question is how do we justify it will work this time..
>
> If nobody can reproduce it anymore, there's indeed one way to go if we
> strongly want to have the optimization, which is to apply it again and wait
> for the reproducer to pop up once more. Just to double check: is this
> the case, and do we have no way to reproduce it?

I'd prefer to reproduce the issue if possible, but if that doesn't work out,
it might be better to try it and see if it explodes elsewhere.
Let's see if I can reproduce it with an old SeaBIOS, as per Gerd's suggestion.

> I also wonder whether it's a bit late, because such an experiment might
> be better done at the start of a release cycle. Now we have roughly 3 weeks
> to soft-freeze (July 15). I had a look; last time it was pretty late when
> the change was reverted:
>
>   975eb6a547 (tag: v2.6.0-rc4) Update version for v2.6.0-rc4 release
>   1beb99f787 Revert "acpi: mark PMTIMER as unlocked"
>
> So there's also the question of whether we should land this for this
> release or for the next one when it opens.

I don't see the need to rush this, so +1 to the next cycle.

> Gerd mentioned this in the relevant bz:
>
>   Note: root cause for the initrd issue noted in comment 5 is seabios
>   running into problems with ehci -> io errors -> corrupted initrd.
>   Sometimes it doesn't boot at all, probably in case the io errors
>   happen to hit the kernel not the initrd.
>
> This seems to be the last piece of information we had that is closest
> to the root cause. I sincerely wish there were still some way to move
> forward, as it looks really close, but it might be that it was just too
> late for 2.6, so we didn't get time to keep looking back then.
>
> Thanks,