On 10/22/24 03:13, Stefan Hajnoczi wrote:
> On Mon, Oct 21, 2024 at 09:32:50PM +0900, Damien Le Moal wrote:
>> On 10/21/24 20:08, Kevin Wolf wrote:
>>> On 20.10.2024 at 03:03, Damien Le Moal wrote:
>>>> On 10/18/24 23:37, Kevin Wolf wrote:
>>>>> On 04.10.2024 at 12:41, Sam Li wrote:
>>>>>> When the file-posix driver emulates append write, it holds the lock
>>>>>> whenever accessing wp, which limits the IO queue depth to one.
>>>>>>
>>>>>> The write IO flow can be optimized to allow concurrent writes. The lock
>>>>>> is held in two cases:
>>>>>> 1. Assumed that the write IO succeeds, update the wp before issuing the
>>>>>> write.
>>>>>> 2. If the write IO fails, report that zone and use the reported value
>>>>>> as the current wp.
>>>>>
>>>>> What happens with the concurrent writes that started later and may not
>>>>> have completed yet? Can we really just reset to the reported value
>>>>> before all other requests have completed, too?
>>>>
>>>> Yes, because if one write fails, we know that the following writes
>>>> will fail too as they will not be aligned to the write pointer. These
>>>> subsequent failed writes will again trigger the report zones and
>>>> update, but that is fine. All of them have failed and the report will
>>>> give the same wp again.
>>>>
>>>> This is a typical pattern with zoned block device: if one write fails
>>>> in a zone, the user has to expect failures for all other writes issued
>>>> to the same zone, do a report zone to get the wp and restart writing
>>>> from there.
>>>
>>> Ok, that makes sense. Can we be sure that requests are handled in the
>>> order they were submitted, though? That is, if the failed request is
>>> resubmitted, could the already pending next one still succeed if it's
>>> overtaken by the resubmitted request? Not sure if this would even cause
>>> a problem, but is it a case we have to consider?
>>
>> A zoned device will always handle writes in the order they were submitted
>> (per zone) and that is true for emulated devices as well as real ones.
>
> Is there serialization code in the kernel so that zoned devices behind
> multi-path keep requests ordered?

Yes: the kernel issues at most one write per zone at any time, to preserve
ordering. So there should be no issues at all.

> Normally I don't assume any ordering between concurrent requests to a
> block device, so I'm surprised that it's safe to submit multiple writes.

Correct, the normal case does not provide any guarantees. But writes to zoned
block devices are a special case. More on this here:

https://zonedstorage.io/docs/linux/sched

-- 
Damien Le Moal
Western Digital Research
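
For illustration, the write pointer handling discussed in this thread can be
sketched as below. This is a minimal model under stated assumptions, not
QEMU's file-posix code: Zone, zone_init(), zone_reserve(), dev_write(),
zone_resync() and zone_append() are hypothetical names, and dev_wp is only a
stand-in for the device's real write pointer, which an actual implementation
would obtain with a zone report.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a single zone (not QEMU's data structures). */
typedef struct {
    pthread_mutex_t lock; /* protects the emulated wp only */
    uint64_t wp;          /* emulated write pointer: next append offset */
    uint64_t dev_wp;      /* stand-in for the device's real write pointer */
    uint64_t capacity;    /* zone capacity, in the same units as wp */
} Zone;

static void zone_init(Zone *z, uint64_t capacity)
{
    pthread_mutex_init(&z->lock, NULL);
    z->wp = 0;
    z->dev_wp = 0;
    z->capacity = capacity;
}

/*
 * Case 1: assume the write will succeed. Take the lock only long enough
 * to hand out an offset and advance wp, then issue the write unlocked.
 */
static uint64_t zone_reserve(Zone *z, uint64_t len)
{
    assert(len > 0);
    pthread_mutex_lock(&z->lock);
    uint64_t offset = z->wp;
    z->wp += len;
    pthread_mutex_unlock(&z->lock);
    return offset;
}

/*
 * Stand-in for the device: a write succeeds only if it starts exactly at
 * the device write pointer. The device is assumed to process writes to a
 * zone one at a time, so dev_wp is not locked here.
 */
static bool dev_write(Zone *z, uint64_t offset, uint64_t len, bool inject_error)
{
    if (inject_error || offset != z->dev_wp || offset + len > z->capacity) {
        return false;
    }
    z->dev_wp += len;
    return true;
}

/* Case 2: after a failed write, adopt the reported value as the wp. */
static void zone_resync(Zone *z)
{
    pthread_mutex_lock(&z->lock);
    z->wp = z->dev_wp; /* a real driver would issue a zone report here */
    pthread_mutex_unlock(&z->lock);
}

/* Reserve, write, and resync on failure. Returns true on success. */
static bool zone_append(Zone *z, uint64_t len, bool inject_error,
                        uint64_t *offset_out)
{
    uint64_t offset = zone_reserve(z, len);
    if (!dev_write(z, offset, len, inject_error)) {
        zone_resync(z);
        return false;
    }
    *offset_out = offset;
    return true;
}
```

In this model, if two writes reserve offsets 0 and 10 and the first one fails,
the second is no longer aligned to dev_wp and fails as well. Both failures
call zone_resync() and both observe the same reported value, so resetting wp
before the later request has completed does no harm.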