On 10/22/24 03:13, Stefan Hajnoczi wrote:
> On Mon, Oct 21, 2024 at 09:32:50PM +0900, Damien Le Moal wrote:
>> On 10/21/24 20:08, Kevin Wolf wrote:
>>> Am 20.10.2024 um 03:03 hat Damien Le Moal geschrieben:
>>>> On 10/18/24 23:37, Kevin Wolf wrote:
>>>>> Am 04.10.2024 um 12:41 hat Sam Li geschrieben:
>>>>>> When the file-posix driver emulates append write, it holds the lock
>>>>>> whenever accessing wp, which limits the IO queue depth to one.
>>>>>>
>>>>>> The write IO flow can be optimized to allow concurrent writes. The lock
>>>>>> is held in two cases:
>>>>>> 1. Assuming the write IO succeeds, update the wp before issuing the
>>>>>> write.
>>>>>> 2. If the write IO fails, report that zone and use the reported value
>>>>>> as the current wp.
>>>>>
>>>>> What happens with the concurrent writes that started later and may not
>>>>> have completed yet? Can we really just reset to the reported value
>>>>> before all other requests have completed, too?
>>>>
>>>> Yes, because if one write fails, we know that the following writes
>>>> will fail too as they will not be aligned to the write pointer. These
>>>> subsequent failed writes will again trigger the report zones and
>>>> update, but that is fine. All of them have failed and the report will
>>>> give the same wp again.
>>>>
>>>> This is a typical pattern with zoned block devices: if one write fails
>>>> in a zone, the user has to expect failures for all other writes issued
>>>> to the same zone, do a report zone to get the wp and restart writing
>>>> from there.
>>>
>>> Ok, that makes sense. Can we be sure that requests are handled in the
>>> order they were submitted, though? That is, if the failed request is
>>> resubmitted, could the already pending next one still succeed if it's
>>> overtaken by the resubmitted request? Not sure if this would even cause
>>> a problem, but is it a case we have to consider?
>>
>> A zoned device will always handle writes in the order they were submitted (per
>> zone) and that is true for emulated devices as well as real ones.
> 
> Is there serialization code in the kernel so that zoned devices behind
> multi-path keep requests ordered?

Yes: the kernel issues at most one write per zone at any time to preserve
ordering. So there should be no issues at all.
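
To illustrate the idea of "at most one write per zone in flight", here is a toy
model in Python. It is only a sketch, not kernel code: the class name
ZoneWriteSerializer and its methods are hypothetical and exist just for this
example. Writes to the same zone are queued in submission order and the next
one is dispatched only when the previous one completes, while writes to
different zones can proceed independently.

```python
from collections import defaultdict, deque


class ZoneWriteSerializer:
    """Toy model (hypothetical, not kernel code): one in-flight write per zone."""

    def __init__(self):
        self.pending = defaultdict(deque)  # zone -> writes queued in submission order
        self.in_flight = {}                # zone -> the single write currently issued

    def submit(self, zone, write):
        """Queue a write; return it if it can be issued right away, else None."""
        self.pending[zone].append(write)
        return self._dispatch(zone)

    def complete(self, zone):
        """Mark the in-flight write of a zone as done; return the next write issued."""
        del self.in_flight[zone]
        return self._dispatch(zone)

    def _dispatch(self, zone):
        # Issue nothing while another write to the same zone is still in flight.
        if zone in self.in_flight or not self.pending[zone]:
            return None
        write = self.pending[zone].popleft()
        self.in_flight[zone] = write
        return write


s = ZoneWriteSerializer()
first = s.submit(0, "w0")        # issued immediately
second = s.submit(0, "w1")       # held back: zone 0 already has a write in flight
other = s.submit(1, "w2")        # a different zone is not affected
next_write = s.complete(0)       # completing w0 lets w1 go out
```

Because the per-zone queue is FIFO, a write can never overtake an earlier write
to the same zone in this model.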

> Normally I don't assume any ordering between concurrent requests to a
> block device, so I'm surprised that it's safe to submit multiple writes.

Correct, the normal case does not provide any guarantees. But writes to zoned
block devices are a special case. More on this here:

https://zonedstorage.io/docs/linux/sched
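
For reference, the write pointer handling discussed earlier in this thread can
be sketched as follows. This is a simplified Python model under stated
assumptions, not the file-posix code: EmulatedZone, reserve(), issue() and
report_wp() are hypothetical names, and media_wp stands in for the device's
real write pointer. The lock is held only to advance the cached wp before the
write is issued, and again to reset it from a zone report if the write fails.

```python
import threading


class EmulatedZone:
    """Simplified model of append emulation (hypothetical names throughout)."""

    def __init__(self, start=0):
        self.wp = start        # cached write pointer, protected by self.lock
        self.media_wp = start  # stand-in for the device's real write pointer
        self.lock = threading.Lock()

    def report_wp(self):
        # Stand-in for a zone report: returns the device's real wp.
        return self.media_wp

    def reserve(self, length):
        # Case 1: assume success and advance the cached wp before issuing.
        with self.lock:
            offset = self.wp
            self.wp += length
        return offset

    def issue(self, offset, length, fail=False):
        # The modeled device rejects any write that does not start at its wp.
        if fail or offset != self.media_wp:
            # Case 2: on failure, report the zone and use the reported wp.
            with self.lock:
                self.wp = self.report_wp()
            raise IOError("write failed or not aligned to the write pointer")
        self.media_wp += length
        return offset


zone = EmulatedZone()
a = zone.reserve(8)   # offset 0
b = zone.reserve(8)   # offset 8, reserved while the first write is pending
errors = 0
for off, fail in ((a, True), (b, False)):
    try:
        zone.issue(off, 8, fail=fail)
    except IOError:
        errors += 1   # the second write fails too: it is not at the wp
```

In this model both writes fail and both resets produce the same wp, so the
caller can restart writing from zone.wp after the failures.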


-- 
Damien Le Moal
Western Digital Research
