On Sun, Sep 11, 2022 at 14:48, Damien Le Moal <[email protected]> wrote:
>
> On 2022/09/11 15:33, Sam Li wrote:
> > On Sun, Sep 11, 2022 at 13:31, Damien Le Moal <[email protected]> wrote:
> [...]
> >>> +/*
> >>> + * zone management operations - Execute an operation on a zone
> >>> + */
> >>> +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
> >>> + int64_t offset, int64_t len) {
> >>> +#if defined(CONFIG_BLKZONED)
> >>> + BDRVRawState *s = bs->opaque;
> >>> + RawPosixAIOData acb;
> >>> + int64_t zone_sector, zone_sector_mask;
> >>> + const char *zone_op_name;
> >>> + unsigned long zone_op;
> >>> + bool is_all = false;
> >>> +
> >>> + zone_sector = bs->bl.zone_sectors;
> >>> + zone_sector_mask = zone_sector - 1;
> >>> + if (offset & zone_sector_mask) {
> >>> + error_report("sector offset %" PRId64 " is not aligned to zone size "
> >>> + "%" PRId64 "", offset, zone_sector);
> >>> + return -EINVAL;
> >>> + }
> >>> +
> >>> + if (len & zone_sector_mask) {
> >>
> >> Linux allows SMR drives to have a smaller last zone. So this needs to be
> >> accounted for here. Otherwise, a zone operation that includes the last
> >> smaller zone would always fail. Something like this would work:
> >>
> >> if (((offset + len) < capacity &&
> >> len & zone_sector_mask) ||
> >> offset + len > capacity) {
> >>
> >
> > I see. I think the offset can be removed, like:
> > if ((len < capacity && len & zone_sector_mask) || len > capacity) {
> > Then if we use the previous zone's len for the last smaller zone, it
> > will be greater than its capacity.
>
> Nope, you cannot remove the offset since the zone operation may be for that
> last zone only, that is, offset == last zone start and len == the last
> zone's smaller size. In that case, len is always smaller than capacity.
OK, I was mixing up opening one zone with opening several zones.
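
To spell out the single-zone case, here is a minimal standalone sketch of the
check with the offset kept in. The helper name zone_range_is_valid() and its
capacity parameter are hypothetical and not part of the patch; it assumes the
zone size is a power of two, that all values use the same unit, and that
offset + len does not overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: returns true if
 * [offset, offset + len) is an acceptable zone management range.
 * Assumes zone_size is a power of two and all values share one unit.
 */
static bool zone_range_is_valid(int64_t offset, int64_t len,
                                int64_t zone_size, int64_t capacity)
{
    int64_t zone_mask = zone_size - 1;

    if (offset & zone_mask) {
        return false;   /* start must sit on a zone boundary */
    }
    if (offset + len > capacity) {
        return false;   /* range runs past the end of the device */
    }
    if (offset + len < capacity && (len & zone_mask)) {
        return false;   /* unaligned len is only OK when ending at capacity */
    }
    return true;
}
```

With a zone size of 64 and a capacity of 1000, the last zone starts at 960 and
is 40 long. The range (offset 960, len 40) passes here, whereas a check based
on len alone would reject it because 40 is unaligned and smaller than the
capacity.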
>
> >
> > I will also include "opening the last zone" as a test case later.
>
> Note that you can create such a smaller last zone on the host with null_blk by
> specifying a device capacity that is *not* a multiple of the zone size.
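
For the test case, the expected geometry can be worked out as in the rough
sketch below. The helper names nr_zones() and last_zone_size() are hypothetical
and only illustrate the arithmetic; they assume capacity and zone size are
positive and given in the same unit.

```c
#include <stdint.h>

/* Hypothetical helper: number of zones, counting a smaller last zone. */
static int64_t nr_zones(int64_t capacity, int64_t zone_size)
{
    return (capacity + zone_size - 1) / zone_size;
}

/* Hypothetical helper: size of the last zone for a given capacity. */
static int64_t last_zone_size(int64_t capacity, int64_t zone_size)
{
    int64_t rem = capacity % zone_size;

    /* A zero remainder means the last zone is a full-size zone. */
    return rem ? rem : zone_size;
}
```

For example, a capacity of 1000 with a zone size of 64 gives 16 zones, the last
one being 40 long.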
>
> >
> >>> + error_report("number of sectors %" PRId64 " is not aligned to zone size"
> >>> + " %" PRId64 "", len, zone_sector);
> >>> + return -EINVAL;
> >>> + }
> >>> +
> >>> + switch (op) {
> >>> + case BLK_ZO_OPEN:
> >>> + zone_op_name = "BLKOPENZONE";
> >>> + zone_op = BLKOPENZONE;
> >>> + break;
> >>> + case BLK_ZO_CLOSE:
> >>> + zone_op_name = "BLKCLOSEZONE";
> >>> + zone_op = BLKCLOSEZONE;
> >>> + break;
> >>> + case BLK_ZO_FINISH:
> >>> + zone_op_name = "BLKFINISHZONE";
> >>> + zone_op = BLKFINISHZONE;
> >>> + break;
> >>> + case BLK_ZO_RESET:
> >>> + zone_op_name = "BLKRESETZONE";
> >>> + zone_op = BLKRESETZONE;
> >>> + break;
> >>> + default:
> >>> + g_assert_not_reached();
> >>> + }
> >>> +
> >>> + acb = (RawPosixAIOData) {
> >>> + .bs = bs,
> >>> + .aio_fildes = s->fd,
> >>> + .aio_type = QEMU_AIO_ZONE_MGMT,
> >>> + .aio_offset = offset,
> >>> + .aio_nbytes = len,
> >>> + .zone_mgmt = {
> >>> + .zone_op = zone_op,
> >>> + .zone_op_name = zone_op_name,
> >>> + .all = is_all,
> >>> + },
> >>> + };
> >>> +
> >>> + return raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
> >>> +#else
> >>> + return -ENOTSUP;
> >>> +#endif
> >>> +}
>
> --
> Damien Le Moal
> Western Digital Research
>