Hello,

Thanks for this! Have you made some statistics on the gain?

Milos Nikic, le mar. 02 sept. 2025 17:58:37 -0700, a ecrit:
> -/* Strong override: only FILE_DATA uses bulk; others keep per-page path.  */
> +/* Bulk write for DISK pager across [offset, offset + length).
> +   We always iterate block-by-block, build exactly one strictly-consecutive
> +   run per batch (stop at first gap or when hitting EXT2_BULK_MAX_BLOCKS),
> +   and flush via pending_blocks_write(), which coalesces into large writes.
> +   Invariant: we either make forward progress or return an error.
> +   Progress is reported page-aligned via *written (if non-NULL). */
> +static error_t
> +disk_pager_write_pages (vm_offset_t offset,
> +                     vm_address_t buf,
> +                     vm_size_t length, vm_size_t *written)
> +{
[...]
> +       /* If at/past end of device, we cannot progress -> error. */
> +       if (dev_off >= dev_end)

You'd want to add vm_page_size and use >, since we are not writing only one 
byte.

> +         {
> +           err = ENOSPC;
> +           break;
> +         }
> +
> +       block_t block = boffs_block (dev_off);
> +
> +       /* Stop this batch at the first non-consecutive block (don’t consume 
> it). */
> +       if (prev && block != prev + 1)
> +         break;
> +       prev = block;

Do you actually want to break on non-consecutive blocks? The loop is not
actually keeping a mutex locked, so we can just let pending_blocks_add
call pending_blocks_write on non-consecutive blocks as appropriate, and
have only one while loop, there is not even a need to limit to
EXT2_BULK_MAX_BLOCKS.

> +       if (modified_global_blocks)
> +         {
> +           /* Avoid overspray when FS block < page size. */
> +           if (test_bit (block, modified_global_blocks))
> +             err = pending_blocks_add (&pb, block);
> +           else
> +             err = pending_blocks_skip (&pb);
> +         }
> +       else
> +         {
> +           /* No mask: enqueue every block; writer will coalesce contiguous 
> LBAs. */
> +           err = pending_blocks_add (&pb, block);
> +         }
> +
> +       if (err)
> +         break;
> +
> +       built += block_size;
> +       blocks_built++;
> +     }
> +
[...]

> +  if (written)
> +    {
> +      vm_size_t w = done;
> +      if (w > length)
> +     w = length;
> +      /* libpager expects page-aligned progress. */
> +      w -= (w % vm_page_size);
> +      *written = w;
> +    }

Realizing: does it actually happen that we are writing more than length?
I don't think we want to silently cap that, but raise an assert,
otherwise we might be silently corrupting some data.

(and same in file_pager_write_pages)

> @@ -518,11 +640,17 @@ pager_write_pages (struct user_pager_info *pager,
>                     vm_size_t length,
>                     vm_size_t *written)
>  {
> -  /* libpager will just hand this off to the pager_write_page. */
> -  if (pager->type != FILE_DATA)
> -    return EOPNOTSUPP;
> +  switch (pager->type)
> +    {
> +      case FILE_DATA:
> +        return file_pager_write_pages (pager->node, offset, data, length, 
> written);
>  
> -  return file_pager_write_pages (pager->node, offset, data, length, written);
> +      case DISK:
> +        return disk_pager_write_pages (offset, data, length, written);
> +
> +      default:
> +        return EOPNOTSUPP;

I don't think we'd ever add another case, so you can keep the same if()
else as in pager_write_page.

Samuel

Reply via email to