Hello,
Thanks for this! Have you gathered any statistics on the gain?
Milos Nikic, on Tue, 02 Sep 2025 17:58:37 -0700, wrote:
> -/* Strong override: only FILE_DATA uses bulk; others keep per-page path. */
> +/* Bulk write for DISK pager across [offset, offset + length).
> + We always iterate block-by-block, build exactly one strictly-consecutive
> + run per batch (stop at first gap or when hitting EXT2_BULK_MAX_BLOCKS),
> + and flush via pending_blocks_write(), which coalesces into large writes.
> + Invariant: we either make forward progress or return an error.
> + Progress is reported page-aligned via *written (if non-NULL). */
> +static error_t
> +disk_pager_write_pages (vm_offset_t offset,
> + vm_address_t buf,
> + vm_size_t length, vm_size_t *written)
> +{
[...]
> + /* If at/past end of device, we cannot progress -> error. */
> + if (dev_off >= dev_end)
You'd want to add vm_page_size and use >, since we are not writing only one
byte.
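I.e. something along these lines (just a sketch, reusing the dev_off/dev_end
names from the patch):

      /* If the page we are about to write would extend past the end of
         the device, we cannot make progress -> error.  */
      if (dev_off + vm_page_size > dev_end)
        {
          err = ENOSPC;
          break;
        }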
> + {
> + err = ENOSPC;
> + break;
> + }
> +
> + block_t block = boffs_block (dev_off);
> +
> + /* Stop this batch at the first non-consecutive block (don't consume it). */
> + if (prev && block != prev + 1)
> + break;
> + prev = block;
Do you actually want to break on non-consecutive blocks? The loop is not
actually keeping a mutex locked, so we can just let pending_blocks_add
call pending_blocks_write on non-consecutive blocks as appropriate, and
have only one while loop; there is then not even a need to limit to
EXT2_BULK_MAX_BLOCKS.
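Very roughly, something like this (only a sketch: it leaves out the
error/progress bookkeeping the patch does around the loop, and assumes
pending_blocks_add flushes via pending_blocks_write itself when the block
does not extend the current run, as described above):

      while (built < length)
        {
          /* End-of-device check as above.  */
          ...

          block_t block = boffs_block (dev_off);

          /* No run bookkeeping and no EXT2_BULK_MAX_BLOCKS limit needed:
             pending_blocks_add flushes the pending run itself when BLOCK
             is not consecutive with it.  */
          if (modified_global_blocks
              && !test_bit (block, modified_global_blocks))
            err = pending_blocks_skip (&pb);
          else
            err = pending_blocks_add (&pb, block);
          if (err)
            break;

          built += block_size;
          dev_off += block_size;
        }
      if (!err)
        err = pending_blocks_write (&pb);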
> + if (modified_global_blocks)
> + {
> + /* Avoid overspray when FS block < page size. */
> + if (test_bit (block, modified_global_blocks))
> + err = pending_blocks_add (&pb, block);
> + else
> + err = pending_blocks_skip (&pb);
> + }
> + else
> + {
> + /* No mask: enqueue every block; writer will coalesce contiguous LBAs. */
> + err = pending_blocks_add (&pb, block);
> + }
> +
> + if (err)
> + break;
> +
> + built += block_size;
> + blocks_built++;
> + }
> +
[...]
> + if (written)
> + {
> + vm_size_t w = done;
> + if (w > length)
> + w = length;
> + /* libpager expects page-aligned progress. */
> + w -= (w % vm_page_size);
> + *written = w;
> + }
Realizing: does it actually happen that we write more than length?
I don't think we want to silently cap that, but rather raise an assert;
otherwise we might be silently corrupting some data.
(and the same in file_pager_write_pages)
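I.e. rather something like (a sketch; plain assert here, substitute whatever
assert flavour the tree prefers):

      if (written)
        {
          /* We must never report more progress than we were asked for;
             if done exceeds length, something already went wrong.  */
          assert (done <= length);
          /* libpager expects page-aligned progress.  */
          *written = done - (done % vm_page_size);
        }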
> @@ -518,11 +640,17 @@ pager_write_pages (struct user_pager_info *pager,
> vm_size_t length,
> vm_size_t *written)
> {
> - /* libpager will just hand this off to the pager_write_page. */
> - if (pager->type != FILE_DATA)
> - return EOPNOTSUPP;
> + switch (pager->type)
> + {
> + case FILE_DATA:
> + return file_pager_write_pages (pager->node, offset, data, length, written);
>
> - return file_pager_write_pages (pager->node, offset, data, length, written);
> + case DISK:
> + return disk_pager_write_pages (offset, data, length, written);
> +
> + default:
> + return EOPNOTSUPP;
I don't think we'd ever add another case, so you can keep the same if()
else as in pager_write_page.
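I.e. just something like (assuming only the DISK and FILE_DATA pager types,
mirroring pager_write_page):

  if (pager->type == DISK)
    return disk_pager_write_pages (offset, data, length, written);
  else
    return file_pager_write_pages (pager->node, offset, data, length, written);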
Samuel