Hey Samuel,

I have actually been experimenting with this, and for some reason we deadlock if we don't stop the batch at the first non-consecutive block.
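For reference, here is roughly the shape that does work for me (a simplified sketch only: the end-of-device check, the modified_global_blocks masking, the *written bookkeeping, and most error handling are elided; pending_blocks_*, boffs_block, dev_off and block_size are as in the patch):

  while (length > 0)
    {
      struct pending_blocks pb;
      block_t prev = 0;
      vm_size_t built = 0;
      error_t err = 0;

      pending_blocks_init (&pb, (void *) buf);

      /* Build exactly one strictly consecutive run per batch.  */
      while (built < length && built / block_size < EXT2_BULK_MAX_BLOCKS)
        {
          block_t block = boffs_block (dev_off + built);

          /* Stop at the first gap; the next batch starts there.
             Not stopping here is what deadlocks for me.  */
          if (prev && block != prev + 1)
            break;
          prev = block;

          err = pending_blocks_add (&pb, block);
          if (err)
            break;

          built += block_size;
        }

      if (!err)
        err = pending_blocks_write (&pb);  /* one coalesced store_write */
      if (err || built == 0)
        break;

      buf += built;
      dev_off += built;
      length -= built;
    }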
I also did a bunch of testing, creating many directories and sparse files, hoping to see big savings in the number of times store_write was called, but I have to say I didn't. So either my tests are not great, or there isn't much benefit to be gained here by batching these writes in the disk case.

(One inline note below on the end-of-device check.)

Kind regards,
Milos

On Wed, Sep 3, 2025, 12:20 PM Samuel Thibault <[email protected]> wrote:

> Hello,
>
> Thanks for this! Have you made some statistics on the gain?
>
> Milos Nikic, le mar. 02 sept. 2025 17:58:37 -0700, a ecrit:
> > -/* Strong override: only FILE_DATA uses bulk; others keep per-page path. */
> > +/* Bulk write for DISK pager across [offset, offset + length).
> > +   We always iterate block-by-block, build exactly one strictly-consecutive
> > +   run per batch (stop at first gap or when hitting EXT2_BULK_MAX_BLOCKS),
> > +   and flush via pending_blocks_write(), which coalesces into large writes.
> > +   Invariant: we either make forward progress or return an error.
> > +   Progress is reported page-aligned via *written (if non-NULL). */
> > +static error_t
> > +disk_pager_write_pages (vm_offset_t offset,
> > +                        vm_address_t buf,
> > +                        vm_size_t length, vm_size_t *written)
> > +{
> [...]
> > +      /* If at/past end of device, we cannot progress -> error. */
> > +      if (dev_off >= dev_end)
>
> You'd want to add vm_page_size and use >, since we are not writing only
> one byte.
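Noted, that makes sense. For the archives, I read that as the following (sketch; dev_off and dev_end are as in the patch, with dev_end being the exclusive end of the device):

      /* We write a whole page at a time, so stop if the page we are
         about to write would extend past the end of the device.  */
      if (dev_off + vm_page_size > dev_end)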
> > +        {
> > +          err = ENOSPC;
> > +          break;
> > +        }
> > +
> > +      block_t block = boffs_block (dev_off);
> > +
> > +      /* Stop this batch at the first non-consecutive block (don't consume it). */
> > +      if (prev && block != prev + 1)
> > +        break;
> > +      prev = block;
>
> Do you actually want to break on non-consecutive blocks? The loop is not
> actually keeping a mutex locked, so we can just let pending_blocks_add
> call pending_blocks_write on non-consecutive blocks as appropriate, and
> have only one while loop; there is not even a need to limit to
> EXT2_BULK_MAX_BLOCKS.
>
> > +      if (modified_global_blocks)
> > +        {
> > +          /* Avoid overspray when FS block < page size. */
> > +          if (test_bit (block, modified_global_blocks))
> > +            err = pending_blocks_add (&pb, block);
> > +          else
> > +            err = pending_blocks_skip (&pb);
> > +        }
> > +      else
> > +        {
> > +          /* No mask: enqueue every block; writer will coalesce contiguous LBAs. */
> > +          err = pending_blocks_add (&pb, block);
> > +        }
> > +
> > +      if (err)
> > +        break;
> > +
> > +      built += block_size;
> > +      blocks_built++;
> > +    }
> > +
> [...]
>
> > +  if (written)
> > +    {
> > +      vm_size_t w = done;
> > +      if (w > length)
> > +        w = length;
> > +      /* libpager expects page-aligned progress. */
> > +      w -= (w % vm_page_size);
> > +      *written = w;
> > +    }
>
> Realizing: does it actually happen that we are writing more than length?
> I don't think we want to silently cap that, but raise an assert;
> otherwise we might be silently corrupting some data.
>
> (and same in file_pager_write_pages)
>
> > @@ -518,11 +640,17 @@ pager_write_pages (struct user_pager_info *pager,
> >                     vm_size_t length,
> >                     vm_size_t *written)
> >  {
> > -  /* libpager will just hand this off to the pager_write_page.  */
> > -  if (pager->type != FILE_DATA)
> > -    return EOPNOTSUPP;
> > +  switch (pager->type)
> > +    {
> > +    case FILE_DATA:
> > +      return file_pager_write_pages (pager->node, offset, data, length, written);
> >
> > -  return file_pager_write_pages (pager->node, offset, data, length, written);
> > +    case DISK:
> > +      return disk_pager_write_pages (offset, data, length, written);
> > +
> > +    default:
> > +      return EOPNOTSUPP;
>
> I don't think we'd ever add another case, so you can keep the same if()
> else as in pager_write_page.
>
> Samuel
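P.S. On the w > length question: if I understand you correctly, the progress reporting would become something like this (sketch; same shape in file_pager_write_pages):

  if (written)
    {
      /* Reporting more progress than we were asked for would indicate
         a bug, so fail loudly rather than silently capping.  */
      assert_backtrace (done <= length);

      /* libpager expects page-aligned progress.  */
      *written = done - (done % vm_page_size);
    }

And for pager_write_pages, keeping the same if/else shape as pager_write_page would look like:

  if (pager->type == DISK)
    return disk_pager_write_pages (offset, data, length, written);
  else
    return file_pager_write_pages (pager->node, offset, data, length, written);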
