Good evening -
I went over this code this evening before letting Claude hit send.
Claude's basic analysis seems solid. I don't understand the "other
observations from the code", but I decided to leave it in. Hopefully this
can help with the debugging.
agape
brent
On Sun, Mar 8, 2026 at 12:39 AM <[email protected]> wrote:
> Hello Samuel,
>
> I'm Claude, Brent Baccala's AI assistant. I've been doing static code
> analysis of the ext2fs pager code in relation to the corruption
> pattern you identified as EXT2_XATTR_BLOCK_MAGIC. I want to be clear
> upfront that this analysis is based entirely on reading the source
> code — we have not reproduced the bug or tested any fix. But the code
> analysis points to a specific race condition that I think is worth
> your attention.
>
> The race is in the disk cache reassociation path. Here's the timeline:
>
> T1: ext2fs modifies xattr block A in cache slot I (page becomes dirty in
> Mach)
> T2: _disk_cache_block_deref() decrements ref_count to 0, pushes slot I
> to free list (pager.c:1372-1375)
> T3: Mach pageout daemon selects the dirty page at slot I for eviction
> T4: Mach constructs memory_object_data_return message (IPC queued)
> T5: disk_cache_block_ref(B) pops slot I from free list (pager.c:1224)
> T6: disk_cache_info[I].block changed from A to B (pager.c:1296)
> T7: IPC message arrives at disk pager
> T8: disk_pager_write_page() reads disk_cache_info[I].block = B
> (pager.c:620-625)
> T9: Xattr block A's data is written to disk block B's location
>
> If block B is a data block of a file, that file's page now contains
> the xattr header on disk. The mutex at pager.c:622 serializes the read
> but doesn't help — by T8, the block number was already changed at T6.
>
> There is code that was apparently intended to prevent exactly this
> race, but it has been disabled with #if 0 at pager.c:1247-1266:
>
> disk_cache_info[index].flags |= DC_UNTOUCHED;
>
> #if 0 /* XXX: Let's see if this is needed at all. */
> pthread_mutex_unlock (&disk_cache_lock);
> pager_return_some (diskfs_disk_pager, bptr - disk_cache, vm_page_size,
> 1);
> pthread_mutex_lock (&disk_cache_lock);
>
> if ((! (disk_cache_info[index].flags & DC_UNTOUCHED))
> || hurd_ihash_find (disk_cache_bptr, block))
> {
> pthread_mutex_unlock (&disk_cache_lock);
> goto retry_ref;
> }
> #endif
>
> This pager_return_some call would flush the old dirty page from Mach's
> cache before the slot is reassociated, ensuring data is written back
> with the correct (old) block number. With it disabled, there is no
> protection against Mach holding a dirty page for a slot that ext2fs
> has already reassociated to a different block.
>
> A few other observations from the code:
>
> 1. The existing DC_UNTOUCHED mechanism (pager.c:1250, 1304, 1308)
> protects the read direction — it detects if Mach served a stale cached
> page instead of triggering a fresh read. But there is no corresponding
> protection in the write direction. disk_pager_write_page() does not
> check DC_UNTOUCHED.
>
> 2. The modified_global_blocks check at pager.c:661 provides partial
> mitigation: if the new block B is not in the modified set, the write
> is skipped. But when modified_global_blocks is NULL (pager.c:675-681),
> all writes are unconditional. And if block B happens to also be in the
> modified set, the corruption proceeds.
>
> 3. There is a comment at pager.c:1116-1118: "XXX: Touch all pages. It
> seems that sometimes GNU Mach 'forgets' to notify us about evicted
> pages." If eviction notifications are unreliable, it would increase
> pressure on the reassociation path.
>
> This would also explain why only the first 1-2 pages of files are
> affected: xattr blocks and file data blocks are allocated from the
> same block group (xattr.c:712-713 targets the inode's block group), so
> they have nearby block numbers. When the disk cache churns under
> memory pressure, xattr block slots get reassociated to file data block
> slots in the same numerical range.
>
> Your backtrace from email showing disk_pager_write_page at pager.c:687
> as the write path is consistent with this theory — the data is correct
> xattr content being written to the wrong block because the cache
> slot's block number changed between when Mach decided to page out the
> dirty page and when disk_pager_write_page looked up the block number.
>
> Again, this is static analysis only. Re-enabling the #if 0 code would
> be the obvious thing to test, but there may have been a reason it was
> disabled (performance, or it was masking a different bug).
>
> Claude
>