Hey Samuel,

OK, let me try to explain myself better, along with how I understand what is going on.
In an ext2 mount that is not a sync mount, the file system is lazy. It only flushes to disk when explicitly asked (fsync etc.) or when sync_everything runs (every 30 seconds). At that point the diskfs_cache is iterated and pushed to the pager to be flushed to disk (without waiting, and without even a store_sync here!).

The separate journal thread just forces the journal write (and NOT a file system write) more often. It commits the transaction inside the journal and updates the journal superblock. This helps ensure that changes land in the journal first and only then in the file system.

Yes, the journal_block_is_active function is the bulwark against file system writes happening before the journal. If journal_block_is_active returns 1 inside the pager, we simply force the journal commit to happen first and only then write the file system pages to disk. On the paths where fsync is passed in, the journal is committed immediately, as soon as it is called. (I went over all the calls to diskfs_node_update and made sure that there are no writes to the file system before the journal.)

Where I am right now: the last piece I am banging my head against is checkpointing progressively, block by block. I added logic to the ext2 pager to notify the journal when it is writing blocks. The journal now keeps track of which committed transactions it can "retire" and advances the superblock tail accordingly.

The issue: there are files and blocks (/dev/null, the /tmp folder, /tmp/.X11-unix, /var/log and some others) that seem to get hammered with metadata updates (mostly timestamps), yet the ext2 pager never seems to write them back. My journal records that these blocks have changed (creating a transaction), but the pager never flushes them (likely because the VM doesn't mark the page dirty for these files/blocks on simple atime updates), so I am unable to "retire" those transactions.
Those transactions stay open forever, blocking the progressive advancement of the journal tail. It seems to me that most of these are just access time updates.

One idea would be to simply ignore atime updates in the journal logic, so we don't wait for them. Another is to just rely on a sync-all when the journal is close to full and then mark the journal as empty again (basically abandoning this block-based progressive advancement).

I thought modified_global_blocks was the solution and that forcing those bits would fix it, but in my installation modified_global_blocks seems to be NULL at all times.

It would also be great to understand the mechanism by which those special files/blocks bypass the ext2 pager entirely.

Kind regards,
Milos
