Hello,

Milos Nikic, on Tue, 17 Feb 2026 11:00:34 -0800, wrote:
> Ok let me maybe explain myself better and how I understand what is going on.

Ok, but my point is that this should be explained in the source
code :)

> This actually helps with the fact that things are first in the journal and only
> then in the file system.

"Helping" is not enough :)

> Yes, the journal_block_is_active function is the bulwark against filesystem
> writes happening before the journal.

And thus it definitely needs to be documented in the source code itself, so
that readers get it easily.
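For example, the comment on the function could spell out the invariant it enforces. A minimal sketch, assuming a simple per-block flag array; `block_in_open_txn`, `NBLOCKS` and `pager_may_write_block` are hypothetical names, and the real signature of journal_block_is_active may well differ:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: one flag per filesystem block recording whether
   the block is still part of a transaction that has not yet been
   committed to the journal.  The real journal code may track this
   with a different data structure.  */
#define NBLOCKS 16
static bool block_in_open_txn[NBLOCKS];

/* journal_block_is_active: return true if BLOCK belongs to a
   transaction that has not yet been committed to the journal.

   This is the write-ahead guarantee: a metadata block must reach the
   journal before it reaches its home location in the filesystem.
   Any code that writes blocks back (e.g. the pager) must defer a block
   for which this returns true; otherwise a crash could leave the
   filesystem holding a change that the journal has no record of.  */
bool
journal_block_is_active (unsigned block)
{
  assert (block < NBLOCKS);
  return block_in_open_txn[block];
}

/* Hypothetical pager write-back check honoring that guarantee.
   Returns true if the block may be written now, false if it must be
   deferred until its transaction has been committed.  */
bool
pager_may_write_block (unsigned block)
{
  return !journal_block_is_active (block);
}
```

The point is less the code than the comment: whoever touches the pager later should not have to rediscover why the check is there.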

> I added logic into the ext2 pager to notify the journal when it is writing
> blocks. Now the journal keeps track of which committed transactions it can
> "retire" and progress the superblock tail.

Cool :)
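To make the retirement rule concrete, here is a minimal sketch, assuming transactions are kept in journal order and that a pager write of a block covers every committed transaction containing it (the in-memory block already includes those changes). `struct txn`, `journal_commit` and `journal_note_block_written` are hypothetical names, not the patch's actual interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: committed transactions in journal order.  The
   tail may only move past a transaction once it and every older
   transaction have all their blocks written to their home locations.  */
#define MAX_TXNS 8
#define MAX_BLOCKS_PER_TXN 4

struct txn
{
  unsigned blocks[MAX_BLOCKS_PER_TXN];
  bool written[MAX_BLOCKS_PER_TXN];
  size_t nblocks;
};

static struct txn txns[MAX_TXNS];
static size_t tail;   /* Oldest transaction not yet retired.  */
static size_t head;   /* One past the newest committed transaction.  */

/* Record a committed transaction covering BLOCKS[0..N-1].  */
void
journal_commit (const unsigned *blocks, size_t n)
{
  assert (head < MAX_TXNS && n <= MAX_BLOCKS_PER_TXN);
  struct txn *t = &txns[head++];
  t->nblocks = n;
  for (size_t i = 0; i < n; i++)
    {
      t->blocks[i] = blocks[i];
      t->written[i] = false;
    }
}

static bool
txn_fully_written (const struct txn *t)
{
  for (size_t i = 0; i < t->nblocks; i++)
    if (!t->written[i])
      return false;
  return true;
}

/* Called by the pager after writing BLOCK to its home location.  Marks
   the block written in every committed transaction containing it, then
   advances the tail past leading fully written transactions.  Returns
   the new tail; a real version would also update the journal
   superblock here.  */
size_t
journal_note_block_written (unsigned block)
{
  for (size_t i = tail; i < head; i++)
    for (size_t j = 0; j < txns[i].nblocks; j++)
      if (txns[i].blocks[j] == block)
        txns[i].written[j] = true;

  while (tail < head && txn_fully_written (&txns[tail]))
    tail++;
  return tail;
}
```

With this shape, a single block that the pager never writes back pins the tail at its transaction, which is the stall described for /dev/null and /tmp/.X11-unix.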

> The Issue:  There are files and blocks (/dev/null, /tmp folder,
> /tmp/.X11-unix, /var/log and some others) that seem to get hammered a
> lot with metadata updates (mostly timestamps),
[...]
> It seems to me most of these are just access time updates. One idea would be
> to simply ignore atime updates in the journal logic so we don't wait for them?

/var/log is expected for data, but e.g. /dev/null is *really* not
expected. Normally, the relatime option should already be taking care of
updating atime only once a day per file when it's already younger than
mtime/ctime. If it's not, we should really fix it, we have no reason to
write that often.
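For reference, the relatime rule as Linux implements it fits in a few lines. This is a sketch of the policy only, assuming Linux semantics (update if atime is not newer than mtime or ctime, or if it is at least a day old); `relatime_need_update` and `RELATIME_MAX_AGE` are hypothetical names, and the ext2fs check may be structured differently:

```c
#include <stdbool.h>
#include <time.h>

#define RELATIME_MAX_AGE (24 * 60 * 60)  /* One day, in seconds.  */

/* Decide whether a read access should update atime under Linux-style
   relatime semantics: update if atime is not newer than mtime or ctime,
   or if the stored atime is at least a day old.  Otherwise skip the
   update, so repeated reads of an unchanged file cause at most one
   atime update per day.  */
bool
relatime_need_update (time_t atime, time_t mtime, time_t ctime, time_t now)
{
  if (mtime >= atime || ctime >= atime)
    return true;
  return now - atime >= RELATIME_MAX_AGE;
}
```

Under this rule, a steady stream of timestamp-only journal entries for the same unchanged node suggests the check is not being applied on that path.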

> yet the ext2 pager never seems to write them back.

For translators, it is expected that no data is written. But we still
shouldn't need to update the time; that's a bug that should be fixed.

Milos Nikic, on Tue, 17 Feb 2026 19:57:33 -0800, wrote:
> Yes, some "files" like /dev/null are translators and can be excluded based on
> mode alone (whether that is a good idea is a separate question).

We shouldn't have to exclude explicitly, relatime should be enough.

> But there are other files that look perfectly ordinary:
> For instance:
>  /etc/resolv.conf
> or 
> /tmp/.X11-unix
> /tmp/.ICE-unix
> 
> Occasionally their mode drops to 0, but overall these files are regular files
> (not translators) that for some reason aren't handled by the ext2 pager.
> 
> And it's not all atime updates either... there are "other" updates as well.
> This is all early boot though, but I still don't understand why the ext2
> pager isn't handling them.

Possibly there's a bug to fix in there.

> A few things come to mind, if we want to pursue them:
> 1) Aggressive Filtering (The "Strict Lazy" approach)
>    Logic: If !S_ISREG(mode) && !S_ISDIR(mode), ignore ALL timestamp-only
> updates. Only journal if mode/uid/size changes.
>     Pros: Likely solves the issue completely.
>     Cons: Potentially risky if a file transitions states (e.g., git temp files)
> or if we miss legitimate metadata updates on special nodes.

Yeah, we don't want that.

> 2) Active Checkpointing (The "Sweeper")
>     Logic: If a transaction is stuck waiting for blocks, the journal thread
> explicitly calls store_write for those blocks, bypassing the Pager's
> dirty-check.
>     Pros: Guarantees consistency.
>     Cons: High complexity. It fights the Pager's logic and seems like a large
> architectural change.

We don't want to paper over what looks like a pager bug. We want to fix
the pager.

> 3) Perhaps just abandon block by block tail advancement idea for now, and
> revert to "flush when almost full" approach which works well.

If the scenario doesn't happen too often, the flush-when-almost-full can
stay alongside the progressive flush.
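One way the two could coexist: progressive tail advancement handles the normal case, and a full flush is forced only when free journal space drops too low. A minimal sketch, assuming a fixed-size journal; `JOURNAL_BLOCKS`, `LOW_WATER` (an arbitrary 10% threshold) and `journal_check_space` are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: normally the tail advances as the pager writes
   blocks back; only if that stalls and free journal space falls below
   a low-water mark is a full flush forced.  The 10% threshold is an
   arbitrary illustration.  */
#define JOURNAL_BLOCKS 1024
#define LOW_WATER (JOURNAL_BLOCKS / 10)

enum journal_action { JOURNAL_CONTINUE, JOURNAL_FORCE_FLUSH };

/* USED is the number of journal blocks between tail and head; NEEDED is
   the size of the transaction about to be written.  */
enum journal_action
journal_check_space (size_t used, size_t needed)
{
  assert (used <= JOURNAL_BLOCKS);
  size_t free_blocks = JOURNAL_BLOCKS - used;
  if (free_blocks < needed || free_blocks - needed < LOW_WATER)
    return JOURNAL_FORCE_FLUSH;
  return JOURNAL_CONTINUE;
}
```

On JOURNAL_FORCE_FLUSH the caller would write back all dirty blocks and move the tail to the head, as the flush-when-almost-full approach does; a stuck block then only costs an occasional full flush instead of a full journal.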

Samuel
