Hello,

Milos Nikic, on Thu. 05 Mar 2026 09:31:26 -0800, wrote:
> Hurd VFS works in 3 layers:
>
> 1. Node cache layer: The abstract node lives here and it is the ground truth
> of a running file system. When one does a stat myfile.txt, we get the
> information straight from the cache. When we create a new file, it gets
> placed in the cache, etc.
>
> 2. Pager layer: This is where nodes are serialized into the actual physical
> representation (4KB blocks) that will later be written to disk.
>
> 3. Hard drive: The physical storage that receives the bytes from the pager.
>
> During normal operations (not a sync mount, fsync, etc.), the VFS operates
> almost entirely on Layer 1: The Node cache layer. This is why it's super fast.
> User changed atime? No problem. It just fetches a node from the node cache
> (hash table lookup, amortized to O(1)) and updates the struct in memory. And
> that is it.

Yes, so that we are as efficient as possible.

> Only when the sync interval hits (every 30 seconds by default) does the Node
> cache get iterated and serialized to the pager layer (diskfs_sync_everything
> -> write_all_disknodes -> write_node -> pager_sync). So basically, at that
> moment, we create a snapshot of the state of the node cache and place it onto
> the pager(s).

It's not exactly a snapshot, because the coherency between inodes and data is
not completely enforced (we write all disknodes before asking the kernel to
write back dirty pages, and then poke the writes).

> Even then, pager_sync is called with wait = 0. It is handed to the pager,
> which sends it to Mach. At some later time (seconds or so later), Mach sends
> it back to the ext2 pager, which finally issues store_write to write it to
> Layer 3 (The Hard drive). And even that depends on how the driver reorders or
> delays it.
>
> The effect of this architecture is that when store_write is finally called,
> the absolute latest version of the node cache snapshot is what gets written
> to the storage. Is this basically correct?

It seems to be so indeed.

> Are there any edge cases or mechanics that are wrong in this model
> that would make us receive a "stale" node cache snapshot?

Well, it can be "stale" if another RPC hasn't called diskfs_node_update() yet,
but that's what "safe" FS are all about: not actually providing more than
coherency of the content on the disk, so that fsck is not supposed to be
needed. Then, if a program really wants coherency between some files etc., it
has to issue sync calls; dpkg does it, for instance.

Samuel
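
To make the "stale" case above concrete, here is a minimal toy sketch in C of
the three layers. It is not the libdiskfs/ext2fs code: every toy_* name is
hypothetical, the "pager" and "disk" are plain arrays, and only atime is
modelled. It only tries to show that an update arriving after the sync pass
but before the writeback stays in the node cache until the next sync.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_NNODES 4

/* Layer 1: in-memory node, the ground truth while the FS is running. */
struct toy_node {
    long atime;
    int dirty;
};

/* Serialized inode, standing in for a slice of a 4KB block. */
struct toy_dinode {
    long atime;
};

struct toy_node   toy_cache[TOY_NNODES];  /* Layer 1: node cache */
struct toy_dinode toy_pager[TOY_NNODES];  /* Layer 2: pager pages */
struct toy_dinode toy_disk[TOY_NNODES];   /* Layer 3: storage */

/* Normal operation: only the in-memory node is touched. */
void toy_set_atime(int ino, long t)
{
    assert(ino >= 0 && ino < TOY_NNODES);
    toy_cache[ino].atime = t;
    toy_cache[ino].dirty = 1;
}

/* Periodic sync: serialize dirty nodes into the pager layer.
   Nothing reaches the disk here. Returns how many nodes were written. */
int toy_sync_everything(void)
{
    int written = 0;
    for (int i = 0; i < TOY_NNODES; i++) {
        if (!toy_cache[i].dirty)
            continue;
        toy_pager[i].atime = toy_cache[i].atime;
        toy_cache[i].dirty = 0;
        written++;
    }
    return written;
}

/* Later writeback: the pager contents finally land on storage. */
void toy_writeback(void)
{
    memcpy(toy_disk, toy_pager, sizeof toy_disk);
}

/* Scenario: an update slips in between the sync pass and the writeback. */
void toy_demo(void)
{
    toy_set_atime(1, 100);
    toy_sync_everything();
    toy_set_atime(1, 200);  /* after sync, before writeback */
    toy_writeback();
    printf("cache=%ld disk=%ld\n", toy_cache[1].atime, toy_disk[1].atime);
}
```

Calling toy_demo() on a fresh state prints "cache=200 disk=100": the disk
holds a coherent but older value, and the newer one only goes out with the
next sync pass and writeback.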
