On 4/07/25 03:40, Milos Nikic wrote:
Hi all,
I’m working on a user-space metadata journaling layer for libdiskfs and
have run into a boot-time issue I hope to get guidance on.
The goal is to capture metadata changes very early in the boot process —
ideally even before fsck — and flush them to a raw device (backed by a
file or similar) synchronously (i.e. the calling thread waits for the
result of the write).
However, I’ve found that attempting to open() the raw device during
early boot (e.g. from journal_init(), called via diskfs_init_diskfs) can
block indefinitely. This happens even with O_NONBLOCK, and also applies
to access(), stat(), etc. — all of which appear to stall while the
device layer is still coming online.
As a workaround, I buffer entries in memory and defer writes until
later, and I’m also experimenting with watchdog threads that poll for
readiness. But I’d like to find a more robust and principled solution.
Specifically:
- When is it considered safe for user-space code to open and write to a
persistent block-backed device?
- Is there a recommended hook (or mechanism) I could wait for before
beginning replay or issuing synchronous flushes?
- Or is this limitation expected, and journaling systems should stage
initialization accordingly?
Thanks in advance for any insight or suggestions.
Two Devils-Advocate questions:
1) if there is no filesystem device, why does the journal exist?
- read is impossible.
- write is dubious.
Perhaps instead the journal init should be an implicit part of the
device activation sequence, not triggered separately in the kernel
startup sequence.
- code needing to do an early write can do its own caching of the
important data, and the journal API catches it on the true write attempt
after the FS exists.
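
A rough sketch of that cache-then-catch idea, in C. All names here
(journal_stage, journal_stage_write, journal_stage_activate,
journal_count_sink) are made up for illustration and are not libdiskfs
API. It assumes a single caller (no locking) and that device activation
calls journal_stage_activate() exactly once:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of this is libdiskfs API. */
#define STAGE_MAX 64    /* max entries cached before the device exists */
#define ENTRY_MAX 128   /* max bytes per cached entry */

struct staged_entry { size_t len; char data[ENTRY_MAX]; };

struct journal_stage {
  struct staged_entry entries[STAGE_MAX];
  size_t count;
  int device_ready;     /* set once the backing store is usable */
};

/* Performs the real write; supplied by the caller. Returns 0 on success. */
typedef int (*journal_sink_fn) (const void *buf, size_t len, void *ctx);

/* Example sink for demonstration: only counts bytes into a size_t. */
int journal_count_sink (const void *buf, size_t len, void *ctx)
{
  (void) buf;
  *(size_t *) ctx += len;
  return 0;
}

/* Record an entry. Before activation it is only cached in memory;
   afterwards it goes straight to the sink. Returns 0 on success,
   -1 if the entry is too large or the cache is full. */
int journal_stage_write (struct journal_stage *js, const void *buf,
                         size_t len, journal_sink_fn sink, void *ctx)
{
  if (js->device_ready)
    return sink (buf, len, ctx);
  if (len > ENTRY_MAX || js->count == STAGE_MAX)
    return -1;
  memcpy (js->entries[js->count].data, buf, len);
  js->entries[js->count].len = len;
  js->count++;
  return 0;
}

/* Called from device activation: drain cached entries in order, then
   switch to write-through. On a sink failure the cache is left as-is
   and -1 is returned; a real implementation would need a retry or
   partial-drain policy. */
int journal_stage_activate (struct journal_stage *js,
                            journal_sink_fn sink, void *ctx)
{
  for (size_t i = 0; i < js->count; i++)
    if (sink (js->entries[i].data, js->entries[i].len, ctx) != 0)
      return -1;
  js->count = 0;
  js->device_ready = 1;
  return 0;
}
```

The point of the design is that early callers never touch the device:
nothing can block in open() because nothing is opened until activation.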
2) for memory-only filesystems (e.g. initramfs), what is the point of a
journal?
- no disk sync issues
- everything is lost on memory wipe, including journal
Perhaps a dummy/no-op journal API is better for these cases.
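
Something along these lines, as a sketch only. The journal_ops table
and journal_select_ops() are hypothetical names, not an existing
libdiskfs interface, and how the filesystem decides whether it has a
persistent store is left as an assumption (a plain flag here):

```c
#include <stddef.h>

/* Hypothetical journal interface; not part of libdiskfs. */
struct journal_ops {
  int (*append) (void *ctx, const void *buf, size_t len);
  int (*flush) (void *ctx);
};

/* No-op implementation: accepts everything, persists nothing. */
static int noop_append (void *ctx, const void *buf, size_t len)
{
  (void) ctx; (void) buf; (void) len;
  return 0;
}

static int noop_flush (void *ctx)
{
  (void) ctx;
  return 0;
}

const struct journal_ops journal_noop_ops = { noop_append, noop_flush };

/* Choose the implementation once at init. Memory-only filesystems,
   or callers with no real implementation to offer, get the no-op
   table, so call sites never need to special-case them. */
const struct journal_ops *
journal_select_ops (int has_persistent_store,
                    const struct journal_ops *real_ops)
{
  if (has_persistent_store && real_ops != NULL)
    return real_ops;
  return &journal_noop_ops;
}
```

Callers then always go through ops->append() and ops->flush(), and the
memory-only case costs one indirect call that returns immediately.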
HTH
Amos