On 4/07/25 03:40, Milos Nikic wrote:
Hi all,

I’m working on a user-space metadata journaling layer for libdiskfs and have run into a boot-time issue I hope to get guidance on.

The goal is to capture metadata changes very early in the boot process — ideally even before fsck — and flush them to a raw device (backed by a file or similar) synchronously (i.e. the calling thread waits for the result of the write).

However, I’ve found that attempting to open() the raw device during early boot (e.g. from journal_init(), called via diskfs_init_diskfs) can block indefinitely. This happens even with O_NONBLOCK, and also applies to access(), stat(), etc. — all of which appear to stall while the device layer is still coming online.

As a workaround, I buffer entries in memory and defer writes until later, and I’m also experimenting with watchdog threads that poll for readiness. But I’d like to find a more robust and principled solution.

Specifically:
- When is it considered safe for user-space code to open and write to a persistent block-backed device?
- Is there a recommended hook (or mechanism) I could wait for before beginning replay or issuing synchronous flushes?
- Or is this limitation expected, and journaling systems should stage initialization accordingly?

Thanks in advance for any insight or suggestions.




Two Devils-Advocate questions:

1) if there is no filesystem device, why does the journal exist?

 - read is impossible.
 - write is dubious.

Perhaps the journal init should instead be an implicit part of the device activation sequence, not triggered separately in the kernel startup sequence.

 - code that needs to write early can do its own caching of the important data, and the journal API catches it on the true write attempt after the FS exists.
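
To make that concrete, here is a rough C sketch of the staging idea: entries recorded before the device exists are held in memory, and the device activation path drains them in order. Every name in it (struct journal, journal_record, journal_device_activated, counting_write) is hypothetical and made up for illustration; none of it is existing libdiskfs API, and the fixed-size buffer is just an assumption to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STAGE_CAP 64    /* assumed maximum number of staged entries */
#define ENTRY_MAX 128   /* assumed maximum size of one entry */

struct staged_entry {
  size_t len;
  char data[ENTRY_MAX];
};

/* Hypothetical journal state; not a libdiskfs structure. */
struct journal {
  int device_ready;      /* set once the backing device has been activated */
  size_t n_staged;
  struct staged_entry staged[STAGE_CAP];
  /* Hypothetical backend hook; returns 0 on success. */
  int (*dev_write) (void *cookie, const void *buf, size_t len);
  void *cookie;
};

/* Record an entry: write through if the device is up, else stage it. */
int journal_record (struct journal *j, const void *buf, size_t len)
{
  assert (j != NULL && buf != NULL);
  if (len > ENTRY_MAX)
    return -1;
  if (j->device_ready)
    return j->dev_write (j->cookie, buf, len);
  if (j->n_staged == STAGE_CAP)
    return -1;           /* staging area full; caller decides what to do */
  struct staged_entry *e = &j->staged[j->n_staged++];
  e->len = len;
  memcpy (e->data, buf, len);
  return 0;
}

/* Called from the device activation path: drain staged entries in order. */
int journal_device_activated (struct journal *j)
{
  assert (j != NULL && j->dev_write != NULL);
  for (size_t i = 0; i < j->n_staged; i++)
    {
      int err = j->dev_write (j->cookie, j->staged[i].data, j->staged[i].len);
      if (err)
        {
          /* Keep the entries that were not written, so a retry is possible. */
          memmove (&j->staged[0], &j->staged[i],
                   (j->n_staged - i) * sizeof j->staged[0]);
          j->n_staged -= i;
          return err;
        }
    }
  j->n_staged = 0;
  j->device_ready = 1;
  return 0;
}

/* Stand-in backend for demonstration: only counts the bytes "written". */
int counting_write (void *cookie, const void *buf, size_t len)
{
  (void) buf;
  *(size_t *) cookie += len;
  return 0;
}
```

The point of the design is that early callers never touch the device at all, so nothing can block in open() during boot; the only place that writes is the activation path, which by definition runs after the device exists.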


2) for memory-only filesystems (eg initramfs), what is the point of a journal?

 - no disk sync issues
 - everything is lost on memory wipe, including journal

 Perhaps a dummy/no-op journal API is better for these cases.
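
A no-op journal could look something like the following C sketch: callers go through a small table of function pointers, and a memory-only filesystem gets a table whose entries do nothing and report success. The names (struct journal_ops, noop_journal_ops, journal_select_ops) are hypothetical, invented for this example rather than taken from libdiskfs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical journal interface; not an existing libdiskfs type. */
struct journal_ops {
  int (*record) (void *state, const void *buf, size_t len);
  int (*sync) (void *state);
};

/* No-op backend: accept everything, persist nothing. */
static int noop_record (void *state, const void *buf, size_t len)
{
  (void) state; (void) buf; (void) len;
  return 0;
}

static int noop_sync (void *state)
{
  (void) state;
  return 0;
}

const struct journal_ops noop_journal_ops = { noop_record, noop_sync };

/* Pick a backend: the real one only when a persistent store exists. */
const struct journal_ops *
journal_select_ops (int has_persistent_store,
                    const struct journal_ops *real_ops)
{
  if (has_persistent_store && real_ops != NULL)
    return real_ops;
  return &noop_journal_ops;
}
```

With this shape the callers stay identical in both cases, and the decision about whether journaling means anything is made once, at the point where the filesystem learns what it is backed by.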


HTH
Amos


