The documentation of MDB_NOSYNC says:

    If the filesystem preserves write order and the MDB_WRITEMAP flag
    is not used, transactions exhibit ACI (atomicity, consistency,
    isolation) properties and only lose D (durability).

In practice, which filesystems and mount options preserve write order?

I asked Howard this question elsewhere, and his answer was that ZFS
should do it, and ext4 with data=ordered _may_ do it. It seems to me
that ext4 with data=journal should be a very safe bet too, shouldn't
it? Are there any other recommendations?
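For concreteness, this is roughly what mounting a dedicated ext4
volume with data=journal looks like as an /etc/fstab entry. The device
and mount point are placeholders, not my actual setup. ext4 refuses to
change the data= mode on a remount, so the option has to be in place
when the filesystem is first mounted.

```
# <device>   <mount point>   <type>  <options>              <dump> <pass>
/dev/sdX1    /var/lib/mydb   ext4    defaults,data=journal  0      2
```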

I ran a few microbenchmarks comparing ext4 with data=ordered and
data=journal. With the default sync behaviour, they manage about 600
and 400 write txn/s, respectively. With MDB_NOSYNC plus an
mdb_env_sync() every second, both reach about 200k txn/s. For
reference, the system can do about 5 million read txn/s. That makes me
hopeful that ext4 with data=journal could be a good option.

Cheers,
Gábor Melis
