This is an automated email from the ASF dual-hosted git repository.

vatamane pushed a commit to branch merge-3.4.3
in repository https://gitbox.apache.org/repos/asf/couchdb.git

commit 96c0698c1e0c9a5beace0a55e7e60c647effa348
Author: Nick Vatamaniuc <[email protected]>
AuthorDate: Mon Jan 13 15:59:25 2025 -0500

    Use fdatasync for commits
    
    We can use fdatasync to save one extra write per call, for a total of two
    writes saved per commit, since each commit does two syncs: one for the data
    blocks up to the header, then another after the header.
    
    As of OTP 25 (our oldest supported version):
      * On Linux/BSDs: fdatasync()
      * On Windows: FlushFileBuffers(), i.e. the same as for file:sync/1
      * On macOS: fcntl(fd, F_FULLFSYNC/F_BARRIERFSYNC)
    
    According to https://linux.die.net/man/2/fdatasync
    
     > fdatasync() is similar to fsync(), but does not flush modified metadata
     unless that metadata is needed in order to allow a subsequent data
     retrieval to be correctly handled. For example, changes to st_atime or
     st_mtime (respectively, time of last access and time of last
     modification; see stat(2)) do not require flushing because they are not
     necessary for a subsequent data read to be handled correctly. On the
     other hand, a change to the file size (st_size, as made by say
     ftruncate(2)), would require a metadata flush.
    
    The key things for us are:
    
      * It updates the size (positions) correctly
      * We do not rely on or care about atime/mtime for safety or correctness
      * Erlang VM does the right thing on all the supported OSes
---
 src/couch/src/couch_file.erl | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/couch/src/couch_file.erl b/src/couch/src/couch_file.erl
index 248a26097..02b2412f3 100644
--- a/src/couch/src/couch_file.erl
+++ b/src/couch/src/couch_file.erl
@@ -595,7 +595,12 @@ format_status(_Opt, [PDict, #file{} = File]) ->
 
 fsync(Fd) ->
     T0 = erlang:monotonic_time(),
-    Res = file:sync(Fd),
+    % We do not rely on mtime/atime for our safety/consistency so we can use
+    % fdatasync. As of version 25 OTP will use:
+    %  - On Linux/BSDs: fdatasync()
+    %  - On Windows: FlushFileBuffers() i.e. the same as for file:sync/1
+    %  - On macOS: fcntl(fd,F_FULLFSYNC/F_BARRIERFSYNC)
+    Res = file:datasync(Fd),
     T1 = erlang:monotonic_time(),
     % Since histograms can consume floating point values we can measure in
     % nanoseconds, then turn it into floating point milliseconds
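
For illustration, the two-sync commit pattern described in the commit message
(sync the data blocks, then write and sync the header) might look roughly like
the sketch below. This is a minimal sketch under stated assumptions, not the
actual couch_file implementation: the module name datasync_sketch and the
function commit/3 are hypothetical, and the file is simply opened in append
mode. Only file:open/2, file:write/2, file:datasync/1 and file:close/1 are
real OTP calls.

```erlang
-module(datasync_sketch).
-export([commit/3]).

%% Hypothetical helper, for illustration only (not CouchDB code).
%% Appends Data, flushes it, then appends Header and flushes again,
%% so the header never becomes durable before the data it refers to.
commit(Path, Data, Header) ->
    {ok, Fd} = file:open(Path, [append, raw, binary]),
    try
        ok = file:write(Fd, Data),
        %% First sync: data blocks up to the header
        ok = file:datasync(Fd),
        ok = file:write(Fd, Header),
        %% Second sync: the header itself
        ok = file:datasync(Fd),
        ok
    after
        file:close(Fd)
    end.
```

Both writes extend the file, so the size change is metadata that fdatasync
still has to flush; only timestamps such as atime/mtime may be skipped.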
