On Mar 11, 2009, at 22:43 , Cameron Simpson wrote:

On 11Mar2009 10:09, Joachim König <h...@online.de> wrote:
Guido van Rossum wrote:
On Tue, Mar 10, 2009 at 1:11 PM, Christian Heimes <li...@cheimes.de> wrote:
[...]
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/54 .
[...]
If I understand the post properly, it's up to the app to call fsync(), and it's only necessary when you're doing one of the rename dances, or updating a file in place. Basically, as he explains, fsync() is a very
heavyweight operation; I'm against calling it by default anywhere.
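
For readers unfamiliar with the "rename dance" mentioned above, here is a minimal sketch of it in Python. The helper name atomic_write is hypothetical, and os.replace() is the modern spelling (at the time of this thread it would have been os.rename()). It assumes the temporary file and the target live on the same filesystem.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) via write-temp, fsync, rename.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename stays
    # within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python's buffer -> OS
            os.fsync(tmp.fileno())  # ask the OS to push the data to the device
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        # On any failure, remove the temporary file and leave `path` untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The fsync() before the rename is the part the linked post is about: without it, the rename may reach the disk before the new file's contents do. A stricter variant might also fsync the containing directory; that is left out here.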

To me, the flaw seems to be in the close() call (of the operating
system). I'd expect the data to be in a persistent state once the
close() returns. So there would be no need to fsync if the file gets
closed anyway.

Not really. On the whole, flush() means "the object has handed all data
to the OS".  close() means "the object has handed all data to the OS
and released the control data structures" (OS file descriptor release;
like the OS, the python interpreter may release python stuff later too).

By contrast, fsync() means "the OS has handed filesystem changes to the disc itself". Really really slow, by comparison with memory. It is Very
Expensive, and a very different operation to close().
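
A small sketch of the distinction, assuming CPython's default buffered file objects and a write that is smaller than the buffer. The function name is made up for illustration. It checks the file size as the OS sees it before and after flush(), then calls fsync() as a separate step.

```python
import os


def sizes_before_and_after_flush(path, data=b"hello"):
    """Return the on-OS file size before and after flush().

    Illustrative helper; the name is hypothetical.
    """
    with open(path, "wb") as f:
        f.write(data)                     # sits in Python's own buffer for now
        before = os.path.getsize(path)    # the OS has not been given the bytes
        f.flush()                         # the object hands its data to the OS
        after = os.path.getsize(path)     # now the OS knows about the bytes
        os.fsync(f.fileno())              # separately ask the OS to write to disc
    # Leaving the with-block calls close(), which releases the descriptor
    # but does not imply an fsync().
    return before, after
```

Only the fsync() call says anything about the disc; flush() and close() are about handing data from the Python object to the OS.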

...and at least on OS X there is one level more where you actually tell the
disc to flush its buffers to permanent storage with:

   fcntl(fd, F_FULLFSYNC)

The fsync manpage says:

Note that while fsync() will flush all data from the host to the drive (i.e. the "permanent storage device"), the drive itself may not physically write the data to the platters for quite some time and it may be
     written in an out-of-order sequence.

Specifically, if the drive loses power or the OS crashes, the application may find that only some or none of their data was written. The disk drive may also re-order the data so that later writes may be present,
     while earlier writes are not.

This is not a theoretical edge case. This scenario is easily reproduced
     with real world workloads and drive power failures.

For applications that require tighter guarantees about the integrity of their data, Mac OS X provides the F_FULLFSYNC fcntl. The F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent storage. Applications, such as databases, that require a strict ordering of writes should use F_FULLFSYNC to ensure that their data is written in the order
     they expect.  Please see fcntl(2) for more detail.
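
An application that wants this stronger guarantee could do something along the following lines. This is only a sketch: full_sync is a hypothetical name, the fcntl module exists only on Unix-like platforms, and the fallback to os.fsync() when F_FULLFSYNC is missing or refused is an assumption about what an application might want, not something the manpage prescribes.

```python
import fcntl
import os


def full_sync(fd):
    """Sync `fd` as far as the platform allows; report which call was used.

    Hypothetical helper for illustration only.
    """
    if hasattr(fcntl, "F_FULLFSYNC"):
        try:
            # Mac OS X: ask the drive itself to flush its buffers.
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
            return "F_FULLFSYNC"
        except OSError:
            # Assumed fallback when the request is not supported for this file.
            pass
    os.fsync(fd)  # elsewhere: the ordinary host-to-drive flush
    return "fsync"
```

Usage would be f.flush() followed by full_sync(f.fileno()) at the points where the application really needs its data on permanent storage.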

It's not obvious what level of syncing is appropriate to automatically happen
from Python so I think it's better to let the application deal with it.

--Gisle

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev