On Fri, Dec 31, 2010 at 09:51:50AM -0200, Henrique de Moraes Holschuh wrote:
> On Fri, 31 Dec 2010, Olaf van der Spek wrote:
> > Ah, hehe. BTW, care to respond to the mail I send to you?
>
> There is nothing more I can add to this thread.  You want O_ATOMIC.  It
> cannot be implemented for all use cases of the POSIX API, so it will not
> be implemented by the kernel.  That's all there is to it, AFAIK.
>
> You could ask for a new (non-POSIX?) API that does not ask of a
> POSIX-like filesystem something it cannot provide (i.e. don't ask for
> something that requires inode->path reverse mappings).  You could ask
> for syscalls to copy inodes, etc.  You could ask for whatever is needed
> to do a (open+write+close) that is atomic if the target already exists.
> Maybe one of those has a better chance than O_ATOMIC.

The O_ATOMIC open flag is highly problematic, and it's not fully
specified.  What if the system is under a huge amount of memory
pressure, and the badly behaved application program does:

	fd = open("file", O_ATOMIC | O_TRUNC);
	write(fd, buf, 2*1024*1024*1024); // write 2 gigs, heh, heh heh
	<sleep for one day>
	write(fd, buf2, 1024);
	close(fd);

What happens if another program opens "file" for reading during the
one day sleep period?  Does it get the old contents of "file"?  The
partially written, incomplete new version of "file"?  What happens if
the file is currently mmap'ed, as Henrique has asked?

What if another program opens the file O_ATOMIC during the one day
sleep period, so the file is in the middle of getting updated by two
different processes using O_ATOMIC?

How exactly do the semantics for O_ATOMIC work?

And given that at the moment ***zero*** file systems implement O_ATOMIC,
what should an application do as a fallback?  And given that it is
highly unlikely this could ever be implemented for various file
systems, including NFS, I'll observe this won't really reduce
application complexity, since you'll always need to have a fallback
for file systems and kernels that don't support O_ATOMIC.

And what are the use cases where this really makes sense?  Will people
really code to this interface, knowing that it only works on Linux
(there are other operating systems out there, like FreeBSD and Solaris
and AIX, you know, and some application programmers _do_ care about
portability), and the only benefits are (a) a marginal performance
boost for insane people who like to write vast numbers of 2-4 byte
files without any need for atomic updates across a large number of
these small files, and (b) the ability to keep the file owner
unchanged when someone other than the owner updates said file (how
important is this _really_; what is the use case where this really
matters?).

And of course, Olaf isn't actually offering to implement this
hypothetical O_ATOMIC.  Oh, no!
He's just petulantly demanding it, even though he can't give us any
concrete use cases where this would actually be a huge win over a
userspace "safe-write" library that properly uses fsync() and
rename().

If someone were to pay me a huge amount of money, and told me what was
the file size range where such a thing would be used, and what sort of
application would need it, and what kind of update frequency it should
be optimized for, and other semantic details about parallel O_ATOMIC
updates, what happens to users who are in the middle of reading the
file, what are the implications for quota, etc., it's certainly
something I can entertain.  But at the moment, it's a vague
specification (not even a solution) looking for a problem.

						- Ted

Archive: http://lists.debian.org/20110102070922.ga6...@thunk.org
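
For reference, the userspace "safe-write" approach mentioned above
(write a temporary file, fsync() it, then rename() it over the target)
can be sketched roughly as follows.  This is only a minimal
illustration under stated assumptions, not code from any particular
library: the function name safe_write() and the ".XXXXXX" temporary
file suffix are hypothetical, and it assumes the temporary file can be
created in the same directory as the target.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not an existing library call): replace `path`
 * with `len` bytes from `buf`, so that a reader opening `path` sees
 * either the complete old contents or the complete new contents. */
int safe_write(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof(tmp), "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;

    /* Temp file next to the target, so rename() stays on one file system. */
    int fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry */
            goto fail;
        }
        p += w;
        left -= (size_t)w;
    }

    /* Push the new data to stable storage before it becomes visible. */
    if (fsync(fd) != 0)
        goto fail;
    if (close(fd) != 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;

    /* Atomically replace the old name with the new file. */
    if (rename(tmp, path) != 0)
        goto fail;
    return 0;

fail:
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    return -1;
}
```

Note that the replacement file is a new inode created by the calling
user with mkstemp()'s default permissions, so the original owner and
mode are not preserved; that is the trade-off described in point (b)
above.  The sketch also leaves out an fsync() of the containing
directory, which a stricter version might add after the rename().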