Bug#577642: mv deletes files created while moving large directories

chrysn Wed, 14 Apr 2010 01:57:22 -0700

On Wed, Apr 14, 2010 at 08:32:39AM +0200, Jim Meyering wrote:
> In some sense, the behavior you've noticed is inevitable.
> Imagine that after copying, mv were to go back and check again:
> then it spots the new file (your "latestfile") and copies it.
> Do we continue iterating and looking for new files in each
> and every directory being copied?  At some point we have to
> stop and then begin the removal process (which requires removal
> of each entire tree/argument).  Between when we stop looking for
> new files and when the removal gets to any given directory, there
> will always be an interval during which someone can create a file/dir
> there that will silently be removed.
> 
> Also consider this: what if a file we've already copied is removed before
> the copy completes?  Should mv perform another iteration to detect that,
> and then remove it also in the destination tree?


i am aware that it is impossible to atomically move all files and remove
the directory under posix semantics; that's why i would rather suggest
leaving left-over files where they are and not removing the directory.

for the sake of completeness, there is also the problem of open file
handles: assume a process has just written a file that is now being
moved, and still holds a file handle on it. when the move completes, the
file is unlinked, leaving the program with a write handle on a deleted
file. to my knowledge it can continue writing to that handle, but on
close() everything is lost. the typical case originally described does
not involve open handles, though, and people who operate on files that
are currently being written to usually know that there can be issues.
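
a minimal sketch of that open-handle situation, assuming a posix file
system (on other platforms unlinking an open file may simply fail). the
function name and the file name are made up for illustration; the
unlink stands in for what the removal phase of mv would do:

```python
import os
import tempfile


def write_after_unlink():
    """Write to a file that is unlinked while still open.

    Returns (size_while_open, exists_after_close).
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "latestfile")  # hypothetical file name

    f = open(path, "w")
    f.write("first half\n")
    f.flush()

    os.unlink(path)            # stands in for the removal done by mv

    f.write("second half\n")   # no error: the open handle keeps the inode alive
    f.flush()
    size_while_open = os.fstat(f.fileno()).st_size

    f.close()                  # last reference dropped, the data is unreachable
    exists_after_close = os.path.exists(path)

    os.rmdir(tmpdir)
    return size_while_open, exists_after_close
```

the writer never sees an error; the only hint is that the path no longer
exists once it looks for it.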


> If we were to try to make mv remove source files only if we've copied
> them, not only would that introduce a significant amount of overhead,
> but [...]

i've now had a look at the implementation -- current coreutils really
does the equivalent of 'rm -r' if there were no errors when copying.
removing only the files that were moved would mean tracking all of them,
whereas the current memory requirement is, in theory, proportional to
the maximum path depth.
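
to make the trade-off concrete, here is a rough sketch of the two
strategies. it is not the coreutils code; the function names and the
`between` hook are made up, the hook only stands in for another process
writing into the source between the copy phase and the removal phase.
symlinks and special files are ignored.

```python
import os
import shutil


def move_copy_then_rm(src, dst, between=lambda: None):
    """Copy the whole tree, then remove the whole source (like 'rm -r')."""
    shutil.copytree(src, dst)
    between()              # hypothetical hook: a concurrent writer acts here
    shutil.rmtree(src)     # also removes files created after the copy


def move_tracked(src, dst, between=lambda: None):
    """Copy the whole tree, then remove only the files that were copied."""
    copied = []            # grows with the number of files, not the path depth
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target, exist_ok=True)
        for name in files:
            source_file = os.path.join(root, name)
            shutil.copy2(source_file, os.path.join(target, name))
            copied.append(source_file)

    between()              # hypothetical hook: a concurrent writer acts here

    for path in copied:
        os.unlink(path)
    # Remove directories bottom-up, but only those that ended up empty.
    for root, dirs, files in os.walk(src, topdown=False):
        try:
            os.rmdir(root)
        except OSError:
            pass           # not empty: a left-over file lives here
```

with the second variant a file created late stays in the source, at the
price of the `copied` list.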

>                                                       [...] overhead,
> but it would change mv's semantics.
> 
> If you want to pursue this, I suggest that you bring it up with the
> Austin Group (they define the POSIX standard).
> http://www.opengroup.org/austin/

from what i looked up in the posix specs, there are no statements about
what to do in case of EXDEV (rename didn't work) [1]. do you think the
austin group would bother to specify previously unspecified behavior?

[1] http://www.opengroup.org/onlinepubs/9699919799/utilities/mv.html
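
for readers not familiar with the EXDEV case, the pattern under
discussion looks roughly like this. it is a sketch, not a quote from the
spec or from coreutils; `move_dir` and its `rename` parameter are made
up so that the cross-device error can be simulated, and only
directories are handled.

```python
import errno
import os
import shutil


def move_dir(src, dst, rename=os.rename):
    """Try rename(); on EXDEV fall back to copying and removing the source."""
    try:
        rename(src, dst)
        return "renamed"
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise                      # any other failure is a real error
    # Cross-device case: copy everything, then remove the whole source.
    shutil.copytree(src, dst, symlinks=True)
    shutil.rmtree(src)
    return "copied"
```

within one file system the rename is a single step and the problem from
this report cannot occur; it only shows up in the fallback branch.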


a solution that changes the semantics even more, but has no memory
overhead issues, would be to delete each file immediately after moving
it. the change is deeper because its effect is not limited to the case
described above: it also affects cases in which some files can't be
read. with this solution, those would be the only files left in the
source (whereas currently, in that case, there would be a copy of the
readable files in the destination, but all unreadable files would be
left untouched).
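
a sketch of that delete-as-you-go variant, again with made-up names.
the `copy` parameter exists only so that an unreadable file can be
simulated; symlinks and special files are ignored.

```python
import os
import shutil


def move_delete_as_you_go(src, dst, copy=shutil.copy2):
    """Copy each file and unlink it right away; return the files left behind."""
    left_behind = []
    # Bottom-up walk, so a directory is visited after its subdirectories.
    for root, dirs, files in os.walk(src, topdown=False):
        rel = os.path.relpath(root, src)
        target = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target, exist_ok=True)
        for name in files:
            source_file = os.path.join(root, name)
            try:
                copy(source_file, os.path.join(target, name))
            except OSError:
                left_behind.append(source_file)  # e.g. unreadable: stays put
                continue
            os.unlink(source_file)               # nothing to remember afterwards
        try:
            os.rmdir(root)                       # succeeds only if now empty
        except OSError:
            pass                                 # something was left in here
    return left_behind
```

files created in a directory after it has been visited are left alone,
and so are the files that could not be copied.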


in case we stick to the current semantics (or implement others but leave
the old ones as default), i suggest inserting the following section into
the man page:

------------------------------------------------------------------------

CAVEATS
       When directories are moved across file systems, the source is
       removed completely, with the equivalent of `rm -r`, once all
       files have been copied successfully to the destination. This
       includes files written to the source while mv was running.
    
------------------------------------------------------------------------
