> Pádraig Brady wrote:
> >   mv no longer supports moving a file to a hardlink, instead issuing an error.
> >   The implementation was susceptible to races in the presence of multiple mv
> >   instances, which could result in both hardlinks being deleted.  Also on case
> >   insensitive file systems like HFS, mv would just remove a hardlinked 'file'
> >   if called like `mv file File`.  The feature was added in coreutils-5.0.1.

Unfortunately that description doesn't really say what was happening,
and because of that it is creating a misunderstanding of the problem.

> Shocking, low-quality behaviour from a gnu program to help apple.
> 
> In any case. thanks a lot for providing this information - I looked a bit
> through the changes, but couldn't find anything specific. It also means the
> nice behaviour is not coming back, I need to find another mv, and this bug
> ain't a bug.

I investigated this a little and found the following behavior.

  mkdir /tmp/junk
  cd /tmp/junk
  touch foo
  ln foo bar
  strace -e trace=rename perl -le 'rename("foo","bar") or die;'
    rename("foo", "bar")                    = 0

The kernel system call rename(2) returns 0 for SUCCESS.  And yet
nothing was actually renamed.

  ls -l
    total 0
    -rw-rw-r-- 2 rwp rwp 0 Mar  7 04:35 bar
    -rw-rw-r-- 2 rwp rwp 0 Mar  7 04:35 foo

The file foo still exists, even though the kernel returned SUCCESS for
the rename(2) call.  And in fact this can be run back to back many
times with the same result.

  strace -e trace=rename perl -le 'rename("foo","bar") or die;'
    rename("foo", "bar")                    = 0
  strace -e trace=rename perl -le 'rename("foo","bar") or die;'
    rename("foo", "bar")                    = 0
  strace -e trace=rename perl -le 'rename("foo","bar") or die;'
    rename("foo", "bar")                    = 0
  strace -e trace=rename perl -le 'rename("foo","bar") or die;'
    rename("foo", "bar")                    = 0

I find the above result surprising!  Why does the Linux kernel
rename(2) have this behavior?  I do not know.  However it is
documented.

  man 2 rename

       If oldpath and newpath are existing hard links referring to the
       same file, then rename() does nothing, and returns a success
       status.

And POSIX has standardized this behavior between systems.  Therefore
all POSIX systems should have this identical behavior.

  http://pubs.opengroup.org/onlinepubs/009695399/functions/rename.html

  If the old argument and the new argument resolve to the same
  existing file, rename() shall return successfully and perform no
  other action.
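
The same check can be made without strace.  The following is only a
minimal sketch in Python, assuming a POSIX system and a temporary
directory on a file system that supports hard links.  The function
name is made up for illustration.  Python's os.rename() calls the
rename system call, so on such a system it should raise no exception
and leave both names in place with a link count of 2.

```python
import os
import tempfile


def rename_between_hard_links():
    """Rename one hard link onto another and report what is left.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        foo = os.path.join(tmp, "foo")
        bar = os.path.join(tmp, "bar")
        open(foo, "w").close()   # same as: touch foo
        os.link(foo, bar)        # same as: ln foo bar
        os.rename(foo, bar)      # raises OSError only on failure
        # Both names are listed and the link count is read after
        # the rename call has returned.
        return sorted(os.listdir(tmp)), os.stat(bar).st_nlink


names, nlink = rename_between_hard_links()
print(names, nlink)
```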

Just to prove that it will return an error under other conditions,
remove the source file and try again.

  rm foo
  strace -e trace=rename perl -le 'rename("foo","bar") or die;'
    rename("foo", "bar")                    = -1 ENOENT (No such file or directory)
    Died at -e line 1.

The underlying problem is that the kernel returns success, without
doing anything, when asked to rename one hard link onto another hard
link to the same file.  This (surprising) behavior is now detected by
mv and reported as an error.

  strace -e file mv foo bar
  ...
    stat("bar", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
    lstat("foo", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
    lstat("bar", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
  mv: 'foo' and 'bar' are the same file
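
The trace shows only stat and lstat calls before the error.  A usual
way to decide from stat results that two names are the same file is to
compare the device and inode numbers.  The following is a sketch of
that comparison in Python.  It is an assumption about the kind of
check involved, not a copy of the mv source, and the helper name is
hypothetical.

```python
import os
import tempfile


def is_same_file(a, b):
    """True when a and b have the same device and inode numbers.

    Hypothetical helper.  lstat is used so that a symlink is compared
    as itself and not as the file it points to.
    """
    sa, sb = os.lstat(a), os.lstat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    foo = os.path.join(tmp, "foo")
    bar = os.path.join(tmp, "bar")
    other = os.path.join(tmp, "other")
    open(foo, "w").close()
    open(other, "w").close()
    os.link(foo, bar)                  # bar is a second name for foo
    print(is_same_file(foo, bar))      # hard links to one file
    print(is_same_file(foo, other))    # two unrelated files
```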

So why did this appear to work with previous versions of the coreutils
mv command?  Because in this case 'mv' didn't previously call
rename(2) on the files at all.

  strace -e file mv foo bar
    stat("bar", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
    lstat("foo", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
    lstat("bar", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
    stat("bar", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
    access("bar", W_OK)                     = 0
    unlink("foo")                           = 0

Previously mv would stat(2) all of the files involved, detect this
case, and then unlink the source file, giving the appearance that it
was performing the requested action.  And there is the problem.  Users
expect 'mv' to be an atomic kernel rename(2) operation.  (At least I
do.)  And yet in the above case it was actually a series of six system
calls.  It was prone to race conditions due to this.  When the
description says that this is no longer supported it means that this
code subject to race conditions has been removed.  Knowing what I know
now I would argue that it shouldn't have been there in the first
place.  The utilities should not hide race condition behavior.  (If
the script author still wants the racy behavior then the script author
can perform the same actions explicitly and get identical behavior.)
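
For a script author who does want the old result, the explicit steps
might look like the following sketch in Python.  The helper name is
hypothetical and this is not the removed mv code.  It has the same
weakness described above: the check and the unlink are separate
system calls, so another process can change either name in between.

```python
import os
import tempfile


def unlink_source_if_hard_link(src, dst):
    """Remove src when src and dst are hard links to the same file.

    Hypothetical helper, racy by construction.
    """
    s, d = os.lstat(src), os.lstat(dst)
    # Same device and inode, and more than one link.  This does not
    # prove that src and dst are different directory entries.
    if os.path.samestat(s, d) and s.st_nlink > 1:
        os.unlink(src)   # the lstat results may already be stale here
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    foo = os.path.join(tmp, "foo")
    bar = os.path.join(tmp, "bar")
    open(foo, "w").close()
    os.link(foo, bar)
    removed = unlink_source_if_hard_link(foo, bar)
    print(removed, sorted(os.listdir(tmp)))
```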

The root cause of the problem seems to be the underlying kernel system
call behavior.  But that behavior can't be changed because it is part
of the standard kernel system interface specification.  That means it
was historical behavior on multiple kernels originally, and it was
standardized specifically because changing it would make portable
programs impossible.

I hope this additional information helps understand what is happening.

Bob
