Can anybody suggest why a script which causes writes to an NFS-mounted directory like so

   ssh remotenode 'command >/usr/common/tmp/outfile.txt'

could somehow fail that write silently, but this variant

   ssh remotenode 'command >/tmp/outfile; mv /tmp/outfile /usr/common/tmp/outfile.txt'

would always succeed?

(Actually it is slightly more complicated than this: the whole command string shown above is constructed and then run by another program via a system() call. This first turned up in a threaded version of that program, but it also happens with a plain system() call. I cannot reproduce the problem by running the ssh commands from the command line; it only happens inside the script. The files so far have been relatively small, under 50 kB. "command" is a run of the NCBI blastn program, although that is probably irrelevant.)
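For comparison, the /tmp-then-mv variant that works can be written with every step's exit status checked, so that a failure is reported rather than lost. This is a minimal bash sketch under stated assumptions, not the script actually in use: the function name run_to_dest and the mktemp template are hypothetical, and it assumes bash is available on the compute node.

```bash
#!/bin/bash
# Hypothetical helper: run a command with stdout going to a local temporary
# file, then move the result to the (possibly NFS) destination. Each step's
# exit status is checked, so a failure is reported instead of being lost.
run_to_dest() {
    local dest=$1
    shift
    local tmp
    # mktemp creates the file with mode 0600, and mv keeps that mode;
    # add a chmod before the mv if other users need to read the result.
    tmp=$(mktemp /tmp/outfile.XXXXXX) || return 1
    if ! "$@" >"$tmp"; then
        echo "command failed: $*" >&2
        rm -f "$tmp"
        return 1
    fi
    # Moving across filesystems copies the data and then removes the
    # source, so an error writing the destination should make mv fail.
    if ! mv "$tmp" "$dest"; then
        echo "mv to $dest failed" >&2
        rm -f "$tmp"
        return 1
    fi
}
```

With the function defined in a script on the compute node, it would be called as `run_to_dest /usr/common/tmp/outfile.txt command`, where `command` stands for the blastn invocation.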

I have even seen this happen:

   ssh remotenode 'command >/usr/common/tmp/outfile.txt; ls -al /usr/common/tmp/outfile.txt'
   ls -al /usr/common/tmp/outfile.txt

where the first ls (running on the remote node) shows the output file while the second (running on the NFS server) does not.
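To tell whether the file in the ls example above shows up late or never appears at all, one could poll for it on the NFS server after the ssh returns. This is a minimal bash sketch; wait_for_file and its default 30-second timeout are hypothetical choices, not part of the original script.

```bash
#!/bin/bash
# Hypothetical diagnostic: poll once per second for a file, for up to
# $2 seconds (default 30), and report how long it took to appear.
wait_for_file() {
    local path=$1 timeout=${2:-30} waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$path not present after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$path present after ${waited}s"
}
```

Run on the server right after the ssh command, e.g. `wait_for_file /usr/common/tmp/outfile.txt`, it distinguishes a delay from a file that is never written.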

This is on a CentOS 7 system. The server was last updated 8 days ago but the compute nodes have not been updated in almost a year.

Server kernel is 3.10.0-1160.6.1.el7.x86_64
Client kernel is 3.10.0-1062.12.1.el7.x86_64

There are no error messages in stderr, /var/log/messages, or dmesg.

The client's fstab has:

  server:/usr/common   /usr/common     nfs     bg,hard,intr,rw 1       1

and the server's /etc/exports has:

  /usr/common      *.cluster(rw,sync,no_root_squash)


Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
