Can anybody suggest why a script that writes to an NFS-mounted
directory like so
ssh remotenode 'command >/usr/common/tmp/outfile.txt'
could somehow fail that write silently, but this variant
ssh remotenode 'command >/tmp/outfile; mv /tmp/outfile /usr/common/tmp/outfile.txt'
would always succeed?
(Actually it is slightly more complicated than this, because
the whole command string shown above is constructed and then run from
another program via a system() call. The problem first turned up in
a threaded version, but it also occurs with a plain, unthreaded system()
call. I cannot reproduce it by running the ssh commands from the
command line; it only happens inside the script. The files so far have
been relatively small, less than 50 kB. "command" is a run of the NCBI
blastn program, although that is probably irrelevant.)
I have even seen this happen:
ssh remotenode 'command >/usr/common/tmp/outfile.txt; ls -al /usr/common/tmp/outfile.txt'
ls -al /usr/common/tmp/outfile.txt
where the first ls (running on the remote node) shows the output file
while the second (running on the NFS server) does not.
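One way to make such a failure visible from inside the script is to check
both the exit status and the output file after every run. The following is
only a minimal sketch, not what the script currently does: the helper name
run_and_verify is hypothetical, and it assumes the command string is run
through sh -c (much as system() would run it) and that a valid result is
never an empty file.

```sh
#!/bin/sh
# run_and_verify is a hypothetical helper, not part of the original script.
# It runs a command string via "sh -c", then checks the exit status and
# whether the expected output file exists and is non-empty on the calling
# host.
# Returns 0 on success, 1 if the command failed, 2 if the file is missing
# or empty.
run_and_verify() {
    cmd=$1        # full command string, e.g. the ssh invocation
    outfile=$2    # path where the output is expected to appear

    sh -c "$cmd"
    cmd_status=$?
    if [ "$cmd_status" -ne 0 ]; then
        echo "FAIL: exit status $cmd_status from: $cmd" >&2
        return 1
    fi

    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$outfile" ]; then
        echo "FAIL: missing or empty output file: $outfile" >&2
        return 2
    fi

    echo "OK: $outfile"
    return 0
}
```

Called on the server as
run_and_verify "ssh remotenode 'command >/usr/common/tmp/outfile.txt'" /usr/common/tmp/outfile.txt
this would at least log which runs lose their output. ssh passes back the
exit status of the remote command (or 255 if ssh itself fails), so a
return of 2 would mean the remote side reported success while the file is
not visible on the server.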
This is on a CentOS 7 system. The server was last updated 8 days ago
but the compute nodes have not been updated in almost a year.
Server kernel is 3.10.0-1160.6.1.el7.x86_64
Client kernel is 3.10.0-1062.12.1.el7.x86_64
There are no error messages in stderr, /var/log/messages, or dmesg.
The client's fstab has:
server:/usr/common /usr/common nfs bg,hard,intr,rw 1 1
and the server's /etc/exports has:
/usr/common *.cluster(rw,sync,no_root_squash)
Thanks,
David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech