Dear all,

further to my last email, the problem is sorted. In the end it turned out that the SCSI HBA had a problem. Trying to update the firmware resulted in a completely inoperable card. :-( Fortunately, as I had a different card, the problem is sorted now.
Thanks to everybody for their suggestions.

All the best from London

Jörg

On Sunday 21 September 2014 Jörg Saßmannshausen wrote:
> Dear all,
>
> I have a rather strange problem with one of my file servers, which I
> recently upgraded in order to accommodate more disc space.
>
> The problem: I copied the files from the old file space to a temporary
> disc storage space using this rsync command:
>
> rsync -vrltH -pgo --stats -D --numeric-ids -x oldserver:foo tempspace:baa
>
> I have been doing this for some years now and never had any problems.
>
> As always, I ran md5sum afterwards to be sure there is not a problem
> later and the user is not losing data. This time, for a rather large file
> (around 16 GB), the md5sum check failed after I moved the files from the
> temp space back to the new destination using the same command as above.
>
> Still having access to the old file space, I decided to copy this file
> again from the old file space. Strangely enough, rsync did not sync the
> file again, so I had to delete it first. Even after deleting the file and
> re-syncing it from the old source, the md5sum is wrong.
>
> Copying the file to a different file space did not cause this problem,
> i.e. the md5sum is correct.
> As it is a tar.gz file, I simply decided to decompress the original file
> on the different file server. That worked. The file with the wrong md5sum
> did not decompress on the different file server; gunzip crashed with an
> error message. So the file is broken.
>
> The setup:
>
> Originally I was using an old Infortrend box which had old PATA discs in
> it. This box is connected via SCSI to a frontend server which exports the
> file space via iSCSI. The backend for that, i.e. the one the user is
> accessing, is on a different physical machine and is a XEN guest. The
> reason for this setup is that the frontend also acts as a backup server
> and I don't want people to have access to it.
>
> I then exchanged the Infortrend box for a more recent model which has
> SATA capabilities but still has a SCSI connection to the frontend. The
> frontend is the same. I got a new controller for that box as the old one
> was broken. There are no changes in the backend; it is still the same XEN
> guest on the same hardware.
>
> What I cannot work out is why the old Infortrend box does not have any
> problems with the new file, whereas the newer one does. Also, when I
> copied over some files (again using the rsync command above), a few
> files did not copy correctly (again according to md5sum) in the first
> instance but did so later.
>
> I find that highly alarming, as it means that at least for larger and/or
> some binary files there seems to be a problem. However, I am not sure
> where to look, as I am out of ideas.
>
> Could it be that there is a problem with the 'new' controller?
> In all cases I was using ext4 as the file system and I did not have any
> problems with that.
>
> Anybody got any thoughts here?
>
> All the best from a sunny London
>
> Jörg
>
> P.S. To make things worse, I am off on a work-related trip from Monday
> onwards and I have been working on this problem since Friday evening.

-- 
*************************************************************
Dr. Jörg Saßmannshausen, MRSC
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ

email: j.sassmannshau...@ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
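
For readers who want to repeat the kind of post-copy check described in the quoted message, here is a minimal sketch in Python rather than the md5sum command line used in the thread. It is not the script used on these servers: the function names md5_of_file and find_mismatches, and the 1 MiB chunk size, are assumptions made for illustration. It hashes each file in chunks, so a 16 GB archive does not have to fit in memory, and reports files whose copy is missing or differs.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_root, copy_root):
    """Compare every regular file under source_root with its copy under copy_root.

    Returns a list of (relative_path, reason) tuples; an empty list means
    every file was found and had an identical checksum.
    """
    source_root = Path(source_root)
    copy_root = Path(copy_root)
    mismatches = []
    for source_file in sorted(source_root.rglob("*")):
        if not source_file.is_file():
            continue  # skip directories and other non-regular entries
        relative = source_file.relative_to(source_root)
        copy_file = copy_root / relative
        if not copy_file.is_file():
            mismatches.append((str(relative), "missing"))
        elif md5_of_file(source_file) != md5_of_file(copy_file):
            mismatches.append((str(relative), "checksum differs"))
    return mismatches
```

Calling find_mismatches("/old/space", "/new/space") (both paths hypothetical) after a copy would list the damaged files. As a side note on why rsync did not re-sync the broken file: by default rsync skips files whose size and modification time match, and only compares content when run with -c/--checksum.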
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf