On Fri, 14 Sep 2007, Bruce Allen wrote:

 I will try to get fsprobe deployed on as much of the Nordic LHC storage as
 possible.

I'll get fsprobe up and running on the new systems I am putting together in Hannover, and will also try to encourage the right people to get it running on some of the LIGO Scientific Collaboration's other storage systems.

I might be dense after the holiday, but I still don't get the reasons for such interest in running fsprobe. I can see it being used as a burn-in test and to prove that a running system can write and then read data correctly, but what does it say about data that is already written, or about data that is in flight while fsprobe runs? (Someone else asked this question earlier in the thread and didn't get an answer either.) And how is fsprobe better as a burn-in test than, say, badblocks?
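For readers who haven't looked at it, the kind of write-then-read probe being discussed can be sketched in a few lines. This is a hypothetical minimal version, not the actual fsprobe code; `probe_once` and its parameters are my own names. Note that the read-back may be served from the page cache rather than the platter, which is exactly the sort of limitation the questions above are about.

```python
# Minimal sketch of a write-then-read integrity probe (hypothetical,
# NOT the actual fsprobe implementation): write a deterministic
# pattern, fsync it, read it back, and compare checksums.
import hashlib
import os
import tempfile

def probe_once(directory, size=1 << 20, seed=b"probe"):
    """Write `size` bytes of deterministic data, then verify them."""
    data = hashlib.sha256(seed).digest() * (size // 32)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data past the page cache to the device
        with open(path, "rb") as f:
            readback = f.read()
        # Mismatch here means the storage stack corrupted the data in flight.
        return readback == data
    finally:
        os.remove(path)

if __name__ == "__main__":
    print("OK" if probe_once(".") else "CORRUPTION DETECTED")
```

Even with the fsync, such a probe only proves that *new* writes survive the round trip; it says nothing about data already at rest, which is the point of the question.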

I am genuinely interested in these answers because I wrote a somewhat similar tool 5-6 years ago to test new storage, simply because I didn't trust the vendors' burn-in tests enough. My interest was a bit broader: apart from data correctness, I also checked the behaviour of FS quota accounting (by creating randomly sized files with random ownership) and of the disk+FS in the face of fragmentation (by measuring "instantaneous" speed). But I never saw the potential usage by other people, mainly because I could not find answers to the above questions, so I never thought about making it public... and now it's too late ;-)
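The fragmentation side of that idea is easy to illustrate: time each file write individually, and watch whether the per-file speed drifts downward as the filesystem fills and fragments. The sketch below is my own hypothetical reconstruction, not the original tool; the quota-accounting part (changing ownership of the created files) would need `os.chown()` and root privileges, so it is omitted here.

```python
# Hypothetical sketch in the spirit of the tool described above:
# write a randomly sized file and report the "instantaneous" write
# speed, whose decline over many runs can hint at fragmentation.
import os
import random
import time

def write_random_file(path, max_size=4 << 20, chunk=64 << 10):
    """Write a random multiple of `chunk` bytes; return (size, bytes/sec)."""
    size = random.randrange(chunk, max_size, chunk)
    payload = os.urandom(chunk)
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(size // chunk):
            f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # include the flush-to-disk in the timing
    elapsed = time.monotonic() - start
    return size, size / elapsed

if __name__ == "__main__":
    size, speed = write_random_file("testfile.bin")
    print(f"wrote {size} bytes at {speed / 1e6:.1f} MB/s")
    os.remove("testfile.bin")
```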

There is another issue that I could never find a good answer to: how much testing should a storage device withstand before the testing itself becomes dangerous or disturbing? Access by the test tool consumes resources: connections are shared, caches are polluted, heads have to be moved. For example, for the 1.something GB/s figure that was mentioned earlier in this thread, would you accept a halving of the speed while the data integrity test is being run? Or more generally, how much of the overall performance of the storage system would you be willing to give up for the benefit of knowing that data can still be written and then read correctly?

And sadly, some data is missing from the results that Google and others published recently: how much were the disks seeking (moving heads) during operation? I imagine such data is hard to get (it should probably come from the disk rather than the kernel, since the firmware could still reorder requests), but IMHO it is valuable for those designing multi-user storage systems, where disks frequently move heads to access files that belong to different users (and are therefore spread across the disk) and are used "simultaneously".
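One common way to keep the performance cost bounded, rather than open-ended, is to throttle the probe to a fixed bandwidth budget so it never takes more than an agreed fraction of the system's throughput. A minimal sketch of such a throttle (my own illustration, not part of fsprobe; `Throttle` and `account` are hypothetical names):

```python
# Hypothetical sketch: cap a probe's I/O at a fixed byte rate so the
# integrity test consumes a known, bounded slice of the bandwidth.
import time

class Throttle:
    """Sleep as needed so that accounted I/O stays under `rate` bytes/s."""
    def __init__(self, rate):
        self.rate = rate
        self.start = time.monotonic()
        self.done = 0

    def account(self, nbytes):
        """Call after each read/write of `nbytes`; sleeps if we are ahead."""
        self.done += nbytes
        ahead = self.done / self.rate - (time.monotonic() - self.start)
        if ahead > 0:
            time.sleep(ahead)
```

A probe loop would call `throttle.account(len(chunk))` after each I/O; with `rate` set to, say, 5% of the measured peak, the "how much would you give up" question becomes an explicit, tunable policy rather than a side effect.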

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [EMAIL PROTECTED]
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
