Dear All,

I am clustering eight 32-bit machines and five 64-bit machines. The 64-bit machines are heavily used for MPI jobs, and the rest simply serve as terminals for PhD students. One of the 64-bit machines holds all the users' home directories and the software (not the kernel) for both the 32-bit and the 64-bit machines, and it exports these directories to the client machines via NFS. About 10 LDAP users are maintained by the server. This server has two external hard disks connected by FireWire, and these two disks store the software and the users' home directories under RAID 1. We deliberately did NOT use the internal hard disk for the home directories, so that the internal disk on each machine (about 500-750 GB each) can be used for calculations. RAID 1 slows the system down a bit, but we did not expect any reduction in the speed of calculations that use the local hard disk.
When I ran a very small calculation on my stand-alone laptop, the execution time was shorter than on my local cluster machine (a serial job, no parallelisation).

The laptop has a 32-bit CPU with 1 GB of memory:

    vendor_id   : GenuineIntel
    model name  : Intel(R) Pentium(R) 4 Mobile CPU 2.00GHz
    stepping    : 4
    cpu MHz     : 1994.395
    cache size  : 512 KB
    cpuid level : 2
    bogomips    : 3992.82

The test machine in the cluster is dual core with 4 GB of memory, and each core reports:

    vendor_id       : AuthenticAMD
    model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
    stepping        : 1
    cpu MHz         : 2200.000
    cache size      : 512 KB
    cpuid level     : 1
    bogomips        : 4404.88
    TLB size        : 1024 4K pages
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 40 bits physical, 48 bits virtual

The same programme, the same compiler and the same compiler options were used on both machines.

On the laptop:
    174.650u 4.752s 3:01.70 98.7%
On the local cluster under /home (i.e., under RAID 1):
    209.457u 1.116s 4:07.05 85.2%
On the local cluster under /local:
    204.268u 1.696s 3:26.04 99.9%

What I cannot understand is that under /local we are supposed to get better performance than on the laptop (at least that was the case when each machine was stand-alone, with each user's home directory on the machine itself), but this is not what happens in reality. When we have to run a large job, the difference is not of the order of seconds but of the order of days.

I am wondering whether anyone on this mailing list has had a similar experience in the past and, if so, I would like to know how you solved this type of problem.

At the moment, as a test, I have just transferred the home directories and software directories onto the internal hard disk (discarding the RAID 1 system). When we calculate and write the results under the home directories or under the spare local area of the server, the system is again too slow to bear.
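To make the three timings quoted above easier to compare, here is a minimal sketch that splits each one into CPU time (user + system) and time spent off the CPU. It assumes the lines are in the csh-style `time` format (user seconds, system seconds, elapsed time, CPU percentage); the function names `parse_time_line` and `breakdown` are only illustrative.

```python
def parse_time_line(line):
    """Parse a csh-style `time` line such as '209.457u 1.116s 4:07.05 85.2%'.

    Returns (user_seconds, system_seconds, elapsed_seconds).
    """
    user_field, sys_field, elapsed_field, _percent = line.split()
    user = float(user_field.rstrip("u"))
    system = float(sys_field.rstrip("s"))
    # Elapsed time is written as m:ss.ss (or h:mm:ss); fold it into seconds.
    elapsed = 0.0
    for part in elapsed_field.split(":"):
        elapsed = elapsed * 60 + float(part)
    return user, system, elapsed


def breakdown(line):
    """Split elapsed time into CPU time and time not spent on the CPU."""
    user, system, elapsed = parse_time_line(line)
    cpu = user + system
    return {
        "cpu_s": cpu,                      # user + system seconds
        "wait_s": elapsed - cpu,           # elapsed time not accounted for by CPU
        "cpu_pct": 100.0 * cpu / elapsed,  # should match the reported percentage
    }


# The three measurements quoted in this message.
runs = {
    "laptop": "174.650u 4.752s 3:01.70 98.7%",
    "cluster /home": "209.457u 1.116s 4:07.05 85.2%",
    "cluster /local": "204.268u 1.696s 3:26.04 99.9%",
}

for name, line in runs.items():
    b = breakdown(line)
    print(f"{name:15s} cpu={b['cpu_s']:8.3f}s "
          f"wait={b['wait_s']:7.3f}s cpu%={b['cpu_pct']:5.1f}")
```

Read this way, the run under /home spends roughly 36 seconds off the CPU, which would be consistent with waiting on disk or network I/O, while the run under /local is almost entirely CPU bound yet still uses about 30 seconds more user time than the laptop.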
Thank you very much.

Best wishes,
Fumie
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org