Profile your code... oprofile or, if you want to get fancy, set up TAU ;)

Eric Thibodeau
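
A rough first look is possible even before profiling, using only the three `time` summaries quoted below. This is a minimal sketch, assuming the four fields follow the default csh layout (user CPU seconds, system CPU seconds, elapsed wall-clock time, CPU percentage); the names `parse_csh_time`, `summarize` and `RUNS` are made up for this illustration and do not come from the thread.

```python
# Timing lines copied from the quoted message (csh `time` output).
RUNS = {
    "laptop": "174.650u 4.752s 3:01.70 98.7%",
    "cluster /home": "209.457u 1.116s 4:07.05 85.2%",
    "cluster /local": "204.268u 1.696s 3:26.04 99.9%",
}


def parse_csh_time(line):
    """Return (user_s, sys_s, wall_s) from a csh-style time summary.

    Assumes the layout '<user>u <sys>s <[h:]m:s> <pct>%'.
    """
    user, system, elapsed, _pct = line.split()
    user_s = float(user.rstrip("u"))
    sys_s = float(system.rstrip("s"))
    # Elapsed time is colon-separated; fold it into seconds left to right.
    wall_s = 0.0
    for part in elapsed.split(":"):
        wall_s = wall_s * 60 + float(part)
    return user_s, sys_s, wall_s


def summarize(line):
    """Split wall-clock time into CPU time and time spent off the CPU."""
    user_s, sys_s, wall_s = parse_csh_time(line)
    cpu_s = user_s + sys_s
    return {
        "user_s": user_s,
        "cpu_s": cpu_s,
        "wall_s": wall_s,
        "wait_s": wall_s - cpu_s,          # not on the CPU: I/O, scheduling
        "cpu_pct": 100.0 * cpu_s / wall_s,
    }


if __name__ == "__main__":
    base = summarize(RUNS["laptop"])["user_s"]
    for name, line in RUNS.items():
        s = summarize(line)
        print(f"{name:15s} cpu={s['cpu_s']:8.3f}s wall={s['wall_s']:7.2f}s "
              f"wait={s['wait_s']:7.3f}s user/laptop={s['user_s'] / base:.2f}")
```

On these figures the /home run spends roughly 36 s of wall-clock time off the CPU, while the /local run spends under 0.1 s, yet the /local run still burns about 17% more user CPU time than the laptop. The first gap points at I/O; the second is the part a profiler can help explain.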
Reuti wrote:
> Hi,
>
> On 13.11.2007 at 10:50, [EMAIL PROTECTED] wrote:
>
>> I am clustering 8 32-bit machines and 5 64-bit machines.
>> The 64-bit machines are heavily used for MPI jobs,
>> and the rest of the machines simply serve as terminals for
>> PhD students.
>> One of the 64-bit machines holds all the users' home
>> directories and the software (not the kernel)
>> for both the 32-bit and the 64-bit machines,
>> and it exports these directories to the client machines via NFS.
>> About 10 LDAP users are maintained by the server.
>> This server has 2 external hard disks connected by FireWire,
>> and these two hard disks store the software and the home
>> directories for users in a RAID 1 setup.
>> We deliberately did NOT use the internal hard disk
>> for the home directories because the internal hard disk
>> (about 500-750 GB on each machine)
>> can be used for the calculation.
>> RAID 1 slows down the system a bit, but we did not expect any
>> reduction in the speed of calculations that use the local hard disk.
>>
>> When I ran a very small calculation
>> on my standalone laptop,
>> the execution time was shorter than on my local cluster machine
>> (no parallelisation, serial job).
>>
>> The laptop has a 32-bit CPU with 1 GB of memory:
>> vendor_id       : GenuineIntel
>> model name      : Intel(R) Pentium(R) 4 Mobile CPU 2.00GHz
>> stepping        : 4
>> cpu MHz         : 1994.395
>> cache size      : 512 KB
>> cpuid level     : 2
>> bogomips        : 3992.82
>>
>> The test machine in the cluster is dual core with 4 GB of memory,
>> and each core has
>> vendor_id       : AuthenticAMD
>> model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
>> stepping        : 1
>> cpu MHz         : 2200.000
>> cache size      : 512 KB
>> cpuid level     : 1
>> bogomips        : 4404.88
>> TLB size        : 1024 4K pages
>> clflush size    : 64
>> cache_alignment : 64
>> address sizes   : 40 bits physical, 48 bits virtual
>>
>> The same programme, the same compiler options and the same compiler
>> are used on these two machines.
>
> Maybe the program you compiled was optimized to run on Intel CPUs and
> hence performs worse on AMD. What compiler and what options were
> used? Even a BLAS or LAPACK library might be optimized for one of the
> platforms.
>
> Were other jobs running at the same time on the cluster node?
>
> -- Reuti
>
>> On the laptop:
>> 174.650u 4.752s 3:01.70 98.7%
>>
>> On the local cluster under /home (i.e., under RAID 1):
>> 209.457u 1.116s 4:07.05 85.2%
>>
>> On the local cluster under /local:
>> 204.268u 1.696s 3:26.04 99.9%
>>
>> What I cannot understand is that
>> under the /local area we are supposed to be getting
>> better performance than on the laptop
>> (at least that was the case when each machine
>> stood alone and held the home directory of each user),
>> but this is not the case in reality.
>>
>> When it comes to running a large job,
>> the difference is not on the order of seconds
>> but on the order of days.
>>
>> I am wondering whether anyone on this mailing list
>> has had a similar experience in the past and, if so,
>> I would like to know how you solved this type of problem.
>> At the moment, I have just transferred the home directories and
>> software directories onto the internal hard disk (discarding the
>> RAID 1 system) as a test, and when we calculate and produce the
>> results under the home directories or under the spare local area of
>> the server, the system is again too slow to bear.
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf