Keith Goodman <[EMAIL PROTECTED]> [2007-04-18 12:46]:
> Thanks for that. For a variety of reasons I'm sticking with atlas.
> Does the parallel flag give you a big speed increase? I imagine it
> speeds things up more for larger matrices.
Surprisingly little: adding -parallel to -fast raised the SciMark composite score only from 785.63 to 796.33, about 1.4%. Below are the results of running SciMark with various icc and gcc compiler flags. The best composite score with icc (901.11) is 55% higher than the best with gcc (580.55, with -O3), though there may be flags other than -O3 that would help gcc.

The LINPACK benchmark that ships with MKL (optimized for Xeon, not for Core 2 Duo) runs at about 7 gigaflops max on my Core 2 Duo overclocked to 2.93 GHz. Note that it is a different benchmark from LINPACK 1000. There is a Core 2 Duo-optimized version for OS X.

icc with no flags set:

    > icc *.c -o no_flags
    > ./no_flags -large
    **                                                              **
    ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
    ** for details. (Results can be submitted to [EMAIL PROTECTED]) **
    **                                                              **
    Using       2.00 seconds min time per kenel.
    Composite Score:          605.84
    FFT             Mflops:   111.70    (N=1048576)
    SOR             Mflops:   868.52    (1000 x 1000)
    MonteCarlo:     Mflops:   120.37
    Sparse matmult  Mflops:   853.33    (N=100000, nz=1000000)
    LU              Mflops:  1075.27    (M=1000, N=1000)

    > icc -fast *.c -o fast
    > ./fast -large
    **                                                              **
    ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
    ** for details. (Results can be submitted to [EMAIL PROTECTED]) **
    **                                                              **
    Using       2.00 seconds min time per kenel.
    Composite Score:          785.63
    FFT             Mflops:   108.31    (N=1048576)
    SOR             Mflops:   985.81    (1000 x 1000)
    MonteCarlo:     Mflops:   848.81
    Sparse matmult  Mflops:   825.81    (N=100000, nz=1000000)
    LU              Mflops:  1159.42    (M=1000, N=1000)

    > icc -fast -parallel *.c -o fast_para
    IPO: performing multi-file optimizations
    IPO: generating object file /tmp/ipo_iccvHW42m.o
    scimark2.c(63) : (col. 18) remark: LOOP WAS VECTORIZED.
    kernel.c(157) : (col. 13) remark: LOOP WAS VECTORIZED.
    kernel.c(212) : (col. 17) remark: LOOP WAS VECTORIZED.
    > ./fast_para -large
    **                                                              **
    ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
    ** for details. (Results can be submitted to [EMAIL PROTECTED]) **
    **                                                              **
    Using       2.00 seconds min time per kenel.
    Composite Score:          796.33
    FFT             Mflops:   111.70    (N=1048576)
    SOR             Mflops:  1001.91    (1000 x 1000)
    MonteCarlo:     Mflops:   855.57
    Sparse matmult  Mflops:   832.52    (N=100000, nz=1000000)
    LU              Mflops:  1179.94    (M=1000, N=1000)

    > icc -fast -parallel -fno-alias *.c -o fast_para_noali
    IPO: performing multi-file optimizations
    IPO: generating object file /tmp/ipo_iccLUySDv.o
    scimark2.c(63) : (col. 18) remark: LOOP WAS VECTORIZED.
    kernel.c(157) : (col. 13) remark: LOOP WAS VECTORIZED.
    kernel.c(212) : (col. 17) remark: LOOP WAS VECTORIZED.
    > ./fast_para_noali -large
    **                                                              **
    ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
    ** for details. (Results can be submitted to [EMAIL PROTECTED]) **
    **                                                              **
    Using       2.00 seconds min time per kenel.
    Composite Score:          890.46
    FFT             Mflops:   109.70    (N=1048576)
    SOR             Mflops:  1488.28    (1000 x 1000)
    MonteCarlo:     Mflops:   855.57
    Sparse matmult  Mflops:   829.15    (N=100000, nz=1000000)
    LU              Mflops:  1169.59    (M=1000, N=1000)

    > icc -fast -parallel -fno-alias -funroll-loops *.c -o fast_para_noali_unr
    IPO: performing multi-file optimizations
    IPO: generating object file /tmp/ipo_icc2KA1ui.o
    scimark2.c(63) : (col. 18) remark: LOOP WAS VECTORIZED.
    kernel.c(157) : (col. 13) remark: LOOP WAS VECTORIZED.
    kernel.c(212) : (col. 17) remark: LOOP WAS VECTORIZED.
    > ./fast_para_noali_unr -large
    **                                                              **
    ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
    ** for details. (Results can be submitted to [EMAIL PROTECTED]) **
    **                                                              **
    Using       2.00 seconds min time per kenel.
    Composite Score:          901.11
    FFT             Mflops:   113.48    (N=1048576)
    SOR             Mflops:  1510.28    (1000 x 1000)
    MonteCarlo:     Mflops:   865.92
    Sparse matmult  Mflops:   835.92    (N=100000, nz=1000000)
    LU              Mflops:  1179.94    (M=1000, N=1000)

    > gcc -lm *.c -o ggc_none
    > ./ggc_none -large
    **                                                              **
    ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
    ** for details. (Results can be submitted to [EMAIL PROTECTED]) **
    **                                                              **
    Using       2.00 seconds min time per kenel.
    Composite Score:          323.63
    FFT             Mflops:    83.56    (N=1048576)
    SOR             Mflops:   729.97    (1000 x 1000)
    MonteCarlo:     Mflops:    73.75
    Sparse matmult  Mflops:   329.26    (N=100000, nz=1000000)
    LU              Mflops:   401.61    (M=1000, N=1000)

    > gcc -lm -O3 *.c -o ggc_O3
    > ./ggc_O3 -large
    **                                                              **
    ** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
    ** for details. (Results can be submitted to [EMAIL PROTECTED]) **
    **                                                              **
    Using       2.00 seconds min time per kenel.
    Composite Score:          580.55
    FFT             Mflops:   108.86    (N=1048576)
    SOR             Mflops:   842.27    (1000 x 1000)
    MonteCarlo:     Mflops:   115.70
    Sparse matmult  Mflops:   825.81    (N=100000, nz=1000000)
    LU              Mflops:  1010.10    (M=1000, N=1000)

-rex

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
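The percentages quoted in this post (the small gain from -parallel and the roughly 55% icc-over-gcc gap) can be rechecked directly from the SciMark composite scores. The Python sketch below is only an illustration, not part of the original post: it hard-codes the composite scores from the runs above, and the names `scores` and `percent_gain` are hypothetical.

```python
# Composite scores copied by hand from the SciMark2 runs in this post.
scores = {
    "icc (no flags)": 605.84,
    "icc -fast": 785.63,
    "icc -fast -parallel": 796.33,
    "icc -fast -parallel -fno-alias": 890.46,
    "icc -fast -parallel -fno-alias -funroll-loops": 901.11,
    "gcc (no flags)": 323.63,
    "gcc -O3": 580.55,
}


def percent_gain(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Effect of adding -parallel on top of -fast.
parallel_gain = percent_gain(scores["icc -fast -parallel"], scores["icc -fast"])

# Best icc composite score versus the best gcc score (gcc -O3).
best_icc = max(v for k, v in scores.items() if k.startswith("icc"))
icc_vs_gcc = percent_gain(best_icc, scores["gcc -O3"])

if __name__ == "__main__":
    print(f"-parallel gain: {parallel_gain:.1f}%")     # -parallel gain: 1.4%
    print(f"best icc vs gcc -O3: {icc_vs_gcc:.1f}%")   # best icc vs gcc -O3: 55.2%
```

The same helper can be applied to the individual kernels. For example, -fno-alias mostly helps SOR (1001.91 to 1488.28 Mflops), while FFT barely moves across any of the icc flag sets.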