Hi, I assume that someone here could maybe help me, and I'm hoping it's not too much off topic. I have 2 arrays of 2d point coordinates and would like to calculate all pairwise distances as fast as possible. Going from Python/Numpy to a (Swigged) C extension already gave me a 55x speedup. (.9ms vs. 50ms for arrays of length 329 and 340). I'm using gcc on Linux. Now I'm wondering if I could go even faster !? My hope that the compiler might automagically do some SSE2 optimization got disappointed. Since I have a 4 core CPU I thought OpenMP might be an option; I never used that, and after some playing around I managed to get (only) 50% slowdown(!) :-(
My code in short is this: (My SWIG typemaps use obj_to_array_no_conversion() from numpy.i) -------<Ccode> ---------- void dists2d( double *a_ps, int nx1, int na, double *b_ps, int nx2, int nb, double *dist, int nx3, int ny3) throw (char*) { if(nx1 != 2) throw (char*) "a must be of shape (n,2)"; if(nx2 != 2) throw (char*) "b must be of shape (n,2)"; if(nx3 != nb || ny3 != na) throw (char*) "c must be of shape (na,nb)"; double *dist_; int i, j; #pragma omp parallel private(dist_, j, i) { #pragma omp for nowait for(i=0;i<na;i++) { //num_threads=omp_get_num_threads(); --> 4 dist_ = dist+i*nb; // dists_ is only introduced for OpenMP for(j=0;j<nb;j++) { *dist_++ = sqrt( sq(a_ps[i*nx1] - b_ps[j*nx2]) + sq(a_ps[i*nx1+1] - b_ps[j*nx2+1]) ); } } } } -------</Ccode> ---------- There is probably a simple mistake in this code - as I said I never used OpenMP before. It should be not too difficult to use OpenMP correctly here or - maybe better - is there a simple SSE(2,3,4) version that might be even better than OpenMP... !? I supposed, that I did not get the #pragma omp lines right - any idea ? Or is it in general not possible to speed this kind of code up using OpenMP !? Thanks, Sebastian Haase _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion