Hello,

For a while now, I've been trying to come up with a fast Haskell-only function that implements Euclidean distance in n-dimensional space. So far, I've been disappointed by the performance of my best effort in Haskell compared to C. I'm hoping some of the Haskell experts and/or performance gurus on this list can help me resolve this, and also maybe shed some light on some strange (imho) things I've run into.
My current best attempt uses the uvector package, takes two 'vectors' of type (UArr Double) as input, and relies on the sumU and zipWithU functions, which use streaming to compute the result:
dist_fast :: UArr Double -> UArr Double -> Double
dist_fast p1 p2 = sumDs `seq` sqrt sumDs
  where
    sumDs = sumU ds
    ds    = zipWithU euclidean p1 p2
    euclidean x y = d*d
      where
        d = x - y
I've been benchmarking this function against various alternatives using the MicroBench [1] package, which allows one to get accurate timings of single function calls. I've also mimicked the MicroBench approach in pure C, to get comparable timings for a C-only implementation.
The C-only function is quite straightforward:
double dist(int dim, double *p1, double *p2) {
    int i;
    double d = 0.0;
    for (i = 0; i < dim; i++) {
        d += (p1[i] - p2[i]) * (p1[i] - p2[i]);
    }
    return sqrt(d);
}
(Note that the optimizer takes care of the (p1[i] - p2[i]) subtraction appearing twice in the code.)
All code is attached if you'd like to play around with it.
All numbers reported below are using GHC 6.10.2 and gcc 4.3.3
on Linux/x86.
The compilation details can be found in the Makefile attached, but
in short, I'm using -O2 -fexcess-precision or
-O2 -fexcess-precision -fvia-C -optc-O3 with GHC, and -O3 with gcc.
Attachments: Dist.hs, microbench_dist.c, microbench_dist.hs, dist_c.c, dist_c.h, Makefile
Now the bad news: for small dimensions, e.g. 2D/3D space, dist_fast is 70-240x slower than the pure C implementation, depending on the architecture. For example, in 2D space on an Intel Pentium 4 (3.0GHz, 1MB L2 cache), a single call to dist_fast takes about 1.75 microseconds (0.00000175s), while a call to dist_c (the C implementation of Euclidean distance) takes about 0.025 microseconds (a 70x slowdown). On a Core 2 2.5GHz with 6MB L2, the numbers are 1.9 and 0.008 microseconds, resp. (i.e. 240x slower), while on a Core i7 2.66GHz with 8MB L2 they are 1.53 and 0.011 microseconds (i.e. 140x slower).

For larger dimensions the gap narrows, but it is still worrying: 10D: 40-110x; 100D: 10-17x; >1kD: 2.5-6x. I'm mostly interested in the range 10D to 100D, so seeing that Haskell is over 10x and up to 100x slower than C is kind of making me cry.

I've tried some things to improve on this, without much luck; on the contrary:

*) Marking dist_fast for inlining makes things worse: in general, the inlined version is 2x slower for low dimensionality, and even 5x slower for larger dimensionality. This was somewhat surprising to me...

*) In a moment of weakness, I used the Foreign Function Interface to call the C-only dist_c implementation from Haskell. Unfortunately, there seems to be a lot of overhead in calling dist_c from Haskell: most of the performance gain from using C melts away, and sometimes the FFI'd dist_c is even 15-30% slower than the native dist_fast version (especially at low dimensionality). Only for the largest dimensionalities (10k-100kD) does the FFI'd version reach the performance of the native C approach. Since I'm mostly interested in the 10-100D range, this is of little use to me.

One thing I noticed is that compiling through C using -fvia-C -optc-O3 might be a bad idea, depending on your system. On an Intel Pentium 4 system, -fvia-C -optc-O3 gave me a speedup of up to 70% (large dim.), while on Core 2 and Core i7 it resulted in a slowdown of 15-20%! I was using roughly equal GCC versions for this, i.e. a self-bootstrapped GCC 4.3.x.

So, my question to the list is simple: how can I get better performance out of a Haskell-only approach? Any comments/suggestions are highly appreciated. I'd prefer a Haskell-only approach, but my main concern is speed. The Euclidean distance function will be used quite heavily in various tools.
I currently have a C version of some of the tools, but the amount of code needed for them is becoming ridiculously big. I believe using Haskell will allow me to come up with a more maintainable code base. However, I don't want to pay a huge price for this in terms of performance.

greetings,

Kenneth

[1] MicroBench: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/microbench

--
Kenneth Hoste
Paris research group - ELIS - Ghent University, Belgium
email: [email protected]
website: http://www.elis.ugent.be/~kehoste
blog: http://boegel.kejo.be
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
