> > > > > > On Apr 26 20:46:51, b...@comstyle.com wrote:
> > > > > > > Implement SSE2 lrint() and lrintf() on amd64.
> > > > > >
> > > > > > I don't think this is worth the added complexity:
> > > > > > seven more patches to have a different lrint()?
> > > > > > Does it make the resampling noticeably better/faster?
> > > > >
> > > https://github.com/libsndfile/libsndfile/pull/663
> > > -> https://quick-bench.com/q/OabKT-gEOZ8CYDriy1JEwq1lEsg
> > > where there's a huge difference in clang builds.
> >
> > Sorry, I don't understand at all how this concerns
> > the OpenBSD port of libsamplerate: the Benchmark does not
> > mention an OS or an architecture, so what is this being run on?
> >
> > Anyway, just running it (Run Benchmark) gives the result
> > of cpu_time of 722.537 for BM_d2les_array (using lrint)
> > and cpu_time of 0 for BM_d2les_array_sse2 (using psf_lrint),
> > reporting a speedup ratio of 200,000,000.
> >
> > That's not an example of what I have in mind: a simple application
> > of libsamplerate, sped up by the usage of the new SSE2 lrint
> OK, here is a test that's a modified version of what Stuart linked,
> testing the performance of the lrint() itself (code below).

Below is a better test, which lrint()s a random sequence.
The SSE version is slower on every SSE2 machine I tried.
Is that the case for you too?

	Jan


#include <immintrin.h>
#include <math.h>
#include <stdlib.h>	/* calloc(), free(), arc4random_buf() on OpenBSD */

/* as in libsndfile: cvtsd2si, returns a 32-bit int, not a long */
static inline int psf_lrint(double const x) {
	return _mm_cvtsd_si32(_mm_load_sd(&x));
}

static void d2l(const double *src, long *dst, size_t len) {
	for (size_t i = 0; i < len; i++)
		dst[i] = lrint(src[i]);
}

static void d2l_sse(const double *src, long *dst, size_t len) {
	for (size_t i = 0; i < len; i++)
		dst[i] = psf_lrint(src[i]);
}

int main(void) {
	size_t len = 1000 * 1000 * 100;
	double *src = NULL;
	long *dst = NULL;

	if ((src = calloc(len, sizeof(double))) == NULL ||
	    (dst = calloc(len, sizeof(long))) == NULL)
		return 1;
	arc4random_buf(src, len * sizeof(double));

	/* run one variant at a time */
	d2l_sse(src, dst, len);
	/*d2l(src, dst, len);*/

	free(src);
	free(dst);
	return 0;
}