> > > > > > On Apr 26 20:46:51, b...@comstyle.com wrote:
> > > > > > > Implement SSE2 lrint() and lrintf() on amd64.
> > > > > > 
> > > > > > I don't think this is worth the added complexity:
> > > > > > seven more patches to have a different lrint()?
> > > > > > Does it make the resampling noticeably better/faster?
> > > > > 
> > > https://github.com/libsndfile/libsndfile/pull/663
> > > -> https://quick-bench.com/q/OabKT-gEOZ8CYDriy1JEwq1lEsg
> > > where there's a huge difference in clang builds.
> > 
> > Sorry, I don't understand at all how this concerns
> > the OpenBSD port of libsamplerate: the Benchmark does not
> > mention an OS or an architecture, so what is this being run on?
> > 
> > Anyway, just running it (Run Benchmark) gives the result
> > of cpu_time of 722.537 for BM_d2les_array (using lrint)
> > and cpu_time of 0 for BM_d2les_array_sse2 (using psf_lrint),
> > reporting a speedup ratio of 200,000,000.
> > 
> > That's not an example of what I have in mind: a simple application
> > of libsamplerate, sped up by the usage of the new SSE2 lrint

> OK, here is a test that's a modified version of what Stuart linked,
> testing the performance of the lrint() itself (code below).

A better test below, lrint()ing a random sequence.
The SSE version is slower on every SSE2 machine I tried.
Is that the case for you too?

        Jan


#include <immintrin.h>
#include <math.h>
#include <stdlib.h>     /* calloc(), arc4random_buf() */

static inline int 
psf_lrint(double const x)
{
        return _mm_cvtsd_si32(_mm_load_sd(&x));
}

static void
d2l(const double *src, long *dst, size_t len)
{
        for (size_t i = 0; i < len; i++)
                dst[i] = lrint(src[i]);
}

static void
d2l_sse(const double *src, long *dst, size_t len)
{
        for (size_t i = 0; i < len; i++)
                dst[i] = psf_lrint(src[i]);
}

int
main(void)
{
        size_t len = 1000 * 1000 * 100;
        double *src = NULL;
        long *dst = NULL;

        src = calloc(len, sizeof(double));
        dst = calloc(len, sizeof(long));
        if (src == NULL || dst == NULL)
                return 1;

        /* random bytes: src ends up holding arbitrary bit patterns */
        arc4random_buf(src, len * sizeof(double));
        d2l_sse(src, dst, len);
        /*d2l(src, dst, len);*/

        return 0;
}
