On Mon, Aug 15, 2011 at 5:25 PM, Michael Matz <m...@suse.de> wrote:
> On Mon, 15 Aug 2011, Michael Matz wrote:
>
>> > > .LFB0:
>> > >         .cfi_startproc
>> > >         movsd   .LC0(%rip), %xmm2
>> > >         movapd  %xmm0, %xmm1
>> > >         andpd   %xmm2, %xmm1
>> > >         andnpd  %xmm0, %xmm2
>> > >         addsd   .LC1(%rip), %xmm1
>> > >         roundsd $1, %xmm1, %xmm1
>> > >         orpd    %xmm2, %xmm1
>> > >         movapd  %xmm1, %xmm0
>> > >         ret
>> >
>> > Hm, why do we need the sign-copy?  If I read the docs correctly
>> > we can simply use roundsd directly, no?
>>
>> round-half-away-from-zero breaks your neck.  round[ps][sd] only supports
>> the usual four IEEE rounding modes.
>
> But, you should be able to apply the sign to the 0.5, which wouldn't
> require building the absolute value of input:
>
>   round(x) = trunc(x + (copysign (0.5, x)))
>
> which should roughly be expanded to:
>
>         movsd   signbits(%rip), %xmm1
>         andpd   %xmm0, %xmm1
>         movsd   nextof0.5(%rip), %xmm2
>         orpd    %xmm1, %xmm2
>         addpd   %xmm2, %xmm0
>         roundsd $1, %xmm0, %xmm0
>         ret
>
> Which has one logical operation less (and one move because I chose a more
> optimal register assignment).

Thanks for the suggestion, I will implement and test it ASAP.

Uros.