Signed division with rounding towards -infinity (and floating point rounding)

2007-09-18 Thread Christopher Key
Hello,

Apologies if this has already been covered; I've searched the archives
and not found anything.

I have some code that needs to perform signed division by a power of two
with rounding towards minus infinity, i.e. it requires an arithmetic
right shift.  Now in the C specification, right shifting a signed
integer is implementation defined.  gcc's behaviour is implement this as
a right shift, but it's possible that the code will be compiled on
different platforms where the implemented behaviour is different.

What I'm looking for is a way of expressing myself that will always give
the correct behaviour on all c99 implementations, but that gcc will
recognise as being equivalent to an arithmetic right shift, and hence
implement as such.


The most concise form that I've found so far is:

const int d = 8; // 16, 32 etc
x = y / d - ((y % d < 0) ? 1 : 0)

although this still produces rather longer code (see example below).


On a similar point, is there a good way get floats rounded to the
nearest integer value rather than truncated.  The following give the
correct rounding behaviour (I'm only interested in +ve values),

x = (int) (f + 0.5)

although gets compiled as an addition of 0.5 followed by a truncation
(again, see example).


Please note that I'm not expecting gcc to recognise these specific
cases, I'm just wondering if logic has been built in to recognise
certain specific constructs and optimise them appropriately.

Regards,

Chris



Example follows:

# gcc -v
Using built-in specs.
Target: i586-suse-linux
Configured with: ../configure --enable-threads=posix --prefix=/usr
--with-local-prefix=/usr/local --infodir=/usr/share/info
--mandir=/usr/share/man --libdir=/usr/lib --libexecdir=/usr/lib
--enable-languages=c,c++,objc,fortran,obj-c++,java,ada
--enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.1.2
--enable-ssp --disable-libssp --disable-libgcj --with-slibdir=/lib
--with-system-zlib --enable-shared --enable-__cxa_atexit
--enable-libstdcxx-allocator=new --program-suffix=-4.1
--enable-version-specific-runtime-libs --without-system-libunwind
--with-cpu=generic --host=i586-suse-linux
Thread model: posix
gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)


# cat div.c
#include 

int div8(int x) { return (x / 8) - ((x % 8 < 0) ? 1 : 0); }
int sft3(int x) { return x>>3; }
int trnc(double d) { return (int) d; }
int rnd(double d) { return (int) (d + 0.5); }

int main (void) {
int x = -5;

fprintf(stderr, "%d\n", div8(x));
fprintf(stderr, "%d\n", div8(x));

fprintf(stderr, "%d\n", trnc(42.6));
fprintf(stderr, "%d\n", rnd(42.6));

return 0;
}

# gcc -O3 --save-temps div.c

#cat div.s

.file   "div.c"
.text
.p2align 4,,15
.globl div8
.type   div8, @function
div8:
pushl   %ebp
movl%esp, %ebp
movl8(%ebp), %edx
popl%ebp
movl%edx, %ecx
sarl$31, %ecx
shrl$29, %ecx
leal(%ecx,%edx), %edx
movl%edx, %eax
andl$7, %edx
subl%ecx, %edx
sarl$3, %eax
shrl$31, %edx
subl%edx, %eax
ret
.size   div8, .-div8
.p2align 4,,15
.globl sft3
.type   sft3, @function
sft3:
pushl   %ebp
movl%esp, %ebp
movl8(%ebp), %eax
popl%ebp
sarl$3, %eax
ret
.size   sft3, .-sft3
.p2align 4,,15
.globl trnc
.type   trnc, @function
trnc:
pushl   %ebp
movl%esp, %ebp
subl$8, %esp
fnstcw  -2(%ebp)
fldl8(%ebp)
movzwl  -2(%ebp), %eax
movb$12, %ah
movw%ax, -4(%ebp)
fldcw   -4(%ebp)
fistpl  -8(%ebp)
fldcw   -2(%ebp)
movl-8(%ebp), %eax
leave
ret
.size   trnc, .-trnc
.section.rodata.cst4,"aM",@progbits,4
.align 4
.LC1:
.long   1056964608
.text
.p2align 4,,15
.globl rnd
.type   rnd, @function
rnd:
pushl   %ebp
movl%esp, %ebp
subl$8, %esp
fnstcw  -2(%ebp)
flds.LC1
faddl   8(%ebp)
movzwl  -2(%ebp), %eax
movb$12, %ah
movw%ax, -4(%ebp)
fldcw   -4(%ebp)
fistpl  -8(%ebp)
fldcw   -2(%ebp)
movl-8(%ebp), %eax
leave
ret
.size   rnd, .-rnd
.section.rodata.str1.1,"aMS",@progbits,1
.LC3:
.string "%d\n"
.text
.p2align 4,,15
.globl main
.type   main, @function
main:
leal4(%esp), %ecx
andl$-16, %esp
pushl   -4(%ecx)
pushl   %ebp
movl%esp, %ebp
pushl   %ecx
subl$20, %esp
movl$-1, 8(%esp)
movl$.LC3, 4(%esp)
movlstderr, %eax
movl%eax, (%esp)
callfprintf
movl$-1, 8(%esp)
movl$.LC3, 4(%esp

Re: Signed division with rounding towards -infinity (and floating point rounding)

2007-09-18 Thread Christopher Key
Tim Prince wrote:
> Christopher Key wrote:
>
>   
>> I have some code that needs to perform signed division by a power of two
>> with rounding towards minus infinity, i.e. it requires an arithmetic
>> right shift.  Now in the C specification, right shifting a signed
>> integer is implementation defined. 
>> 
> Because C may be compiled for ints other than 2's complement.  Perhaps
> you could use pre-processing to specialize for 2's complement for most
> of your targets.
>   
Indeed so.  It's presumably quite simple to implement a preprocessor
macro to detect whether the target system is 2's complement or not. 
Does this guarantee however that a right shift on a signed value will be
implemented as a arithmetic shift?
>   
>> On a similar point, is there a good way get floats rounded to the
>> nearest integer value rather than truncated.  The following give the
>> correct rounding behaviour (I'm only interested in +ve values),
>>
>> x = (int) (f + 0.5)
>>
>> 
> I don't see how this is similar, unless you are harking back to the
> early days of BASIC when there were 2's complement floating point
> implementations.  IEEE standards ruled that out over 20 years ago.  Even
> among those 2's complement implementations which did exist, the results
> varied for -1. < f < -.5.  If you use this only for positive f, the case
>  for which most people would consider this wrong is where f takes odd
> integral values, so your code will increment the result.  But you said
> this is OK, so where is the problem?
>
> Why not use the lrint() and related C intrinsics?
>   
Sorry, perhaps didn't make myself clear.

In both case, I want behaviour that can expressed very tersely in
assembler, but not equivalently so in c.  What I was wanting to write is
code that will do as intended under any c(99) implementation, but that
gcc can see as being equivalent to its terse assembler implementation.

The first case is slightly different in that I can express what I want
(i.e. y = x>>d) and have it compile to suitable assembler, but the code
isn't guaranteed to behave as desired on an arbitrary compiler.

using rint() is an alternative, although I was put off for one reason. 
As this is library code, I don't want to change the global execution
state (is this the correct term) without restoring it afterwards, and I
didn't really want to add fegetround() and fesetround() wrappers around
various bits of code.

Regards,

Chris