[Bug c++/32270] New: warning for potential undesired operator& and operator== evaluation order

2007-06-10 Thread bugzilla at little-bat dot de
Hello,

I don't know whether this was already requested way back in 1985, and maybe
there is an evangelical answer to this.

This is a request to add a compiler option for warnings if the evaluation of
operator& and operator== (and similar) may not be 'as expected'. I personally
feel that in most cases it is utter nonsense to apply operator& to a boolean
result, as the C/C++ operator precedence hierarchy defines it; and every now
and then I forget to add brackets around each and every doubtful operation in
an expression, and then I'm puzzled for hours about what goes wrong.

Example:

  if( value&mask == 0x7F00 ) { … }

does not evaluate as expected (and, in my opinion, in the only reasonable way),
but evaluates as:

  if( value & (mask==0x7F00) ) { … }

The reasoning for adding this warning is comparable to the reasoning which led
to adding a warning if the result of operator= is used as part of the boolean
expression in "if(…){…}": It is _probably_ not what the programmer intended.
The way to circumvent the warning (if enabled) would be the same as for
operator= too: add brackets around the expression. Possibly these warnings
could even be combined into a single compiler option; at the least they should
go into the same compiler option sets.

Exact scope for this kind of warning should be:

  operator &, ^ and |   versus   operator == and !=

because it makes no sense to apply a bit masking operator on a boolean result,
as it is done if no brackets are used to reorder the sequence of evaluation.
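The difference between the two readings can be checked with a small sketch. The function names and test values here are illustrative assumptions, not part of the report, and the pragma only silences the diagnostic under discussion so that the unparenthesized form compiles quietly.

```cpp
#include <cstdint>

// Silence the very warning this report asks for, so the demo builds quietly.
#pragma GCC diagnostic ignored "-Wparentheses"

// How the language parses `value & mask == 0x7F00`:
// `==` binds tighter than `&`, so the comparison happens first and
// its 0/1 result is then ANDed with value.
constexpr bool as_parsed(std::uint32_t value, std::uint32_t mask)
{
  return value & mask == 0x7F00u;
}

// What the reporter meant: apply the mask first, then compare.
constexpr bool as_intended(std::uint32_t value, std::uint32_t mask)
{
  return (value & mask) == 0x7F00u;
}
```

With value 0x7F12 and mask 0xFF00 the intended form is true while the parsed form is false; with value 0x0001 and mask 0x7F00 it is the other way round.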

And the same applies in my eyes to:

  operator << and >>   versus   operator +, -, *, / and %

because operator<< and >> perform an exponentiation (2**n) and the priority of
exponentiation is (should be) higher than that of multiplication.
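The same kind of sketch works for the shift operators. Again the function names are made up for illustration; the point is only that `+` and `*` bind tighter than `<<` in C and C++.

```cpp
// Silence -Wparentheses so the unparenthesized forms compile quietly.
#pragma GCC diagnostic ignored "-Wparentheses"

// `+` binds tighter than `<<`: this is 1 << (n + 3).
constexpr int shift_plus_as_parsed(int n)   { return 1 << n + 3; }
// The reading the reporter would prefer: (1 << n) + 3.
constexpr int shift_plus_as_intended(int n) { return (1 << n) + 3; }

// `*` also binds tighter than `<<`: this is 1 << (n * 2).
constexpr int shift_times_as_parsed(int n)   { return 1 << n * 2; }
// The preferred reading: (1 << n) * 2.
constexpr int shift_times_as_intended(int n) { return (1 << n) * 2; }
```

For n = 2 the parsed forms give 32 and 16, while the intended forms give 7 and 8.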

The C/C++ standard cannot be changed, though it handles this wrongly in my
opinion, but a warning whenever the default evaluation order is applied to two
operations from the above sets would be much appreciated.

Thanks for an answer,

... kio !


-- 
   Summary: warning for potential undesired operator& and operator==
evaluation order
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: bugzilla at little-bat dot de
 GCC build triplet: (GCC) 4.0.1 (Apple Computer, Inc. build 5367)
  GCC host triplet: powerpc-apple-darwin8-g++-4.0.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32270



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread ubizjak at gmail dot com


--- Comment #11 from ubizjak at gmail dot com  2007-06-10 08:28 ---
I have experimented a bit with rcpss, trying to measure the effect of an
additional NR (Newton-Raphson) step on performance. The NR step was calculated
based on http://en.wikipedia.org/wiki/N-th_root_algorithm, and for N=-1 (1/A)
we can simplify to:

x1 = x0 (2.0 - A x0)

To obtain 24bit precision, we have to use a reciprocal, two multiplies and
a subtraction (+ a constant load).
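As a portable illustration of the refinement step (not the rcpss pattern itself), the formula can be written as a plain function. The name `refine_reciprocal` and the idea of passing the initial estimate in as a parameter are assumptions made for this sketch; in the patch the estimate comes from rcpss.

```cpp
// One Newton-Raphson step for the reciprocal of a, starting from estimate x0:
//   x1 = x0 * (2.0 - a * x0)
// If a * x0 = 1 - e, then a * x1 = 1 - e * e, so the relative error is
// squared by each step (two multiplies and one subtraction).
inline float refine_reciprocal(float a, float x0)
{
  return x0 * (2.0f - a * x0);
}
```

For example, starting from x0 = 0.33 as an estimate of 1/3 (about 1% off), one step gives roughly 0.3333, which is about 0.01% off. The same squaring of the error is why a roughly 12-bit estimate is expected to reach about 24 bits after one step.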

First, please note that "divss" instruction is quite _fast_, clocking at 23
cycles, where approximation with NR step would sum up to 20 cycles, not
counting load of constant.

I have checked the performance of the following testcase with various
implementations on x86_64 C2D:

--cut here--
#include <stdio.h>

float test(float a)
{
  return 1.0 / a;
}

int main()
{
  float a = 1.12345;
  volatile float t = 0.0;
  int i;

  for (i = 1; i < 10; i++)
    {
      t += test (a);
      a += 1.0;
    }

  printf("%f\n", t);

  return 0;
}
--cut here--

divss : 3.132s
rcpss NR  : 3.264s
rcpss only: 3.080s

To enhance the precision of 1/sqrt(A), additional NR step is calculated as

x1 = 0.5 X0 (3.0 - A x0 x0 x0)

and considering that sqrtss also clocks at 23 clocks (_far_ from hundreds of
clocks ;) ), additional NR step just isn't worth it.

The experimental patch:

Index: i386.md
===
--- i386.md (revision 125599)
+++ i386.md (working copy)
@@ -15399,6 +15399,15 @@
 ;; Gcc is slightly more smart about handling normal two address instructions
 ;; so use special patterns for add and mull.

+(define_insn "*rcpsf2_sse"
+  [(set (match_operand:SF 0 "register_operand" "=x")
+   (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm")]
+  UNSPEC_RCP))]
+  "TARGET_SSE"
+  "rcpss\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sse")
+   (set_attr "mode" "SF")])
+
 (define_insn "*fop_sf_comm_mixed"
   [(set (match_operand:SF 0 "register_operand" "=f,x")
(match_operator:SF 3 "binary_fp_operator"
@@ -15448,6 +15457,29 @@
   (const_string "fop")))
(set_attr "mode" "SF")])

+(define_insn_and_split "*rcp_sf_1_sse"
+  [(set (match_operand:SF 0 "register_operand" "=x")
+   (div:SF (match_operand:SF 1 "immediate_operand" "F")
+   (match_operand:SF 2 "nonimmediate_operand" "xm")))
+   (clobber (match_scratch:SF 3 "=&x"))
+   (clobber (match_scratch:SF 4 "=&x"))]
+  "TARGET_SSE_MATH
+   && operands[1] == CONST1_RTX (SFmode)
+   && flag_unsafe_math_optimizations"
+   "#"
+   "&& reload_completed"
+   [(set (match_dup 3)(match_dup 2))
+(set (match_dup 4)(match_dup 5))
+(set (match_dup 0)(unspec:SF [(match_dup 3)] UNSPEC_RCP))
+(set (match_dup 3)(mult:SF (match_dup 3)(match_dup 0)))
+(set (match_dup 4)(minus:SF (match_dup 4)(match_dup 3)))
+(set (match_dup 0)(mult:SF (match_dup 0)(match_dup 4)))]
+{
+  rtx two = const_double_from_real_value (dconst2, SFmode);
+
+  operands[5] = validize_mem (force_const_mem (SFmode, two));
+})
+
 (define_insn "*fop_sf_1_mixed"
   [(set (match_operand:SF 0 "register_operand" "=f,f,x")
(match_operator:SF 3 "binary_fp_operator"

Based on these findings, I guess that NR step is just not worth it. If we want
to have noticeable speed-up on division and square root, we have to use 12bit
implementations, without any refinements - mainly for benchmarketing, I'm
afraid.

BTW: on x86_64, patched gcc compiles "test" function to:

test:
movaps  %xmm0, %xmm1
rcpss   %xmm0, %xmm0
movss   .LC1(%rip), %xmm2
mulss   %xmm0, %xmm1
subss   %xmm1, %xmm2
mulss   %xmm2, %xmm0
ret


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug c++/32270] warning for potential undesired operator& and operator== evaluation order

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #1 from pinskia at gcc dot gnu dot org  2007-06-10 08:34 ---
The warning works on the trunk:
[pinskia-laptop:gcc/mips/gcc] pinskia% ./cc1plus t.c -W -Wall
 int f(int, int)
t.c:3: warning: suggest parentheses around comparison in operand of &


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

  GCC build triplet|(GCC) 4.0.1 (Apple Computer,|
   |Inc. build 5367)|
   GCC host triplet|powerpc-apple-darwin8-g++-  |
   |4.0.1   |
   Keywords||diagnostic
Summary|warning for potential   |warning for potential
   |undesired operator& and |undesired operator& and
   |operator== evaluation order |operator== evaluation order
Version|4.3.0   |4.0.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32270



[Bug c++/32270] warning for potential undesired operator& and operator== evaluation order

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #2 from pinskia at gcc dot gnu dot org  2007-06-10 08:38 ---
Well that is because it was fixed on the trunk last December by:
2006-12-13  Ian Lance Taylor  <[EMAIL PROTECTED]>

PR c++/19564
PR c++/19756

This is a dup of bug 19564.

*** This bug has been marked as a duplicate of 19564 ***


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||DUPLICATE


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32270



[Bug c++/19564] -Wparentheses does not work with the C++ front-end

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #10 from pinskia at gcc dot gnu dot org  2007-06-10 08:38 ---
*** Bug 32270 has been marked as a duplicate of this bug. ***


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||bugzilla at little-bat dot
   ||de


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19564



[Bug libstdc++/31970] set<>::iterator vs type-safety

2007-06-10 Thread chris at bubblescope dot net


--- Comment #8 from chris at bubblescope dot net  2007-06-10 08:57 ---
Hmm.. I thought I did have a good example, I had a function that looked like:

template <typename It>
int count_unique(It begin, It end)
{
  set counter(begin, end);
  return counter.size();
}

But, while you might get multiple copies of this function for each iterator
type, the "work parts" (the building of the set and the call to size()) will be
the same regardless of whether this is fixed.
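Angle-bracketed text in the snippet above appears to have been lost in archiving (at least the template argument of `set`). A compilable reading might look like the following sketch; taking the element type from `std::iterator_traits` is an assumption, since the original argument is not recoverable from the message.

```cpp
#include <iterator>
#include <set>

// Assumed reconstruction: the set's element type is derived from the
// iterator type. The original template argument of `set` is unknown.
template <typename It>
int count_unique(It begin, It end)
{
  std::set<typename std::iterator_traits<It>::value_type> counter(begin, end);
  return static_cast<int>(counter.size());
}
```

Called on the range {1, 2, 2, 3, 3, 3} this returns 3. Each distinct `It` instantiates another copy of `count_unique`, which is the duplication the comment is weighing.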

The only good example I can come up with would be if someone decided to build
multiple maps of set::iterators, which I've never wanted to do...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31970



[Bug preprocessor/32271] New: Incorrect warnings in disabled code.

2007-06-10 Thread pcmoen at gmail dot com
The preprocessor will report warnings when there is an unterminated ' or " in a
disabled section.

Example code that triggers two warnings:
 Code begin 
#if 0
This shouln"t cause a problem.
This shouln't cause a problem.
#endif

int
main()
{
return 0;
}
 Code end 

Output from the preprocessor:
$ cpp-4.2 -v -save-temps bug.cpp 
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2
--enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr
--enable-targets=all --disable-werror --enable-checking=release
--build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.2.1 20070528 (prerelease) (Ubuntu 4.2-20070528-0ubuntu2)
 /usr/lib/gcc/i486-linux-gnu/4.2.1/cc1plus -E -quiet -v -D_GNU_SOURCE bug.cpp
-mtune=generic -fpch-preprocess
ignoring nonexistent directory "/usr/local/include/i486-linux-gnu"
ignoring nonexistent directory
"/usr/lib/gcc/i486-linux-gnu/4.2.1/../../../../i486-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i486-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/4.2
 /usr/include/c++/4.2/i486-linux-gnu
 /usr/include/c++/4.2/backward
 /usr/local/include
 /usr/lib/gcc/i486-linux-gnu/4.2.1/include
 /usr/include
End of search list.
# 1 "bug.cpp"
# 1 ""
# 1 ""
# 1 "bug.cpp"
bug.cpp:2:12: warning: missing terminating " character
bug.cpp:3:12: warning: missing terminating ' character





int
main()
{
 return 0;
}


-- 
   Summary: Incorrect warnings in disabled code.
   Product: gcc
   Version: 4.2.1
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: preprocessor
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: pcmoen at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32271



[Bug preprocessor/32271] Incorrect warnings in disabled code.

2007-06-10 Thread pcmoen at gmail dot com


--- Comment #1 from pcmoen at gmail dot com  2007-06-10 09:25 ---
Created an attachment (id=13672)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13672&action=view)
Test case that shows the error.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32271



[Bug preprocessor/32271] Incorrect warnings in disabled code.

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #2 from pinskia at gcc dot gnu dot org  2007-06-10 09:34 ---
Actually the warning is correct as the code is undefined at compile time and
this is documented:

# Do not use @code{#if 0} for comments which are not C code.  Use a real
# comment, instead.  The interior of @code{#if 0} must consist of complete
# tokens; in particular, single-quote characters must balance.
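A sketch of what the quoted documentation recommends: the same free-form text placed in a real comment rather than under `#if 0`. The function name is only a placeholder so that the fragment has something to compile.

```cpp
/*
 * This shouldn't cause a problem.  Inside a real comment the text is not
 * divided into tokens, so an unbalanced ' or " is harmless here.
 */
int exit_status()
{
  return 0;
}
```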



*** This bug has been marked as a duplicate of 14634 ***


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||DUPLICATE


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32271



[Bug preprocessor/14634] Unterminated literals not diagnosed

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #13 from pinskia at gcc dot gnu dot org  2007-06-10 09:34 ---
*** Bug 32271 has been marked as a duplicate of this bug. ***


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||pcmoen at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14634



[Bug target/32264] gcc 4.2.0 compiled vanilla kernel 2.4.34.5 crashes when VIA C3 optimized -march=c3

2007-06-10 Thread axel at freakout dot de


--- Comment #5 from axel at freakout dot de  2007-06-10 10:05 ---
Subject: Re:  gcc 4.2.0 compiled vanilla kernel
 2.4.34.5 crashes when VIA C3 optimized -march=c3

According to rguenth at gcc dot gnu dot org:
> 
> --- Comment #4 from rguenth at gcc dot gnu dot org  2007-06-09 10:27 
> ---
> We need this reduced to a manageable testcase that gcc miscompiles.
> 

Sorry - but kernel debugging in the early boot stage goes far
beyond my capabilities. I tried to gather as much information as
I can. The crash can be reproduced just with the kernel itself,
no modules involved.

I've added an archive with two dirs gcc-4.1.2 and gcc-4.2.0 - in
each dir is the compiled kernel vmlinux and the boot image
vmlinuz, which can be loaded by any bootloader (grub, lilo,
syslinux, loadlin, ...).  I also added the corresponding
System.map's.

The kernels were produced from identical (the same) source trees
with gcc 4.1.2 and gcc 4.2.0 on the same machine.  The gcc 4.1.2
compiled kernel boots until panic - no root fs - works ok.  The
gcc 4.2.0 kernel crashes with this output:

==
Kernel command line: BOOT_IMAGE=vmlinuz4.434
Initializing CPU#0
Detected 797.420 MHz processor.
Console: colour VGA+ 80x25
Unable to handle kernel paging request at virtual address
f000fec4
 printing eip:
c0295690
*pde = 
Oops: 0002
CPU:0
EIP:0010:[]   Not tainted
EFLAGS: 00010017
eax: f00fec4ebx:    ecx: 0037   edx: 0010
esi: 000994c1   edi: c0105000   ebp: 0008e000   esp: c0251fe4
ds: 0018   es: 0020   ss: 0018
Process swapper (pid: 0, stackpage=c0251000)
Stack: 0020 c0252290 0010 0216 c0252630 c0295ae0
c0100191
Call Trace:

Code: 10 00 f3 a5 ea 19 00 00 90 bf f4 3f 8e d8 8e d0 3f a3 c1 8c
 <0>Kernel panic: Attempted to kill the idle task!
In idle task - not syncing
==

This output is also in the archive dir of gcc-4.2.0/crash.txt

The working kernel (produced from gcc-4.1.2) prints:
==
Calibrating delay loop... 1592.52 BogoMIPS
==
at the point where the gcc-4.2.0 produced kernel crashes with the
above messages.

Hope this helps.

Axel


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32264



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread ubizjak at gmail dot com


--- Comment #12 from ubizjak at gmail dot com  2007-06-10 10:47 ---
Here are the results of mubench insn timings for various x86 processors:
http://mubench.sourceforge.net/results.html (target processor can be
benchmarked by downloading mubench from
http://mubench.sourceforge.net/index.html).

And finally an interesting read on how commercial compilers trade accuracy for
speed (please read at least about SPEC2006 benchmark):
http://www.hpcwire.com/hpc/1556972.html


-- 

ubizjak at gmail dot com changed:

   What|Removed |Added

 CC||ubizjak at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread jb at gcc dot gnu dot org


--- Comment #13 from jb at gcc dot gnu dot org  2007-06-10 11:06 ---
(In reply to comment #11)

Thanks for the work.

> First, please note that "divss" instruction is quite _fast_, clocking at 23
> cycles, where approximation with NR step would sum up to 20 cycles, not
> counting load of constant.
> 
> I have checked the performance of following testcase with various
> implementations on x86_64 C2D:
> 
> --cut here--
> float test(float a)
> {
>   return 1.0 / a;
> }
>
> divss : 3.132s
> rcpss NR  : 3.264s
> rcpss only: 3.080s

Interesting, on ubuntu/i686/K8 I get (average of 3 runs)

divss: 7.485 s
rcpss NR: 9.915 s

> To enhance the precision of 1/sqrt(A), additional NR step is calculated as
> 
> x1 = 0.5 X0 (3.0 - A x0 x0 x0)
> 
> and considering that sqrtss also clocks at 23 clocks (_far_ from hundreds of
> clocks ;) ), additional NR step just isn't worth it.

Well, I suppose it depends on the hardware. IIRC older cpu:s did division with
microcode whereas at least core2 and K8 do it in hardware, so I guess the
hundreds of cycles doesn't apply to current cpu:s. 

Also, supposedly Penryn will have a much improved divider..

That being said, I think there is still a case for the reciprocal square root,
as evidenced by the benchmarks in #5 and #7 as well as my analysis of gas_dyn
linked to in the first message in this PR (in short, ifort does sqrt(a/b) about
twice as fast as gfortran by using reciprocal approximations + NR). If indeed
div(p|s)s is about equally fast as rcp(p|s)s as your benchmarks show, then it
suggests almost all the performance benefit ifort gets is due to the
rsqrt(p|s)s, no? Or perhaps there is some issue with pipelining? In gas_dyn the
sqrt(a/b) loop fills an array, whereas your benchmark accumulates..

> Based on these findings, I guess that NR step is just not worth it. If we want
> to have noticeable speed-up on division and square root, we have to use 12bit
> implementations, without any refinements - mainly for benchmarketing, I'm
> afraid.

I hear that it's possible to pass spec2k6/gromacs without the NR step. As most
MD programs, gromacs spends almost all its time in the force calculations,
where the majority of time is spent calculating 1/sqrt(...). So perhaps one
should watch out for compilers that get suspiciously high scores on that
benchmark. :)

No, I'm not suggesting gcc should do this.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread rguenth at gcc dot gnu dot org


--- Comment #14 from rguenth at gcc dot gnu dot org  2007-06-10 12:07 ---
The interesting difference between sqrtss, divss and rcpss, rsqrtss is that
the former have throughput of 1/16 while the latter are 1/1 (latencies compare
21 vs. 3).  This is on K10.  The optimization guide only mentions calculating
the reciprocal y = a/b via rcpss and the square root (!) via rsqrtss
(sqrt a = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)))

So the optimization would be mainly to improve instruction throughput, not
overall latency.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread rguenth at gcc dot gnu dot org


--- Comment #15 from rguenth at gcc dot gnu dot org  2007-06-10 12:09 ---
And of course optimizing division or square root this way violates IEEE 754,
which specifies these as intrinsic operations.  So a separate flag from
-funsafe-math-optimizations should be used for this optimization.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug target/32264] gcc 4.2.0 compiled vanilla kernel 2.4.34.5 crashes when VIA C3 optimized -march=c3

2007-06-10 Thread axel at freakout dot de


--- Comment #6 from axel at freakout dot de  2007-06-10 13:00 ---
Subject: Re:  gcc 4.2.0 compiled vanilla kernel
 2.4.34.5 crashes when VIA C3 optimized -march=c3

please see:

 http://www.bnhof.de/~ho1158/gcc-4.2.0-Bug-target-32264.tar.bz2

for the kernel files mentioned above. The archive is too large to attach.

Axel


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32264



[Bug bootstrap/32272] New: make exit because build/genmodes.exe doesn't exist

2007-06-10 Thread jdeifik at weasel dot com
I started bash and ran

../gcc/configure --enable-threads

I then typed make

Here is the output:

TARGET_CPU_DEFAULT="" \
HEADERS="auto-host.h ansidecl.h config/i386/xm-cygwin.h" DEFINES="" \
/bin/sh ../gcc/mkconfig.sh config.h
TARGET_CPU_DEFAULT="" \
HEADERS="options.h config/i386/i386.h config/i386/unix.h
config/i386/bsd
.h config/i386/gas.h config/dbxcoff.h config/i386/cygming.h
config/i386/cygwin.h
 defaults.h" DEFINES="" \
/bin/sh ../gcc/mkconfig.sh tm.h
gawk -f ../gcc/opt-gather.awk ../gcc/ada/lang.opt ../gcc/fortran/lang.opt
../gcc
/java/lang.opt ../gcc/treelang/lang.opt ../gcc/c.opt ../gcc/common.opt
../gcc/co
nfig/i386/i386.opt ../gcc/config/i386/cygming.opt > tmp-optionlist
/bin/sh ../gcc/../move-if-change tmp-optionlist optionlist
echo timestamp > s-options
gawk -f ../gcc/opt-functions.awk -f ../gcc/opth-gen.awk \
   < optionlist > tmp-options.h
/bin/sh ../gcc/../move-if-change tmp-options.h options.h
echo timestamp > s-options-h
TARGET_CPU_DEFAULT="" \
HEADERS="auto-host.h ansidecl.h config/i386/xm-cygwin.h" DEFINES="" \
/bin/sh ../gcc/mkconfig.sh bconfig.h
gcc -c   -g  -DIN_GCC   -W -Wall -Wwrite-strings -Wstrict-prototypes
-Wmissing-p
rototypes -Wold-style-definition -Wmissing-format-attribute-DHAVE_CONFIG_H
-
DGENERATOR_FILE -I. -Ibuild -I../gcc -I../gcc/build -I../gcc/../include
-I../gcc
/../libcpp/include  -I../gcc/../libdecnumber -I../libdecnumber-o
build/error
s.o ../gcc/errors.c
build/genmodes.exe -h > tmp-modes.h
/bin/sh: build/genmodes.exe: No such file or directory
make: *** [s-modes-h] Error 127


-- 
   Summary: make exit because build/genmodes.exe doesn't exist
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jdeifik at weasel dot com
 GCC build triplet: i686-pc-cygwi
  GCC host triplet: i686-pc-cygwi
GCC target triplet: i686-pc-cygwi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32272



[Bug c/32273] New: 'restrict' is forgotten after loop unrolling

2007-06-10 Thread tomash dot brechko at gmail dot com
The following two functions are equivalent (especially after loop unrolling):

void
foo(const int *restrict a, int *restrict b, int *restrict c)
{
  b[0] += a[0];
  c[0] += a[0];

  b[1] += a[1];
  c[1] += a[1];
}

void
bar(const int *restrict a, int *restrict b, int *restrict c)
{
  for (int i = 0; i < 2; ++i)
    {
      b[i] += a[i];
      c[i] += a[i];
    }
}


However gcc forgets about 'restrict' after the first iteration of the loop, and
foo() and bar() produce different code:

foo:
pushl   %ebx
movl    8(%esp), %ebx
movl    12(%esp), %eax
movl    16(%esp), %edx
movl    (%ebx), %ecx
addl    %ecx, (%eax)
addl    %ecx, (%edx)      ;; Correct: no reloading of (%ebx) is needed.
movl    4(%ebx), %ecx
addl    %ecx, 4(%eax)
addl    %ecx, 4(%edx)     ;; Correct: no reloading of 4(%ebx) is needed.
popl    %ebx
ret

bar:
pushl   %ebx
movl    8(%esp), %ebx
movl    12(%esp), %edx
movl    16(%esp), %ecx
movl    (%ebx), %eax
addl    %eax, (%edx)
addl    %eax, (%ecx)      ;; Correct: no reloading of (%ebx) is needed.
movl    4(%ebx), %eax
addl    %eax, 4(%edx)
movl    4(%ebx), %eax     ;; BUG: unnecessary reloading of 4(%ebx).
addl    %eax, 4(%ecx)
popl    %ebx
ret

For any number of iterations only the first iteration honors the 'restrict'
qualifier.  This is wrong, because 'restrict' is a property of a pointer, not
of the data, so if pointers p and q reference different objects, then
(p + OFF1) and (q + OFF2) are also expected to reference different objects.
The correct assembler for foo() supports that.
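To confirm that the two functions really are equivalent in behaviour, so that any difference in the generated code is purely a missed optimization, a small check can be written. This sketch is C++ rather than the report's C: `__restrict__` is the GNU spelling accepted by g++ and clang++ (an assumption about the toolchain), and `foo_and_bar_agree` is a hypothetical helper.

```cpp
// `__restrict__` stands in for C99 `restrict`, which is not a C++ keyword.
void foo(const int *__restrict__ a, int *__restrict__ b, int *__restrict__ c)
{
  b[0] += a[0];
  c[0] += a[0];
  b[1] += a[1];
  c[1] += a[1];
}

void bar(const int *__restrict__ a, int *__restrict__ b, int *__restrict__ c)
{
  for (int i = 0; i < 2; ++i)
    {
      b[i] += a[i];
      c[i] += a[i];
    }
}

// Hypothetical helper: runs both versions on identical, non-overlapping
// inputs and reports whether they leave b and c in the same state.
bool foo_and_bar_agree()
{
  const int a[2] = {1, 2};
  int b1[2] = {10, 20}, c1[2] = {100, 200};
  int b2[2] = {10, 20}, c2[2] = {100, 200};
  foo(a, b1, c1);
  bar(a, b2, c2);
  return b1[0] == b2[0] && b1[1] == b2[1]
      && c1[0] == c2[0] && c1[1] == c2[1];
}
```

Comparing the assembly of `foo` and `bar` (for example with `-O2 -S`) is then a like-for-like comparison.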


-- 
   Summary: 'restrict' is forgotten after loop unrolling
   Product: gcc
   Version: 4.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tomash dot brechko at gmail dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32273



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread ubizjak at gmail dot com


--- Comment #16 from ubizjak at gmail dot com  2007-06-10 16:24 ---
(In reply to comment #13)

> > x1 = 0.5 X0 (3.0 - A x0 x0 x0)

Whoops! One x0 too many above. The correct calculation reads:

rsqrt = 0.5 rsqrt(a) (3.0 - a rsqrt(a) rsqrt(a)).
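Written out as portable code, the corrected step looks like the sketch below. The function names are illustrative, and the initial estimate is passed in explicitly where the real expansion would take it from rsqrtss.

```cpp
// One Newton-Raphson step for 1/sqrt(a), starting from estimate x0:
//   x1 = 0.5 * x0 * (3.0 - a * x0 * x0)
inline float refine_rsqrt(float a, float x0)
{
  return 0.5f * x0 * (3.0f - a * x0 * x0);
}

// sqrt(a) recovered from the refined reciprocal square root,
// using sqrt(a) = a * (1 / sqrt(a)).
inline float sqrt_via_rsqrt(float a, float x0)
{
  return a * refine_rsqrt(a, x0);
}
```

Starting from x0 = 0.49 as an estimate of 1/sqrt(4) = 0.5, one step gives about 0.4997, and multiplying by a gives about 1.9988 for sqrt(4).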

> Well, I suppose it depends on the hardware. IIRC older cpu:s did division with
> microcode whereas at least core2 and K8 do it in hardware, so I guess the
> hundreds of cycles doesn't apply to current cpu:s. 
> 
> Also, supposedly Penryn will have a much improved divider..

Well, mubench says for my Core2Duo that _all_ sqrt and div functions have
latency of 6 clocks and rcp throughput of 5 clks. By _all_ I mean divss, divps,
divsd, divpd, sqrtss, sqrtps, sqrtsd and sqrtpd. OTOH, rsqrtss and rcpss have
latency of 3 clks and rcp throughput of 2 clks. This is just amazing.

> That being said, I think there is still a case for the reciprocal square root,
> as evidenced by the benchmarks in #5 and #7 as well as my analysis of gas_dyn
> linked to in the first message in this PR (in short, ifort does sqrt(a/b) 
> about
> twice as fast as gfortran by using reciprocal approximations + NR). If indeed
> div(p|s)s is about equally fast as rcp(p|s)s as your benchmarks show, then it
> suggests almost all the performance benefit ifort gets is due to the
> rsqrt(p|s)s, no? Or perhaps there is some issue with pipelining? In gas_dyn 
> the
> sqrt(a/b) loop fills an array, whereas your benchmark accumulates..

It is true that only a trivial accumulation function is benchmarked by my
"benchmark". I can prepare a bunch of expanders to expand:

a / b <=> a [rcpss(b) (2.0 - b rcpss(b))]

a / sqrtss(b) <=> a [0.5 rsqrtss(b) (3.0 - b rsqrtss(b) rsqrtss(b))].

sqrtss (a) <=> a 0.5 rsqrtss(a) (3.0 - a rsqrtss(a) rsqrtss(a))

second and third case indeed look similar...

> I hear that it's possible to pass spec2k6/gromacs without the NR step. As most
> MD programs, gromacs spends almost all its time in the force calculations,
> where the majority of time is spent calculating 1/sqrt(...). So perhaps one
> should watch out for compilers that get suspiciously high scores on that
> benchmark. :)

Yes, look at hpcwire article in Comment #12

> No, I'm not suggesting gcc should do this.

;))


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug target/32274] New: FAIL: gcc.dg/vect/pr32224.c

2007-06-10 Thread hjl at lucon dot org
I got

Executing on host: /export/build/gnu/gcc/build-ia64-linux/gcc/xgcc
-B/export/build/gnu/gcc/build-ia64-linux/gcc/
/net/gnu-13/export/gnu/src/gcc/gcc/gcc/testsuite/gcc.dg/vect/pr31343.c   -O2
-ftree-vectorize -fdump-tree-vect-details -fno-show-column -S  -o pr31343.s   
(timeout = 300)
PASS: gcc.dg/vect/pr31343.c (test for excess errors)
UNSUPPORTED: gcc.dg/vect/pr31699.c
UNSUPPORTED: gcc.dg/vect/pr32216.c
Executing on host: /export/build/gnu/gcc/build-ia64-linux/gcc/xgcc
-B/export/build/gnu/gcc/build-ia64-linux/gcc/
/net/gnu-13/export/gnu/src/gcc/gcc/gcc/testsuite/gcc.dg/vect/pr32224.c   -O2
-ftree-vectorize -fdump-tree-vect-details -fno-show-column -S  -o pr32224.s   
(timeout = 300)
/net/gnu-13/export/gnu/src/gcc/gcc/gcc/testsuite/gcc.dg/vect/pr32224.c: In
function 'gmpz_export':
/net/gnu-13/export/gnu/src/gcc/gcc/gcc/testsuite/gcc.dg/vect/pr32224.c:13:
error: invalid 'asm': ia64_print_operand: unknown code
compiler exited with status 1


-- 
   Summary: FAIL: gcc.dg/vect/pr32224.c
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: hjl at lucon dot org
GCC target triplet: ia64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32274



[Bug target/24894] ICE building newlib/libc/misc/init.c

2007-06-10 Thread eweddington at cso dot atmel dot com


--- Comment #4 from eweddington at cso dot atmel dot com  2007-06-10 16:43 ---
This looks like a duplicate of bug #31786. Closing this bug as #31786 has more
analysis in the comments and is confirmed.

*** This bug has been marked as a duplicate of 31786 ***


-- 

eweddington at cso dot atmel dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||DUPLICATE


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24894



[Bug target/31786] [4.1/4.2/4.3 Regression][avr] error: unable to find a register to spill in class 'BASE_POINTER_REGS'

2007-06-10 Thread eweddington at cso dot atmel dot com


--- Comment #11 from eweddington at cso dot atmel dot com  2007-06-10 16:43 ---
*** Bug 24894 has been marked as a duplicate of this bug. ***


-- 

eweddington at cso dot atmel dot com changed:

   What|Removed |Added

 CC||joel at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31786



[Bug target/32275] New: [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution

2007-06-10 Thread hjl at lucon dot org
I got

 varargs0: n[1] = 0 expected 1
 varargs0: n[2] = 1 expected 2
FAIL: gcc.c-torture/execute/va-arg-24.c execution,  -O3 -fomit-frame-pointer
-funroll-loops

 varargs0: n[1] = 0 expected 1
 varargs0: n[2] = 1 expected 2
FAIL: gcc.c-torture/execute/va-arg-24.c execution,  -O3 -fomit-frame-pointer
-funroll-all-loops -finline-functions


-- 
   Summary: [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-
24.c execution
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: hjl at lucon dot org
GCC target triplet: ia64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread ubizjak at gmail dot com


--- Comment #17 from ubizjak at gmail dot com  2007-06-10 16:49 ---
(In reply to comment #0)

>   /* Mathematically equivalent to 1/sqrt(b*(1/a))  */
>   return sqrtf(a/b);

Whoa, this one is a little gem, but ATM in the opposite direction. At least for
-ffast-math we could optimize (a / sqrt (b/c)) into a * sqrt (c/b), thus
losing one division. I'm sure that richi knows by heart how to write this
kind of folding ;)
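The proposed folding rests on the identity a / sqrt(b/c) = a * sqrt(c/b) for positive b and c. A minimal sketch of the two forms, with illustrative function names, is:

```cpp
#include <cmath>

// Original form: two divisions (b / c, then a / sqrt(...)).
inline float div_by_sqrt_of_ratio(float a, float b, float c)
{
  return a / std::sqrt(b / c);
}

// Folded form: one division (c / b) and one multiplication.
inline float mul_by_sqrt_of_inverse_ratio(float a, float b, float c)
{
  return a * std::sqrt(c / b);
}
```

For a = 6, b = 9, c = 4 both forms give 4 to within rounding. The results are not guaranteed to be bit-identical in floating point, which is why such a rewrite would only be acceptable under something like -ffast-math.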


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug target/32276] New: [4.3 Regression] New libmudflap failures

2007-06-10 Thread hjl at lucon dot org
FAIL: libmudflap.c++/pass41-frag.cxx execution test
FAIL: libmudflap.c++/pass41-frag.cxx (-O2) execution test
FAIL: libmudflap.c++/pass41-frag.cxx (-O3) execution test
FAIL: libmudflap.c++/pass41-frag.cxx ( -O) execution test
FAIL: libmudflap.c++/pass41-frag.cxx (-static) execution test


-- 
   Summary: [4.3 Regression]  New libmudflap failures
   Product: gcc
   Version: 4.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: hjl at lucon dot org
GCC target triplet: ia64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32276



[Bug target/31786] [4.1/4.2/4.3 Regression][avr] error: unable to find a register to spill in class 'BASE_POINTER_REGS'

2007-06-10 Thread eweddington at cso dot atmel dot com


--- Comment #12 from eweddington at cso dot atmel dot com  2007-06-10 16:50 ---
According to a comment in duplicate bug #24894, bug #19636 may be related.

Ralf, can you try the test case using a 4.3 snapshot?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31786



[Bug target/32277] New: [4.3 Regression] g++ failures

2007-06-10 Thread hjl at lucon dot org
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct
call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct
call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct
call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct
call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct
call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct
call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct
call.* AA transformation on insn


-- 
   Summary: [4.3 Regression]  g++ failures
   Product: gcc
   Version: 4.3.0
    Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: hjl at lucon dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32277



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread ubizjak at gmail dot com


--- Comment #18 from ubizjak at gmail dot com  2007-06-10 17:34 ---
(In reply to comment #14)
> The interesting difference between sqrtss, divss and rcpss, rsqrtss is that
> the former have throughput of 1/16 while the latter are 1/1 (latencies compare
> 21 vs. 3).  This is on K10.  The optimization guide only mentions calculating
> the reciprocal y = a/b via rcpss and the square root (!) via rsqrtss
> (sqrt a = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)))
> 
> So the optimization would be mainly to improve instruction throughput, not
> overall latency.

If this is the case, then middle-end will need to fold sqrtss in a different way
for targets that prefer rsqrtss. According to Comment #16, it is better to fold
to 1.0/sqrt(c/b) instead of sqrt(b/c) because this way, we will save one
multiplication during NR expansion by rsqrt [due to sqrt(x) <=>  x * (1.0 /
sqrt(x))].

IMO we need a new tree code to handle reciprocal sqrt - RSQRT_EXPR, together
with proper folding functionality that expands directly to (NR-enhanced) rsqrt
optab. If we consider a*sqrt(b/c), then b/c will be expanded as b* NR-rcp(c)
[where NR-rcp stands for NR enhanced rcp] and sqrt will be expanded as
NR-rsqrt. In this case, I see no RTL pass that would be able to combine
everything together in order to swap (b/c) operands to produce NR-enhanced
a*rsqrt(c/b) equivalent.
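
As an illustration only, the Newton-Raphson (NR) refinement discussed above can
be sketched in portable C++. Assumptions: approx_rsqrt is a made-up stand-in for
the low-precision rsqrtss estimate (it simply truncates a full-precision result
to about 12 bits), and the names nr_rsqrt and nr_sqrt are hypothetical. None of
this is GCC code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a hardware estimate such as rsqrtss: compute
// 1/sqrt(x) and discard the low mantissa bits so that only about 12 bits
// of precision remain.
float approx_rsqrt(float x) {
    assert(x > 0.0f);               // the sketch only covers positive inputs
    float y = 1.0f / std::sqrt(x);
    std::uint32_t bits;
    std::memcpy(&bits, &y, sizeof bits);
    bits &= 0xFFFFF800u;            // clear the low 11 mantissa bits
    std::memcpy(&y, &bits, sizeof y);
    return y;
}

// One NR step for y ~ 1/sqrt(x): y' = 0.5 * y * (3 - x*y*y).
float nr_rsqrt(float x) {
    float y = approx_rsqrt(x);
    return 0.5f * y * (3.0f - x * y * y);
}

// sqrt(x) = x * (1/sqrt(x)): the same step with one extra multiplication
// by x, which is the multiplication the reciprocal form avoids.
float nr_sqrt(float x) {
    float y = approx_rsqrt(x);
    return 0.5f * x * y * (3.0f - x * y * y);
}
```

With this stand-in, a single NR step should bring both results close to full
single precision, and nr_sqrt costs exactly one multiplication more than
nr_rsqrt.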


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug fortran/32257] Scoping problem in implied do loop in I/O statement

2007-06-10 Thread tkoenig at gcc dot gnu dot org


--- Comment #1 from tkoenig at gcc dot gnu dot org  2007-06-10 18:09 ---
Two points:

- The scoping is correct (i is indeed the same variable)

- i becomes undefined on exit of the implied do loop, so
  the code is illegal.

http://groups.google.de/group/comp.lang.fortran/browse_thread/thread/a991e9f53d97f0ce/ca1b856d01bdbcf2?lnk=st&q=scoping+for+implied+do+loops&rnum=2#

Resolving as invalid.


-- 

tkoenig at gcc dot gnu dot org changed:

           What    |Removed                     |Added

                 CC|                            |tkoenig at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32257



[Bug target/32275] [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution

2007-06-10 Thread hjl at lucon dot org


--- Comment #1 from hjl at lucon dot org  2007-06-10 19:18 ---
Revision 122814 is bad and revision 122792 is good.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275



[Bug libmudflap/19319] Mudflap produce many violations on simple, correct c++ program

2007-06-10 Thread awl03 at doc dot ic dot ac dot uk


--- Comment #27 from awl03 at doc dot ic dot ac dot uk  2007-06-10 19:32 
---
I have been writing my own bounds-checker based on Mudflap.  While doing so I
had to tackle this same problem.  My flatmate and I tracked it down to the fact
that, although function parameters and variables are registered if their
address is ever taken, the return value is not.  This is a problem in
return-by-value where the result is returned directly without an intermediate
variable.  For example:

class bob {
  public:
    int i;
    bob(int n) { i = n; }
};

bob f(int n)
{
  return bob(n);
}

int main()
{
  bob b = f(0);
}

Here bob is constructed directly in the return statement in f().  In GIMPLE
this looks like:

bob f(int) (n)
{
:
  __comp_ctor  (&<retval>, n);
  return <retval>;
}

Notice that <retval> has its address taken.  Inside the constructor
__comp_ctor() the object is created in the location given by <retval>.
<retval> has not been registered by f() as return values are not registered,
nor has it been registered by main() (where the object finally ends up) because
nothing there uses its address.
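
The effect can be observed from plain C++ with a small variant of the example.
This is a sketch under assumptions: bob is given a user-provided copy
constructor and a built_at member (both additions for the demonstration, not
part of the original testcase) so that the object must be returned in memory,
and the check relies on copy elision, which is guaranteed from C++17 on and
usually performed by default before that.

```cpp
#include <cassert>

// Variant of bob that records where its constructor ran.
struct bob {
    int i;
    const bob* built_at;                       // address seen by the constructor
    explicit bob(int n) : i(n), built_at(this) {}
    // A real copy would carry the old address along, exposing a temporary.
    bob(const bob& other) : i(other.i), built_at(other.built_at) {}
};

bob f(int n) {
    return bob(n);                             // built directly in the return slot
}

// True when the object was constructed in the caller's variable itself,
// i.e. no intermediate object existed inside f().
bool constructed_in_place() {
    bob b = f(0);
    assert(b.i == 0);
    return b.built_at == &b;
}
```

When constructed_in_place() returns true, the constructor called from f() wrote
straight into storage owned by the caller, which is the situation described
above.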

This happens a lot in the STL, which is why it shows up whenever templates, map
etc. are used:

iterator begin()
{
  return iterator (this->_M_impl._M_start);
}

which is gimplified into:

iterator begin()
{
  comp_ctor (&<retval>, &this->_M_impl._M_start);
  return <retval>;
}

If Mudflap is changed to register these return values, the violations go away
:)  I have created a patch that does this but, as I'm a relative newbie, it
could all be complete rubbish in which case I apologise.

This deals with the problem for the initial testcase, the simplified test by
Frank Ch. Eigler and the test by Paul Pluzhnikov.  It does not fix the others
as these are caused by a different problem, namely objects created by external
library calls are not registered by Mudflap and so it thinks there is a
violation if you use one of these foreign pointers.

I hope this helps and I would be very glad of feedback.

Alex Lamaison


-- 

awl03 at doc dot ic dot ac dot uk changed:

           What    |Removed                     |Added

                 CC|                            |awl03 at doc dot ic dot ac
                   |                            |dot uk


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19319



[Bug libmudflap/19319] Mudflap produce many violations on simple, correct c++ program

2007-06-10 Thread awl03 at doc dot ic dot ac dot uk


--- Comment #28 from awl03 at doc dot ic dot ac dot uk  2007-06-10 19:35 
---
Created an attachment (id=13673)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13673&action=view)
Patch for tree-mudflap.c

This is the patch mentioned in my explanation.  It is against the 4.1.1 release
source.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19319



[Bug middle-end/32273] 'restrict' is forgotten after loop unrolling

2007-06-10 Thread rguenth at gcc dot gnu dot org


--- Comment #1 from rguenth at gcc dot gnu dot org  2007-06-10 20:07 ---
Danny, as you looked at restrict handling a few days ago - maybe you know
instantly why it doesn't work ;)  (apart from us not recomputing aliasing after
loop optimizations on the tree level -- and the complete unrolling happens there)


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added

                 CC|                            |dberlin at gcc dot gnu dot
                   |                            |org, rguenth at gcc dot gnu
                   |                            |dot org
           Keywords|                            |alias


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32273



[Bug target/32275] [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution

2007-06-10 Thread hjl at lucon dot org


--- Comment #2 from hjl at lucon dot org  2007-06-10 20:12 ---
(In reply to comment #1)
> Revision 122814 is bad and revision 122792 is good.
> 

Correction. Revision 122780 is bad and revision 122738 is good.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275



[Bug target/32275] [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution

2007-06-10 Thread hjl at lucon dot org


--- Comment #3 from hjl at lucon dot org  2007-06-10 20:24 ---
Revision 122748 is good.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275



[Bug target/32275] [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution

2007-06-10 Thread hjl at lucon dot org


--- Comment #4 from hjl at lucon dot org  2007-06-10 20:42 ---
Revision 122761 is bad.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275



[Bug target/32275] [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution

2007-06-10 Thread hjl at lucon dot org


--- Comment #5 from hjl at lucon dot org  2007-06-10 20:58 ---
I have verified that this patch:

http://gcc.gnu.org/ml/gcc-patches/2007-03/msg00545.html

causes this regression.


-- 

hjl at lucon dot org changed:

           What    |Removed                     |Added

                 CC|                            |aoliva at gcc dot gnu dot
                   |                            |org
OtherBugsDependingO|                            |30643
              nThis|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread rguenther at suse dot de


--- Comment #19 from rguenther at suse dot de  2007-06-10 21:39 ---
Subject: Re:  Use reciprocal and reciprocal square root
 with -ffast-math

On Sun, 10 Jun 2007, ubizjak at gmail dot com wrote:

> 
> 
> --- Comment #18 from ubizjak at gmail dot com  2007-06-10 17:34 ---
> (In reply to comment #14)
> > The interesting difference between sqrtss, divss and rcpss, rsqrtss is that
> > the former have throughput of 1/16 while the latter are 1/1 (latencies compare
> > 21 vs. 3).  This is on K10.  The optimization guide only mentions calculating
> > the reciprocal y = a/b via rcpss and the square root (!) via rsqrtss
> > (sqrt a = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)))
> > 
> > So the optimization would be mainly to improve instruction throughput, not
> > overall latency.
> 
> If this is the case, then middle-end will need to fold sqrtss in a different way
> for targets that prefer rsqrtss. According to Comment #16, it is better to fold
> to 1.0/sqrt(c/b) instead of sqrt(b/c) because this way, we will save one
> multiplication during NR expansion by rsqrt [due to sqrt(x) <=>  x * (1.0 /
> sqrt(x))].
> 
> IMO we need a new tree code to handle reciprocal sqrt - RSQRT_EXPR, together
> with proper folding functionality that expands directly to (NR-enhanced) rsqrt
> optab. If we consider a*sqrt(b/c), then b/c will be expanded as b* NR-rcp(c)
> [where NR-rcp stands for NR enhanced rcp] and sqrt will be expanded as
> NR-rsqrt. In this case, I see no RTL pass that would be able to combine
> everything together in order to swap (b/c) operands to produce NR-enhanced
> a*rsqrt(c/b) equivalent.

We just need a new builtin function, __builtin_rsqrt, and at some stage
replace reciprocals of sqrt with the new builtin, for example in
tree-ssa-math-opts.c, which does the existing reciprocal transforms.
A target hook could be provided that would, for example, look like

   tree target_fn_for_expr (tree expr);

and return a target builtin decl for the given expression.

And we should start splitting this PR ;)  One for a/sqrt(b/c) and one
for the above transformation.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug middle-end/32279] New: Fold 1.0/sqrt(x/y) to sqrt(y/x)

2007-06-10 Thread rguenth at gcc dot gnu dot org
This may even work for -funsafe-math-optimizations only (we round differently).
One has to enumerate all interesting cases (mainly x == 0) and see if NaN/Inf
are properly preserved in all cases.
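
A rough way to enumerate such cases is to evaluate both forms side by side.
The sketch below is only an illustration with made-up function names; it
assumes IEEE 754 arithmetic and a build without -ffast-math, and it says
nothing about what the eventual fold in GCC checks.

```cpp
#include <cmath>

// The expression as written and the proposed folded form.
double recip_sqrt_form(double x, double y) { return 1.0 / std::sqrt(x / y); }
double folded_form(double x, double y)     { return std::sqrt(y / x); }

// True when both forms give the same value, treating NaN as equal to NaN.
bool forms_agree(double x, double y) {
    double a = recip_sqrt_form(x, y);
    double b = folded_form(x, y);
    if (std::isnan(a) || std::isnan(b))
        return std::isnan(a) && std::isnan(b);
    return a == b;
}
```

Under those assumptions x = +0.0 gives +Inf in both forms, while x = -0.0
with y = 1.0 gives -Inf for the first form and NaN for the second, so the sign
of zero is one of the cases worth looking at.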


-- 
   Summary: Fold 1.0/sqrt(x/y) to sqrt(y/x)
   Product: gcc
   Version: 4.3.0
    Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32279



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread rguenth at gcc dot gnu dot org


--- Comment #20 from rguenth at gcc dot gnu dot org  2007-06-10 21:46 
---
PR32279 for 1/sqrt(x/y) to sqrt(y/x)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread rguenth at gcc dot gnu dot org


--- Comment #21 from rguenth at gcc dot gnu dot org  2007-06-10 21:48 
---
The other issue is really about this bug, so not splitting.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug middle-end/32273] 'restrict' is forgotten after loop unrolling

2007-06-10 Thread dberlin at gcc dot gnu dot org


--- Comment #2 from dberlin at gcc dot gnu dot org  2007-06-10 22:41 ---
Complete guess:

alias.c relies not on TYPE_RESTRICT, but on DECL_BASED_ON_RESTRICT_P
I never noticed we even had such a thing :)

My guess is that loop unrolling makes new ssa names, and when they get
transformed during un-ssa, this flag no longer exists on them.

Realistically, may-alias should propagate the DECL_* stuff to
SSA_NAME_PTR_INFO, which loop unrolling copies.

When they get un-ssa'd, we should then copy the restrict info from the ssa name
back to the base variable we create.
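
For context, this is the kind of promise that gets lost when the restrict
information is dropped. The function below is a hypothetical illustration, not
the testcase from this PR (which is not shown in this thread); __restrict__ is
the GCC/Clang spelling of the qualifier in C++.

```cpp
#include <cstddef>

// The qualifiers promise that dst and src never overlap, so the compiler may
// keep *src in a register across the stores to dst instead of reloading it.
void add_scalar(int* __restrict__ dst, const int* __restrict__ src,
                std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] += *src;   // without restrict, *src may have to be reloaded
}
```

If the flag does not survive on the names created by unrolling, the alias
analysis has to assume again that a store to dst may change *src.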


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32273



[Bug fortran/32235] [4.3 Regression] incorrectly position text file after backspace

2007-06-10 Thread jvdelisle at gcc dot gnu dot org


--- Comment #8 from jvdelisle at gcc dot gnu dot org  2007-06-10 22:50 
---
Subject: Bug 32235

Author: jvdelisle
Date: Sun Jun 10 22:50:47 2007
New Revision: 125606

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=125606
Log:
2007-06-10  Jerry DeLisle  <[EMAIL PROTECTED]>

PR libgfortran/32235
* io/transfer.c (st_read): Remove test for end of file condition.
(next_record_r): Add test for end of file condition.


Modified:
trunk/libgfortran/ChangeLog
trunk/libgfortran/io/transfer.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32235



[Bug c++/32089] Winline reports bogus warning

2007-06-10 Thread mckelvey at maskull dot com


--- Comment #4 from mckelvey at maskull dot com  2007-06-10 22:52 ---
Created an attachment (id=13674)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13674&action=view)
Preprocessed source


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32089



[Bug middle-end/32273] 'restrict' is forgotten after loop unrolling

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #3 from pinskia at gcc dot gnu dot org  2007-06-10 22:55 ---
This works on the pointer_plus branch :)  Also, predictive commoning fixes it up
even without unrolling at the tree level, so it works at -O3 (this is on the
pointer_plus branch; I have not tried it on the mainline).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32273



[Bug middle-end/32273] 'restrict' is forgotten after loop unrolling

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #4 from pinskia at gcc dot gnu dot org  2007-06-11 00:21 ---
Yes, this is fixed on the pointer_plus branch; the pointer_plus branch is better
at keeping track of which decl is the restrict pointer's base.

-;; *D.1537 = *D.1539 + *D.1537
+;; *D.1538 = *D.1541 + *D.1538
 (insn 14 13 15 t.c:16 (set (reg:SI 66)
-(mem:SI (reg:SI 59 [ D.1539 ]) [8 S4 A32])) -1 (nil)
+(mem:SI (reg:SI 59 [ D.1541 ]) [2 S4 A32])) -1 (nil)
 (nil))

 (insn 15 14 0 t.c:16 (parallel [
-(set (mem:SI (reg:SI 60 [ D.1537 ]) [7 S4 A32])
-(plus:SI (mem:SI (reg:SI 60 [ D.1537 ]) [7 S4 A32])
+(set (mem:SI (reg:SI 60 [ D.1538 ]) [2 S4 A32])
+(plus:SI (mem:SI (reg:SI 60 [ D.1538 ]) [2 S4 A32])
 (reg:SI 66)))
 (clobber (reg:CC 17 flags))
 ]) -1 (nil)
-(expr_list:REG_EQUAL (plus:SI (mem:SI (reg:SI 60 [ D.1537 ]) [7 S4 A32])
-(mem:SI (reg:SI 59 [ D.1539 ]) [8 S4 A32]))
+(expr_list:REG_EQUAL (plus:SI (mem:SI (reg:SI 60 [ D.1538 ]) [2 S4 A32])
+(mem:SI (reg:SI 59 [ D.1541 ]) [2 S4 A32]))
 (nil)))


See how the - lines have different aliasing sets (7 and 8) than the + lines (2
for both accesses); the - lines have the correct aliasing sets.

So this is now mine.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added

         AssignedTo|unassigned at gcc dot gnu   |pinskia at gcc dot gnu dot
                   |dot org                     |org
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
  GCC build triplet|i686-pc-linux-gnu           |
   GCC host triplet|i686-pc-linux-gnu           |
 GCC target triplet|i686-pc-linux-gnu           |
   Last reconfirmed|-00-00 00:00:00             |2007-06-11 00:21:57
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32273



[Bug tree-optimization/29751] not optimizing access a[0] , a[1]

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #2 from pinskia at gcc dot gnu dot org  2007-06-11 00:30 ---
Confirmed, this is only a tree level missed optimization.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added

             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |alias, TREE
   Last reconfirmed|-00-00 00:00:00             |2007-06-11 00:30:03
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29751



[Bug middle-end/14192] Restrict pointers don't help

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #10 from pinskia at gcc dot gnu dot org  2007-06-11 00:34 
---
> The second case is the following loop:

This is just caused by how we represent pointer addition.  I have a fix for
that one; we now get the correct aliasing sets for it.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added

           Keywords|                            |alias


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14192



[Bug tree-optimization/16913] [4.0/4.1/4.2/4.3 Regression] restrict does not make a difference

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #10 from pinskia at gcc dot gnu dot org  2007-06-11 00:47 
---
There are a couple of issues here. First, pointer_plus improves the aliasing set
issue, but then PRE comes around and messes it up because it does not add
pointer types which have DECL_BASED_ON_RESTRICT_P/DECL_GET_RESTRICT_BASE set up
correctly.

Disabling PRE on powerpc-linux-gnu (on the pointer_plus branch) is enough to
get the RTL optimizers to optimize away the extra loads and we get for the
inner loop:
.L3:
        stfsx 0,9,3
        addi 9,9,4
        bdnz .L3
Which is almost the best you can do :).

One more issue (for x86) is that expand emits code that causes the RTL
optimizers not to optimize well, as they only look into loads in sets.  I don't
know how to fix that issue without fixing restrict at the tree level.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16913



[Bug tree-optimization/14187] [tree-ssa] restricted pointers should not alias on the tree level

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #5 from pinskia at gcc dot gnu dot org  2007-06-11 00:48 ---
(In reply to comment #3)
> Interestingly the following code is optimized:
That is because we create a new may_alias variable for malloc to point to so we
know that it cannot alias anything.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14187



[Bug tree-optimization/20643] [4.0/4.1/4.2/4.3 Regression] Tree loop optimizer does worse job than RTL loop optimizer

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #17 from pinskia at gcc dot gnu dot org  2007-06-11 00:53 
---
The pointer_plus branch improves the code here (I can't tell if it fixes the
problem fully).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20643



[Bug rtl-optimization/32280] New: _mm_srli_si128, heinous code for some shifts

2007-06-10 Thread tbptbp at gmail dot com
I lack words to describe what happens on x86-64 to
<-->
#include <emmintrin.h>


__m128i foo(__m128i a) { return _mm_srli_si128(a, 8); }

int main() { return 0; }
<-->

# /usr/local/gcc-4.2-20060916/bin/gcc -O1 pr-psrldq.c -o pr-psrldq

0040042e <foo>:
  40042e:   66 0f 7f 44 24 d8       movdqa %xmm0,0xffd8(%rsp)
  400434:   48 8b 54 24 e0          mov    0xffe0(%rsp),%rdx
  400439:   48 89 d0                mov    %rdx,%rax
  40043c:   31 d2                   xor    %edx,%edx
  40043e:   48 89 44 24 e8          mov    %rax,0xffe8(%rsp)
  400443:   48 89 54 24 f0          mov    %rdx,0xfff0(%rsp)
  400448:   66 0f 6f 44 24 e8       movdqa 0xffe8(%rsp),%xmm0
  40044e:   c3                      retq

gcc-4.3-20070105 is still that creative.

As far as i know, it's specific to x86-64, but i'm not sure whether other
shifting ops or specific values are also pathological.
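
For reference, the operation requested by _mm_srli_si128(a, 8) is a plain byte
shift of the whole 128-bit register, which corresponds to the SSE2 psrldq
instruction. A portable model of it (Bytes16 and shift_right_bytes are names
made up for this sketch, and byte 0 is taken to be the least significant byte)
might look like:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Model of a whole-register right shift by `count` bytes: byte i of the
// result is byte i+count of the input, and the vacated high bytes are zero.
Bytes16 shift_right_bytes(const Bytes16& in, std::size_t count) {
    Bytes16 out{};                                   // zero-filled
    for (std::size_t i = 0; i + count < in.size(); ++i)
        out[i] = in[i + count];
    return out;
}
```

Seen this way, a shift by 8 just moves the upper half into the lower half and
clears the upper half, so the round trip through the stack in the listing above
is not required by the operation itself.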


-- 
   Summary:  _mm_srli_si128, heinous code for some shifts
   Product: gcc
   Version: 4.3.0
    Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: tbptbp at gmail dot com
  GCC host triplet: x86-64, linux, gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32280



[Bug rtl-optimization/32280] _mm_srli_si128, heinous code for some shifts

2007-06-10 Thread tbptbp at gmail dot com


--- Comment #1 from tbptbp at gmail dot com  2007-06-11 03:02 ---
s/gcc-4.3-20070105/gcc-4.3-20070608/


-- 

tbptbp at gmail dot com changed:

           What    |Removed                     |Added

            Summary| _mm_srli_si128, heinous    |_mm_srli_si128, heinous code
                   |code for some shifts        |for some shifts


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32280



[Bug fortran/32235] [4.3 Regression] incorrectly position text file after backspace

2007-06-10 Thread jvdelisle at gcc dot gnu dot org


--- Comment #9 from jvdelisle at gcc dot gnu dot org  2007-06-11 03:06 
---
Subject: Bug 32235

Author: jvdelisle
Date: Mon Jun 11 03:06:01 2007
New Revision: 125611

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=125611
Log:
2007-06-10  Jerry DeLisle  <[EMAIL PROTECTED]>

PR libgfortran/32235
* gfortran.dg/backspace_9.f: New test.

Added:
trunk/gcc/testsuite/gfortran.dg/backspace_9.f
Modified:
trunk/gcc/testsuite/ChangeLog


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32235



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread tbptbp at gmail dot com


--- Comment #22 from tbptbp at gmail dot com  2007-06-11 03:32 ---
I'm a bit late to the debate but...

At some point icc did such transformations (for 1/x and sqrt) but, apparently,
they're now removed. It didn't bother to plug every hole (i.e. wrt infinities)
but at least got the case of 0 covered even when set loose; it's cheap to do.
I've repeatedly been pointed to the peculiar semantics of -ffast-math in the
past, so i know there's little chance for me to succeed, but would it be
possible to consider that as an option?

PS: Yes, i do rely on infinities and -ffast-math and deserve to die a slow and
painful way.


-- 

tbptbp at gmail dot com changed:

           What    |Removed                     |Added

                 CC|                            |tbptbp at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug c++/32281] New: A problem of gcc4.1.0(O3 optimize)

2007-06-10 Thread stillzhang at tencent dot com
When I use gcc4.1.0 to compile mysql4.1.22, I find some errors. I'm not sure
whether it's a gcc bug or not, so I need your help.

   The version of gcc:

   gcc -v

Using built-in specs.

Target: i586-suse-linux

Configured with: ../configure --enable-threads=posix --prefix=/usr
--with-local-prefix=/usr/local --infodir=/usr/share/info
--mandir=/usr/share/man --libdir=/usr/lib --libexecdir=/usr/lib
--enable-languages=c,c++,objc,fortran,java,ada --enable-checking=release
--with-gxx-include-dir=/usr/include/c++/4.1.0 --enable-ssp --disable-libssp
--enable-java-awt=gtk --enable-gtk-cairo --disable-libjava-multilib
--with-slibdir=/lib --with-system-zlib --enable-shared --enable-__cxa_atexit
--enable-libstdcxx-allocator=new --without-system-libunwind --with-cpu=generic
--host=i586-suse-linux

Thread model: posix

gcc version 4.1.0 (SUSE Linux)

Linux version:

Linux 2.6.16.21-0.8-TENCENT #1 SMP Sat Jan 13 19:17:08 CST 2007 i686 i686 i386
GNU/Linux

Mysql4.1.22 is from http://mysql.he.net/Downloads/MySQL-4.1/mysql-4.1.22.tar.gz
.



The error is in function mysql_stmt_execute(THD *thd, char *packet, uint
packet_length) (mysql-4.1.22/sql/sql_prepare.cc:1786).

The file is compiled with these arguments:

g++ -DMYSQL_SERVER -DDEFAULT_MYSQL_HOME="\"/data/home/c4b/still/bin/mysql/\""
-DDATADIR="\"/data/home/c4b/still/bin/mysql//var\""
-DSHAREDIR="\"/data/home/c4b/still/bin/mysql//share/mysql\"" -DHAVE_CONFIG_H
-I. -I. -I.. -I../innobase/include -I../include -I../include -I../regex -I.
-O3 -DDBUG_OFF -fno-implicit-templates -fno-exceptions -fno-rtti -MT
sql_prepare.o -MD -MP -MF ".deps/sql_prepare.Tpo" -g -c -o sql_prepare.o
sql_prepare.cc



In lines 1822-1824:

   1822 if (setup_conversion_functions(stmt, (uchar **) &packet, packet_end) ||
   1823     stmt->set_params(stmt, null_array, (uchar *) packet, packet_end,
   1824                      &expanded_query))

The function "setup_conversion_functions" is compiled as an inline function.
The last statement in setup_conversion_functions is *data= read_pos;



These three lines are compiled to:

0x08197bff <_Z18mysql_stmt_executeP3THDPcj+703>: mov    0xc(%ebp),%ecx
0x08197c02 <_Z18mysql_stmt_executeP3THDPcj+706>: mov    0xffc0(%ebp),%ebx
0x08197c05 <_Z18mysql_stmt_executeP3THDPcj+709>: mov    0xffb4(%ebp),%eax
0x08197c08 <_Z18mysql_stmt_executeP3THDPcj+712>: mov    %ecx,0x8(%esp)
0x08197c0c <_Z18mysql_stmt_executeP3THDPcj+716>: mov    0xffd0(%ebp),%edx
0x08197c0f <_Z18mysql_stmt_executeP3THDPcj+719>: mov    0xffb8(%ebp),%ecx
0x08197c12 <_Z18mysql_stmt_executeP3THDPcj+722>: mov    %ebx,0xc(%ebp)    //*data= read_pos
0x08197c15 <_Z18mysql_stmt_executeP3THDPcj+725>: lea    0xffdc(%ebp),%ebx
0x08197c18 <_Z18mysql_stmt_executeP3THDPcj+728>: mov    %ebx,0x10(%esp)
0x08197c1c <_Z18mysql_stmt_executeP3THDPcj+732>: mov    %eax,0xc(%esp)
0x08197c20 <_Z18mysql_stmt_executeP3THDPcj+736>: mov    %edx,0x4(%esp)
0x08197c24 <_Z18mysql_stmt_executeP3THDPcj+740>: mov    %ecx,(%esp)
0x08197c27 <_Z18mysql_stmt_executeP3THDPcj+743>: call   *0x764(%ecx)



0xc(%ebp) is the location of packet (that is, &packet in function
mysql_stmt_execute), which is also the location of *data (in function
setup_conversion_functions).

At +703 and +712, we can see that the value at 0xc(%ebp) is loaded and stored
on the stack as the third argument of function stmt->set_params.

The instruction at +722 implements *data= read_pos: it moves read_pos to *data
(address 0xc(%ebp)).

So the third argument of function stmt->set_params uses the old value, not the
new value.

Am I right? I look forward to your reply, and thank you very much.

Best wishes,

Still


-- 
   Summary: A problem of gcc4.1.0(O3 optimize)
   Product: gcc
   Version: 4.1.0
    Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: stillzhang at tencent dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32281



[Bug c++/32281] A problem of gcc4.1.0(O3 optimize)

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #1 from pinskia at gcc dot gnu dot org  2007-06-11 03:41 ---
So packet is char*, and you are accessing it as uchar*, so this code is
violating C/C++ aliasing rules.
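
A sketch of one way to write such a call without the cast, for illustration
only: skip_bytes and advance_packet are hypothetical helpers with the same
shape as the call in the report, not MySQL code. The idea is to go through a
local variable of the right pointer type and copy the result back.

```cpp
#include <cassert>

typedef unsigned char uchar;

// Hypothetical callee: consumes `count` bytes and writes the new read
// position back through `data`, like the *data= read_pos store in the report.
void skip_bytes(uchar** data, uchar* end, int count) {
    assert(*data + count <= end);
    *data += count;
}

// Instead of passing (uchar **) &packet, which stores a uchar* into an object
// declared as char*, use a local uchar* and convert the value afterwards.
char* advance_packet(char* packet, char* packet_end, int count) {
    uchar* pos = reinterpret_cast<uchar*>(packet);
    skip_bytes(&pos, reinterpret_cast<uchar*>(packet_end), count);
    return reinterpret_cast<char*>(pos);
}
```

The caller would then write packet = advance_packet(packet, packet_end, n), so
the update to packet is an ordinary assignment that the optimizer can see.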

*** This bug has been marked as a duplicate of 21920 ***


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added

             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE
            Summary|A problem of gcc4.1.0(O3    |A problem of gcc4.1.0(O3
                   |optimize)                   |optimize)


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32281



[Bug c/21920] aliasing violations

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #113 from pinskia at gcc dot gnu dot org  2007-06-11 03:41 
---
*** Bug 32281 has been marked as a duplicate of this bug. ***


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added

                 CC|                            |stillzhang at tencent dot
                   |                            |com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21920



[Bug rtl-optimization/29589] incorrect conversion of (ior (ashiftrt (plus ...))) in combine.c

2007-06-10 Thread pinskia at gcc dot gnu dot org


--- Comment #5 from pinskia at gcc dot gnu dot org  2007-06-11 04:44 ---
I have a fix from our local tree which also fixes up the regression which we
found with a different patch.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added

         AssignedTo|unassigned at gcc dot gnu   |pinskia at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29589



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread ubizjak at gmail dot com


--- Comment #23 from ubizjak at gmail dot com  2007-06-11 05:51 ---
(In reply to comment #22)

> At some point icc did such transformations (for 1/x and sqrt) but, apparently,
> they're now removed. It didn't bother to plug every hole (i.e. wrt infinities)
> but at least got the case of 0 covered even when set loose; it's cheap to do.
> I've repeatedly been pointed to the peculiar semantics of -ffast-math in the
> past, so i know there's little chance for me to succeed, but would it be
> possible to consider that as an option?

But both rcpss and rsqrtss handle infinities correctly (they return zero) and
return [-]inf when [-]0.0 is used as an argument.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math

2007-06-10 Thread tbptbp at gmail dot com


--- Comment #24 from tbptbp at gmail dot com  2007-06-11 05:58 ---
Yes, but there's some fuss at 0 when you pile up a NR round.
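
To make the point concrete, here is a small sketch of one NR step applied to an
infinite estimate. It assumes IEEE 754 arithmetic without -ffast-math; nr_step
and refined_rsqrt_of_zero are names made up for the illustration.

```cpp
#include <cmath>
#include <limits>

// One Newton-Raphson step refining an estimate y0 of 1/sqrt(x).
float nr_step(float x, float y0) {
    return 0.5f * y0 * (3.0f - x * y0 * y0);
}

// For x = 0 the raw estimate is +inf. In the refinement, x * y0 is then
// 0 * inf, which is NaN, and the NaN propagates through the rest of the step.
float refined_rsqrt_of_zero() {
    return nr_step(0.0f, std::numeric_limits<float>::infinity());
}
```

So the unrefined estimate gives +inf for 0, but the refined value comes out as
NaN unless the zero case is handled separately.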


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723



[Bug middle-end/32077] [Regression 4.3] Profile-use: ICE: Segmentation fault

2007-06-10 Thread burnus at gcc dot gnu dot org


--- Comment #2 from burnus at gcc dot gnu dot org  2007-06-11 06:04 ---
Seems to be fixed since 2007-06-07. -> Close PR.


-- 

burnus at gcc dot gnu dot org changed:

           What    |Removed                     |Added

             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |4.3.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32077



[Bug c++/32281] A problem of gcc4.1.0(O3 optimize)

2007-06-10 Thread stillzhang at tencent dot com


--- Comment #2 from stillzhang at tencent dot com  2007-06-11 06:07 ---
Thank you.

But if I compile it without -O3, it works fine.
If I compile it under gcc3.3 with -O3, it also works fine.

The same program behaves differently at different optimization levels, so I
think it should not be like this.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32281



[Bug middle-end/32279] Fold 1.0/sqrt(x/y) to sqrt(y/x)

2007-06-10 Thread ubizjak at gmail dot com


--- Comment #1 from ubizjak at gmail dot com  2007-06-11 06:36 ---
Patch at http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00655.html

Patch was also checked with 0.0, __builtin_inf and __builtin_nan, and the
results were the same as for unpatched gcc for all combinations that were
thrown in.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added

         AssignedTo|unassigned at gcc dot gnu   |ubizjak at gmail dot com
                   |dot org                     |
                URL|                            |http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00655.html
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
           Keywords|                            |patch
   Last reconfirmed|-00-00 00:00:00             |2007-06-11 06:36:21
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32279



[Bug target/32275] [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution

2007-06-10 Thread bonzini at gnu dot org


--- Comment #6 from bonzini at gnu dot org  2007-06-11 06:54 ---
Can you please show the difference in assembly code between the two?


-- 

bonzini at gnu dot org changed:

           What    |Removed                     |Added

                 CC|                            |bonzini at gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275