[Bug c++/32270] New: warning for potential undesired operator& and operator== evaluation order
Hello, I don't know whether this was already requested way back in 1985, and maybe there is an evangelical answer to this.

This is a request to add a compiler option for warnings if the evaluation order of operator& and operator== (and similar) may not be 'as expected'. I personally feel that in most cases it is utter nonsense to apply operator& to a boolean result, which is what the C/C++ operator precedence hierarchy prescribes; and every now and then I forget to add brackets around each and every doubtful operation in an expression, and then I'm puzzled for hours about what goes wrong.

Example:

    if( value&mask == $7F00 ) { }

does not evaluate as expected (which is, in my opinion, the only reasonable way), but evaluates as:

    if( value & (mask==$7F00) ) { }

The reasoning for adding this warning is comparable to the reasoning which led to adding a warning if the result of operator= is used as part of the boolean expression in "if( ){ }": it is _probably_ not what the programmer intended.

The way to circumvent the warning (if enabled) would be the same as for operator= too: add brackets around the expression. These warnings could eventually even be combined into a single compiler option, but they should probably go into the same compiler option sets.

Exact scope for this kind of warning should be:

    operator &, ^ and | versus operator == and !=

because it makes no sense to apply a bit masking operator to a boolean result, which is what happens if no brackets are used to reorder the sequence of evaluation.

And the same applies in my eyes to:

    operator << and >> versus operator +, -, *, / and %

because operator<< and >> involve an exponentiation 2**n, and the priority of exponentiation is (should be) higher than that of multiplication.

The C/C++ standard cannot be changed, though it handles this wrongly in my opinion, but adding a warning if the default evaluation order is applied to two operations from the above sets would be very much appreciated.

Thanks for an answer,
... kio !
--
           Summary: warning for potential undesired operator& and operator== evaluation order
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: bugzilla at little-bat dot de
 GCC build triplet: (GCC) 4.0.1 (Apple Computer, Inc. build 5367)
  GCC host triplet: powerpc-apple-darwin8-g++-4.0.1

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32270
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #11 from ubizjak at gmail dot com 2007-06-10 08:28 ---
I have experimented a bit with rcpss, trying to measure the effect of an additional NR step on performance. The NR step was calculated based on http://en.wikipedia.org/wiki/N-th_root_algorithm, and for N=-1 (1/A) we can simplify to:

x1 = x0 (2.0 - A x0)

To obtain 24bit precision, we have to use a reciprocal, two multiplies and a subtraction (+ a constant load).

First, please note that the "divss" instruction is quite _fast_, clocking at 23 cycles, whereas the approximation with an NR step would sum up to 20 cycles, not counting the load of the constant.

I have checked the performance of the following testcase with various implementations on x86_64 C2D:

--cut here--
#include <stdio.h>

float test(float a)
{
  return 1.0 / a;
}

int main()
{
  float a = 1.12345;
  volatile float t = 0.0;  /* start the accumulator at zero */
  int i;

  for (i = 1; i < 10; i++)
    {
      t += test (a);
      a += 1.0;
    }

  printf("%f\n", t);

  return 0;
}
--cut here--

divss     : 3.132s
rcpss NR  : 3.264s
rcpss only: 3.080s

To enhance the precision of 1/sqrt(A), an additional NR step is calculated as

x1 = 0.5 X0 (3.0 - A x0 x0 x0)

and considering that sqrtss also clocks at 23 clocks (_far_ from hundreds of clocks ;) ), an additional NR step just isn't worth it.

The experimental patch:

Index: i386.md
===
--- i386.md (revision 125599)
+++ i386.md (working copy)
@@ -15399,6 +15399,15 @@
 ;; Gcc is slightly more smart about handling normal two address instructions
 ;; so use special patterns for add and mull.
 
+(define_insn "*rcpsf2_sse"
+  [(set (match_operand:SF 0 "register_operand" "=x")
+        (unspec:SF [(match_operand:SF 1 "nonimmediate_operand" "xm")]
+                   UNSPEC_RCP))]
+  "TARGET_SSE"
+  "rcpss\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sse")
+   (set_attr "mode" "SF")])
+
 (define_insn "*fop_sf_comm_mixed"
   [(set (match_operand:SF 0 "register_operand" "=f,x")
         (match_operator:SF 3 "binary_fp_operator"
@@ -15448,6 +15457,29 @@
            (const_string "fop")))
    (set_attr "mode" "SF")])
 
+(define_insn_and_split "*rcp_sf_1_sse"
+  [(set (match_operand:SF 0 "register_operand" "=x")
+        (div:SF (match_operand:SF 1 "immediate_operand" "F")
+                (match_operand:SF 2 "nonimmediate_operand" "xm")))
+   (clobber (match_scratch:SF 3 "=&x"))
+   (clobber (match_scratch:SF 4 "=&x"))]
+  "TARGET_SSE_MATH
+   && operands[1] == CONST1_RTX (SFmode)
+   && flag_unsafe_math_optimizations"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 3)(match_dup 2))
+(set (match_dup 4)(match_dup 5))
+(set (match_dup 0)(unspec:SF [(match_dup 3)] UNSPEC_RCP))
+(set (match_dup 3)(mult:SF (match_dup 3)(match_dup 0)))
+(set (match_dup 4)(minus:SF (match_dup 4)(match_dup 3)))
+(set (match_dup 0)(mult:SF (match_dup 0)(match_dup 4)))]
+{
+  rtx two = const_double_from_real_value (dconst2, SFmode);
+
+  operands[5] = validize_mem (force_const_mem (SFmode, two));
+})
+
 (define_insn "*fop_sf_1_mixed"
   [(set (match_operand:SF 0 "register_operand" "=f,f,x")
         (match_operator:SF 3 "binary_fp_operator"

Based on these findings, I guess that the NR step is just not worth it. If we want to have a noticeable speed-up on division and square root, we have to use 12bit implementations, without any refinements - mainly for benchmarketing, I'm afraid.

BTW: on x86_64, patched gcc compiles the "test" function to:

test:
        movaps  %xmm0, %xmm1
        rcpss   %xmm0, %xmm0
        movss   .LC1(%rip), %xmm2
        mulss   %xmm0, %xmm1
        subss   %xmm1, %xmm2
        mulss   %xmm2, %xmm0
        ret

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug c++/32270] warning for potential undesired operator& and operator== evaluation order
--- Comment #1 from pinskia at gcc dot gnu dot org 2007-06-10 08:34 ---
The warning works on the trunk:

[pinskia-laptop:gcc/mips/gcc] pinskia% ./cc1plus t.c -W -Wall
 int f(int, int)
t.c:3: warning: suggest parentheses around comparison in operand of &

--
pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  GCC build triplet|(GCC) 4.0.1 (Apple Computer,|
                   |Inc. build 5367)            |
   GCC host triplet|powerpc-apple-darwin8-g++-  |
                   |4.0.1                       |
           Keywords|                            |diagnostic
            Summary|warning for potential       |warning for potential
                   |undesired operator& and     |undesired operator& and
                   |operator== evaluation order |operator== evaluation order
            Version|4.3.0                       |4.0.1

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32270
[Bug c++/32270] warning for potential undesired operator& and operator== evaluation order
--- Comment #2 from pinskia at gcc dot gnu dot org 2007-06-10 08:38 ---
Well that is because it was fixed on the trunk last December by:

2006-12-13  Ian Lance Taylor  <[EMAIL PROTECTED]>

        PR c++/19564
        PR c++/19756

This is a dup of bug 19564.

*** This bug has been marked as a duplicate of 19564 ***

--
pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32270
[Bug c++/19564] -Wparentheses does not work with the C++ front-end
--- Comment #10 from pinskia at gcc dot gnu dot org 2007-06-10 08:38 ---
*** Bug 32270 has been marked as a duplicate of this bug. ***

--
pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla at little-bat dot
                   |                            |de

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19564
[Bug libstdc++/31970] set<>::iterator vs type-safety
--- Comment #8 from chris at bubblescope dot net 2007-06-10 08:57 ---
Hmm.. I thought I did have a good example, I had a function that looked like:

template
int count_unique(It begin, It end)
{
  set counter(begin, end);
  return counter.size();
}

But, while you might get multiple copies of this function for each iterator type, the "work parts" (the building of the set and the call to size()) will be the same regardless of if this is fixed.

The only good example I can come up with would be if someone decided to build multiple maps of set::iterators, which I've never wanted to do...

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31970
[Bug preprocessor/32271] New: Incorrect warnings in disabled code.
The preprocessor will report warnings when there is an unterminated ' or " in a disabled section.

Example code that triggers two warnings:

Code begin
#if 0
This shouln"t cause a problem.
This shouln't cause a problem.
#endif

int main() { return 0; }
Code end

Output from the preprocessor:

$ cpp-4.2 -v -save-temps bug.cpp
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-targets=all --disable-werror --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.2.1 20070528 (prerelease) (Ubuntu 4.2-20070528-0ubuntu2)
 /usr/lib/gcc/i486-linux-gnu/4.2.1/cc1plus -E -quiet -v -D_GNU_SOURCE bug.cpp -mtune=generic -fpch-preprocess
ignoring nonexistent directory "/usr/local/include/i486-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/i486-linux-gnu/4.2.1/../../../../i486-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i486-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/c++/4.2
 /usr/include/c++/4.2/i486-linux-gnu
 /usr/include/c++/4.2/backward
 /usr/local/include
 /usr/lib/gcc/i486-linux-gnu/4.2.1/include
 /usr/include
End of search list.
# 1 "bug.cpp"
# 1 ""
# 1 ""
# 1 "bug.cpp"
bug.cpp:2:12: warning: missing terminating " character
bug.cpp:3:12: warning: missing terminating ' character

int main() { return 0; }

--
           Summary: Incorrect warnings in disabled code.
           Product: gcc
           Version: 4.2.1
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: preprocessor
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pcmoen at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32271
[Bug preprocessor/32271] Incorrect warnings in disabled code.
--- Comment #1 from pcmoen at gmail dot com 2007-06-10 09:25 --- Created an attachment (id=13672) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13672&action=view) Test case that shows the error. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32271
[Bug preprocessor/32271] Incorrect warnings in disabled code.
--- Comment #2 from pinskia at gcc dot gnu dot org 2007-06-10 09:34 ---
Actually the warning is correct as the code is undefined at compile time and this is documented:

# Do not use @code{#if 0} for comments which are not C code.  Use a real
# comment, instead.  The interior of @code{#if 0} must consist of complete
# tokens; in particular, single-quote characters must balance.

*** This bug has been marked as a duplicate of 14634 ***

--
pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32271
[Bug preprocessor/14634] Unterminated literals not diagnosed
--- Comment #13 from pinskia at gcc dot gnu dot org 2007-06-10 09:34 ---
*** Bug 32271 has been marked as a duplicate of this bug. ***

--
pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pcmoen at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14634
[Bug target/32264] gcc 4.2.0 compiled vanilla kernel 2.4.34.5 crashes when VIA C3 optimized -march=c3
--- Comment #5 from axel at freakout dot de 2007-06-10 10:05 ---
Subject: Re: gcc 4.2.0 compiled vanilla kernel 2.4.34.5 crashes when VIA C3 optimized -march=c3

According to rguenth at gcc dot gnu dot org:
>
> --- Comment #4 from rguenth at gcc dot gnu dot org 2007-06-09 10:27 ---
> We need this reduced to a manageable testcase that gcc miscompiles.
>

Sorry - but kernel debugging in the early boot stage goes far beyond my capabilities. I tried to gather as much information as I can.

The crash can be reproduced with just the kernel itself, no modules involved.

I've added an archive with two dirs, gcc-4.1.2 and gcc-4.2.0. Each dir contains the compiled kernel vmlinux and the boot image vmlinuz, which can be loaded by any bootloader (grub, lilo, syslinux, loadlin, ...). I also added the corresponding System.map files.

The kernels were produced from identical (the same) source trees with gcc 4.1.2 and gcc 4.2.0 on the same machine.

The gcc 4.1.2 compiled kernel boots until the panic (no root fs), i.e. it works ok.

The gcc 4.2.0 kernel crashes with this output:

==
Kernel command line: BOOT_IMAGE=vmlinuz4.434
Initializing CPU#0
Detected 797.420 MHz processor.
Console: colour VGA+ 80x25
Unable to handle kernel paging request at virtual address f000fec4
 printing eip:
c0295690
*pde =
Oops: 0002
CPU:0
EIP:0010:[] Not tainted
EFLAGS: 00010017
eax: f00fec4ebx: ecx: 0037 edx: 0010
esi: 000994c1 edi: c0105000 ebp: 0008e000 esp: c0251fe4
ds: 0018 es: 0020 ss: 0018
Process swapper (pid: 0, stackpage=c0251000)
Stack: 0020 c0252290 0010 0216 c0252630 c0295ae0 c0100191
Call Trace:
Code: 10 00 f3 a5 ea 19 00 00 90 bf f4 3f 8e d8 8e d0 3f a3 c1 8c
 <0>Kernel panic: Attempted to kill the idle task!
In idle task - not syncing
==

This output is also in the archive dir, in gcc-4.2.0/crash.txt.

The working kernel (produced by gcc-4.1.2) prints:

==
Calibrating delay loop... 1592.52 BogoMIPS
==

at the point where the gcc-4.2.0 produced kernel crashes with the above messages.

Hope this helps.

Axel

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32264
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #12 from ubizjak at gmail dot com 2007-06-10 10:47 ---
Here are the results of mubench insn timings for various x86 processors: http://mubench.sourceforge.net/results.html (the target processor can be benchmarked by downloading mubench from http://mubench.sourceforge.net/index.html).

And finally an interesting read on how commercial compilers trade accuracy for speed (please read at least the part about the SPEC2006 benchmark): http://www.hpcwire.com/hpc/1556972.html

--
ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ubizjak at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #13 from jb at gcc dot gnu dot org 2007-06-10 11:06 ---
(In reply to comment #11)

Thanks for the work.

> First, please note that "divss" instruction is quite _fast_, clocking at 23
> cycles, where approximation with NR step would sum up to 20 cycles, not
> counting load of constant.
>
> I have checked the performance of following testcase with various
> implementations on x86_64 C2D:
>
> --cut here--
> float test(float a)
> {
>   return 1.0 / a;
> }
>
> divss     : 3.132s
> rcpss NR  : 3.264s
> rcpss only: 3.080s

Interesting, on ubuntu/i686/K8 I get (average of 3 runs)

divss   : 7.485 s
rcpss NR: 9.915 s

> To enhance the precision of 1/sqrt(A), additional NR step is calculated as
>
> x1 = 0.5 X0 (3.0 - A x0 x0 x0)
>
> and considering that sqrtss also clocks at 23 clocks (_far_ from hundreds of
> clocks ;) ), additional NR step just isn't worth it.

Well, I suppose it depends on the hardware. IIRC older CPUs did division in microcode whereas at least Core 2 and K8 do it in hardware, so I guess the hundreds of cycles don't apply to current CPUs.

Also, supposedly Penryn will have a much improved divider..

That being said, I think there is still a case for the reciprocal square root, as evidenced by the benchmarks in #5 and #7 as well as my analysis of gas_dyn linked to in the first message in this PR (in short, ifort does sqrt(a/b) about twice as fast as gfortran by using reciprocal approximations + NR). If indeed div(p|s)s is about as fast as rcp(p|s)s, as your benchmarks show, then it suggests almost all the performance benefit ifort gets is due to the rsqrt(p|s)s, no? Or perhaps there is some issue with pipelining? In gas_dyn the sqrt(a/b) loop fills an array, whereas your benchmark accumulates..

> Based on these findings, I guess that NR step is just not worth it. If we want
> to have noticeable speed-up on division and square root, we have to use 12bit
> implementations, without any refinements - mainly for benchmarketing, I'm
> afraid.

I hear that it's possible to pass spec2k6/gromacs without the NR step. Like most MD programs, gromacs spends almost all its time in the force calculations, where the majority of time is spent calculating 1/sqrt(...). So perhaps one should watch out for compilers that get suspiciously high scores on that benchmark. :)

No, I'm not suggesting gcc should do this.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #14 from rguenth at gcc dot gnu dot org 2007-06-10 12:07 ---
The interesting difference between sqrtss, divss and rcpss, rsqrtss is that the former have throughput of 1/16 while the latter are 1/1 (latencies compare 21 vs. 3). This is on K10. The optimization guide only mentions calculating the reciprocal y = a/b via rcpss and the square root (!) via rsqrtss:

sqrt a = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a))

So the optimization would be mainly to improve instruction throughput, not overall latency.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #15 from rguenth at gcc dot gnu dot org 2007-06-10 12:09 ---
And of course optimizing division or square root this way violates IEEE 754, which specifies these as intrinsic operations. So a flag separate from -funsafe-math-optimizations should be used for this optimization.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug target/32264] gcc 4.2.0 compiled vanilla kernel 2.4.34.5 crashes when VIA C3 optimized -march=c3
--- Comment #6 from axel at freakout dot de 2007-06-10 13:00 ---
Subject: Re: gcc 4.2.0 compiled vanilla kernel 2.4.34.5 crashes when VIA C3 optimized -march=c3

Please see:

http://www.bnhof.de/~ho1158/gcc-4.2.0-Bug-target-32264.tar.bz2

for the kernel files mentioned above. The archive is too large to attach.

Axel

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32264
[Bug bootstrap/32272] New: make exit because build/genmodes.exe doesn't exist
I started bash and ran

../gcc/configure --enable-threads

I then typed

make

Here is the output:

TARGET_CPU_DEFAULT="" \
HEADERS="auto-host.h ansidecl.h config/i386/xm-cygwin.h" DEFINES="" \
/bin/sh ../gcc/mkconfig.sh config.h
TARGET_CPU_DEFAULT="" \
HEADERS="options.h config/i386/i386.h config/i386/unix.h config/i386/bsd.h config/i386/gas.h config/dbxcoff.h config/i386/cygming.h config/i386/cygwin.h defaults.h" DEFINES="" \
/bin/sh ../gcc/mkconfig.sh tm.h
gawk -f ../gcc/opt-gather.awk ../gcc/ada/lang.opt ../gcc/fortran/lang.opt ../gcc/java/lang.opt ../gcc/treelang/lang.opt ../gcc/c.opt ../gcc/common.opt ../gcc/config/i386/i386.opt ../gcc/config/i386/cygming.opt > tmp-optionlist
/bin/sh ../gcc/../move-if-change tmp-optionlist optionlist
echo timestamp > s-options
gawk -f ../gcc/opt-functions.awk -f ../gcc/opth-gen.awk \
       < optionlist > tmp-options.h
/bin/sh ../gcc/../move-if-change tmp-options.h options.h
echo timestamp > s-options-h
TARGET_CPU_DEFAULT="" \
HEADERS="auto-host.h ansidecl.h config/i386/xm-cygwin.h" DEFINES="" \
/bin/sh ../gcc/mkconfig.sh bconfig.h
gcc -c -g -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wmissing-format-attribute -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I../gcc -I../gcc/build -I../gcc/../include -I../gcc/../libcpp/include -I../gcc/../libdecnumber -I../libdecnumber -o build/errors.o ../gcc/errors.c
build/genmodes.exe -h > tmp-modes.h
/bin/sh: build/genmodes.exe: No such file or directory
make: *** [s-modes-h] Error 127

--
           Summary: make exit because build/genmodes.exe doesn't exist
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: bootstrap
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jdeifik at weasel dot com
 GCC build triplet: i686-pc-cygwi
  GCC host triplet: i686-pc-cygwi
GCC target triplet: i686-pc-cygwi

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32272
[Bug c/32273] New: 'restrict' is forgotten after loop unrolling
The following two functions are equivalent (especially after loop unrolling):

void foo(const int *restrict a, int *restrict b, int *restrict c)
{
  b[0] += a[0];
  c[0] += a[0];

  b[1] += a[1];
  c[1] += a[1];
}

void bar(const int *restrict a, int *restrict b, int *restrict c)
{
  for (int i = 0; i < 2; ++i)
    {
      b[i] += a[i];
      c[i] += a[i];
    }
}

However gcc forgets about 'restrict' after the first iteration of the loop, and foo() and bar() produce different code:

foo:
        pushl   %ebx
        movl    8(%esp), %ebx
        movl    12(%esp), %eax
        movl    16(%esp), %edx
        movl    (%ebx), %ecx
        addl    %ecx, (%eax)
        addl    %ecx, (%edx)    ;; Correct: no reloading of (%ebx) is needed.
        movl    4(%ebx), %ecx
        addl    %ecx, 4(%eax)
        addl    %ecx, 4(%edx)   ;; Correct: no reloading of 4(%ebx) is needed.
        popl    %ebx
        ret

bar:
        pushl   %ebx
        movl    8(%esp), %ebx
        movl    12(%esp), %edx
        movl    16(%esp), %ecx
        movl    (%ebx), %eax
        addl    %eax, (%edx)
        addl    %eax, (%ecx)    ;; Correct: no reloading of (%ebx) is needed.
        movl    4(%ebx), %eax
        addl    %eax, 4(%edx)
        movl    4(%ebx), %eax   ;; BUG: unnecessary reloading of 4(%ebx).
        addl    %eax, 4(%ecx)
        popl    %ebx
        ret

For any number of iterations, only the first iteration honors the 'restrict' qualifier. This is wrong, because 'restrict' is a property of a pointer, not of the data, so if pointers p and q reference different objects, then (p + OFF1) and (q + OFF2) are also expected to reference different objects. The correct assembler output for foo() supports that.

--
           Summary: 'restrict' is forgotten after loop unrolling
           Product: gcc
           Version: 4.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tomash dot brechko at gmail dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32273
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #16 from ubizjak at gmail dot com 2007-06-10 16:24 ---
(In reply to comment #13)

> > x1 = 0.5 X0 (3.0 - A x0 x0 x0)

Whoops! One x0 too many above. The correct calculation reads:

rsqrt = 0.5 rsqrt(a) (3.0 - a rsqrt(a) rsqrt(a)).

> Well, I suppose it depends on the hardware. IIRC older cpu:s did division with
> microcode whereas at least core2 and K8 do it in hardware, so I guess the
> hundreds of cycles doesn't apply to current cpu:s.
>
> Also, supposedly Penryn will have a much improved divider..

Well, mubench says for my Core2Duo that _all_ sqrt and div functions have latency of 6 clocks and rcp throughput of 5 clks. By _all_ I mean divss, divps, divsd, divpd, sqrtss, sqrtps, sqrtsd and sqrtpd. OTOH, rsqrtss and rcpss have latency of 3 clks and rcp throughput of 2 clks. This is just amazing.

> That being said, I think there is still a case for the reciprocal square root,
> as evidenced by the benchmarks in #5 and #7 as well as my analysis of gas_dyn
> linked to in the first message in this PR (in short, ifort does sqrt(a/b)
> about twice as fast as gfortran by using reciprocal approximations + NR). If
> indeed div(p|s)s is about equally fast as rcp(p|s)s as your benchmarks show,
> then it suggests almost all the performance benefit ifort gets is due to the
> rsqrt(p|s)s, no? Or perhaps there is some issue with pipelining? In gas_dyn
> the sqrt(a/b) loop fills an array, whereas your benchmark accumulates..

It is true that only a trivial accumulation function is benchmarked by my "benchmark". I can prepare a bunch of expanders to expand:

a / b         <=> a [rcpss(b) (2.0 - b rcpss(b))]
a / sqrtss(b) <=> a [0.5 rsqrtss(b) (3.0 - b rsqrtss(b) rsqrtss(b))]
sqrtss (a)    <=> a 0.5 rsqrtss(a) (3.0 - a rsqrtss(a) rsqrtss(a))

The second and third cases indeed look similar...

> I hear that it's possible to pass spec2k6/gromacs without the NR step. As most
> MD programs, gromacs spends almost all it's time in the force calculations,
> where the majority of time is spent calculating 1/sqrt(...). So perhaps one
> should watch out for compilers that get suspiciously high scores on that
> benchmark. :)

Yes, look at the hpcwire article in Comment #12.

> No, I'm not suggesting gcc should do this.

;))

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug target/32274] New: FAIL: gcc.dg/vect/pr32224.c
I got

Executing on host: /export/build/gnu/gcc/build-ia64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-ia64-linux/gcc/ /net/gnu-13/export/gnu/src/gcc/gcc/gcc/testsuite/gcc.dg/vect/pr31343.c -O2 -ftree-vectorize -fdump-tree-vect-details -fno-show-column -S -o pr31343.s (timeout = 300)
PASS: gcc.dg/vect/pr31343.c (test for excess errors)
UNSUPPORTED: gcc.dg/vect/pr31699.c
UNSUPPORTED: gcc.dg/vect/pr32216.c
Executing on host: /export/build/gnu/gcc/build-ia64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-ia64-linux/gcc/ /net/gnu-13/export/gnu/src/gcc/gcc/gcc/testsuite/gcc.dg/vect/pr32224.c -O2 -ftree-vectorize -fdump-tree-vect-details -fno-show-column -S -o pr32224.s (timeout = 300)
/net/gnu-13/export/gnu/src/gcc/gcc/gcc/testsuite/gcc.dg/vect/pr32224.c: In function 'gmpz_export':
/net/gnu-13/export/gnu/src/gcc/gcc/gcc/testsuite/gcc.dg/vect/pr32224.c:13: error: invalid 'asm': ia64_print_operand: unknown code
compiler exited with status 1

--
           Summary: FAIL: gcc.dg/vect/pr32224.c
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: hjl at lucon dot org
GCC target triplet: ia64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32274
[Bug target/24894] ICE building newlib/libc/misc/init.c
--- Comment #4 from eweddington at cso dot atmel dot com 2007-06-10 16:43 ---
This looks like a duplicate of bug #31786. Closing this bug as #31786 has more analysis in the comments and is confirmed.

*** This bug has been marked as a duplicate of 31786 ***

--
eweddington at cso dot atmel dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |DUPLICATE

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24894
[Bug target/31786] [4.1/4.2/4.3 Regression][avr] error: unable to find a register to spill in class 'BASE_POINTER_REGS'
--- Comment #11 from eweddington at cso dot atmel dot com 2007-06-10 16:43 ---
*** Bug 24894 has been marked as a duplicate of this bug. ***

--
eweddington at cso dot atmel dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |joel at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31786
[Bug target/32275] New: [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution
I got

varargs0: n[1] = 0 expected 1
varargs0: n[2] = 1 expected 2
FAIL: gcc.c-torture/execute/va-arg-24.c execution, -O3 -fomit-frame-pointer -funroll-loops
varargs0: n[1] = 0 expected 1
varargs0: n[2] = 1 expected 2
FAIL: gcc.c-torture/execute/va-arg-24.c execution, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions

--
           Summary: [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: hjl at lucon dot org
GCC target triplet: ia64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #17 from ubizjak at gmail dot com 2007-06-10 16:49 ---
(In reply to comment #0)

> /* Mathematically equivalent to 1/sqrt(b*(1/a)) */
> return sqrtf(a/b);

Whoa, this one is a little gem, but ATM in the opposite direction. At least for -ffast-math we could optimize (a / sqrt (b/c)) into a * sqrt (c/b), thus losing one division. I'm sure that richi knows by heart how to write this kind of folding ;)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug target/32276] New: [4.3 Regression] New libmudflap failures
FAIL: libmudflap.c++/pass41-frag.cxx execution test
FAIL: libmudflap.c++/pass41-frag.cxx (-O2) execution test
FAIL: libmudflap.c++/pass41-frag.cxx (-O3) execution test
FAIL: libmudflap.c++/pass41-frag.cxx ( -O) execution test
FAIL: libmudflap.c++/pass41-frag.cxx (-static) execution test

--
           Summary: [4.3 Regression] New libmudflap failures
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: hjl at lucon dot org
GCC target triplet: ia64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32276
[Bug target/31786] [4.1/4.2/4.3 Regression][avr] error: unable to find a register to spill in class 'BASE_POINTER_REGS'
--- Comment #12 from eweddington at cso dot atmel dot com 2007-06-10 16:50 --- According to a comment in duplicate bug #24894, bug #19636 may be related. Ralf, can you try the test case using a 4.3 snapshot? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31786
[Bug target/32277] New: [4.3 Regression] g++ failures
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn
FAIL: g++.dg/tree-prof/indir-call-prof.C scan-tree-dump Indirect call -> direct call.* AA transformation on insn

--
           Summary: [4.3 Regression] g++ failures
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: hjl at lucon dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32277
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #18 from ubizjak at gmail dot com 2007-06-10 17:34 ---
(In reply to comment #14)

> The interesting difference between sqrtss, divss and rcpss, rsqrtss is that
> the former have throughput of 1/16 while the latter are 1/1 (latencies compare
> 21 vs. 3). This is on K10. The optimization guide only mentions calculating
> the reciprocal y = a/b via rcpss and the square root (!) via rsqrtss
> (sqrt a = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a)))
>
> So the optimization would be mainly to improve instruction throughput, not
> overall latency.

If this is the case, then the middle-end will need to fold sqrtss in a different way for targets that prefer rsqrtss. According to Comment #16, it is better to fold to 1.0/sqrt(c/b) instead of sqrt(b/c), because this way we will lose one multiplication during NR expansion by rsqrt [due to sqrt(x) <=> x * (1.0 / sqrt(x))].

IMO we need a new tree code to handle reciprocal sqrt, RSQRT_EXPR, together with proper folding functionality that expands directly to the (NR-enhanced) rsqrt optab. If we consider a*sqrt(b/c), then b/c will be expanded as b * NR-rcp(c) [where NR-rcp stands for NR-enhanced rcp] and sqrt will be expanded as NR-rsqrt. In this case, I see no RTL pass that would be able to combine everything together in order to swap the (b/c) operands to produce the NR-enhanced a*rsqrt(c/b) equivalent.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug fortran/32257] Scoping problem in implied do loop in I/O statement
--- Comment #1 from tkoenig at gcc dot gnu dot org 2007-06-10 18:09 ---
Two points:

- The scoping is correct (i is indeed the same variable)
- i becomes undefined on exit of the implied do loop, so the code is illegal.

http://groups.google.de/group/comp.lang.fortran/browse_thread/thread/a991e9f53d97f0ce/ca1b856d01bdbcf2?lnk=st&q=scoping+for+implied+do+loops&rnum=2#

Resolving as invalid.

--
tkoenig at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tkoenig at gcc dot gnu dot
                   |                            |org
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32257
[Bug target/32275] [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution
--- Comment #1 from hjl at lucon dot org 2007-06-10 19:18 --- Revision 122814 is bad and revision 122792 is good. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275
[Bug libmudflap/19319] Mudflap produce many violations on simple, correct c++ program
--- Comment #27 from awl03 at doc dot ic dot ac dot uk 2007-06-10 19:32 --- I have been writing my own bounds-checker based on Mudflap. While doing so I had to tackle this same problem. My flatmate and I tracked it down to the fact that, although function parameters and variables are registered if their address is ever taken, the return value is not. This is a problem in return-by-value where the result is returned directly without an intermediate variable. For example:
  class bob { public: int i; bob(int n) { i = n; } };
  bob f(int n) { return bob(n); }
  int main() { bob b = f(0); }
Here bob is constructed directly in the return statement in f(). In GIMPLE this looks like:
  bob f(int) (n) { : __comp_ctor (&<retval>, n); return <retval>; }
Notice that <retval> has its address taken. Inside the constructor __comp_ctor() the object is created in the location given by <retval>. <retval> has not been registered by f(), as return values are not registered, nor has it been registered by main() (where the object finally ends up) because nothing there uses its address. This happens a lot in the STL, which is why it shows up whenever templates, map etc. are used:
  iterator begin() { return iterator (this->_M_impl._M_start); }
which is gimplified into:
  iterator begin() { comp_ctor (&<retval>, &this->_M_impl._M_start); return <retval>; }
If Mudflap is changed to register these return values, the violations go away :) I have created a patch that does this but, as I'm a relative newbie, it could all be complete rubbish, in which case I apologise. This deals with the problem for the initial testcase, the simplified test by Frank Ch. Eigler and the test by Paul Pluzhnikov. It does not fix the others, as these are caused by a different problem, namely that objects created by external library calls are not registered by Mudflap, so it thinks there is a violation if you use one of these foreign pointers. I hope this helps and I would be very glad of feedback. 
Alex Lamaison -- awl03 at doc dot ic dot ac dot uk changed: What|Removed |Added CC||awl03 at doc dot ic dot ac ||dot uk http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19319
[Bug libmudflap/19319] Mudflap produce many violations on simple, correct c++ program
--- Comment #28 from awl03 at doc dot ic dot ac dot uk 2007-06-10 19:35 --- Created an attachment (id=13673) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13673&action=view) Patch for tree-mudflap.c This is the patch mentioned in my explanation. It is against the 4.1.1 release source. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19319
[Bug middle-end/32273] 'restrict' is forgotten after loop unrolling
--- Comment #1 from rguenth at gcc dot gnu dot org 2007-06-10 20:07 --- Danny, as you looked at restrict handling a few days ago - maybe you know instantly why it doesn't work ;) (apart from us not recomputing aliasing after loop optimizations on the tree level -- and the complete unrolling happens there) -- rguenth at gcc dot gnu dot org changed: What|Removed |Added CC||dberlin at gcc dot gnu dot ||org, rguenth at gcc dot gnu ||dot org Keywords||alias http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32273
[Bug target/32275] [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution
--- Comment #2 from hjl at lucon dot org 2007-06-10 20:12 --- (In reply to comment #1) > Revision 122814 is bad and revision 122792 is good. > Correction. Revision 122780 is bad and revision 122738 is good. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275
[Bug target/32275] [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution
--- Comment #3 from hjl at lucon dot org 2007-06-10 20:24 --- Revision 122748 is good. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275
[Bug target/32275] [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution
--- Comment #4 from hjl at lucon dot org 2007-06-10 20:42 --- Revision 122761 is bad. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275
[Bug target/32275] [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution
--- Comment #5 from hjl at lucon dot org 2007-06-10 20:58 --- I have verified that this patch: http://gcc.gnu.org/ml/gcc-patches/2007-03/msg00545.html causes this regression. -- hjl at lucon dot org changed: What|Removed |Added CC||aoliva at gcc dot gnu dot ||org OtherBugsDependingO||30643 nThis|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #19 from rguenther at suse dot de 2007-06-10 21:39 --- Subject: Re: Use reciprocal and reciprocal square root with -ffast-math On Sun, 10 Jun 2007, ubizjak at gmail dot com wrote: > > > --- Comment #18 from ubizjak at gmail dot com 2007-06-10 17:34 --- > (In reply to comment #14) > > The interesting difference between sqrtss, divss and rcpss, rsqrtss is that > > the former have throughput of 1/16 while the latter are 1/1 (latencies > > compare > > 21 vs. 3). This is on K10. The optimization guide only mentions > > calculating > > the reciprocal y = a/b via rcpss and the square root (!) via rsqrtss > > (sqrt a = 0.5 * a * rsqrtss(a) * (3.0 - a * rsqrtss(a) * rsqrtss(a))) > > > > So the optimization would be mainly to improve instruction throughput, not > > overall latency. > > If this is the case, then middle-end will need to fold sqrtss in different way > for targets that prefer rsqrtss. According to Comment #16, it is better to > fold > to 1.0/sqrt(c/b) instead of sqrt(b/c) because this way, we will loose one > multiplication during NR expansion by rsqrt [due to sqrt(x) <=> x * (1.0 / > sqrt(x))]. > > IMO we need a new tree code to handle reciprocal sqrt - RSQRT_EXPR, together > with proper folding functionality that expands directly to (NR-enhanced) rsqrt > optab. If we consider a*sqrt(b/c), then b/c will be expanded as b* NR-rcp(c) > [where NR-rcp stands for NR enhanced rcp] and sqrt will be expanded as > NR-rsqrt. In this case, I see no RTL pass that would be able to combine > everything together in order to swap (b/c) operands to produce NR-enhanced > a*rsqrt(c/b) equivalent. We just need a new builtin function, __builtin_rsqrt, and at some stage replace reciprocals of sqrt with the new builtin, for example in tree-ssa-math-opts.c, which does the existing reciprocal transforms. A target hook could be provided that would, for example, look like tree target_fn_for_expr (tree expr); and return a target builtin decl for the given expression. 
And we should start splitting this PR ;) One for a/sqrt(b/c) and one for the above transformation. Richard. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug middle-end/32279] New: Fold 1.0/sqrt(x/y) to sqrt(y/x)
This may even work for -funsafe-math-optimizations only (we round differently). One has to enumerate all interesting cases (mainly x == 0) and see if NaN/Inf are properly preserved in all cases. -- Summary: Fold 1.0/sqrt(x/y) to sqrt(y/x) Product: gcc Version: 4.3.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: rguenth at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32279
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #20 from rguenth at gcc dot gnu dot org 2007-06-10 21:46 --- PR32279 for 1/sqrt(x/y) to sqrt(y/x) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #21 from rguenth at gcc dot gnu dot org 2007-06-10 21:48 --- The other issue is really about this bug, so not splitting. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug middle-end/32273] 'restrict' is forgotten after loop unrolling
--- Comment #2 from dberlin at gcc dot gnu dot org 2007-06-10 22:41 --- Complete guess: alias.c relies not on TYPE_RESTRICT, but on DECL_BASED_ON_RESTRICT_P. I never noticed we even had such a thing :) My guess is that loop unrolling makes new ssa names, and when they get transformed during un-ssa, this flag no longer exists on them. Realistically, may-alias should propagate the DECL_* stuff to SSA_NAME_PTR_INFO, which loop unrolling copies. When they get un-ssa'd, we should then copy the restrict info from the ssa name back to the base variable we create. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32273
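To make concrete what is at stake when the restrict information is dropped, here is a small hypothetical C loop. It is an assumption for illustration only and is not the testcase attached to the PR; the function name add_to_both is invented.

```c
/* Hypothetical example, not the PR's testcase.
   'restrict' promises that stores through a never modify *b, so the
   compiler may load *b once and reuse it for both iterations. */
void add_to_both(int *restrict a, const int *restrict b)
{
    for (int i = 0; i < 2; i++)
        a[i] += *b;   /* if restrict is forgotten, *b must be reloaded after each store */
}
```

After complete unrolling this becomes two read-modify-write statements on a[0] and a[1]. If the copies created by unrolling no longer carry the restrict flag, the second load of *b cannot be shared with the first, which matches the kind of extra load the report is about.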
[Bug fortran/32235] [4.3 Regression] incorrectly position text file after backspace
--- Comment #8 from jvdelisle at gcc dot gnu dot org 2007-06-10 22:50 --- Subject: Bug 32235 Author: jvdelisle Date: Sun Jun 10 22:50:47 2007 New Revision: 125606 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=125606 Log: 2007-06-10 Jerry DeLisle <[EMAIL PROTECTED]> PR libgfortran/32235 * io/transfer.c (st_read): Remove test for end of file condition. (next_record_r): Add test for end of file condition. Modified: trunk/libgfortran/ChangeLog trunk/libgfortran/io/transfer.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32235
[Bug c++/32089] Winline reports bogus warning
--- Comment #4 from mckelvey at maskull dot com 2007-06-10 22:52 --- Created an attachment (id=13674) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13674&action=view) Preprocessed source -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32089
[Bug middle-end/32273] 'restrict' is forgotten after loop unrolling
--- Comment #3 from pinskia at gcc dot gnu dot org 2007-06-10 22:55 --- This works on the pointer_plus branch :) Also Predictive commoning fixes it up even without unrolling at the tree level so it works at -O3 (this is on the pointer_plus branch I have not tried on the mainline). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32273
[Bug middle-end/32273] 'restrict' is forgotten after loop unrolling
--- Comment #4 from pinskia at gcc dot gnu dot org 2007-06-11 00:21 --- Yes, this is fixed on the pointer_plus branch; the pointer_plus branch is better at keeping track of which decl is the restrict pointer's base.
-;; *D.1537 = *D.1539 + *D.1537
+;; *D.1538 = *D.1541 + *D.1538
 (insn 14 13 15 t.c:16 (set (reg:SI 66)
-(mem:SI (reg:SI 59 [ D.1539 ]) [8 S4 A32])) -1 (nil)
+(mem:SI (reg:SI 59 [ D.1541 ]) [2 S4 A32])) -1 (nil)
 (nil))
 (insn 15 14 0 t.c:16 (parallel [
-(set (mem:SI (reg:SI 60 [ D.1537 ]) [7 S4 A32])
-(plus:SI (mem:SI (reg:SI 60 [ D.1537 ]) [7 S4 A32])
+(set (mem:SI (reg:SI 60 [ D.1538 ]) [2 S4 A32])
+(plus:SI (mem:SI (reg:SI 60 [ D.1538 ]) [2 S4 A32])
 (reg:SI 66)))
 (clobber (reg:CC 17 flags))
 ]) -1 (nil)
-(expr_list:REG_EQUAL (plus:SI (mem:SI (reg:SI 60 [ D.1537 ]) [7 S4 A32])
-(mem:SI (reg:SI 59 [ D.1539 ]) [8 S4 A32]))
+(expr_list:REG_EQUAL (plus:SI (mem:SI (reg:SI 60 [ D.1538 ]) [2 S4 A32])
+(mem:SI (reg:SI 59 [ D.1541 ]) [2 S4 A32]))
 (nil)))
See how the - lines have different aliasing sets than the + lines; the - lines have the correct aliasing sets. So this is now mine. -- pinskia at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |pinskia at gcc dot gnu dot |dot org |org Status|UNCONFIRMED |ASSIGNED Ever Confirmed|0 |1 GCC build triplet|i686-pc-linux-gnu | GCC host triplet|i686-pc-linux-gnu | GCC target triplet|i686-pc-linux-gnu | Last reconfirmed|-00-00 00:00:00 |2007-06-11 00:21:57 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32273
[Bug tree-optimization/29751] not optimizing access a[0] , a[1]
--- Comment #2 from pinskia at gcc dot gnu dot org 2007-06-11 00:30 --- Confirmed, this is only a tree level missed optimization. -- pinskia at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Keywords||alias, TREE Last reconfirmed|-00-00 00:00:00 |2007-06-11 00:30:03 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29751
[Bug middle-end/14192] Restrict pointers don't help
--- Comment #10 from pinskia at gcc dot gnu dot org 2007-06-11 00:34 --- > The second case is the following loop: This is just caused by how we represent pointer addition. I have a fix for that one, we now get the correct aliasing sets for it. -- pinskia at gcc dot gnu dot org changed: What|Removed |Added Keywords||alias http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14192
[Bug tree-optimization/16913] [4.0/4.1/4.2/4.3 Regression] restrict does not make a difference
--- Comment #10 from pinskia at gcc dot gnu dot org 2007-06-11 00:47 --- There are a couple of issues here. First, pointer_plus improves the aliasing set issue, but then PRE comes around and messes it up because it does not add pointer types which have DECL_BASED_ON_RESTRICT_P/DECL_GET_RESTRICT_BASE set up correctly. Disabling PRE on powerpc-linux-gnu (on the pointer_plus branch) is enough to get the RTL optimizers to optimize away the extra loads, and we get for the inner loop:
.L3:
        stfsx 0,9,3
        addi 9,9,4
        bdnz .L3
which is almost the best you can do :). One more issue (for x86) is that expand emits code that causes the RTL optimizers not to optimize well, as they only look into loads in sets. I don't know how to fix that issue without fixing restrict at the tree level. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16913
[Bug tree-optimization/14187] [tree-ssa] restricted pointers should not alias on the tree level
--- Comment #5 from pinskia at gcc dot gnu dot org 2007-06-11 00:48 --- (In reply to comment #3) > Interestingly the following code is optimized: That is because we create a new may_alias variable for malloc to point to so we know that it cannot alias anything. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14187
[Bug tree-optimization/20643] [4.0/4.1/4.2/4.3 Regression] Tree loop optimizer does worse job than RTL loop optimizer
--- Comment #17 from pinskia at gcc dot gnu dot org 2007-06-11 00:53 --- the pointer_plus branch improves the code here (I can't tell if it fixes the problem fully). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20643
[Bug rtl-optimization/32280] New: _mm_srli_si128, heinous code for some shifts
I lack words to describe what happens on x86-64 to
<-->
#include <emmintrin.h>
__m128i foo(__m128i a) { return _mm_srli_si128(a, 8); }
int main() { return 0; }
<-->
# /usr/local/gcc-4.2-20060916/bin/gcc -O1 pr-psrldq.c -o pr-psrldq
0040042e <foo>:
  40042e: 66 0f 7f 44 24 d8    movdqa %xmm0,0xffd8(%rsp)
  400434: 48 8b 54 24 e0       mov    0xffe0(%rsp),%rdx
  400439: 48 89 d0             mov    %rdx,%rax
  40043c: 31 d2                xor    %edx,%edx
  40043e: 48 89 44 24 e8       mov    %rax,0xffe8(%rsp)
  400443: 48 89 54 24 f0       mov    %rdx,0xfff0(%rsp)
  400448: 66 0f 6f 44 24 e8    movdqa 0xffe8(%rsp),%xmm0
  40044e: c3                   retq
gcc-4.3-20070105 is still that creative. As far as i know, it's specific to x86-64 but i'm not sure if other shifting ops or specific values are also pathological. -- Summary: _mm_srli_si128, heinous code for some shifts Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: tbptbp at gmail dot com GCC host triplet: x86-64, linux, gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32280
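As a reading aid for the disassembly above, here is a portable scalar model of what _mm_srli_si128(a, 8) computes: the 128-bit value is shifted right by 8 bytes, so the high quadword moves into the low quadword and the high quadword becomes zero. The struct u128_model and the function srli_bytes_8 are hypothetical names used only for this sketch; they are not part of any intrinsic header.

```c
#include <stdint.h>

/* Hypothetical scalar model of a 128-bit register; lo holds bytes 0..7. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_model;

/* Byte-wise logical right shift by 8 bytes: old high half becomes the
   low half, and zeros are shifted in at the top. */
u128_model srli_bytes_8(u128_model a)
{
    u128_model r;

    r.lo = a.hi;
    r.hi = 0;
    return r;
}
```

The generated code in the report does exactly this through the stack with general-purpose registers, where the intrinsic normally corresponds to a single psrldq instruction on the xmm register.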
[Bug rtl-optimization/32280] _mm_srli_si128, heinous code for some shifts
--- Comment #1 from tbptbp at gmail dot com 2007-06-11 03:02 --- s/gcc-4.3-20070105/gcc-4.3-20070608/ -- tbptbp at gmail dot com changed: What|Removed |Added Summary| _mm_srli_si128, heinous|_mm_srli_si128, heinous code |code for some shifts|for some shifts http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32280
[Bug fortran/32235] [4.3 Regression] incorrectly position text file after backspace
--- Comment #9 from jvdelisle at gcc dot gnu dot org 2007-06-11 03:06 --- Subject: Bug 32235 Author: jvdelisle Date: Mon Jun 11 03:06:01 2007 New Revision: 125611 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=125611 Log: 2007-06-10 Jerry DeLisle <[EMAIL PROTECTED]> PR libgfortran/32235 * gfortran.dg/backspace_9.f: New test. Added: trunk/gcc/testsuite/gfortran.dg/backspace_9.f Modified: trunk/gcc/testsuite/ChangeLog -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32235
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #22 from tbptbp at gmail dot com 2007-06-11 03:32 --- I'm a bit late to the debate but... At some point icc did such transformations (for 1/x and sqrt) but, apparently, they're now removed. It didn't bother to plug every hole (i.e. wrt infinities) but at least got the case of 0 covered even when set loose; it's cheap to do. I've repeatedly been pointed to the peculiar semantics of -ffast-math in the past, so i know there's little chance for me to succeed, but would it be possible to consider that as an option? PS: Yes, i do rely on infinities and -ffast-math and deserve to die a slow and painful way. -- tbptbp at gmail dot com changed: What|Removed |Added CC||tbptbp at gmail dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug c++/32281] New: A problem of gcc4.1.0(O3 optimize)
When I use gcc4.1.0 to compile mysql4.1.22, I find some errors. I'm not sure whether it's a gcc bug or not, so I need your help. The version of gcc:
gcc -v
Using built-in specs.
Target: i586-suse-linux
Configured with: ../configure --enable-threads=posix --prefix=/usr --with-local-prefix=/usr/local --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib --libexecdir=/usr/lib --enable-languages=c,c++,objc,fortran,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.1.0 --enable-ssp --disable-libssp --enable-java-awt=gtk --enable-gtk-cairo --disable-libjava-multilib --with-slibdir=/lib --with-system-zlib --enable-shared --enable-__cxa_atexit --enable-libstdcxx-allocator=new --without-system-libunwind --with-cpu=generic --host=i586-suse-linux
Thread model: posix
gcc version 4.1.0 (SUSE Linux)
Linux version: Linux 2.6.16.21-0.8-TENCENT #1 SMP Sat Jan 13 19:17:08 CST 2007 i686 i686 i386 GNU/Linux
Mysql4.1.22 is from http://mysql.he.net/Downloads/MySQL-4.1/mysql-4.1.22.tar.gz . The error is in function mysql_stmt_execute(THD *thd, char *packet, uint packet_length) (mysql-4.1.22/sql/sql_prepare.cc:1786). The file is compiled with these arguments:
g++ -DMYSQL_SERVER -DDEFAULT_MYSQL_HOME="\"/data/home/c4b/still/bin/mysql/\"" -DDATADIR="\"/data/home/c4b/still/bin/mysql//var\"" -DSHAREDIR="\"/data/home/c4b/still/bin/mysql//share/mysql\"" -DHAVE_CONFIG_H -I. -I. -I.. -I../innobase/include -I../include -I../include -I../regex -I. -O3 -DDBUG_OFF -fno-implicit-templates -fno-exceptions -fno-rtti -MT sql_prepare.o -MD -MP -MF ".deps/sql_prepare.Tpo" -g -c -o sql_prepare.o sql_prepare.cc
In lines 1822-1824:
1822 if (setup_conversion_functions(stmt, (uchar **) &packet, packet_end) ||
1823     stmt->set_params(stmt, null_array, (uchar *) packet, packet_end,
1824     &expanded_query))
And the function "setup_conversion_functions" is compiled as an inline function. The last statement in function setup_conversion_functions is *data= read_pos; These three lines are compiled to:
0x08197bff <_Z18mysql_stmt_executeP3THDPcj+703>: mov 0xc(%ebp),%ecx
0x08197c02 <_Z18mysql_stmt_executeP3THDPcj+706>: mov 0xffc0(%ebp),%ebx
0x08197c05 <_Z18mysql_stmt_executeP3THDPcj+709>: mov 0xffb4(%ebp),%eax
0x08197c08 <_Z18mysql_stmt_executeP3THDPcj+712>: mov %ecx,0x8(%esp)
0x08197c0c <_Z18mysql_stmt_executeP3THDPcj+716>: mov 0xffd0(%ebp),%edx
0x08197c0f <_Z18mysql_stmt_executeP3THDPcj+719>: mov 0xffb8(%ebp),%ecx
0x08197c12 <_Z18mysql_stmt_executeP3THDPcj+722>: mov %ebx,0xc(%ebp) //*data= read_pos
0x08197c15 <_Z18mysql_stmt_executeP3THDPcj+725>: lea 0xffdc(%ebp),%ebx
0x08197c18 <_Z18mysql_stmt_executeP3THDPcj+728>: mov %ebx,0x10(%esp)
0x08197c1c <_Z18mysql_stmt_executeP3THDPcj+732>: mov %eax,0xc(%esp)
0x08197c20 <_Z18mysql_stmt_executeP3THDPcj+736>: mov %edx,0x4(%esp)
0x08197c24 <_Z18mysql_stmt_executeP3THDPcj+740>: mov %ecx,(%esp)
0x08197c27 <_Z18mysql_stmt_executeP3THDPcj+743>: call *0x764(%ecx)
0xc(%ebp) is the location of packet (in function mysql_stmt_execute) and therefore also of *data (in function setup_conversion_functions). At offsets 703 and 712, we can see that the value at 0xc(%ebp) is pushed onto the stack as the third argument of stmt->set_params. The instruction at offset 722 is for *data= read_pos: it moves read_pos to *data (address 0xc(%ebp)). So the third argument of stmt->set_params uses the old value, not the new value. Am I right? I look forward to your reply, and thank you very much. Best wishes, Still -- Summary: A problem of gcc4.1.0(O3 optimize) Product: gcc Version: 4.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: stillzhang at tencent dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32281
[Bug c++/32281] A problem of gcc4.1.0(O3 optimize)
--- Comment #1 from pinskia at gcc dot gnu dot org 2007-06-11 03:41 --- So packet is char*, and you are accessing it as uchar*, so this code is violating C/C++ aliasing rules. *** This bug has been marked as a duplicate of 21920 *** -- pinskia at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||DUPLICATE Summary|A problem of gcc4.1.0(O3|A problem of gcc4.1.0(O3 |optimize) |optimize) http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32281
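The aliasing point in Comment #1 can be sketched in C as follows. The cast (uchar **) &packet makes the callee store a 'uchar *' into an object whose declared type is 'char *'; those are distinct pointer types, so the compiler is allowed to assume the store does not change packet. One way around it, shown here as an assumption and not as the actual MySQL fix, is to go through a temporary of the right type. The names consume() and advance_packet() are hypothetical stand-ins.

```c
#include <stddef.h>

typedef unsigned char uchar;

/* Hypothetical stand-in for setup_conversion_functions(): consumes n bytes
   and writes the new read position back through data. */
void consume(uchar **data, size_t n)
{
    uchar *read_pos = *data + n;
    *data = read_pos;
}

/* Problematic pattern from the report (deliberately not used here):
       consume((uchar **) &packet, n);
   It writes a 'char *' object through an lvalue of type 'uchar *'. */

/* Safer pattern: convert the pointer value, not the pointer to the pointer. */
char *advance_packet(char *packet, size_t n)
{
    uchar *pos = (uchar *) packet;   /* converting the value is fine */
    consume(&pos, n);                /* callee updates a real 'uchar *' object */
    return (char *) pos;             /* copy the updated position back */
}
```

With the temporary, every object is read and written through its own declared type, so the result no longer depends on how aggressively the optimizer uses type-based alias analysis. Compiling with -fno-strict-aliasing is the other common workaround.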
[Bug c/21920] aliasing violations
--- Comment #113 from pinskia at gcc dot gnu dot org 2007-06-11 03:41 --- *** Bug 32281 has been marked as a duplicate of this bug. *** -- pinskia at gcc dot gnu dot org changed: What|Removed |Added CC||stillzhang at tencent dot ||com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21920
[Bug rtl-optimization/29589] incorrect conversion of (ior (ashiftrt (plus ...))) in combine.c
--- Comment #5 from pinskia at gcc dot gnu dot org 2007-06-11 04:44 --- I have a fix from our local tree which also fixes up the regression which we found with a different patch. -- pinskia at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |pinskia at gcc dot gnu dot |dot org |org Status|NEW |ASSIGNED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29589
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #23 from ubizjak at gmail dot com 2007-06-11 05:51 --- (In reply to comment #22) > At some point icc did such transformations (for 1/x and sqrt) but, apparently, > they're now removed. It didn't bother to plug every holes (ie wrt infinities) > but at least got the case of 0 covered even when set lose; it's cheap to do. > I've repeatedly been pointed to the peculiar semantic of -ffast-math in the > past, so i know there's little chance for me to succeed, but would it be > possible to consider that as an option? But both rcpss and rsqrtss handle infinities correctly (they return zero) and return [-]inf when [-]0.0 is used as an argument. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
[Bug middle-end/31723] Use reciprocal and reciprocal square root with -ffast-math
--- Comment #24 from tbptbp at gmail dot com 2007-06-11 05:58 --- Yes, but there's some fuss at 0 when you pile up a NR round. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31723
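The "fuss at 0" can be reproduced with plain IEEE arithmetic. The sketch below assumes, as Comment #23 states, that the hardware estimate for an argument of +0.0 is +inf, and feeds that into one NR step for the reciprocal square root. The product 0 * inf is NaN, so the refined value is NaN and no longer inf. The guard in nr_rsqrt_guarded() is a hypothetical illustration of a cheap fix, not a description of what icc or GCC actually emit.

```c
#include <math.h>

/* One NR refinement of y ~ 1/sqrt(a): y' = 0.5 * y * (3 - a * y * y). */
float nr_rsqrt_step(float a, float y)
{
    return 0.5f * y * (3.0f - a * y * y);
}

/* Hypothetical guard (assumption): keep an infinite estimate as it is,
   so an argument of zero still yields inf after refinement. */
float nr_rsqrt_guarded(float a, float y)
{
    if (isinf(y))
        return y;
    return nr_rsqrt_step(a, y);
}
```

Calling nr_rsqrt_step(0.0f, INFINITY) gives NaN, while nr_rsqrt_guarded(0.0f, INFINITY) keeps inf. For ordinary inputs both behave the same, for instance an exact estimate of 0.5 for a = 4.0 is returned unchanged.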
[Bug middle-end/32077] [Regression 4.3] Profile-use: ICE: Segmentation fault
--- Comment #2 from burnus at gcc dot gnu dot org 2007-06-11 06:04 --- Seems to be fixed since 2007-06-07. -> Close PR. -- burnus at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED Target Milestone|--- |4.3.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32077
[Bug c++/32281] A problem of gcc4.1.0(O3 optimize)
--- Comment #2 from stillzhang at tencent dot com 2007-06-11 06:07 --- Thank you. But if I compile it without -O3, it works fine. If I compile it under gcc3.3 with -O3, it also works fine. The same program behaves differently at different optimization levels, so I think it should not be like this. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32281
[Bug middle-end/32279] Fold 1.0/sqrt(x/y) to sqrt(y/x)
--- Comment #1 from ubizjak at gmail dot com 2007-06-11 06:36 --- Patch at http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00655.html Patch was also checked with 0.0, __builtin_inf and __builtin_nan, and the results were the same as for unpatched gcc for all combinations that were thrown in. -- ubizjak at gmail dot com changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |ubizjak at gmail dot com |dot org | URL||http://gcc.gnu.org/ml/gcc- ||patches/2007- ||06/msg00655.html Status|UNCONFIRMED |ASSIGNED Ever Confirmed|0 |1 Keywords||patch Last reconfirmed|-00-00 00:00:00 |2007-06-11 06:36:21 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32279
[Bug target/32275] [4.3 Regression] : FAIL: gcc.c-torture/execute/va-arg-24.c execution
--- Comment #6 from bonzini at gnu dot org 2007-06-11 06:54 --- can you please show the difference in assembly code between the two? -- bonzini at gnu dot org changed: What|Removed |Added CC||bonzini at gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32275