The following code segfaults GCC 4.5 (and GCC trunk rev. 159536) when compiled with:
$ gcc -O1 -floop-parallelize-all test-17.c
Program received signal SIGSEGV, Segmentation fault.
0x77bce389 in cloog_domain_stride (domain=, strided_level=, nb_par=,
    stride=, offset=) at source/ppl/domain
--- Comment #18 from nbenoit at tuxfamily dot org 2009-12-17 09:34 ---
(In reply to comment #17)
> (In reply to comment #16)
> > Created an attachment (id=19332)
> > --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19332&action=view)
> > Real fix
> >
--- Comment #17 from nbenoit at tuxfamily dot org 2009-12-17 09:32 ---
(In reply to comment #16)
> Created an attachment (id=19332)
> --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19332&action=view)
> Real fix
>
> Now, before I blow it again, would you
--- Comment #12 from nbenoit at tuxfamily dot org 2009-12-16 12:55 ---
Created an attachment (id=19321)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19321&action=view)
Diff of the RTL expand dump between revisions 151079 and 151080
--- Comment #11 from nbenoit at tuxfamily dot org 2009-12-16 12:53 ---
The fastest is the variant with more jumps (442/convol.s in the diff), generated
by GCC-4.4.2.
In the one-jump variant (r155286/convol.s in the diff), I guess it is the
computation of both exit conditions before jumping that costs the extra time.
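For illustration, here is a minimal C-level sketch of the two inner-loop exit
shapes being compared (all names and bounds are hypothetical; the real
difference comes from how GCC expands the exit tests, not from the C source):

/* "More jumps": each exit condition gets its own test and branch. */
long sum_two_branches(const int *x, const int *h, int n, int kw)
{
    long acc = 0;
    for (int j = 0; ; j++) {
        if (j >= kw)                  /* first exit test, first jump */
            break;
        if (j >= n)                   /* second exit test, second jump */
            break;
        acc += (long)x[j] * h[j];
    }
    return acc;
}

/* "One jump": both conditions are computed first, then a single branch
   decides whether to leave the loop (bitwise & avoids short-circuiting). */
long sum_one_branch(const int *x, const int *h, int n, int kw)
{
    long acc = 0;
    for (int j = 0; (j < kw) & (j < n); j++)
        acc += (long)x[j] * h[j];
    return acc;
}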
--- Comment #9 from nbenoit at tuxfamily dot org 2009-12-16 11:06 ---
Here is a unified diff which focuses on the inner-loop exit conditions.
--- 442/convol.s
+++ r155286/convol.s
 .L3:
     movl   (%edx), %ebx
-    imull  (%esi,%eax,4), %ebx
+    imull  H(,%eax,4), %ebx
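Roughly, the changed instruction switches from indexing through a base
register to indexing the global array H directly; a hypothetical C analogue
(identifiers not taken from the test case) would be:

extern int H[];                      /* global array addressed as H(,%eax,4) */

int mul_via_pointer(const int *h, int i, int v)
{
    return v * h[i];                 /* base register + scaled index: (%esi,%eax,4) */
}

int mul_via_global(int i, int v)
{
    return v * H[i];                 /* symbol + scaled index: H(,%eax,4) */
}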
--- Comment #8 from nbenoit at tuxfamily dot org 2009-12-16 10:34 ---
I am confused; a performance regression is still noticeable:
* Intel Xeon E5320 (x86_64 arch, but the gcc machine is i686-pc-linux-gnu), with
the -O1 flag
GCC-4.4.2 7364 ms
GCC-trunk-r155286 9515 ms
* Intel Xeon
--- Comment #4 from nbenoit at tuxfamily dot org 2009-12-01 10:11 ---
It seems that this regression first appeared with revision 151080:
* with -O1
GCC-4.4.2 7.4 s
GCC-trunk-r151078 7.4 s
GCC-trunk-r151079 7.4 s
GCC-trunk-r151080 9.4 s
GCC-trunk-r151081 9.4 s
GCC-trunk
--- Comment #3 from nbenoit at tuxfamily dot org 2009-11-26 15:08 ---
Using integers instead of doubles, the performance difference is even more
noticeable:
* with -O1
GCC 4.4.2 7475 ms
GCC-trunk-r154672 9390 ms
--- Comment #1 from nbenoit at tuxfamily dot org 2009-11-13 09:51 ---
Created an attachment (id=19010)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19010&action=view)
Source file with a convolution loop pattern.
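The attached file is not reproduced here; as a rough idea of the kind of loop
under discussion, here is a minimal illustrative convolution sketch (not the
actual attachment; sizes and names are made up):

#include <stddef.h>

#define N  4096                      /* signal length (illustrative) */
#define KW 64                        /* kernel width (illustrative) */

double out[N];

/* Naive 1-D convolution: for each output sample, accumulate the products
   of the input window with the kernel.  Comment #3 above measures the
   same pattern with int instead of double. */
void convolve(const double in[N], const double kernel[KW])
{
    for (size_t i = 0; i + KW <= N; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < KW; j++)      /* inner loop under discussion */
            acc += in[i + j] * kernel[j];
        out[i] = acc;
    }
}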
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42027
GCC trunk rev. 154141 seems to handle a convolution code less efficiently than
previous stable releases; the regression was also spotted in revision 153048.
Here are some average timings on an Intel E5320 clocked at 1.86 GHz with 4 MB
of L2 cache, running Debian GNU/Linux with a 2.6.26 kernel.
* with -O2 -march=native