Re: Discussion: What is unspec_volatile?
On 11/12/2010 03:25 PM, H.J. Lu wrote:
> IRA may move instructions across an unspec_volatile,

Do you have a testcase?

Paolo
[PATCH] i8k: Tell gcc that *regs gets clobbered
More recent GCC caused the i8k driver to stop working. On Slackware, the compiler was upgraded from gcc-4.4.4 to gcc-4.5.1, after which the driver no longer worked: it either didn't load or gave totally nonsensical output.

As it turned out, the asm(..) statement forgot to mention that it modifies the *regs variable.

Credits to Andi Kleen and Andreas Schwab for providing the fix.

Signed-off-by: Jim Bos

--- linux-2.6.36/drivers/char/i8k.c.ORIG	2010-08-02 17:20:46.0 +0200
+++ linux-2.6.36/drivers/char/i8k.c	2010-11-13 11:35:11.0 +0100
@@ -141,7 +141,7 @@
 		"lahf\n\t"
 		"shrl $8,%%eax\n\t"
 		"andl $1,%%eax\n"
-		:"=a"(rc)
+		:"=a"(rc), "+m" (*regs)
 		:"a"(regs)
 		:"%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");
 #else
@@ -166,7 +166,8 @@
 		"movl %%edx,0(%%eax)\n\t"
 		"lahf\n\t"
 		"shrl $8,%%eax\n\t"
-		"andl $1,%%eax\n":"=a"(rc)
+		"andl $1,%%eax\n"
+		:"=a"(rc), "+m" (*regs)
 		:"a"(regs)
 		:"%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");
 #endif
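The following is a minimal sketch of what the `"+m" (*regs)` operand expresses. It is not the i8k code itself. The hypothetical helper `asm_increment` changes an `int` through a pointer from inside inline asm and declares that memory as a read-write operand, so GCC knows the asm both reads and writes `*p`. The sketch assumes GCC/Clang extended-asm syntax. The non-x86 branch is a plain-C stand-in so that it still builds on other targets.

```c
/* Hypothetical helper: adds 1 to *p from inline asm on x86.
   The "+m" (*p) operand says the asm reads AND writes the memory at *p,
   so the compiler may not keep a stale copy of *p in a register across
   the asm statement (the same promise the i8k fix makes for *regs). */
static inline void asm_increment(int *p)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("addl $1, %0"
                         : "+m" (*p)   /* in/out memory operand */
                         :             /* no other inputs */
                         : "cc");      /* addl changes the flags */
#else
    /* Not x86: plain C stand-in so the example still compiles. */
    *p += 1;
#endif
}

/* Hypothetical usage: the caller sees each update without extra barriers. */
int increment_twice(int start)
{
    int value = start;
    asm_increment(&value);
    asm_increment(&value);
    return value;
}
```

Without the output operand, the asm would be telling GCC only that it reads `regs` (the pointer value), not that it writes the object it points to.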
Re: Discussion: What is unspec_volatile?
On Sat, Nov 13, 2010 at 2:27 AM, Paolo Bonzini wrote:
> On 11/12/2010 03:25 PM, H.J. Lu wrote:
>>
>> IRA may move instructions across an unspec_volatile,
>
> Do you have a testcase?
>

x86 has

;; Clear the upper 128bits of AVX registers, equivalent to a NOP
;; if the upper 128bits are unused.
(define_insn "avx_vzeroupper"
  [(unspec_volatile [(match_operand 0 "const_int_operand" "")]
		    UNSPECV_VZEROUPPER)]
  "TARGET_AVX"
  "vzeroupper"
  [(set_attr "type" "sse")
   (set_attr "modrm" "0")
   (set_attr "memory" "none")
   (set_attr "prefix" "vex")
   (set_attr "mode" "OI")])

It is a no-op, but it has to be in the place where it was expanded.
Since there is no register operand, IRA moves instructions across
it. We have to undo IRA moves in ix86_reorg.

We have a memory barrier, but we don't have a register barrier.

--
H.J.
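This is a hedged C sketch of what the instruction is for. It uses the AVX intrinsic `_mm256_zeroupper()` from `<immintrin.h>`, which emits VZEROUPPER. The names `add_floats`, `add_avx` and `add_scalar` are hypothetical. The sketch assumes GCC or Clang on x86 and uses the `target("avx")` attribute plus `__builtin_cpu_supports` so that only one function needs AVX. GCC also inserts VZEROUPPER on its own in such code, so the explicit call here is illustrative.

```c
#include <stddef.h>

/* Portable fallback: plain scalar loop. */
static void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Only this function is compiled for AVX, via target("avx"). */
__attribute__((target("avx")))
static void add_avx(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {          /* 8 floats per 256-bit YMM reg */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++)                     /* scalar tail */
        dst[i] = a[i] + b[i];

    /* Zero bits 128..255 of all YMM registers before returning. No YMM
       value is live here, so results are unchanged; the point is to
       avoid the AVX-to-legacy-SSE transition penalty when the caller
       next runs non-VEX SSE code. */
    _mm256_zeroupper();
}
#endif

/* Hypothetical dispatcher: use AVX when the CPU supports it. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        add_avx(dst, a, b, n);
        return;
    }
#endif
    add_scalar(dst, a, b, n);
}
```

The VZEROUPPER call has to stay after the last 256-bit operation. If the compiler were to move AVX register work past it, the upper halves would be lost. This is the placement constraint the thread is arguing about.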
Re: Discussion: What is unspec_volatile?
On 11/13/2010 03:34 PM, H.J. Lu wrote:
> On Sat, Nov 13, 2010 at 2:27 AM, Paolo Bonzini wrote:
>> On 11/12/2010 03:25 PM, H.J. Lu wrote:
>>> IRA may move instructions across an unspec_volatile,
>>
>> Do you have a testcase?
>
> x86 has
>
> ;; Clear the upper 128bits of AVX registers, equivalent to a NOP
> ;; if the upper 128bits are unused.
> (define_insn "avx_vzeroupper"
>   [(unspec_volatile [(match_operand 0 "const_int_operand" "")]
> 		    UNSPECV_VZEROUPPER)]
>   "TARGET_AVX"
>   "vzeroupper"
>   [(set_attr "type" "sse")
>    (set_attr "modrm" "0")
>    (set_attr "memory" "none")
>    (set_attr "prefix" "vex")
>    (set_attr "mode" "OI")])
>
> It is a no-op, but it has to be in the place where it was expanded.
> Since there is no register operand, IRA moves instructions across
> it. We have to undo IRA moves in ix86_reorg.

That's because VZEROUPPER (and VZEROALL too, btw) has input and output
operands that you are not modeling. Undoing these moves in reorg seems
very wrong to me, even though you need it anyway to delete them.

Paolo
Re: Discussion: What is unspec_volatile?
On Sat, Nov 13, 2010 at 6:56 AM, Paolo Bonzini wrote:
> On 11/13/2010 03:34 PM, H.J. Lu wrote:
>>
>> On Sat, Nov 13, 2010 at 2:27 AM, Paolo Bonzini wrote:
>>>
>>> On 11/12/2010 03:25 PM, H.J. Lu wrote:
>>>> IRA may move instructions across an unspec_volatile,
>>>
>>> Do you have a testcase?
>>>
>>
>> x86 has
>>
>> ;; Clear the upper 128bits of AVX registers, equivalent to a NOP
>> ;; if the upper 128bits are unused.
>> (define_insn "avx_vzeroupper"
>>   [(unspec_volatile [(match_operand 0 "const_int_operand" "")]
>> 		    UNSPECV_VZEROUPPER)]
>>   "TARGET_AVX"
>>   "vzeroupper"
>>   [(set_attr "type" "sse")
>>    (set_attr "modrm" "0")
>>    (set_attr "memory" "none")
>>    (set_attr "prefix" "vex")
>>    (set_attr "mode" "OI")])
>>
>> It is a no-op, but it has to be in the place where it was expanded.
>> Since there is no register operand, IRA moves instructions across
>> it. We have to undo IRA moves in ix86_reorg.
>
> That's because VZEROUPPER (and VZEROALL too, btw) has input and output

Please pay close attention to VZEROALL, which I enclosed below.

> operands that you are not modeling. Undoing these moves in reorg seems very
> wrong to me, even though you need it anyway to delete them.
>

VZEROUPPER is a no-op for execution, but it isn't a no-op for
performance. That is why no instructions should be moved across it.
But IRA doesn't support this.

--
H.J.
--
(define_insn "*avx_vzeroall"
  [(match_parallel 0 "vzeroall_operation"
    [(unspec_volatile [(const_int 0)] UNSPECV_VZEROALL)])]
  "TARGET_AVX"
  "vzeroall"
  [(set_attr "type" "sse")
   (set_attr "modrm" "0")
   (set_attr "memory" "none")
   (set_attr "prefix" "vex")
   (set_attr "mode" "OI")])
Re: Discussion: What is unspec_volatile?
On 11/13/2010 04:28 PM, H.J. Lu wrote:
> VZEROUPPER is a no-op for execution, but it isn't a no-op for
> performance.

IIUC it's a no-op as GCC uses it. You could use it in 256-bit mode and
it would be valid, but not a no-op.

Paolo
Re: Discussion: What is unspec_volatile?
On Sat, Nov 13, 2010 at 8:01 AM, Paolo Bonzini wrote:
> On 11/13/2010 04:28 PM, H.J. Lu wrote:
>>
>> VZEROUPPER is a no-op for execution, but it isn't a no-op for
>> performance.
>
> IIUC it's a no-op as GCC uses it. You could use it in 256-bit mode and it
> would be valid, but not a no-op.
>

That is beside the point. I just pointed out that there was no register
barrier for IRA.

--
H.J.
Re: Discussion: What is unspec_volatile?
On 11/13/2010 05:10 PM, H.J. Lu wrote:
> On Sat, Nov 13, 2010 at 8:01 AM, Paolo Bonzini wrote:
>> On 11/13/2010 04:28 PM, H.J. Lu wrote:
>>> VZEROUPPER is a no-op for execution, but it isn't a no-op for
>>> performance.
>>
>> IIUC it's a no-op as GCC uses it. You could use it in 256-bit mode and it
>> would be valid, but not a no-op.
>
> That is beside the point. I just pointed out that there was no register
> barrier for IRA.

And I was pointing out that the machine description is wrong :)

Paolo
Re: Discussion: What is unspec_volatile?
On Sat, Nov 13, 2010 at 8:20 AM, Paolo Bonzini wrote:
> On 11/13/2010 05:10 PM, H.J. Lu wrote:
>>
>> On Sat, Nov 13, 2010 at 8:01 AM, Paolo Bonzini wrote:
>>>
>>> On 11/13/2010 04:28 PM, H.J. Lu wrote:
>>>> VZEROUPPER is a no-op for execution, but it isn't a no-op for
>>>> performance.
>>>
>>> IIUC it's a no-op as GCC uses it. You could use it in 256-bit mode and it
>>> would be valid, but not a no-op.
>>>
>>
>> That is beside the point. I just pointed out that there was no register
>> barrier for IRA.
>
> And I was pointing out that the machine description is wrong :)
>

Please tell me how I can tell IRA not to move ANY instructions across an
instruction which is a no-op for execution.

--
H.J.
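H.J.'s question is about ordering when no memory is involved. For comparison, the memory barrier he mentions earlier in the thread has a familiar source-level counterpart in GNU C: an empty `asm volatile` with a `"memory"` clobber. Below is a minimal sketch with the hypothetical names `COMPILER_BARRIER`, `barrier_demo_flag` and `read_flag_twice`. It orders memory accesses only. Arithmetic on values held purely in registers can still be scheduled across it.

```c
/* Compiler-only barrier in GNU C: emits no machine instruction, but the
   "memory" clobber tells GCC the asm may read or write any memory, so
   loads/stores cannot be moved across it and cached values are reloaded.
   It does NOT pin register-only arithmetic in place. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical global used only for this demonstration. */
int barrier_demo_flag;

int read_flag_twice(void)
{
    int first = barrier_demo_flag;   /* load #1 */
    COMPILER_BARRIER();
    int second = barrier_demo_flag;  /* must be loaded again, not reused */
    return first + second;
}
```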
Re: Discussion: What is unspec_volatile?
On Sat, 2010-11-13 at 11:27 +0100, Paolo Bonzini wrote:
> On 11/12/2010 03:25 PM, H.J. Lu wrote:
> > IRA may move instructions across an unspec_volatile,
>
> Do you have a testcase?

Are you sure it's IRA and not our old friend update_equiv_regs(), which
IRA calls? http://gcc.gnu.org/PR41171 shows an example where
update_equiv_regs() moves code around.

Peter
Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64
I re-measured the performance difference using trunk gcc and trunk
clang/llvm on a Core 2 box. -fno-strict-aliasing is added to gcc because
clang/llvm's type-based aliasing is incomplete and not enabled by
default. I also added -fomit-frame-pointer to clang/llvm, as this is
gcc's default. The base option is -O2.

32bit:
  Benchmark     LLVM    GCC   GCC vs. LLVM
  164.gzip      1210   1239          2.44%
  175.vpr       1662   1621         -2.42%
  181.mcf       2733   3109         13.75%
  186.crafty    1812   1721         -5.00%
  197.parser    1328   1289         -2.92%
  253.perlbmk   2086   2580         23.67%
  254.gap       1968   1912         -2.86%
  255.vortex    1842   1965          6.66%
  256.bzip2     1440   1553          7.82%
  300.twolf     2284   2213         -3.08%

64bit:
  Benchmark     LLVM    GCC   GCC vs. LLVM
  164.gzip      1268   1320          4.15%
  175.vpr       1605   1534         -4.42%
  176.gcc       2203   2315          5.08%
  181.mcf       1625   1737          6.85%
  186.crafty    2411   2307         -4.30%
  197.parser    1173   1166         -0.57%
  252.eon       2245   2464          9.72%
  253.perlbmk   2214   2444         10.37%
  254.gap       1987   1978         -0.47%
  255.vortex    2497   2422         -3.00%
  256.bzip2     1585   1740          9.80%
  300.twolf     2294   2281         -0.58%

Though gcc leads LLVM in performance overall, there are a couple of
benchmarks where gcc is worse: vpr and crafty (64bit and 32bit), parser
and twolf (32bit), vortex (64bit). This needs to be triaged. gcc
miscompiles gcc and eon in 32bit -- is there a bug tracking the problem?

Thanks,

David

On Thu, Apr 29, 2010 at 9:25 AM, Vladimir Makarov wrote:
> GCC-4.5.0 and LLVM-2.7 were released recently. To understand
> where we stand after releasing GCC-4.5.0, I benchmarked it on SPEC2000
> for x86/x86-64 and posted the comparison of it with the
> previous GCC releases and LLVM-2.7.
>
> Even benchmarking SPEC2000 takes a lot of time on the fastest
> machine I have, so I don't plan to use SPEC2006 for this in the near
> future.
>
> You can find the comparison on
> http://vmakarov.fedorapeople.org/spec/ (please just click links at the
> bottom of the left frame starting with link "GCC release comparison").
>
> If you need exact numbers, please use the tables (the links to them
> are also given) which were used to generate the corresponding bar
> graphs.
>
> In general GCC-4.5.0 became faster (up to 10%) in -O2 mode. This is the
> first considerable compilation speed improvement since GCC-4.2.
> GCC-4.5.0 also generates better code (1-2% on average, up to 4% for
> x86-64 SPECFP2000 in -O2 mode) in comparison with the previous
> release. That is not including LTO and Graphite, which can give even
> more (especially LTO) in many cases.
>
> GCC-4.5.0 has the new big optimizations LTO and Graphite (more
> accurately, Graphite was introduced in the previous release).
> Therefore I ran additional benchmarks to test them.
>
> LTO is a promising technology, especially for integer benchmarks, for
> which it results in smaller and faster code. But it might result in
> degradations too on SPECFP2000, mainly because of big degradations on a
> few benchmarks like wupwise or facerec. Another annoying thing about
> LTO is that it considerably slows down the compiler.
>
> Currently Graphite gives small improvements on x86 (one exception is
> 2% for peak x86 SPECFP2000) and mostly degradations on x86_64 (the
> maximum being more than 10% for SPECFP2000 because of big degradations
> on mgrid and swim). So further work is needed on the project because
> it does not seem mature yet.
>
> As for LLVM, LLVM became slower (e.g. by 15%-50% for x86-64 in
> comparison with llvm-2.5). So the gap between the compilation speed of
> GCC and LLVM decreased and sometimes reaches 4% on x86_64 and 8% on x86
> (both for SPECInt2000 in -O2 mode). Maybe I am wrong, but I don't think
> CLANG will improve this situation significantly (in -O2 and -O3 mode)
> because optimizations still take most of the time of any serious
> optimizing compiler.
>
> LLVM made progress in code performance, especially for floating
> point benchmarks. But the gap between LLVM-2.7 and GCC-4.5 in peak
> performance (not including GCC LTO and Graphite) is still 6-7% on
> SPECInt2000 and 13-17% on SPECFP2000.
>
> In general, IMHO GCC-4.5.0 is a good and promising release.
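The SPEC table in David's message does not label its columns. Reading it against his text, the negative rows are the benchmarks where gcc is said to be worse. That suggests the third column is GCC's SPEC ratio relative to LLVM's. This is an inference, not something the post states. Here is a small sketch of that arithmetic, with a hypothetical function name.

```c
/* Percent difference of GCC's SPEC ratio over LLVM's (assumed meaning of
   the table's third column): positive = GCC faster, negative = slower. */
double gcc_vs_llvm_pct(double llvm_ratio, double gcc_ratio)
{
    return (gcc_ratio - llvm_ratio) / llvm_ratio * 100.0;
}
```

For example, 181.mcf in 32-bit mode gives (3109 - 2733) / 2733 * 100, about 13.76. That matches the posted 13.75% to within the rounding of the displayed ratios.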
Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64
On 11/13/2010 10:08 PM, Xinliang David Li wrote:
> Though gcc leads LLVM in performance overall, there are a couple of
> benchmarks where gcc is worse: vpr and crafty (64bit and 32bit), parser
> and twolf (32bit), vortex (64bit). This needs to be triaged. gcc
> miscompiles gcc and eon in 32bit -- is there a bug tracking the problem?

Have you tried -ffast-math or -mfpmath=sse for eon?

Paolo
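This is background on the x87-versus-SSE choice behind Paolo's `-mfpmath=sse` suggestion. On 32-bit x86, GCC uses the x87 unit by default, and its registers hold intermediates in 80-bit extended precision. With `-mfpmath=sse` (together with `-msse2`), `float` and `double` operations are evaluated in their own precision. C99 reports this choice through `FLT_EVAL_METHOD` in `<float.h>`. Below is a minimal sketch that reports it. The function name is hypothetical, and the x87/SSE mapping in the comments describes typical GCC behaviour, not a guarantee.

```c
#include <float.h>

/* Describes how intermediate floating-point results are evaluated, per
   C99 FLT_EVAL_METHOD. Typical GCC values: 2 on 32-bit x86 with x87 math,
   0 on x86-64 or with -mfpmath=sse -msse2. */
const char *fp_eval_description(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:
        return "operations evaluated in their own type (e.g. SSE)";
    case 1:
        return "float and double evaluated as double";
    case 2:
        return "all evaluated as long double (e.g. x87 extended precision)";
    default:
        return "indeterminate or implementation-defined";
    }
}
```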
Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64
On Sat, Nov 13, 2010 at 2:39 PM, Paolo Bonzini wrote:
> On 11/13/2010 10:08 PM, Xinliang David Li wrote:
>>
>> Though gcc leads LLVM in performance overall, there are a couple of
>> benchmarks where gcc is worse: vpr and crafty (64bit and 32bit), parser
>> and twolf (32bit), vortex (64bit). This needs to be triaged. gcc
>> miscompiles gcc and eon in 32bit -- is there a bug tracking the
>> problem?
>
> Have you tried -ffast-math or -mfpmath=sse for eon?
>

-ffast-math is used on eon.

David

> Paolo
>
gcc-4.6-20101113 is now available
Snapshot gcc-4.6-20101113 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20101113/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 166720

You'll find:

 gcc-4.6-20101113.tar.bz2             Complete GCC (includes all of below)
  MD5=d2a3df4783e5f996385fd7f570f798ed
  SHA1=cd0f1a1a1ce996151816d0200877c75765bff089

 gcc-core-4.6-20101113.tar.bz2        C front end and core compiler
  MD5=29b59036c7a1f235780f87c23a3fc0ac
  SHA1=d3dbfb3f4752f513f31043e0f3043bb7646b72aa

 gcc-ada-4.6-20101113.tar.bz2         Ada front end and runtime
  MD5=8d37bfd2cb222f0038d4b4386ca962c8
  SHA1=ee3769f9ac5a4fef2d25c62b7ed6b6a2186290ba

 gcc-fortran-4.6-20101113.tar.bz2     Fortran front end and runtime
  MD5=cb5d99f93bc9aac4fac4bcb30f8b4177
  SHA1=e1e3c2d823a3e4101f9a6b68331c4d37e6c2218b

 gcc-g++-4.6-20101113.tar.bz2         C++ front end and runtime
  MD5=57184a99e0cad49ad0f9eea9fa638056
  SHA1=aa8a95420db9ad2539c0f039014744afa950fafa

 gcc-java-4.6-20101113.tar.bz2        Java front end and runtime
  MD5=91123af0d0ff0972d4214609178973f0
  SHA1=7411e551b512274b81741292a194b5c1c87c6218

 gcc-objc-4.6-20101113.tar.bz2        Objective-C front end and runtime
  MD5=81786ba66b816aa1768719f710de7033
  SHA1=7bae331b5b3250316e1d5a8052b174a6eb9c73d9

 gcc-testsuite-4.6-20101113.tar.bz2   The GCC testsuite
  MD5=a12c003af226e05fc20bdbf440ea296d
  SHA1=f60266229443b94da53fad7f9663c83124a136f2

Diffs from 4.6-20101106 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption, the LATEST-4.6
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64
On Sat, Nov 13, 2010 at 1:08 PM, Xinliang David Li wrote:
>
> Though gcc leads LLVM in performance overall, there are a couple of
> benchmarks where gcc is worse: vpr and crafty (64bit and 32bit), parser
> and twolf (32bit), vortex (64bit). This needs to be triaged. gcc
> miscompiles gcc and eon in 32bit -- is there a bug tracking the
> problem?
>

GCC trunk compiles and runs SPEC CPU 2K correctly at -O2 and -O3 for
both 32bit and 64bit on x86:

http://gcc.gnu.org/ml/gcc-testresults/2010-11/msg00977.html
http://gcc.gnu.org/ml/gcc-testresults/2010-11/msg00983.html

You need the alternate source for eon. I use:

252.eon=default=default=default:
CXXPORTABILITY = -DHAS_ERRLIST
EXTRA_CXXFLAGS=-ffast-math -mpc64
EXTRA_LDFLAGS = -ffast-math -mpc64
srcalt=gcc43

176.gcc=default=default=default:
CPORTABILITY = -Dalloca=_alloca

--
H.J.