Re: Discussion: What is unspec_volatile?

2010-11-13 Thread Paolo Bonzini

On 11/12/2010 03:25 PM, H.J. Lu wrote:
> IRA may move instructions across an unspec_volatile,

Do you have a testcase?

Paolo


[PATCH] i8k: Tell gcc that *regs gets clobbered

2010-11-13 Thread Jim Bos


More recent GCC versions cause the i8k driver to stop working. On Slackware
the compiler was upgraded from gcc-4.4.4 to gcc-4.5.1, after which the
driver no longer worked: it either failed to load or gave totally
nonsensical output.
As it turned out, the asm(..) statement did not mention that it modifies
the *regs variable.
Credits to Andi Kleen and Andreas Schwab for providing the fix.

Signed-off-by: Jim Bos 

--- linux-2.6.36/drivers/char/i8k.c.ORIG 2010-08-02 17:20:46.0 +0200
+++ linux-2.6.36/drivers/char/i8k.c 2010-11-13 11:35:11.0 +0100
@@ -141,7 +141,7 @@
"lahf\n\t"
"shrl $8,%%eax\n\t"
"andl $1,%%eax\n"
-   :"=a"(rc)
+   :"=a"(rc), "+m" (*regs)
:"a"(regs)
:"%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");
 #else
@@ -166,7 +166,8 @@
"movl %%edx,0(%%eax)\n\t"
"lahf\n\t"
"shrl $8,%%eax\n\t"
-   "andl $1,%%eax\n":"=a"(rc)
+   "andl $1,%%eax\n"
+   :"=a"(rc), "+m" (*regs)
:"a"(regs)
:"%ebx", "%ecx", "%edx", "%esi", "%edi", "memory");
 #endif
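To see what the added `"+m" (*regs)` operand declares, here is a minimal stand-alone sketch, assuming GCC-compatible inline asm (GCC or Clang) on x86. `struct fake_regs` and `bump_first_word` are hypothetical names, and the asm body is a plain `incl`, not the i8k SMM call; it only shows that an in/out memory operand tells the compiler the asm both reads and writes the object `*regs`. The original i8k asm already lists a `"memory"` clobber, which is the coarser way of saying that any memory may change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for i8k's register block (not the driver's layout). */
struct fake_regs {
    unsigned int eax;
    unsigned int ebx;
};

/* Increment the first 32-bit word of *regs from inline asm. */
static void bump_first_word(struct fake_regs *regs)
{
    assert(regs != NULL);
#if defined(__i386__) || defined(__x86_64__)
    /* "+m"(*regs): the asm reads AND writes the whole object *regs, so the
       compiler must store any pending value to memory before the asm and
       reload regs->eax / regs->ebx afterwards instead of reusing copies
       cached in registers. "cc" because incl changes the flags. */
    __asm__ __volatile__("incl %0" : "+m"(*regs) : : "cc");
#else
    regs->eax++;    /* portable fallback so the sketch builds elsewhere */
#endif
}
```

Passing only the pointer as an input (like `"a"(regs)` in the driver) describes the address, not the memory behind it; the `+m` operand is what names the pointed-to object as modified.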


Re: Discussion: What is unspec_volatile?

2010-11-13 Thread H.J. Lu
On Sat, Nov 13, 2010 at 2:27 AM, Paolo Bonzini  wrote:
> On 11/12/2010 03:25 PM, H.J. Lu wrote:
>>
>> IRA may move instructions across an unspec_volatile,
>
> Do you have a testcase?
>

x86 has

;; Clear the upper 128bits of AVX registers, equivalent to a NOP
;; if the upper 128bits are unused.
(define_insn "avx_vzeroupper"
  [(unspec_volatile [(match_operand 0 "const_int_operand" "")]
UNSPECV_VZEROUPPER)]
  "TARGET_AVX"
  "vzeroupper"
  [(set_attr "type" "sse")
   (set_attr "modrm" "0")
   (set_attr "memory" "none")
   (set_attr "prefix" "vex")
   (set_attr "mode" "OI")])

It is a no-op, but it has to stay where it was expanded.
Since it has no register operand, IRA moves instructions across
it, and we have to undo those moves in ix86_reorg.

We have a memory barrier, but we don't have a register barrier.
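The same distinction is visible one level up, in C inline asm. A minimal sketch, assuming GCC-compatible inline asm; it is a C-level analogue only, not the RTL `unspec_volatile` machinery being discussed, and `compiler_barrier`, `shared_counter` and `sum_around_barrier` are hypothetical names. The `"memory"` clobber orders memory accesses around the asm, but nothing orders pure register computations.

```c
#include <assert.h>

/* Compiler-only barrier: emits no machine instruction. The "memory"
   clobber tells GCC the asm may read or write any memory it can reach,
   so such values may not be cached in registers across it. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

static int shared_counter;   /* a global, so it lives in memory */

static int sum_around_barrier(int k)
{
    int a = shared_counter;  /* first load */
    compiler_barrier();      /* GCC may not reuse 'a' for the next read */
    int b = shared_counter;  /* must be reloaded from memory */

    /* k * 3 touches no memory, so the "memory" clobber places no
       constraint on where GCC evaluates it relative to the barrier:
       there is no corresponding "register barrier". */
    int c = k * 3;
    return a + b + c;
}
```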

-- 
H.J.


Re: Discussion: What is unspec_volatile?

2010-11-13 Thread Paolo Bonzini

On 11/13/2010 03:34 PM, H.J. Lu wrote:
> On Sat, Nov 13, 2010 at 2:27 AM, Paolo Bonzini  wrote:
>> On 11/12/2010 03:25 PM, H.J. Lu wrote:
>>> IRA may move instructions across an unspec_volatile,
>>
>> Do you have a testcase?
>>
>
> x86 has
>
> ;; Clear the upper 128bits of AVX registers, equivalent to a NOP
> ;; if the upper 128bits are unused.
> (define_insn "avx_vzeroupper"
>   [(unspec_volatile [(match_operand 0 "const_int_operand" "")]
>                     UNSPECV_VZEROUPPER)]
>   "TARGET_AVX"
>   "vzeroupper"
>   [(set_attr "type" "sse")
>    (set_attr "modrm" "0")
>    (set_attr "memory" "none")
>    (set_attr "prefix" "vex")
>    (set_attr "mode" "OI")])
>
> It is no-nop, but it has to be in the place where it was expanded.
> Since there is no register operand, IRA moves instructions across
> it.  We have to undo IRA moves in ix86_reorg.

That's because VZEROUPPER (and VZEROALL too, btw) has input and output 
operands that you are not modeling.  Undoing these moves in reorg seems 
very wrong to me, even though you need it anyway to delete them.
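In C inline asm, the analogue of modeling those operands is to spell out the registers the instruction touches as clobbers rather than relying on a barrier. A minimal sketch, assuming GCC-compatible inline asm on x86-64 and `__builtin_cpu_supports` (GCC 4.8 or later, or Clang); `vzeroupper_if_avx` is a hypothetical name, and this is not how the `avx_vzeroupper` pattern itself is or should be written.

```c
#include <assert.h>

/* Execute VZEROUPPER if the CPU has AVX; return 1 if it ran, else 0. */
static int vzeroupper_if_avx(void)
{
#if defined(__x86_64__)
    if (!__builtin_cpu_supports("avx"))
        return 0;               /* executing it without AVX would fault */
    /* In 64-bit mode VZEROUPPER zeroes bits 128 and up of registers 0-15.
       Clobbers can only name whole registers, so this is conservative:
       GCC assumes the low 128 bits are lost too, although they survive. */
    __asm__ __volatile__("vzeroupper"
                         : /* no outputs */
                         : /* no inputs */
                         : "xmm0", "xmm1", "xmm2", "xmm3",
                           "xmm4", "xmm5", "xmm6", "xmm7",
                           "xmm8", "xmm9", "xmm10", "xmm11",
                           "xmm12", "xmm13", "xmm14", "xmm15");
    return 1;
#else
    return 0;                   /* not x86-64: nothing to do */
#endif
}
```

Because the clobbers describe the effect, the compiler keeps no live value in those registers across the asm, instead of having to be prevented from moving code around it.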


Paolo


Re: Discussion: What is unspec_volatile?

2010-11-13 Thread H.J. Lu
On Sat, Nov 13, 2010 at 6:56 AM, Paolo Bonzini  wrote:
> On 11/13/2010 03:34 PM, H.J. Lu wrote:
>>
>> On Sat, Nov 13, 2010 at 2:27 AM, Paolo Bonzini  wrote:
>>>
>>> On 11/12/2010 03:25 PM, H.J. Lu wrote:
>>>>
>>>> IRA may move instructions across an unspec_volatile,
>>>
>>> Do you have a testcase?
>>>
>>
>> x86 has
>>
>> ;; Clear the upper 128bits of AVX registers, equivalent to a NOP
>> ;; if the upper 128bits are unused.
>> (define_insn "avx_vzeroupper"
>>   [(unspec_volatile [(match_operand 0 "const_int_operand" "")]
>>                     UNSPECV_VZEROUPPER)]
>>   "TARGET_AVX"
>>   "vzeroupper"
>>   [(set_attr "type" "sse")
>>    (set_attr "modrm" "0")
>>    (set_attr "memory" "none")
>>    (set_attr "prefix" "vex")
>>    (set_attr "mode" "OI")])
>>
>> It is no-nop, but it has to be in the place where it was expanded.
>> Since there is no register operand, IRA moves instructions across
>> it.  We have to undo IRA moves in ix86_reorg.
>
> That's because VZEROUPPER (and VZEROALL too, btw) has input and output

Please pay close attention to VZEROALL, which I enclosed below.

> operands that you are not modeling.  Undoing these moves in reorg seems very
> wrong to me, even though you need it anyway to delete them.
>

VZEROUPPER is a no-op for execution, but it isn't a no-op for performance.
That is why no instructions should be moved across it, but IRA doesn't
support that.

-- 
H.J.
--
(define_insn "*avx_vzeroall"
  [(match_parallel 0 "vzeroall_operation"
[(unspec_volatile [(const_int 0)] UNSPECV_VZEROALL)])]
  "TARGET_AVX"
  "vzeroall"
  [(set_attr "type" "sse")
   (set_attr "modrm" "0")
   (set_attr "memory" "none")
   (set_attr "prefix" "vex")
   (set_attr "mode" "OI")])


Re: Discussion: What is unspec_volatile?

2010-11-13 Thread Paolo Bonzini

On 11/13/2010 04:28 PM, H.J. Lu wrote:
> VZEROUPPER is no-nop to executions. But it isn't no-nop for performance.

IIUC it's a noop as GCC uses it.  You could use it in 256-bit mode and
it would be valid, but not a noop.


Paolo


Re: Discussion: What is unspec_volatile?

2010-11-13 Thread H.J. Lu
On Sat, Nov 13, 2010 at 8:01 AM, Paolo Bonzini  wrote:
> On 11/13/2010 04:28 PM, H.J. Lu wrote:
>>
>> VZEROUPPER is no-nop to executions. But it isn't no-nop for performance.
>
> IIUC it's a noop as GCC uses it.  You could use it in 256-bit mode and it
> would be valid, but not a noop.
>

That is beside the point. I just pointed out that there is no register barrier
for IRA.


-- 
H.J.


Re: Discussion: What is unspec_volatile?

2010-11-13 Thread Paolo Bonzini

On 11/13/2010 05:10 PM, H.J. Lu wrote:
> On Sat, Nov 13, 2010 at 8:01 AM, Paolo Bonzini  wrote:
>> On 11/13/2010 04:28 PM, H.J. Lu wrote:
>>> VZEROUPPER is no-nop to executions. But it isn't no-nop for performance.
>>
>> IIUC it's a noop as GCC uses it.  You could use it in 256-bit mode and it
>> would be valid, but not a noop.
>>
>
> That is besides the point. I just pointed out that there was no register barrier
> for IRA.


And I was pointing out that the machine description is wrong :)

Paolo


Re: Discussion: What is unspec_volatile?

2010-11-13 Thread H.J. Lu
On Sat, Nov 13, 2010 at 8:20 AM, Paolo Bonzini  wrote:
> On 11/13/2010 05:10 PM, H.J. Lu wrote:
>>
>> On Sat, Nov 13, 2010 at 8:01 AM, Paolo Bonzini  wrote:
>>>
>>> On 11/13/2010 04:28 PM, H.J. Lu wrote:
>>>>
>>>> VZEROUPPER is no-nop to executions. But it isn't no-nop for performance.
>>>
>>> IIUC it's a noop as GCC uses it.  You could use it in 256-bit mode and it
>>> would be valid, but not a noop.
>>>
>>
>> That is besides the point. I just pointed out that there was no register barrier
>> for IRA.
>
> And I was pointing out that the machine description is wrong :)
>

Please tell me how I can tell IRA not to move ANY instructions
across an instruction which is a no-op for execution.


-- 
H.J.


Re: Discussion: What is unspec_volatile?

2010-11-13 Thread Peter Bergner
On Sat, 2010-11-13 at 11:27 +0100, Paolo Bonzini wrote:
> On 11/12/2010 03:25 PM, H.J. Lu wrote:
> > IRA may move instructions across an unspec_volatile,
> 
> Do you have a testcase?

Are you sure it's IRA and not our old friend update_equiv_regs()
which IRA calls?  http://gcc.gnu.org/PR41171 shows an example
where update_equiv_regs() moves code around.

Peter


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-13 Thread Xinliang David Li
I re-measured the performance difference using trunk gcc and trunk
clang/llvm on a Core 2 box.  -fno-strict-aliasing is added to gcc
because clang/llvm's type-based aliasing is incomplete and not
enabled by default. I also added -fomit-frame-pointer to clang/llvm, as
this is gcc's default. The base option is -O2.

32bit:

164.gzip      1210   1239    2.44%
175.vpr       1662   1621   -2.42%
181.mcf       2733   3109   13.75%
186.crafty    1812   1721   -5.00%
197.parser    1328   1289   -2.92%
253.perlbmk   2086   2580   23.67%
254.gap       1968   1912   -2.86%
255.vortex    1842   1965    6.66%
256.bzip2     1440   1553    7.82%
300.twolf     2284   2213   -3.08%


64bit:

164.gzip      1268   1320    4.15%
175.vpr       1605   1534   -4.42%
176.gcc       2203   2315    5.08%
181.mcf       1625   1737    6.85%
186.crafty    2411   2307   -4.30%
197.parser    1173   1166   -0.57%
252.eon       2245   2464    9.72%
253.perlbmk   2214   2444   10.37%
254.gap       1987   1978   -0.47%
255.vortex    2497   2422   -3.00%
256.bzip2     1585   1740    9.80%
300.twolf     2294   2281   -0.58%


Though gcc leads LLVM in overall performance, there are a few
benchmarks where gcc is worse: vpr and crafty (64bit and 32bit), parser
and twolf (32bit), and vortex (64bit).  These need to be triaged.  gcc
miscompiles gcc and eon in 32bit; is there a bug tracking the
problem?

Thanks,

David


On Thu, Apr 29, 2010 at 9:25 AM, Vladimir Makarov  wrote:
>  GCC-4.5.0 and LLVM-2.7 were released recently.  To understand
> where we stand after releasing GCC-4.5.0 I benchmarked it on SPEC2000
> for x86/x86-64 and posted the comparison of it with the
> previous GCC releases and LLVM-2.7.
>
>  Even benchmarking SPEC2000 takes a lot of time on the fastest
> machine I have, so I don't plan to use SPEC2006 for this in the near
> future.
>
>  You can find the comparison on
> http://vmakarov.fedorapeople.org/spec/ (please just click links at the
> bottom of the left frame starting with link "GCC release comparison").
>
>  If you need exact numbers, please use the tables (the links to them
> are also given) which were used to generate the corresponding bar
> graphs.
>
>
>  In general, GCC-4.5.0 compiles faster (up to 10%) in -O2 mode.  This is
> the first considerable compilation speed improvement since GCC-4.2.
> GCC-4.5.0 also generates better code (1-2% on average, up to 4% for
> x86-64 SPECFP2000 in -O2 mode) than the previous release.  That is
> not including LTO and Graphite, which can give even more (especially
> LTO) in many cases.
>
>  GCC-4.5.0 has two big new optimizations, LTO and Graphite (more
> accurately, Graphite was introduced in the previous release).
> Therefore I ran additional benchmarks to test them.
>
>  LTO is a promising technology, especially for integer benchmarks, for
> which it results in smaller and faster code.  But it might also result in
> degradations on SPECFP2000, mainly because of big degradations on a
> few benchmarks like wupwise or facerec.  Another annoying thing about
> LTO is that it considerably slows down the compiler.
>
>  Currently Graphite gives small improvements on x86 (one exception is
> 2% for peak x86 SPECFP2000) and mostly degradations on x86_64 (the
> largest, more than 10%, on SPECFP2000 because of big degradations
> on mgrid and swim).  So further work is needed on the project, because
> it does not seem mature yet.
>
>  As for LLVM, it became slower (e.g. by 15%-50% compared with llvm-2.5
> on x86-64).  So the gap in compilation speed between GCC and LLVM
> decreased, and is sometimes only 4% on x86_64 and 8% on x86 (both
> for SPECInt2000 in -O2 mode).  Maybe I am wrong, but I don't think
> CLANG will improve this situation significantly (in -O2 and -O3 mode)
> because optimizations still take most of the time of any serious
> optimizing compiler.
>
>  LLVM made progress in code performance, especially for floating
> point benchmarks.  But the gap between LLVM-2.7 and GCC-4.5 in peak
> performance (not including GCC LTO and Graphite) is still 6-7% on
> SPECInt2000 and 13-17% on SPECFP2000.
>
>  In general, IMHO GCC-4.5.0 is a good and promising release.
>
>


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-13 Thread Paolo Bonzini

On 11/13/2010 10:08 PM, Xinliang David Li wrote:
> Though gcc leads LLVM in performance overrall, there are a couple of
> benchmarks gcc is worse: vpr and crafty (64bit and 32bit), parser and
> twolf (32bit), vortex (64bit).  This needs to be triaged.   gcc
> miscompiles gcc and eon in 32bit -- is there a bug tracking the
> problem?

Have you tried -ffast-math or -mfpmath=sse for eon?

Paolo


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-13 Thread Xinliang David Li
On Sat, Nov 13, 2010 at 2:39 PM, Paolo Bonzini  wrote:
> On 11/13/2010 10:08 PM, Xinliang David Li wrote:
>>
>> Though gcc leads LLVM in performance overrall, there are a couple of
>> benchmarks gcc is worse: vpr and crafty (64bit and 32bit), parser and
>> twolf (32bit), vortex (64bit).  This needs to be triaged.   gcc
>> miscompiles gcc and eon in 32bit -- is there a bug tracking the
>> problem?
>
> Have you tried -ffast-math or -mfpmath=sse for eon?
>

-ffast-math is used on eon.

David

> Paolo
>


gcc-4.6-20101113 is now available

2010-11-13 Thread gccadmin
Snapshot gcc-4.6-20101113 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20101113/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 166720

You'll find:

 gcc-4.6-20101113.tar.bz2 Complete GCC (includes all of below)

  MD5=d2a3df4783e5f996385fd7f570f798ed
  SHA1=cd0f1a1a1ce996151816d0200877c75765bff089

 gcc-core-4.6-20101113.tar.bz2 C front end and core compiler

  MD5=29b59036c7a1f235780f87c23a3fc0ac
  SHA1=d3dbfb3f4752f513f31043e0f3043bb7646b72aa

 gcc-ada-4.6-20101113.tar.bz2 Ada front end and runtime

  MD5=8d37bfd2cb222f0038d4b4386ca962c8
  SHA1=ee3769f9ac5a4fef2d25c62b7ed6b6a2186290ba

 gcc-fortran-4.6-20101113.tar.bz2 Fortran front end and runtime

  MD5=cb5d99f93bc9aac4fac4bcb30f8b4177
  SHA1=e1e3c2d823a3e4101f9a6b68331c4d37e6c2218b

 gcc-g++-4.6-20101113.tar.bz2 C++ front end and runtime

  MD5=57184a99e0cad49ad0f9eea9fa638056
  SHA1=aa8a95420db9ad2539c0f039014744afa950fafa

 gcc-java-4.6-20101113.tar.bz2 Java front end and runtime

  MD5=91123af0d0ff0972d4214609178973f0
  SHA1=7411e551b512274b81741292a194b5c1c87c6218

 gcc-objc-4.6-20101113.tar.bz2 Objective-C front end and runtime

  MD5=81786ba66b816aa1768719f710de7033
  SHA1=7bae331b5b3250316e1d5a8052b174a6eb9c73d9

 gcc-testsuite-4.6-20101113.tar.bz2   The GCC testsuite

  MD5=a12c003af226e05fc20bdbf440ea296d
  SHA1=f60266229443b94da53fad7f9663c83124a136f2

Diffs from 4.6-20101106 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

2010-11-13 Thread H.J. Lu
On Sat, Nov 13, 2010 at 1:08 PM, Xinliang David Li  wrote:
>
> Though gcc leads LLVM in performance overrall, there are a couple of
> benchmarks gcc is worse: vpr and crafty (64bit and 32bit), parser and
> twolf (32bit), vortex (64bit).  This needs to be triaged.   gcc
> miscompiles gcc and eon in 32bit -- is there a bug tracking the
> problem?
>

GCC trunk compiles and runs SPEC CPU 2K correctly at
-O2 and -O3 for both 32bit and 64bit on x86:

http://gcc.gnu.org/ml/gcc-testresults/2010-11/msg00977.html
http://gcc.gnu.org/ml/gcc-testresults/2010-11/msg00983.html

You need the alternate source for eon. I use:

252.eon=default=default=default:
CXXPORTABILITY = -DHAS_ERRLIST
EXTRA_CXXFLAGS=-ffast-math -mpc64
EXTRA_LDFLAGS = -ffast-math -mpc64
srcalt=gcc43

176.gcc=default=default=default:
CPORTABILITY  = -Dalloca=_alloca


-- 
H.J.