RE: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Bingfeng Mei
Thanks for the nice benchmarks, Vladimir.

Why is GCC's code size so much bigger than LLVM's? Does -Ofast do more
unrolling on GCC? Increasing code size doesn't seem to help performance
(164.gzip & 197.parser).
Are there comparisons for -O2? I guess that is more useful for typical
mobile/embedded programmers.

Bingfeng

> -Original Message-
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
> Vladimir Makarov
> Sent: 24 June 2014 16:07
> To: Ramana Radhakrishnan; gcc.gcc.gnu.org
> Subject: Re: Comparison of GCC-4.9 and LLVM-3.4 performance on
> SPECInt2000 for x86-64 and ARM
> 
> On 06/24/2014 10:57 AM, Ramana Radhakrishnan wrote:
> >
> > The ball-park number you have probably won't change much.
> >
> >>>
> >> Unfortunately, that is the configuration I can use on my system
> >> because of the lack of libraries for other configurations.
> >
> > Using --with-fpu={neon / neon-vfpv4} shouldn't cause you ABI issues
> > with libraries for any other configurations. neon / neon-vfpv4 enable
> > use of the neon unit in a manner that is ABI compatible with the rest
> > of the system.
> >
> > For more on command line options for AArch32 and how they map to
> > various CPU's you might find this blog interesting.
> >
> > http://community.arm.com/groups/tools/blog/2013/04/15/arm-cortex-a-processors-and-gcc-command-lines
> >
> >
> >>
> >> I don't think Neon can improve the score for SPECInt2000 significantly,
> >> but maybe I am wrong.
> >
> > It won't probably improve the overall score by a large amount but some
> > individual benchmarks will get some help.
> >
> There are a few benchmarks which benefit from autovectorization (eon
> particularly).
> >>> Did you add any other architecture specific options to your SPEC2k
> >>> runs ?
> >>>
> >>>
> >> No.  The only option I used was -Ofast.
> >>
> >> Could you recommend the best options you think I should use for this
> >> processor?
> >>
> >
> > I would personally use --with-cpu=cortex-a15 --with-fpu=neon-vfpv4
> > --with-float=hard on this processor as that maps with the processor
> > available on that particular piece of Silicon.
> Thanks, Ramana.  Next time, I'll try these options.
> >
> > Also, given it's a big.LITTLE system, probably with kernel switching,
> > it may be better to make sure that you are always running on the
> > big core.
> >
> The results are pretty stable.  Also, this version of Fedora does not
> implement switching from big to LITTLE cores.



Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Renato Golin
On 25 June 2014 10:26, Bingfeng Mei  wrote:
> Why is GCC's code size so much bigger than LLVM's? Does -Ofast do more
> unrolling on GCC? Increasing code size doesn't seem to help performance
> (164.gzip & 197.parser).
> Are there comparisons for -O2? I guess that is more useful for typical
> mobile/embedded programmers.

Hi Bingfeng,

My analysis wasn't as thorough as Vladimir's, but I found that GCC
wasn't eliminating some large blocks of dead code or inlining as much
as LLVM was. I haven't dug deeper, though. Some of the differences
were quite big, I'd be surprised if it all can be explained by
unrolling loops and vectorization...

cheers,
--renato


Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Bin.Cheng
On Wed, Jun 25, 2014 at 5:26 PM, Bingfeng Mei  wrote:
> Thanks for the nice benchmarks, Vladimir.
>
> Why is GCC's code size so much bigger than LLVM's? Does -Ofast do more unrolling
On the contrary, I don't think RTL unrolling is enabled by default in
GCC at -O3/-Ofast. There is no unroll dump file at all unless
-funroll-loops/-funroll-all-loops is explicitly specified.

Thanks,
bin



Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Bin.Cheng
On Wed, Jun 25, 2014 at 5:47 PM, Bin.Cheng  wrote:
> On Wed, Jun 25, 2014 at 5:26 PM, Bingfeng Mei  wrote:
>> Thanks for the nice benchmarks, Vladimir.
>>
>> Why is GCC's code size so much bigger than LLVM's? Does -Ofast do more
>> unrolling
> On the contrary, I don't think RTL unrolling is enabled by default in
> GCC at -O3/-Ofast. There is no unroll dump file at all unless
> -funroll-loops/-funroll-all-loops is explicitly specified.
To clarify: I did see cases in which GCC's RTL unroller was more
aggressive than LLVM's once it was explicitly enabled.


Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Richard Biener
On Wed, Jun 25, 2014 at 11:53 AM, Bin.Cheng  wrote:
> On Wed, Jun 25, 2014 at 5:47 PM, Bin.Cheng  wrote:
>> On Wed, Jun 25, 2014 at 5:26 PM, Bingfeng Mei  wrote:
>>> Thanks for the nice benchmarks, Vladimir.
>>>
>>> Why is GCC's code size so much bigger than LLVM's? Does -Ofast do more
>>> unrolling
>> On the contrary, I don't think RTL unrolling is enabled by default in
>> GCC at -O3/-Ofast. There is no unroll dump file at all unless
>> -funroll-loops/-funroll-all-loops is explicitly specified.
> To clarify: I did see cases in which GCC's RTL unroller was more
> aggressive than LLVM's once it was explicitly enabled.

At -O3 you get more aggressive complete peeling from the GIMPLE
cunroll pass.

Richard.



stdatomic.h and atomic_load_explicit()

2014-06-25 Thread Sebastian Huber

Hello,

GCC has provided its own version of <stdatomic.h> since GCC 4.9.  It contains:

#define atomic_load_explicit(PTR, MO)   \
  __extension__ \
  ({\
__auto_type __atomic_load_ptr = (PTR);  \
__typeof__ (*__atomic_load_ptr) __atomic_load_tmp;  \
__atomic_load (__atomic_load_ptr, &__atomic_load_tmp, (MO));\
__atomic_load_tmp;  \
  })

According to

http://en.cppreference.com/w/c/atomic/atomic_load

(or in the standard "7.17.7.2 The atomic_load generic functions") we have

C atomic_load_explicit( volatile A* obj, memory_order order );

This test case

#include <stdatomic.h>

int ld(volatile atomic_int *i)
{
  return atomic_load_explicit(i, memory_order_relaxed);
}

yields on ARM

arm-rtems4.11-gcc -march=armv7-a -O2 test.c -S && cat test.s
.arch armv7-a
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file   "test.c"
.text
.align  2
.global ld
.type   ld, %function
ld:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r3, [r0]
sub sp, sp, #8
str r3, [sp, #4]
ldr r0, [sp, #4]
add sp, sp, #8
@ sp needed
bx  lr
.size   ld, .-ld
.ident  "GCC: (GNU) 4.9.1 20140515 (prerelease)"

I think the inheritance of the volatile qualifier via __typeof__ 
(*__atomic_load_ptr) is an implementation flaw.


With the FreeBSD version of <stdatomic.h> I don't have this problem:

http://svnweb.freebsd.org/base/head/include/stdatomic.h?revision=234958&view=markup&pathrev=234958#l231

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.


Re: stdatomic.h and atomic_load_explicit()

2014-06-25 Thread Joseph S. Myers
On Wed, 25 Jun 2014, Sebastian Huber wrote:

> I think the inheritance of the volatile qualifier via __typeof__
> (*__atomic_load_ptr) is an implementation flaw.

See the comment in c_parser_typeof_specifier:

  /* For use in macros such as those in <stdatomic.h>, remove
     _Atomic and const qualifiers from atomic types.  (Possibly
     all qualifiers should be removed; const can be an issue for
     more macros using typeof than just the <stdatomic.h>
     ones.)  */

(If changing this, remember to change the __auto_type handling as well.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Vladimir Makarov

On 2014-06-25, 5:32 AM, Renato Golin wrote:

On 25 June 2014 10:26, Bingfeng Mei  wrote:

Why is GCC's code size so much bigger than LLVM's? Does -Ofast do more
unrolling on GCC? Increasing code size doesn't seem to help performance
(164.gzip & 197.parser).
Are there comparisons for -O2? I guess that is more useful for typical
mobile/embedded programmers.


Hi Bingfeng,

My analysis wasn't as thorough as Vladimir's, but I found that GCC
wasn't eliminating some large blocks of dead code or inlining as much
as LLVM was.


  That might be a consequence of the difference in aliasing I wrote
about.  I looked at the code generated by LLVM and GCC for an
interpreter and saw bigger code generated by GCC too.


  The interpreter executes a sequence of bytecodes, and each bytecode
checks the types of its variables (small structures in memory) and sets
up the values and types of the result variables.  GCC was worse at
propagating the variable type info (e.g. int) through the bytecode
sequence where it would be possible and removing unnecessary code (the
cases where other types, e.g. fp, are processed).  LLVM was more
successful at this task.


I haven't dug deeper, though. Some of the differences
were quite big, I'd be surprised if it all can be explained by
unrolling loops and vectorization...





Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Vladimir Makarov

On 2014-06-24, 10:57 AM, Ramana Radhakrishnan wrote:


The ball-park number you have probably won't change much.



I don't think Neon can improve the score for SPECInt2000 significantly, but
maybe I am wrong.


It won't probably improve the overall score by a large amount but some
individual benchmarks will get some help.


Did you add any other architecture specific options to your SPEC2k
runs ?



No.  The only option I used was -Ofast.

Could you recommend the best options you think I should use for this
processor?



I would personally use --with-cpu=cortex-a15 --with-fpu=neon-vfpv4
--with-float=hard on this processor as that maps with the processor
available on that particular piece of Silicon.



I've tried these options too.  As I guessed, they improved GCC's eon
result by only 6%, which improved the overall score by less than 0.5%.
No change for LLVM, though.  From my point of view, eon is more of an fp
benchmark and should be in SPECFP, but that is a different story.


I've also tried GCC on SPECFP2000 with and without these options, and it
gave about a 12% improvement (1006 vs 988).  That is a *huge*
improvement.  I guess using NEON on ARM is really important for fp
benchmarks.




Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Richard Biener
On Wed, Jun 25, 2014 at 4:00 PM, Vladimir Makarov  wrote:
> On 2014-06-25, 5:32 AM, Renato Golin wrote:
>>
>> On 25 June 2014 10:26, Bingfeng Mei  wrote:
>>>
>>> Why is GCC's code size so much bigger than LLVM's? Does -Ofast do more
>>> unrolling
>>> on GCC? Increasing code size doesn't seem to help performance (164.gzip &
>>> 197.parser).
>>> Are there comparisons for -O2? I guess that is more useful for typical
>>> mobile/embedded programmers.
>>
>>
>> Hi Bingfeng,
>>
>> My analysis wasn't as thorough as Vladimir's, but I found that GCC
>> wasn't eliminating some large blocks of dead code or inlining as much
>> as LLVM was.
>
>
>   That might be a consequence of the difference in aliasing I wrote
> about.  I looked at the code generated by LLVM and GCC for an
> interpreter and saw bigger code generated by GCC too.
>
>   The interpreter executes a sequence of bytecodes, and each bytecode
> checks the types of its variables (small structures in memory) and sets
> up the values and types of the result variables.  GCC was worse at
> propagating the variable type info (e.g. int) through the bytecode
> sequence where it would be possible and removing unnecessary code (the
> cases where other types, e.g. fp, are processed).  LLVM was more
> successful at this task.

Maybe LLVM is just too aggressive here (but is lucky to not miscompile
this case).

Richard.



Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Jakub Jelinek
On Wed, Jun 25, 2014 at 04:02:49PM +0200, Richard Biener wrote:
> >   That might be a consequence of the difference in aliasing I wrote
> > about.  I looked at the code generated by LLVM and GCC for an
> > interpreter and saw bigger code generated by GCC too.
> >
> >   The interpreter executes a sequence of bytecodes, and each bytecode
> > checks the types of its variables (small structures in memory) and sets
> > up the values and types of the result variables.  GCC was worse at
> > propagating the variable type info (e.g. int) through the bytecode
> > sequence where it would be possible and removing unnecessary code (the
> > cases where other types, e.g. fp, are processed).  LLVM was more
> > successful at this task.
> 
> Maybe LLVM is just too aggressive here (but is lucky to not miscompile
> this case).

Or maybe it just doesn't consider all stores as possibly changing the
effective type, only real placement new and the first store to newly
allocated heap memory?

I guess it would be nice to try a few simple testcases...

Jakub


[GSoC] Question about unit tests

2014-06-25 Thread Roman Gareev
Dear gcc contributors,

could you please answer a few questions about unit tests?  Is it
possible to use them in GCC?  Or is there some analogue?  I would be
very grateful for your comments.

--
   Cheers, Roman Gareev


Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Vladimir Makarov

On 2014-06-25, 10:02 AM, Richard Biener wrote:


Maybe LLVM is just too aggressive here (but is lucky to not miscompile
this case).



Maybe.  But in this case LLVM did the right thing.  The variable
addressing was through a restrict pointer.  Some (temporary) variables
were reused, and control-flow-insensitive aliasing in GCC (as far as I
know that is what GCC has right now, although I may be wrong because I
never looked at the aliasing code) cannot deal with this well.


  My overall impression from several years of benchmarking GCC and LLVM
is that LLVM is more buggy, even on the major x86/x86-64 platform.
Also, I found on this code that GCC is much better at dead store
elimination.  It removed a lot of dead stores, while LLVM removed none.




Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Marc Glisse

On Wed, 25 Jun 2014, Vladimir Makarov wrote:

Maybe.  But in this case LLVM did the right thing.  The variable
addressing was through a restrict pointer.


Ah, gcc implements (on purpose?) a weak version of restrict, where it only 
considers that 2 restrict pointers don't alias, whereas all other 
compilers assume that restrict pointers don't alias other non-derived 
pointers (see several PRs in bugzilla). I believe Richard recently added 
code that would make implementing the strong version of restrict easier. 
Maybe that's what is missing here?


--
Marc Glisse


Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Vladimir Makarov

On 2014-06-25, 10:37 AM, Marc Glisse wrote:

On Wed, 25 Jun 2014, Vladimir Makarov wrote:


Maybe.  But in this case LLVM did the right thing.  The variable
addressing was through a restrict pointer.


Ah, gcc implements (on purpose?) a weak version of restrict, where it
only considers that 2 restrict pointers don't alias, whereas all other
compilers assume that restrict pointers don't alias other non-derived
pointers (see several PRs in bugzilla). I believe Richard recently added
code that would make implementing the strong version of restrict easier.
Maybe that's what is missing here?



Maybe.  At least I saw three different restrict pointers in this code.


Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Vladimir Makarov

On 2014-06-25, 10:01 AM, Vladimir Makarov wrote:

On 2014-06-24, 10:57 AM, Ramana Radhakrishnan wrote:




I've tried these options too.  As I guessed, they improved GCC's eon
result by only 6%, which improved the overall score by less than 0.5%.
No change for LLVM, though.  From my point of view, eon is more of an fp
benchmark and should be in SPECFP, but that is a different story.

I've also tried GCC on SPECFP2000 with and without these options, and it
gave about a 12% improvement (1006 vs 988).  That is a *huge*
improvement.  I guess using NEON on ARM is really important for fp
benchmarks.



Sorry, I miscalculated the last improvement in my head.  It should be
1.8%, not 12%.




Re: stdatomic.h and atomic_load_explicit()

2014-06-25 Thread Sebastian Huber

On 2014-06-25 15:25, Joseph S. Myers wrote:

On Wed, 25 Jun 2014, Sebastian Huber wrote:


I think the inheritance of the volatile qualifier via __typeof__
(*__atomic_load_ptr) is an implementation flaw.


See the comment in c_parser_typeof_specifier:

   /* For use in macros such as those in <stdatomic.h>, remove
  _Atomic and const qualifiers from atomic types.  (Possibly
  all qualifiers should be removed; const can be an issue for
  more macros using typeof than just the <stdatomic.h>
  ones.)  */

(If changing this, remember to change the __auto_type handling as well.)



Thanks for the hint.  I sent a patch to the list.

https://gcc.gnu.org/ml/gcc-patches/2014-06/msg02026.html

If __auto_type discards const and volatile qualifiers, then shouldn't this
generate a warning (-Wconst-qual)


__auto_type __atomic_load_ptr = (PTR);

?

Why is it necessary to discard the const and/or volatile qualifiers in
the __auto_type at all?  I think for <stdatomic.h> it should be
sufficient to discard them only in __typeof__.


--
Sebastian Huber, embedded brains GmbH


Re: stdatomic.h and atomic_load_explicit()

2014-06-25 Thread Joseph S. Myers
On Wed, 25 Jun 2014, Sebastian Huber wrote:

> In case __auto_type discards const and volatile qualifiers, then shouldn't
> this generate a warning (-Wconst-qual)
> 
> __auto_type __atomic_load_ptr = (PTR);
> 
> ?

No.  The discarding is for qualifiers on the type itself (remembering that 
qualifiers on rvalues aren't that well-defined in ISO C), not on the 
target of any pointer.

> Why is it necessary to discard the const and/or volatile qualifiers in the
> __auto_type at all?  I think for  it should be sufficient to
> discard them only in __typeof__.

__auto_type is, by design, meant to be consistent with typeof (while 
avoiding multiple evaluation when VLAs are involved, as well as avoiding 
exponential blowup of the size of the expansion with nested calls to 
macros).

-- 
Joseph S. Myers
jos...@codesourcery.com


gcc-4.9-20140625 is now available

2014-06-25 Thread gccadmin
Snapshot gcc-4.9-20140625 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20140625/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch 
revision 212001

You'll find:

 gcc-4.9-20140625.tar.bz2 Complete GCC

  MD5=a6343e12cae6b53721c9018729c12053
  SHA1=bfe774c73d99c10b4d4eb9d2fc72bf62638bf526

Diffs from 4.9-20140618 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.