GCC 9.0 Status Report (2018-10-17), Stage 3 starts Nov. 12th

2018-10-17 Thread Richard Biener


Status
======

GCC trunk is open for general development (Stage 1) until the end
of Nov 11th, after which it will transition to bugfixing mode (Stage 3),
which in turn will end on Jan 6th; after that only regression and
documentation fixes will be possible.

This means you have an additional three weeks to push feature
changes for GCC 9.

Note that bugs have not yet been prioritized thoroughly, so there is no
meaningful Quality Data yet.


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2018-04/msg00156.html


RE: [RFC][mid-end] Support vectorization of complex numbers using machine instructions.

2018-10-17 Thread Tamar Christina
Hi Richard,

> > [...]
> > 3) So I abandoned vec-patterns and instead tried to do it in
> > tree-vect-slp.c in vect_analyze_slp_instance just after the SLP tree
> > is created.  Matching the SLP tree is quite simple and getting it to
> > emit the right SLP tree was simple enough, except that at this point
> > all data references and loads have already been calculated.
> 
> (3) seems like the way to go.  Can you explain in more detail why it didn't
> work?  The SLP tree after matching should look something like this:
> 
>   REALPART_EXPR <*_10> = _4;
>   IMAGPART_EXPR <*_10> = _13;
> 
>   _4 = .COMPLEX_ADD_ROT_90 (_12, _8)
>   _13 = .COMPLEX_ADD_ROT_90 (_22, _6)
> 
>   _12 = REALPART_EXPR <*_3>;
>   _22 = IMAGPART_EXPR <*_3>;
> 
>   _8 = REALPART_EXPR <*_5>;
>   _6 = IMAGPART_EXPR <*_5>;
> 
> The operands to the individual .COMPLEX_ADD_ROT_90s aren't the
> operands that actually determine the associated scalar result, but that's
> bound to be the case with something that includes an internal permute.  All
> we're trying to describe is an operation that does the right thing when
> vectorised.
> 
> If you didn't have the .COMPLEX_ADD_ROT_90 and just fell back on mixed
> two-operator SLP, the final node would be in the opposite order:
> 
>   _6 = IMAGPART_EXPR <*_5>;
>   _8 = REALPART_EXPR <*_5>;
> 
> So if you're doing the matching after building the initial tree, you'd need to
> swap the statements in that node so that _8 comes first and cancel the
> associated load permute.  If you're doing the matching on the fly while
> building the SLP tree then the subnodes should start out in the right order.
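To make the "internal permute" concrete, here is a minimal scalar C sketch of what a rotate-by-90 complex add does on interleaved {re, im} data. It assumes the semantics of the Arm FCADD instruction with rotation #90 (out = a + i*b); the function names are hypothetical and are not GCC internals. The second function computes the same result the way mixed two-operator SLP would, by first applying a {1, 0} lane-swap permute to b, which is why that node ends up ordered {IMAGPART, REALPART}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emulation of a rotate-by-90 complex add on interleaved
   data (even index = real part, odd index = imaginary part).
   Assumed to match Arm FCADD #90: out = a + i*b.  The lane swap of b
   is internal to the operation, so a and b are read in memory order. */
static void complex_add_rot90(const double *a, const double *b,
                              double *out, size_t len)
{
  assert(len % 2 == 0);
  for (size_t i = 0; i < len; i += 2)
    {
      out[i] = a[i] - b[i + 1];      /* re = a.re - b.im */
      out[i + 1] = a[i + 1] + b[i];  /* im = a.im + b.re */
    }
}

/* The same computation as mixed two-operator SLP: load b with the lane
   permute {1, 0}, then subtract in even lanes and add in odd lanes. */
static void two_operator_rot90(const double *a, const double *b,
                               double *out, size_t len)
{
  static const size_t perm[2] = { 1, 0 };  /* {IMAGPART, REALPART} */
  assert(len % 2 == 0);
  for (size_t i = 0; i < len; i += 2)
    {
      double swapped[2] = { b[i + perm[0]], b[i + perm[1]] };
      out[i] = a[i] - swapped[0];
      out[i + 1] = a[i + 1] + swapped[1];
    }
}
```

Both functions produce identical results; the difference is only where the swap lives. When the operation itself carries the swap, the node feeding it can keep its loads in memory order and the load permute disappears.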

Ah, I hadn't tried it this way because in the SLP version I had originally
started by looking at the complex FMA, which would have a considerably
longer match pattern.

  _3 = c_14(D) + _2;
  _11 = REALPART_EXPR <*_3>;
  _21 = IMAGPART_EXPR <*_3>;
  _5 = a_15(D) + _2;
  _22 = REALPART_EXPR <*_5>;
  _12 = IMAGPART_EXPR <*_5>;
  _7 = b_16(D) + _2;
  _19 = REALPART_EXPR <*_7>;
  _20 = IMAGPART_EXPR <*_7>;
  _25 = _19 * _22;
  _26 = _12 * _20;
  _27 = _20 * _22;
  _28 = _12 * _19;
  _29 = _25 - _26;
  _30 = _27 + _28;
  _31 = _11 + _29;
  _32 = _21 + _30;
  REALPART_EXPR <*_3> = _31;
  IMAGPART_EXPR <*_3> = _32;

So in this case I should replace _31 and _32, right? But I can't remove the
other statements, otherwise it'll complain later about the missing references.
I could replace _31 and _32 with something that uses all the variables I would
need; however, when I tried this previously in vect-patterns there was a block
on built-in functions with more than 4 arguments (and currently 3 is the limit
for built-in functions in the def file as well).  I don’t know if the same
limitation is in place if I replace it in SLP.

The complex FMA basically computes this vector:

 b⋅c - e⋅f + l 
 b⋅e + c⋅f + n

so I'd need 5 parameters, and then I'm guessing the other expressions would be
removed by DCE at some point?
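The scalar statements in the dump above can be transcribed directly into C to check what they compute. This is a minimal sketch: the locals mirror the SSA names, the interleaved {re, im} array layout and the function name are assumptions for illustration. It shows that the sequence is the complex multiply-accumulate c += a * b and that each output lane (_31, _32) reads five scalars, four multiplicands plus one addend, which is where the parameter count comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transcription of the GIMPLE sequence for c[k] += a[k] * b[k]
   on interleaved {re, im} arrays: *_3 is c[k], *_5 is a[k], *_7 is b[k].
   Locals reuse the SSA names (identifiers like _11 are legal at block
   scope). */
static void complex_fma_gimple(double *c, const double *a, const double *b,
                               size_t len)
{
  assert(len % 2 == 0);
  for (size_t i = 0; i < len; i += 2)
    {
      double _11 = c[i], _21 = c[i + 1];   /* REALPART/IMAGPART <*_3> */
      double _22 = a[i], _12 = a[i + 1];   /* REALPART/IMAGPART <*_5> */
      double _19 = b[i], _20 = b[i + 1];   /* REALPART/IMAGPART <*_7> */
      double _25 = _19 * _22;
      double _26 = _12 * _20;
      double _27 = _20 * _22;
      double _28 = _12 * _19;
      double _29 = _25 - _26;              /* Re (a * b) */
      double _30 = _27 + _28;              /* Im (a * b) */
      double _31 = _11 + _29;              /* depends on _11, _19, _22, _12, _20 */
      double _32 = _21 + _30;              /* depends on _21, _20, _22, _12, _19 */
      c[i] = _31;                          /* REALPART_EXPR <*_3> = _31 */
      c[i + 1] = _32;                      /* IMAGPART_EXPR <*_3> = _32 */
    }
}
```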

Kind Regards,
Tamar

> 
> Thanks,
> Richard


[RFC][GCC][rs6000] Remaining work for inline expansion of strncmp/strcmp/memcmp for powerpc

2018-10-17 Thread Aaron Sawdey
I've previously posted a patch to add vector/vsx inline expansion of
strcmp/strncmp for the power8/power9 processors. Here are some of the
other items I have in the pipeline that I hope to get into gcc9:

* vector/vsx support for inline expansion of memcmp to non-loop code.
  This improves the performance of small memcmp calls.
* vector/vsx support for inline expansion of memcmp to loop code. This
  will close the performance gap for lengths of about 128-512 bytes
  by bringing the loop code closer to the performance of the library
  memcmp.
* generate inline expansion to a loop for strcmp/strncmp. This closes
  another performance gap: the strcmp/strncmp vector/vsx code currently
  generated is a lot faster than the library call, but we only compare
  the first 64 bytes inline to avoid exploding code size. Similar code
  in a loop would be compact and allow inline comparison of perhaps the
  first 512 bytes before falling back to the library function.
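The third item can be pictured as a bounded, chunked compare loop that falls back to the library call. The C sketch below only models that control flow: it compares up to a hypothetical INLINE_LIMIT of 512 bytes in 16-byte chunks (the width of one VSX register) and then calls strcmp on the remainder. It compares byte by byte inside each chunk, so it says nothing about how the real expansion uses vector loads or avoids reading across a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16          /* bytes per iteration; one VSX register */
#define INLINE_LIMIT 512  /* hypothetical bound before the library call */

static_assert(INLINE_LIMIT % CHUNK == 0,
              "the inline limit must be a whole number of chunks");

/* Control-flow model of an inline strcmp expansion to a loop.
   Returns <0, 0 or >0 like strcmp (bytes compared as unsigned char). */
static int strcmp_inline_loop(const char *s1, const char *s2)
{
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;

  for (size_t off = 0; off < INLINE_LIMIT; off += CHUNK)
    {
      /* One "vector" step: find the first difference or NUL in the chunk. */
      for (size_t i = off; i < off + CHUNK; i++)
        {
          if (p1[i] != p2[i])
            return p1[i] < p2[i] ? -1 : 1;
          if (p1[i] == '\0')
            return 0;  /* both strings end here */
        }
    }

  /* The first INLINE_LIMIT bytes are equal and contain no NUL, so the
     result is decided by the rest of the strings. */
  return strcmp(s1 + INLINE_LIMIT, s2 + INLINE_LIMIT);
}
```

The loop body stays the same size regardless of INLINE_LIMIT, which is the code-size argument for a loop over straight-line expansion.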

If anyone has any other input on the inline expansion work I've been
doing for the rs6000 target, please let me know.

Thanks!
Aaron


-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [RFC][mid-end] Support vectorization of complex numbers using machine instructions.

2018-10-17 Thread Richard Sandiford
Tamar Christina  writes:
> Hi Richard,
>> [...]
>
> Ah, I hadn't tried it this way because in the SLP version I had originally
> started by looking at the complex FMA, which would have a considerably
> longer match pattern.
>
>   _3 = c_14(D) + _2;
>   _11 = REALPART_EXPR <*_3>;
>   _21 = IMAGPART_EXPR <*_3>;
>   _5 = a_15(D) + _2;
>   _22 = REALPART_EXPR <*_5>;
>   _12 = IMAGPART_EXPR <*_5>;
>   _7 = b_16(D) + _2;
>   _19 = REALPART_EXPR <*_7>;
>   _20 = IMAGPART_EXPR <*_7>;
>   _25 = _19 * _22;
>   _26 = _12 * _20;
>   _27 = _20 * _22;
>   _28 = _12 * _19;
>   _29 = _25 - _26;
>   _30 = _27 + _28;
>   _31 = _11 + _29;
>   _32 = _21 + _30;
>   REALPART_EXPR <*_3> = _31;
>   IMAGPART_EXPR <*_3> = _32;
>
> So in this case I should replace _31 and _32, right? But I can't remove the
> other statements, otherwise it'll complain later about the missing references.
> I could replace _31 and _32 with something that uses all the variables I would
> need; however, when I tried this previously in vect-patterns there was a block
> on built-in functions with more than 4 arguments (and currently 3 is the
> limit for built-in functions in the def file as well).  I don’t know if the
> same limitation is in place if I replace it in SLP.
>
> The complex FMA basically computes this vector:
>
>  b⋅c - e⋅f + l 
>  b⋅e + c⋅f + n
>
> so I'd need 5 parameters, and then I'm guessing the other expressions would be
> removed by DCE at some point?

Are you planning to make the FCMLA behaviour directly available
as an internal function or provide a higher-level one that does
a full complex multiply, with the target lowering that into
individual instructions where necessary?

Either way, each individual FCMLA should only need three scalar inputs.
Like with FCADD, it doesn't matter whether the operands to the individual
scalar FCMLAs are the ones (or the only ones) that determine the
associated FCMLA scalar result.  All the node needs to do is describe
something that would work when vectorised.

What to do with the intermediate results you don't need is an
interesting question :-).  Like you say, I was hoping DCE would
get rid of them later.  Does that not work?
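The point that each FCMLA needs only three inputs can be checked with a scalar model. The sketch below assumes the Arm FCMLA semantics for rotations #0 and #90 on interleaved {re, im} data; the function names are hypothetical and this is not how GCC represents the operation. Each output lane of each step reads exactly three scalars (an accumulator lane plus one element from each multiplicand), and issuing #0 followed by #90 on the same accumulator yields acc + n * m, the same value the _31/_32 sequence computes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of FCMLA #0 on interleaved data:
   acc.re += n.re * m.re;  acc.im += n.re * m.im.  */
static void fcmla_rot0(double *acc, const double *n, const double *m,
                       size_t len)
{
  assert(len % 2 == 0);
  for (size_t i = 0; i < len; i += 2)
    {
      acc[i] += n[i] * m[i];
      acc[i + 1] += n[i] * m[i + 1];
    }
}

/* Hypothetical model of FCMLA #90:
   acc.re -= n.im * m.im;  acc.im += n.im * m.re.  */
static void fcmla_rot90(double *acc, const double *n, const double *m,
                        size_t len)
{
  assert(len % 2 == 0);
  for (size_t i = 0; i < len; i += 2)
    {
      acc[i] -= n[i + 1] * m[i + 1];
      acc[i + 1] += n[i + 1] * m[i];
    }
}

/* acc += n * m, expressed as two three-input steps. */
static void complex_mla(double *acc, const double *n, const double *m,
                        size_t len)
{
  fcmla_rot0(acc, n, m, len);
  fcmla_rot90(acc, n, m, len);
}
```

With this split, the intermediate products (_25 to _30) never need to be passed to a single five-operand call; whether they are then cleaned up by DCE is the open question above.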

Thanks,
Richard


Re: [RFC][GCC][rs6000] Remaining work for inline expansion of strncmp/strcmp/memcmp for powerpc

2018-10-17 Thread Florian Weimer
* Aaron Sawdey:

> I've previously posted a patch to add vector/vsx inline expansion of
> strcmp/strncmp for the power8/power9 processors. Here are some of the
> other items I have in the pipeline that I hope to get into gcc9:
>
> * vector/vsx support for inline expansion of memcmp to non-loop code.
>   This improves the performance of small memcmp calls.
> * vector/vsx support for inline expansion of memcmp to loop code. This
>   will close the performance gap for lengths of about 128-512 bytes
>   by bringing the loop code closer to the performance of the library
>   memcmp.
> * generate inline expansion to a loop for strcmp/strncmp. This closes
>   another performance gap: the strcmp/strncmp vector/vsx code currently
>   generated is a lot faster than the library call, but we only compare
>   the first 64 bytes inline to avoid exploding code size. Similar code
>   in a loop would be compact and allow inline comparison of perhaps the
>   first 512 bytes before falling back to the library function.
>
> If anyone has any other input on the inline expansion work I've been
> doing for the rs6000 target, please let me know.

The inline expansion of strcmp is problematic for valgrind:

  

We currently see around 0.5 KiB of instructions for each call to
strcmp.  I find it hard to believe that this improves general system
performance except in micro-benchmarks.


gcc-6-20181017 is now available

2018-10-17 Thread gccadmin
Snapshot gcc-6-20181017 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/6-20181017/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-6-branch 
revision 265250

You'll find:

 gcc-6-20181017.tar.xz    Complete GCC

  SHA256=5396a3877a19e314db33943f9d63a2516ef867cc95912e96234ba7bf34a22cbc
  SHA1=84ded341256b032c279548e96bce58b19784b7f9

Diffs from 6-20181010 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.