testsuite result updates for x86_64-w64-mingw32

2018-10-18 Thread Rainer Emrich
There are new testsuite results for x86_64-w64-mingw32, for versions
6.4.1, 7.3.1, 8.2.1 and 9.0.0.

https://gcc.gnu.org/ml/gcc-testresults/2018-10/msg02309.html
https://gcc.gnu.org/ml/gcc-testresults/2018-10/msg02310.html
https://gcc.gnu.org/ml/gcc-testresults/2018-10/msg02312.html
https://gcc.gnu.org/ml/gcc-testresults/2018-10/msg02313.html

Complete build and test logs are available for download here:
https://cloud.emrich-ebersheim.de/index.php/s/g9D245XdCW6GD5W

Have fun

Rainer





Re: testsuite result updates for x86_64-w64-mingw32

2018-10-18 Thread Eric Botcazou
> There are new testsuite results for x86_64-w64-mingw32, for versions
> 6.4.1, 7.3.1, 8.2.1 and 9.0.0.
> 
> https://gcc.gnu.org/ml/gcc-testresults/2018-10/msg02309.html
> https://gcc.gnu.org/ml/gcc-testresults/2018-10/msg02310.html
> https://gcc.gnu.org/ml/gcc-testresults/2018-10/msg02312.html

So we have an additional ACATS failure (ce2104c) in 8.2.1 over the previous 
releases?  Feel free to open a PR about it if you have a couple of minutes.

Thanks for testing the Ada compiler!

-- 
Eric Botcazou


Re: testsuite result updates for x86_64-w64-mingw32

2018-10-18 Thread Óscar Fuentes
Eric Botcazou  writes:

>> There are new testsuite results for x86_64-w64-mingw32, for versions
>> 6.4.1, 7.3.1, 8.2.1 and 9.0.0.
>> 
>> https://gcc.gnu.org/ml/gcc-testresults/2018-10/msg02309.html
>> https://gcc.gnu.org/ml/gcc-testresults/2018-10/msg02310.html
>> https://gcc.gnu.org/ml/gcc-testresults/2018-10/msg02312.html
>
> So we have an additional ACATS failure (ce2104c) in 8.2.1 over the previous 
> releases?  Feel free to open a PR about it if you have a couple of minutes.
>
> Thanks for testing the Ada compiler!

About the Ada compiler: it doesn't build on i686-w64-mingw32. It is
the reason why MSYS2 is stuck with 7.3 for 32 bits.

Rainer: testing the 32-bit builds would be very useful. On MinGW there
are enough differences between 32-bit and 64-bit to consider them two
different beasts.

Thank you for your reports so far.



Re: [RFC][GCC][rs6000] Remaining work for inline expansion of strncmp/strcmp/memcmp for powerpc

2018-10-18 Thread Aaron Sawdey
On 10/17/18 4:03 PM, Florian Weimer wrote:
> * Aaron Sawdey:
> 
>> I've previously posted a patch to add vector/vsx inline expansion of
>> strcmp/strncmp for the power8/power9 processors. Here are some of the
>> other items I have in the pipeline that I hope to get into gcc9:
>>
>> * vector/vsx support for inline expansion of memcmp to non-loop code.
>>   This improves performance of small memcmp.
>> * vector/vsx support for inline expansion of memcmp to loop code. This
>>   will close the performance gap for lengths of about 128-512 bytes
>>   by making the loop code closer to the performance of the library
>>   memcmp.
>> * generate inline expansion to a loop for strcmp/strncmp. This closes
>>   another performance gap because strcmp/strncmp vector/vsx code
>>   currently generated is lots faster than the library call but we
>>   only generate comparison of 64 bytes to avoid exploding code size.
>>   Similar code in a loop would be compact and allow inline comparison
>>   of maybe the first 512 bytes inline before dumping to the library
>>   function.
>>
>> If anyone has any other input on the inline expansion work I've been
>> doing for the rs6000 target, please let me know.
> 
> The inline expansion of strcmp is problematic for valgrind:
> 
>   

I'm aware of this. One thing that will help is that I believe the vsx
expansion for strcmp/strncmp does not have this problem, so with
current gcc9 trunk the problem should only be seen if one of the strings is
known at compile time to be less than 16 bytes, or if -mcpu=power7, or
if vector/vsx is disabled. My position is that it is valgrind's problem
if it doesn't understand correct code, but I also want valgrind to be a
useful tool so I'm going to take a look and see if I can find a gpr
sequence that is equally fast that it can understand.

> We currently see around 0.5 KiB of instructions for each call to
> strcmp.  I find it hard to believe that this improves general system
> performance except in micro-benchmarks.

The expansion of strcmp where both arguments are strings of unknown
length at compile time will compare 64 bytes then call strcmp on the
remainder if no difference is found. If the gpr sequence is used (p7
or vec/vsx disabled) then the overhead is 91 instructions. If the
p8 vsx sequence is used, the overhead is 59 instructions. If the p9
vsx sequence is used, then the overhead is 41 instructions.
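As a rough illustration of the strategy described above (compare up to the first 64 bytes inline, and call the library strcmp on the remainder only if neither a difference nor a terminator was found), here is a minimal C model of its semantics. The function name and the byte-at-a-time loop are illustrative assumptions; the code GCC actually emits uses multi-byte gpr or vsx compare sequences and is not reproduced here.

```c
#include <string.h>

/* Hypothetical name: a semantic model of the inline expansion, not GCC output. */
#define INLINE_CMP_BYTES 64

int strcmp_inline_model(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    /* Inline part: examine at most INLINE_CMP_BYTES bytes.  A byte loop
       stands in for the real multi-byte gpr/vsx compare sequences. */
    for (size_t i = 0; i < INLINE_CMP_BYTES; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;   /* strings differ early */
        if (a[i] == '\0')
            return 0;                      /* identical and shorter than 64 bytes */
    }

    /* The first 64 bytes are equal and contain no terminator, so both
       strings continue: hand the remainder to the library routine. */
    return strcmp(s1 + INLINE_CMP_BYTES, s2 + INLINE_CMP_BYTES);
}
```

Strings that differ early, and identical strings shorter than 64 bytes, never reach the library call; only long common prefixes pay for both the inline code and the call.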

Yes, this will increase the instruction footprint. However the processors
that this targets (p7, p8, p9) all have aggressive iprefetch. Doing some
of the comparison inline makes the common cases of strings being totally
different, or identical and <= 64 bytes in length very much faster, and
also avoiding the plt call means less pressure on the count cache and
better branch prediction elsewhere.

If you are aware of any real world code that is faster when built
with -fno-builtin-strcmp and/or -fno-builtin-strncmp, please let me know
so I can look at avoiding those situations.

  Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: testsuite result updates for x86_64-w64-mingw32

2018-10-18 Thread Eric Botcazou
> About the Ada compiler: it doesn't build on i686-w64-mingw32. It is
> the reason why MSYS2 is stuck with 7.3 for 32 bits.

Why doesn't it build?  Because of PR ada/81878?

-- 
Eric Botcazou


Re: [RFC][GCC][rs6000] Remaining work for inline expansion of strncmp/strcmp/memcmp for powerpc

2018-10-18 Thread Segher Boessenkool
On Thu, Oct 18, 2018 at 09:48:22AM -0500, Aaron Sawdey wrote:
> On 10/17/18 4:03 PM, Florian Weimer wrote:
> I'm aware of this. One thing that will help is that I believe the vsx
> expansion for strcmp/strncmp does not have this problem, so with
> current gcc9 trunk the problem should only be seen if one of the strings is
> known at compile time to be less than 16 bytes, or if -mcpu=power7, or
> if vector/vsx is disabled. My position is that it is valgrind's problem
> if it doesn't understand correct code, but I also want valgrind to be a
> useful tool so I'm going to take a look and see if I can find a gpr
> sequence that is equally fast that it can understand.

If we can do that without losing performance, that is nice of course :-)

> > We currently see around 0.5 KiB of instructions for each call to
> > strcmp.  I find it hard to believe that this improves general system
> > performance except in micro-benchmarks.
> 
> The expansion of strcmp where both arguments are strings of unknown
> length at compile time will compare 64 bytes then call strcmp on the
> remainder if no difference is found. If the gpr sequence is used (p7
> or vec/vsx disabled) then the overhead is 91 instructions. If the
> p8 vsx sequence is used, the overhead is 59 instructions. If the p9
> vsx sequence is used, then the overhead is 41 instructions.

That is 0.355 KiB, 0.230 KiB, and 0.160 KiB, respectively.
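These sizes follow from the fixed 4-byte instruction length of the Power7 to Power9 processors under discussion, taking 1 KiB = 1024 bytes:

```latex
\begin{aligned}
91 \times 4 &= 364\ \text{bytes} \approx 0.355\ \text{KiB} && (\text{gpr sequence}) \\
59 \times 4 &= 236\ \text{bytes} \approx 0.230\ \text{KiB} && (\text{p8 vsx sequence}) \\
41 \times 4 &= 164\ \text{bytes} \approx 0.160\ \text{KiB} && (\text{p9 vsx sequence})
\end{aligned}
```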

> Yes, this will increase the instruction footprint. However the processors
> that this targets (p7, p8, p9) all have aggressive iprefetch. Doing some
> of the comparison inline makes the common cases of strings being totally
> different, or identical and <= 64 bytes in length very much faster, and
> also avoiding the plt call means less pressure on the count cache and
> better branch prediction elsewhere.
> 
> If you are aware of any real world code that is faster when built
> with -fno-builtin-strcmp and/or -fno-builtin-strncmp, please let me know
> so I can look at avoiding those situations.

+1

Thanks Aaron!  Both for all the original work, and for looking at it
once again.


Segher


RE: [RFC][mid-end] Support vectorization of complex numbers using machine instructions.

2018-10-18 Thread Tamar Christina
Hi Richard,

Thanks for all the help so far,

> > so I'd need 5 parameters and then I'm guessing the other expressions
> > would be removed by DCE at some point?
> 
> Are you planning to make the FCMLA behaviour directly available as an
> internal function or provide a higher-level one that does a full complex
> multiply, with the target lowering that into individual instructions where
> necessary?

I was planning on doing it as one internal function and leaving it up to the
target to expand it however it needs to.

> 
> Either way, each individual FCMLA should only need three scalar inputs.
> Like with FCADD, it doesn't matter whether the operands to the individual
> scalar FCMLAs are the ones (or the only ones) that determine the associated
> FCMLA scalar result.  All the node needs to do is describe something that
> would work when vectorised.

Ah yes that makes sense. I see what you mean.
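To make the point that each FCMLA needs only three inputs (an accumulator and two multiplicands) concrete, here is a scalar C model of one complex lane of the AArch64 FCMLA instruction, written from my reading of its rotation semantics. The type and function names are hypothetical, it ignores vector lanes, indexed forms and fused-rounding differences, and it is not the internal-function interface proposed in this thread.

```c
/* Hypothetical scalar model of one complex element of FCMLA. */
typedef struct { double re, im; } cplx;

/* Each call takes three complex inputs: accumulator d, multiplicands n, m.
   rot is the rotation in degrees: 0, 90, 180 or 270. */
cplx fcmla_model(cplx d, cplx n, cplx m, int rot)
{
    switch (rot) {
    case 0:   d.re += n.re * m.re; d.im += n.re * m.im; break;
    case 90:  d.re -= n.im * m.im; d.im += n.im * m.re; break;
    case 180: d.re -= n.re * m.re; d.im -= n.re * m.im; break;
    case 270: d.re += n.im * m.im; d.im -= n.im * m.re; break;
    default:  break;   /* other values are not valid rotations */
    }
    return d;
}

/* acc + n * m: a full complex multiply-accumulate as rotations 0 then 90. */
cplx complex_mla(cplx acc, cplx n, cplx m)
{
    return fcmla_model(fcmla_model(acc, n, m, 0), n, m, 90);
}

/* acc + conj(n) * m: the same pattern with rotations 0 then 270. */
cplx complex_conj_mla(cplx acc, cplx n, cplx m)
{
    return fcmla_model(fcmla_model(acc, n, m, 0), n, m, 270);
}
```

Each step reads only its three operands, so the intermediate result of the first rotation simply feeds the accumulator of the second; whichever higher-level operation is matched, the target can lower it to a pair of such steps.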

> 
> What to do with the intermediate results you don't need is an interesting
> question :-).  Like you say, I was hoping DCE would get rid of them later.
> Does that not work?

I haven't tried it yet 😊 But I assume it'll work too.
I have complex add almost working: it generates the right code for the
vectorized loop. The loads are also corrected, the permute is gone, and I
update all the data references for the two statements I replaced.

However, for the scalar tail loop I have a problem, since I only have vector
versions of the instructions and the scalar loop is created from the same SLP
tree. So I end up with the builtins in the tail loop with nothing to expand
them to, and with no way to differentiate between the two calls to the
internal fn.

I would need to somehow undo this for the scalar part.

Kind Regards,
Tamar

> 
> Thanks,
> Richard


RE: testsuite result updates for x86_64-w64-mingw32

2018-10-18 Thread Tamar Christina



> -----Original Message-----
> From: gcc-ow...@gcc.gnu.org  On Behalf Of Eric Botcazou
> Sent: Thursday, October 18, 2018 16:40
> To: Óscar Fuentes 
> Cc: gcc@gcc.gnu.org
> Subject: Re: testsuite result updates for x86_64-w64-mingw32
> 
> 
> > About the Ada compiler: it doesn't build on i686-w64-mingw32. It is
> > the reason why MSYS2 is stuck with 7.3 for 32 bits.
> 
> Why doesn't it build?  Because of PR ada/81878?

I think so, but I'm not sure; it currently ICEs. In order to get it to compile
at all, the MINGW-packages revert the 81878 patch. This seems to work for the
64-bit build but ICEs the 32-bit one, so I hadn't looked into it, because I
think it may very well be an issue caused by the revert.

Regards,
Tamar

> 
> --
> Eric Botcazou


Re: testsuite result updates for x86_64-w64-mingw32

2018-10-18 Thread Óscar Fuentes
Eric Botcazou  writes:

>> About the Ada compiler: it doesn't build on i686-w64-mingw32. It is
>> the reason why MSYS2 is stuck with 7.3 for 32 bits.
>
> Why doesn't it build?  Because of PR ada/81878?

xgcc crashes.

https://github.com/Alexpux/MINGW-packages/issues/4125#issuecomment-425741895

This is all I know, sorry.


Re: testsuite result updates for x86_64-w64-mingw32

2018-10-18 Thread Óscar Fuentes
Eric Botcazou  writes:

>> About the Ada compiler: it doesn't build on i686-w64-mingw32. It is
>> the reason why MSYS2 is stuck with 7.3 for 32 bits.
>
> Why doesn't it build?  Because of PR ada/81878?

More specifically:

xgcc.exe: internal compiler error: Aborted signal terminated program gnat1

If you are interested in analyzing this problem, ask and I'll try to
help as far as my time permits.


gcc-7-20181018 is now available

2018-10-18 Thread gccadmin
Snapshot gcc-7-20181018 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20181018/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch revision 265292

You'll find:

 gcc-7-20181018.tar.xz    Complete GCC

  SHA256=7994464b512daf653b622182b47333d3d85c3788e81cef6de8c95b43c0c6159d
  SHA1=884ea161d7e50931c8e07dc3064d4e50ce96d3b9

Diffs from 7-20181011 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.