Re: Dealing with paradoxical subregs of memory?

2017-01-26 Thread Dominik Vogt
On Wed, Jan 25, 2017 at 04:45:23PM -0600, Segher Boessenkool wrote:
> On Wed, Jan 25, 2017 at 06:36:04PM +0100, Dominik Vogt wrote:
> > On the other hand, Combine
> > does not know that they are "outlawed" and happily generates
> > them.
> 
> combine should not generate things that can never match.  Of course it
> sometimes does.  This should be improved; please open a PR.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79238

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Vectorization regression on s390x GCC6 vs GCC5

2017-01-26 Thread Robin Dapp
Hi,

while analyzing a test case with a lot of nested loops (>7) and double
floating point operations I noticed a performance regression of GCC 6/7
vs GCC 5 on s390x. It seems due to GCC 6 vectorizing something GCC 5
couldn't.
 Basically, each loop iterates over three dimensions, we fully unroll
some of the inner loops until we have straight-line code of roughly 2000
insns that are being executed three times in GCC 5. GCC 6 vectorizes two
iterations and adds a scalar epilogue for the third iteration. The
epilogue code is so bad that it slows down the execution by at least
50%, using only two hard registers and lots of spill slots.
Although my analysis is not completed, I believe this is because
register pressure is high in the epilogue and the live ranges span the
vectorized code as well as the epilogue.

Even reduced, the test case is huge, therefore I didn't include it. Some
high-level questions instead:

- Has anybody else observed similar problems and got around them?

- Is there some way around the register pressure/long live ranges?
Perhaps something we could/should fix in the s390 backend? (Probably
hard to tell without source)

- Would it make sense to allow a backend to specify the minimal number
of loop iterations considered for vectorization? Is this
perhaps already possible somehow? I added a check to disable
vectorization for loops with <= 3 iterations that shows no regressions
and improves two SPEC benchmarks noticeably. I'm even considering <=5,
since a vectorization factor of 4 should exhibit the same problematic
pattern.
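
To make the pattern concrete without the test case, here is a hypothetical
sketch. It is not taken from the reduced source: the function names, the
axpy-style loop body and the trip-count macro N are made up for illustration.
The second function emulates by hand what a vectorization factor of 2 does to
a loop known to run three times: one two-element "vector" iteration followed
by a one-iteration scalar epilogue.

```c
#define N 3  /* known trip count, as in the loops described above */

/* GCC 5 shape: three scalar iterations. */
void axpy_scalar(double a, const double *x, double *y)
{
    for (int i = 0; i < N; i++)
        y[i] += a * x[i];
}

/* GCC 6 shape, emulated by hand: one "vector" iteration covering two
   elements (VF = 2), then a scalar epilogue for the remaining element. */
void axpy_vf2_epilogue(double a, const double *x, double *y)
{
    int i = 0;

    for (; i + 2 <= N; i += 2) {
        /* These two statements stand in for one 2 x double vector op. */
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
    }

    /* Scalar epilogue: runs N % 2 times, i.e. once for N == 3. */
    for (; i < N; i++)
        y[i] += a * x[i];
}
```

Both functions compute the same values; only the code shape differs. In the
real case the unrolled body is roughly 2000 insns, so the epilogue is a second
large block of straight-line code, which is where the register pressure is
suspected to come from.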

Regards
 Robin



Re: Vectorization regression on s390x GCC6 vs GCC5

2017-01-26 Thread Bin.Cheng
On Thu, Jan 26, 2017 at 10:18 AM, Robin Dapp wrote:
> Hi,
>
> while analyzing a test case with a lot of nested loops (>7) and double
> floating point operations I noticed a performance regression of GCC 6/7
> vs GCC 5 on s390x. It seems due to GCC 6 vectorizing something GCC 5
> couldn't.
>  Basically, each loop iterates over three dimensions, we fully unroll
> some of the inner loops until we have straight-line code of roughly 2000
> insns that are being executed three times in GCC 5. GCC 6 vectorizes two
> iterations and adds a scalar epilogue for the third iteration. The
> epilogue code is so bad that it slows down the execution by at least
> 50%, using only two hard registers and lots of spill slots.
> Although my analysis is not completed, I believe this is because
> register pressure is high in the epilogue and the live ranges span the
> vectorized code as well as the epilogue.
>
> Even reduced, the test case is huge, therefore I didn't include it. Some
> high-level questions instead:
>
> - Has anybody else observed similar problems and got around them?
Yes, I think so.  We also have a case where GCC vectorizes with a larger
vect_factor, which causes a regression too.

>
> - Is there some way around the register pressure/long live ranges?
I am doing some experiments calculating coarse-grained register
pressure for GIMPLE loops, but the motivation comes not from the
vectorizer but from predcom/pre, as in PR77498.

> Perhaps something we could/should fix in the s390 backend? (Probably
> hard to tell without source)
>
> - Would it make sense to allow a backend to specify the minimal number
> of loop iterations considered for vectorization? Is this
> perhaps already possible somehow? I added a check to disable
> vectorization for loops with <= 3 iterations that shows no regressions
> and improves two SPEC benchmarks noticeably. I'm even considering <=5,
> since a vectorization factor of 4 should exhibit the same problematic
> pattern.
Is the niter number known at compilation time?  If so, I am surprised by
GCC's behavior here on loops with so few iterations.  Is it the cost model?

Thanks,
bin
>
> Regards
>  Robin
>


Re: What is the status of macOS PowerPC support?

2017-01-26 Thread Jonathan Wakely
On 25 January 2017 at 22:30, Segher Boessenkool wrote:
> On Wed, Jan 25, 2017 at 04:36:13PM +0100, FX wrote:
>> I am trying to determine what is the status of the powerpc-apple-darwin 
>> target for GCC. The last released version of GCC for which a successful 
>> build is reported is 4.9.1 
>> (https://gcc.gnu.org/ml/gcc-testresults/2014-07/msg02093.html), and the last 
>> gcc-testresults post I could find was in April 2015 
>> (https://gcc.gnu.org/ml/gcc-testresults/2015-04/msg01438.html), for the GCC 
>> 5 branch.
>>
>> Do GCC 5, GCC 6 and current trunk support powerpc-apple-darwin? The target 
>> code is still there, apparently, and the compiler is not on the “obsolete” 
>> list.
>
> It is actively being worked on (the latest commit is just over a month
> old it seems).  It mostly works, too.  It is in better shape than many
> other targets, I would say.

Less than a month even:
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00553.html


Re: Vectorization regression on s390x GCC6 vs GCC5

2017-01-26 Thread Richard Biener
On Thu, Jan 26, 2017 at 11:36 AM, Bin.Cheng wrote:
> On Thu, Jan 26, 2017 at 10:18 AM, Robin Dapp wrote:
>> Hi,
>>
>> while analyzing a test case with a lot of nested loops (>7) and double
>> floating point operations I noticed a performance regression of GCC 6/7
>> vs GCC 5 on s390x. It seems due to GCC 6 vectorizing something GCC 5
>> couldn't.
>>  Basically, each loop iterates over three dimensions, we fully unroll
>> some of the inner loops until we have straight-line code of roughly 2000
>> insns that are being executed three times in GCC 5. GCC 6 vectorizes two
>> iterations and adds a scalar epilogue for the third iteration. The
>> epilogue code is so bad that it slows down the execution by at least
>> 50%, using only two hard registers and lots of spill slots.
>> Although my analysis is not completed, I believe this is because
>> register pressure is high in the epilogue and the live ranges span the
>> vectorized code as well as the epilogue.
>>
>> Even reduced, the test case is huge, therefore I didn't include it. Some
>> high-level questions instead:
>>
>> - Has anybody else observed similar problems and got around them?
> Yes, I think so.  We also have a case where GCC vectorizes with a larger
> vect_factor, which causes a regression too.
>
>>
>> - Is there some way around the register pressure/long live ranges?
> I am doing some experiments calculating coarse-grained register
> pressure for GIMPLE loops, but the motivation comes not from the
> vectorizer but from predcom/pre, as in PR77498.
>
>> Perhaps something we could/should fix in the s390 backend? (Probably
>> hard to tell without source)
>>
>> - Would it make sense to allow a backend to specify the minimal number
>> of loop iterations considered for vectorization? Is this
>> perhaps already possible somehow? I added a check to disable
>> vectorization for loops with <= 3 iterations that shows no regressions
>> and improves two SPEC benchmarks noticeably. I'm even considering <=5,
>> since a vectorization factor of 4 should exhibit the same problematic
>> pattern.
> Is the niter number known at compilation time?  If so, I am surprised by
> GCC's behavior here on loops with so few iterations.  Is it the cost model?

Yes, looking at the cost model decision makes sense here.  Note there is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69873 you might run into
if the costmodel looks sensible.

Richard.

> Thanks,
> bin
>>
>> Regards
>>  Robin
>>


RE: [RFC] Further LRA subreg handling issues

2017-01-26 Thread Matthew Fortune
Matthew Fortune writes:
...
> Pseudo 300 is assigned to memory and then LRA produces a simple DImode
> load from the assigned stack slot. The only instruction to set pseudo
> 300 is:
> 
> (insn 247 212 389 3 (set (reg:SI 300)
> (ne:SI (subreg/s/u:SI (reg/v:DI 231 [ taken ]) 0)
> (const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904
> 504 {*sne_zero_sisi}
>  (nil))
> 
> Which leads to an SImode store to the stack slot:
> 
> (insn 247 392 393 3 (set (reg:SI 4 $4 [300])
> (ne:SI (reg:SI 20 $20 [orig:231 taken ] [231])
> (const_int 0 [0]))) "/home/mfortune/gcc/gcc/predict.c":2904
> 504 {*sne_zero_sisi}
>  (nil))
> (insn 393 247 389 3 (set (mem/c:SI (plus:DI (reg/f:DI 29 $sp)
> (const_int 16 [0x10])) [403 %sfp+16 S4 A64])
> (reg:SI 4 $4 [300])) "/home/mfortune/gcc/gcc/predict.c":2904 312
> {*movsi_internal}
>  (nil))
> ...
> 
> (note 248 246 249 40 NOTE_INSN_DELETED)
> (note 249 248 256 40 NOTE_INSN_DELETED)
> (note 256 249 250 40 NOTE_INSN_DELETED)
> (insn 250 256 251 40 (set (reg:DI 6 $6)
> (mem/c:DI (plus:DI (reg/f:DI 29 $sp)
> (const_int 16 [0x10])) [403 %sfp+16 S8 A64]))
> "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
>  (nil))
> 
> My assumption is that LRA is again expected to deal with this case and
> for insn 250 should be recognising that it must load 32 bits and rely on
> implicit LOAD_EXTEND_OP behaviour producing an acceptable 64-bit value.
> In this case it does not matter whether it is sign or zero extension and
> my assumption is that this construct would never appear if a specific
> sign or zero extension was required.
> 
> I haven't got to looking at where the issue is this time but it seems
> different as this is a subreg in a simple move instruction where we
> already support the load/ store directly so no new reload instruction is
> required. I don't know if this implies that simple move patterns should
> reject subregs but that doesn't sound right either.
> 
> Resolving this fixes at least one bug and potentially all bugs in the
> MIPS bootstrap, as I manually modified the generated assembly code to
> use LW instead of LD for insn 250 and one of the buggy stage 3 objects
> is fixed.
> 
> I'll keep thinking, any advice in the meantime is appreciated.

All I have been able to determine on this is that there is potentially
different behaviour for paradoxical subregs in LRA vs reload.  There is
this comment in reload.c:push_reload:

If we have (SUBREG:M1 (MEM:M2 ...) ...) (or an inner REG that is still
 a pseudo and hence will become a MEM) with M1 wider than M2 and the
 register is a pseudo, also reload the inside expression.

To me this makes perfect sense, as I believe the RTL is only saying that
there is an M2-mode object to access, or at least that only the M2-mode
sized bits are valid. There are comments saying there will always be
sufficient memory assigned for spill slots, as they are sized to fit the
largest paradoxical subreg; I just don't know why that is useful or
important.

However in lra-constraints.c:simplify_operand_subreg it quite happily
performs a reload using the outer mode in this case and only drops down to
the inner mode if the outer mode reload would be slower than the inner.

Presumably this is safe for non-WORD_REGISTER_OPERATIONS targets, as the
junk upper bits in registers will be ignored. On WORD_REGISTER_OPERATIONS
targets, the narrower-than-word mode load will take care of any
'magic' needed to set the upper bits to a safe value in the register.

So my thinking is that at least WORD_REGISTER_OPERATIONS targets should
always reload the inner mode for the case mentioned above, much as is
required for normal subregs. Does that seem reasonable? Have I
misunderstood the paradoxical subreg case entirely?

I've only done superficial testing of a change to this code so far but my
testcase starts working at least which is a start.
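
The failure mode above can be modelled in portable C. This is only a sketch
under stated assumptions: the spill_slot type and the three helper names are
hypothetical and have nothing to do with LRA's data structures. The slot is
assumed to hold stale bytes from an earlier use, and the sign extension in the
last helper models what LW does on a 64-bit MIPS target.

```c
#include <stdint.h>
#include <string.h>

/* An 8-byte stack slot; assumed to contain stale data before the store. */
typedef struct { unsigned char bytes[8]; } spill_slot;

/* SImode store: only four bytes of the slot are written (cf. insn 393). */
void store_si(spill_slot *slot, int32_t value)
{
    memcpy(slot->bytes, &value, sizeof value);
}

/* DImode load of the whole slot (cf. insn 250, LD): the four bytes that
   were never written are read back as part of the value. */
int64_t load_di(const spill_slot *slot)
{
    int64_t value;
    memcpy(&value, slot->bytes, sizeof value);
    return value;
}

/* SImode load followed by sign extension (modelling LW): the result
   depends only on the four bytes that were actually stored. */
int64_t load_si_extend(const spill_slot *slot)
{
    int32_t value;
    memcpy(&value, slot->bytes, sizeof value);
    return (int64_t) value;
}
```

If the slot is first filled with a junk pattern and store_si then writes 1,
load_si_extend returns 1 while load_di returns a value polluted by the junk
bytes, whatever the byte order of the host.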

Thanks,
Matthew






Re: [patch, libgfortran RFC] Installation script for OpenCoarrays to enable multi-image gfortran

2017-01-26 Thread FX
Hi Jerry,

A few questions:

  - why mpich? doesn’t opencoarrays support any MPI implementation?
  - I am a bit surprised by the complexity of the script… couldn’t we provide a 
Makefile for opencoarrays, to be compatible with our other build requirements?
  - do we want to work towards seamless implementation of coarrays into 
gfortran, or coexistence as a separate package (as is currently the case, for 
example in Mac Homebrew, where it ships as a separate — but compatible — 
package)?

FX

Re: [RFC] Further LRA subreg handling issues

2017-01-26 Thread Eric Botcazou
> However in lra-constraints.c:simplify_operand_subreg it quite happily
> performs a reload using the outer mode in this case and only drops down to
> the inner mode if the outer mode reload would be slower than the inner.
> 
> Presumably this is safe for non WORD_REGISTER_OPERATIONS targets as the
> junk upper bits in registers will be ignored; On WORD_REGISTER_OPERATIONS
> targets then the narrower-than-word mode load will take care of any
> 'magic' needed to set the upper bits to a safe value in register.

Yes, I was leaning to the same conclusion before reading your second message.

> So my thinking is that at least WORD_REGISTER_OPERATIONS targets should
> always reload the inner mode for the case mentioned above much like the same
> is required for normal subregs. Does that seem reasonable? Have I
> misunderstood the paradoxical subreg case entirely?

No, this is correct, see find_reloads:

  /* We must force a reload of paradoxical SUBREGs
 of a MEM because the alignment of the inner value
 may not be enough to do the outer reference.  On
 big-endian machines, it may also reference outside
 the object.

 On machines that extend byte operations and we have a
 SUBREG where both the inner and outer modes are no wider
 than a word and the inner mode is narrower, is integral,
 and gets extended when loaded from memory, combine.c has
 made assumptions about the behavior of the machine in such
 register access.  If the data is, in fact, in memory we
 must always load using the size assumed to be in the
 register and let the insn do the different-sized
 accesses.

 This is doubly true if WORD_REGISTER_OPERATIONS.  In
 this case eliminate_regs has left non-paradoxical
 subregs for push_reload to see.  Make sure it does
 by forcing the reload.

-- 
Eric Botcazou


Re: [patch, libgfortran RFC] Installation script for OpenCoarrays to enable multi-image gfortran

2017-01-26 Thread Jerry DeLisle
On 01/26/2017 05:25 AM, FX wrote:
> Hi Jerry,
> 
> A few questions:
> 
>   - why mpich? doesn’t opencoarrays support any MPI implementation?

We picked it because it was the one I had available, and only as a starting
point; we plan to add support for other libraries as we go. (OpenCoarrays
itself does support other libraries.)

>   - I am a bit surprised by the complexity of the script… couldn’t we provide 
> a Makefile for opencoarrays, to be compatible with our other build 
> requirements?

I agree, this script is using Boiler Plate which allows a lot of flexibility and
provides some diagnostics and handling of script errors. It builds on things the
OpenCoarrays team is familiar with and was the quickest way to go initially. My
first draft script was about 50 lines with comments, but it had no error checks.
I will be able to reduce what you see when I narrow down to one tracked release
package.

>   - do we want to work towards seamless implementation of coarrays into 
> gfortran, or coexistence as a separate package (as is currently the case, for 
> example in Mac Homebrew, where it ships as a separate — but compatible — 
> package)?

I think we do want to head toward seamless. I have even explored copying the
source directly into the caf directory of libgfortran and merging the .h files,
but this takes some time to do and would leave two sets of sources to maintain.
As for things like Homebrew or rpm packages, we would first have to learn how
to build them, which none of us knows how to do right now.

Ultimately, since multi images is part of the Fortran language, it should just
happen transparently with the gcc regular build process.

Jerry


Re: [RFC] Further LRA subreg handling issues

2017-01-26 Thread David Malcolm
On Thu, 2017-01-26 at 13:00 +, Matthew Fortune wrote:
> Matthew Fortune  writes:
> ...
> > Pseudo 300 is assigned to memory and then LRA produces a simple
> > DImode
> > load from the assigned stack slot. The only instruction to set
> > pseudo
> > 300 is:
> > 
> > (insn 247 212 389 3 (set (reg:SI 300)
> > (ne:SI (subreg/s/u:SI (reg/v:DI 231 [ taken ]) 0)
> > (const_int 0 [0])))
> > "/home/mfortune/gcc/gcc/predict.c":2904
> > 504 {*sne_zero_sisi}
> >  (nil))
> > 
> > Which leads to an SImode store to the stack slot:
> > 
> > (insn 247 392 393 3 (set (reg:SI 4 $4 [300])
> > (ne:SI (reg:SI 20 $20 [orig:231 taken ] [231])
> > (const_int 0 [0])))
> > "/home/mfortune/gcc/gcc/predict.c":2904
> > 504 {*sne_zero_sisi}
> >  (nil))
> > (insn 393 247 389 3 (set (mem/c:SI (plus:DI (reg/f:DI 29 $sp)
> > (const_int 16 [0x10])) [403 %sfp+16 S4 A64])
> > (reg:SI 4 $4 [300]))
> > "/home/mfortune/gcc/gcc/predict.c":2904 312
> > {*movsi_internal}
> >  (nil))
> > ...
> > 
> > (note 248 246 249 40 NOTE_INSN_DELETED)
> > (note 249 248 256 40 NOTE_INSN_DELETED)
> > (note 256 249 250 40 NOTE_INSN_DELETED)
> > (insn 250 256 251 40 (set (reg:DI 6 $6)
> > (mem/c:DI (plus:DI (reg/f:DI 29 $sp)
> > (const_int 16 [0x10])) [403 %sfp+16 S8 A64]))
> > "/home/mfortune/gcc/gcc/predict.c":2904 310 {*movdi_64bit}
> >  (nil))
> > 
> > My assumption is that LRA is again expected to deal with this case
> > and
> > for insn
> > 250 should be recognising that it must load 32-bits and rely on
> > implicit
> > LOAD_EXTEND_OP behaviour producing an acceptable 64-bit value. In
> > this
> > case it does not matter whether it is sign or zero extension and my
> > assumption is that this construct would never appear if a specific
> > sign
> > or zero extension was required.
> > 
> > I haven't got to looking at where the issue is this time but it
> > seems
> > different as this is a subreg in a simple move instruction where we
> > already support the load/ store directly so no new reload
> > instruction is
> > required. I don't know if this implies that simple move patterns
> > should
> > reject subregs but that doesn't sound right either.
> > 
> > Resolving this fixes at least one bug and potentially all bugs in
> > the MIPS bootstrap, as I manually modified the generated assembly
> > code to use LW instead of LD for insn 250 and one of the buggy
> > stage 3 objects is fixed.
> > 
> > I'll keep thinking, any advice in the meantime is appreciated.
> 
> All I have been able to determine on this is that there is
> potentially
> different behaviour for paradoxical subregs in LRA vs reload.  There
> is
> this comment in reload.c:push_reload:
> 
> If we have (SUBREG:M1 (MEM:M2 ...) ...) (or an inner REG that is
> still
>  a pseudo and hence will become a MEM) with M1 wider than M2 and
> the
>  register is a pseudo, also reload the inside expression.
> 
> To me this makes perfect sense as I believe the RTL is only saying
> that
> there is an M2-mode object to access or at least only the M2-mode
> sized
> bits are valid. There are comments to say there will always be
> sufficient
> memory assigned for spill slots as they are sized to fit the largest
> paradoxical subreg, I just don't know why that is useful/important.
> 
> However in lra-constraints.c:simplify_operand_subreg it quite happily
> performs a reload using the outer mode in this case and only drops
> down to
> the inner mode if the outer mode reload would be slower than the
> inner.
> 
> Presumably this is safe for non WORD_REGISTER_OPERATIONS targets as
> the
> junk upper bits in registers will be ignored; On
> WORD_REGISTER_OPERATIONS
> targets then the narrower-than-word mode load will take care of any
> 'magic' needed to set the upper bits to a safe value in register.
> 
> So my thinking is that at least WORD_REGISTER_OPERATIONS targets
> should
> always reload the inner mode for the case mentioned above much like
> the same
> is required for normal subregs. Does that seem reasonable? Have I
> misunderstood the paradoxical subreg case entirely?
> 
> I've only done superficial testing of a change to this code so far
> but my
> testcase starts working at least which is a start.

FWIW, the RTL "frontend" [1] is now in trunk (as of r244878), so it
should now be possible to write small fragments of RTL as testcases in
DejaGnu.

I don't know if it's helpful for this bug though.

In case it is, I started some documentation for it here:
  https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02065.html


Dave

[1] I put "frontend" in quotes as it's actually an extension to the C
frontend, rather than a true frontend.


RE: [RFC] Further LRA subreg handling issues

2017-01-26 Thread Matthew Fortune
Eric Botcazou writes:
> > However in lra-constraints.c:simplify_operand_subreg it quite happily
> > performs a reload using the outer mode in this case and only drops
> > down to the inner mode if the outer mode reload would be slower than
> the inner.
> >
> > Presumably this is safe for non WORD_REGISTER_OPERATIONS targets as
> > the junk upper bits in registers will be ignored; On
> > WORD_REGISTER_OPERATIONS targets then the narrower-than-word mode load
> > will take care of any 'magic' needed to set the upper bits to a safe
> value in register.
> 
> Yes, I was leaning to the same conclusion before reading your second
> message.
> 
> > So my thinking is that at least WORD_REGISTER_OPERATIONS targets
> > should always reload the inner mode for the case mentioned above much
> > like the same is required for normal subregs. Does that seem
> > reasonable? Have I misunderstood the paradoxical subreg case entirely?
> 
> No, this is correct, see find_reloads:
> 
> /* We must force a reload of paradoxical SUBREGs
>of a MEM because the alignment of the inner value
>may not be enough to do the outer reference.  On
>big-endian machines, it may also reference outside
>the object.
> 
>On machines that extend byte operations and we have a
>SUBREG where both the inner and outer modes are no wider
>than a word and the inner mode is narrower, is integral,
>and gets extended when loaded from memory, combine.c has
>made assumptions about the behavior of the machine in such
>register access.  If the data is, in fact, in memory we
>must always load using the size assumed to be in the
>register and let the insn do the different-sized
>accesses.

This part suggests to me that LRA should never be reloading the
paradoxical subreg, meaning the whole SLOW_UNALIGNED_ACCESS checking code in
simplify_operand_subreg could be removed unconditionally.  But I get the
feeling the big valid_address_p check (below) will still prevent some
paradoxical subregs from being reloaded via their inner mode.  I haven't
quite understood exactly what the check is trying to achieve yet, though:

  if (!addr_was_valid
      || valid_address_p (GET_MODE (subst), XEXP (subst, 0),
                          MEM_ADDR_SPACE (subst))
      || ((get_constraint_type (lookup_constraint
                                (curr_static_id->operand[nop].constraint))
           != CT_SPECIAL_MEMORY)
          /* We still can reload address and if the address is
             valid, we can remove subreg without reloading its
             inner memory.  */
          && valid_address_p (GET_MODE (subst),
                              regno_reg_rtx
                              [ira_class_hard_regs
                               [base_reg_class (GET_MODE (subst),
                                                MEM_ADDR_SPACE (subst),
                                                ADDRESS, SCRATCH)][0]],
                              MEM_ADDR_SPACE (subst))))
    {

>This is doubly true if WORD_REGISTER_OPERATIONS.  In
>this case eliminate_regs has left non-paradoxical
>subregs for push_reload to see.  Make sure it does
>by forcing the reload.

This statement covers the fix I already proposed but perhaps
simplify_operand_subreg can also hit this issue if a 'normal' subreg appears
in an instruction where registers and memory are supported (like move
instructions). In this case the constraints are satisfied and the fix I
proposed would never get run but simplify_operand_subreg would.

Eric: I see you recently had to modify the code I'm talking about in the post
below. Out of interest... was this another issue brought to light by the
improvements to zero extension elimination?

https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01202.html

Matthew


Successful bootstrap and install of gcc (GCC) 6.3.0 on armv7l-unknown-linux-gnueabi

2017-01-26 Thread Aaro Koskinen
Hi,

Here's a report of a successful build and install of GCC:

$ gcc-6.3.0/config.guess
armv7l-unknown-linux-gnueabi

$ newcompiler/bin/gcc -v
Using built-in specs.
COLLECT_GCC=newcompiler/bin/gcc
COLLECT_LTO_WRAPPER=/home/aaro/gcctest/newcompiler/libexec/gcc/arm-unknown-linux-gnueabi/6.3.0/lto-wrapper
Target: arm-unknown-linux-gnueabi
Configured with: ../gcc-6.3.0/configure --with-arch=armv4t --with-float=soft 
--disable-nls --prefix=/home/aaro/gcctest/newcompiler --enable-languages=c,c++ 
--host=arm-unknown-linux-gnueabi --build=arm-unknown-linux-gnueabi 
--target=arm-unknown-linux-gnueabi --with-system-zlib --with-sysroot=/
Thread model: posix
gcc version 6.3.0 (GCC) 

-- Build environment --

host: raspberrypi-2
distro:   los.git rootfs=96c66f native=96c66f
kernel:   Linux 4.9.0-rpi2-los_839021
binutils: GNU binutils 2.27
make: GNU Make 4.2.1
libc: GNU C Library (GNU libc) stable release version 2.24
zlib: 1.2.8
mpfr: 3.1.3
gmp:  6

-- Time consumed --

configure:  real    0m 23.43s
            user    0m 21.74s
            sys     0m 2.10s

bootstrap:  real    11h 49m 58s
            user    37h 47m 28s
            sys     1h 32m 35s

install:    real    9m 11.47s
            user    3m 23.98s
            sys     5m 59.89s

-- Hardware details ---

MemTotal: 952432 kB

processor   : 0
model name  : ARMv7 Processor rev 5 (v7l)
BogoMIPS: 38.40
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt 
vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xc07
CPU revision: 5

processor   : 1
model name  : ARMv7 Processor rev 5 (v7l)
BogoMIPS: 38.40
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt 
vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xc07
CPU revision: 5

processor   : 2
model name  : ARMv7 Processor rev 5 (v7l)
BogoMIPS: 38.40
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt 
vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xc07
CPU revision: 5

processor   : 3
model name  : ARMv7 Processor rev 5 (v7l)
BogoMIPS: 38.40
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt 
vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xc07
CPU revision: 5

Hardware: BCM2835
Revision: 

A.


Re: [RFC] Further LRA subreg handling issues

2017-01-26 Thread Eric Botcazou
> This part suggests to me that LRA should never be reloading the
> paradoxical subreg meaning the whole SLOW_UNALIGNED_ACCESS checking code in
> simplify_operand_subreg could be removed unconditionally.

Why?  For a little-endian target which is neither strict-alignment nor 
WORD_REGISTER_OPERATIONS, typically x86, you can reload the outer reg.
Problems arise only for big-endian or strict-alignment or W_R_O, as explained 
by the find_reloads code.

IOW simplify_operand_subreg should mimic the handling of paradoxical SUBREGs 
done by find_reloads, with specific checks for WORD_REGISTER_OPERATIONS, 
BYTES_BIG_ENDIAN, etc.
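
As a rough illustration of the rule stated above, the decision can be written
as a small predicate. This is a deliberately coarse sketch, not the logic of
find_reloads or simplify_operand_subreg: the target_traits struct and the
function name are hypothetical (in GCC these properties are target macros and
hooks), and the real code also looks at the alignment of the particular MEM
and at the modes involved.

```c
#include <stdbool.h>

/* Hypothetical summary of the target properties named above. */
struct target_traits {
    bool word_register_operations;  /* WORD_REGISTER_OPERATIONS */
    bool bytes_big_endian;          /* BYTES_BIG_ENDIAN */
    bool strict_alignment;          /* strict-alignment target */
};

/* Returns true when a paradoxical SUBREG of a MEM (outer mode wider than
   inner mode) should be reloaded in the inner mode rather than the outer
   one.  Sizes are in bytes.  */
bool reload_inner_mode_p(const struct target_traits *t,
                         unsigned inner_bytes, unsigned outer_bytes)
{
    if (outer_bytes <= inner_bytes)
        return false;   /* not paradoxical; outside this sketch */

    /* Little-endian, non-strict-alignment, non-W_R_O targets (typically
       x86) can reload in the outer mode; everything else cannot.  */
    return t->word_register_operations
           || t->bytes_big_endian
           || t->strict_alignment;
}
```

With all three flags clear, as on x86, the predicate allows the outer-mode
reload for an SImode-in-DImode subreg; setting any one flag forces the inner
mode.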

> Eric: I see you recently had to modify the code I'm talking about in the
> post below. Out of interest... was this another issue brought to light by
> the improvements to zero extension elimination?

Nope, there were a couple of other, unrelated bugs in the code.

-- 
Eric Botcazou


gcc-6-20170126 is now available

2017-01-26 Thread gccadmin
Snapshot gcc-6-20170126 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/6-20170126/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 6 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-6-branch 
revision 244955

You'll find:

 gcc-6-20170126.tar.bz2   Complete GCC

  MD5=e95c5d3bc327dec872da9f0aea4d71df
  SHA1=6e8644e7ed88611fcb6131de2d57d58f1521bbc4

Diffs from 6-20170119 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-6
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.