[Bug rtl-optimization/90311] [9 Regression] wrong code with -O and __builtin_add_overflow() and compare

2020-03-05 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90311

Richard Earnshaw  changed:

   What|Removed |Added

Summary|[9/10 Regression] wrong |[9 Regression] wrong code
   |code with -O and|with -O and
   |__builtin_add_overflow()|__builtin_add_overflow()
   |and compare |and compare

--- Comment #8 from Richard Earnshaw  ---
The 64-bit arithmetic code for Arm was completely rewritten for gcc-10, so this
testcase no longer fails.

However, it's a 10(ish)-patch series (and maybe some follow-ups); I'm not sure
whether a backport is appropriate.

[Bug rtl-optimization/90311] [9 Regression] wrong code with -O and __builtin_add_overflow() and compare

2020-03-05 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90311

--- Comment #10 from Richard Earnshaw  ---
Initial commit in the series was r10-3970 but there were certainly follow-ups
after that.

[Bug target/90311] [9 Regression] wrong code with -O and __builtin_add_overflow() and compare

2020-03-05 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90311

--- Comment #11 from Richard Earnshaw  ---
Looks like this was fixed with r10-1963.  Testing that as a backport.

[Bug target/90311] [9 Regression] wrong code with -O and __builtin_add_overflow() and compare

2020-03-05 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90311

Richard Earnshaw  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from Richard Earnshaw  ---
Fixed on gcc-9

[Bug target/91913] [8/9 Regression] ICE in extract_constrain_insn, at recog.c:2211

2020-03-12 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91913

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #12 from Richard Earnshaw  ---
Fixed on gcc-9 with
https://gcc.gnu.org/pipermail/gcc-cvs/2020-March/271649.html
(https://gcc.gnu.org/g:08f00a213f8a1b99bbf3ad3c337dea249a288cf1)
Fixed on gcc-8 with
https://gcc.gnu.org/pipermail/gcc-cvs/2020-March/271650.html
(https://gcc.gnu.org/g:3d46f4875c6c50e8095294b6b700d6678a7e2f1e)

[Bug middle-end/94172] [arm-none-eabi] ICE in expand_debug_locations, at cfgexpand.c:5403

2020-03-16 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94172

--- Comment #4 from Richard Earnshaw  ---
(In reply to Jakub Jelinek from comment #3)
> Can't reproduce on the trunk, neither on x86_64-linux with -Os -g3
> -fshort-enums, nor on arm-linux-gnueabi with -Os -g3 -fshort-enums
> -mcpu=cortex-m0 -mthumb

arm-linux doesn't use short enums; only arm-none-eabi does.

[Bug target/94236] -mcmodel=large does not work on aarch64

2020-03-23 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94236

--- Comment #8 from Richard Earnshaw  ---
(In reply to Andrew Pinski from comment #6)
> Note for Threos OS, please don't reuse the same target triplet as elf or
> linux; use your own triplet.  Also adding long calls is not hard and such.

The correct solution is to implement long call support in the linker in
conformance with the ABI.

[Bug target/94220] libgcc FTB for ARM Thumb when optimizing for size

2020-03-23 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94220

Richard Earnshaw  changed:

   What|Removed |Added

Summary|libgcc FTB for ARM Thump|libgcc FTB for ARM Thumb
   |when optimizing for size|when optimizing for size
   Assignee|unassigned at gcc dot gnu.org  |rearnsha at gcc dot gnu.org

--- Comment #1 from Richard Earnshaw  ---
Mine

[Bug target/94220] libgcc FTB for ARM Thumb when optimizing for size

2020-03-26 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94220

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Richard Earnshaw  ---
Fixed

[Bug debug/94502] [aarch64] Missing LR register location in FDE

2020-04-09 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94502

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|FIXED   |INVALID

--- Comment #7 from Richard Earnshaw  ---
The compiler was conforming to the specification.  Not a bug.

[Bug tree-optimization/93674] [8/9 Regression] GCC eliminates conditions it should not, when strict-enums is on

2020-04-20 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93674

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|FIXED   |---
Summary|[8/9/10 Regression] GCC |[8/9 Regression] GCC
   |eliminates conditions it|eliminates conditions it
   |should not, when|should not, when
   |strict-enums is on  |strict-enums is on
 Status|RESOLVED|REOPENED

--- Comment #17 from Richard Earnshaw  ---
Has not been backported yet.

[Bug libfortran/94694] [10 Regression][libgfortran] libgfortran does not compile on bare-metal aarch64-none-elf (newlib)

2020-04-21 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94694

--- Comment #5 from Richard Earnshaw  ---
(In reply to Richard Biener from comment #2)
> Note in another bug it was said that libgfortran requires a C99 runtime,
> when that's not available you should disable gfortran build.  GCC (or
> libgfortran)
> is in no position to disable parts of its features and AFAICS for QOI issues
> the _frontend_ would need to reject programs making use of disabled library
> functionality, otherwise programs are going to only fail to link.
> 
> IMHO failure at runtime when the disabled functionality is actually invoked
> isn't good QOI either.
> 
> Re-implementing missing functions in libgfortran isn't trivial either.
> 
> Since Fortran isn't release critical the only P1-ish part is that fortran
> build is enabled on aarch64-elf and thus we can resolve this by adding
> aarch64-elf-*) to
> 
> # Disable Fortran for some systems.
> case "${target}" in
>   mmix-*-*)
> # See .
> unsupported_languages="$unsupported_languages fortran"
> ;;
>   bpf-*-*)
> unsupported_languages="$unsupported_languages fortran"
> ;;
> esac
> 
> technically aarch64-elf isn't a primary architecture.

Well, in that case it should *test* the runtime for compatibility.  It
shouldn't assume it's incompatible just because of the target triplet.

aarch64-elf is a secondary platform.  It should still build.

[Bug target/94697] aarch64: bti j at function start instead of bti c

2020-04-22 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697

Richard Earnshaw  changed:

   What|Removed |Added

   Last reconfirmed||2020-04-22
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Earnshaw  ---
Jumping to the label is not the same as calling the function indirectly.  But
yes, the code here is clearly incorrect.

[Bug target/94748] aarch64: many unnecessary bti j emitted

2020-04-27 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94748

--- Comment #1 from Richard Earnshaw  ---
A BTI that's not immediately after a label looks wrong.  Either it should be
removed entirely, or it should be merged with the preceding BTI.

[Bug testsuite/94763] UNRESOLVED scan assembler tests on arm-none-eabi

2020-04-27 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94763

Richard Earnshaw  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-04-27
 Status|UNCONFIRMED |WAITING

[Bug target/94743] IRQ handler doesn't save scratch VFP registers

2020-04-28 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94743

--- Comment #5 from Richard Earnshaw  ---
This is made more complex by the fact that the top 16 D registers only exist
on some hardware: saving them requires a d32 variant of the ISA, but in an
interrupt context we can't (quickly) tell whether we have one.  In reality it
only matters if the interrupt routine calls a function in another translation
unit, where we can't see which FP registers might be needed.

Also, it's not just the registers that need to be saved.  The floating point
status registers also need to be saved and restored.

My initial thoughts are along the lines of...
Only try to save FP registers that this function directly clobbers.
Provide libgcc routines to save/restore the FP context.

Or we could say simply:
interrupt routines should be compiled as if with -mgeneral-regs-only and if
they want to call some routine that uses FP then they must take it upon
themselves to save and restore the FP context.

[Bug target/94743] IRQ handler doesn't save scratch VFP registers

2020-04-28 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94743

--- Comment #7 from Richard Earnshaw  ---
Well, __aeabi_memcpy is required not to clobber the FP state.  Sadly, GCC does
not know about it...

[Bug target/94743] IRQ handler doesn't save scratch VFP registers

2020-05-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94743

--- Comment #10 from Richard Earnshaw  ---
(In reply to Christophe Lyon from comment #9)
> > My initial thoughts are along the lines of...
> > Only try to save FP registers that this function directly clobbers.
> What's the point of saving these if a callee clobbers other registers?
> 

They need to be done early enough to ensure that any code in *this* function
does not clobber them.  Any additional registers would have to be saved by a
library call that does that.

> Shouldn't that be something like save-nothing vs save-all-FP-regs if there
> is a callee?
> 
> Do you mean save direct clobbers only when the handler is a leaf function?

Well, obviously if it's a leaf function, saving only the registers that are
clobbered is enough, and the compiler can do the analysis to ensure that.

> 
> > Provide libgcc routines to save/restore the FP context.
> Do you mean such routines should push all FP regs+status regs?

All the registers that are call-clobbered.  There's no need to handle the
call-saved registers, as the compiler can already do that on an as-needed
basis.

[Bug tree-optimization/93674] [8/9 Regression] GCC eliminates conditions it should not, when strict-enums is on

2020-05-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93674

Richard Earnshaw  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|REOPENED|RESOLVED

--- Comment #21 from Richard Earnshaw  ---
Now fixed on all live branches

[Bug target/94743] IRQ handler doesn't save scratch VFP registers

2020-05-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94743

--- Comment #12 from Richard Earnshaw  ---
(In reply to Christophe Lyon from comment #11)
> (In reply to Richard Earnshaw from comment #10)
> > (In reply to Christophe Lyon from comment #9)
> > > > My initial thoughts are along the lines of...
> > > > Only try to save FP registers that this function directly clobbers.
> > > What's the point of saving these if a callee clobbers other registers?
> > > 
> > 
> > They need to be done early enough to ensure that any code in *this* function
> > does not clobber them.  Any additional registers would have to be saved by a
> > library call that does that.
> > 
> Why do we need a library function for that? It would have to be special with
> the stack: push FP registers, but do not restore SP, so that the dual
> restore function can pop them and restore SP.

Because it's a lot of code to work out how many FP registers there are.  You
can't assume that the FPU used to compile the interrupt handler is the same as
the one being used at run time.

[Bug target/94743] IRQ handler doesn't save scratch VFP registers

2020-05-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94743

--- Comment #14 from Richard Earnshaw  ---
(In reply to Christophe Lyon from comment #13)

> But, in general (non-interrupt) code, what is supposed to happen if you
> compile for a d32 VFP and run on d16 one ? (and the code uses the extra
> registers)

Well, obviously that won't work.  But if you build the interrupt routine for a
d16 system and then call a function from it that requires d32, that should
still work when running on a d32 CPU.

I think we can probably make that work, but it's probably a bit of a dance to
get it all right.  Hence the suggestion that this be done in a library
function.

[Bug bootstrap/95122] Cross-compile arm32 toolchain with hard float, but Error in gcc final

2020-05-14 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95122

--- Comment #4 from Richard Earnshaw  ---
Don't use -mhard-float or -msoft-float.  Instead, you should be using
-mfloat-abi=[hard|softfp|soft] as appropriate.  Also, rather than encoding this
into various sets of flags you should configure the compiler with
--with-float=[hard|softfp|soft] as your environment requires.  Then it should
not be necessary to pass various flags into the library builds.

There's no such option as -mhardfloat.
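As a sketch, the configure-time approach described above looks like this (paths and the remaining options are placeholders):

```shell
# Hypothetical: bake the float ABI into the compiler at configure time,
# so library builds need no extra per-target flags.
.../gcc/configure --target=arm-none-eabi --with-float=hard ...

# Per-invocation equivalent, when overriding really is needed:
# arm-none-eabi-gcc -mfloat-abi=hard ...
```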

[Bug target/94995] gcc/config/aarch64/cortex-a57-fma-steering.c: 5 * member function could be const ?

2020-05-26 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94995

Richard Earnshaw  changed:

   What|Removed |Added

   Priority|P3  |P5
 Target||aarch64
   Severity|normal  |enhancement

[Bug target/95651] GCC compilation error on AArch64: error: expected expression: AARCH64_INIT_MEMTAG_BUILTINS_DECL

2020-06-12 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95651

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2020-06-12
 Ever confirmed|0   |1

--- Comment #1 from Richard Earnshaw  ---
I build gcc on Ubuntu 20.04 on a Pi 4 regularly and do not see this.

To get further, we need far more information.  But first: DO NOT build in the
source tree; use a separate build directory.

[Bug target/95676] [armhf] g++ mis-compiles code at -O1 or above

2020-06-15 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95676

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1
   Last reconfirmed||2020-06-15

--- Comment #1 from Richard Earnshaw  ---
So what do you think is wrong with the code?  Sorry, I don't have time to try
to second-guess what's going on.

How did you configure the compiler?  What options did you use when building
your code?

We need MUCH more than this if there's to be any progress.

[Bug target/96313] [AArch64] vqmovun* return types should be unsigned

2020-07-27 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96313

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2020-07-27
 Ever confirmed|0   |1

--- Comment #1 from Richard Earnshaw  ---
In what way, exactly, do you think the output is incorrect?  Don't forget that
your testcase returns a uint16_t and the ABI says that it is the caller's
responsibility to do any widening, so any bits beyond bit 15 in the result
register are simply unspecified.

Secondly, you haven't stated which version of gcc you were using, or how it was
configured.

[Bug target/96751] overwriting libstdc++ for a default target during building libraries for armv5te/mthumb-interwork

2020-08-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96751

Richard Earnshaw  changed:

   What|Removed |Added

   Last reconfirmed||2020-08-25
 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1

--- Comment #1 from Richard Earnshaw  ---
Do you see this without your local changes?  I don't recall seeing anything
like this when I do builds, so my suspicion is that it's due to the changes you
have made.

[Bug target/96768] -mpure-code produces switch tables for thumb-1

2020-08-28 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96768

--- Comment #3 from Richard Earnshaw  ---
Note that the switch table is in the .rodata section, so that's not a problem.

[Bug c/96882] Wrong assembly code generated with arm-none-eabi-gcc -flto -mfloat-abi=hard options

2020-09-01 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96882

Richard Earnshaw  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2020-09-01

--- Comment #1 from Richard Earnshaw  ---
We need to see the configuration information.  What is the output of "gcc -v"
for your compiler?

[Bug c/96882] Wrong assembly code generated with arm-none-eabi-gcc -flto -mfloat-abi=hard options

2020-09-01 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96882

Richard Earnshaw  changed:

   What|Removed |Added

 Status|WAITING |NEW

--- Comment #3 from Richard Earnshaw  ---
LTO seems to be getting confused about the ABI.  Investigating...

In the meantime, the only work-around I can think of is to remove -flto from
your build.

[Bug target/96882] Wrong assembly code generated with arm-none-eabi-gcc -flto -mfloat-abi=hard options

2020-09-01 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96882

Richard Earnshaw  changed:

   What|Removed |Added

 Target||arm

--- Comment #4 from Richard Earnshaw  ---
typedef struct {
  double m_a;
  double m_b;
  double m_c;
  double m_d;
} AtLeast32BytesObject;

static AtLeast32BytesObject __attribute__((noinline,noclone)) CalledFunction()
{
  AtLeast32BytesObject result = {1.1, 2.2, 3.3, 4.4};
  return result;
}

void __attribute__((noinline)) _start() {
  volatile AtLeast32BytesObject result = CalledFunction();
  while(1) {}
}

Will miscompile without needing LTO.

[Bug target/96882] Wrong assembly code generated with arm-none-eabi-gcc -flto -mfloat-abi=hard options

2020-09-02 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96882

--- Comment #6 from Richard Earnshaw  ---
Yes, the problem is related to returning values in memory and the ABI variants
we have.  If we have hardware floating-point we generally use registers to
return values; if we don't, then we have to return in memory.

However, when we have a function that is not inlinable, but is private to the
compilation unit we can optimize the ABI in some circumstances.  That's what is
happening here.  Unfortunately, it appears that function that decides whether
or not the result should be returned in memory or in registers lacks important
information as to whether or not the function is private and this in turn leads
to two parts of the compiler making different choices - with the disastrous
consequences you've discovered.

I'm not sure if this is restricted to M-profile parts or if it's more
wide-spread - I'm still investigating.

[Bug target/96939] LTO vs. different arm arch options

2020-09-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

--- Comment #5 from Richard Earnshaw  ---
I batted my head against this when reworking the command-line options stuff a
couple of years back, but the documentation on how the different hooks should
interact (especially for LTO and streaming) is, quite frankly, woeful.  How any
back-end maintainer is supposed to support this is beyond me.

[Bug target/96939] LTO vs. different arm arch options

2020-09-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

--- Comment #6 from Richard Earnshaw  ---
(In reply to Jakub Jelinek from comment #4)
> Doesn't seem to be related to me, in the other PR everything is compiled
> with one set of options and no target attribute is involved either.

No, that's a completely different problem.  The problem there is that some
calls to the back-end pass a fntype and some a fndecl.  The ones with a fndecl
can work out whether a function is local and pick a different ABI, but the
ones with only a type cannot.  So we get inconsistent results.

[Bug target/96882] Wrong assembly code generated with arm-none-eabi-gcc -flto -mfloat-abi=hard options

2020-09-15 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96882

--- Comment #8 from Richard Earnshaw  ---
(In reply to emilie.feral from comment #7)
> Hello,
> Any news on the subject?
> Would you advise in the meantime to discard the LTO (with the -fno-lto
> option) on the compilation unit containing the failing code?
> The bug occurred for us when returning a structure of four doubles. Do you
> have any indication of when the bug might appear to help us track other
> occurrences?
> Thanks for helping!

Sorry, I haven't had time to work on this yet.

The safest work-around for now is to add an additional attribute to force the
PCS to the default for the selected ABI - I think adding 

 pcs("aapcs-vfp")

to the attributes will solve the problem.

ie.

AtLeast32BytesObject __attribute__((noinline, pcs("aapcs-vfp")))
CalledFunction() {
  AtLeast32BytesObject result = {1.1, 2.2, 3.3, 4.4};
  return result;
}

[Bug target/89400] [7/8/9 Regression] ICE: output_operand: invalid %-code with -march=armv6kz -mthumb -munaligned-access

2019-10-17 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89400

--- Comment #8 from Richard Earnshaw  ---
Author: rearnsha
Date: Thu Oct 17 16:45:46 2019
New Revision: 277123

URL: https://gcc.gnu.org/viewcvs?rev=277123&root=gcc&view=rev
Log:
[arm]  PR target/89400 fix thumb1 unaligned access expansion

Armv6 has support for unaligned accesses to memory.  However, the
thumb1 code patterns were trying to use the 32-bit code constraints.
One failure mode from this was that the patterns are designed to be
compatible with conditional execution and this was then causing an
assert in the compiler.

The unaligned_loadhis pattern is only used for expanding extv, which
in turn is only enabled for systems supporting thumb2.  Given that
there is no simple expansion for a thumb1 sign-extending load (the
instruction has no immediate offset form and requires two registers in
the address) it seems simpler to just disable this for thumb1.

Fixed thusly:

Backport from trunk:
2019-05-03  Richard Earnshaw  

PR target/89400
* config/arm/arm.md (unaligned_loadsi): Add variant for thumb1.
Restrict 'all' variant to 32-bit configurations.
(unaligned_loadhiu): Likewise.
(unaligned_storehi): Likewise.
(unaligned_storesi): Likewise.
(unaligned_loadhis): Disable when compiling for thumb1.

Modified:
branches/gcc-9-branch/gcc/ChangeLog
branches/gcc-9-branch/gcc/config/arm/arm.md

[Bug target/89400] [7/8/9 Regression] ICE: output_operand: invalid %-code with -march=armv6kz -mthumb -munaligned-access

2019-10-17 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89400

--- Comment #9 from Richard Earnshaw  ---
Author: rearnsha
Date: Thu Oct 17 16:47:42 2019
New Revision: 277124

URL: https://gcc.gnu.org/viewcvs?rev=277124&root=gcc&view=rev
Log:
[arm]  PR target/89400 fix thumb1 unaligned access expansion

Armv6 has support for unaligned accesses to memory.  However, the
thumb1 code patterns were trying to use the 32-bit code constraints.
One failure mode from this was that the patterns are designed to be
compatible with conditional execution and this was then causing an
assert in the compiler.

The unaligned_loadhis pattern is only used for expanding extv, which
in turn is only enabled for systems supporting thumb2.  Given that
there is no simple expansion for a thumb1 sign-extending load (the
instruction has no immediate offset form and requires two registers in
the address) it seems simpler to just disable this for thumb1.

Fixed thusly:

Backport from trunk:
2019-05-03  Richard Earnshaw  

PR target/89400
* config/arm/arm.md (unaligned_loadsi): Add variant for thumb1.
Restrict 'all' variant to 32-bit configurations.
(unaligned_loadhiu): Likewise.
(unaligned_storehi): Likewise.
(unaligned_storesi): Likewise.
(unaligned_loadhis): Disable when compiling for thumb1.

Modified:
branches/gcc-8-branch/gcc/ChangeLog
branches/gcc-8-branch/gcc/config/arm/arm.md

[Bug target/89400] [7/8/9 Regression] ICE: output_operand: invalid %-code with -march=armv6kz -mthumb -munaligned-access

2019-10-17 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89400

--- Comment #10 from Richard Earnshaw  ---
Author: rearnsha
Date: Thu Oct 17 16:48:39 2019
New Revision: 277125

URL: https://gcc.gnu.org/viewcvs?rev=277125&root=gcc&view=rev
Log:
[arm]  PR target/89400 fix thumb1 unaligned access expansion

Armv6 has support for unaligned accesses to memory.  However, the
thumb1 code patterns were trying to use the 32-bit code constraints.
One failure mode from this was that the patterns are designed to be
compatible with conditional execution and this was then causing an
assert in the compiler.

The unaligned_loadhis pattern is only used for expanding extv, which
in turn is only enabled for systems supporting thumb2.  Given that
there is no simple expansion for a thumb1 sign-extending load (the
instruction has no immediate offset form and requires two registers in
the address) it seems simpler to just disable this for thumb1.

Fixed thusly:

Backport from trunk:
2019-05-03  Richard Earnshaw  

PR target/89400
* config/arm/arm.md (unaligned_loadsi): Add variant for thumb1.
Restrict 'all' variant to 32-bit configurations.
(unaligned_loadhiu): Likewise.
(unaligned_storehi): Likewise.
(unaligned_storesi): Likewise.
(unaligned_loadhis): Disable when compiling for thumb1.

Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/config/arm/arm.md

[Bug target/89400] [7/8/9 Regression] ICE: output_operand: invalid %-code with -march=armv6kz -mthumb -munaligned-access

2019-10-17 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89400

Richard Earnshaw  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Richard Earnshaw  ---
Fixed on all active branches

[Bug c/92172] ARM Thumb2 frame pointers inconsistent with clang

2019-10-23 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92172

--- Comment #3 from Richard Earnshaw  ---
(In reply to Seth LaForge from comment #2)

> Using R11 in ARM and R7 on Thumb is mandated by the AAPCS I believe. I don't
> think the overhead is likely to be particularly different in Thumb vs ARM.

No, it doesn't.  The AAPCS for AArch32 makes no reference to a frame pointer, so
there is no portable way defined for walking a frame other than by using dwarf
records or C++ unwinding descriptions.  The latter are preferred, but only
support unwinding from 'synchronous' unwind points (after the prologue and
before the epilogue).

Compilers are, of course, free to use frame pointers internally, within a
frame, but there is no frame chain that can be walked.

[Bug target/91927] -mstrict-align doesn't prevent unaligned accesses at -O2 and -O3 on AARCH64 targets

2019-10-24 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91927

--- Comment #7 from Richard Earnshaw  ---
(In reply to Kamlesh Kumar from comment #6)
> This Fixes it.
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 2e73f3515bb..155f4c45500 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -16161,15 +16161,9 @@ aarch64_builtin_support_vector_misalignment
> (machine_mode mode,
>  const_tree type, int
> misalignment,
>  bool is_packed)
>  {
> -  if (TARGET_SIMD && STRICT_ALIGNMENT)
> +  if (STRICT_ALIGNMENT)
>  {
> -  /* Return if movmisalign pattern is not supported for this mode.  */
> -  if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
> -return false;
> -
> -  /* Misalignment factor is unknown at compile time.  */
> -  if (misalignment == -1)
> -   return false;
> +  return false;
>  }
>return default_builtin_support_vector_misalignment (mode, type,
> misalignment,
>   is_packed);

No, that bodges around it.  It's not a fix.

[Bug target/92207] [10 Regression] pr36449.C fails on arm after r277179

2019-10-24 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92207

--- Comment #5 from Richard Earnshaw  ---
Apart from the addresses used, the traces are identical right up until the
latter version crashes.

The testcase tries to allocate 128MB + 4 bytes of memory, so my suspicion is
that this is a test that needs more memory in the simulation.  Why this patch
triggers the failure I'm not sure.

[Bug target/92207] [10 Regression] pr36449.C fails on arm after r277179

2019-10-24 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92207

--- Comment #8 from Richard Earnshaw  ---
I'm 95%+ sure that this is a problem with qemu+newlib with the latter not
handling out-of-memory correctly.  If I run the good program to the same
instruction that faults in the bad program, I see:

Breakpoint 2, _malloc_r (reent_ptr=0x49570 , bytes=134217732)
at
/tmp/7992549.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/stdlib/mallocr.c:2353
2353in
/tmp/7992549.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/stdlib/mallocr.c
(gdb) 
Continuing.

Breakpoint 1, 0x00018bc0 in _malloc_r (reent_ptr=0x49570 , 
bytes=)
at
/tmp/7992549.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/stdlib/mallocr.c:2592
2592in
/tmp/7992549.tmpdir/aci-gcc-fsf/sources/newlib/newlib/libc/stdlib/mallocr.c
(gdb) x/i $pc
=> 0x18bc0 <_malloc_r+1092>:str r3, [r2, #4]
(gdb) info reg r2
r2 0x804aa40   134523456
(gdb) x/x $r2
0x804aa40:  0x
(gdb) x/x 0x805b000
0x805b000:  Cannot access memory at address 0x805b000

So the location to be written exists and is the last mapped page of memory.

In the 'bad' program, I see

(gdb) x/i $pc
=> 0x18bc0 <_malloc_r+1092>:str r3, [r2, #4]
(gdb) p $r2
$14 = 134522216
(gdb) p *$4
$15 = {prev_size = 0, size = 0, fd = 0x804a568, bk = 0x494c0 <__malloc_av_>}
(gdb) p/x $r2
$16 = 0x804a568
(gdb) x/i $r2
   0x804a568:   Cannot access memory at address 0x804a568
(gdb) info reg pc
pc 0x18bc0 0x18bc0 <_malloc_r+1092>
(gdb) x/x 0x8049ffc
0x8049ffc:  0x

that is, the location being written is not mapped (but the page before it is);
this is despite the address being lower than in the 'good' version.

Note that the difference between the two addresses being written in the two
programs is exactly the same as the difference between the VMAs of the .bss
segment in the executable files.

Consequently, I don't think this is a bug that has been introduced by this
patch, but is most likely a latent issue in either qemu or newlib (or perhaps a
combination of the two).

[Bug target/92207] [10 Regression] pr36449.C fails on arm after r277179

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92207

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #9 from Richard Earnshaw  ---
This is a bug in newlib-3.1.0 that should have been fixed in trunk with
https://sourceware.org/ml/newlib/2019/msg00413.html

So closing as invalid.

[Bug target/92207] [10 Regression] pr36449.C fails on arm after r277179

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92207

--- Comment #10 from Richard Earnshaw  ---
A bit more trace from the gdb session as evidence.

(gdb) p HeapLimit 
'HeapLimit' has unknown type; cast it to its declared type
(gdb) p &HeapLimit
$1 = ( *) 0x48f78
(gdb) x/x $1
0x48f78:0x0804a000
(gdb) p __heap_limit
No symbol "__heap_limit" in current context.

[Bug target/92207] [10 Regression] pr36449.C fails on arm after r277179

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92207

--- Comment #11 from Richard Earnshaw  ---
BTW, it looks like the libgloss implementation of the syscall API and startup
code has had this change since 2015.

[Bug target/88656] [7/8/9 Regression] lr clobbered by thumb prologue before __builtin_return_address(0) reads from it

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88656

--- Comment #7 from Richard Earnshaw  ---
This was fixed on trunk at some point, but has not yet been backported.

[Bug target/88656] [7/8/9 Regression] lr clobbered by thumb prologue before __builtin_return_address(0) reads from it

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88656
Bug 88656 depends on bug 88167, which changed state.

Bug 88167 Summary: [7/8/9 regression] [ARM] Function __builtin_return_address 
returns invalid address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88167

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

[Bug target/88167] [7/8/9 regression] [ARM] Function __builtin_return_address returns invalid address

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88167

Richard Earnshaw  changed:

   What|Removed |Added

   Priority|P3  |P2
 Status|RESOLVED|REOPENED
 Blocks||88656
 Resolution|FIXED   |---
Summary|[ARM] Function  |[7/8/9 regression] [ARM]
   |__builtin_return_address|Function
   |returns invalid address |__builtin_return_address
   ||returns invalid address

--- Comment #3 from Richard Earnshaw  ---
Re-opening because needed for backporting to fix bug 88656


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88656
[Bug 88656] [7/8/9 Regression] lr clobbered by thumb prologue before
__builtin_return_address(0) reads from it

[Bug target/88167] [7/8/9 regression] [ARM] Function __builtin_return_address returns invalid address

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88167

--- Comment #4 from Richard Earnshaw  ---
Author: rearnsha
Date: Fri Oct 25 14:34:44 2019
New Revision: 277452

URL: https://gcc.gnu.org/viewcvs?rev=277452&root=gcc&view=rev
Log:
[arm][PR88167] Fix __builtin_return_address returns invalid address

This patch fixes a problem with the thumb1 prologue code where the link
register could be unconditionally used as a scratch register even if the
return value was still live at the end of the prologue.

Additionally, the patch improves the code generated when we are not
using many low call-saved registers to make use of any unused call
clobbered registers to help with the saving of high registers that
cannot be pushed directly (quite rare in normal code as the register
allocator correctly prefers low registers).

2019-05-08  Mihail Ionescu  
Richard Earnshaw  

gcc:

PR target/88167
* config/arm/arm.c (thumb1_prologue_unused_call_clobbered_lo_regs): New
function.
(thumb1_epilogue_unused_call_clobbered_lo_regs): New function.
(thumb1_compute_save_core_reg_mask): Don't force a spare work
register if both the epilogue and prologue can use call-clobbered
regs.
(thumb1_unexpanded_epilogue): Use
thumb1_epilogue_unused_call_clobbered_lo_regs.  Reverse the logic for
picking temporaries for restoring high regs to match that of the
prologue where possible.
(thumb1_expand_prologue): Add any usable call-clobbered low registers
to
the list of work registers.  Detect if the return address is still live
at the end of the prologue and avoid using it for a work register if
so.
If the return address is not live, add LR to the list of pushable regs
after the first pass.

gcc/testsuite:

PR target/88167
* gcc.target/arm/pr88167-1.c: New test.
* gcc.target/arm/pr88167-2.c: New test.


Added:
branches/gcc-9-branch/gcc/testsuite/gcc.target/arm/pr88167-1.c
branches/gcc-9-branch/gcc/testsuite/gcc.target/arm/pr88167-2.c
Modified:
branches/gcc-9-branch/gcc/ChangeLog
branches/gcc-9-branch/gcc/config/arm/arm.c
branches/gcc-9-branch/gcc/testsuite/ChangeLog

[Bug target/88167] [7/8/9 regression] [ARM] Function __builtin_return_address returns invalid address

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88167

--- Comment #5 from Richard Earnshaw  ---
Author: rearnsha
Date: Fri Oct 25 14:37:14 2019
New Revision: 277453

URL: https://gcc.gnu.org/viewcvs?rev=277453&root=gcc&view=rev
Log:
[arm][PR88167] Fix __builtin_return_address returns invalid address

This patch fixes a problem with the thumb1 prologue code where the link
register could be unconditionally used as a scratch register even if the
return value was still live at the end of the prologue.

Additionally, the patch improves the code generated when we are not
using many low call-saved registers to make use of any unused call
clobbered registers to help with the saving of high registers that
cannot be pushed directly (quite rare in normal code as the register
allocator correctly prefers low registers).

2019-05-08  Mihail Ionescu  
Richard Earnshaw  

gcc:

PR target/88167
* config/arm/arm.c (thumb1_prologue_unused_call_clobbered_lo_regs): New
function.
(thumb1_epilogue_unused_call_clobbered_lo_regs): New function.
(thumb1_compute_save_core_reg_mask): Don't force a spare work
register if both the epilogue and prologue can use call-clobbered
regs.
(thumb1_unexpanded_epilogue): Use
thumb1_epilogue_unused_call_clobbered_lo_regs.  Reverse the logic for
picking temporaries for restoring high regs to match that of the
prologue where possible.
(thumb1_expand_prologue): Add any usable call-clobbered low registers
to
the list of work registers.  Detect if the return address is still live
at the end of the prologue and avoid using it for a work register if
so.
If the return address is not live, add LR to the list of pushable regs
after the first pass.

gcc/testsuite:

PR target/88167
* gcc.target/arm/pr88167-1.c: New test.
* gcc.target/arm/pr88167-2.c: New test.


Added:
branches/gcc-8-branch/gcc/testsuite/gcc.target/arm/pr88167-1.c
branches/gcc-8-branch/gcc/testsuite/gcc.target/arm/pr88167-2.c
Modified:
branches/gcc-8-branch/gcc/ChangeLog
branches/gcc-8-branch/gcc/config/arm/arm.c
branches/gcc-8-branch/gcc/testsuite/ChangeLog

[Bug target/88167] [7/8/9 regression] [ARM] Function __builtin_return_address returns invalid address

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88167

--- Comment #6 from Richard Earnshaw  ---
Author: rearnsha
Date: Fri Oct 25 14:39:06 2019
New Revision: 277454

URL: https://gcc.gnu.org/viewcvs?rev=277454&root=gcc&view=rev
Log:
[arm][PR88167] Fix __builtin_return_address returns invalid address

This patch fixes a problem with the thumb1 prologue code where the link
register could be unconditionally used as a scratch register even if the
return value was still live at the end of the prologue.

Additionally, the patch improves the code generated when we are not
using many low call-saved registers to make use of any unused call
clobbered registers to help with the saving of high registers that
cannot be pushed directly (quite rare in normal code as the register
allocator correctly prefers low registers).

2019-05-08  Mihail Ionescu  
Richard Earnshaw  

gcc:

PR target/88167
* config/arm/arm.c (thumb1_prologue_unused_call_clobbered_lo_regs): New
function.
(thumb1_epilogue_unused_call_clobbered_lo_regs): New function.
(thumb1_compute_save_core_reg_mask): Don't force a spare work
register if both the epilogue and prologue can use call-clobbered
regs.
(thumb1_unexpanded_epilogue): Use
thumb1_epilogue_unused_call_clobbered_lo_regs.  Reverse the logic for
picking temporaries for restoring high regs to match that of the
prologue where possible.
(thumb1_expand_prologue): Add any usable call-clobbered low registers
to
the list of work registers.  Detect if the return address is still live
at the end of the prologue and avoid using it for a work register if
so.
If the return address is not live, add LR to the list of pushable regs
after the first pass.

gcc/testsuite:

PR target/88167
* gcc.target/arm/pr88167-1.c: New test.
* gcc.target/arm/pr88167-2.c: New test.

Added:
branches/gcc-7-branch/gcc/testsuite/gcc.target/arm/pr88167-1.c
branches/gcc-7-branch/gcc/testsuite/gcc.target/arm/pr88167-2.c
Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/config/arm/arm.c
branches/gcc-7-branch/gcc/testsuite/ChangeLog

[Bug target/88656] [7/8/9 Regression] lr clobbered by thumb prologue before __builtin_return_address(0) reads from it

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88656
Bug 88656 depends on bug 88167, which changed state.

Bug 88167 Summary: [7/8/9 regression] [ARM] Function __builtin_return_address 
returns invalid address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88167

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

[Bug target/88167] [7/8/9 regression] [ARM] Function __builtin_return_address returns invalid address

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88167

Richard Earnshaw  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED
   Target Milestone|10.0|7.5

--- Comment #7 from Richard Earnshaw  ---
Fixed on all active branches

[Bug target/88656] [7/8/9 Regression] lr clobbered by thumb prologue before __builtin_return_address(0) reads from it

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88656

Richard Earnshaw  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Richard Earnshaw  ---
Fixed on all active branches

[Bug rtl-optimization/87871] [9/10 Regression] testcases fail after r265398 on arm

2019-10-25 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87871

--- Comment #63 from Richard Earnshaw  ---
We need to reach closure on this, but there's nothing really concrete to make
such a decision.  Which of the tests originally reported are still failing?

[Bug target/77882] [Aarch64] Add 'naked' function attribute

2019-10-28 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77882

--- Comment #5 from Richard Earnshaw  ---
(In reply to Elad Lahav from comment #4)
> Created attachment 47119 [details]
> Proposed implementation of naked functions for aarch64
> 
> The change is quite simple (see the proposed patch). I hope it can be made,
> as I find naked functions quite useful, especially by allowing the use of
> certain C features in otherwise pure assembly code (e.g., offsetof,
> _Static_assert). Aesthetically, naked functions provide proper prototypes
> that are easier to follow, document and test.

Patches need to be sent to gcc-patc...@gcc.gnu.org.  Note, if you've not
contributed to gcc before you'll also need to sort out a copyright assignment
for the change (this is a non-trivial change).

[Bug rtl-optimization/92281] New: Inconsistent canonicalization of (minus (minus A B) C)

2019-10-30 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92281

Bug ID: 92281
   Summary: Inconsistent canonicalization of (minus (minus A B) C)
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rearnsha at gcc dot gnu.org
CC: segher at kernel dot crashing.org
  Target Milestone: ---

Here are two combine attempts from a simple testcase:

arm-none-eabi-gcc -O2 -marm -mcpu=arm7tdmi

typedef unsigned long long t64;

t64 f1(t64 a, t64 b) { return a + ~b; }

Trying 19 -> 8:
   19: r119:SI=r127:SI
  REG_DEAD r127:SI
8: r125:SI=r119:SI-ltu(cc:CC,0)-r121:SI
  REG_DEAD r121:SI
  REG_DEAD r119:SI
  REG_DEAD cc:CC
Failed to match this instruction:
(set (reg:SI 125 [+4 ])
(minus:SI (minus:SI (reg:SI 127)
(reg:SI 121 [ b+4 ]))
(ltu:SI (reg:CC 100 cc)
(const_int 0 [0]))))

Trying 21 -> 8:
   21: r121:SI=r129:SI
  REG_DEAD r129:SI
8: r125:SI=r119:SI-ltu(cc:CC,0)-r121:SI
  REG_DEAD r121:SI
  REG_DEAD r119:SI
  REG_DEAD cc:CC
Successfully matched this instruction:
(set (reg:SI 125 [+4 ])
(minus:SI (minus:SI (reg:SI 119 [ a+4 ])
(ltu:SI (reg:CC 100 cc)
(const_int 0 [0])))
(reg:SI 129)))

These are mathematically equivalent, but because we do not produce consistent
RTL for them we need two patterns if we are to match both alternatives.

I think both should be canonicalized with the LTU inside the inner MINUS
expression, but I wouldn't mind if the other were chosen, as long as we were
consistent.

[Bug tree-optimization/92282] New: gimple for (a + ~b) is harder to optimize in RTL when types are unsigned

2019-10-30 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92282

Bug ID: 92282
   Summary: gimple for (a + ~b) is harder to optimize in RTL when
types are unsigned
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rearnsha at gcc dot gnu.org
  Target Milestone: ---

Given:

t f1(t a, t b) { return a + ~b; }

if t is of type int64_t, then the gimple produced is


  _1 = ~b_2(D);
  _4 = _1 + a_3(D);

Which on Arm can then easily optimize into a 3 instruction sequence

MVN  R2, R2
ADDS R0, R0, R2
SBC  R1, R1, R3

(because on Arm, SBC = Rn - Rm - ~C == Rn + ~Rm + C)

But if the type is changed to uint64_t, then the gimple is transformed into

  _1 = a_2(D) - b_3(D);
  _4 = _1 + 18446744073709551615;

Which is almost impossible for the back-end to optimize back into the optimal
sequence.  The result is that we end up with two carry-propagating subtract
operations instead of one and less parallelism in the overall sequence as the
bit-wise invert can operate in parallel on any super-scalar architecture.

Note that the same problem likely exists on 64-bit architectures if t is
uint128_t.

[Bug rtl-optimization/92281] Inconsistent canonicalization of (minus (minus A B) C)

2019-10-31 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92281

--- Comment #2 from Richard Earnshaw  ---
(In reply to Segher Boessenkool from comment #1)
> (In reply to Richard Earnshaw from comment #0)
> 
> > Failed to match this instruction:
> > (set (reg:SI 125 [+4 ])
> > (minus:SI (minus:SI (reg:SI 127)
> > (reg:SI 121 [ b+4 ]))
> > (ltu:SI (reg:CC 100 cc)
> > (const_int 0 [0]))))
> 
> > (set (reg:SI 125 [+4 ])
> > (minus:SI (minus:SI (reg:SI 127)
> > (reg:SI 121 [ b+4 ]))
> > (ltu:SI (reg:CC 100 cc)
> > (const_int 0 [0]))))
> 
> That is
> 
>   (set D (minus (minus A B) (X C 0)))
> 
> > Successfully matched this instruction:
> > (set (reg:SI 125 [+4 ])
> > (minus:SI (minus:SI (reg:SI 119 [ a+4 ])
> > (ltu:SI (reg:CC 100 cc)
> > (const_int 0 [0])))
> > (reg:SI 129)))
> 
> And this is
> 
>   (set D (minus (minus A (X C 0)) B))
> 

Yes, but since 
  (A - B) - C = A - B - C = A - C - B = (A - C) - B
we can clearly swap the order of the two RHS operands here.  This would be
a special rule similar to the rules that we have that rewrite 
  A - (B + C)
as
  (A - B) - C.

My suggestion would be that we should have a rule here that re-orders things so
that B is the most 'complex' operation and C the simplest, using the normal
precedence ordering (complex > REG > CONST).

> There are no rules for that afaics.
> 
> > These are mathematically equivalent, but because we do not produce
> > consistent RTL for them we need two patterns if we are to match both
> > alternatives.
> 
> Yes; the same is true for quite a few other unusual combinations.  Or
> not even so very unusual:
>   (ior (ashift X N) (lshiftrt Y M))
> vs.
>   (ior (lshiftrt Y M) (ashift X N))
> is one nasty example, but also reg+reg+reg where one of the regs is
> "special" can appear in multiple forms.
> 
> > I think both should be canonicalized with the LTU inside the inner MINUS
> > expression, but I wouldn't mind if the other were chosen, as long as we were
> > consistent.
> 
> What would the rule become?  

See suggestion above.  I think we might also have a rule that within 'complex'
the ordering might be by RTX code number, but that's somewhat arbitrary;
thought it is likely to be fairly stable.  It would produce a strict canonical
ordering for your IOR case above, however.

> What targets would it break, and how?

Hard to tell, until we try it.  Mostly the 'breakage' would be some combine
patterns might no-longer match if the target only had one and the ordering were
not canonical (leading to some missed optimizations).  On targets that have
both orderings, some patterns might become redundant and never match unless
directly generated by the back-end.

> 
> What makes combine come up with something else for these two cases?

Sorry, I don't understand what you're asking here?  Why does it produce these
two separate canonicalizations in one compilation?  I've no idea, hence the bug
report.

[Bug rtl-optimization/92281] Inconsistent canonicalization of (minus (minus A B) C)

2019-10-31 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92281

--- Comment #3 from Richard Earnshaw  ---
As for 'special' regs and their ordering, I'm not sure.  I would suggest that
if we have a commutative operation with two registers and one of the registers
is marked as a pointer, then it should appear first.  But other than that, I
don't have any other suggestions here.

[Bug middle-end/92308] New: Gimple passes could do a better job of forming address CSEs

2019-10-31 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92308

Bug ID: 92308
   Summary: Gimple passes could do a better job of forming address
CSEs
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rearnsha at gcc dot gnu.org
  Target Milestone: ---

Consider this testcase which was mentioned in
https://gcc.gnu.org/ml/gcc-help/2019-10/msg00122.html.  

#define BB_ADDRESS 0x43fe1800

void test1(void) {
  volatile uint32_t * const p = (uint32_t *) BB_ADDRESS;

  p[3] = 1;
  p[4] = 2;
  p[1] = 3;
  p[7] = 4;
  p[0] = 6;
}

The gimple generated for this is

test1 ()
{
;;   basic block 2, loop depth 0
;;pred:   ENTRY
  MEM[(volatile uint32_t *)1140725772B] ={v} 1;
  MEM[(volatile uint32_t *)1140725776B] ={v} 2;
  MEM[(volatile uint32_t *)1140725764B] ={v} 3;
  MEM[(volatile uint32_t *)1140725788B] ={v} 4;
  MEM[(volatile uint32_t *)1140725760B] ={v} 6;
  return;
;;succ:   EXIT

}

However, it's very unlikely on any RISC-type architecture that addresses of
this form will be valid.  The TARGET_LEGITIMIZE_ADDRESS hook can help here, but
that has to guess how to split the address and it has no idea what, for each
call, the best base that should be chosen.  In this case the best base is
likely to be the lowest addressed object in the sequence, so that all other
objects can use a small positive offset from that.

The GIMPLE passes have a much broader view of the code being optimized, so
forming a common base for all these addresses should be straightforward and
much more likely to lead to better code than relying on a heuristic in the
back-end.

[Bug rtl-optimization/92294] alias attribute generates incorrect code

2019-10-31 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92294

--- Comment #1 from Richard Earnshaw  ---
Things go wrong in the forward-prop 1 pass.

[Bug rtl-optimization/92281] Inconsistent canonicalization of (minus (minus A B) C)

2019-11-01 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92281

--- Comment #5 from Richard Earnshaw  ---
(In reply to Segher Boessenkool from comment #4)
> (In reply to Richard Earnshaw from comment #2)
> > Yes, but since 
> >   (A - B) - C = A - B - C = A - C - B = (A - C) - B
> > we can clearly swap the order of the two RHS operands here.
> 
> My intent was to show the two rtx shapes, and that neither is a defined
> canonical form.
> 
> >  This would be
> > a special rule similar to the rules that we have that rewrite 
> >   A - (B + C)
> > as
> >   (A - B) - C.
> 
> That isn't a canonical form, either!  Not according to the documentation,
> anyway.
> 

What I've shown is equivalent to (minus (minus (A) (B)) (C)), which is what
combine produces today.  Are you saying that the documentation disagrees on the
overall shape of this and the compilers output right now?

> > My suggestion would be that we should have a rule here that re-orders things
> > so
> > that B is the most 'complex' operation and C the simplest, using the normal
> > precedence ordering (complex > REG > CONST).
> 
> But minus isn't commutative, and reordering with minus introduces negs which
> is wrong (it is canonical to *remove* such negs).
> 

Minus isn't commutative, but in a 3-way version (A - B - C), the order of B and
C does not matter.  ... - B - C is the same as ... - C - B.  So you can
re-order the nesting to produce a canonical form.

> > > What targets would it break, and how?
> > 
> > Hard to tell, until we try it.  Mostly the 'breakage' would be some combine
> > patterns might no-longer match if the target only had one and the ordering
> > were not canonical (leading to some missed optimizations).  On targets that
> > have both orderings, some patterns might become redundant and never match
> > unless directly generated by the back-end.
> 
> The breakage will be that many targets optimise worse than they did before.
> And this is non-obvious to detect, usually.

At present it's entirely random, since there's no attempt to create order.  Any
matching that does occur is more by good luck (or overkill in providing all the
redundant variant forms).

> 
> > > What makes combine come up with something else for these two cases?
> > 
> > Sorry, I don't understand what you're asking here?  Why does it produce
> > these two separate canoncializations in one compilation?  I've no idea,
> > hence the bug report.
> 
> A lot of what combine does is *not* canonicalisation.  But combine comes up
> with only one result for every attempted combination, making that a kind-of
> de-facto canonicalisation.
> 
> And yes, that is what I asked: in both cases it combined the same insn with
> a simple pseudo move, in both cases on the RHS in that insn.  And it came
> up with different results.
> 
> This may be unavoidable, or combine does something weird, or the RTL that
> combine started with was non-canonical or unexpected in some other way, etc.
> 
> So I'd like to know where the difference was introduced.  Was it in combine
> at all, to start with?  It can be in simplify-rtx as well for example.

Combine is the prime user of simplify-rtx - perhaps I'm conflating the two, but
this is, in part, combine's problem because it's during the combine pass that
having matchers for all these variants becomes most important.

[Bug middle-end/92308] Gimple passes could do a better job of forming address CSEs

2019-11-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92308

--- Comment #2 from Richard Earnshaw  ---
Very few micro-architectures would benefit from auto-inc style addressing in a
sequence like this.  With modern super-scalar systems you want to use offset
addressing where possible (from a common base).  Auto-incs create serialization
in the instruction stream and thus restrict multiple-issue.

Even loops should only use one increment per base per iteration (using pre/post
modify if necessary).

[Bug middle-end/92308] Gimple passes could do a better job of forming address CSEs

2019-11-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92308

--- Comment #4 from Richard Earnshaw  ---
So taking the example I posted in the initial report and compiling with trunk
for arm -mcpu=cortex-m4 -mthumb -Os, we get:

ldr r3, .L2
movs r2, #1
str r2, [r3, #2060]
movs r2, #2
str r2, [r3, #2064]
movs r2, #3
str r2, [r3, #2052]
movs r2, #4
str r2, [r3, #2076]
movs r2, #6
str r2, [r3, #2048]
bx  lr 
.L2:
.word   0x43fe1000

Because the backend (in TARGET_LEGITIMIZE_ADDRESS) has had to guess at a base,
and has chosen to split off the bottom 12 bits into the offset (giving the
maximum range and therefore the most likely base to form as many CSEs as
possible).

But using this base means that the str instructions need a 32-bit encoding as
the offsets exceed the limit for the 16-bit encoded version.

We could choose to split off only 7 bits of offset, then we could use the
smaller encoding, but now we reduce the likelihood of finding common bases.

But there's no real need to do this by splitting the bits with a mask, if we
have a global view of what's going on (the problem is that
TARGET_LEGITIMIZE_ADDRESS does not have a global view); we could pick the
original BB_ADDRESS as the base just as easily as any other.

Also note that if BB_ADDRESS were changed to 0x43fefff8, then there is
practically no chance of the back-end finding an optimal base as the address
range spans a mask boundary, regardless of which mask we chose.

[Bug middle-end/92308] Gimple passes could do a better job of forming address CSEs

2019-11-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92308

--- Comment #6 from Richard Earnshaw  ---
(In reply to rguent...@suse.de from comment #5)
> On Mon, 4 Nov 2019, rearnsha at gcc dot gnu.org wrote:

> I suspect TARGET_LEGITIMIZE_ADDRESS is only applied during
> reload/LRA, correct?

No, it's called during expand if the address isn't valid.  But it's called in
isolation with no information about what other addresses might be generated, so
forming bases is guesswork based purely on heuristics.

There's a similar hook in LRA (TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT), but it
has the same basic problem that the B/E doesn't know what other values might
need legitimizing.

[Bug middle-end/92308] Gimple passes could do a better job of forming address CSEs

2019-11-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92308

--- Comment #7 from Richard Earnshaw  ---
Reload also had a hook TARGET_LEGITIMIZE_RELOAD_ADDRESS as well.  But it had
the same problems - lack of context leading to guesswork and therefore too
local or too general fix-ups.

[Bug rtl-optimization/92342] [10 Regression] a small missed transformation into x?b:0

2019-11-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92342

--- Comment #5 from Richard Earnshaw  ---
So if the AND-based idiom is now preferred, shouldn't the if-then-else variant
be transformed into it?  Similarly for IOR, when we get

(IOR (NEG (<cond>)) (reg))

from

(IF_THEN_ELSE (<cond>)
  (reg)
  (const_int -1))

[Bug rtl-optimization/92342] [10 Regression] a small missed transformation into x?b:0

2019-11-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92342

--- Comment #6 from Richard Earnshaw  ---
(In reply to Richard Earnshaw from comment #5)
> So if the AND-based idiom is now preferred, shouldn't the if-then-else
> variant be transformed into it?  Similarly for IOR, when we get
> 
> (IOR (NEG (<cond>)) (reg))
> 
> from
> 
> (IF_THEN_ELSE (<cond>)
>   (reg)
>   (const_int -1))

except that should be 

(IF_THEN_ELSE (<cond>')
  (reg)
  (const_int -1))

Where <cond>' is the reversed condition.

[Bug rtl-optimization/92342] [10 Regression] a small missed transformation into x?b:0

2019-11-05 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92342

--- Comment #9 from Richard Earnshaw  ---
(In reply to Segher Boessenkool from comment #7)

> I think the IF_THEN_ELSE version should be canonical, and it should be
> formed in simplify_rtx, not at random spots in combine.

Why?  The and/ior variants are more likely to lead to a useful split if
combining 3 insns and it doesn't match.

[Bug target/92462] [arm32] -ftree-pre makes a variable to be wrongly hoisted out

2019-11-12 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92462

Richard Earnshaw  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #6 from Richard Earnshaw  ---
(In reply to Aleksei Voitylov from comment #4)
> Isn't
> 
>uint32_t cur = *aligned_dest;
>uint8_t* cur_as_bytes = reinterpret_cast<uint8_t*>(&cur);
> 
> the very definition of the pointer aliasing? 

No.  the standard says (paraphrasing, read the standard for the exact words)
that pointers to different types cannot point to the same object, unless one of
the pointers is a pointer to char.

As previously explained, uint8_t is not a char.  So the compiler is allowed to
assume that, because cur is not modified inside the loop, it can be lifted out
entirely and treated as unchanging.
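For reference, a hedged sketch (mine, not from the thread) of two rewrites that do get the aliasing exemption: unsigned char is a character type, and memcpy re-reads the object's bytes, so neither allows the load to be hoisted on type-based grounds.  The function names and signatures are hypothetical stand-ins for the loop body in the report:

```c
#include <stdint.h>
#include <string.h>

/* Aliasing-safe view of the bytes of *aligned_dest via a character type.  */
uint32_t read_byte_chartype(volatile uint32_t *aligned_dest, unsigned idx)
{
    uint32_t cur = *aligned_dest;
    unsigned char *cur_as_bytes = (unsigned char *) &cur;  /* OK: char type */
    return cur_as_bytes[idx & 3u];
}

/* Equivalent rewrite using memcpy, which compilers commonly lower to a
   plain register move here.  */
uint32_t read_byte_memcpy(volatile uint32_t *aligned_dest, unsigned idx)
{
    uint32_t cur = *aligned_dest;
    unsigned char bytes[4];
    memcpy(bytes, &cur, sizeof bytes);
    return bytes[idx & 3u];
}
```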

[Bug libfortran/78314] [aarch64] ieee_support_halting does not report unsupported fpu traps correctly

2019-11-15 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314

--- Comment #28 from Richard Earnshaw  ---
The last release of gcc-7 has now been made, so it's end-of-life and no further
fixes for it will be made.

[Bug target/92071] [10 regression][ARM] ice in gen_movsi, at config/arm/arm.md:5378

2019-11-21 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92071

--- Comment #5 from Richard Earnshaw  ---
(In reply to Jakub Jelinek from comment #4)
> I'd say this should be fixed in the arm backend, instead of asserts it
> should check whether operands are aligned and if not, perform unaligned load
> or store,
> because the amount of spots in the middle-end that actually just call
> emit_move_insn when they see a MEM is huge.

Huh, this is a mid-end bug.  How can fixing it in the backend be anything but
a hack?

There's a contract in place here.  If the target defines STRICT_ALIGNMENT, the
midend must NEVER pass an unaligned object to gen_movsi.

[Bug rtl-optimization/37377] [4.4 Regression] Bootstrap failure compiling libgcc

2019-12-23 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37377

--- Comment #17 from Richard Earnshaw  ---
last patch was for pr37577.

[Bug ada/70786] Missing "not" breaks Ada.Text_IO.Get_Immediate(File, Item, Available)

2019-12-23 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70786

--- Comment #9 from Richard Earnshaw  ---
comment 8 should be for pr70876.

[Bug fortran/36117] Use MPFR for bessel function (optimization, rejects valid F2008)

2020-01-01 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36117

--- Comment #6 from Richard Earnshaw  ---
Comment #5 was really for PR36177

[Bug target/93119] [ICE] The traditional TLS support of aarch64-ilp32 target may be not perfect while enable fPIC

2020-01-03 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93119

--- Comment #3 from Richard Earnshaw  ---
(In reply to Andrew Pinski from comment #2)
> Simplier patch, change PTR to P instead.  Mine then.
> 
> That is:
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index f114f85..dd10ec5 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -6725,7 +6725,7 @@
>   [(parallel [(set (match_operand 0 "register_operand")
>(call (mem:DI (match_dup 2)) (const_int 1)))
>  (unspec:DI [(const_int 0)] UNSPEC_CALLEE_ABI)
> -(unspec:DI [(match_operand:PTR 1 "aarch64_valid_symref")]
> UNSPEC_GOTSMALLTLS)
> +(unspec:DI [(match_operand:P 1 "aarch64_valid_symref")]
> UNSPEC_GOTSMALLTLS)
>  (clobber (reg:DI LR_REGNUM))])]
>   ""
>  {
> @@ -6736,7 +6736,7 @@
>[(set (match_operand 0 "register_operand" "")
> (call (mem:DI (match_operand:DI 2 "" "")) (const_int 1)))
> (unspec:DI [(const_int 0)] UNSPEC_CALLEE_ABI)
> -   (unspec:DI [(match_operand:PTR 1 "aarch64_valid_symref" "S")]
> UNSPEC_GOTSMALLTLS)
> +   (unspec:DI [(match_operand:P 1 "aarch64_valid_symref" "S")]
> UNSPEC_GOTSMALLTLS)
> (clobber (reg:DI LR_REGNUM))
>]
>""

I don't think that's right either.  These are supposed to be machine addresses,
not C pointers.

[Bug target/93005] Redundant NEON loads/stores from stack are not eliminated

2020-01-06 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005

--- Comment #6 from Richard Earnshaw  ---
(In reply to Joel Holdsworth from comment #5)
> I found that if I make modified versions of the intrinsics in arm_neon.h
> that are designed more along the lines of the x86_64 SSE intrinsics defined
> with a simple pointer dereference, then gcc does the right thing [1].
> 
> 
> #include <arm_neon.h>
> 
> __extension__ extern __inline void
> __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> vst1q_s32_fixed (int32_t * __a, int32x4_t __b)
> {
>   *(int32x4_t*)__a = __b;
> }
> 
> __extension__ extern __inline int32x4_t
> __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> vld1q_s32_fixed (const int32_t * __a)
> {
>   return *(const int32x4_t*)__a;
> }
> 
> int32x4_t foo(int32x4_t a)
> {
> int32_t temp[4];
> vst1q_s32_fixed(temp, a);
> return vld1q_s32_fixed(temp);
> }
> 
> 
> 
> ...compiles to:
> 
> foo(long __vector(4)):
> bx  lr
> 
> 
> Is there any reason not to simply redefine vst1q_s32, vld1q_s32 and friends
> to stop using builtins?
> 

Did you test it with big-endian?

[Bug target/93005] Redundant NEON loads/stores from stack are not eliminated

2020-01-07 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005

--- Comment #8 from Richard Earnshaw  ---
(In reply to Joel Holdsworth from comment #7)
> > Did you test it with big-endian?
> 
> Good question. It seems to do the right thing in both cases:
> https://godbolt.org/z/7rDzAm

foo2(long*, __simd128_int32_t):
vst1.64 {d0-d1}, [r0:64]
bx  lr

Well, for big-endian that is wrong.  You've got a vector of 32-bit elements,
but you're storing it as 64-bit elements, so when you look in memory you'll
find the elements permuted.

[Bug target/93188] [9/10-regression] a-profile multilib mismatch for rmprofile toolchain when architecture includes +mp or +sec

2020-01-07 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93188

Richard Earnshaw  changed:

   What|Removed |Added

 Target||arm
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-01-07
  Component|driver  |target
Summary|a-profile multilib mismatch |[9/10-regression] a-profile
   |for rmprofile toolchain |multilib mismatch for
   |when architecture includes  |rmprofile toolchain when
   |+mp or +sec |architecture includes +mp
   ||or +sec
 Ever confirmed|0   |1

--- Comment #1 from Richard Earnshaw  ---
confirmed, patch in testing

[Bug target/93188] [9/10-regression] a-profile multilib mismatch for rmprofile toolchain when architecture includes +mp or +sec

2020-01-08 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93188

--- Comment #2 from Richard Earnshaw  ---
Author: rearnsha
Date: Wed Jan  8 09:29:02 2020
New Revision: 279993

URL: https://gcc.gnu.org/viewcvs?rev=279993&root=gcc&view=rev
Log:
arm: Fix rmprofile multilibs when architecture includes +mp or +sec (PR
target/93188)

When only the rmprofile multilibs are built, compiling for armv7-a
should select the generic v7 multilibs.  This used to work before +sec
and +mp were added to the architecture options but it was broken by
that update.  This patch fixes those variants and adds some tests to
ensure that they remain fixed ;-)

PR target/93188
* config/arm/t-multilib (MULTILIB_MATCHES): Add rules to match
armv7-a{+mp,+sec,+mp+sec} to appropriate armv7 multilib variants
when only building rm-profile multilibs.

* gcc.target/arm/multilib.exp: Add new tests for rm-profile only.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/arm/t-multilib
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/arm/multilib.exp
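The kind of mapping involved looks like this (an illustrative sketch only;
the exact option spellings are in the committed t-multilib change):

```make
# Illustrative MULTILIB_MATCHES rules: treat armv7-a plus the newer
# +mp/+sec extensions as plain armv7 when only the rm-profile
# multilibs are built.
MULTILIB_MATCHES += march?armv7=march?armv7-a+mp
MULTILIB_MATCHES += march?armv7=march?armv7-a+sec
MULTILIB_MATCHES += march?armv7=march?armv7-a+mp+sec
```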

[Bug target/93188] [9 regression] a-profile multilib mismatch for rmprofile toolchain when architecture includes +mp or +sec

2020-01-08 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93188

Richard Earnshaw  changed:

   What|Removed |Added

Summary|[9/10-regression] a-profile |[9 regression] a-profile
   |multilib mismatch for   |multilib mismatch for
   |rmprofile toolchain when|rmprofile toolchain when
   |architecture includes +mp   |architecture includes +mp
   |or +sec |or +sec

--- Comment #3 from Richard Earnshaw  ---
Fixed on trunk so far.

[Bug target/93188] [9 regression] a-profile multilib mismatch for rmprofile toolchain when architecture includes +mp or +sec

2020-01-10 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93188

--- Comment #4 from Richard Earnshaw  ---
Author: rearnsha
Date: Fri Jan 10 16:50:15 2020
New Revision: 280123

URL: https://gcc.gnu.org/viewcvs?rev=280123&root=gcc&view=rev
Log:
backport: arm: Fix rmprofile multilibs when architecture includes +mp or +sec
(PR target/93188)

When only the rmprofile multilibs are built, compiling for armv7-a
should select the generic v7 multilibs.  This used to work before +sec
and +mp were added to the architecture options but it was broken by
that update.  This patch fixes those variants and adds some tests to
ensure that they remain fixed 

gcc/ChangeLog:
2020-01-10  Przemyslaw Wirkus  

Backport from trunk
PR target/93188
* config/arm/t-multilib (MULTILIB_MATCHES): Add rules to match
armv7-a{+mp,+sec,+mp+sec} to appropriate armv7 multilib variants
when only building rm-profile multilibs.

gcc/testsuite/ChangeLog:
2020-01-10  Przemyslaw Wirkus  

Backport from trunk
* gcc.target/arm/multilib.exp: Add new tests for rm-profile only.

Modified:
branches/gcc-9-branch/gcc/ChangeLog
branches/gcc-9-branch/gcc/config/arm/t-multilib
branches/gcc-9-branch/gcc/testsuite/ChangeLog
branches/gcc-9-branch/gcc/testsuite/gcc.target/arm/multilib.exp

[Bug target/93188] [9 regression] a-profile multilib mismatch for rmprofile toolchain when architecture includes +mp or +sec

2020-01-10 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93188

Richard Earnshaw  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Richard Earnshaw  ---
Fixed

[Bug target/92071] [10 regression][ARM] ice in gen_movsi, at config/arm/arm.md:5378

2020-01-17 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92071

--- Comment #7 from Richard Earnshaw  ---
(In reply to Richard Biener from comment #6)
> Agreed.  Did anybody bisect what caused this?

It only came to light when we added a check in the backend.  So I'm not sure a
bisect will be that helpful, unless you try with that patch applied to every
revision under bisect.  Even that might not help, as this is likely a
long-standing issue.

[Bug target/93341] [10 Regression] ICE in aarch64_do_track_speculation, at config/aarch64/aarch64-speculation.cc:221

2020-01-21 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93341

--- Comment #5 from Richard Earnshaw  ---
(In reply to Andrew Pinski from comment #3)
>   /* We should be able to reverse all conditions.  */
>   gcc_assert (inv_cond_code != UNKNOWN);
> 
> Obvious this code is broken because The quiet UN* was converted into the an
> unordered-signalling comparison which was bad.
> 
> Quote from the recent change:
> And it would do the same in reverse: convert a quiet UN* into an
> unordered-signalling comparison.
> 
> So obvious this code in aarch64_do_track_speculation was broken when it was
> added, just the aarch64 back-end was broken to do the wrong thing in the
> first place so we never able to hit the assert before hand.

At least as far as Arm hardware is concerned, there is no such thing as an
irreversible comparison.  The Arm condition codes (except AL, which isn't
really a condition) are all 100% reversible.

The problem here is GCC's insane convolution of the comparison phase with the
final condition, and hence its insistence that reversing, say, LT to UNGE is
not possible because the associated comparison must change.  This is frankly
bonkers.  If we want to force the choice of a trapping/non-trapping comparison,
it really, Really, REALLY should be described independently of the condition
under which the branch is taken.

[Bug rtl-optimization/93235] [AArch64] ICE with __fp16 in a struct

2020-01-23 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93235

Richard Earnshaw  changed:

   What|Removed |Added

  Component|target  |rtl-optimization

--- Comment #4 from Richard Earnshaw  ---
Looks to be generic expansion code that is running amok.

#0  fancy_abort (
file=0x232c400 "/home/rearnsha/gnusrc/gcc-cross/master/gcc/emit-rtl.c", 
line=1021, 
function=0x232d608 )::__FUNCTION__> "gen_rtx_SUBREG")
at /home/rearnsha/gnusrc/gcc-cross/master/gcc/diagnostic.c:1768
#1  0x00c7fc9f in gen_rtx_SUBREG (mode=E_SImode, reg=0x7630bcc0, 
offset=...) at /home/rearnsha/gnusrc/gcc-cross/master/gcc/emit-rtl.c:1021
#2  0x00cb3bcb in store_bit_field_using_insv (insv=0x7fffced0, 
op0=0x7630bee8, op0_mode=..., bitsize=16, bitnum=0, 
value=0x7630bed0, value_mode=...)
at /home/rearnsha/gnusrc/gcc-cross/master/gcc/expmed.c:630
#3  0x00cb6031 in store_integral_bit_field (op0=0x7630bee8, 
op0_mode=..., bitsize=16, bitnum=0, bitregion_start=..., 
bitregion_end=..., fieldmode=E_HImode, value=0x7630bed0, 
reverse=false, fallback_p=true)
at /home/rearnsha/gnusrc/gcc-cross/master/gcc/expmed.c:1050
#4  0x00cb5430 in store_bit_field_1 (str_rtx=0x7630bcc0, 
bitsize=..., bitnum=..., bitregion_start=..., bitregion_end=..., 
fieldmode=E_HImode, value=0x7630bed0, reverse=false, fallback_p=true)
at /home/rearnsha/gnusrc/gcc-cross/master/gcc/expmed.c:870
#5  0x00cb6a9a in store_bit_field (str_rtx=0x7630bcc0, 
bitsize=..., bitnum=..., bitregion_start=..., bitregion_end=...,
fieldmode=E_HImode, value=0x7630bed0, reverse=false)
at /home/rearnsha/gnusrc/gcc-cross/master/gcc/expmed.c:1177
#6  0x00cea57a in store_field (target=0x7630bcc0, bitsize=..., 
bitpos=..., bitregion_start=..., bitregion_end=..., mode=E_HImode, 
exp=0x764cab88, alias_set=0, nontemporal=false, reverse=false)
at /home/rearnsha/gnusrc/gcc-cross/master/gcc/expr.c:7192
#7  0x00ce1ba2 in expand_assignment (to=0x763090f0, 
from=0x764cab88, nontemporal=false)
at /home/rearnsha/gnusrc/gcc-cross/master/gcc/expr.c:5369
#8  0x00b552d8 in expand_gimple_stmt_1 (stmt=0x762df960)
at /home/rearnsha/gnusrc/gcc-cross/master/gcc/cfgexpand.c:3749
#9  0x00b556c7 in expand_gimple_stmt (stmt=0x762df960)
at /home/rearnsha/gnusrc/gcc-cross/master/gcc/cfgexpand.c:3847
#10 0x00b5e295 in expand_gimple_basic_block (bb=0x762cf138, 
disable_tail_calls=false)
...

Also fails if __fp16 is changed to _Float16.

[Bug bootstrap/93548] arm-tune.md and arm-tables.opt should be updated with move-if-changed

2020-02-03 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93548

Richard Earnshaw  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
   Last reconfirmed||2020-02-03
 Resolution|INVALID |---
Summary|gcc build tries to modify   |arm-tune.md and
   |source tree |arm-tables.opt should be
   ||updated with
   ||move-if-changed
 Ever confirmed|0   |1

--- Comment #6 from Richard Earnshaw  ---
Re-opening since the makefile fragment in t-arm should be updated to use
move-if-change.

[Bug bootstrap/93548] arm-tune.md and arm-tables.opt should be updated with move-if-changed

2020-02-03 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93548

Richard Earnshaw  changed:

   What|Removed |Added

 Status|REOPENED|NEW

--- Comment #7 from Richard Earnshaw  ---
Confirmed

[Bug bootstrap/93548] arm-tune.md and arm-tables.opt should be updated with move-if-changed

2020-02-03 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93548

Richard Earnshaw  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Richard Earnshaw  ---
Fixed on trunk.  Not planning a backport, but wouldn't be hard.

[Bug bootstrap/93548] arm-tune.md and arm-tables.opt should be updated with move-if-changed

2020-02-03 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93548

--- Comment #11 from Richard Earnshaw  ---
I don't think so, since the write back will update the timestamp.  It would
only rerun it once per make anyway.  

Also, the timestamp approach is really designed for files in the build area,
not those in the source tree.  While I'd prefer these files to live in the
build area, neither can at present because the build system won't look for .opt
or .md files there.

[Bug target/91913] [8/9/10 Regression] ICE in extract_constrain_insn, at recog.c:2211

2020-02-10 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91913

--- Comment #4 from Richard Earnshaw  ---
Main bug fixed with https://gcc.gnu.org/ml/gcc-cvs/2020-02/msg02312.html
Awaiting commit of testcase.

[Bug c++/93674] GCC eliminates conditions it should not, when strict-enums is on

2020-02-11 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93674

--- Comment #5 from Richard Earnshaw  ---
I'm seeing it on AArch64 for master.  Adding an enum value with an initializer
of -1 causes the problem to go away.  So it looks like the 'unsigned'
conversion is happening too soon.

[Bug rtl-optimization/93565] [9/10 regression] Combine duplicates instructions

2020-02-12 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93565

--- Comment #14 from Richard Earnshaw  ---
With the simpler test case we see

Breakpoint 1, try_combine (i3=0x764d33c0, i2=0x764d3380, i1=0x0, 
i0=0x0, new_direct_jump_p=0x7fffd850, 
last_combined_insn=0x764d33c0)
at /home/rearnsha/gnusrc/gcc-cross/master/gcc/combine.c:2671
2671{
(nil)
(nil)
(insn 7 4 8 2 (set (reg/v:SI 96 [ a ])
(and:SI (reg:SI 104)
(const_int 14 [0xe]))) "/tmp/t2.c":3:7 535 {andsi3}
 (expr_list:REG_DEAD (reg:SI 104)
(nil)))
(insn 8 7 10 2 (set (reg:DI 99 [ a ])
(sign_extend:DI (reg/v:SI 96 [ a ]))) "/tmp/t2.c":4:13 106
{*extendsidi2_aarch64}
 (nil))


And then the resulting insn that we try is

(parallel [
(set (reg:DI 99 [ a ])
(and:DI (subreg:DI (reg:SI 104) 0)
(const_int 14 [0xe])))
(set (reg/v:SI 96 [ a ])
(and:SI (reg:SI 104)
(const_int 14 [0xe])))
])

This insn doesn't match, and so we try to break it into two set insns and try
those individually.  But that gives us back insn 7 again, and then a new insn
based on the (now extended) lifetime of r104.  It seems to me that if we are
doing this sort of transformation, then it's only likely to be profitable if
the cost of the really new insn is strictly cheaper than what we have before.
Being the same cost is not enough in this case.

[Bug rtl-optimization/90378] [9/10 regression] -Os -flto miscompiles 454.calculix after r266385 on Arm

2020-02-18 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90378

--- Comment #5 from Richard Earnshaw  ---
(In reply to Vladimir Makarov from comment #4)
> > Miscompilation occurs in same configuration: arm-linux-gnueabihf at -O2
> > -flto.
> > 
> 
> I don't see how these two patches *directly* resulted in miscompilation. 
> Although it might trigger some latent bug.
> 
> 
> > I'll try to narrow this down to a single object.
> 
> Thank you, Maxim.  Unfortunately I have no arm machine with spec2006 to
> reproduce it by myself. You help here would be much appreciated.

While I don't see a 1% regression, I did notice a measurable code size increase
when building CSiBE for thumb-2 and -Os.  It's hard to put my finger on what is
causing this, beyond the fact that the compiler is making more use of 'high'
registers and this results in using more 32-bit rather than 16-bit
instructions.

[Bug rtl-optimization/90378] [9/10 regression] -Os -flto miscompiles 454.calculix after r266385 on Arm

2020-02-18 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90378

--- Comment #6 from Richard Earnshaw  ---
(In reply to Richard Earnshaw from comment #5)
> (In reply to Vladimir Makarov from comment #4)
> > > Miscompilation occurs in same configuration: arm-linux-gnueabihf at -O2
> > > -flto.
> > > 
> > 
> > I don't see how these two patches *directly* resulted in miscompilation. 
> > Although it might trigger some latent bug.
> > 
> > 
> > > I'll try to narrow this down to a single object.
> > 
> > Thank you, Maxim.  Unfortunately I have no arm machine with spec2006 to
> > reproduce it by myself. You help here would be much appreciated.
> 
> While I don't see a 1% regression, I did notice a measurable code size
> increase when building CSiBE for thumb-2 and -Os.  It's hard to put my
> finger on what is causing this, beyond the fact that the compiler is making
> more use of 'high' registers and this results in using more 32-bit rather
> than 16-bit instructions.

Err, ignore that.  Wrong bug.

[Bug target/56441] [ARM Thumb] generated asm code produces "branch out of range" error in gas with -O1 -mcpu=cortex-m3

2013-02-26 Thread rearnsha at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56441

--- Comment #8 from Richard Earnshaw  2013-02-26 17:01:36 UTC ---
Please use an open (non-proprietary) file format for attaching files.  I don't
have access to RAR format.


[Bug target/56441] [ARM Thumb] generated asm code produces "branch out of range" error in gas with -O1 -mcpu=cortex-m3

2013-02-26 Thread rearnsha at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56441

--- Comment #9 from Richard Earnshaw  2013-02-26 17:03:10 UTC ---
(In reply to comment #7)
> I was looking completely wrong, the arm_addsi3 is acting wrong.
>
> The "add%?\\t%0, %1, %2" for "=l,%0,Py" is set at a length of 2.
> (first entry in the list)
>
> However the "ADD r6,r6, #65" is a thumb2 instruction which is 4 bytes and
> not 2.
>
> An "ADDS r6,r6,#65" will go right because this is a thumb instruction of 2
> bytes.
>
> Same for the first "SUB" in the same list.
>
> So I end up with a miscalculation of 2 bytes.
>
> Perhaps it's better to make it conservative and always use a length of 4.
>
> I guess that this isn't the right way, but I have put un-predicables in
> front of the predicable counterparts with the right length of 4.

This doesn't make sense.  arm_addsi3 in my copy of the sources doesn't use the
Py constraint.

Exactly what version of GCC are you using?


[Bug target/56470] [4.8 Regression] ICE output_operand: invalid shift operand

2013-03-03 Thread rearnsha at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56470

Richard Earnshaw  changed:

           What|Removed |Added

         Status|UNCONFIRMED |NEW
   Last reconfirmed||2013-03-04
 Ever Confirmed|0   |1

--- Comment #1 from Richard Earnshaw  2013-03-04 07:01:07 UTC ---
Confirmed.

I'm a bit wary of just truncating the value.  Shifts by 32 may be valid in the
ARM back-end in some circumstances where we're using the shift as part of
setting up the flags.

Fixing this fully would require getting rid of "shift_operator" and replacing
it with iterators.  But that's a pretty radical overhaul.  Long term that might
well be worthwhile, but not this close to a release.

