from:"segher at gcc dot gnu.org via Gcc\-bugs"

[Bug middle-end/99299] Need a recoverable version of __builtin_trap()

2021-03-01 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299

--- Comment #6 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #4)
> I'm not sure what your proposed not noreturn trap() would do in terms of
> IL semantics compared to a not specially annotated general call?

Nothing I think?  But __builtin_trap *is* very different: it ends BBs.

> "recoverable" likely means resuming after the trap, not on an exception
> path (so it'll not be a throw())?

"recoverable" is super unclear.  For example, on Power the hardware has a
concept "recoverable interrupt", which sets MSR[RI]=1, and traps never do.
This is a very different concept from what is wanted here, which has nothing
to do with recoverability, and is simply about not being an abort() (which
__builtin_trap *is*!)

> The only thing that might be useful to the middle-end would be marking
> the function as not altering the memory state.  But I suppose it should
> still serve as a barrier for code motion of both loads and stores, even
> of those loads/stores are known to not trap.  The only magic we'd have
> for this would be __attribute__((const,returns_twice)).  Which likely
> will be more detrimental to general optimization.
> 
> So - what's the "sub-optimal code generation" you refer to from the
> (presumably) volatile asm() you use for the trap?
> 
> [yeah, asm() on GIMPLE is less optimized than a call]

The rs6000 backend can optimise the used instructions: we have trap_if
instructions, both with registers and with immediates.  A single
instruction can do a comparison and a conditional trap.  This works great
with __builtin_trap, *if* the kernel's trap handler has abort() semantics.

__builtin_trap_no_abort() maybe?
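
For illustration only, a minimal C sketch of the kind of source that lets a
backend with trap_if patterns fold the comparison and the trap together.
The function name checked_get is hypothetical; __builtin_trap is the real
GCC builtin discussed above.  Whether a single compare-and-trap instruction
actually results depends on the target and the optimisation level.

```c
#include <stddef.h>

/* Hypothetical bounds-checked accessor.  The comparison and the trap are
   written as separate statements; a backend with conditional trap
   instructions may merge them.  __builtin_trap never returns, so the
   load below is only reached when I is in range.  */
int
checked_get (const int *v, size_t n, size_t i)
{
  if (i >= n)
    __builtin_trap ();
  return v[i];
}
```

With abort() semantics everything after the trap is unreachable, which is
exactly what a non-aborting variant could not assume.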

[Bug middle-end/99299] Need a recoverable version of __builtin_trap()

2021-03-01 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299

--- Comment #7 from Segher Boessenkool  ---
(In reply to Franz Sirl from comment #5)
> For the naming I suggest __builtin_debugtrap() to align with clang. Maybe
> with an aliased __debugbreak() on Windows platforms.

Those are terrible names.  This would *not* be used more often than
__builtin_trap, for debugging.

In general, builtins should say what they *do*, not what you imagine they
will be used for.

[Bug middle-end/99299] Need a recoverable version of __builtin_trap()

2021-03-01 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299

--- Comment #9 from Segher Boessenkool  ---
The i386 port has

===
(define_insn "trap"
  [(trap_if (const_int 1) (const_int 6))]
  ""
{
#ifdef HAVE_AS_IX86_UD2
  return "ud2";
#else
  return ASM_SHORT "0x0b0f";
#endif
}
  [(set_attr "length" "2")])
===

which implements __builtin_trap, and can implement __builtin_trap_no_abort
just fine as well, if your OS kernel (or similar) can return after a ud2.

If clang uses terribly confusing names (or semantics, or syntax, etc.) we
should not copy that from them.  *Especially* when that already conflicts
with names they copied from us.

[Bug testsuite/99352] New: check_effective_target_sqrt_insn for powerpc is wrong

2021-03-02 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352

Bug ID: 99352
   Summary: check_effective_target_sqrt_insn for powerpc is wrong
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

It just says
  [istarget powerpc*-*-*]
but it should test whether the preprocessor symbol "_ARCH_PPCSQ" is defined.
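
A rough sketch of the C side of such a test, assuming nothing beyond the
predefined macro named above.  The function name have_ppc_sqrt_insn is
made up for this example; the real check lives in Tcl in
lib/target-supports.exp and is not reproduced here.

```c
/* Returns 1 when the compiler predefines _ARCH_PPCSQ for the selected
   target options, and 0 otherwise (including on every non-PowerPC
   target).  */
int
have_ppc_sqrt_insn (void)
{
#ifdef _ARCH_PPCSQ
  return 1;
#else
  return 0;
#endif
}
```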

[Bug testsuite/99352] check_effective_target_sqrt_insn for powerpc is wrong

2021-03-02 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352

Segher Boessenkool  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Target||powerpc*-*-*
   Last reconfirmed||2021-03-02
   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Segher Boessenkool  ---
Mine.

[Bug testsuite/99352] check_effective_target_sqrt_insn for powerpc is wrong

2021-03-04 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352

--- Comment #3 from Segher Boessenkool  ---
rs6000 has check_effective_target_powerpc_fprs already (with slightly
different semantics).

[Bug other/99496] [11 regression] g++.dg/modules/xtreme-header-3_c.C ICEs after r11-7557

2021-03-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99496

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #2 from Segher Boessenkool  ---
Just FYI:

There are four Power Linux systems in the cfarm (as well as some AIX).

gcc110  POWER7  BE
gcc203  POWER8  BE
gcc112  POWER8  LE
gcc135  POWER9  LE

The last one is by far the most powerful of these.

[Bug target/98959] ICE in extract_constrain_insn, at recog.c:2670

2021-03-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98959

--- Comment #20 from Segher Boessenkool  ---
(In reply to Bill Schmidt from comment #14)
> We should definitely not be allowing the AltiVec "& ~16" flavors into these
> patterns.  I'm not certain whether your fix is the best way to achieve that,
> but it could well be; I'll defer to Segher on that.

Hey, it works, so it is okay for now at least.  Longer term we should
probably think of something more elegant and less failure-prone.

[Bug testsuite/99352] check_effective_target_sqrt_insn for powerpc is wrong

2021-03-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99352

Segher Boessenkool  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Segher Boessenkool  ---
commit c60ad1c5fe0249f48362be0f989184ca447f9d17
Author: Segher Boessenkool 
Date:   Wed Mar 3 20:34:32 2021 +

rs6000: Fix check_effective_target_sqrt_insn (PR99352)

The previous version returned true for all PowerPC.  This is incorrect.
We only support floating point square root instructions if a) we support
floating point instructions at all, and b) we have _ARCH_PPCSQ defined.

2020-03-09  Segher Boessenkool  

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_powerpc_sqrt): New.
(check_effective_target_sqrt_insn): Use it.

[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526

2021-03-15 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581

--- Comment #5 from Segher Boessenkool  ---
Thanks Vladimir.  It is indeed a problem in LRA (or triggered by it).
We have
8: {[r121:DI+low(unspec[`*.LANCHOR0',%2:DI] 47+0x92a4)]=asm_operands;clobber

so this is an offset that is too big for a machine instruction; those can take
-32768..32767.
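
As a worked example of that range, here is a small C sketch (the function
names are invented for illustration, and this is not code from GCC).  It
checks whether an offset fits a signed 16-bit displacement, and splits a
larger offset into a high part plus a sign-extended low part, which is the
arithmetic behind a high/low pair.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if OFFSET fits a signed 16-bit displacement field.  */
bool
fits_simm16 (int64_t offset)
{
  return offset >= -32768 && offset <= 32767;
}

/* Split OFFSET so that *HIGH + *LOW == OFFSET, *LOW fits a signed
   16-bit field, and *HIGH is a multiple of 0x10000.  */
void
split_high_low (int64_t offset, int64_t *high, int64_t *low)
{
  /* Sign-extend the low 16 bits.  */
  *low = ((offset & 0xffff) ^ 0x8000) - 0x8000;
  *high = offset - *low;
}
```

For 0x92a4 this gives a high part of 0x10000 and a low part of -27996, so
the low part is encodable once the high part sits in a register.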

Changing the constraint to "m" you get in LRA
Inserting insn reload before:
   13: r121:DI=high(unspec[`*.LANCHOR0',%2:DI] 47+0x92a4)

but this doesn't happen if you keep it "o", and it dies later.

[Bug other/99496] [11 regression] g++.dg/modules/xtreme-header-3_c.C ICEs after r11-7557

2021-03-15 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99496

--- Comment #13 from Segher Boessenkool  ---
Hi Nathan,

I think you didn't push the branch that is on?

[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526

2021-03-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581

--- Comment #7 from Segher Boessenkool  ---
From the offending patch:

-/* Return true if the eliminated form of AD is a legitimate target address.  */
+/* Return true if the eliminated form of AD is a legitimate target address.
+   If OP is a MEM, AD is the address within OP, otherwise OP should be
+   ignored.  CONSTRAINT is one constraint that the operand may need
+   to meet.  */
 static bool
-valid_address_p (struct address_info *ad)
+valid_address_p (rtx op, struct address_info *ad,
+enum constraint_num constraint)

The addition of those extra args makes clear that the function is no
longer just testing if it is a valid address.  It should be renamed.
And perhaps most callers should still use the old version, the one that
actually tests if something is a valid address?

[Bug target/98092] [11 Regression] ICE in extract_insn, at recog.c:2315 (error: unrecognizable insn) since r11-4623

2021-03-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98092

Segher Boessenkool  changed:

   What|Removed |Added

 Attachment #50040 is obsolete|0   |1

--- Comment #6 from Segher Boessenkool  ---
Created attachment 50401
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50401&action=edit
Patch

[Bug target/97926] ICE in patch_jump_insn, at cfgrtl.c:1298

2021-03-19 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926

Segher Boessenkool  changed:

   What|Removed |Added

   Assignee|acsawdey at gcc dot gnu.org|segher at gcc dot gnu.org

--- Comment #4 from Segher Boessenkool  ---
That is not where the UNGE and UNLE come from.  I have no idea where they
*do* come from though :-/

[Bug target/97926] ICE in patch_jump_insn, at cfgrtl.c:1298

2021-03-19 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926

--- Comment #5 from Segher Boessenkool  ---
It helps if you test the compiler you just built, not something old.  Sigh.

Patch is testing.

[Bug target/99581] [11 Regression] internal compiler error: during RTL pass: final - void QTWTF::TCMalloc_PageHeap::scavengerThread() since r11-7526

2021-03-19 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99581

--- Comment #14 from Segher Boessenkool  ---
Well, V=m-o (not the same thing, these are sets) -- but, it is clear that "o"
should be a subset of "m":

(define_memory_constraint "TARGET_MEM_CONSTRAINT"
  "Matches any valid memory."

(define_memory_constraint "o"
  "Matches an offsettable memory reference."

So yeah, it should get the memory_address_addr_space_p thing.

[Bug testsuite/97926] ICE in patch_jump_insn, at cfgrtl.c:1298

2021-03-22 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926

Segher Boessenkool  changed:

   What|Removed |Added

  Component|target  |testsuite
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Segher Boessenkool  ---
Fixed.

[Bug target/99708] __SIZEOF_FLOAT128__ not defined on powerpc64le-linux

2021-03-22 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99708

--- Comment #1 from Segher Boessenkool  ---
Yes, the __SIZEOF_* macros do not say whether some type can be used.  This is
true for all targets!

What would defining these macros be useful for?  They are all equivalent to

#define SIXTEEN 16

:-)
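
To make that concrete, a tiny C sketch using __SIZEOF_INT128__, which GCC
does predefine on targets that have __int128.  The wrapper function name
is hypothetical.  All the macro can ever deliver is the number itself.

```c
/* Returns the value of __SIZEOF_INT128__ when the compiler predefines
   it, or 0 when it does not.  */
int
sizeof_int128_macro (void)
{
#ifdef __SIZEOF_INT128__
  return __SIZEOF_INT128__;
#else
  return 0;
#endif
}
```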

[Bug target/99708] __SIZEOF_FLOAT128__ not defined on powerpc64le-linux

2021-03-23 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99708

--- Comment #3 from Segher Boessenkool  ---
The only such __SIZEOF_* macro that is not about a standards-required type
is for int128.  Not the best example ;-)

There are no predefines for __SIZEOF_FLOAT128__ etc. either.

In an ideal world the user can just assume those types exist always.  In a
less ideal world, use autoconf?  You have to anyway, if you want to support
older compilers at all.

[Bug target/97329] POWER9 default cache and line sizes appear to be wrong

2021-03-23 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97329

--- Comment #10 from Segher Boessenkool  ---
GCC 11 stage 4 will be fine.

I doubt you can ever measure a difference, but you can try :-)

[Bug target/99718] [11 regression] ICE in new test case gcc.target/powerpc/pr98914.c for 32 bits

2021-03-24 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99718

--- Comment #5 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #3)
> If the non-constant vec_set can't be supported when
> !(TARGET_P8_VECTOR && TARGET_DIRECT_MOVE_64BIT)

I don't see why not?  It may need different code, sure, but that is much
preferable over contorting the rest of the backend.

[Bug target/99708] __SIZEOF_FLOAT128__ not defined on powerpc64le-linux

2021-03-24 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99708

--- Comment #6 from Segher Boessenkool  ---
(In reply to Jonathan Wakely from comment #5)
> (In reply to Segher Boessenkool from comment #3)
> > In an ideal world the user can just assume those types exist always.



> Arguably a __SIZEOF_xxx__ macro isn't a very sensible macro for types where
> the type has a guaranteed size,

Yes.  And it does not mean the type exists (or is usable), either.

> but we need *something* that says the type
> exists.

Do we?  The types should always exist!

> Since all other targets already use __SIZEOF_xxx__ to say that the
> type exists, it would be consistent and helpful for powerpc to do the same.

Other targets do not have __ieee128 or __ibm128.

[Bug target/99718] [11 regression] ICE in new test case gcc.target/powerpc/pr98914.c for 32 bits

2021-03-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99718

--- Comment #7 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #6)
> I did not know whether it is implementable (in VSX or in Altivec) for 32-bit
> targets etc., all I was suggesting was what to do if it is not implementable.

Yes.

> If it is implementable, somebody familiar with VSX/Altivec should add the
> implementation, or we can temporarily use the patch that has been posted and
> get back to it later.

I haven't seen a patch posted yet?

> Or if it is partly implementable (e.g. can be done in
> VSX and can't be done in Altivec, etc.), then the patch can still be used
> after amendments for what will and what will not work.

The only thing I am saying is that it should be massively easier to just
implement it for -m32 as well, much easier than adding extra conditions (and
unavoidably getting that wrong).

> Right now it is a P1 blocker because we ICE on something that worked
> perfectly fine (perhaps slower than it could) in GCC 10.  So something needs
> to be done before GCC 11 and we have ~ a month left for that.

Yup.

I'll review any patch that is sent (cc: me, so that I see it immediately,
instead of after 3 to 6 weeks).

Thanks,


Segher

[Bug target/99718] [11 regression] ICE in new test case gcc.target/powerpc/pr98914.c for 32 bits

2021-03-26 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99718

--- Comment #17 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #10)
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567215.html

Ah, that is more recent than anything I have replied to :-(

[Bug target/99718] [11 regression] ICE in new test case gcc.target/powerpc/pr98914.c for 32 bits

2021-03-26 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99718

--- Comment #18 from Segher Boessenkool  ---
(In reply to luoxhu from comment #12)
> Not sure whether TARGET_DIRECT_MOVE_64BIT is the right MACRO to correctly
> differentiate m32 and m64?

It is not.  It looks at TARGET_POWERPC64 only, and that can be set for -m32
just fine.

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-06 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #3 from Segher Boessenkool  ---
What happens here is
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/combine.c;h=3294575357bfcb19e589868da34364498a860dcf;hb=HEAD#l1884

"*2_1" for absneg:MODEF has a bare "use".  And then we trigger

  If the USE in INSN was for a pseudo register, the matching
  insn pattern will likely match any register; combining this
  with any other USE would only be safe if we knew that the
  used registers have identical values, or if there was
  something to tell them apart, e.g. different modes.  For
  now, we forgo such complicated tests and simply disallow
  combining of USES of pseudo registers with any other USE.

because both the abs and the neg have a bare use.

The patterns should be rewritten to not have such bare uses.  Alternatively
we can add some pretty-much-never-triggered code to combine to handle this
case.  Patches welcome.

[Bug tree-optimization/99927] [11 Regression] Maybe wrong code since r11-39-gf9e1ea10e657af9f

2021-04-06 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

--- Comment #9 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #5)
> But what is wrong is that try_combine has been called at all, because
> (reg:CCZ 17 flags) is used in 3 instructions rather than just one.

That is not a problem; if that were true it would just mean that
added_sets_2 should be set:

  added_sets_2 = !dead_or_set_p (i3, i2dest);

But, the flags reg actually *is* dead in i3 (insn 108), it dies in i2
(insn 107):

 (expr_list:REG_DEAD (reg:SI 107)

So something earlier is bad already.

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-07 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #8 from Segher Boessenkool  ---
That patch is no good.  The combination is not allowed because it is not
known what the "use"s are *for*.  Checking if something is from the constant
pools is not enough at all.

[Bug tree-optimization/99927] [11 Regression] Wrong code since r11-39-gf9e1ea10e657af9f

2021-04-07 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

--- Comment #11 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #7)
> Ah, create_log_links wants to work like that.
> So, the bug seems to be that insn 108 has REG_DEAD (reg:CC 17 flags) note.
> It doesn't initially, but it is added during 106 -> 108 combination

But that combination should never have been made: flags is set in insn 107!

[Bug tree-optimization/99927] [11 Regression] Wrong code since r11-39-gf9e1ea10e657af9f

2021-04-07 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

--- Comment #13 from Segher Boessenkool  ---
Yes, combine just drops that clobber of flags, that was a thinko :-)

[Bug tree-optimization/99927] [11 Regression] Wrong code since r11-39-gf9e1ea10e657af9f

2021-04-07 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

--- Comment #14 from Segher Boessenkool  ---
distribute_notes says
  Any clobbers from i2 or i1 can only exist if they were added by
  recog_for_combine.
which is not true apparently.  But all of this code *does* depend
on that, it just doesn't make sense otherwise.

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-08 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #10 from Segher Boessenkool  ---
You cannot fix a simplify-rtx problem in much earlier passes!  It may be
useful of course (I have no idea, I don't know gimple well enough), but
it is no solution to the problem at all.  The xor/and/xor thing should be
simplified to something proper.

((A^B)&C)^A = (A&~C)^(B&C) = (A&~C)|(B&C)

This should already be done by the expand pass.  At gimple level the logical
complement is counted as an operation, making the contorted xor/and/xor form
the best form to use, but in a system that considers more than just operation
counts (like in RTL) this is not the best form at all.  But, anyway, RTL
simplification should be able to do this.

Similar problems happen all over the place, fwiw -- see the various rl* tests
for rs6000, for example.
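
The identity above can be checked mechanically.  The following is a small
C sketch with invented function names that evaluates all three forms and
compares them exhaustively over 8-bit values.  The last step, xor equals
ior, holds because (A&~C) and (B&C) never have a bit in common.

```c
#include <stdint.h>

/* The contorted form: ((A^B)&C)^A.  */
uint8_t
sel_xor_and_xor (uint8_t a, uint8_t b, uint8_t c)
{
  return (uint8_t) (((a ^ b) & c) ^ a);
}

/* (A&~C)^(B&C).  */
uint8_t
sel_and_xor (uint8_t a, uint8_t b, uint8_t c)
{
  return (uint8_t) ((a & ~c) ^ (b & c));
}

/* (A&~C)|(B&C): a bitwise select, taking B where C is set, else A.  */
uint8_t
sel_and_or (uint8_t a, uint8_t b, uint8_t c)
{
  return (uint8_t) ((a & ~c) | (b & c));
}

/* Returns 1 if the three forms agree for every 8-bit A, B and C.  */
int
select_forms_agree (void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      for (unsigned c = 0; c < 256; c++)
        {
          uint8_t x = sel_xor_and_xor ((uint8_t) a, (uint8_t) b, (uint8_t) c);
          uint8_t y = sel_and_xor ((uint8_t) a, (uint8_t) b, (uint8_t) c);
          uint8_t z = sel_and_or ((uint8_t) a, (uint8_t) b, (uint8_t) c);
          if (x != y || y != z)
            return 0;
        }
  return 1;
}
```

Since the operations are bitwise, agreement on 8-bit values carries over to
any width.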

[Bug rtl-optimization/99930] Failure to optimize floating point -abs(x) in nontrivial code at -O2/3

2021-04-08 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99930

--- Comment #10 from Segher Boessenkool  ---
That is a USE of a constant, which is a no-op always.  Here we have a USE
of a register, which is not.  We actually have *two* uses of pseudos, and
combine cannot know what that means for the target (all PARALLELs are split
up in combine).

[Bug debug/99830] [11 Regression] ICE: in lra_eliminate_regs_1, at lra-eliminations.c:659 with -O2 -fno-expensive-optimizations -fno-split-wide-types -g

2021-04-09 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99830

--- Comment #5 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #3)
> In normal insns such clobbers would be rejected by recog, but for
> DEBUG_INSNs we don't have strict validity tests, but guess we need to throw
> away at least the worst garbage.

combine puts clobbers of const0_rtx in instructions precisely because
those *should* be rejected; it does it to abort a combination attempt.
So it isn't clear to me why we end up with this here?  Papering over it
(as the proposed patch does) is not a good idea imho.

[Bug debug/99830] [11 Regression] ICE: in lra_eliminate_regs_1, at lra-eliminations.c:659 with -O2 -fno-expensive-optimizations -fno-split-wide-types -g

2021-04-09 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99830

--- Comment #7 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #6)
> In the end on the actual instruction the clobber is optimized away

That is a very serious bug.

[Bug debug/99830] [11 Regression] ICE: in lra_eliminate_regs_1, at lra-eliminations.c:659 with -O2 -fno-expensive-optimizations -fno-split-wide-types -g

2021-04-09 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99830

--- Comment #10 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #8)
> In particular, it is combine_simplify_rtx that is called on:
> (zero_extend:SI (subreg:QI (ior:TI (and:TI (reg/v:TI 103 [ f ])
> (const_int -16711681 [0xffffffffff00ffff]))
> (ashift:TI (and:TI (clobber:TI (const_int 0 [0]))
> (const_int 255 [0xff]))
> (const_int 16 [0x10]))) 0))
> which simplifies it into
> (and:SI (subreg:SI (reg/v:TI 103 [ f ]) 0)
> (const_int 255 [0xff]))

That is very wrong.  A clobber of 0 should *never* be removed.  Various
parts of generic code know about that already, btw.

A clobber of 0 means "Abort! Abort!"  It does not mean "well, here is
something you can optimise away more easily".

Do you want to investigate further, or shall I?

[Bug c/100005] undefined reference to `_rdrand64_step'

2021-04-09 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100005

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #2 from Segher Boessenkool  ---
So the only bug here is that we should give a better error message?  One
when taking the address, already.

[Bug c/100005] undefined reference to `_rdrand64_step'

2021-04-09 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100005

--- Comment #3 from Segher Boessenkool  ---
I'm not sure how/why "artificial" should prevent taking the address though?

[Bug debug/99830] [11 Regression] ICE: in lra_eliminate_regs_1, at lra-eliminations.c:659 with -O2 -fno-expensive-optimizations -fno-split-wide-types -g

2021-04-09 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99830

--- Comment #12 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #11)
> I don't understand what is wrong about that.
> (clobber:TI (const_int 0 [0])) in there stands for couldn't figure out what
> this value is or how to represent it, so it is wildcard for I don't know
> what the value is.

That is not what it means.  It means "This instruction is invalid".  It should
never be "optimised" away.

> I'd think if one has say (and:TI (clobber:TI (const_int 0 [0])) (const_int 0
> [0])) one should be able to still simplify it into 0, etc.,

No.  That RTL has no meaning at all, you cannot use a clobber as a RHS!

> and what happens
> here is the same thing, the clobber value, whatever it is, doesn't influence
> in any way the whole expression value, therefore it is optimized away.
> If it remained there, sure, the instruction would fail recog_for_combine.

Yes.  And that is why it should never be removed!

[Bug debug/99830] [11 Regression] ICE: in lra_eliminate_regs_1, at lra-eliminations.c:659 with -O2 -fno-expensive-optimizations -fno-split-wide-types -g

2021-04-09 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99830

--- Comment #14 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #13)
> Seems the exact spot where the clobber is optimized away is e.g. when
> simplify_and_const_int_1 (SImode, (ashift:SI (subreg:SI (and:TI (clobber:TI
> (const_int 0 [0])) (const_int 255 [0xff])) 0) (const_int 16 [0x10])), 255);
> is called.
> It calls nonzero_bits, nonzero_bits sees VARYING << 16 and so returns
> 0x,
>   /* Turn off all bits in the constant that are known to already be zero.
>  Thus, if the AND isn't needed at all, we will have CONSTOP ==
> NONZERO_BITS
>  which is tested below.  */
> 
>   constop &= nonzero;
> 
>   /* If we don't have any bits left, return zero.  */
>   if (constop == 0)
> return const0_rtx;
> 
> So, are you suggesting that in all such spots we need to test side_effects_p
> and punt?

Yes, you need to check side_effects_p *everywhere* you can potentially
remove a side effect.  This is not specific to combine, even.

> Note, simplify_and_const_int_1 already starts with:
>   if (GET_CODE (varop) == CLOBBER)
> return NULL_RTX;
> so it would need to use
>   if (side_effects_p (varop))
> return NULL_RTX;
> instead.

Yeah.  This no longer disallows a VOIDmode clobber, but we should not see
those here anyway.

You'll need the same change a few lines later, btw:

  varop = force_to_mode (varop, mode, constop, 0);

  /* If VAROP is a CLOBBER, we will fail so return it.  */
  if (GET_CODE (varop) == CLOBBER)
return varop;

(you only need that second one, even, force_to_mode immediately returns
its arg if it is a clobber).
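
A toy model of the rule, for illustration only.  None of this is GCC code:
the toy_ types and functions are invented here, and the toy predicate
counts every clobber, whereas the real side_effects_p is pickier (see the
VOIDmode remark above).  The point is only the ordering: test for side
effects before any shortcut that could discard the operand.

```c
#include <stddef.h>

/* Invented miniature expression codes; not GCC's rtx codes.  */
enum toy_code { TOY_REG, TOY_CONST, TOY_CLOBBER, TOY_AND };

struct toy_rtx
{
  enum toy_code code;
  unsigned long value;          /* Used by TOY_CONST only.  */
  const struct toy_rtx *op0;    /* Operand of TOY_CLOBBER and TOY_AND.  */
  const struct toy_rtx *op1;    /* Second operand of TOY_AND.  */
};

/* The shared constant zero that a successful simplification returns.  */
const struct toy_rtx toy_zero = { TOY_CONST, 0, NULL, NULL };

/* Toy stand-in for side_effects_p: a clobber anywhere inside X counts.  */
int
toy_side_effects_p (const struct toy_rtx *x)
{
  if (x == NULL)
    return 0;
  if (x->code == TOY_CLOBBER)
    return 1;
  return toy_side_effects_p (x->op0) || toy_side_effects_p (x->op1);
}

/* Try to simplify (and VAROP CONSTOP).  Returns &toy_zero when the
   result is known to be zero, or NULL when nothing is simplified.  */
const struct toy_rtx *
toy_simplify_and_const (const struct toy_rtx *varop, unsigned long constop)
{
  /* Punt first, so the shortcut below can never drop a clobber.  */
  if (toy_side_effects_p (varop))
    return NULL;

  /* No bits left in the mask: the AND is zero whatever VAROP is.  */
  if (constop == 0)
    return &toy_zero;

  return NULL;
}
```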

[Bug tree-optimization/99927] [11 Regression] Wrong code since r11-39-gf9e1ea10e657af9f

2021-04-12 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

Segher Boessenkool  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot gnu.org

--- Comment #16 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #15)
> So ... the conclusion is?

The conclusion is I have a patch and I will commit it after testing it
successfully on enough targets.  This takes time.

I see I forgot to self-assign the bug.  Fixed.

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-12 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #13 from Segher Boessenkool  ---
(In reply to luoxhu from comment #11)
> I noticed that you added the below optimization with commit
> a62436c0a505155fc8becac07a8c0abe2c265bfe. But it doesn't even handle this
> case, cse1 pass will call simplify_binary_operation_1, both op0 and op1 are
> REGs instead of AND operators, do you have a test case to cover that piece
> of code?

This worked at the time.  It broke some time ago in simple testcases,
triggered by the "don't combine hard registers" thing I did.  This is
PR98468.

[Bug middle-end/90323] powerpc should convert equivalent sequences to vec_sel()

2021-04-12 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90323

--- Comment #14 from Segher Boessenkool  ---
(In reply to luoxhu from comment #12)
> That code was called by combine pass but fail to match. 

> 
> pr newpat
> (set (reg:DI 125 [ l ])
> (xor:DI (and:DI (xor:DI (reg/v:DI 120 [ l ])
> (reg:DI 127))
> (const_int 267390975 [0xff00fff]))
> (reg/v:DI 120 [ l ])))

Note this is 0x0ff00fff, and this is not a valid mask for rlwimi.
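
For reference, a rotate-and-mask instruction such as rlwimi describes its
mask with a begin bit and an end bit, so the mask has to be a single run
of 1 bits (the run may wrap around the word).  A small C sketch of that
test follows; the function names are invented and this is not the rs6000
backend's own predicate.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if X is exactly one non-wrapping run of consecutive 1 bits.  */
bool
single_run_p (uint32_t x)
{
  uint32_t lowest = x & (uint32_t) (0u - x);    /* Lowest set bit.  */
  /* Adding the lowest set bit carries through the bottom run; any bit
     of X that survives belongs to a second run.  */
  return x != 0 && (((uint32_t) (x + lowest)) & x) == 0;
}

/* True if MASK is one run of 1 bits, allowing the run to wrap from the
   low end of the word to the high end.  A wrapping run of ones is the
   complement of a non-wrapping one.  */
bool
rotate_mask_p (uint32_t mask)
{
  return mask != 0
         && (single_run_p (mask) || single_run_p ((uint32_t) ~mask));
}
```

0x0ff00fff contains two separate runs (bits 0..11 and bits 20..27), so it
fails this test, while something like 0x00ffff00 or the wrapping
0xff0000ff passes.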

[Bug target/97142] __builtin_fmod not optimized on POWER

2021-04-13 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97142

--- Comment #11 from Segher Boessenkool  ---
(In reply to luoxhu from comment #10)
> If not built with fast-math, gimple_has_side_effects will return true and
> cause the expand_call_stmt fail to expand the "_1 = fmod (x_2(D), y_3(D));"
> to internal function. X86 also produces "bl fmod" for O3 build.
>  
> 
> xlF expands the fmod to below ASM, no FMA generated?
> 
> 
> 1900 :
> 1900:   8c 03 01 10 vspltisw v0,1
> 1904:   00 00 24 c8 lfd f1,0(r4)
> 1908:   00 00 03 c8 lfd f0,0(r3)
> 190c:   e2 03 40 f0 xvcvsxwdp vs2,vs32
> 1910:   c0 09 62 f0 xsdivdp vs3,vs2,vs1
> 1914:   80 19 80 f0 xsmuldp vs4,vs0,vs3
> 1918:   64 21 a0 f0 xsrdpiz vs5,vs4
> 191c:   88 2d 01 f0 xsnmsubadp vs0,vs1,vs5
> 1920:   18 00 20 fc frsp f1,f0
> 1924:   20 00 80 4e blr

xsnmsubadp is an FMA.  Multiply-subtract in this case, but that is just
a sign switch -- I often say FMA for all of fmadd, fnmadd, fnmsub, fmsub,
and their VSX counterparts.  "Anything that does a multiply-type operation
followed by an addition-type operation".  (And often call integer MADs
"FMA" as well, which is totally wrong, but :-) )

[Bug target/100085] Bad code for union transfer from __float128 to vector types

2021-04-15 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100085

--- Comment #3 from Segher Boessenkool  ---
The rotates in 6 and 7 are not merged, and neither are the vec_selects in
8 and 9.  Both should be pretty easy to do, there is no unspec in sight,
etc.

[Bug rtl-optimization/99927] Wrong code since r11-39-gf9e1ea10e657af9f

2021-04-18 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99927

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

--- Comment #18 from Segher Boessenkool  ---
Fixed for 11.  This still needs backports for 10 and everything before,
please don't close the bug.

[Bug target/100108] [10/11 Regression] powerpc: recognize 32-bit CPU as POWER9 with -misel option

2021-04-19 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100108

--- Comment #4 from Segher Boessenkool  ---
(In reply to Andrew Pinski from comment #1)
> e500 support had been moved to the powerpcspe target; so assuming power9 for
> -misel is correct.
> 
> e500mc support is still there though.

There never *was* separate e500 support in GCC!

[Bug target/100108] [10/11 Regression] powerpc: recognize 32-bit CPU as POWER9 with -misel option

2021-04-19 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100108

--- Comment #5 from Segher Boessenkool  ---
Created attachment 50629
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50629&action=edit
Proposed simpler patch

A simpler patch.  I'll commit this later today (if no one stops me).

[Bug target/100108] [10/11 Regression] powerpc: recognize 32-bit CPU as POWER9 with -misel option

2021-04-19 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100108

Segher Boessenkool  changed:

   What|Removed |Added

 Target|powerpc--netbsd |powerpc
   Last reconfirmed||2021-04-19
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |segher at gcc dot gnu.org
 Ever confirmed|0   |1

[Bug libgcc/98952] powerpc*: __trampoline_setup inverted test for trampoline size

2021-04-23 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98952

--- Comment #4 from Segher Boessenkool  ---
Fixed on trunk.  Needs backports to 11 and whatever else is still an open
branch when the backports are done :-)

[Bug target/97329] POWER9 default cache and line sizes appear to be wrong

2020-10-09 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97329

--- Comment #8 from Segher Boessenkool  ---
The default -mcpu= for a compiler targeting powerpc64le-linux is
normally power8 (you can change this with the --with-cpu= configure
option though).  -mcpu=powerpc64le is also (currently) equal to
-mcpu=power8.  But the numbers for Power8 (in power8_cost) are
wrong it seems: it has a 64kB L1-D cache, and a 512kB L2 cache (it
looks like we have simply copied the Power7 numbers here; 32 and
256 is correct for Power7).

[Bug rtl-optimization/97249] Missing vec_select and subreg optimization

2020-10-12 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

--- Comment #5 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #3)
> Guess you want to figure what built the (vec_select:V8QI (V16QI)) and if
> it was appropriately simplified (and simplify_rtx would handle this case).
> In any case the vec_select is the same as (subreg:V8QI (V16QI)).

This case for vec_select isn't yet handled in simplify-rtx.  It
looks like it does not yet handle any cases that do not use full
vector length?  (Or, in other words, it only handles cases where
all vectors are the same length.)

[Bug target/97437] builtins subcarry and addcarry still not generate the right code. Not get optimized to immediate value

2020-10-15 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97437

--- Comment #5 from Segher Boessenkool  ---
Trying 7 -> 9:
7: r97:SI=0x2a
9: {flags:CCC=cmp(r97:SI+r98:SI,r97:SI);r99:SI=r97:SI+r98:SI;}
  REG_DEAD r98:SI
  REG_DEAD r97:SI
Failed to match this instruction:
(parallel [
        (set (reg:CC 17 flags)
            (compare:CC (reg:SI 98 [ *b_12(D) ])
                (const_int -42 [0xffd6])))
        (set (reg:SI 99)
            (plus:SI (reg:SI 98 [ *b_12(D) ])
                (const_int 42 [0x2a])))
    ])

On rs6000 we have four special variants for the immediate add-with-carry
insn patterns: imm 0, imm -1, imm pos, imm neg.  All of these have
different canonical RTL.
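
The comparison against -42 in the dump above comes from a simple identity: the carry out of b + 42 is set exactly when b, as an unsigned value, is at least 2^32 - 42.  A minimal C sketch of that identity, assuming 32-bit unsigned arithmetic (the helper names are made up for illustration, they are not from GCC):

```c
#include <stdint.h>

/* Carry as insn 9 writes it: the sum wrapped around if it ended up
   below one of the addends.  */
static int carry_from_sum(uint32_t b)
{
    uint32_t sum = b + 42u;
    return sum < 42u;
}

/* Carry as the combined pattern expresses it: compare b against -42,
   which is 2^32 - 42 when viewed as unsigned.  */
static int carry_from_compare(uint32_t b)
{
    return b >= (uint32_t)-42;
}
```

Both functions return the same value for every b; only the shape of the comparison differs, and the shape is what the insn patterns have to match.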

[Bug target/97437] builtins subcarry and addcarry still not generate the right code. Not get optimized to immediate value

2020-10-15 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97437

--- Comment #6 from Segher Boessenkool  ---
I forgot to add: subtract immediate is the same as add immediate for us,
we don't change the sense of the carry bit to a "borrow bit" (and instead,
we have a subtract-from-immediate).  But this doesn't change much at all
to the situation here.
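
As an illustration only (plain C, hypothetical helper names, 32-bit unsigned arithmetic assumed): subtracting an immediate by adding its negation leaves the carry meaning "no borrow happened".  The sketch assumes imm is nonzero, since a + 0 never carries while a - 0 never borrows.

```c
#include <stdint.h>

/* Add with carry-out; the only primitive this sketch needs.  */
static uint32_t add_with_carry_out(uint32_t a, uint32_t b, int *carry)
{
    uint32_t sum = a + b;
    *carry = sum < a;   /* the sum wrapped around */
    return sum;
}

/* a - imm computed as a + (-imm), for nonzero imm.
   *carry == 1 means a >= imm, that is, no borrow.  */
static uint32_t sub_imm(uint32_t a, uint32_t imm, int *carry)
{
    return add_with_carry_out(a, 0u - imm, carry);
}
```

So sub_imm (100, 42, &c) gives 58 with c set to 1, and sub_imm (10, 42, &c) wraps around with c set to 0.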

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-15 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #3 from Segher Boessenkool  ---
AFAICS the point is that this always compiles to just a handful of insns,
and the inliner should be able to see that (even if the source is biggish).

[Bug target/97437] builtins subcarry and addcarry still not generate the right code. Not get optimized to immediate value

2020-10-15 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97437

--- Comment #8 from Segher Boessenkool  ---
So is that something that can/should be improved in ix86_cc_mode?

[Bug target/97437] builtins subcarry and addcarry still not generate the right code. Not get optimized to immediate value

2020-10-15 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97437

--- Comment #10 from Segher Boessenkool  ---
Not even an alternative SELECT_CC_MODE; just add an argument to it, giving
the original mode?  We already have that in combine, so we can trivially
pass it.  Will that work for x86 here?

[Bug bootstrap/94761] host != target

2020-10-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94761

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #2 from Segher Boessenkool  ---
All of the text of the report is missing, apparently?

[Bug bootstrap/94761] host != target

2020-10-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94761

--- Comment #3 from Segher Boessenkool  ---
Commit e69bf64be925 added the host and target flags originally, and it
seems to have been just a mistake that it used --build=${build_alias}
--host=${build_alias}.  (Now of course that has spread to many more
places.)

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #31 from Segher Boessenkool  ---
(In reply to Jan Hubicka from comment #27)
> It is because --param inline-insns-single was reduced for -O2 from 200
> to 70.  GCC 10 has newly different set of parameters for -O2 and -O3 and
> enables auto-inlining at -O2.
> 
> Problem with inlining functions declared inline is that C++ codebases
> tend to abuse this keyword for things that are really too large (and
> get_order would be such example if it did not have builtin_constant_p
> check which inliner does not understand well). So having same limit at
> -O2 and -O3 turned out to be problematic with respect to code size and
> especially with respect to LTO, where a lot more inlining opportunities
> appear.

Do the heuristics account for the fact that not inlining a "static inline"
results in multiple copies?

> I will implement the heuristics to push up inline limits of functions
> having builtin_constant_p of parameter which should help a bit in this
> case

Thank you!

> (but not very systematically: as discussed in the PR log it is quite
> hard problem to get builtin_constant_p right in the code size metrics
> used by inliner before it knows exactly what is going to be constant and
> what is not).

That is true for many other inlining things as well...  builtin_constant_p
is worse than most I guess ;-)

[Bug target/43892] PowerPC suboptimal "add with carry" optimization

2020-10-20 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #26 from Segher Boessenkool  ---
It isn't easy to do.  Feel free to try your hand at it :-)

[Bug tree-optimization/97360] [11 Regression] ICE in range_on_exit

2020-10-20 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97360

--- Comment #35 from Segher Boessenkool  ---
Send it to gcc-patches@ please, with explanation and everything?

[Bug c/97445] Some fonctions marked static inline in Linux kernel are not inlined

2020-10-20 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

--- Comment #46 from Segher Boessenkool  ---
(In reply to Christophe Leroy from comment #43)
> int g(int x)
> {
>   return __builtin_clz(0);
> }
> 
> Gives
> 
> 0018 :
>   18: 38 60 00 20 li  r3,32
>   1c: 4e 80 00 20 blr

That is because rs6000 has

/* The cntlzw and cntlzd instructions return 32 and 64 for input of zero.  */
#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
  ((VALUE) = GET_MODE_BITSIZE (MODE), 2)

This says that at RTL level and in the optabs, clz of 0 *is* defined,
for rs6000.  But the builtin is not valid with an arg of 0!
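
For source code that may pass zero, a portable way to get a defined result is to test for zero before calling the builtin.  A minimal sketch (the wrapper name is hypothetical):

```c
#include <limits.h>

/* Count leading zeros, returning the bit width for an input of zero.
   __builtin_clz itself is undefined for 0, so it is never called with it.  */
static int clz_or_width(unsigned int x)
{
    const int width = (int) (sizeof x * CHAR_BIT);
    return x ? __builtin_clz(x) : width;
}
```

On a target where the RTL-level clz of zero is defined to be the mode's bit width, the compiler is free to fold the test away again.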

[Bug target/43892] PowerPC suboptimal "add with carry" optimization

2020-10-21 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #29 from Segher Boessenkool  ---
Yup, and that is a more elegant way of writing this anyway.  But we
still do not handle the exact testcase code optimally ;-)

[Bug target/43892] PowerPC suboptimal "add with carry" optimization

2020-10-21 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

--- Comment #31 from Segher Boessenkool  ---
Performing a jump based on the carry bit is not something we can
easily do (there are no simple insns for it, and those sequences
that will do the trick are expensive).  But I'll look at that,
thanks for the hint!  At least in the __builtin_add_overflow case
most of it will be optimised away :-)
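
For reference, a sketch of the __builtin_add_overflow style of writing a two-limb addition.  The struct and function names are made up for this example; this is not the testcase from the PR.

```c
#include <stdint.h>

/* A 128-bit value as two 64-bit limbs (hypothetical type).  */
struct u128 {
    uint64_t lo;
    uint64_t hi;
};

static struct u128 add_u128(struct u128 a, struct u128 b)
{
    struct u128 r;
    /* The builtin returns nonzero when the low limbs wrap around;
       that is exactly the carry into the high limb.  */
    uint64_t carry = __builtin_add_overflow(a.lo, b.lo, &r.lo);
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

Here the carry is consumed as a value by the next addition; nothing has to branch on it.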

[Bug libgcc/97543] powerpc64le: libgcc has unexpected long double in .gnu_attribute

2020-10-23 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97543

--- Comment #3 from Segher Boessenkool  ---
This part of the attribute (all but the low 2 bits) is not documented
in the as manual, btw.

[Bug rtl-optimization/97583] New: Unknown mode_attribute (or iterator) ignored

2020-10-26 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97583

Bug ID: 97583
   Summary: Unknown mode_attribute (or iterator) ignored
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

This leads to errors at compiler runtime instead of at compiler build time.
See https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556998.html .

Code from md_reader::apply_iterator_to_string :

  p = start + 1;

  *end = 0;
  v = map_attr_string (loc, p);
  *end = '>';
  if (v == 0)
    continue;

It could report an error instead.

[Bug libgcc/97543] powerpc64le: libgcc has unexpected long double in .gnu_attribute

2020-10-26 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97543

--- Comment #9 from Segher Boessenkool  ---
Yes, that looks correct.

[Bug rtl-optimization/97676] New: "*" should skip a constraint, not just one char of it

2020-11-02 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97676

Bug ID: 97676
   Summary: "*" should skip a constraint, not just one char of it
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

See https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557759.html and
the thread leading up to it.

[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input

2020-11-03 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708

Segher Boessenkool  changed:

   What|Removed |Added

 Resolution|INVALID |---
 Status|RESOLVED|REOPENED
   Last reconfirmed||2020-11-03
 Ever confirmed|0   |1

--- Comment #3 from Segher Boessenkool  ---
Yes, exactly.  GCC silently does the wrong thing, contradicting its
documentation.

[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input

2020-11-03 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708

Segher Boessenkool  changed:

   What|Removed |Added

 Resolution|INVALID |---
 Status|RESOLVED|REOPENED

--- Comment #5 from Segher Boessenkool  ---
The only supported use for this feature is to specify registers
for input and output operands when calling Extended @code{asm} 
(@pxref{Extended Asm}).  This may be necessary if the constraints for a 
particular machine don't provide sufficient control to select the desired 
register.  To force an operand into a register, create a local variable 
and specify the register name after the variable's declaration.  Then use 
the local variable for the @code{asm} operand and specify any constraint 
letter that matches the register:


Stop marking this as invalid.  It is not.


"r" *is* valid.  And even if it was not, the compiler should just error,
not silently do the wrong thing!

[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input

2020-11-04 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708

--- Comment #19 from Segher Boessenkool  ---
Documenting that GCC behaves differently is just documenting a bug :-(

It should not be hard to detect this and give an error somewhere?

Saying "the user did something wrong" is true of course, but then
saying "so the compiler can do whatever" might be technically true,
but doesn't help the user, who would rather the compiler did not
silently do the opposite of what the user asked it to do!

[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input

2020-11-04 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708

--- Comment #21 from Segher Boessenkool  ---
register float foo asm ("xmm0") = 0.99f;

asm volatile("movl %0, %%r8d\n\t"
  "vmcall\n\t"
  :: "g" (foo));

The user said operands[0] should go in xmm0, but that hard reg is not
valid for its constraint.

"""
Then use the local variable for the asm operand and specify any constraint
letter that matches the register:
"""

Not following that rule, causing a reload, is the user error.  The reload
you get is diametrically opposite to what local register vars are *for*,
so it would be good if we could give an error.
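
For contrast, a sketch of a use that does follow the rule: the constraint letter names the same register as the local register variable.  This assumes x86-64 and GNU C; the function name is hypothetical, and other targets simply skip the asm.

```c
/* Returns its argument after passing it through an asm input operand.  */
static unsigned long pass_through_rcx(unsigned long value)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* Pin the operand to rcx ...  */
    register unsigned long in __asm__("rcx") = value;
    unsigned long out;
    /* ... and use "c", the constraint that matches that register,
       so no reload into another location is needed.  */
    __asm__("mov %1, %0" : "=r" (out) : "c" (in));
    return out;
#else
    /* No inline asm on other targets in this sketch.  */
    return value;
#endif
}
```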

[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input

2020-11-04 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708

Segher Boessenkool  changed:

   What|Removed |Added

 Resolution|INVALID |FIXED

--- Comment #23 from Segher Boessenkool  ---
The user said that foo should be in xmm1 when used in an asm.  That is
what local register asm does, nothing more, nothing less.

Reloading it is never allowed.

[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input

2020-11-05 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708

--- Comment #27 from Segher Boessenkool  ---
(In reply to Alexander Monakov from comment #24)
> Segher, did you really mean to mark the bug resolved/fixed?

No, if I did that, I have no idea how :-)

> Given that the only supported use of local register variables is passing
> operands to inline asm in specific registers, I really think that GCC
> shouldn't silently change the operand's location like that.

Yes, exactly.

[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input

2020-11-05 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708

--- Comment #28 from Segher Boessenkool  ---
(In reply to Jakub Jelinek from comment #25)
> Even if we wanted to do something about it (which I disagree with, e.g.
> given that the implementation matches the documentation), you run into the
> problem that neither GIMPLE nor RTL differentiates between:
> void
> foo (void)
> {
>   register int a __asm ("eax") = 1;
>   __asm ("# %0 " : : "c" (a+0));
>   __asm ("# %0 " : : "c" (a));
> }
> And "c" (a+0) unquestionably must be valid, it is just an expression that
> happens to be equal to a value of local register variable.

The documentation says

"""
The only supported use for this feature is to specify registers
for input and output operands when calling Extended @code{asm}
(@pxref{Extended Asm}).  This may be necessary if the constraints for a
particular machine don't provide sufficient control to select the desired
register.  To force an operand into a register, create a local variable
and specify the register name after the variable's declaration.  Then use
the local variable for the @code{asm} operand and specify any constraint
letter that matches the register:

@smallexample
register int *p1 asm ("r0") = @dots{};
register int *p2 asm ("r1") = @dots{};
register int *result asm ("r0");
asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2));
@end smallexample
"""

Note the "use the local variable *for* the asm operand".  Not *in* the asm
operand.  We really do care about the identity here (for all asm operands),
not the value contained in the operand.

So (a+0) is not valid.  It is of course likely this will be optimised to
just (a) and might even work, but that is not guaranteed.

(The documentation here could be much improved, of course.)

[Bug inline-asm/97708] Inline asm does not use the local register asm specified with register ... asm() as input

2020-11-05 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97708

--- Comment #29 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #26)
> So it would need to be diagnosed in the FE (only), making a + 0 valid and
> a not.  Eh.

We do not *have* to diagnose anything, certainly not things that just
happen to work (if "a+0" is optimised to just "a", say).  But it would
be good if we could diagnose the obvious and certainly wrong cases we
do not do now -- like a register asm that does not match the operand
constraint!

[Bug rtl-optimization/97784] New: Expressions evaluated as long chain instead of as tree or the like

2020-11-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

Bug ID: 97784
   Summary: Expressions evaluated as long chain instead of as tree
or the like
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

When compiling something like

#define O +
long x4(long x, long a, long b, long c, long d) { return x O a O b O c O d; }

we end up with machine code like

add 3,3,4    # 10   [c=4 l=4]  *adddi3/0
add 3,3,5    # 11   [c=4 l=4]  *adddi3/0
add 3,3,6    # 12   [c=4 l=4]  *adddi3/0
add 3,3,7    # 18   [c=4 l=4]  *adddi3/0
blr          # 30   [c=4 l=4]  simple_return

Each of those "add" insns depends on the result of the previous one,
making this slower than necessary: it has the latency of 4 add insns in
series, while some can be done in parallel.


This problem is there on gimple level already:

  _1 = x_4(D) + a_5(D);
  _2 = _1 + b_6(D);
  _3 = _2 + c_7(D);
  _9 = _3 + d_8(D);
  return _9;


A very similar problem also happens as a result of RTL unrolling.
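
To make the difference concrete, here is the same sum written as a chain and as a tree, using unsigned long so that reassociation cannot change the result.  The function names are only for illustration.

```c
/* Linear chain: each add needs the previous result (4 adds deep).  */
static unsigned long sum_chain(unsigned long x, unsigned long a,
                               unsigned long b, unsigned long c,
                               unsigned long d)
{
    return (((x + a) + b) + c) + d;
}

/* Tree: x + a and b + c do not depend on each other, so they can
   execute in parallel (3 adds deep).  */
static unsigned long sum_tree(unsigned long x, unsigned long a,
                              unsigned long b, unsigned long c,
                              unsigned long d)
{
    return ((x + a) + (b + c)) + d;
}
```

Both return the same value for all inputs; only the dependency depth differs.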

[Bug rtl-optimization/97784] Expressions evaluated as long chain instead of as tree or the like

2020-11-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

--- Comment #2 from Segher Boessenkool  ---
No, it is exactly the same with unsigned types :-(

Use  -Dlong="unsigned long"  or use  #define O ^  (as in my original test).
I forgot about this signed thing, but it has nothing to do with it (that
matters on gimple level, sure, but the problem exists in pure RTL as well).

[Bug target/97786] New: rs6000 isinf etc. are pretty horrible

2020-11-10 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97786

Bug ID: 97786
   Summary: rs6000 isinf etc. are pretty horrible
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

int isfinite(double x) { return __builtin_isfinite (x); }
int isinf(double x) { return __builtin_isinf (x); }
int isinf_sign(double x) { return __builtin_isinf_sign (x); }
int isnan(double x) { return __builtin_isnan (x); }
int isnormal(double x) { return __builtin_isnormal (x); }
int fpclassify(double x) { return __builtin_fpclassify (5, 6, 7, 8, 9, x); }

We can generate much better code for all these than the generic code
we use now.
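
As a rough illustration of why cheap sequences are possible: each of these predicates is a simple test on the exponent and fraction fields.  The sketch below assumes IEEE 754 binary64 doubles and uses made-up helper names; it is not what the backend should emit, just the logic involved.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern.  */
static uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Infinity: exponent all ones, fraction zero (sign ignored).  */
static int is_inf_bits(double x)
{
    return (double_bits(x) & UINT64_C(0x7fffffffffffffff))
           == UINT64_C(0x7ff0000000000000);
}

/* NaN: exponent all ones, fraction nonzero (sign ignored).  */
static int is_nan_bits(double x)
{
    return (double_bits(x) & UINT64_C(0x7fffffffffffffff))
           > UINT64_C(0x7ff0000000000000);
}

/* Finite: exponent not all ones.  */
static int is_finite_bits(double x)
{
    return (double_bits(x) & UINT64_C(0x7ff0000000000000))
           != UINT64_C(0x7ff0000000000000);
}
```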

[Bug target/97784] Expressions evaluated as long chain instead of as tree or the like

2020-11-11 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97784

--- Comment #6 from Segher Boessenkool  ---
(In reply to Richard Biener from comment #3)
> There is targetm.sched.reassociation_width which specifies how re-association
> should make such sequence "wide".

Ah cool, thank you :-)

> Andrew is correct that we don't do this
> for any types that are TYPE_OVERFLOW_UNDEFINED.

Yes; but I see the sub-optimal behaviour for unsigned, too.

> And powerpc has
> 
> static int
> rs6000_reassociation_width (unsigned int opc ATTRIBUTE_UNUSED,
>                             machine_mode mode)
> {
>   switch (rs6000_tune)
>     {
>     case PROCESSOR_POWER8:
>     case PROCESSOR_POWER9:
>     case PROCESSOR_POWER10:
>       if (DECIMAL_FLOAT_MODE_P (mode))
>         return 1;
>       if (VECTOR_MODE_P (mode))
>         return 4;
>       if (INTEGRAL_MODE_P (mode))
>         return 1;

Yeah this last 1 is the problem :-)

> thus you get width 1 which means a linear chain (even if the user wrote
> a tree).

Yup.

> Note RTL doesn't do any such thing like re-association (I guess in principle
> scheduling could, and that's the only place where it would make sense
> on RTL).

RTL unrolling can, actually!  "Variable expansion" is its horrible name
(and it makes a lot of sense there: it allows breaking a big linear chain
into pieces).
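
Written out by hand in C, the effect of variable expansion on an unrolled reduction looks roughly like this (a sketch with hypothetical names; the unroller does this on RTL, not on source):

```c
#include <stddef.h>

/* One accumulator: every add depends on the previous one.  */
static unsigned long sum_one_acc(const unsigned long *v, size_t n)
{
    unsigned long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Unrolled by two with the accumulator expanded into two copies:
   the two chains are independent and are combined once at the end.  */
static unsigned long sum_two_acc(const unsigned long *v, size_t n)
{
    unsigned long acc0 = 0, acc1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += v[i];
        acc1 += v[i + 1];
    }
    if (i < n)          /* leftover element when n is odd */
        acc0 += v[i];
    return acc0 + acc1;
}
```

With unsigned arithmetic both versions always agree; the second one just has half the chain length per accumulator.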

[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976

2020-11-16 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847

Segher Boessenkool  changed:

   What|Removed |Added

 Status|NEW |WAITING

--- Comment #1 from Segher Boessenkool  ---
I cannot reproduce this?  Not with any -mcpu= either, or any -O option.

[Bug tree-optimization/22326] promotions (from float to double) are not removed when they should be able to

2020-11-18 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #8 from Segher Boessenkool  ---
The fmadd;frsp sequence is correct for this source code.  It does double
rounding of the result (first to DP float, then to SP float), so using
just fmadds is only correct for -ffast-math or similar.
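
A small example where the two roundings give a different answer than a single rounding would.  This is a sketch, not the PR's testcase: the function names are made up, and the single-rounding variant stands in for fmadds by using long double, which assumes a long double with at least 64 mantissa bits (as on x86-64) so that the intermediate value is exact for the inputs discussed here.

```c
#include <float.h>

_Static_assert(LDBL_MANT_DIG >= 64,
               "sketch assumes long double has at least 64 mantissa bits");

/* Two roundings: the sum is rounded to double, then to float
   (the shape of fmadd followed by frsp).  */
static float mul_add_double_then_float(float a, float b, float c)
{
    double wide = (double) a * (double) b + (double) c;
    return (float) wide;
}

/* One rounding: for the inputs below the long double value is exact,
   so the conversion to float is the only rounding step.  */
static float mul_add_single_rounding(float a, float b, float c)
{
    long double exact = (long double) a * (long double) b + (long double) c;
    return (float) exact;
}
```

With a = b = 0x1.001p0f and c = 0x1p-60f the exact result is 1 + 2^-11 + 2^-24 + 2^-60, just above the midpoint between two adjacent floats.  Rounding to double first drops the 2^-60 and leaves an exact tie, which then rounds to even (0x1.002p0f), while a single rounding goes up to 0x1.002002p0f.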

[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976

2020-11-18 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847

--- Comment #3 from Segher Boessenkool  ---
I can now reproduce it, with a compiler built yesterday (previous was a
few days older), and -O0.

Confirmed.

[Bug target/97847] [11 Regression] ICE in insert_insn_on_edge, at cfgrtl.c:1976

2020-11-18 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97847

--- Comment #4 from Segher Boessenkool  ---
This was caused (or exposed) by e3b3b59683c1:

commit e3b3b59683c1e7d31a9d313dd97394abebf644be
Author: Vladimir N. Makarov 
Date:   Fri Nov 13 12:45:59 2020 -0500

[PATCH] Implementation of asm goto outputs

[Bug target/97926] ICE in patch_jump_insn, at cfgrtl.c:1298

2020-11-20 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97926

--- Comment #1 from Segher Boessenkool  ---
Confirmed (needs -O0).

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2020-11-24 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

--- Comment #15 from Segher Boessenkool  ---
Why does that compiler default to -mcpu=power10?

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2020-11-24 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

--- Comment #16 from Segher Boessenkool  ---
Oh, it's a different testcase, in comment 6.  Yeah a new PR would
have been better ;-/

[Bug rtl-optimization/97972] [9/10/11 Regression] ICE in moving_insn_creates_bookkeeping_block_p, at sel-sched.c:2031 since r9-2064-gc4c5ad1d6d1e1e1f

2020-11-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97972

--- Comment #2 from Segher Boessenkool  ---
Confirmed.

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2020-11-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

--- Comment #19 from Segher Boessenkool  ---
(In reply to Arseny Solokha from comment #17)
> (In reply to Segher Boessenkool from comment #16)
> > Oh, it's a different testcase, in comment 6.  Yeah a new PR would
> > have been better ;-/
> 
> Do you want me to reopen PR97963 and copy comment 14 there until it's not
> too late?

Nah, it already is too late...  Just keep it in mind for the future :-)

It is easy to join two PRs.  It is very hard / annoying to separate PRs;
it is much easier if separate bugs just start out separate, so don't
piggy-back it onto a PR that you think may have to do with it (you can
always point to the existing PR!)

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2020-11-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

--- Comment #20 from Segher Boessenkool  ---
(In reply to Peter Bergner from comment #18)
> So why don't we default to the Altivec ABI with -m32 on cpus that have
> Altivec and VSX units???

History.  I'm not sure all our ABIs are compatible with vectors enabled,
either.

Since always, you have needed to use -mabi=altivec on 32-bit.

[Bug rtl-optimization/97972] [9/10/11 Regression] ICE in moving_insn_creates_bookkeeping_block_p, at sel-sched.c:2031 since r9-2064-gc4c5ad1d6d1e1e1f

2020-11-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97972

--- Comment #3 from Segher Boessenkool  ---
#0  moving_insn_creates_bookkeeping_block_p (through_insn=0x3fffb5b23138, 
insn=0x3fffb5b736c0) at /home/segher/src/gcc/gcc/sel-sched.c:2031

It crashes here because the insn is not in any BB; which is correct
actually, because the insn has been deleted!

It is deleted in sel-sched, and it was created there as well.  I don't
see anything wrong in the earlier debug dump; afaics this was just
exposed by the 2-2 combine thing.

[Bug target/96791] ICE in convert_mode_scalar, at expr.c:412

2020-11-25 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96791

--- Comment #23 from Segher Boessenkool  ---
Changing the ABI (silently, even!) is never an expected thing.  All of the
four 32-bit ABIs we support have an AltiVec variant that isn't fully
compatible with the non-AltiVec base variant.  It would be a huge disservice
to the user to change the ABI from under his/her feet.

Anyway, patch in testing.

[Bug rtl-optimization/98179] New: gcc.dg/pr97954.c fails on (at least) BE powerpc

2020-12-07 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98179

Bug ID: 98179
   Summary: gcc.dg/pr97954.c fails on (at least) BE powerpc
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: segher at gcc dot gnu.org
  Target Milestone: ---

/home/segher/src/gcc/gcc/testsuite/gcc.dg/pr97954.c: In function 'foo':
/home/segher/src/gcc/gcc/testsuite/gcc.dg/pr97954.c:12:1: error: too many
outgoing branch edges from bb 4
during RTL pass: loop2_invariant
/home/segher/src/gcc/gcc/testsuite/gcc.dg/pr97954.c:12:1: internal compiler
error: verify_flow_info failed
0x10435cb3 verify_flow_info()
/home/segher/src/gcc/gcc/cfghooks.c:269
0x10876cc7 checking_verify_flow_info
/home/segher/src/gcc/gcc/cfghooks.h:212
0x10876cc7 move_loop_invariants()
/home/segher/src/gcc/gcc/loop-invariant.c:2299
0x1087142f execute
/home/segher/src/gcc/gcc/loop-init.c:530

This happens because this pass moved insn 8 from bb 4 to 2:

(jump_insn 8 2 22 2 (parallel [
            (set (reg:SI 118 [ x ])
                (asm_operands:SI ("") ("=r") 0 []
                     []
                     [
                        (label_ref:DI 22)
                    ] pr97954.c:10))
            (clobber (reg:SI 98 ca))
        ]) "pr97954.c":10:3 -1
     (expr_list:REG_UNUSED (reg:SI 98 ca)
        (nil))
 -> 22)

We shouldn't allow such a move at all (not of any jump_insn!)

[Bug rtl-optimization/98178] Combine splitter does not split to single instruction

2020-12-07 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98178

--- Comment #3 from Segher Boessenkool  ---
Yup, this is true in general, we almost never say why we don't combine so
far.  Patches welcome!  (Make sure you use TDF_DETAILS for such prints).

[Bug target/98020] PPC: mfvsrwz+extsw not merged to mtvsrwa

2020-12-08 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98020

Segher Boessenkool  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2020-12-08
 Ever confirmed|0   |1

--- Comment #1 from Segher Boessenkool  ---
mtvsrwa is the wrong way around, and mfvsrwa does not exist.  Am I missing
anything?

[Bug tree-optimization/22326] promotions (from float to double) are not removed when they should be able to

2020-12-11 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326

--- Comment #18 from Segher Boessenkool  ---
Why is it correct to convert the double x to single precision here?!

[Bug tree-optimization/22326] promotions (from float to double) are not removed when they should be able to

2020-12-11 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22326

--- Comment #20 from Segher Boessenkool  ---
Yes, that is clear...  But we have ***double*** x in that example even,
as the declared type of the parameter, so converting that to float is
almost certainly a bad idea?

[Bug target/97329] POWER9 default cache and line sizes appear to be wrong

2020-10-08 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97329

Segher Boessenkool  changed:

   What|Removed |Added

   Last reconfirmed||2020-10-08
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||segher at gcc dot gnu.org

--- Comment #3 from Segher Boessenkool  ---
At least as far back as GCC 5 we report D-L1 size 64kB (for most CPUs,
not just p9).  Confirmed.

[Bug target/97329] POWER9 default cache and line sizes appear to be wrong

2020-10-08 Thread segher at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97329

--- Comment #5 from Segher Boessenkool  ---
So both the cache line size and the cache size are wrong for GCC 10
and before, but okay on trunk, on all compilers I tested (I tested on
Linux only so far).
