from:"segher"

Re: rs6000: Generate an lxvp instead of two adjacent lxv instructions

2021-07-10 Thread segher

On Fri, Jul 09, 2021 at 06:14:49PM -0500, Peter Bergner wrote:
> Ok, I removed the consecutive_mem_locations() function from the previous
> patch and just call adjacent_mem_locations() directly now.  I also moved
> rs6000_split_multireg_move() to later in the file to fix the declaration
> issue.  However, since rs6000_split_multireg_move() is where the new code
> was added to emit the lxvp's, it can be hard to see what I changed because
> of the move.  I'll note that all of my changes are restrictd to within the
> 
>   if (GET_CODE (src) == UNSPEC)
> {
>   gcc_assert (XINT (src, 1) == UNSPEC_MMA_ASSEMBLE);
> ...
>   }
> 
> ...code section.  Does this look better?  I'm currently running bootstraps
> and regtests on LE and BE.

It is very hard to see the differences now.  Don't fold the changes into
one patch, just have the code movement in a separate trivial patch, and
then the actual changes as a separate patch?  That way it is much easier
to review :-)

> +   unsigned subreg =
> + (WORDS_BIG_ENDIAN) ? i : (nregs - reg_mode_nregs - i);

This is not new code, but it caught my eye, so just for the record: the
"=" should start a new line:
  unsigned subreg
= WORDS_BIG_ENDIAN ? i : (nregs - reg_mode_nregs - i);
(and don't put parens around random words please :-) ).


> +   int nvecs = XVECLEN (src, 0);
> +   for (int i = 0; i < nvecs; i++)
> + {
> +   rtx opnd;

Just "op" (and "op2") please?  If you use long names you might as well
just spell "operand" :-)

> +   if (WORDS_BIG_ENDIAN)
> + opnd = XVECEXP (src, 0, i);
> +   else
> + opnd = XVECEXP (src, 0, nvecs - i - 1);

Put this together with the case below as well?  Probably keep the
WORDS_BIG_ENDIAN test as the outer "if"?

> +   /* If we are loading an even VSX register and the memory location
> +  is adjacent to the next register's memory location (if any),
> +  then we can load them both with one LXVP instruction.  */
> +   if ((regno & 1) == 0)
> + {
> +   if (WORDS_BIG_ENDIAN)
> + {
> +   rtx opnd2 = XVECEXP (src, 0, i + 1);
> +   if (adjacent_mem_locations (opnd, opnd2) == opnd)
> + {
> +   opnd = adjust_address (opnd, OOmode, 0);
> +   /* Skip the next register, since we're going to
> +  load it together with this register.  */
> +   i++;
> + }
> + }
> +   else
> + {
> +   rtx opnd2 = XVECEXP (src, 0, nvecs - i - 2);
> +   if (adjacent_mem_locations (opnd2, opnd) == opnd2)
> + {
> +   opnd = adjust_address (opnd2, OOmode, 0);
> +   /* Skip the next register, since we're going to
> +  load it together with this register.  */
> +   i++;
> + }
> + }
> + }

I think it is fine now, but please factor the patch and repost.  Thanks!


Segher

Re: [PATCH 1/2] teach mklog to get name / email from git config when available

2014-04-29 Thread segher

> +# if this is a git tree then take name and email from the git configuration
> +if (-d .git) {
> +  $gitname = `git config user.name`;
> +  chomp($gitname);
> +  if ($gitname) {
> +   $name = $gitname;
> +  }
> +
> +  $gitaddr = `git config user.email`;
> +  chomp($gitaddr);
> +  if ($gitaddr) {
> +   $addr = $gitaddr;
> +  }
> +}

"-d .git" is, erm, not so great.

How about something like

sub get_git_config {
  my $res = `git config --get @_`;
  return undef if $?;
  chomp $res;
  return $res;
}


Segher

Re: [PATCH, rs6000] Improve TImode add/sub

2014-04-08 Thread segher

Hello,

On Tue, Apr 08, 2014 at 07:06:58PM -0500, Pat Haugen wrote:
> The following patch improves the code generated for TImode add/sub so 
> that we now generate a 2 insn sequence which makes use of the carry bit. 

> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */

Please leave out the default arguments.  Why does this need skipping
on Darwin?

> +;; Define the TImode operations that can be done in a small number
> +;; of instructions.  The & constraints are to prevent the register
> +;; allocator from allocating registers that overlap with the inputs
> +;; (for example, having an input in 7,8 and an output in 6,7).  We
> +;; also allow for the output being the same as one of the inputs.
> +
> +(define_insn "addti3"
> +  [(set (match_operand:TI 0 "gpc_reg_operand" "=&r,&r,r,r")
> + (plus:TI (match_operand:TI 1 "gpc_reg_operand" "%r,r,0,0")
> +  (match_operand:TI 2 "reg_or_short_operand" "r,I,r,I")))]
> +  "TARGET_POWERPC64"

That's not the correct condition: the carry bit is set based on the 32-bit
carry in 32-bit mode, so the condition has to be TARGET_64BIT.

The adddi3 pattern has !TARGET_POWERPC64 since a 64-bit addition can
be done without addc on a 64-bit machine, no matter what mode the CPU
is in.

> +  "*
> +{

Might as well leave out this stuff on new code, just use the braces :-)


Segher

Re: [PATCH] PR60822 (m68k, missing earlyclobber in extendplussidi)

2014-04-16 Thread segher

On Wed, Apr 16, 2014 at 02:45:28PM -0600, Jeff Law wrote:
> Isn't the problem that operands 1 is a MEM which use the same register 
> as operands 3 in the memory address?

Yes, exactly.

> ISTM either removing the memory constraint entirely, or splitting it off 
> into a separate alternative and only earlyclobbering that alternative 
> would be better.
> 
> Or am I missing something?

No, that does seem better :-)

I tried both your suggestions; the first results in better code.  Here's
a new patch.  As before, it builds and fixes the testcase, but I didn't
run the testsuite (I have no emulator set up).

Thanks,


Segher



gcc/

PR target/60822
2014-04-16  Segher Boessenkool  

* config/m68k/m68k.md (extendplussidi): Don't allow memory for
operand 1.

---
 gcc/config/m68k/m68k.md | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
index e61048b..57ba1a1 100644
--- a/gcc/config/m68k/m68k.md
+++ b/gcc/config/m68k/m68k.md
@@ -1868,9 +1868,12 @@ (define_insn "extendsidi2"
 ;; Maybe there is a way to make that the general case, by forcing the
 ;; result of the SI tree to be in the lower register of the DI target
 
+;; Don't allow two memory operands: it needs an earlyclobber and will
+;; result in worse code.
+
 (define_insn "extendplussidi"
   [(set (match_operand:DI 0 "register_operand" "=d")
-(sign_extend:DI (plus:SI (match_operand:SI 1 "general_operand" "%rmn")
+(sign_extend:DI (plus:SI (match_operand:SI 1 "nonmemory_operand" "%rn")
 (match_operand:SI 2 "general_operand" "rmn"]
   ""
 {
-- 
1.8.1.4

Re: [PATCH] Cleanup: Replace UNSPEC_COPYSIGN with copysign RTL

2023-09-30 Thread Segher Boessenkool

On Fri, Sep 29, 2023 at 02:09:12PM -0400, Michael Meissner wrote:
>   * config/rs6000/rs6000.md (UNSPEC_COPYSIGN): Delete.
>   (copysign3_fcpsg): Use copysign RTL instead of UNSPEC.

(typo, it is _fcpsgn)

Nice to see unnecessary unspecs going away :-)


Segher

Re: [PATCH] Make sure rs6000-modes.h is installed in plugin/include/config/rs6000/ subdir

2018-07-02 Thread Segher Boessenkool

Hi Jakub,

On Fri, Jun 29, 2018 at 12:52:59AM +0200, Jakub Jelinek wrote:
> The newly added rs6000-modes.h is now included from rs6000.h, so it is
> needed when building plugins that include tm.h, but it wasn't listed in the
> Makefile fragments and therefore included among PLUGIN_HEADERS.
> 
> Fixed thusly, tested on powerpc64le-linux, ok for trunk and 8.2?

Yes please.  Thanks!


Segher


> 2018-06-28  Jakub Jelinek  
> 
>   * config/rs6000/t-rs6000: Append rs6000-modes.h to TM_H.

Re: [PATCH,rs6000] Fix implementation of vec_unpackh, vec_unpackl builtins

2018-07-02 Thread Segher Boessenkool

Hi!

On Fri, Jun 29, 2018 at 07:38:39AM -0700, Carl Love wrote:
> +;; Unpack high elements of float vector to vector of doubles
> +(define_expand "altivec_unpackh_v4sf"
> +  [(set (match_operand:V2DF 0 "register_operand" "=v")
> +(match_operand:V4SF 1 "register_operand" "v"))]
> +  "TARGET_VSX"
> +{
> +  emit_insn (gen_doublehv4sf2 (operands[0], operands[1]));
> +  DONE;
> +}
> +  [(set_attr "type" "veccomplex")])

I wondered if these mactually work for all VSX registers, not just the VMX
registers (i.e. "wa" or similar instead of "v").  But constraints in
define_expand are meaningless anyway; just leave them out please.

Does it help to define these altivec_unpackh_v4sf, when all it does is
expand as doublehv4sf2?  Can't you more easily put the latter in the tables?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> @@ -0,0 +1,257 @@
> +/* { dg-do compile { target powerpc*-*-* } } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-mpower8-vector -maltivec" } */

This needs p8vector_ok then.  Is that correct?  What requires p8?
Is VSX (p7) enough for everything here?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> @@ -0,0 +1,94 @@
> +/* { dg-do compile { target powerpc*-*-* } } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-mpower8-vector -mvsx" } */

Same here: required target does not match options.

Rest looks fine.

Segher

[PATCH] rs6000: Set up ieee128_float_type_node correctly (PR86285)

2018-07-02 Thread Segher Boessenkool

[ I wonder if I ever sent this patch to the list.  I now backported it
  to 8 branch as well.  Apologies. ]

We shouldn't init __ieee128 to be the same as long double if the
latter is not even a 128-bit type.

This also reorders the nearby __ibm128 code so both types use similar
logic.

Committing to trunk.  This should also go to 8.


Segher


2018-06-26  Segher Boessenkool  

PR target/86285
* config/rs6000/rs6000.c (rs6000_init_builtins): Do not set
ieee128_float_type_node to long_double_type_node unless
TARGET_LONG_DOUBLE_128 is set.

---
 gcc/config/rs6000/rs6000.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index cb228ad..5b70521 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16414,21 +16414,24 @@ rs6000_init_builtins (void)
  __ieee128.  */
   if (TARGET_FLOAT128_TYPE)
 {
-  if (TARGET_IEEEQUAD || !TARGET_LONG_DOUBLE_128)
+  if (!TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128)
+   ibm128_float_type_node = long_double_type_node;
+  else
{
  ibm128_float_type_node = make_node (REAL_TYPE);
  TYPE_PRECISION (ibm128_float_type_node) = 128;
  SET_TYPE_MODE (ibm128_float_type_node, IFmode);
  layout_type (ibm128_float_type_node);
}
-  else
-   ibm128_float_type_node = long_double_type_node;
 
   lang_hooks.types.register_builtin_type (ibm128_float_type_node,
  "__ibm128");
 
-  ieee128_float_type_node
-   = TARGET_IEEEQUAD ? long_double_type_node : float128_type_node;
+  if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128)
+   ieee128_float_type_node = long_double_type_node;
+  else
+   ieee128_float_type_node = float128_type_node;
+
   lang_hooks.types.register_builtin_type (ieee128_float_type_node,
  "__ieee128");
 }
-- 
1.8.3.1

Re: [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]

2018-07-03 Thread Segher Boessenkool

On Tue, Jul 03, 2018 at 10:53:20AM +0200, Martin Liška wrote:
> On 06/29/2018 09:04 PM, Jeff Law wrote:
> > I think this is fine for the trunk.
> > 
> > jeff
> 
> Thank you Jeff.
> 
> I found some issues when doing build of all targets (contrib/config-list.mk).
> I'll update patch and test that affected cross-compilers still produce same 
> output.
> 
> However I noticed one ppc64 issue:
> 
> $ cat -n gcc/config/powerpcspe/powerpcspe.c
> 
>   5401/* Set branch target alignment, if not optimizing for size.  */
>   5402if (!optimize_size)
>   5403  {
>   5404/* Cell wants to be aligned 8byte for dual issue.  Titan 
> wants to be
>   5405   aligned 8byte to avoid misprediction by the branch 
> predictor.  */
>   5406if (rs6000_cpu == PROCESSOR_TITAN
>   5407|| rs6000_cpu == PROCESSOR_CELL)
>   5408  {
>   5409if (align_functions <= 0)
>   5410  align_functions = 8;
>   5411if (align_jumps <= 0)
>   5412  align_jumps = 8;
>   5413if (align_loops <= 0)
>   5414  align_loops = 8;
>   5415  }
>   5416if (rs6000_align_branch_targets)
>   5417  {
>   5418if (align_functions <= 0)
>   5419  align_functions = 16;
>   5420if (align_jumps <= 0)
>   5421  align_jumps = 16;
>   5422if (align_loops <= 0)
>   5423  {
>   5424can_override_loop_align = 1;
>   5425align_loops = 16;
>   5426  }
>   5427  }
>   5428if (align_jumps_max_skip <= 0)
>   5429  align_jumps_max_skip = 15;
>   5430if (align_loops_max_skip <= 0)
>   5431  align_loops_max_skip = 15;
> 
> Note that at line 5429 there's set of align_jumps_max_skip to 15 if not set 
> by default.
> At line 5412 align_jumps is set to 8, and align_jumps_max_skip should be 
> equal align_jumps - 1.
> That's a discrepancy. Segher can you please take a look?

This is powerpcspe, that's not mine.

But rs6000 has the same code, sure.  Why do you say "align_jumps_max_skip
should be equal align_jumps - 1"?  If that were true, why does it exist
at all?

toplev.c already has (in init_alignments):

  if (align_jumps_max_skip > align_jumps)
align_jumps_max_skip = align_jumps - 1;

so why would targets duplicate that logic?  (The target override is called
before init_alignments).


Segher

Re: [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]

2018-07-03 Thread Segher Boessenkool

On Tue, Jul 03, 2018 at 12:15:48PM +0200, Martin Liška wrote:
> > toplev.c already has (in init_alignments):
> > 
> >   if (align_jumps_max_skip > align_jumps)
> > align_jumps_max_skip = align_jumps - 1;
> 
> I'm rewriting this logic in the patch set. Issue is that 
> checking for value of align_jumps_max_skip is done
> in rs6000_option_override_internal, which is place before
> align_jumps_max_skip is parsed.
> 
> That said, 'align_jumps_max_skip <= 0' is always true.

It's not clear to me what you want me to do.

You should write your patch so that the end result behaves the same as
before, on all targets.  If that requires changing (or at least checking)
all targets, then you have a lot of work to do.

If you think the rs6000 backend is doing something wrong, please say
what exactly?  I don't see it.

Still confused,

Segher

[PATCH] Remove powerpc-linux_paired from config-list.mk

2018-07-03 Thread Segher Boessenkool

The target has been removed, so we shouldn't try to build it.


Segher


2018-07-03  Segher Boessenkool  

* contrib/config-list.mk: Remove powerpc-linux_paired.

---
 contrib/config-list.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index d04aca2..c3537d2 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -75,7 +75,7 @@ LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   powerpc-eabispe powerpc-eabisimaltivec powerpc-eabisim ppc-elf \
   powerpc-eabialtivec powerpc-xilinx-eabi powerpc-eabi \
   powerpc-rtems powerpc-linux_spe \
-  powerpc-linux_paired powerpc64-linux_altivec \
+  powerpc64-linux_altivec \
   powerpc-wrs-vxworks powerpc-wrs-vxworksae powerpc-wrs-vxworksmils \
   powerpc-lynxos powerpcle-elf \
   powerpcle-eabisim powerpcle-eabi \
-- 
1.8.3.1

Re: [PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]

2018-07-03 Thread Segher Boessenkool

On Tue, Jul 03, 2018 at 02:51:27PM +0200, Martin Liška wrote:
> On 07/03/2018 12:58 PM, Segher Boessenkool wrote:
> > On Tue, Jul 03, 2018 at 12:15:48PM +0200, Martin Liška wrote:
> >>> toplev.c already has (in init_alignments):
> >>>
> >>>   if (align_jumps_max_skip > align_jumps)
> >>> align_jumps_max_skip = align_jumps - 1;
> >>
> >> I'm rewriting this logic in the patch set. Issue is that 
> >> checking for value of align_jumps_max_skip is done
> >> in rs6000_option_override_internal, which is place before
> >> align_jumps_max_skip is parsed.
> >>
> >> That said, 'align_jumps_max_skip <= 0' is always true.
> > 
> > It's not clear to me what you want me to do.
> > 
> > You should write your patch so that the end result behaves the same as
> > before, on all targets.  If that requires changing (or at least checking)
> > all targets, then you have a lot of work to do.
> > 
> > If you think the rs6000 backend is doing something wrong, please say
> > what exactly?  I don't see it.
> 
> Uf, it's quite complicated I would say.
> So first I believe for all -falign-{labels,loops,jumps} we don't handle 
> properly
> value of the argument. More precisely for a value of N (not power of 2),
> we don't respect max_skip and we generate alignment to M, where M is first 
> bigger
> power of 2 number. Example:
> 
> $ gcc 
> /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c -O2 
> -falign-labels=1025 -c -S  -o /dev/stdout | grep align | sort | uniq -c
>   1   .align 32
> 132   .p2align 11
>   7   .p2align 4,,15
> 
> 2^11 == 2048, but I would expect '.p2align 11,,1024' to be generated. That's 
> what you get for function alignment:
> 
> $ gcc 
> /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c -O2 
> -falign-functions=1025 -c -S  -o /dev/stdout | grep align | sort | uniq -c
>   1   .align 32
>   7   .p2align 11,,1024
>  55   .p2align 3
>  48   .p2align 4,,10
> 
> Do I understand that correctly that it's broken?

Yes, this behaviour contradicts our documentation:

'-falign-labels=N'
 Align all branch targets to a power-of-two boundary, skipping up to
 N bytes like '-falign-functions'.

'-falign-functions=N'
 Align the start of functions to the next power-of-two greater than
 N, skipping up to N bytes.  For instance, '-falign-functions=32'
 aligns functions to the next 32-byte boundary, but
 '-falign-functions=24' aligns to the next 32-byte boundary only if
 this can be done by skipping 23 bytes or less.

> On powerpc, because align_jumps_max_skip is set to 15, then we see 
> inconsistency like:
> 
> ./xgcc -B. 
> /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c -O2 
> -falign-jumps=14 -c -S  -o /dev/stdout | grep align | sort | uniq -c
> ...
>  27   .p2align 4,,13
> ...
> 
> which is correct.
> 
> but:
> 
> ./xgcc -B. 
> /home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/params/blocksort-part.c -O2 
> -falign-jumps=1025 -c -S  -o /dev/stdout | grep align | sort | uniq -c
> ...
>  27   .p2align 11,,15
> ...
> 
> Here 11,,15 is completely broken value.

Yup.

This is specific to align-jumps...  Not many people ever change that :-)


Segher

Re: [PATCH,rs6000] Fix implementation of vec_unpackh, vec_unpackl builtins

2018-07-05 Thread Segher Boessenkool

Hi Carl,

On Tue, Jul 03, 2018 at 02:36:22PM -0700, Carl Love wrote:
> Please let me know if the patch looks OK for GCC mainline. The patch
> also needs to be backported to GCC 8.

Looks great, thanks!  Okay for trunk, and also for 8.


Segher


> 2018-07-03  Carl Love  
> 
>   * config/rs6000/rs6000-c.c: Map ALTIVEC_BUILTIN_VEC_UNPACKH for
>   float argument to VSX_BUILTIN_DOUBLEH_V4SF.
>   Map ALTIVEC_BUILTIN_VEC_UNPACKL for float argument to
>   VSX_BUILTIN_DOUBLEL_V4SF.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-07-03  Carl Love  
>   * gcc.target/altivec-1-runnable.c: New test file.
>   * gcc.target/altivec-2-runnable.c: New test file.
>   * gcc.target/vsx-7.c (main2): Change expected expected instruction
>   for tests.

Re: [PATCH], Add configuration checks to PowerPC --with-long-double-format=ieee

2018-07-06 Thread Segher Boessenkool

On Fri, Jul 06, 2018 at 01:51:37AM -0400, Michael Meissner wrote:
>  case "$target:$with_long_double_format" in

> -  xpowerpc64*-*-linux*:*)

So this case could never happen.  The changelog should mention it fixes
that bug (and having it as a separate patch is much preferred!)

Other than this thing, the original code was easier to read.  What does
this part of the patch improve?

Segher

Re: [PATCH], Add configuration checks to PowerPC --with-long-double-format=ieee

2018-07-06 Thread Segher Boessenkool

On Fri, Jul 06, 2018 at 09:38:02AM -0400, Michael Meissner wrote:
> On Fri, Jul 06, 2018 at 06:38:55AM -0500, Segher Boessenkool wrote:
> > On Fri, Jul 06, 2018 at 01:51:37AM -0400, Michael Meissner wrote:
> > >  case "$target:$with_long_double_format" in
> > 
> > > -  xpowerpc64*-*-linux*:*)
> > 
> > So this case could never happen.  The changelog should mention it fixes
> > that bug (and having it as a separate patch is much preferred!)
> 
> I assume what happened is I accidently added the 'x' to the working copy after
> submitting the patch, but before committing it and I didn't notice it.  Since
> it is in configuration support, it isn't part of the test sutie, and it wasn't
> noticed.
> 
> I can add a line to the ChangeLog if desired.
> 
> > Other than this thing, the original code was easier to read.  What does
> > this part of the patch improve?
> 
> You complained that you were getting errors when using the system glibc (based
> on 2.27 on an Ubuntu system) and using --with-long-double-format=ieee (where 
> it
> would die in the middle of building libstdc++-v3).

No, _this part_ of the patch.  The part that doesn't change behaviour (but
see above):

* configure.ac (powerpc64*-*-linux*): Combine big and little
endian checks for the long double format.

I asked Tulio to comment on if this is a good way to detect usable glibc
versions for default ieee long double.  I haven't reviewed that part of
your patch yet.


Segher

Re: [PATCH], Add configuration checks to PowerPC --with-long-double-format=ieee

2018-07-06 Thread Segher Boessenkool

On Fri, Jul 06, 2018 at 09:46:36AM -0400, Michael Meissner wrote:
> On Fri, Jul 06, 2018 at 10:16:34AM -0300, Tulio Magno Quites Machado Filho 
> wrote:
> > I suggest to test with the following program:
> > 
> > #include 
> > 
> > int
> > main ()
> > {
> >   return !isinfl(__builtin_infl());
> > }
> > 
> > Build it with:
> > gcc -mabi=ieeelongdouble -fno-builtin -Wno-psabi -lm test-ldbl.c
> > 
> > If the execution of the program returns 0, your math library supports IEEE 
> > long
> > double.
> 
> Thanks, but I suspect that it won't work for building cross compilers or for
> building where the compiler built uses the Advance Toolchain libraries and
> shared library loader instead of the system versions using the configuration
> option --with-advance-toolchain=atx.y.
> 
> The issue is you need to test whether the target GLIBC has the support when
> configuring the compiler, but if you are building for a cross target, you 
> can't
> run the resulting binary.  Even on a native system, with options like
> --with-advance-toolchain and --with-sysroot, the libraries used by the host
> compiler used to build stage1 of GCC might be different from the libraries 
> used
> to build the target compiler (or stage2/stage3 in a bootstrap native build).
> 
> So I used the GLIBC version tests that were already part of the GCC
> configuration.
> 
> If there is a simple method that works for cross compilers or where a 
> specified
> sysroot is used, it would be simpler than having version checks.

Version checks are terrible.  This is nothing new.

For cross builds you can just assume it works.  That should work fine here.
For cross builds the burden of having compatible versions of everything is
on the user anyhow (I'm not saying this is a good thing, but this is the
status quo).

Maybe some test could be done that mimics what happens 10m into a build
(the build failure)?  But Tulio's test should be okay for most cases, too.


Segher

Re: [PATCH,rs6000] Backport of stxvl instruction fix to GCC 7

2018-07-10 Thread Segher Boessenkool

On Mon, Jul 09, 2018 at 04:50:03PM -0700, Carl Love wrote:
> The following patch is a back port for a commit to mainline prior to
> GCC 8 release.  Note, the code fixed by this patch was later modified
> in commit 256798 as part of adding vec_xst_len support.  The sldi
> instruction gets replaced by an ashift of the operand for the stxvl
> instruction.  Commit 256798 adds additional functionality and does not
> fix any functional issues.  Hence it is not being back ported, just the
> original bug fix given below.
> 
> The patch has been tested on 
> 
> powerpc64le-unknown-linux-gnu (Power 8 LE)  
> 
> With no regressions.
> 
> Please let me know if the patch looks OK for GCC 7.

This is fine.  Thanks!


Segher


>   Backport from mainline
>   2017-09-07  Carl Love  
> 
>   * config/rs6000/vsx.md (define_insn "*stxvl"): Add missing argument to
>   the sldi instruction.

Re: [PATCH, rs6000] gimple folding support for vec_pack and vec_unpack

2018-07-11 Thread Segher Boessenkool

Hi!

On Mon, Jul 09, 2018 at 02:08:37PM -0500, Will Schmidt wrote:
>   * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin):
>   Add support for gimple-folding of vec_pack() and vec_unpack()
>   intrinsics.

> +case ALTIVEC_BUILTIN_VUPKHPX:
> +case ALTIVEC_BUILTIN_VUPKLPX:
> +  {
> +   return false;
> +  }

A block around a signle statement looks a bit silly (and in the other
cases in your patch it isn't necessary either; it is nice if you use it
to give some scope to a local var, but you don't have that here).

But, patch is fine as far as I can see :-)

Segher

Re: [PATCH, rs6000] Testcase adds for vec_unpack

2018-07-11 Thread Segher Boessenkool

Hi Will,


On Mon, Jul 09, 2018 at 02:08:49PM -0500, Will Schmidt wrote:
>   * gcc.target/powerpc/fold-vec-unpack-char.c: New.
>   * gcc.target/powerpc/fold-vec-unpack-float.c: New.
>   * gcc.target/powerpc/fold-vec-unpack-int.c: New.
>   * gcc.target/powerpc/fold-vec-unpack-pixel.c: New.
>   * gcc.target/powerpc/fold-vec-unpack-short.c: New.

This looks fine.  Okay for trunk.  Thanks!


Segher

Re: [PATCH, rs6000] Add support for gimple folding vec_perm()

2018-07-11 Thread Segher Boessenkool

On Mon, Jul 09, 2018 at 02:08:55PM -0500, Will Schmidt wrote:
>Add support for early gimple folding of vec_perm.   Testcases are already 
> in-tree as
> gcc.target/powerpc/fold-vec-perm-*.c
> 
> OK for trunk?

Looks fine to me.  Okay if no one else complains :-)


Segher


>   * gcc/config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support
>   for folding vec_perm.

Re: [PATCH, rs6000 v4] enable gimple folding for vec_xl, vec_xst

2018-07-12 Thread Segher Boessenkool

Hi!

On Tue, Jul 10, 2018 at 12:10:51PM -0500, Will Schmidt wrote:
> Add support for Gimple folding for unaligned vector loads and stores.

This is fine if the experts agree.  Thanks!  And thanks to the reviewers,
too.

One detail:

>   * config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add
>   vec_xst variants to the list.
>   (rs6000_gimple_fold_builtin): Add support for folding unaligned
>   vector loads and stores.

> +case VSX_BUILTIN_STXVW4X_V4SI:
> +case VSX_BUILTIN_STXVD2X_V2DF:
> +case VSX_BUILTIN_STXVD2X_V2DI:
> +  {
> +  arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */

Indent here is wrong.


Segher

Re: [PATCH, rs6000] Add missing logical-op interfaces to emmintrin.h

2018-07-13 Thread Segher Boessenkool

Hi!

On Wed, Jul 11, 2018 at 12:26:24PM -0500, Bill Schmidt wrote:
> It was recently brought to our attention that the existing emmintrin.h
> header, which was believed to be feature-complete for SSE2 support, is
> actually missing four logical-op interfaces:
> 
>  _mm_and_si128
>  _mm_andnot_si128
>  _mm_or_si128
>  _mm_xor_si128
> 
> This patch provides those with the obvious implementations, along with
> test cases.  I've bootstrapped it on powerpc64le-linux-gnu (P8, P9)
> and powerpc64-linux-gnu (P7, P8) and tested it with no regressions.
> Is this okay for trunk?
> 
> Although this isn't a regression, it is an oversight that leaves the
> SSE2 support incomplete.  Thus I'd like to ask permission to also
> backport this to gcc-8-branch after a short waiting period.  It's
> passed regstrap on P8 and P9 LE, and P7/P8 BE testing is underway.
> Is that backport okay if testing succeeds?
> 
> [BTW, I'm shepherding this patch on behalf of Steve Munroe.]

This looks fine.  Okay for trunk.  Also okay for 8 (as we discussed, you
probably should check if 8 hasn't diverged from trunk here; it shouldn't
have).

Thanks to both of you,


Segher

Re: [PATCH, rs6000] Alphabetize prototypes of AltiVec built-in functions in extend.texi

2018-07-13 Thread Segher Boessenkool

On Tue, Jul 10, 2018 at 06:13:50PM -0500, Kelvin Nilsen wrote:
> This patch alphabetizes the list of AltiVec built-in function prototypes that 
> consume about 15 pages of the gcc.pdf file.  As part of the alphabetization 
> effort, certain functions that should not be documented in this section of 
> the manual are separated from the others and moved to the end of the section 
> with comments to explain their role.
> 
> This patch prepares the way for future patches that will remove certain 
> prototypes from this section and will insert certain prototypes that are 
> currently missing from this section.  It also improves readability and 
> maintainability of the section.

I don't think having these thing alphabetical is a good idea; there are
much better orderings possible.  But it is a step towards sorting out the
mess that is the current documentation, so okay for trunk!

Thanks for working on this.


Segher


>   * doc/extend.texi (PowerPC AltiVec Built-in Functions):
>   Alphabetize prototypes of built-in functions, separating out
>   built-in functions that are listed in this section but should be
>   described elsewhere.

Re: [PATCH, rs6000] Fix AIX test case failures

2018-07-13 Thread Segher Boessenkool

On Fri, Jul 13, 2018 at 10:51:24AM -0400, David Edelsohn wrote:
> On AIX it would be calling divtc3, but AIX defaults to 64 bit long
> double.  Either all of these tests need
> 
> /* { dg-require-effective-target longdouble128 } */
> 
> or
> 
> /* { dg-additional-options "-mlong-double-128" { target powerpc-ibm-aix* } } 
> */
> 
> along with testing for "tc", e.g., bl .__divtc3

Which would you prefer David?  (I'd do the former).


Segher

Re: [PATCH, rs6000] Fix AIX test case failures

2018-07-16 Thread Segher Boessenkool

On Fri, Jul 13, 2018 at 04:15:26PM -0700, Carl Love wrote:
> Segher, David:
> 
> I reworked the patch per the first option that David gave.  The tests
> divkc3-2.c, divkc3-3.c, mulkc3-2.c and mulkc3-3.c pass on Power 9 Linux
> as they did before.  The tests are unsupported on Power8 Linux as they
> were before.  Now, the tests are reported as unsupported on AIX rather
> then failing on AIX.
> 
> Please let me know if you both approve the updated patch below.  Thanks
> for the input and help on this.

You need only one approval ;-)

(The patch is fine.  Thanks!)


Segher



> 2018-07-13  Carl Love  
> 
>   * gcc.target/powerpc/divkc3-2.c: Add dg-require-effective-target
>   longdouble128.
>   * gcc.target/powerpc/divkc3-3.c: Ditto.
>   * gcc.target/powerpc/mulkc3-2.c: Ditto.
>   * gcc.target/powerpc/mulkc3-3.c: Ditto.
>   * gcc.target/powerpc/fold-vec-mergehl-double.c: Update counts.
>   * gcc.target/powerpc/pr85456.c: Make check Linux and AIX specific.

[PATCH 0/6] rs6000: Test all rs6000 floating point conversions

2018-07-16 Thread Segher Boessenkool

This series adds new codegen tests for converting any of our seven
floating point modes to any of those seven.  It also fixes a bunch of
bugs so that these testcases pass.

Tested on powerpc64-linux {-m32,-m64} (a power7); on powerpc64le-linux,
both on power8 and on power9; and on AIX.

Also tested the new testcases with options
{-mlong-double-64,-mlong-double-128}
{-mabi=ibmlongdouble,-mabi=ieeelongdouble}
{-mcpu=power4,-mcpu=970,-mcpu=power6,-mcpu=power7,-mcpu=power9}
{-mabi=elfv1/-mbig,-mabi=elfv2/-mlittle}
{-m32,-m64}
(but not -mabi=ieeelongdouble before power7 because that is not
supported).

Committing to trunk.


Segher


Segher Boessenkool (6):
  rs6000: Use more correct names for some trunc/extend libcalls
  rs6000: Use correct names for some trunc/extend libcalls
  rs6000: Improve truncifsf2
  rs6000: Fix testsuite bug in check_ppc_float128_hw_available
  rs6000: New testsuite selectors
  rs6000: New testcase fp-convert.c

 gcc/config/rs6000/rs6000.c| 24 +++---
 gcc/config/rs6000/rs6000.md   | 23 ++
 gcc/testsuite/gcc.target/powerpc/convert-fp-128.c | 99 +++
 gcc/testsuite/gcc.target/powerpc/convert-fp-64.c  | 61 ++
 gcc/testsuite/lib/target-supports.exp | 41 +-
 5 files changed, 217 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/convert-fp-128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/convert-fp-64.c

-- 
1.8.3.1

[PATCH 4/6] rs6000: Fix testsuite bug in check_ppc_float128_hw_available

2018-07-16 Thread Segher Boessenkool

The test program for ppc_float128_hw_available would always return
false, since there is a syntax error in that test program.


2018-07-16  Segher Boessenkool  

gcc/testsuite/
* lib/target-supports.exp (check_ppc_float128_hw_available): Fix
syntax error.

---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 99613fd..ec4a35d 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2182,7 +2182,7 @@ proc check_ppc_float128_hw_available { } {
__float128 w = -1.0q;
 
__asm__ ("xsaddqp %0,%1,%2" : "+v" (w) : "v" (x), "v" (y));
-   return ((z != 3.0q) || (z != w);
+   return ((z != 3.0q) || (z != w));
}
} $options
}
-- 
1.8.3.1

[PATCH 2/6] rs6000: Use correct names for some trunc/extend libcalls

2018-07-16 Thread Segher Boessenkool

The libcalls for trunc and extend of a decimal float to a binary float,
and vice versa, do not have "2" in the name, although all other such
conversions do.


2018-07-16  Segher Boessenkool  

* config/rs6000/rs6000.c (init_float128_ibm): Use the correct names
for conversions between IFmode and the decimal floating point modes.
(init_float128_ieee): Use the correct names for conversions between
KFmode and the decimal floating point modes.

---
 gcc/config/rs6000/rs6000.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f95aa59..62b8ea3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -17847,12 +17847,12 @@ init_float128_ibm (machine_mode mode)
  names.  */
   if (mode == IFmode)
 {
-  set_conv_libfunc (sext_optab, mode, SDmode, "__dpd_extendsdtf2");
-  set_conv_libfunc (sext_optab, mode, DDmode, "__dpd_extendddtf2");
-  set_conv_libfunc (trunc_optab, mode, TDmode, "__dpd_trunctdtf2");
-  set_conv_libfunc (trunc_optab, SDmode, mode, "__dpd_trunctfsd2");
-  set_conv_libfunc (trunc_optab, DDmode, mode, "__dpd_trunctfdd2");
-  set_conv_libfunc (sext_optab, TDmode, mode, "__dpd_extendtftd2");
+  set_conv_libfunc (sext_optab, mode, SDmode, "__dpd_extendsdtf");
+  set_conv_libfunc (sext_optab, mode, DDmode, "__dpd_extendddtf");
+  set_conv_libfunc (trunc_optab, mode, TDmode, "__dpd_trunctdtf");
+  set_conv_libfunc (trunc_optab, SDmode, mode, "__dpd_trunctfsd");
+  set_conv_libfunc (trunc_optab, DDmode, mode, "__dpd_trunctfdd");
+  set_conv_libfunc (sext_optab, TDmode, mode, "__dpd_extendtftd");
 
   if (TARGET_POWERPC64)
{
@@ -17951,12 +17951,12 @@ init_float128_ieee (machine_mode mode)
   if (mode != TFmode && FLOAT128_IBM_P (TFmode))
set_conv_libfunc (trunc_optab, TFmode, mode, "__extendkftf2");
 
-  set_conv_libfunc (sext_optab, mode, SDmode, "__dpd_extendsdkf2");
-  set_conv_libfunc (sext_optab, mode, DDmode, "__dpd_extendddkf2");
-  set_conv_libfunc (trunc_optab, mode, TDmode, "__dpd_trunctdkf2");
-  set_conv_libfunc (trunc_optab, SDmode, mode, "__dpd_trunckfsd2");
-  set_conv_libfunc (trunc_optab, DDmode, mode, "__dpd_trunckfdd2");
-  set_conv_libfunc (sext_optab, TDmode, mode, "__dpd_extendkftd2");
+  set_conv_libfunc (sext_optab, mode, SDmode, "__dpd_extendsdkf");
+  set_conv_libfunc (sext_optab, mode, DDmode, "__dpd_extendddkf");
+  set_conv_libfunc (trunc_optab, mode, TDmode, "__dpd_trunctdkf");
+  set_conv_libfunc (trunc_optab, SDmode, mode, "__dpd_trunckfsd");
+  set_conv_libfunc (trunc_optab, DDmode, mode, "__dpd_trunckfdd");
+  set_conv_libfunc (sext_optab, TDmode, mode, "__dpd_extendkftd");
 
   set_conv_libfunc (sfix_optab, SImode, mode, "__fixkfsi");
   set_conv_libfunc (ufix_optab, SImode, mode, "__fixunskfsi");
-- 
1.8.3.1

[PATCH 3/6] rs6000: Improve truncifsf2

2018-07-16 Thread Segher Boessenkool

The current implementation leaves an unnecessary register move.  It is
easier to just expand things in the expander already.  This patch does
that.


2018-07-16  Segher Boessenkool  

* config/rs6000/rs6000.md (truncsf2): Expand truncates of
double-double modes to SFmode directly directly.
(truncsf2_fprs): Delete.

---
 gcc/config/rs6000/rs6000.md | 23 +--
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44d32d9..94a0f7d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7702,28 +7702,15 @@ (define_expand "truncsf2"
 {
   if (FLOAT128_IEEE_P (mode))
 rs6000_expand_float128_convert (operands[0], operands[1], false);
-  else if (mode == TFmode)
-emit_insn (gen_trunctfsf2_fprs (operands[0], operands[1]));
-  else if (mode == IFmode)
-emit_insn (gen_truncifsf2_fprs (operands[0], operands[1]));
   else
-gcc_unreachable ();
+{
+  rtx tmp = gen_reg_rtx (DFmode);
+  emit_insn (gen_truncdf2 (tmp, operands[1]));
+  emit_insn (gen_truncdfsf2 (operands[0], tmp));
+}
   DONE;
 })
 
-(define_insn_and_split "truncsf2_fprs"
-  [(set (match_operand:SF 0 "gpc_reg_operand" "=f")
-   (float_truncate:SF (match_operand:IBM128 1 "gpc_reg_operand" "d")))
-   (clobber (match_scratch:DF 2 "=d"))]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128 && FLOAT128_IBM_P (mode)"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 2)
-   (float_truncate:DF (match_dup 1)))
-   (set (match_dup 0)
-   (float_truncate:SF (match_dup 2)))]
-  "")
-
 (define_expand "floatsi2"
   [(parallel [(set (match_operand:FLOAT128 0 "gpc_reg_operand")
   (float:FLOAT128 (match_operand:SI 1 "gpc_reg_operand")))
-- 
1.8.3.1

[PATCH 5/6] rs6000: New testsuite selectors

2018-07-16 Thread Segher Boessenkool

This introduces four new selectors for use with Power testcases:
longdouble64, ppc_float128, ppc_float128_insns, powerpc_vsx.


2018-07-16  Segher Boessenkool  

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_longdouble64,
check_effective_target_ppc_float128,
check_effective_target_ppc_float128_insns,
check_effective_target_powerpc_vsx): New.

---
 gcc/testsuite/lib/target-supports.exp | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index ec4a35d..c2d814c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2684,6 +2684,15 @@ proc check_effective_target_longdouble128 { } {
 }]
 }
 
+# Return 1 if the target supports long double of 64 bits,
+# 0 otherwise.
+
+proc check_effective_target_longdouble64 { } {
+return [check_no_compiler_messages longdouble64 object {
+   int dummy[sizeof(long double) == 8 ? 1 : -1];
+}]
+}
+
 # Return 1 if the target supports double of 64 bits,
 # 0 otherwise.
 
@@ -5141,6 +5150,36 @@ proc check_effective_target_powerpc_float128_hw_ok { } {
 }
 }
 
+# Return 1 if current options define float128, 0 otherwise.
+
+proc check_effective_target_ppc_float128 { } {
+return [check_no_compiler_messages_nocache ppc_float128 object {
+   #ifndef __FLOAT128__
+ nope no good
+   #endif
+}]
+}
+
+# Return 1 if current options generate float128 insns, 0 otherwise.
+
+proc check_effective_target_ppc_float128_insns { } {
+return [check_no_compiler_messages_nocache ppc_float128 object {
+   #ifndef __FLOAT128_HARDWARE__
+ nope no good
+   #endif
+}]
+}
+
+# Return 1 if current options generate VSX instructions, 0 otherwise.
+
+proc check_effective_target_powerpc_vsx { } {
+return [check_no_compiler_messages_nocache powerpc_vsx object {
+   #ifndef __VSX__
+ nope no vsx
+   #endif
+}]
+}
+
 # Return 1 if this is a PowerPC target supporting -mvsx
 
 proc check_effective_target_powerpc_vsx_ok { } {
-- 
1.8.3.1

[PATCH 1/6] rs6000: Use more correct names for some trunc/extend libcalls

2018-07-16 Thread Segher Boessenkool

They had source and destination swapped in the name.


2018-07-16  Segher Boessenkool  

* config/rs6000/rs6000.c (init_float128_ibm): Use more correct names
for the conversions between TDmode and IFmode.
(init_float128_ieee): Use more correct names for the conversions
between TDmode and KFmode.

---
 gcc/config/rs6000/rs6000.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1976072..f95aa59 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -17849,10 +17849,10 @@ init_float128_ibm (machine_mode mode)
 {
   set_conv_libfunc (sext_optab, mode, SDmode, "__dpd_extendsdtf2");
   set_conv_libfunc (sext_optab, mode, DDmode, "__dpd_extendddtf2");
-  set_conv_libfunc (trunc_optab, mode, TDmode, "__dpd_trunctftd2");
+  set_conv_libfunc (trunc_optab, mode, TDmode, "__dpd_trunctdtf2");
   set_conv_libfunc (trunc_optab, SDmode, mode, "__dpd_trunctfsd2");
   set_conv_libfunc (trunc_optab, DDmode, mode, "__dpd_trunctfdd2");
-  set_conv_libfunc (sext_optab, TDmode, mode, "__dpd_extendtdtf2");
+  set_conv_libfunc (sext_optab, TDmode, mode, "__dpd_extendtftd2");
 
   if (TARGET_POWERPC64)
{
@@ -17953,10 +17953,10 @@ init_float128_ieee (machine_mode mode)
 
   set_conv_libfunc (sext_optab, mode, SDmode, "__dpd_extendsdkf2");
   set_conv_libfunc (sext_optab, mode, DDmode, "__dpd_extendddkf2");
-  set_conv_libfunc (trunc_optab, mode, TDmode, "__dpd_trunckftd2");
+  set_conv_libfunc (trunc_optab, mode, TDmode, "__dpd_trunctdkf2");
   set_conv_libfunc (trunc_optab, SDmode, mode, "__dpd_trunckfsd2");
   set_conv_libfunc (trunc_optab, DDmode, mode, "__dpd_trunckfdd2");
-  set_conv_libfunc (sext_optab, TDmode, mode, "__dpd_extendtdkf2");
+  set_conv_libfunc (sext_optab, TDmode, mode, "__dpd_extendkftd2");
 
   set_conv_libfunc (sfix_optab, SImode, mode, "__fixkfsi");
   set_conv_libfunc (ufix_optab, SImode, mode, "__fixunskfsi");
-- 
1.8.3.1

[PATCH 6/6] rs6000: New testcase fp-convert.c

2018-07-16 Thread Segher Boessenkool

This tests the generated code for all conversions between floating point
point types, binary and decimal.


2018-07-16  Segher Boessenkool  

gcc/testsuite/
* gcc.target/powerpc/convert-fp-128.c: New testcase.
* gcc.target/powerpc/convert-fp-64.c: New testcase.

---
 gcc/testsuite/gcc.target/powerpc/convert-fp-128.c | 99 +++
 gcc/testsuite/gcc.target/powerpc/convert-fp-64.c  | 61 ++
 2 files changed, 160 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/convert-fp-128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/convert-fp-64.c

diff --git a/gcc/testsuite/gcc.target/powerpc/convert-fp-128.c 
b/gcc/testsuite/gcc.target/powerpc/convert-fp-128.c
new file mode 100644
index 000..67896d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/convert-fp-128.c
@@ -0,0 +1,99 @@
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target longdouble128 } */
+/* { dg-require-effective-target dfp } */
+
+#define conv(M,N) mode_##N conv##M##N(mode_##M x) { return x; }
+
+#define mode_sf float
+#define mode_df double
+typedef float __attribute__((mode(IF))) mode_if;
+typedef float __attribute__((mode(KF))) mode_kf;
+#define mode_sd _Decimal32
+#define mode_dd _Decimal64
+#define mode_td _Decimal128
+
+#ifdef __FLOAT128_TYPE__
+#define conv1(M) \
+   conv(M,sf) conv(M,df) conv(M,if) conv(M,kf) \
+   conv(M,sd) conv(M,dd) conv(M,td)
+#define conv2 \
+   conv1(sf) conv1(df) conv1(if) conv1(kf) \
+   conv1(sd) conv1(dd) conv1(td)
+#else
+#define conv1(M) \
+   conv(M,sf) conv(M,df) conv(M,if) \
+   conv(M,sd) conv(M,dd) conv(M,td)
+#define conv2 \
+   conv1(sf) conv1(df) conv1(if) \
+   conv1(sd) conv1(dd) conv1(td)
+#endif
+
+conv2
+
+
+
+/* { dg-final { scan-assembler-times {\mbl\M} 24 { target { ! hard_dfp } } } } 
*/
+/* { dg-final { scan-assembler-times {\mbl\M} 19 { target { hard_dfp && { ! 
ppc_float128 } } } } } */
+/* { dg-final { scan-assembler-times {\mbl\M} 31 { target { hard_dfp && { 
ppc_float128 && { ! ppc_float128_insns } } } } } } */
+/* { dg-final { scan-assembler-times {\mbl\M} 27 { target { hard_dfp && { 
ppc_float128 && { ppc_float128_insns } } } } } } */
+
+
+/* { dg-final { scan-assembler-times {\mbl __extendsfkf2\M} 1 { target { 
ppc_float128 && { ! ppc_float128_insns } } } } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendsfsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendsfdd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendsftd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __extenddfkf2\M} 1 { target { 
ppc_float128 && { ! ppc_float128_insns } } } } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_truncdfsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extenddfdd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extenddftd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __trunctfkf2\M} 1 { target { 
ppc_float128 } } } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_trunctfsd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_trunctfdd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendtftd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __trunckfsf2\M} 1 { target { 
ppc_float128 && { ! ppc_float128_insns } } } } } */
+/* { dg-final { scan-assembler-times {\mbl __trunckfdf2\M} 1 { target { 
ppc_float128 && { ! ppc_float128_insns } } } } } */
+/* { dg-final { scan-assembler-times {\mbl __extendkftf2\M} 1 { target { 
ppc_float128 } } } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_trunckfsd\M} 1 { target { 
ppc_float128 } } } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_trunckfdd\M} 1 { target { 
ppc_float128 } } } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendkftd\M} 1 { target { 
ppc_float128 } } } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_truncsdsf\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendsddf\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendsdtf\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendsdkf\M} 1 { target { 
ppc_float128 } } } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendsddd2\M} 1 { target { ! 
dfp } } } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendsdtd2\M} 1 { target { ! 
dfp } } } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_truncddsf\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_truncdddf\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendddtf\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendddkf\M} 1 { target { 
ppc_float128 } } } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_truncddsd2\M} 1 { target { ! 
dfp } } } } */
+/* { dg-final { scan-assembler-times {\mbl __dpd_extendddtd2\M} 1 { target { ! 
dfp } } } } */
+/* { dg-final { scan

Re: [PATCH, testsuite] PPC testsuite pr85456.c

2018-07-17 Thread Segher Boessenkool

On Tue, Jul 17, 2018 at 01:34:18PM -0400, David Edelsohn wrote:
> And another test that needs to be restricted because it assumes 128
> bit long double.

Thanks :-)


Segher

Re: [PATCH, rs6000] Sort Altivec/VSX built-in functions into subsubsections according to configuration requirements

2018-07-17 Thread Segher Boessenkool

On Tue, Jul 17, 2018 at 03:44:43PM -0500, Kelvin Nilsen wrote:
> The many PowerPC built-in functions (intrinsics) that are enabled by 
> including  each have different configuration requirements.  To 
> simplify the description of the requirements, this patch sorts these 
> functions into different subsubsections.  
> 
> A subsequent patch will add and remove various functions from each section to 
> correct incompatibilities between what is implemented and what is documented.
> 
> I have built and regression tested this patch on powerpc64le-unknown-linux 
> and on powerpc-linux (P8 big-endian) with no regressions.  I have also built 
> and reviewed the gcc.pdf file.

A changelog nit:

>   * doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions):
>   Corrected spelling of this subsection.  Moved some material to new
>   subsubsections "PowerPC AltiVec Built-in Functions on ISA 2.06" and
>   "PowerPC AltiVec Built-in Functions on ISA 2.07".

* doc/extend.texi (PowerPC AltiVec Built-in Functions): Rename to...
(PowerPC AltiVec/VSX Built-in Functions): ... this.  Move some material
to new subsubsections "PowerPC AltiVec Built-in Functions on ISA 2.06"
and "PowerPC AltiVec Built-in Functions on ISA 2.07".

(I.e., mention old name as well as new, and use imperative instead of
past tense).

Looks fine otherwise.  Okay for trunk.  Thanks!

Segher

Re: [PATCH, rs6000] Fix AIX test case failures

2018-07-18 Thread Segher Boessenkool

Hi Carl,

On Tue, Jul 17, 2018 at 04:39:58PM -0700, Carl Love wrote:
> I was requested to backport the patch for the AIX test case failures to
> GCC 8.  The trunk patch applied cleanly to GCC 8.  I updated the
> changelog patch, built and retested the patch on:
> 
>     powerpc64le-unknown-linux-gnu (Power 8 LE)  
> powerpc64-unknown-linux-gnu (Power 8 BE)
> AIX 7200-00-01-1543 (Power 8 BE)
> 
> With no regressions.
> 
> Please let me know if it is OK to apply the patch to the GCC 8 branch. 

Sure, it's okay.  Thanks!


Segher


> 2018-07-17  Carl Love  
> 
>   Backport from mainline
>   2018-07-16  Carl Love  
> 
>   PR target/86414
>   * gcc.target/powerpc/divkc3-2.c: Add dg-require-effective-target
>   longdouble128.
>   * gcc.target/powerpc/divkc3-3.c: Ditto.
>   * gcc.target/powerpc/mulkc3-2.c: Ditto.
>   * gcc.target/powerpc/mulkc3-3.c: Ditto.
>   * gcc.target/powerpc/fold-vec-mergehl-double.c: Update counts.
>   * gcc.target/powerpc/pr85456.c: Make check Linux and AIX specific.

Re: [PATCH], Remove undocumented -mtoc-fusion from PowerPC

2018-07-18 Thread Segher Boessenkool

Hi Mike,

On Fri, Jul 13, 2018 at 04:56:13PM -0400, Michael Meissner wrote:
> This means rather than keeping the toc fusion around (that nobody used), I
> would prefer to delete the current code, and replace it with better code as I
> implement it.

> +++ gcc/config/rs6000/constraints.md  (working copy)

> +;; wG is now available.  Previously it was a memory operand suitable for TOC
> +;; fusion.

There are many other constraints unused.  Keep track of all, instead?
Like we have (at the top of this file)
;; Available constraint letters: e k q t u A B C D S T
you could do something similar for the "w" names.

> --- gcc/config/rs6000/predicates.md   (revision 262647)
> +++ gcc/config/rs6000/predicates.md   (working copy)
> @@ -412,7 +412,7 @@ (define_predicate "fpr_reg_operand"
>  ;;
>  ;; If this is a pseudo only allow for GPR fusion in power8.  If we have the
>  ;; power9 fusion allow the floating point types.
> -(define_predicate "toc_fusion_or_p9_reg_operand"
> +(define_predicate "p9_fusion_reg_operand"

The comment before this needs fixing, too:

;; Return true if this is a register that can has D-form addressing (GPR and
;; traditional FPR registers for scalars).  ISA 3.0 (power9) adds D-form
;; addressing for scalars in Altivec registers.
;;
;; If this is a pseudo only allow for GPR fusion in power8.  If we have the
;; power9 fusion allow the floating point types.

It's not clear to me what this really stands for (not before the patch,
either).

Okay for trunk.  Thanks!  Please follow up to the above two things.

Segher

Re: [PATCH,rs6000] AIX test fixes 2

2018-07-21 Thread Segher Boessenkool

Hi Carl,

On Fri, Jul 20, 2018 at 12:42:33PM -0700, Carl Love wrote:
> The following patch fixes errors on AIX for the "vector double" tests
> in altivec-1-runnable.c file.  The type "vector double" requires the
> use of the GCC command line option -mvsx. The vector double tests
> in altivec-1-runnable.c should be in altivec-2-runnable.c.  It looks
> like my Linux testing of the original patch worked because I configured
> GCC by default with -mcpu=power8.  AIX is not using that as the default
> processor thus causing the compile of altivec-1-runnable.c to fail.

This also fails on powerpc64-linux on a Power7.

> The vec_or tests in builtins-1.c were moved to another file by a
> previous patch.  The vec_or test generated the xxlor instruction.  The
> count of the xxlor instruction varies depending on the target as it is
> used as a move instruction.  No other tests generate the xxlor
> instruction. Hence, the count check was removed.

Okay, good idea.

> --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
> @@ -287,7 +287,6 @@ int main ()
> vec_mergeh  xxmrglw, vmrglh
> vec_mul mulld | mullw, mulhwu
> vec_nor xxlnor
> -   vec_or  xxlor

So there is no vec_or test anymore?  Is there more incorrect / out of date
in this comment?

> @@ -334,8 +331,10 @@ int main ()
>  /* { dg-final { scan-assembler-times "divd" 8  { target lp64 } } } */
>  /* { dg-final { scan-assembler-times "divdu" 2  { target lp64 } } } */
>  /* { dg-final { scan-assembler-times "mulld" 4  { target lp64 } } } */
> -/* { dg-final { scan-assembler-times {\mbl __divdi3\M} 2  { target { ilp32 } 
> } } } */
> -/* { dg-final { scan-assembler-times {\mbl __udivdi3\M} 2  { target {ilp32 } 
> } } } */
> +/* { dg-final { scan-assembler-times {\mbl .__divdi3\M} 2   { target { ilp32 
> &&  powerpc*-*-aix* } } } } */
> +/* { dg-final { scan-assembler-times {\mbl .__udivdi3\M} 2  { target { ilp32 
> &&  powerpc*-*-aix* } } } } */
> +/* { dg-final { scan-assembler-times {\mbl __divdi3\M} 2  { target { ilp32 
> && powerpc*-*-linux* } } } } */
> +/* { dg-final { scan-assembler-times {\mbl __udivdi3\M} 2 { target { ilp32 
> && powerpc*-*-linux* } } } } */

Those "." are the dot in a regexp, i.e. they match any character.

You could do something like

/* { dg-final { scan-assembler-times {\mbl \.?__divdi3\M} 2   { target { ilp32 
} } } } */
/* { dg-final { scan-assembler-times {\mbl \.?__udivdi3\M} 2  { target { ilp32 
} } } } */

to handle both ABIs with and without dots.

But, current patch is okay for trunk as well.  Thanks!


Segher

[PATCH 1/2] rs6000: Generate rl*imi for memory some more

2018-07-23 Thread Segher Boessenkool

An rlimi instruction is often written like "(a << 8) | (b & 255)".
If "b" now is a byte in memory, combine will combine the load with the
masking (with 255 in the example), since that is a single instruction;
and then the rl*imi isn't combined from the remaining pieces.

This patch adds a splitter to make combine handle this case.
Tested on powerpc64-linux {-m32,-m64} and on powerpc64le-linux; committing.


Segher


2018-07-23  Segher Boessenkool  

* config/rs6000/rs6000.md (splitters for rldimi and rlwimi with the
zero_extend argument from memory): New.

---
 gcc/config/rs6000/rs6000.md | 41 +
 1 file changed, 41 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 94a0f7d..68ba5fd 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4065,6 +4065,47 @@ (define_insn_and_split "*ior_mask"
(set_attr "length" "8")])
 
 
+; Yet another case is an rldimi with the second value coming from memory.
+; The zero_extend that should become part of the rldimi is merged into the
+; load from memory instead.  Split things properly again.
+(define_split
+  [(set (match_operand:DI 0 "gpc_reg_operand")
+   (ior:DI (ashift:DI (match_operand:DI 1 "gpc_reg_operand")
+  (match_operand:SI 2 "const_int_operand"))
+   (zero_extend:DI (match_operand:QHSI 3 "memory_operand"]
+  "INTVAL (operands[2]) == "
+  [(set (match_dup 4)
+   (zero_extend:DI (match_dup 3)))
+   (set (match_dup 0)
+   (ior:DI (and:DI (match_dup 4)
+   (match_dup 5))
+   (ashift:DI (match_dup 1)
+  (match_dup 2]
+{
+  operands[4] = gen_reg_rtx (DImode);
+  operands[5] = GEN_INT ((HOST_WIDE_INT_1U << ) - 1);
+})
+
+; rlwimi, too.
+(define_split
+  [(set (match_operand:SI 0 "gpc_reg_operand")
+   (ior:SI (ashift:SI (match_operand:SI 1 "gpc_reg_operand")
+  (match_operand:SI 2 "const_int_operand"))
+   (zero_extend:SI (match_operand:QHI 3 "memory_operand"]
+  "INTVAL (operands[2]) == "
+  [(set (match_dup 4)
+   (zero_extend:SI (match_dup 3)))
+   (set (match_dup 0)
+   (ior:SI (and:SI (match_dup 4)
+   (match_dup 5))
+   (ashift:SI (match_dup 1)
+  (match_dup 2]
+{
+  operands[4] = gen_reg_rtx (SImode);
+  operands[5] = GEN_INT ((HOST_WIDE_INT_1U << ) - 1);
+})
+
+
 ;; Now the simple shifts.
 
 (define_insn "rotl3"
-- 
1.8.3.1

[PATCH 2/2] rs6000: Improve vsx_init_v4si

2018-07-23 Thread Segher Boessenkool

This changes vsx_init_v4si to be an expander.  That way, no special
cases are needed anymore for special arguments: the normal RTL passes
can deal with it.

Tested as usual; committing.


Segher


2018-07-23  Segher Boessenkool  

* config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Adjust.
* config/rs6000/rs6000-protos.h (rs6000_split_v4si_init): Delete.
* config/rs6000/rs6000.c (rs6000_expand_vector_init): Always force
the elements into a register.
(rs6000_split_v4si_init_di_reg): Delete.
(rs6000_split_v4si_init): Delete.
* config/rs6000/vsx.md (unspec): Delete UNSPEC_VSX_VEC_INIT.
(vsx_init_v4si): Rewrite as a define_expand.

---
 gcc/config/rs6000/rs6000-p8swap.c |  1 -
 gcc/config/rs6000/rs6000-protos.h |  1 -
 gcc/config/rs6000/rs6000.c| 92 +--
 gcc/config/rs6000/vsx.md  | 49 ++---
 4 files changed, 34 insertions(+), 109 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-p8swap.c 
b/gcc/config/rs6000/rs6000-p8swap.c
index 071bc0c..f32db38 100644
--- a/gcc/config/rs6000/rs6000-p8swap.c
+++ b/gcc/config/rs6000/rs6000-p8swap.c
@@ -772,7 +772,6 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
  case UNSPEC_VSX_EXTRACT:
  case UNSPEC_VSX_SET:
  case UNSPEC_VSX_SLDWI:
- case UNSPEC_VSX_VEC_INIT:
  case UNSPEC_VSX_VSLO:
  case UNSPEC_VUNPACK_HI_SIGN:
  case UNSPEC_VUNPACK_HI_SIGN_DIRECT:
diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index d548d80..9dec245 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -61,7 +61,6 @@ extern void rs6000_expand_vector_set (rtx, rtx, int);
 extern void rs6000_expand_vector_extract (rtx, rtx, rtx);
 extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
 extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
-extern void rs6000_split_v4si_init (rtx []);
 extern void altivec_expand_vec_perm_le (rtx op[4]);
 extern void rs6000_expand_extract_even (rtx, rtx, rtx);
 extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 62b8ea3..8f65a9f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6867,11 +6867,7 @@ rs6000_expand_vector_init (rtx target, rtx vals)
  size_t i;
 
  for (i = 0; i < 4; i++)
-   {
- elements[i] = XVECEXP (vals, 0, i);
- if (!CONST_INT_P (elements[i]) && !REG_P (elements[i]))
-   elements[i] = copy_to_mode_reg (SImode, elements[i]);
-   }
+   elements[i] = force_reg (SImode, XVECEXP (vals, 0, i));
 
  emit_insn (gen_vsx_init_v4si (target, elements[0], elements[1],
elements[2], elements[3]));
@@ -7578,92 +7574,6 @@ rs6000_split_vec_extract_var (rtx dest, rtx src, rtx 
element, rtx tmp_gpr,
 gcc_unreachable ();
  }
 
-/* Helper function for rs6000_split_v4si_init to build up a DImode value from
-   two SImode values.  */
-
-static void
-rs6000_split_v4si_init_di_reg (rtx dest, rtx si1, rtx si2, rtx tmp)
-{
-  const unsigned HOST_WIDE_INT mask_32bit = HOST_WIDE_INT_C (0x);
-
-  if (CONST_INT_P (si1) && CONST_INT_P (si2))
-{
-  unsigned HOST_WIDE_INT const1 = (UINTVAL (si1) & mask_32bit) << 32;
-  unsigned HOST_WIDE_INT const2 = UINTVAL (si2) & mask_32bit;
-
-  emit_move_insn (dest, GEN_INT (const1 | const2));
-  return;
-}
-
-  /* Put si1 into upper 32-bits of dest.  */
-  if (CONST_INT_P (si1))
-emit_move_insn (dest, GEN_INT ((UINTVAL (si1) & mask_32bit) << 32));
-  else
-{
-  /* Generate RLDIC.  */
-  rtx si1_di = gen_rtx_REG (DImode, regno_or_subregno (si1));
-  rtx shift_rtx = gen_rtx_ASHIFT (DImode, si1_di, GEN_INT (32));
-  rtx mask_rtx = GEN_INT (mask_32bit << 32);
-  rtx and_rtx = gen_rtx_AND (DImode, shift_rtx, mask_rtx);
-  gcc_assert (!reg_overlap_mentioned_p (dest, si1));
-  emit_insn (gen_rtx_SET (dest, and_rtx));
-}
-
-  /* Put si2 into the temporary.  */
-  gcc_assert (!reg_overlap_mentioned_p (dest, tmp));
-  if (CONST_INT_P (si2))
-emit_move_insn (tmp, GEN_INT (UINTVAL (si2) & mask_32bit));
-  else
-emit_insn (gen_zero_extendsidi2 (tmp, si2));
-
-  /* Combine the two parts.  */
-  emit_insn (gen_iordi3 (dest, dest, tmp));
-  return;
-}
-
-/* Split a V4SI initialization.  */
-
-void
-rs6000_split_v4si_init (rtx operands[])
-{
-  rtx dest = operands[0];
-
-  /* Destination is a GPR, build up the two DImode parts in place.  */
-  if (REG_P (dest) || SUBREG_P (dest))
-{
-  int d_regno = regno_or_subregno (dest);
-  rtx scalar1 = operands[1];
-  rtx scalar2 = operands[2];
-  rtx scalar3 = operands[3];
-  rtx scalar4 = operands[4];
-  rtx tmp1 = operands[5];
-  rt

[PATCH] combine: Allow combining two insns to two insns

2018-07-24 Thread Segher Boessenkool

This patch allows combine to combine two insns into two.  This helps
in many cases, by reducing instruction path length, and also allowing
further combinations to happen.  PR85160 is a typical example of code
that it can improve.

This patch does not allow such combinations if either of the original
instructions was a simple move instruction.  In those cases combining
the two instructions increases register pressure without improving the
code.  With this move test register pressure does no longer increase
noticably as far as I can tell.

(At first I also didn't allow either of the resulting insns to be a
move instruction.  But that is actually a very good thing to have, as
should have been obvious).

Tested for many months; tested on about 30 targets.

I'll commit this later this week if there are no objections.


Segher


2018-07-24  Segher Boessenkool  

PR rtl-optimization/85160
* combine.c (is_just_move): New function.
(try_combine): Allow combining two instructions into two if neither of
the original instructions was a move.

---
 gcc/combine.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index cfe0f19..d64e84d 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2604,6 +2604,17 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n)
   return true;
 }
 
+/* Return whether X is just a single set, with the source
+   a general_operand.  */
+static bool
+is_just_move (rtx x)
+{
+  if (INSN_P (x))
+x = PATTERN (x);
+
+  return (GET_CODE (x) == SET && general_operand (SET_SRC (x), VOIDmode));
+}
+
 /* Try to combine the insns I0, I1 and I2 into I3.
Here I0, I1 and I2 appear earlier than I3.
I0 and I1 can be zero; then we combine just I2 into I3, or I1 and I2 into
@@ -2668,6 +2679,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   int swap_i2i3 = 0;
   int split_i2i3 = 0;
   int changed_i3_dest = 0;
+  bool i2_was_move = false, i3_was_move = false;
 
   int maxreg;
   rtx_insn *temp_insn;
@@ -3059,6 +3071,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   return 0;
 }
 
+  /* Record whether i2 and i3 are trivial moves.  */
+  i2_was_move = is_just_move (i2);
+  i3_was_move = is_just_move (i3);
+
   /* Record whether I2DEST is used in I2SRC and similarly for the other
  cases.  Knowing this will help in register status updating below.  */
   i2dest_in_i2src = reg_overlap_mentioned_p (i2dest, i2src);
@@ -4014,8 +4030,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   && XVECLEN (newpat, 0) == 2
   && GET_CODE (XVECEXP (newpat, 0, 0)) == SET
   && GET_CODE (XVECEXP (newpat, 0, 1)) == SET
-  && (i1 || set_noop_p (XVECEXP (newpat, 0, 0))
- || set_noop_p (XVECEXP (newpat, 0, 1)))
+  && (i1
+  || set_noop_p (XVECEXP (newpat, 0, 0))
+  || set_noop_p (XVECEXP (newpat, 0, 1))
+  || (!i2_was_move && !i3_was_move))
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
-- 
1.8.3.1

Re: [5/5] C-SKY port: libgcc

2018-07-24 Thread Segher Boessenkool

On Mon, Jul 23, 2018 at 10:26:35PM -0600, Sandra Loosemore wrote:
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 18cabaf..b2ee0c9 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -94,6 +94,9 @@ am33_2.0-*-linux*)
>  arc*-*-*)
>   cpu_type=arc
>   ;;
> +csky*-*-*)
> + cpu_type=csky
> + ;;
>  arm*-*-*)
>   cpu_type=arm
>   ;;

This long list was alphabetic before (except x86_64 and tic6x, alas);
let's not make things worse?


Segher

Re: [5/5] C-SKY port: libgcc

2018-07-24 Thread Segher Boessenkool

On Tue, Jul 24, 2018 at 12:19:30PM -0600, Sandra Loosemore wrote:
> On 07/24/2018 12:10 PM, Segher Boessenkool wrote:
> >On Mon, Jul 23, 2018 at 10:26:35PM -0600, Sandra Loosemore wrote:
> >>diff --git a/libgcc/config.host b/libgcc/config.host
> >>index 18cabaf..b2ee0c9 100644
> >>--- a/libgcc/config.host
> >>+++ b/libgcc/config.host
> >>@@ -94,6 +94,9 @@ am33_2.0-*-linux*)
> >>  arc*-*-*)
> >>cpu_type=arc
> >>;;
> >>+csky*-*-*)
> >>+   cpu_type=csky
> >>+   ;;
> >>  arm*-*-*)
> >>cpu_type=arm
> >>;;
> >
> >This long list was alphabetic before (except x86_64 and tic6x, alas);
> >let's not make things worse?
> 
> Oops!  Good catch on that.  I'll take care of it.

Thanks!

Rest looks fine fwiw (I just skimmed it, and I cannot read .gz files
without some effort, so take it for what it's worth: not too much ;-) ).


Segher

Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-25 Thread Segher Boessenkool

On Wed, Jul 25, 2018 at 10:28:30AM +0200, Richard Biener wrote:
> On Tue, Jul 24, 2018 at 7:18 PM Segher Boessenkool
>  wrote:
> >
> > This patch allows combine to combine two insns into two.  This helps
> > in many cases, by reducing instruction path length, and also allowing
> > further combinations to happen.  PR85160 is a typical example of code
> > that it can improve.
> >
> > This patch does not allow such combinations if either of the original
> > instructions was a simple move instruction.  In those cases combining
> > the two instructions increases register pressure without improving the
> > code.  With this move test register pressure does no longer increase
> > noticably as far as I can tell.
> >
> > (At first I also didn't allow either of the resulting insns to be a
> > move instruction.  But that is actually a very good thing to have, as
> > should have been obvious).
> >
> > Tested for many months; tested on about 30 targets.
> >
> > I'll commit this later this week if there are no objections.
> 
> Sounds good - but, _any_ testcase?  Please! ;)

I only have target-specific ones.  Most *simple* ones will already be
optimised by current code (via 3->2 combination).  But I've now got one
that trunk does not optimise, and it can be confirmed with looking at
the resulting machine code even (no need to look at the combine dump,
which is a very good thing).  And it is a proper thing to test even: it
tests that some source is compiled to properly optimised machine code.

Any other kind of testcase is worse than useless, of course.

Testing it results in working code isn't very feasible or useful either.

Segher

Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-25 Thread Segher Boessenkool

On Wed, Jul 25, 2018 at 09:47:31AM -0400, David Malcolm wrote:
> > +/* Return whether X is just a single set, with the source
> > +   a general_operand.  */
> > +static bool
> > +is_just_move (rtx x)
> > +{
> > +  if (INSN_P (x))
> > +x = PATTERN (x);
> > +
> > +  return (GET_CODE (x) == SET && general_operand (SET_SRC (x),
> > VOIDmode));
> > +}
> 
> If I'm reading it right, the patch only calls this function on i2 and
> i3, which are known to be rtx_insn *, rather than just rtx.

I used to also have

  is_just_move (XVECEXP (newpat, 0, 0))

etc.; during most of combine you do not have instructions, just patterns.


Segher

[PATCH] combine: Another hard register problem (PR85805)

2018-07-26 Thread Segher Boessenkool

The current code in reg_nonzero_bits_for_combine allows using the
reg_stat info when last_set_mode is a different integer mode.  This is
completely wrong for non-pseudos.  For example, as in the PR, a value
in a DImode hard register is set by eight writes to its constituent
QImode parts.  The value written to the DImode is not the same as that
written to the lowest-numbered QImode!

This patch fixes it.  Committing.  Will backport later, too.


Segher


2018-07-26  Segher Boessenkool  

PR rtl-optimization/85805
* combine.c (reg_nonzero_bits_for_combine): Only use the last set
value for hard registers if that was written in the same mode.

---
 gcc/combine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 09cbad4..fe71f3a 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -10186,7 +10186,8 @@ reg_nonzero_bits_for_combine (const_rtx x, 
scalar_int_mode xmode,
   rsp = ®_stat[REGNO (x)];
   if (rsp->last_set_value != 0
   && (rsp->last_set_mode == mode
- || (GET_MODE_CLASS (rsp->last_set_mode) == MODE_INT
+ || (REGNO (x) >= FIRST_PSEUDO_REGISTER
+ && GET_MODE_CLASS (rsp->last_set_mode) == MODE_INT
  && GET_MODE_CLASS (mode) == MODE_INT))
   && ((rsp->last_set_label >= label_tick_ebb_start
   && rsp->last_set_label < label_tick)
-- 
1.8.3.1

Re: [PATCH] combine: Another hard register problem (PR85805)

2018-07-26 Thread Segher Boessenkool

On Thu, Jul 26, 2018 at 01:16:42PM +0200, Richard Biener wrote:
> On Thu, Jul 26, 2018 at 12:12 PM Segher Boessenkool
>  wrote:
> >
> > The current code in reg_nonzero_bits_for_combine allows using the
> > reg_stat info when last_set_mode is a different integer mode.  This is
> > completely wrong for non-pseudos.  For example, as in the PR, a value
> > in a DImode hard register is set by eight writes to its constituent
> > QImode parts.  The value written to the DImode is not the same as that
> > written to the lowest-numbered QImode!
> >
> > This patch fixes it.  Committing.  Will backport later, too.
> 
> testcase?

Feel free to write one?


Segher

Re: Fwd: [PATCH, rs6000] Replace __uint128_t and __int128_t with uint128 and int128 in Power PC built-in documentation

2018-07-26 Thread Segher Boessenkool

On Thu, Jul 26, 2018 at 08:40:01AM -0500, Kelvin Nilsen wrote:
> To improve internal consistency and to improve consistency with published ABI 
> documents, this patch replaces the __uint128_t type with __uint128 and 
> replaces __int128_t with __int128.

> Is this ok for trunk?

Looks good, thanks!  Most (all?) of these functions are not documented
in the ABI, but this is a step forward anyway.  Okay for trunk.

What do things like error messages involving these functions look like?
What types do those say?

Segher

Re: [PATCH rs6000] Fix PR86612

2018-07-26 Thread Segher Boessenkool

On Thu, Jul 26, 2018 at 02:53:09PM -0500, Pat Haugen wrote:
> Probably an obvious patch but...
> 
> The testcase fails because it looks like recent glibc headers (2.27 at least) 
> no longer contain a declaration for __strdup, which results in warning 
> messages being generated and failure for excess errors. Fixed by calling the 
> standard name.
> 
> Verified the testcase now passes, ok for trunk?

I thought you pasted this strangely, but nope, the testcase is really
like that.

Okay for trunk and all branches.  Thanks!


Segher

[PATCH] arm: Generate correct const_ints (PR86640)

2018-07-30 Thread Segher Boessenkool

In arm_block_set_aligned_vect 8-bit constants are generated as zero-
extended const_ints, not sign-extended as required.  Fix that.

Tamar tested the patch (see PR); no problems were found.  Is this okay
for trunk?


Segher


2018-07-30  Segher Boessenkool  

PR target/86640
* config/arm/arm.c (arm_block_set_aligned_vect): Use gen_int_mode
instead of GEN_INT.

---
 gcc/config/arm/arm.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cf12ace..f5eece4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -30046,7 +30046,6 @@ arm_block_set_aligned_vect (rtx dstbase,
   rtx dst, addr, mem;
   rtx val_vec, reg;
   machine_mode mode;
-  unsigned HOST_WIDE_INT v = value;
   unsigned int offset = 0;
 
   gcc_assert ((align & 0x3) == 0);
@@ -30065,10 +30064,8 @@ arm_block_set_aligned_vect (rtx dstbase,
 
   dst = copy_addr_to_reg (XEXP (dstbase, 0));
 
-  v = sext_hwi (v, BITS_PER_WORD);
-
   reg = gen_reg_rtx (mode);
-  val_vec = gen_const_vec_duplicate (mode, GEN_INT (v));
+  val_vec = gen_const_vec_duplicate (mode, gen_int_mode (value, QImode));
   /* Emit instruction loading the constant value.  */
   emit_move_insn (reg, val_vec);
 
-- 
1.8.3.1

[PATCH] testcase for 2-2 combine

2018-07-30 Thread Segher Boessenkool

Committing.


Segher


2018-07-30  Segher Boessenkool  

gcc/testsuite/
PR rtl-optimization/85160
* gcc.target/powerpc/combine-2-2.c: New testcase.

---
 gcc/testsuite/gcc.target/powerpc/combine-2-2.c | 17 +
 1 file changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/combine-2-2.c

diff --git a/gcc/testsuite/gcc.target/powerpc/combine-2-2.c 
b/gcc/testsuite/gcc.target/powerpc/combine-2-2.c
new file mode 100644
index 000..234476d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/combine-2-2.c
@@ -0,0 +1,17 @@
+/* { dg-options "-O2" } */
+
+/* PR85160 */
+
+/* Originally, the "x >> 14" are CSEd away (eventually becoming a srawi
+   instruction), and the two ANDs remain separate instructions because
+   combine cannot deal with this.
+
+   Now that combine knows how to combine two RTL insns into two, it manages
+   to make this just the sum of two rlwinm instructions.  */
+
+int f(int x)
+{
+  return ((x >> 14) & 6) + ((x >> 14) & 4);
+}
+
+/* { dg-final { scan-assembler-not {\msrawi\M} } } */
-- 
1.8.3.1

Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-30 Thread Segher Boessenkool

On Tue, Jul 24, 2018 at 05:18:41PM +, Segher Boessenkool wrote:
> This patch allows combine to combine two insns into two.  This helps
> in many cases, by reducing instruction path length, and also allowing
> further combinations to happen.  PR85160 is a typical example of code
> that it can improve.
> 
> This patch does not allow such combinations if either of the original
> instructions was a simple move instruction.  In those cases combining
> the two instructions increases register pressure without improving the
> code.  With this move test register pressure does no longer increase
> noticably as far as I can tell.
> 
> (At first I also didn't allow either of the resulting insns to be a
> move instruction.  But that is actually a very good thing to have, as
> should have been obvious).
> 
> Tested for many months; tested on about 30 targets.
> 
> I'll commit this later this week if there are no objections.

Done now, with the testcase at 
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01856.html .


Segher

Re: [PATCH] arm: Generate correct const_ints (PR86640)

2018-07-30 Thread Segher Boessenkool

On Mon, Jul 30, 2018 at 03:55:30PM +0100, Kyrill Tkachov wrote:
> Hi Segher,
> 
> On 30/07/18 14:14, Segher Boessenkool wrote:
> >In arm_block_set_aligned_vect 8-bit constants are generated as zero-
> >extended const_ints, not sign-extended as required.  Fix that.
> >
> >Tamar tested the patch (see PR); no problems were found.  Is this okay
> >for trunk?
> >
> 
> The patch is okay but please add the testcase from the PR to gcc.dg/
> or somewhere else generic (it's not arm-specific).

It only failed with very specific options, which aren't valid on every
ARM config either I think?

-O3 -mfpu=neon -mfloat-abi=hard -march=armv7-a

I don't know the magic incantations for ARM tests, sorry.


Segher

Re: Fwd: [PATCH, rs6000] Replace __uint128_t and __int128_t with uint128 and int128 in Power PC built-in documentation

2018-07-30 Thread Segher Boessenkool

On Fri, Jul 27, 2018 at 10:07:20AM -0500, Kelvin Nilsen wrote:
> On 7/26/18 9:54 AM, Segher Boessenkool wrote:
> > On Thu, Jul 26, 2018 at 08:40:01AM -0500, Kelvin Nilsen wrote:
> >> To improve internal consistency and to improve consistency with published 
> >> ABI documents, this patch replaces the __uint128_t type with __uint128 and 
> >> replaces __int128_t with __int128.
> > 
> >> Is this ok for trunk?
> > 
> > Looks good, thanks!  Most (all?) of these functions are not documented
> > in the ABI, but this is a step forward anyway.  Okay for trunk.
> > 
> > What do things like error messages involving these functions look like?
> > What types do those say?
> 
> Thanks for review and approval.  To respond to your question about error 
> messages:
> > 
> > microdoc3.c:22:3: error: invalid parameter combination for AltiVec 
> > intrinsic ‘__builtin_vec_vaddcuq’
> >u1 = vec_vaddcuq (d2, d3);
> >^~

Ah, so no type at all.  Well that's good :-)  Thanks,


Segher

Re: [PATCH] arm: Generate correct const_ints (PR86640)

2018-07-31 Thread Segher Boessenkool

On Tue, Jul 31, 2018 at 09:02:56AM +0100, Kyrill Tkachov wrote:
> Hi Segher,
> 
> On 30/07/18 18:37, Segher Boessenkool wrote:
> >On Mon, Jul 30, 2018 at 03:55:30PM +0100, Kyrill Tkachov wrote:
> >>Hi Segher,
> >>
> >>On 30/07/18 14:14, Segher Boessenkool wrote:
> >>>In arm_block_set_aligned_vect 8-bit constants are generated as zero-
> >>>extended const_ints, not sign-extended as required.  Fix that.
> >>>
> >>>Tamar tested the patch (see PR); no problems were found.  Is this okay
> >>>for trunk?
> >>>
> >>The patch is okay but please add the testcase from the PR to gcc.dg/
> >>or somewhere else generic (it's not arm-specific).
> >It only failed with very specific options, which aren't valid on every
> >ARM config either I think?
> >
> >-O3 -mfpu=neon -mfloat-abi=hard -march=armv7-a
> 
> This is a fairly common Arm configuration that is tested by many people by 
> default,
> so you wouldn't need to add any arm-specific directives. Just putting it in 
> one of
> the generic C tests folders would be enough.

Ah ok, so just dg-options -O3 will do?

(As before, I have no reasonable way to test this).


Segher

[PATCH] arm: Testcase for PR86640

2018-07-31 Thread Segher Boessenkool

Hi Kyrill,

As before, untested.  Is this okay for trunk, or will you handle it
yourself (or will someone else do it?)


Segher


2018-07-31  Segher Boessenkool  

gcc/testsuite/
PR target/86640
* gcc.target/arm/pr86640.c: New testcase.

---
 gcc/testsuite/gcc.target/arm/pr86640.c | 10 ++
 1 file changed, 10 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr86640.c

diff --git a/gcc/testsuite/gcc.target/arm/pr86640.c 
b/gcc/testsuite/gcc.target/arm/pr86640.c
new file mode 100644
index 000..e104602
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr86640.c
@@ -0,0 +1,10 @@
+/* { dg-options "-O3" } */
+
+/* This ICEd with  -O3 -mfpu=neon -mfloat-abi=hard -march=armv7-a  .  */
+
+char fn1() {
+  long long b[5];
+  for (int a = 0; a < 5; a++)
+b[a] = ~0ULL;
+  return b[3];
+}
-- 
1.8.3.1

Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-31 Thread Segher Boessenkool

Hi Christophe,

On Tue, Jul 31, 2018 at 02:34:06PM +0200, Christophe Lyon wrote:
> Since this was committed, I've noticed regressions
> on aarch64:
> FAIL: gcc.dg/zero_bits_compound-1.c scan-assembler-not \\(and:

This went from
and w0, w0, 255
lsl w1, w0, 8
orr w0, w1, w0, lsl 20
ret
to
and w1, w0, 255
ubfiz   w0, w0, 8, 8
orr w0, w0, w1, lsl 20
ret
so it's neither an improvement nor a regression, just different code.
The testcase wants no ANDs in the RTL.


> on arm-none-linux-gnueabi
> FAIL: gfortran.dg/actual_array_constructor_1.f90   -O1  execution test

That sounds bad.  Open a PR, maybe?


> On aarch64, I've also noticed a few others regressions but I'm not yet
> 100% sure it's caused by this patch (bisect running):
> gcc.target/aarch64/ashltidisi.c scan-assembler-times asr 4

 ushift_53_i:
-   uxtwx1, w0
-   lsl x0, x1, 53
-   lsr x1, x1, 11
+   lsr w1, w0, 11
+   lsl x0, x0, 53
ret

 shift_53_i:
-   sxtwx1, w0
-   lsl x0, x1, 53
-   asr x1, x1, 11
+   sbfxx1, x0, 11, 21
+   lsl x0, x0, 53
ret

Both are improvements afais.  The number of asr insns changes, sure.


> gcc.target/aarch64/sve/var_stride_2.c -march=armv8.2-a+sve
> scan-assembler-times \\tadd\\tx[0-9]+, x[0-9]+, x[0-9]+, lsl 10\\n 2

Skipping all the SVE tests, sorry.  Richard says they look like
improvements, and exactly of the expected kind.  :-)


Segher

Re: [PATCH] combine: Allow combining two insns to two insns

2018-07-31 Thread Segher Boessenkool

On Tue, Jul 31, 2018 at 05:39:37AM -0700, H.J. Lu wrote:
> For
> 
> ---
> #define N 16
> float f[N];
> double d[N];
> int n[N];
> 
> __attribute__((noinline)) void
> f3 (void)
> {
>   int i;
>   for (i = 0; i < N; i++)
> d[i] = f[i];
> }
> ---
> 
> r263067 improved -O3 -mavx2 -mtune=generic -m64 from
> 
> .cfi_startproc
> vmovaps f(%rip), %xmm2
> vmovaps f+32(%rip), %xmm3
> vinsertf128 $0x1, f+16(%rip), %ymm2, %ymm0
> vcvtps2pd %xmm0, %ymm1
> vextractf128 $0x1, %ymm0, %xmm0
> vmovaps %xmm1, d(%rip)
> vextractf128 $0x1, %ymm1, d+16(%rip)
> vcvtps2pd %xmm0, %ymm0
> vmovaps %xmm0, d+32(%rip)
> vextractf128 $0x1, %ymm0, d+48(%rip)
> vinsertf128 $0x1, f+48(%rip), %ymm3, %ymm0
> vcvtps2pd %xmm0, %ymm1
> vextractf128 $0x1, %ymm0, %xmm0
> vmovaps %xmm1, d+64(%rip)
> vextractf128 $0x1, %ymm1, d+80(%rip)
> vcvtps2pd %xmm0, %ymm0
> vmovaps %xmm0, d+96(%rip)
> vextractf128 $0x1, %ymm0, d+112(%rip)
> vzeroupper
> ret
> .cfi_endproc
> 
> to
> 
> .cfi_startproc
> vcvtps2pd f(%rip), %ymm0
> vmovaps %xmm0, d(%rip)
> vextractf128 $0x1, %ymm0, d+16(%rip)
> vcvtps2pd f+16(%rip), %ymm0
> vmovaps %xmm0, d+32(%rip)
> vextractf128 $0x1, %ymm0, d+48(%rip)
> vcvtps2pd f+32(%rip), %ymm0
> vextractf128 $0x1, %ymm0, d+80(%rip)
> vmovaps %xmm0, d+64(%rip)
> vcvtps2pd f+48(%rip), %ymm0
> vextractf128 $0x1, %ymm0, d+112(%rip)
> vmovaps %xmm0, d+96(%rip)
> vzeroupper
> ret
> .cfi_endproc

I cannot really read AVX, but that looks like better code alright :-)


Segher

Re: [PATCH 11/11] rs6000 - add speculation_barrier pattern

2018-07-31 Thread Segher Boessenkool

On Tue, Jul 31, 2018 at 05:01:02PM -0500, Bill Schmidt wrote:
> > On Jul 27, 2018, at 4:37 AM, Richard Earnshaw  
> > wrote:
> > This patch reworks the existing rs6000_speculation_barrier pattern to
> > work with the new __builtin_sepculation_safe_value() intrinsic.  The
> > change is trivial as it simply requires renaming the existing speculation
> > barrier pattern.
> > 
> > So the total patch is to delete 14 characters!

> I can't ack the patch, but I am happy with it.  Thank you for this work!

Looks fine to me, too.  I'm sure someone has tested it by now, too ;-)
Okay for trunk.  Thanks!


Segher


> > * config/rs6000/rs6000.md (speculation_barrier): Renamed from
> > rs6000_speculation_barrier.
> > * config/rs6000/rs6000.c (rs6000_expand_builtin): Adjust for
> > new barrier pattern name.

Re: [PATCH] combine: Allow combining two insns to two insns

2018-08-01 Thread Segher Boessenkool

On Wed, Aug 01, 2018 at 10:27:31AM +0200, Christophe Lyon wrote:
> On Tue, 31 Jul 2018 at 15:57, Segher Boessenkool
>  wrote:
> > On Tue, Jul 31, 2018 at 02:34:06PM +0200, Christophe Lyon wrote:
> > > Since this was committed, I've noticed regressions
> > > on aarch64:
> > > FAIL: gcc.dg/zero_bits_compound-1.c scan-assembler-not \\(and:
> >
> > This went from
> > and w0, w0, 255
> > lsl w1, w0, 8
> > orr w0, w1, w0, lsl 20
> > ret
> > to
> > and w1, w0, 255
> > ubfiz   w0, w0, 8, 8
> > orr w0, w0, w1, lsl 20
> > ret
> > so it's neither an improvement nor a regression, just different code.
> > The testcase wants no ANDs in the RTL.
> 
> I didn't try to manually regenerate the code before and after the patch,
> but if there was "and w0, w0, 255" before the patch, why did the test 
> pass?

It wasn't an AND in RTL (it was a ZERO_EXTEND).

> > > on arm-none-linux-gnueabi
> > > FAIL: gfortran.dg/actual_array_constructor_1.f90   -O1  execution test
> >
> > That sounds bad.  Open a PR, maybe?
> >
> I've just filed PR86771

Thanks.


Segher

Re: [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function.

2023-05-26 Thread Segher Boessenkool

Hi Mike,

On Wed, May 10, 2023 at 11:38:55AM -0400, Michael Meissner wrote:
> This patch rewrites the gen_ld_cmpi_p10 function in genfusion.pl to be 
> clearer.

That is not at all what I asked for, even if I would agree the code is
nicer to read now (I don't).

What I asked for, what is needed, is for your patches to be readable.
This is a prerequisite for them to be reviewable, which is a
prerequisite for them to be approvable.  One way to do that is to split
out refactorings (which I asked for) and rewrites (which you did) to
earlier patches in the series.  Pure refactoring are easy to review:
they change exactly nothing in what code is executed.  Rewrites are much
harder to review.  But even then we can hope you didn't slip up once in
a hundred lines of code, sure.

The later patches can then be much more readable because there isn't so
much noise mixed in.

> Assuming I can check this in, I will
> also commit to the active GCC branches after a burn-in period.

No, you will never do that.  You always need approval for that.  We have
these procedures for a reason.  We do not want other things than what
was approved committed, doubly so if *nothing* was approved.

>   * config/rs6000/genfusion.pl (mode_to_ldst_char): Delete.

This is a regression.

> +# Print the insns for load and compare with -1/0/1.
> +# Arguments:
> +# lmode  -- Integer mode ("DI", "SI", "HI", or "QI").
> +# result -- "clobber", "GPR", or $lmode
> +# ccmode -- Sign vs. unsigned ("CC" or "CCUNS").
> +# mem_format -- Memory format ("d" or "ds").
> +# cmpl   -- Suffix for compare ("l" or "")
> +# const_pred -- Predicate for constant (i.e. -1/0/1 or 0/1).
> +# extend -- "sign", "zero", or "none".
> +# echr   -- Suffix for load ("a", "z", or "").
> +# load   -- Load instruction (i.e. "ld", "lwa", "lwz", etc.)
> +# np -- enum non_prefixed_form for memory type
> +# constraint -- constraint to use
> +# mem_pred   -- predicate for the memory operation

If you need a huge block comment for your sub argument, that is a
not-so-subtle hint that you need to refactor.  Or if this was supposed
to be a refactoring, that something went terribly wrong :-(

Segher

Re: [PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion.

2023-05-26 Thread Segher Boessenkool

On Wed, May 10, 2023 at 11:40:00AM -0400, Michael Meissner wrote:
> This patch applies stricter predicates and constraints for LD and LWA
> instructions with power10 fusion.  These instructions are DS-form 
> instructions,
> which means that the bottom 2 bits of the address must be 0.

The low two bits of the offset, yes.

> --- a/gcc/config/rs6000/genfusion.pl
> +++ b/gcc/config/rs6000/genfusion.pl
> @@ -129,6 +129,12 @@ sub print_ld_cmpi_p10
>print "  \"\"\n";
>print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
>print "   (set_attr \"cost\" \"8\")\n";
> +
> +  if ($extend eq "sign")
> +{
> +  print "   (set_attr \"sign_extend\" \"yes\")\n";
> +}

You never ever need backslashes like this in Perl code, btw.  For
example:
  print qq{   (set_attr "sign_extend" "yes")\n};
or
  print qq/   (set_attr "sign_extend" "yes")\n/;
or
print <<"HERE"
   (set_attr "sign_extend" "yes")
HERE
or millions of other ways, all of which are much nicer than cramped code
that tries to look like C (but has very different semantics in all ways
that matter).  (Also zillions of ways that are worse still, but that is
the price of freedom maybe :-) )

> -  # Memory predicate to use.
> +  # Memory predicate to use.  For LWA, use the special LWA_OPERAND.

Explain *why*?  It is obvious *what*!

Maybe just split the series into more patches?
> @@ -0,0 +1,26 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-require-effective-target power10_ok } */

power10_ok should no longer exist, btw.  Technical debt has to be
repaid :-/

This patch is readable btw.  Thanks :-)

Segher

Re: [PATCH] rs6000: Remove duplicate expression [PR106907]

2023-06-05 Thread Segher Boessenkool

Hi!

On Mon, Jun 05, 2023 at 12:11:42PM +0530, P Jeevitha wrote:
> PR106907 has few warnings spotted from cppcheck. In that addressing duplicate
> expression issue here. Here the same expression is used twice in logical
> AND(&&) operation which result in same result so removing that.
> 
> 2023-06-05  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/106907
>   * config/rs6000/rs6000.cc (vec_const_128bit_to_bytes): Remove
>   duplicate expression.
> 
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 42f49e4a56b..d197c3f3289 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -28784,7 +28784,6 @@ vec_const_128bit_to_bytes (rtx op,
>  
>info->all_words_same
>  = (info->words[0] == info->words[1]
> -   && info->words[0] == info->words[1]
> && info->words[0] == info->words[2]
> && info->words[0] == info->words[3]);

Thanks!  Okay for trunk.  Also okay for all backports, no need to wait
if unexpected problems in trunk show up.  But still, backport to 13
first, then 12, then 11, only stop when it stops applying (or there are
no open release branches left) :-)


Segher

[PATCH 1/2] rs6000: genfusion: Rewrite load/compare code

2023-06-06 Thread Segher Boessenkool

This makes the code more readable, more digestible, more maintainable,
more extensible.  That kind of thing.  It does that by pulling things
apart a bit, but also making what stays together more cohesive lumps.

The original function was a bunch of loops and early-outs, and then
quite a bit of stuff done per iteration, with the iterations essentially
independent of each other.  This patch moves the stuff done for one
iteration to a new _one function.

The second big thing is the stuff printed to the .md file is done in
"here documents" now, which is a lot more readable than having to quote
and escape and double-escape pieces of text.  Whitespace inside the
here-document is significant (will be printed as-is), which is a bit
awkward sometimes, or might take some getting used to, but it is also
one of the benefits of using them.

Local variables are declared at first use (or close to first use).
There also shouldn't be many at all, often you can write easier to
read and manage code by omitting to name something that is hard to name
in the first place.

Finally some things are done in more typical, more modern, and tighter
Perl style, for example REs in "if"s or "qw" for lists of constants.


2023-06-06  Segher Boessenkool  

* config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): New, rewritten and
split out from...
(gen_ld_cmpi_p10): ... this.

---
 gcc/config/rs6000/genfusion.pl | 185 +++--
 1 file changed, 103 insertions(+), 82 deletions(-)

diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index e4db352..2851bb7 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -53,92 +53,113 @@ sub mode_to_ldst_char
 return '?';
 }
 
+sub gen_ld_cmpi_p10_one
+{
+  my ($lmode, $result, $ccmode) = @_;
+
+  my $np = "NON_PREFIXED_D";
+  my $mempred = "non_update_memory_operand";
+  my $extend;
+
+  if ($ccmode eq "CC") {
+# ld and lwa are both DS-FORM.
+($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
+($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+  } else {
+if ($lmode eq "DI") {
+  # ld is DS-form, but lwz is not.
+  $np = "NON_PREFIXED_DS";
+  $mempred = "ds_form_mem_operand";
+}
+  }
+
+  my $cmpl = ($ccmode eq "CC") ? "" : "l";
+  my $echr = ($ccmode eq "CC") ? "a" : "z";
+  if ($lmode eq "DI") { $echr = ""; }
+  my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand"
+   : "const_0_to_1_operand";
+
+  # For clobber, we need a SI/DI reg in case we
+  # split because we have to sign/zero extend.
+  my $clobbermode = ($lmode =~ /^[QH]I$/) ? "GPR" : $lmode;
+  if ($result =~ /^EXT/ || $result eq "GPR" || $clobbermode eq "GPR") {
+# We always need extension if result > lmode.
+$extend = ($ccmode eq "CC") ? "sign" : "zero";
+  } else {
+# Result of SI/DI does not need sign extension.
+$extend = "none";
+  }
+
+  my $ldst = mode_to_ldst_char($lmode);
+  print < lmode.
- if ( $ccmode eq 'CC' ) {
- $extend = "sign";
- } else {
- $extend = "zero";
- }
- } else {
- # Result of SI/DI does not need sign extension.
- $extend = "none";
- }
- print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
- print ";; load mode is $lmode result mode is $result compare mode is 
$ccmode extend is $extend\n";
+  foreach my $lmode (qw/DI SI HI QI/) {
+foreach my $result ("clobber", $lmode,  "EXT$lmode") {
+  # EXTDI does not exist, and we cannot directly produce HI/QI results.
+  next if $result =~ /^(QI|HI|EXTDI)$/;
 
- print "(define_insn_and_split 
\"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
- print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" 
\"=x\")\n";
- print "(compare:${ccmode} (match_operand:${lmode} 1 
\"${mempred}\" \"m\")\n";
- if ($ccmode eq 'CCUNS') { print "   "; }
- print "(match_operand:${lmode} 3 \"${constpred}\" 
\"n\")))\n";
- if ($result eq 'clobber') {
- print "   (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
- } elsif ($result eq $lmode) {
- print "   (set (match_operand:${result} 0 \"gpc_reg_operand\" 
\"=r\") (match_dup 1))

[PATCH 2/2] rs6000: genfusion: Delete dead code

2023-06-06 Thread Segher Boessenkool

2023-06-06  Segher Boessenkool  

* config/rs6000/genfusion.pl: Delete some dead code.

---
 gcc/config/rs6000/genfusion.pl | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 2851bb7..82e8f86 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -347,6 +347,3 @@ EOF
 gen_ld_cmpi_p10();
 gen_logical_addsubf();
 gen_addadd;
-
-exit(0);
-
-- 
1.8.3.1

Re: [PATCH V2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-06-09 Thread Segher Boessenkool

Hi!

On Wed, Jun 07, 2023 at 04:21:11PM +0800, Jiufu Guo wrote:
> This patch tries to optimize "(X - N * M) / N" to "X / N - M".
> For C code, "/" towards zero (trunc_div), and "X - N * M" maybe
> wrap/overflow/underflow. So, it is valid that "X - N * M" does
> not cross zero and does not wrap/overflow/underflow.

Is it ever valid semi-generally when N does not divide X?

Say X=5, N=2, M=3.  Then the original expression evaluates to 0, but the
new one to -1.  Whenever one of the divisions rounds up and the other
rounds down you have this problem.

Segher

Re: [PATCH v2] rs6000: fmr gets used instead of faster xxlor [PR93571]

2023-06-12 Thread Segher Boessenkool

Hi!

On Sat, Feb 25, 2023 at 03:20:33PM +0530, Ajit Agarwal wrote:
> Here is the patch that uses xxlor instead of fmr where possible.
> Performance results shows that fmr is better in power9 and 
> power10 architectures whereas xxlor is better in power7 and
> power 8 architectures. fmr is the only option before p7.

>  ;; The ISA we implement.
> -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
> +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p7p8v,p9,p9v,p9kf,p9tf,p10"
>(const_string "any"))

This isn't really about what insn we *can* use here.

> + (and (eq_attr "isa" "p7p8v")
> +   (match_test "TARGET_VSX && !TARGET_P9_VECTOR"))
> + (const_int 1)

What is needed here is test the *tune* setting.  For example if someone
uses -mcpu=power8 -mtune=power9 (this is a setting that is really used,
or was a few years ago anyway) you *do* want fmr insns generated.

So don't do this via the isa attribute at all, just add some insn
condition (testing the tune setting)?

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-13 Thread Segher Boessenkool

Hi!

On Tue, Jun 13, 2023 at 10:15:49AM +0800, Jiufu Guo wrote:
> David Edelsohn  writes:
> >
> > This definitely seems to be a better solution.
> >
> > The TARGET_CONST_ANCHOR change should not be part of this patch.  Also
> > there is no ChangeLog for the patch.
> 
> Thanks a lot for your quick review!! And sorry for the sending this patch
> in a hurry.  I would update the patch accordingly.

> > This generally looks correct and consistent with other ports. I want
> > to give Segher a chance to double check it, if he wishes.

The documentation is very clear that the only thing for which you can
have BLKmode is "mem".  Not unspec, only "mem".

Let's not do this.  The existing code has clear and obvious semantics,
which is documented as well -- there is no reason to make it worse in
every respect!

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-13 Thread Segher Boessenkool

Hi!

As I said in a reply to the original patch: not okay.  Sorry.

But some comments on this patch:

On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote:
> +   && XINT (SET_SRC (set), 1) == UNSPEC_TIE
> +   && XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);

This makes it required that the operand of an UNSPEC_TIE unspec is a
const_int 0.  This should be documented somewhere.  Ideally you would
want no operand at all here, but every unspec has an operand.

> +  RTVEC_ELT (p, i)
> + = gen_rtx_SET (mem, gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx),
> + UNSPEC_TIE));

If it is hard to indent your code, your code is trying to do to much.
Just have an extra temporary?

  rtx un = gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx), UNSPEC_TIE);
  RTVEC_ELT (p, i) = gen_rtx_SET (mem, un);

That is shorter even, and certainly more readable :-)

> @@ -10828,7 +10829,9 @@ (define_expand "restore_stack_block"
>operands[4] = gen_frame_mem (Pmode, operands[1]);
>p = rtvec_alloc (1);
>RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
> -   const0_rtx);
> +   gen_rtx_UNSPEC (BLKmode,
> +   gen_rtvec (1, const0_rtx),
> +   UNSPEC_TIE));
>operands[5] = gen_rtx_PARALLEL (VOIDmode, p);

I have a hard time to see how this could ever be seen as clearer or more
obvious or anything like that :-(


Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

Hi!

On Wed, Jun 14, 2023 at 05:18:15PM +0800, Xi Ruoyao wrote:
> The generic issue here is to fix (not "papering over") the signed
> overflow, we need to perform the addition in a target machine mode.  We
> may always use Pmode (IIRC const_anchor was introduced for optimizing
> some constant addresses), but can we do better?

The main issue is that the machine description generated target code to
compute some constants, but the sanitizer treats it as if it was user
code that might do wrong things.

> Should we try addition in both DImode and SImode for a 64-bit capable
> machine?

Why?  At least on PowerPC there is only one insn, and it is 64 bits.
The SImode version just ignores all bits other than the low 32 bits, in
both inputs and output.

> Or should we even try more operations than addition (for eg bit
> operations like xor or shift)?  Doing so will need to create a new
> target hook for const anchoring, this is the "complete rework" I meant.

This might make const anchor useful for way more targets maybe,
including rs6000, yes :-)

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

Hi!

On Wed, Jun 14, 2023 at 12:06:29PM +0800, Jiufu Guo wrote:
> Segher Boessenkool  writes:
> I'm also thinking about other solutions:
> 1. "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])"
>   This is the existing pattern.  It may be read as an action
>   to clean an unknown-size memory block.

Including a size zero memory block, yes.  BLKmode was originally to do
things like bcopy (before modern names like memcpy were more usually
used), and those very much need size zero as well.

> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
> UNSPEC_TIE".
>   Current patch is using this one.

What would be the semantics of that?  Just the same as the current stuff
I'd say, or less?  It cannot be more!

> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
> UNSPEC_TIE".
>This avoids using BLK on unspec, but using DI.

And is incorrect because of that.

> 4. "set (mem/c:BLK (reg/f:DI 1 1) unspec (const_int 0 [0])
> UNSPEC_TIE"
>There is still a mode for the unspec.

It has VOIDmode here, which is incorrect.

> > On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote:
> >> +&& XINT (SET_SRC (set), 1) == UNSPEC_TIE
> >> +&& XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
> >
> > This makes it required that the operand of an UNSPEC_TIE unspec is a
> > const_int 0.  This should be documented somewhere.  Ideally you would
> > want no operand at all here, but every unspec has an operand.
> 
> Right!  Since checked UNSPEC_TIE arleady, we may not need to check
> the inner operand. Like " && XINT (SET_SRC (set), 1) == UNSPEC_TIE);".

Yes.  But we should write down somewhere (in a comment near the unspec
constant def for example) what the operand is -- so, "operand is usually
(const_int 0) because we have to put *something* there" or such.  The
clumsiness of this is enough for me to prefer some other solution
already ;-)

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

Hi!

On Wed, Jun 14, 2023 at 07:59:04AM +, Richard Biener wrote:
> On Wed, 14 Jun 2023, Jiufu Guo wrote:
> > 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
> > UNSPEC_TIE".
> >This avoids using BLK on unspec, but using DI.
> 
> That gives the MEM a size which means we can interpret the (set ..)
> as killing a specific area of memory, enabling DSE of earlier
> stores.

Or DSE can delete this tie even, if it can see some later store to the
same location without anything in between that can read what the tie
stores.

BLKmode avoids all of this.  You can call that elegant, you can call it
cheating, you can call it many things -- but it *works*.

> AFAIU this special instruction is only supposed to prevent
> code motion (of stack memory accesses?) across this instruction?

Form rs6000.md:
; This is to explain that changes to the stack pointer should
; not be moved over loads from or stores to stack memory.
(define_insn "stack_tie"

and from rs6000-logue.cc:
/* This ties together stack memory (MEM with an alias set of frame_alias_set)
   and the change to the stack pointer.  */
static void
rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)

A big reason this is needed is because of all the hard frame pointer
stuff, which the generic parts of GCC require, but there is no register
for that in the Power architecture.  Nothing is an issue here in most
cases, but sometimes we need to do unusual things to the stack, say for
alloca.

> I'd say a
> 
>   (may_clobber (mem:BLK (reg:DI 1 1)))

"clobber" always means "may clobber".  (clobber X) means X is written
with some unspecified value, which may well be whatever value it
currently holds.  Via some magical means or whatever, there is no
mechanism specified, just the effects :-)

> might be more to the point?  I've used "may_clobber" which doesn't
> exist since I'm not sure whether a clobber is considered a kill.
> The docs say "Represents the storing or possible storing of an 
> unpredictable..." - what is it?  Storing or possible storing?

It is the same thing.  "clobber" means the same thing as "set", except
the value that is written is not specified.

> I suppose stack_tie should be less strict than the documented
> (clobber (mem:BLK (const_int 0))) (clobber all memory).

"clobber" is nicer than the set to (const_int 0).  Does it work though?
All this code is always fragile :-/  I'm all for this change, don't get
me wrong, but preferably things stay in working order.

We use "stack_tie" as a last resort heavy hammer anyway, in all normal
cases we explain the actual data flow explicitly and correctly, also
between the various registers used in the *logues.

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

Hi!

On Wed, Jun 14, 2023 at 05:26:52PM +0800, Jiufu Guo wrote:
> Richard Biener  writes:
> >> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
> >> UNSPEC_TIE".
> >>This avoids using BLK on unspec, but using DI.
> >
> > That gives the MEM a size which means we can interpret the (set ..)
> > as killing a specific area of memory, enabling DSE of earlier
> > stores.
> 
> Oh, thanks!
> While with 'unspec:DI', I'm wondering if it means this 'set' would
> do some special things other than pure 'set' to the memory. 

No, that is not what unspec means.  It just means "some DImode value I'm
not telling you anything about".  If to get that value there is some
important work done (that should not be oprimised away, say) you need
unspec_volatile, which means just that: there is an unspecified side
effect done by that insn, so it has to be done on the real machine
exactly like on the abstract C machine, so the insn has big restrictions
on being moved and removed etc.

We can replace the RHS of (almost) *every* set with an unspec, and the
compiler would still work, just would generate pretty lousy code.  But
at least CSE and DSE (and everything else purely dataflow) would still
work :-)

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

Hi!

On Wed, Jun 14, 2023 at 09:52:37AM +, Richard Biener wrote:
> I see.  So
> 
> (parallel
>  (unspec stack_tie)
>  (clobber (mem:BLK ...)))

Written like this, without a "set", *every* unspec has to be an
unspec_volatile, for the same reason as all inline asms without outputs
always are considered volatile asm.  The "unspec" arm of the parallel
can be omitted, and if that is valid RTL (possibly after other changes,
like omitting the parallel,replacing it by its one remaining arm), this
is a prefectly valid transformation.

> I suppose it needs to be an unspec_volatile?  It feels like
> the stack_ties are a delicate hack preventing enough but not too
> much optimization ...

Yes.  So let's please not disturb it :-)

It isn't a "delicate" hack I would say, but its effects are needed in
some places, and messing it up leads to hard to debug problems.  Which
had happened time and time again over the years.

It just is hard to deal with variable sized stack adjustments and the
like.  As long as we only use stack ties in such unusual cases, all is
fine.  There are worse things, like what we have the
frame_pointer_needed_indeed thing for :-)

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

On Wed, Jun 14, 2023 at 09:22:09AM +, Richard Biener wrote:
> How can a clobber be validly dropped?

Same as any other set: if no code executed after it can read whatever is
written.  This typically means a stack frame goes away, or simply no
more code is executed *at all* after this.

> For the case of stack
> memory if there's no stack use after it it could be elided
> and I suppose the clobber itself can be moved.  But then
> the function return is a stack use as well.

A function return does not access the stack at all on most
architectures, including PowerPC.  Some epilogue insns can do, of
course, but we expand to separate insns during expand already.

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

Hi!

On Wed, Jun 14, 2023 at 10:04:20AM +0100, Richard Sandiford wrote:
> I'd also understood it to be either.  As in, it is a may-clobber
> that can be used for must-clobber.  Alternatively: the value stored
> is unpredictable, and can therefore be the same as the current value.

Yes, it is a set with an unspecified RHS.

> I think the main difference between:
> 
>   (clobber (mem:BLK …))
> 
> and
> 
>   (set (mem:BLK …) (unspec:BLK …))
> 
> is that the latter must happen for correctness (unless something
> that understands the unspec proves otherwise) whereas a clobber
> can validly be dropped.  So for something like stack_tie, a set
> seems more correct than a clobber.

No, the latter can be removed as well, under exactly the same
conditions: if no code after it reads what was written.  This happens in
branches marked dead.


Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

On Wed, Jun 14, 2023 at 06:25:10PM +0200, Richard Biener wrote:
> > Form rs6000.md:
> > ; This is to explain that changes to the stack pointer should
> > ; not be moved over loads from or stores to stack memory.
> > (define_insn "stack_tie"
> 
> That suggests it’s the hard register value that‘s protected, not the memory 
> pointed to.  I suppose that means an unspec volatile with the reg as input 
> would serve the same?

No?  It says what it says.  That is pretty vague language, of course,
not entirely by accident no doubt.

> Or maybe that’s not the whole story.
> 
> 
> > and from rs6000-logue.cc:
> > /* This ties together stack memory (MEM with an alias set of 
> > frame_alias_set)
> >   and the change to the stack pointer.  */
> > static void
> > rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
> 
> I cannot make sense of that comment, but not sure if I really want to know …

It really is the same thing: this is a bloody heavy hammer keeping the
change to the stack pointer (or "hard" frame pointer) in place wrt any
accesses to the stack memory.

If there was a nice portable way to avoid needing this we haven't found
it yet -- or a non-portable way even, and it doesn't have to be all that
nice either come to think of it :-)


Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-15 Thread Segher Boessenkool

On Thu, Jun 15, 2023 at 03:00:40PM +0800, Jiufu Guo wrote:
> >>   This is the existing pattern.  It may be read as an action
> >>   to clean an unknown-size memory block.
> >
> > Including a size zero memory block, yes.  BLKmode was originally to do
> > things like bcopy (before modern names like memcpy were more usually
> > used), and those very much need size zero as well.h
> 
> The size is possible to be zero.  No asm code needs to
> be generated for "set 'const_int 0' to zero size memory"".
> stack_tie does not generate any real code.  It seems ok :)
> 
> While, it may not be zero size mem.  This may be a concern.
> This is one reason that I would like to have an unspec_tie.

It very much *can* be a zero size mem, that is perfectly find for
mem:BLK.

> Another reason is unspec:blk is used but various ports :) 

unspec:BLK is undefined.  BLKmode is allowed on mem only.

> >> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
> >> UNSPEC_TIE".
> >>   Current patch is using this one.
> >
> > What would be the semantics of that?  Just the same as the current stuff
> > I'd say, or less?  It cannot be more!
> 
> The semantic that I trying to achieve is "this is a special
> insn, not only a normal set to unknown size mem".

What does that *mean*?  "Special instruction"?  What would what code do
for that?  What would the RTL mean?

> As you explained before on 'unspec:DI', the unspec would
> just decorate the set_src part: something DI value with
> machine-specific operation.

An unspec is an operation on its operands, giving some (in this case)
DImode value.  There is nothing special about that operation, it can be
optimised like any other, it's just not specified what exactly that
value is (to the generic compiler, the backend itself can very much
optimise stuff with it).

> But, since 'tie_operand' is checked for this insn.
> If 'tie_operand' checks UNPSEC_TIE, then the insn
> with UNPSEC_TIE is 'a special insn'.  Or interpret
> the semantic of this insn as: this insn stack_ite
> indicates "set/operate a zero size block".

tie_operand is a predicate.  The predicate of an insn has to return 1,
or the insn is not recognised.  You can do the same in insn conditions
always (in principle anyway).

Segher

Re: [PATCH] testsuite: Fix incorrect -mfloat128-type option

2022-07-06 Thread Segher Boessenkool

On Wed, Jul 06, 2022 at 07:15:08PM +0100, Jonathan Wakely wrote:
> Tested powerpc64-linux, OK for trunk?

Okay, thanks!

> -! { dg-options "-mdejagnu-cpu=405 -mpower9-minmax -mfloat128-type" }
> +! { dg-options "-mdejagnu-cpu=405 -mpower9-minmax -mfloat128" }

We really shouldn't have -mpower9-minmax selectable independently of
-mcpu=power9, but that is a different thing.

Thanks,


Segher

Re: [PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]

2022-07-07 Thread Segher Boessenkool

Hi!

On Thu, Jul 07, 2022 at 04:30:50PM +0800, HAO CHEN GUI wrote:
>   This patch modifies the combine pattern after recog fails. With a helper

It modifies combine itself, not just a pattern in the machine
description.

> - change_pseudo_and_mask, it converts a single pseudo to the pseudo AND with
> a mask when the outer operator is IOR/XOR/PLUS and inner operator is ASHIFT
> or AND. The conversion helps pattern to match rotate and mask insn on some
> targets.

>   For test case rlwimi-2.c, current trunk fails on
> "scan-assembler-times (?n)^\\s+[a-z]". It reports 21305 times. So my patch
> reduces the total number of insns from 21305 to 21279.

That is incorrect.  You need to figure out what actually changed, and if
that is wanted or not, and then write some explanation about that.

>   * config/rs6000/rs6000.md (plus_ior_xor): Removed.
>   (anonymous split pattern for plus_ior_xor): Removed.

"Remove.", in both cases.  Always use imperative in changelogs and
commit messages and the like, not some passive tense.

> +/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is
> +   ASHIFT/AND, convert a pseudo to psuedo AND with a mask if its nonzero_bits

s/psuedo/pseudo/

> +   is less than its mode mask.  The nonzero_bits in other pass doesn't return
> +   the same value as it does in combine pass.  */

That isn't quite the problem.

Later passes can return a mask for nonzero_bits (which means: bits that
are not known to be zero) that is not a superset of what was known
during combine.  If you use nonzero_bits in the insn condition of a
define_insn (or define_insn_and_split, same thing under the covers) you
then can end up with an insns that is fine during combine, but no longer
recog()ed later.

> +static bool
> +change_pseudo_and_mask (rtx pat)
> +{
> +  rtx src = SET_SRC (pat);
> +  if ((GET_CODE (src) == IOR
> +   || GET_CODE (src) == XOR
> +   || GET_CODE (src) == PLUS)
> +  && (((GET_CODE (XEXP (src, 0)) == ASHIFT
> + || GET_CODE (XEXP (src, 0)) == AND)
> +&& REG_P (XEXP (src, 1)
> +{
> +  rtx *reg = &XEXP (SET_SRC (pat), 1);

Why the extra indirection?  SUBST is a macro, it can take lvalues just
fine :-)

> +  machine_mode mode = GET_MODE (*reg);
> +  unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode);
> +  if (nonzero < GET_MODE_MASK (mode))
> + {
> +   int shift;
> +
> +   if (GET_CODE (XEXP (src, 0)) == ASHIFT)
> + shift = INTVAL (XEXP (XEXP (src, 0), 1));
> +   else
> + shift = ctz_hwi (INTVAL (XEXP (XEXP (src, 0), 1)));
> +
> +   if (shift > 0
> +   && ((HOST_WIDE_INT_1U << shift) - 1) >= nonzero)

Too many parens.

> + {
> +   unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << shift) - 1;
> +   rtx x = gen_rtx_AND (mode, *reg, GEN_INT (mask));
> +   SUBST (*reg, x);
> +   maybe_swap_commutative_operands (SET_SRC (pat));
> +   return true;
> + }
> + }
> + }
> +  return false;

Broken indentation.

> --- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c
> @@ -12,7 +12,7 @@ void rotins (unsigned int x)
>b.y = (x<<12) | (x>>20);
>  }
> 
> -/* { dg-final { scan-assembler-not {\mrlwinm} } } */
> +/* { dg-final { scan-assembler-not {\mrlwinm} { target ilp32 } } } */

Why?

> +/* { dg-final { scan-assembler-times {\mrlwimi\M} 2 { target ilp32 } } } */
> +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 { target lp64 } } } */

Can this just be
  /* { dg-final { scan-assembler-times {\mrl[wd]imi\M} 2 } } */
or is it necessary to not want rlwimi on 64-bit?

> --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> @@ -2,14 +2,14 @@
>  /* { dg-options "-O2" } */
> 
>  /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 14121 { target ilp32 } 
> } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 20217 { target lp64 } } 
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 21279 { target lp64 } } 
> } */

You are saying there should be 21279 instructions generated by this test
case.  What makes you say that?  Yes, we regressed some time ago, we
generate too many insns in many cases, but that is *bad*.

>  /* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } 
> } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } 
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target lp64 } } 
> } */

This needs an explanation (and then the 32-bit and 64-bit checks can be
merged).


This probably needs changes after 4306339798b6 (if it is still wanted?)


Segher

Re: [PATCH] Be careful with MODE_CC in simplify_const_relational_operation.

2022-07-07 Thread Segher Boessenkool

Hi!

On Thu, Jul 07, 2022 at 10:08:04PM +0100, Roger Sayle wrote:
> I think it's fair to describe RTL's representation of condition flags
> using MODE_CC as a little counter-intuitive.

"A little challenging", and you should see that as a good thing, as a
puzzle to crack :-)

> For example, the i386
> backend represents the carry flag (in adc instructions) using RTL of
> the form "(ltu:SI (reg:CCC) (const_int 0))", where great care needs
> to be taken not to treat this like a normal RTX expression, after all
> LTU (less-than-unsigned) against const0_rtx would normally always be
> false.

A comparison of a MODE_CC thing against 0 means the result of a
*previous* comparison (or other cc setter) is looked at.  Usually it
simply looks at some condition bits in a flags register.  It does not do
any actual comparison: that has been done before (if at all even).

> Hence, MODE_CC comparisons need to be treated with caution,
> and simplify_const_relational_operation returns early (to avoid
> problems) when GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC.

Not just to avoid problems: there simply isn't enough information to do
a correct job.

> However, consider the (currently) hypothetical situation, where the
> RTL optimizers determine that a previous instruction unconditionally
> sets or clears the carry flag, and this gets propagated by combine into
> the above expression, we'd end up with something that looks like
> (ltu:SI (const_int 1) (const_int 0)), which doesn't mean what it says.
> Fortunately, simplify_const_relational_operation is passed the
> original mode of the comparison (cmp_mode, the original mode of op0)
> which can be checked for MODE_CC, even when op0 is now VOIDmode
> (const_int) after the substitution.  Defending against this is clearly the
> right thing to do.
> 
> More controversially, rather than just abort simplification/optimization
> in this case, we can use the comparison operator to infer/select the
> semantics of the CC_MODE flag.  Hopefully, whenever a backend uses LTU,
> it represents the (set) carry flag (and behaves like i386.md), in which
> case the result of the simplified expression is the first operand.
> [If there's no standardization of semantics across backends, then
> we should always just return 0; but then miss potential optimizations].

On PowerPC, ltu means the result of an unsigned comparison (we have
instructions for that, cmpl[wd][i] mainly) was "smaller than".  It does
not mean anything is unsigned smaller than zero.  It also has nothing to
do with carries, which are done via a different register (the XER).

> +  /* Handle MODE_CC comparisons that have been simplified to
> + constants.  */
> +  if (GET_MODE_CLASS (mode) == MODE_CC
> +  && op1 == const0_rtx
> +  && CONST_INT_P (op0))
> +{
> +  /* LTU represents the carry flag.  */
> +  if (code == LTU)
> + return op0 == const0_rtx ? const0_rtx : const_true_rtx;
> +  return 0;
> +}
> +
>/* We can't simplify MODE_CC values since we don't know what the
>   actual comparison is.  */

^^^
This comment is 100% true.  We cannot simplify any MODE_CC comparison
without having more context.  The combiner does have that context when
it tries to combine the CC setter with the CC consumer, for example.

Do you have some piece of motivating example code?


Segher

Re: [PATCH/RFC] combine_completed global variable.

2022-07-08 Thread Segher Boessenkool

Hi!

On Thu, Jul 07, 2022 at 08:40:38PM +0100, Roger Sayle wrote:
> Although I've not been able to reproduce your ICE (using gcc135
> on the compile farm), I completely agree with Segher's analysis
> that the Achilles heel with my approach/patch is that there's
> currently no way for the backend/recog to know that we're in a
> pass after combine.

To know combine has been run at least once already, yes.  It currently
won't run more than one of course, but the only thing that really stops
running it more than once was historically that combine was one of the
more expensive passes.  Now otoh it is really quite cheap.

Currently combine is one of the last passes before RA.  It likely will
be nice to run combine quite early as well (or a limited version of it
that is), just like other optimisation passes already are, including its
closest relative fwprop.

> Rather than give up on this optimization (and a similar one for
> I386.md where test;sete can be replaced by xor $1 when combine
> knows that nonzero_bits is 1, but loses that information afterwards),
> I thought I'd post this "strawman" proposal to add a combine_completed
> global variable, matching the reload_completed and regstack_completed
> global variables already used (to track progress) by the middle-end.

It should be called differently, given the above.

> I was wondering if I could ask you could test the attached patch
> in combination with my previous rs6000.md patch (with the obvious
> change of reload_completed to combine_completed) to confirm
> that it fixes the problems you were seeing.
> 
> Segher/Richard, would this sort of patch be considered acceptable?
> Or is there a better approach/solution?

There is the thing I have been working on for about a year now, that
makes combine's nonzero_bits superfluous, by making the generic version
more capable than combine's version.  That is a better way forward imo,
it solves many other problems as well.  But it isn't ready yet.

Your patch is fine with a name change (and documentation change)
indicating it means that combine has run at least once.

Thanks!

Segher

Re: [x86 PATCH] Fun with flags: Adding stc/clc instructions to i386.md.

2022-07-08 Thread Segher Boessenkool

Hi!

On Fri, Jul 08, 2022 at 08:14:58AM +0100, Roger Sayle wrote:
> This patch adds support for x86's single-byte encoded stc (set carry flag)
> and clc (clear carry flag) instructions to i386.md.

Maybe add a test for clc as well?  Because:

> +(define_insn "x86_clc"
> +  [(set (reg:CCC FLAGS_REG) (const_int 0))]
> +  ""
> +  "stc"
> +  [(set_attr "length" "1")
> +   (set_attr "length_immediate" "0")
> +   (set_attr "modrm" "0")])

Spot the problem :-)

> +/* { dg-final { scan-assembler "stc" } } */

This checks if the substring "stc" occurs anywhere in the generated
assembler code.  More robust is to use scan-assembler-times, and to use
\mstc\M (same as \ in some other languages, or \bstc\b in Perl).

Segher

Re: [PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]

2022-07-11 Thread Segher Boessenkool

Hi!

On Mon, Jul 11, 2022 at 10:13:41AM +0800, HAO CHEN GUI wrote:
> I did a biset for the problem. After commit "commit 8d2d39587: combine: Do 
> not combine
> moves from hard registers", the case fails. The root cause is it can't 
> combine from the
> hard registers and has to use subreg which causes its high part to be 
> undefined. Thus,
> there is an additional "AND" generated.
> 
> Before the commit
> Trying 2 -> 7:
> 2: r125:DI=%3:DI
>   REG_DEAD %3:DI
> 7: r128:SI=r125:DI#0 0>>0x1f
>   REG_DEAD r125:DI
> Successfully matched this instruction:
> (set (reg:SI 128 [ x ])
> (lshiftrt:SI (reg:SI 3 3 [ x ])
> (const_int 31 [0x1f])))
> allowing combination of insns 2 and 7
> 
> After the commit
> Trying 20 -> 7:
>20: r125:DI=r132:DI
>   REG_DEAD r132:DI
> 7: r128:SI=r125:DI#0 0>>0x1f
>   REG_DEAD r125:DI
> Failed to match this instruction:
> (set (subreg:DI (reg:SI 128 [ x ]) 0)
> (zero_extract:DI (reg:DI 132)
> (const_int 32 [0x20])
> (const_int 1 [0x1])))
> Successfully matched this instruction:
> (set (subreg:DI (reg:SI 128 [ x ]) 0)
> (and:DI (lshiftrt:DI (reg:DI 132)
> (const_int 31 [0x1f]))
> (const_int 4294967295 [0x])))
> allowing combination of insns 20 and 7
> 
> The problem should be fixed in another case? Please advice.

You should not change the expected counts to what is currently
generated.  What is currently generated is sub-optimal.  It all starts
with those zero_extracts, which are always bad for us -- it is a harder
to manipulate representation of a limited subset of more basic
operations we *do* have.  And combine and simplify can handle the more
general and simpler formulation just fine.

Ideally combine would not try to use *_extract at all if this is not
used in the machine description (compare to rotatert for example, a
similarly redundant operation).  But it currently needs it as
intermediate form, untangling this all is quite a bit of work.

These testcases (all the rl* ones) should have a big fat comment
explaining what the expected, wanted code is.

This was easier to do originally, when I actually tested all 65536
possibly combinations, because the expected counts were more "regular"
numbers.  But this is too slow to test in normal testsuite runs :-)

It is wrong to pretend the current state makes the wanted code, these
testcases are meant to show exactly when we make suboptimal machine
code :-)

Segher

Re: [PATCH v3, rs6000] Disable TImode from Bool expanders [PR100694, PR93123]

2022-07-12 Thread Segher Boessenkool

Hi!

On Mon, Jul 04, 2022 at 02:27:42PM +0800, HAO CHEN GUI wrote:
>   This patch fails TImode for all 128-bit logical operation expanders. So
> TImode splits to two DI registers during expand. Potential optimizations can
> be taken after expand pass. Originally, the TImode logical operations are
> split after reload pass. It's too late. The test case illustrates it.

Out of interest, did you see any performance win?  There is a lot of
opportunity for this to cause performance *loss*, on newer systems :-(

> ChangeLog
> 2022-07-04 Haochen Gui 

Two spaces both before and after your name, in changelogs.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7078,27 +7078,38 @@ (define_expand "subti3"
>  })
>  
>  ;; 128-bit logical operations expanders
> +;; Fail TImode in all 128-bit logical operations expanders and split it into
> +;; two DI registers.
> 
>  (define_expand "and3"
>[(set (match_operand:BOOL_128 0 "vlogical_operand")
>   (and:BOOL_128 (match_operand:BOOL_128 1 "vlogical_operand")
> (match_operand:BOOL_128 2 "vlogical_operand")))]
>""
> -  "")
> +{
> +  if (mode == TImode)
> +FAIL;
> +})

It is better to not FAIL it, but simply not have a pattern for the
TImode version at all.

Does nothing depend on the :TI version to exist?

What about the :PTI version?  Getting rid of that as well will allow
some nice optimisations.

Of course we *do* have instructions to do such TImode ops, on newer
CPUs, but in vector registers only.  It isn't obvious what is faster.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100694.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-times {\mstd\M} 2 } } */
> +/* { dg-final { scan-assembler-not {\mli\M} } } */
> +/* { dg-final { scan-assembler-not {\mor\M} } } */
> +
> +/* It just needs two std.  */
> +void foo (unsigned __int128* res, unsigned long long hi, unsigned long long 
> lo)
> +{
> +   unsigned __int128 i = hi;
> +   i <<= 64;
> +   i |= lo;
> +   *res = i;
> +}

You can also just count the number of generated insns:
/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 3 } } */
(three, not two, because of the blr insn at the end).

If possible, we should simply not do :TI ops on older systems at all,
and only on the newer systems that have instructions for it (and that
does not fix PR100694 btw, the problems there have to be solved, not
side-stepped :-( )


Segher

Re: [PATCH, rs6000] Cleanup some vstrir define_expand naming inconsistencies

2022-07-13 Thread Segher Boessenkool

Hi!

On Wed, Jul 13, 2022 at 01:18:29PM -0500, will schmidt wrote:
>   This cleans up some of the naming around the vstrir and vstril
> instruction definitions, with some cosmetic changes for consistency.

> gcc/
>   * config/rs6000/altivec.md (vstrir_code_): Rename
>   to vstrir_internal_.
>   (vstrir_p_code_): Rename to vstrir_p_internal_.
>   (vstril_code_): Rename to vstril_internal_.
>   (vstril_p_code_): Rename to vstril_p_internal_.

It doesn't show the new names on the lhs this way.  One way to do
better
is to write e.g.
(vstril_code_): Rename to...
(vstril_internal_): ... this.

It often is a good idea to say "... for VIshort" and similar btw.

I'm not a fan of "internal" either, it doesn't say anything.  At least
put it at the very end of the names please?

Okay for trunk with that changed.  Thanks!

Segher

Re: [PATCH, rs6000] Cleanup some vstrir define_expand naming inconsistencies

2022-07-13 Thread Segher Boessenkool

On Wed, Jul 13, 2022 at 04:14:11PM -0500, will schmidt wrote:
> On Wed, 2022-07-13 at 14:39 -0500, Segher Boessenkool wrote:
> > I'm not a fan of "internal" either, it doesn't say anything.  At
> > least
> > put it at the very end of the names please?
> I'm easily convinced. ;-)  I wonder if I should just drop "_internal"
> entirely and go with "vstrir_".  Otherwise I'll rework to be
> "vstrir__internal".

The define_expand already uses that name.  Some other patterns in
altivec.md use *_direct, maybe that is nicer?

> At a glance I see we do have some other existing define_insn entries
> with _internal at the tail and a few others embedded in the middle. 
> I'll leave a note and perhaps review those after.  :-)

Thanks :-)


Segher

Re: [GCC 12 backport] Disable generating load/store vector pairs for block copies.

2022-07-14 Thread Segher Boessenkool

On Thu, Jul 14, 2022 at 11:20:56AM -0400, Michael Meissner wrote:
> I have applied the patch to GCC 12.
> 
> | From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001
> | From: Michael Meissner 
> | Date: Thu, 14 Jul 2022 11:16:08 -0400
> | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for 
> block copies.
> 
> Testing has found that using load and store vector pair for block copies
> can result in a slow down on power10.  This patch disables using the
> vector pair instructions for block copies if we are tuning for power10.
> 
> 2022-06-11   Michael Meissner  
> 
> gcc/
> 
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal): Do
>   not generate block copies with vector pair instructions if we are
>   tuning for power10.  Back port from master branch.

You never posted the trunk version of this, so that never was approved
either.

> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p)
>  
>if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
>  {
> -  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX)
> +  /* Do not generate lxvp and stxvp on power10 since there are some
> +  performance issues.  */
> +  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
> +   && rs6000_tune != PROCESSOR_POWER10)
>   rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
>else
>   rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;

The TARGET_MMA in that should not be there.  Please fix that (that
probably needs more changes).

This statement does the opposite of what the comment says.

Please fix this.  On trunk, first.


Segher

Re: [GCC 12 backport] Disable generating load/store vector pairs for block copies.

2022-07-14 Thread Segher Boessenkool

On Thu, Jul 14, 2022 at 05:49:57PM -0400, Michael Meissner wrote:
> On Thu, Jul 14, 2022 at 04:12:14PM -0500, Segher Boessenkool wrote:
> > You never posted the trunk version of this, so that never was approved
> > either.
> 
> I did post the trunk version on June 10th, and your only comment was fix the
> commit message, which I thought I did in the commit.

I did not approve the patch.  Of course not, I didn't even get as far as
reading it.  You should have fixed it and sent again, I did not approve
anything.

> > > +  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
> > > +   && rs6000_tune != PROCESSOR_POWER10)
> > >   rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
> > >else
> > >   rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
> > 
> > The TARGET_MMA in that should not be there.  Please fix that (that
> > probably needs more changes).
> 
> All of the movoo and movxo support require TARGET_MMA as does the code in
> rs6000-string.cc that could possibly generate load/store vector pair.

And all that is wrong and should be fixed.

> To
> remove the check here would mean also fixing all of the vector load and store
> pairs in mma.md.

That is wha I said, yes,.

> > This statement does the opposite of what the comment says.
> > 
> > Please fix this.  On trunk, first.

This is the core problem with this patch: it is simply wrong.

It is a very roundabout way of saying "only enable vector pairs if
-mcpu=power10 but -mtune=somethingelse".  Which is not a sensible
thing to do, and not what the comment says either.


Segher

Re: [PATCH, rs6000, v2] Cleanup some vstrir define_expand naming inconsistencies

2022-07-20 Thread Segher Boessenkool

On Tue, Jul 19, 2022 at 03:14:52PM -0500, will schmidt wrote:
>   This cleans up some of the naming around the vstrir and vstril
> instruction definitions, with some cosmetic changes for consistency.

Okay for trunk.  Thanks!


Segher

Re: [PATCH] rs6000/test: Update some cases with -mdejagnu-tune

2022-07-21 Thread Segher Boessenkool

Hi!

On Wed, Jul 20, 2022 at 05:31:11PM +0800, Kewen.Lin wrote:
> As PR106345 shows, some test cases should be updated with
> -mdejagnu-tune, since their test points are sensitive to
> rs6000_tune, such as: group_ending_nop, loop align (ic),
> float conversion cost etc.

It does not make sense to require -mdejagnu-tune= if -mdejagnu-cpu= is
already given?  What is the failure case?

> This patch is to replace -mdejagnu-cpu with -mdejagnu-tune
> or append -mdejagnu-tune (keep the original -mdejagnu-cpu
> when it's required) accordingly.

It is *always* required.  Testcases with -mtune= but unspecified -mcpu=
make no sense.

> --- a/gcc/testsuite/gcc.target/powerpc/compress-float-ppc-pic.c
> +++ b/gcc/testsuite/gcc.target/powerpc/compress-float-ppc-pic.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target powerpc_fprs } } */
> -/* { dg-options "-O2 -fpic -mdejagnu-cpu=power5" } */
> +/* { dg-options "-O2 -fpic -mdejagnu-cpu=power5 -mdejagnu-tune=power5" } */
>  /* { dg-require-effective-target fpic } */

This should only make a difference if you have -mtune= in your
RUNTEST_FLAGS, and you shouldn't do silly things like that.  I suspect
you see it in other cases, and those are actual bugs then, that need
actual fixing instead of sweeping under the carper.

The testcase suggests this is with a compiler configured with
--with-cpu= --with-tune=, which should just work, and -mcpu= should
override both of those!

Segher

Re: [PATCH] Remove setting -mblock-ops-vector-pair on power10.

2022-07-21 Thread Segher Boessenkool

On Thu, Jul 21, 2022 at 02:42:29AM -0400, Michael Meissner wrote:
> Testing has shown that using the load vector pair and store vector pair
> instructions for block moves has some performance issues on power10.  This
> patch does not set this option by default.  If it is a win in other
> machines in the future, this flag can be set in the ISA options.

This would make rs6000_isa_flags an even bigger misnomer than it already
is, sigh.

>   * config/rs6000/rs6000.cc (rs6000_option_override_internal):
>   Do not enable -mblock-ops-vector-pair by default on power10.

Do not wrap lines early, especially if that would mean leaving a colon
at the end of a line.  Changelog lines are 80 positions long (including
the leading tab, which counts as eight).

> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4139,17 +4139,6 @@ rs6000_option_override_internal (bool global_init_p)
>   rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_UNALIGNED_VSX;
>  }
>  
> -  if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR))
> -{
> -  /* Do not generate lxvp and stxvp on power10 since there are some
> -  performance issues.  */
> -  if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX
> -   && rs6000_tune != PROCESSOR_POWER10)
> - rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
> -  else
> - rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
> -}

How does this implement what the changelog says it does?  With what it
does the changelog should instead say to not touch it at all (your patch
also disables the code that disables it!)

It isn't clear what you intended: what your changelog says, or what the
code does.

Segher

Re: [PATCH] rs6000/test: Fix empty TU in some cases of effective targets

2022-07-21 Thread Segher Boessenkool

On Wed, Jul 20, 2022 at 05:32:01PM +0800, Kewen.Lin wrote:
> As the failure of test case gcc.target/powerpc/pr92398.p9-.c in
> PR106345 shows, some test sources for some powerpc effective
> targets use empty translation unit wrongly.  The test sources
> could go with options like "-ansi -pedantic-errors", then those
> effective target checkings will fail unexpectedly with the
> error messages like:
> 
>   error: ISO C forbids an empty translation unit [-Wpedantic]
> 
> This patch is to fix empty TUs with one dummy variable definition
> accordingly.

You can also use
  enum{a};
which is shorter, but more importantly does not generate any code.
You can also do
  extern int dummy;
of course -- same idea, no definitions, only declarations.

> I'll push this soon if no objections.

> @@ -6523,6 +6531,7 @@ proc check_effective_target_ppc_float128 { } {
>   #ifndef __FLOAT128__
> nope no good
>   #endif
> + int dummy;

At least put it in #else then?  Or just do things a bit more elegantly
(do a dummy function around this for example).


Segher

Re: [PATCH] rs6000/test: Fix empty TU in some cases of effective targets

2022-07-21 Thread Segher Boessenkool

Hi!

On Fri, Jul 22, 2022 at 08:41:43AM +0800, Kewen.Lin wrote:
> Hi Segher,
> 
> Thanks for the comments!

Always.

> >> This patch is to fix empty TUs with one dummy variable definition
> >> accordingly.
> > 
> > You can also use
> >   enum{a};
> > which is shorter, but more importantly does not generate any code.
> > You can also do
> >   extern int dummy;
> > of course -- same idea, no definitions, only declarations.
> 
> The used "int dummy" follows some existing practices, IMHO in this
> context it doesn't matter that it will generate code or not, any of
> these alternatives still generates an assembly or object file, but
> the generated file gets removed after the checking.

It doesn't matter here, sure.  But it is certainly simple enough to make
it "extern int dummy" instead, not giving a bad example for future cases
where it may matter :-)

> May I still keep this "int dummy" to align with existing practices?

Of course, it was just advice.  If things are wrong (in my opinion that
is!), I'll say so.

> > At least put it in #else then?  Or just do things a bit more elegantly
> > (do a dummy function around this for example).
> 
> OK, since it can still emit error even without "#else", I didn't bother
> to add it.  I will add it, and update the "nope no good" to "#error
> doesn't have float128 support".

Just say

===
void nope (void)
{
#ifndef __FLOAT128__
nope no good
#endif
}
===

which works in all cases?

Less maintenance is a good thing :-)


Segher

Re: [PATCH] rs6000/test: Update some cases with -mdejagnu-tune

2022-07-22 Thread Segher Boessenkool

On Fri, Jul 22, 2022 at 10:22:51AM +0800, Kewen.Lin wrote:
> on 2022/7/22 02:48, Segher Boessenkool wrote:
> > On Wed, Jul 20, 2022 at 05:31:11PM +0800, Kewen.Lin wrote:
> >> As PR106345 shows, some test cases should be updated with
> >> -mdejagnu-tune, since their test points are sensitive to
> >> rs6000_tune, such as: group_ending_nop, loop align (ic),
> >> float conversion cost etc.
> > 
> > It does not make sense to require -mdejagnu-tune= if -mdejagnu-cpu= is
> > already given?  What is the failure case?
> > 
> 
> I think cpu setting only sets tune setting when tune setting isn't
> explicitly provided as:
> 
>   if (rs6000_tune_index >= 0)
> tune_index = rs6000_tune_index;
>   else if (cpu_index >= 0)
> rs6000_tune_index = tune_index = cpu_index;
> 
> As PR106345 shows, GCC can use an explicit tune setting when it's
> configured, even if there is one "-mdejagnu-cpu=", it doesn't
> override the explicit given one, so we need one explicit
> "-mdejagnu-tune=".

And that is the problem.  GCC's automatic setting is *not* an explicit
option, not given by the user.  --with-tune= should not result in adding
an -mtune= option in the resulting compiler, it should not set command-
line options.

> Although the test case has adopted option "-mdejagnu-cpu=power7", but
> the configured "--with-tune-64=power9" takes effect and make it
> return align_loops instead of align_flags (5).

And it should not do that.

> >> This patch is to replace -mdejagnu-cpu with -mdejagnu-tune
> >> or append -mdejagnu-tune (keep the original -mdejagnu-cpu
> >> when it's required) accordingly.
> > 
> > It is *always* required.  Testcases with -mtune= but unspecified -mcpu=
> > make no sense.
> 
> The loop_align.c testings made me think if we know the insn count for
> the loop on all cpus is in range (4,8] then the cpu setting doesn't matter.

Sure, it probably works without -mcpu=, but it does not make sense :-)

Only using -mtune= while not having -mcpu= serves no purpose in any
"normal" use, so we shouldn't do that in the testsuite either.

> > This should only make a difference if you have -mtune= in your
> > RUNTEST_FLAGS, and you shouldn't do silly things like that.  I suspect
> > you see it in other cases, and those are actual bugs then, that need
> > actual fixing instead of sweeping under the carper.
> 
> Unfortunately it's due to the explicit tune setting in configuration.

So that needs some actual fixes.  Something in how --with-tune= works
is suboptimal?

> > The testcase suggests this is with a compiler configured with
> > --with-cpu= --with-tune=, which should just work, and -mcpu= should
> > override both of those!
> 
> Unfortunately -mcpu= (-mdejagnu-cpu=) doesn't actually override here.

... or that.


Segher

Re: [PATCH] Add new target hook: simplify_modecc_const.

2022-07-26 Thread Segher Boessenkool

Hi!

On Tue, Jul 26, 2022 at 01:13:02PM +0100, Roger Sayle wrote:
> This patch is a major revision of the patch I originally proposed here:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598040.html
> 
> The primary motivation of this patch is to avoid incorrect optimization
> of MODE_CC comparisons in simplify_const_relational_operation when/if a
> backend represents the (known) contents of a MODE_CC register using a
> CONST_INT.  In such cases, the RTL optimizers don't know the semantics
> of this integer value, so shouldn't change anything (i.e. should return
> NULL_RTX from simplify_const_relational_operation).

This is invalid RTL.  What would  (set (reg:CC) (const_int 0))  mean,
for example?  If this was valid it would make most existing code using
CC modes do essentially random things :-(

The documentation (in tm.texi, "Condition Code") says
  Alternatively, you can use @code{BImode} if the comparison operator is
  specified already in the compare instruction.  In this case, you are not
  interested in most macros in this section.

> The worked example provided with this patch is to allow the i386 backend
> to explicitly model the carry flag (MODE_CCC) using 1 to indicate that
> the carry flag is set, and 0 to indicate the carry flag is clear.  This
> allows the instructions stc (set carry flag), clc (clear carry flag) and
> cmc (complement carry flag) to be represented in RTL.

Hrm, I wonder how other targets do this.

On Power we have a separate hard register for the carry flag of course
(it is a separate bit in the hardware as well, XER[CA]).

On Arm there is arm_carry_operation (as well as arm_borrow_operation).

Aarch64 directly uses
(define_expand "add3_carryin"
  [(set (match_operand:GPI 0 "register_operand")
(plus:GPI
  (plus:GPI
(ltu:GPI (reg:CC_C CC_REGNUM) (const_int 0))
(match_operand:GPI 1 "aarch64_reg_or_zero"))
  (match_operand:GPI 2 "aarch64_reg_or_zero")))]
   ""
   ""
)
(CC_Cmode means only the C bit is validly set).

s390 does similar.  sparc does similar.

> However an even better example would be the rs6000 backend, where this
> patch/target hook would allow improved modelling of the condition register
> CR.  The powerpc's comparison instructions set fields/bits in the CR
> register [where bit 0 indicates less than, bit 1 greater than, bit 2
> equal to and bit3 overflow]

There are eight condition register fields which can be used
interchangeably (some insns only write to CR0, CR1, or CR6).  The
meaning of the four bits in a field depends on the instruction that set
them.  For integer comparisons bit 3 does not mean anything to do with a
comparison: instead, it is a copy of the XER[SO] bit ("summary
overflow").  The rs6000 backend does not currently model this (we do not
model the overflow instructions at all!)

> analogous to x86's flags register [containing
> bits for carry, zero, overflow, parity etc.].  These fields can be
> manipulated directly using crset (aka creqv) and crclr (aka crxor)
> instructions

crand, crnand, cror, crxor, crnor, creqv, crandc, crorc insns, or the
extended mnemonics crmove, crclr, crnot, crset, yes.  All these for
setting single bits; there also is mcrf to copy all four bits of a CR
field to another.

> and even transferred from general purpose registers using
> mtcr.  However, without a patch like this, it's impossible to safely
> model/represent these instructions in rs6000.md.

And yet we do.  See for example @cceq_rev_compare_ which
implements crnot.

> +  /* Handle MODE_CC comparisons that have been simplified to
> + constants.  */
> +  if (GET_MODE_CLASS (mode) == MODE_CC
> +  && op1 == const0_rtx
> +  && CONST_INT_P (op0))
> +return targetm.simplify_modecc_const (mode, (int)code, op0);

Comparing two integer constants is invalid RTL *in all contexts*.  The
items compared do not have a mode!  From rtl.texi:
  A @code{compare} specifying two @code{VOIDmode} constants is not valid
  since there is no way to know in what mode the comparison is to be
  performed; the comparison must either be folded during the compilation
  or the first operand must be loaded into a register while its mode is
  still known.

Segher

Re: [PATCH V1] HIGH part of symbol ref is invalid for constant pool

2022-07-26 Thread Segher Boessenkool

Hi!

On Tue, Jul 19, 2022 at 10:30:54PM +0800, Jiufu Guo wrote:
> In patch https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597712.html,
> test case was not added.  After more check, a testcase is added for it.
> 
> The high part of the symbol address is invalid for the constant pool.

Invalid, how so?  Is there a PR related here?

But it is not particularly useful ever, either: we do not know two
different addresses will have the same HIGH unless we know the exact
address, and then we don't need HIGH anyway.

>   * config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem):
>   Return true for HIGH code rtx.

* config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem): Return true
for HIGH code rtx.

Please don't wrap lines early: changelog lines are 80 positions long,
including the leading tab (which counts as eight positions).

>  static bool
>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
> -  if (GET_CODE (x) == HIGH
> -  && GET_CODE (XEXP (x, 0)) == UNSPEC)
> +  /* High part of a symbol ref/address can not be put into constant pool. 
> e.g.
> + (high:DI (symbol_ref:DI ("var")..)) or
> + (high:DI (unspec:DI [(symbol_ref/u:DI ("*.LC0")..)
> + (high:DI (const:DI (plus:DI (symbol_ref:DI ("xx")) (const_int 12.  
> */
> +  if (GET_CODE (x) == HIGH)
>  return true;

I'm not sure the new comment is helpful at all?  Are these examples of
where the compiler (or assembler perhaps) will choke?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/constpoolcheck.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target powerpc*-*-* } } */

Everything in gcc.target/powerpc is target powerpc* always.

> +/* { dg-options "-O1 -mdejagnu-cpu=power10" } */
> +/* (high:DI (symbol_ref:DI ("var_48")..))) should not cause ICE. */

Ah, so there is an ICE, I see.  Please open a PR, and mention that in
the testcase as well as in the commit message and changelog.

I agree with what the patch does, it just needs a little more work :-)

Segher

Re: [PATCH] Add new target hook: simplify_modecc_const.

2022-07-26 Thread Segher Boessenkool

Hi!

On Tue, Jul 26, 2022 at 10:04:45PM +0100, Roger Sayle wrote:
> It's very important to distinguish the invariants that exist for the RTL
> data structures as held in memory (rtx),

"In memory"?  What does that mean here?  RTX are just RTL expressions,
nothing more, nothing less.

> vs. the use of "enum rtx_code"s,
> "machine_mode"s and operands in the various processing functions
> of the middle-end.

Of course.

> Yes, it's very true that RTL integer constants don't specify a mode
> (are VOIDmode), so therefore operations like ZERO_EXTEND or EQ
> don't make sense with all constant operands.

ZERO_EXTEND makes sense for all non-negative operands and no negative
operands.  Anything with some integer mode (so not VOIDmode!) can be
the first argument to EQ, at least structurally.

> This is (one reason)
> why constant-only operands are disallowed from RTL (data structures),
> and why in APIs that perform/simplify these operations, the original
> operand mode (of the const_int(s)) must be/is always passed as a
> parameter.
> 
> Hence, for say simplify_const_binary_operation, op0 and op1 can
> both be const_int, as the mode argument specifies the mode of the
> "code" operation. Likewise, in simplify_relational_operation, both
> op0 and op1 may be CONST_INT as "cmp_mode" explicitly specifies
> the mode that the operation is performed in and "mode" specifies
> the mode of the result.

> Your comment that "comparing two integer constants is invalid
> RTL *in all contexts*" is a serious misunderstanding of what's
> going on.

Not at all.  I showed the quote from the documentation: it is always
invalid to have two VOIDmode arguments to COMPARE.

> At no point is a RTL rtx node ever allocated with two
> integer constant operands.  RTL simplification is for hypothetical
> "what if" transformations (just like try_combine calls recog with
> RTL that may not be real instructions), and these simplifcations
> are even sometimes required to preserve the documented RTL
> invariants.  Comparisons of two integers must be simplified to
> true/false precisely to ensure that they never appear in an actual
> COMPARE node.

As the function documentation clearly states, two VOIDmode args (and
MODE that as well) is a special case for infinite precision arithmetic.

> I worry this fundamental misunderstanding is the same issue that
> has been holding up understanding/approving a previous patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578848.html

https://patchwork.ozlabs.org/project/gcc/patch/001401d7a2a5$5bf07db0$13d17910$@nextmovesoftware.com/

Let's not discuss this in this thread though.

> For a related bug, consider PR rtl-optimization/67382, that's assigned
> to you in bugzilla.  In this case, the RTL optimizers know that both
> operands to a COMPARE are integer constants (both -2), yet the
> compiler still performs a run-time comparison and conditional jump:
> 
> movl$-2, %eax
> movl%eax, 12(%rsp)
> cmpl$-2, %eax
> je  .L1
> 
> Failing to optimize/consider a comparison between two integer
> constants *in any context* just leads to poor code.

If combine would ever generate invalid RTL, the resulting insn does not
pass recog(), making the combine attempt fail.  This is not the way.

The simplifier (part of combine) has a function to actually simplify
tautologies and contradictions, the simplify_const_relational_operation
function you edited here.

> Hopefully, this clears up that the documented constraints on RTL rtx
> aren't exactly the same as the constraints on the use of rtx_codes in
> simplify-rtx's functional APIs.  So simplify_subreg really gets called
> on operands that are neither REG nor MEM, as this is unrelated to
> what the documentation of the SUBREG rtx specifies.

Thank you for telling the maintainer of combine the basics of what all
of this does!  I hadn't noticed any of that before.

> If you don't believe that op0 and op1 can ever both be const_int
> in this function, perhaps consider it harmless dead code and humor
> me.

They can be, as clearly documented (and obvious from the code), but you
can not ever have that in the RTL stream, which is needed for your patch
to do anything.

I consider it harmful, and not dead.  Sorry.


Do you have comments on the rest?


Segher

Re: [PATCH] Add new target hook: simplify_modecc_const.

2022-07-27 Thread Segher Boessenkool

On Wed, Jul 27, 2022 at 08:51:58AM +0100, Roger Sayle wrote:
> > They can be, as clearly documented (and obvious from the code), but you
> can
> > not ever have that in the RTL stream, which is needed for your patch to do
> > anything.
> 
> That's the misunderstanding; neither this nor the previous SUBREG patch,
> affect/change what is in the RTL stream, no COMPARE nodes are every
> changed or modified, only eliminated by the propagation/fusion in combine
> (or CSE).

There are no guarantees at all for that though?

> We have --enable-checking=rtl to guarantee that the documented invariants
> always hold in the RTL stream.

That unfortunately only checks a few structural constraints, and not
even all.  For example not that STRICT_LOW_PART has a SUBREG as
argument, which is documented, and the only thing that makes sense
anyway.  This is PR106101.  Unfortunately many targets violate this.

Segher

Re: [PATCH] Some additional zero-extension related optimizations in simplify-rtx.

2022-07-27 Thread Segher Boessenkool

Hi!

On Wed, Jul 27, 2022 at 02:42:25PM +0100, Roger Sayle wrote:
> This patch implements some additional zero-extension and sign-extension
> related optimizations in simplify-rtx.cc.  The original motivation comes
> from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees:
> 
> Failed to match this instruction:
> (set (reg:DI 88 [ _1 ])
> (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> 
> On many platforms the result of DImode CTZ is constrained to be a
> small unsigned integer (between 0 and 64), hence the truncation to
> 32-bits (using a SUBREG) and the following sign extension back to
> 64-bits are effectively a no-op, so the above should ideally (often)
> be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".

And you can also do that if ctz is undefined for a zero argument!

> To implement this, and some closely related transformations, we build
> upon the existing val_signbit_known_clear_p predicate.  In the first
> chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
> bit set,

Is that guaranteed in all cases?  Also at -O0, also for args bigger than
64 bits?

> Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
> default architecture options is undefined at zero, so we can't be sure
> the upper bits of reg:DI 88 will be sign extended (all zeros or all ones).
> nonzero_bits knows this, so the above transformations don't trigger,
> but the transformations themselves are perfectly valid for other
> operations such as FFS, POPCOUNT and PARITY, and on other targets/-march
> settings where CTZ is defined at zero.

It is valid to do whatever you want if the result of CTZ or CLZ is
undefined, so the sign_extend of c[lt]z is a nop.

However if C[LT]Z_DEFINED_VALUE_AT_ZERO is non-zero you have to check
if the returned value (the macro's second arg) survives sign-extending.

>/* If operand is something known to be positive, ignore the ABS.  */
> -  if (GET_CODE (op) == FFS || GET_CODE (op) == ABS
> -   || val_signbit_known_clear_p (GET_MODE (op),
> - nonzero_bits (op, GET_MODE (op
> +  if (val_signbit_known_clear_p (GET_MODE (op),
> +  nonzero_bits (op, GET_MODE (op
>   return op;

I don't think val_signbit_known_clear_p is true in all cases, as I said
above, but we do not care about generated code quality for these cases.
OK.

> +  /* We can canonicalize SIGN_EXTEND (op) as ZERO_EXTEND (op) when
> + we know the sign bit of OP must be clear.  */
> +  if (val_signbit_known_clear_p (GET_MODE (op),
> +  nonzero_bits (op, GET_MODE (op
> + return simplify_gen_unary (ZERO_EXTEND, mode, op, GET_MODE (op));

OK.

> +  /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +  if (GET_CODE (op) == SUBREG
> +   && subreg_lowpart_p (op)
> +   && GET_MODE (SUBREG_REG (op)) == mode
> +   && is_a  (mode, &int_mode)
> +   && is_a  (GET_MODE (op), &op_mode)
> +   && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +   && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +   && (nonzero_bits (SUBREG_REG (op), mode)
> +   & ~(GET_MODE_MASK (op_mode)>>1)) == 0)

(spaces around >> please)

Please use val_signbit_known_{set,clear}_p?

> + return SUBREG_REG (op);

Also, this is not correct for C[LT]Z_DEFINED_VALUE_AT_ZERO non-zero if
the value it returns in its second arg does not survive sign extending
unmodified (if it is 0x for an extend from SI to DI for
exxample).

> +  /* (zero_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> +  if (GET_CODE (op) == SUBREG
> +   && subreg_lowpart_p (op)
> +   && GET_MODE (SUBREG_REG (op)) == mode
> +   && is_a  (mode, &int_mode)
> +   && is_a  (GET_MODE (op), &op_mode)
> +   && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> +   && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> +   && (nonzero_bits (SUBREG_REG (op), mode)
> +   & ~GET_MODE_MASK (op_mode)) == 0)
> + return SUBREG_REG (op);

This has that same problem.


Segher

Re: [PATCH, rs6000] Add multiply-add expand pattern [PR103109]

2022-08-01 Thread Segher Boessenkool

Hi!

On Mon, Jul 25, 2022 at 01:11:47PM +0800, HAO CHEN GUI wrote:
>   This patch adds an expand and several insns for multiply-add with
> three 64bit operands.

>   PR target/103109
>   * config/rs6000/rs6000.md (maddditi4): New pattern for
>   multiply-add.

Please don't break lines unnecessarily.

> +(define_expand "maddditi4"
> +  [(set (match_operand:TI 0 "gpc_reg_operand")
> + (plus:TI
> +   (mult:TI (any_extend:TI
> +  (match_operand:DI 1 "gpc_reg_operand"))
> +(any_extend:TI
> +  (match_operand:DI 2 "gpc_reg_operand")))
> +   (any_extend:TI
> + (match_operand:DI 3 "gpc_reg_operand"]

Broken indentation, should be

(define_expand "maddditi4"
  [(set (match_operand:TI 0 "gpc_reg_operand")
(plus:TI
  (mult:TI (any_extend:TI (match_operand:DI 1 "gpc_reg_operand"))
   (any_extend:TI (match_operand:DI 2 "gpc_reg_operand")))
  (any_extend:TI (match_operand:DI 3 "gpc_reg_operand"]

> +  "TARGET_POWERPC64 && TARGET_MADDLD"

TARGET_MADDLD should itself already include TARGET_POWERPC64.  Could you
please send a separate patch for that first?

> +(define_insn "madddi4_lowpart"
> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
> + (subreg:DI
> +   (plus:TI
> + (mult:TI (any_extend:TI
> +(match_operand:DI 1 "gpc_reg_operand" "r"))
> +  (any_extend:TI
> +(match_operand:DI 2 "gpc_reg_operand" "r")))
> + (any_extend:TI
> +   (match_operand:DI 3 "gpc_reg_operand" "r")))
> +  8))]

  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
(subreg:DI
  (plus:TI
(mult:TI (any_extend:TI (match_operand:DI 1 "gpc_reg_operand" "r"))
 (any_extend:TI (match_operand:DI 2 "gpc_reg_operand" "r")))
(any_extend:TI (match_operand:DI 3 "gpc_reg_operand" "r")))
 8))]

(and similar for _le of course).


Segher

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 6132 matches

Mail list logo