[PATCH] testsuite: Silence analyzer/pr51628-30.c for default_packed

2022-05-08 Thread Dimitar Dimitrov
On default_packed targets like PRU, a warning in the file included from
analyzer/pr51628-30.c is reported as an excess error, even though it is
annotated in the included file:

  Excess errors:
  
gcc/gcc/testsuite/gcc.dg/analyzer/torture/../../../c-c++-common/pr51628-30.c:7:19:
 warning: 'packed' attribute ignored for field of type 'struct B' [-Wattributes]

DejaGnu does not preprocess the C test case sources.  Hence the "dg-*"
statements in included files are ignored.

Mark that gcc.dg/analyzer/torture/pr51628-30.c generates excess warnings
for default_packed targets.  This is safe because the original test case
covered an ICE, not a diagnostic error.

Ok for trunk?

gcc/testsuite/ChangeLog:

* gcc.dg/analyzer/torture/pr51628-30.c: Test can spill excess
errors for default_packed targets.

CC: David Malcolm 
Signed-off-by: Dimitar Dimitrov 
---
 gcc/testsuite/gcc.dg/analyzer/torture/pr51628-30.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/pr51628-30.c 
b/gcc/testsuite/gcc.dg/analyzer/torture/pr51628-30.c
index 4513e0f890c..abc13413f2b 100644
--- a/gcc/testsuite/gcc.dg/analyzer/torture/pr51628-30.c
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/pr51628-30.c
@@ -1,3 +1,4 @@
 /* { dg-additional-options "-Wno-address-of-packed-member" } */
+/* { dg-excess-errors "warnings about ignored 'packed' attribute" { target 
default_packed } } */
 
 #include "../../../c-c++-common/pr51628-30.c"
-- 
2.35.1



[PATCH] testsuite: mallign: Handle word size of 1 byte

2022-05-08 Thread Dimitar Dimitrov
This patch fixes a spurious warning for the pru-unknown-elf target:
  gcc/testsuite/gcc.dg/mallign.c:12:27: warning: ignoring return value of 
'malloc' declared with attribute 'warn_unused_result' [-Wunused-result]

For 8-bit targets sizeof(word) is 1, so the resulting mask is zero and
ignores all bits in the value returned by malloc.  Fix by first checking
the target word size.
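
A minimal sketch of the check before and after the change may make this
easier to see.  The function names and the word_size parameter are
hypothetical stand-ins for the test's inline expression and sizeof(word);
this is not the test case itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old form of the check: nonzero means PTR is not aligned to WORD_SIZE
   bytes.  With WORD_SIZE == 1 the mask is 0, so PTR is never inspected. */
int misaligned_old(uintptr_t ptr, size_t word_size)
{
    /* Hypothetical precondition: word sizes are powers of two. */
    assert(word_size > 0 && (word_size & (word_size - 1)) == 0);
    return (ptr & (word_size - 1)) != 0;
}

/* Patched form: the pointer is only examined when the mask is nonzero. */
int misaligned_new(uintptr_t ptr, size_t word_size)
{
    assert(word_size > 0 && (word_size & (word_size - 1)) == 0);
    return word_size > 1 && (ptr & (word_size - 1)) != 0;
}
```

Both forms agree for every word size; the difference is only that the
patched form no longer contains an expression whose pointer operand is
masked with zero.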

Sanity checked that there are no new failures on x86_64-pc-linux-gnu.

Ok for trunk?

gcc/testsuite/ChangeLog:

* gcc.dg/mallign.c: Skip check if sizeof(word)==1.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/testsuite/gcc.dg/mallign.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/mallign.c b/gcc/testsuite/gcc.dg/mallign.c
index 349cdaa343f..9a18a00c3b0 100644
--- a/gcc/testsuite/gcc.dg/mallign.c
+++ b/gcc/testsuite/gcc.dg/mallign.c
@@ -9,7 +9,7 @@ typedef int word __attribute__((mode(word)));
 
 int main()
 {
-if ((__UINTPTR_TYPE__)malloc (1) & (sizeof(word)-1))
+if ((sizeof(word)>1) && ((__UINTPTR_TYPE__)malloc (1) & (sizeof(word)-1)))
abort ();
 return 0;
 }  
-- 
2.35.1



New template for 'gcc' made available

2022-05-08 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.  (If you have
any questions, send them to .)

A new POT file for textual domain 'gcc' has been made available
to the language teams for translation.  It is archived as:

https://translationproject.org/POT-files/gcc-12.1.0.pot

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

Below is the URL which has been provided to the translators of your
package.  Please inform the translation coordinator, at the address
at the bottom, if this information is not current:

https://gcc.gnu.org/pub/gcc/releases/gcc-12.1.0/gcc-12.1.0.tar.xz

Translated PO files will later be automatically e-mailed to you.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




New German PO file for 'gcc' (version 12.1.0)

2022-05-08 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

https://translationproject.org/latest/gcc/de.po

(This file, 'gcc-12.1.0.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




New Ukrainian PO file for 'gcc' (version 12.1.0)

2022-05-08 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Ukrainian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/uk.po

(This file, 'gcc-12.1.0.uk.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[PATCH] PR fortran/105501 - check for non-optional spaces between adjacent keywords

2022-05-08 Thread Harald Anlauf via Gcc-patches
Dear all,

the PR correctly notes that a space between keywords 'TYPE' and 'IS' is
required in free-form, but we currently accept 'TYPEIS'.  We shouldn't.
The combinations with non-optional blanks are listed in the standard;
in F2018 this is table 6.2.

While at it, I saw a couple of other keyword combinations in the matcher
and fixed these too.  I cross-checked my findings with Intel, Crayftn,
and NAG (as far as possible).

Regarding the testcase: I do not know how to write a (single!) testcase
that is able to check several of those fixes.  I also do not think that
it makes sense to provide a testcase for each single fixed pattern.
Therefore I provided a single, minimal testcase based on the report.

Regtested on x86_64-pc-linux-gnu.  OK for mainline (i.e. 13-master)?

Thanks,
Harald

From 8b04cb084e138966cf20187887da676ad9e4a00e Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sun, 8 May 2022 22:04:27 +0200
Subject: [PATCH] Fortran: check for non-optional spaces between adjacent
 keywords

In free format, spaces between adjacent keywords are not optional except
when a combination is explicitly listed (e.g. F2018: table 6.2).  The
following combinations thus require separating blanks: CHANGE TEAM,
ERROR STOP, EVENT POST, EVENT WAIT, FAIL IMAGE, FORM TEAM, SELECT RANK,
SYNC ALL, SYNC IMAGES, SYNC MEMORY, SYNC TEAM, TYPE IS.
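
The diff below replaces the plain blank in each pattern with "% ".  As a
rough illustration only, here is a toy matcher written under the
assumption that "% " demands at least one blank while a plain blank
accepts zero or more.  toy_match is a hypothetical helper, not
gfortran's gfc_match, and it ignores fixed form entirely.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a keyword pattern matcher (NOT gfortran's gfc_match):
   - "% " in PATTERN requires at least one blank in INPUT;
   - a plain ' ' in PATTERN allows zero or more blanks;
   - any other character must match case-insensitively.
   Returns true if PATTERN matches a prefix of INPUT. */
bool toy_match(const char *pattern, const char *input)
{
    assert(pattern != NULL && input != NULL);
    while (*pattern != '\0') {
        if (pattern[0] == '%' && pattern[1] == ' ') {
            if (*input != ' ')
                return false;      /* mandatory blank is missing */
            while (*input == ' ')
                input++;
            pattern += 2;
        } else if (*pattern == ' ') {
            while (*input == ' ')
                input++;           /* optional blanks */
            pattern++;
        } else {
            if (tolower((unsigned char)*pattern)
                != tolower((unsigned char)*input))
                return false;
            pattern++;
            input++;
        }
    }
    return true;
}
```

Under this model, toy_match ("type is", "typeis (integer)") succeeds,
which corresponds to the behaviour reported in the PR, while
toy_match ("type% is", "typeis (integer)") fails and
toy_match ("type% is", "type is (integer)") still succeeds.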

gcc/fortran/ChangeLog:

	PR fortran/105501
	* match.cc (gfc_match_if): Adjust patterns used for matching.
	(gfc_match_select_rank): Likewise.
	* parse.cc (decode_statement): Likewise.

gcc/testsuite/ChangeLog:

	PR fortran/105501
	* gfortran.dg/pr105501.f90: New test.
---
 gcc/fortran/match.cc   | 22 +++---
 gcc/fortran/parse.cc   | 22 +++---
 gcc/testsuite/gfortran.dg/pr105501.f90 | 15 +++
 3 files changed, 37 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr105501.f90

diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index 205811bb969..1aa3053e70e 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -1606,21 +1606,21 @@ gfc_match_if (gfc_statement *if_type)
   match ("assign", gfc_match_assign, ST_LABEL_ASSIGNMENT)
   match ("backspace", gfc_match_backspace, ST_BACKSPACE)
   match ("call", gfc_match_call, ST_CALL)
-  match ("change team", gfc_match_change_team, ST_CHANGE_TEAM)
+  match ("change% team", gfc_match_change_team, ST_CHANGE_TEAM)
   match ("close", gfc_match_close, ST_CLOSE)
   match ("continue", gfc_match_continue, ST_CONTINUE)
   match ("cycle", gfc_match_cycle, ST_CYCLE)
   match ("deallocate", gfc_match_deallocate, ST_DEALLOCATE)
   match ("end file", gfc_match_endfile, ST_END_FILE)
   match ("end team", gfc_match_end_team, ST_END_TEAM)
-  match ("error stop", gfc_match_error_stop, ST_ERROR_STOP)
-  match ("event post", gfc_match_event_post, ST_EVENT_POST)
-  match ("event wait", gfc_match_event_wait, ST_EVENT_WAIT)
+  match ("error% stop", gfc_match_error_stop, ST_ERROR_STOP)
+  match ("event% post", gfc_match_event_post, ST_EVENT_POST)
+  match ("event% wait", gfc_match_event_wait, ST_EVENT_WAIT)
   match ("exit", gfc_match_exit, ST_EXIT)
-  match ("fail image", gfc_match_fail_image, ST_FAIL_IMAGE)
+  match ("fail% image", gfc_match_fail_image, ST_FAIL_IMAGE)
   match ("flush", gfc_match_flush, ST_FLUSH)
   match ("forall", match_simple_forall, ST_FORALL)
-  match ("form team", gfc_match_form_team, ST_FORM_TEAM)
+  match ("form% team", gfc_match_form_team, ST_FORM_TEAM)
   match ("go to", gfc_match_goto, ST_GOTO)
   match ("if", match_arithmetic_if, ST_ARITHMETIC_IF)
   match ("inquire", gfc_match_inquire, ST_INQUIRE)
@@ -1634,10 +1634,10 @@ gfc_match_if (gfc_statement *if_type)
   match ("rewind", gfc_match_rewind, ST_REWIND)
   match ("stop", gfc_match_stop, ST_STOP)
   match ("wait", gfc_match_wait, ST_WAIT)
-  match ("sync all", gfc_match_sync_all, ST_SYNC_CALL);
-  match ("sync images", gfc_match_sync_images, ST_SYNC_IMAGES);
-  match ("sync memory", gfc_match_sync_memory, ST_SYNC_MEMORY);
-  match ("sync team", gfc_match_sync_team, ST_SYNC_TEAM)
+  match ("sync% all", gfc_match_sync_all, ST_SYNC_CALL);
+  match ("sync% images", gfc_match_sync_images, ST_SYNC_IMAGES);
+  match ("sync% memory", gfc_match_sync_memory, ST_SYNC_MEMORY);
+  match ("sync% team", gfc_match_sync_team, ST_SYNC_TEAM)
   match ("unlock", gfc_match_unlock, ST_UNLOCK)
   match ("where", match_simple_where, ST_WHERE)
   match ("write", gfc_match_write, ST_WRITE)
@@ -6716,7 +6716,7 @@ gfc_match_select_rank (void)
   if (m == MATCH_ERROR)
 return m;

-  m = gfc_match (" select rank ( ");
+  m = gfc_match (" select% rank ( ");
   if (m != MATCH_YES)
 return m;

diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index e6e915d2a5e..7356d1b5a3a 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -454,7 +454,7 @@ decode_statement (void)

 case 'c':
   match ("call", gfc_match_call, ST_CALL);
-  match ("change team", gfc_match_change_te

[PATCH] c++: Implement P2324R2, labels at the end of compound-stmts [PR103539]

2022-05-08 Thread Marek Polacek via Gcc-patches
This patch implements C++23 P2324R2, which allows labels at the end of
a compound statement.  Its C FE counterpart was already implemented in
r11-4813.

In cp_parser_statement I rely on in_compound to determine whether we're
in a compound-statement, so that the patch doesn't accidentally allow

  void fn(int c) {
    if (c)
      label:
  }

Strangely, in_compound was reset after seeing a label (this is tested in
c-c++-common/gomp/pr63326.c), so I've made a modifiable copy specific
for OpenMP #pragma purposes.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/103539

gcc/cp/ChangeLog:

* parser.cc (cp_parser_statement): Constify the in_compound parameter.
Create a modifiable copy.  Allow labels at the end of compound
statements.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/label1.C: New test.
* g++.dg/cpp23/label2.C: New test.
---
 gcc/cp/parser.cc| 43 +++---
 gcc/testsuite/g++.dg/cpp23/label1.C | 89 +
 gcc/testsuite/g++.dg/cpp23/label2.C | 52 +
 3 files changed, 175 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp23/label1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp23/label2.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 3ebaa414a3d..a4c3d8aa234 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -12174,7 +12174,7 @@ cp_parser_handle_directive_omp_attributes (cp_parser 
*parser, tree *pattrs,
  atomic-statement
 
   IN_COMPOUND is true when the statement is nested inside a
-  cp_parser_compound_statement; this matters for certain pragmas.
+  cp_parser_compound_statement.
 
   If IF_P is not NULL, *IF_P is set to indicate whether the statement
   is a (possibly labeled) if statement which is not enclosed in braces
@@ -12184,7 +12184,7 @@ cp_parser_handle_directive_omp_attributes (cp_parser 
*parser, tree *pattrs,
 
 static void
 cp_parser_statement (cp_parser* parser, tree in_statement_expr,
-bool in_compound, bool *if_p, vec *chain,
+const bool in_compound, bool *if_p, vec *chain,
 location_t *loc_after_labels)
 {
   tree statement, std_attrs = NULL_TREE;
@@ -12192,6 +12192,9 @@ cp_parser_statement (cp_parser* parser, tree 
in_statement_expr,
   location_t statement_location, attrs_loc;
   bool in_omp_attribute_pragma = parser->lexer->in_omp_attribute_pragma;
   bool has_std_attrs;
+  /* A copy of IN_COMPOUND which is set to false after seeing a label.
+ This matters for certain pragmas.  */
+  bool in_compound_for_pragma = in_compound;
 
  restart:
   if (if_p != NULL)
@@ -12286,7 +12289,7 @@ cp_parser_statement (cp_parser* parser, tree 
in_statement_expr,
 Parse the label, and then use tail recursion to parse
 the statement.  */
  cp_parser_label_for_labeled_statement (parser, std_attrs);
- in_compound = false;
+ in_compound_for_pragma = false;
  in_omp_attribute_pragma = parser->lexer->in_omp_attribute_pragma;
  goto restart;
 
@@ -12370,7 +12373,21 @@ cp_parser_statement (cp_parser* parser, tree 
in_statement_expr,
 the statement.  */
 
  cp_parser_label_for_labeled_statement (parser, std_attrs);
- in_compound = false;
+
+ /* If there's no statement, it's not a labeled-statement, just
+a label.  That's allowed in C++23, but only if we're at the
+end of a compound-statement.  */
+ if (in_compound
+ && cp_lexer_next_token_is (parser->lexer, CPP_CLOSE_BRACE))
+   {
+ location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+ if (cxx_dialect < cxx23)
+   pedwarn (loc, OPT_Wc__23_extensions,
+"label at end of compound statement only available "
+"with %<-std=c++2b%> or %<-std=gnu++2b%>");
+ return;
+   }
+ in_compound_for_pragma = false;
  in_omp_attribute_pragma = parser->lexer->in_omp_attribute_pragma;
  goto restart;
}
@@ -12393,7 +12410,7 @@ cp_parser_statement (cp_parser* parser, tree 
in_statement_expr,
 the context of a compound, accept the pragma as a "statement" and
 return so that we can check for a close brace.  Otherwise we
 require a real statement and must go back and read one.  */
-  if (in_compound)
+  if (in_compound_for_pragma)
cp_parser_pragma (parser, pragma_compound, if_p);
   else if (!cp_parser_pragma (parser, pragma_stmt, if_p))
do_restart = true;
@@ -12544,9 +12561,13 @@ attr_chainon (tree attrs, tree attr)
 
 /* Parse the label for a labeled-statement, i.e.
 
-   identifier :
-   case constant-expression :
-   default :
+   label:
+ attribute-specifier-seq[opt] identifier :
+ attribute-specifier-seq[opt] case constant-expression :
+ attribute-specifier-seq[opt] defaul

Re: [PATCH] Expand __builtin_memcmp_eq with ptest for OImode.

2022-05-08 Thread Hongtao Liu via Gcc-patches
On Sat, May 7, 2022 at 1:05 PM liuhongt via Gcc-patches
 wrote:
>
> This is adjusted patch only for OImode.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/104610
> * config/i386/i386-expand.cc (ix86_expand_branch): Use ptest
> for OImode when code is EQ or NE.
> * config/i386/sse.md (cbranch4): Extend to OImode.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr104610.c: New test.
> ---
>  gcc/config/i386/i386-expand.cc   | 10 +-
>  gcc/config/i386/sse.md   |  8 ++--
>  gcc/testsuite/gcc.target/i386/pr104610.c | 15 +++
>  3 files changed, 30 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr104610.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index bc806ffa283..c2f8776102c 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -2267,11 +2267,19 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx 
> op1, rtx label)
>
>/* Handle special case - vector comparsion with boolean result, transform
>   it using ptest instruction.  */
> -  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> +  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> +  || (mode == OImode && (code == EQ || code == NE)))
>  {
>rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
>machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
>
> +  if (mode == OImode)
> +   {
> + op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
> + op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
> + mode = p_mode;
> +   }
> +
>gcc_assert (code == EQ || code == NE);
>/* Generate XOR since we can't check that one operand is zero vector.  
> */
>tmp = gen_reg_rtx (mode);
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 7b791def542..9514b8e0234 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -26034,10 +26034,14 @@ (define_expand 
> "maskstore"
>   (match_operand: 2 "register_operand")))]
>"TARGET_AVX512BW")
>
> +(define_mode_iterator VI48_OI_AVX
> +  [(V8SI "TARGET_AVX") (V4DI "TARGET_AVX") (OI "TARGET_AVX")
> +   V4SI V2DI])
> +
>  (define_expand "cbranch4"
>[(set (reg:CC FLAGS_REG)
> -   (compare:CC (match_operand:VI48_AVX 1 "register_operand")
> -   (match_operand:VI48_AVX 2 "nonimmediate_operand")))
> +   (compare:CC (match_operand:VI48_OI_AVX 1 "register_operand")
> +   (match_operand:VI48_OI_AVX 2 "nonimmediate_operand")))
> (set (pc) (if_then_else
>(match_operator 0 "bt_comparison_operator"
> [(reg:CC FLAGS_REG) (const_int 0)])
> diff --git a/gcc/testsuite/gcc.target/i386/pr104610.c 
> b/gcc/testsuite/gcc.target/i386/pr104610.c
> new file mode 100644
> index 000..00866238bd7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr104610.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mmove-max=256 -mstore-max=256" } */
> +/* { dg-final { scan-assembler-times {(?n)vptest.*ymm} 1 } } */
> +/* { dg-final { scan-assembler-times {sete} 1 } } */
> +/* { dg-final { scan-assembler-not {(?n)je.*L[0-9]} } } */
> +/* { dg-final { scan-assembler-not {(?n)jne.*L[0-9]} } } */
> +
> +
> +#include
> +__attribute__((target("avx")))
> +bool f256(char *a)
> +{
> +  char t[] = "0123456789012345678901234567890";
> +  return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
> +}
> --
> 2.18.1
>


-- 
BR,
Hongtao


[PATCH, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-05-08 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch implements the f[min/max]_optab optabs with xs[min/max]dp on
rs6000.  Tests show that the results of xs[min/max]dp are consistent with
the C99 definition of fmin/fmax.
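
For readers who do not have the C99 rule in mind: fmin and fmax treat a
NaN argument as missing data, so when exactly one argument is a NaN the
other one is returned.  The sketch below is a hand-written model of that
rule for the minimum, set against a naive comparison.  The function
names are hypothetical, and nothing here is taken from the patch or from
the instruction definition.

```c
#include <assert.h>

/* Produce a quiet NaN without relying on <math.h>. */
double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* Naive minimum: yields NaN when the second operand is NaN. */
double naive_min(double a, double b)
{
    return a < b ? a : b;
}

/* Model of the C99 fmin NaN rule: if exactly one operand is NaN,
   return the other operand. */
double c99_style_min(double a, double b)
{
    double r;
    if (a != a)            /* a is NaN */
        r = b;
    else if (b != b)       /* b is NaN */
        r = a;
    else
        r = a < b ? a : b;
    /* The result is NaN only when both inputs are NaN. */
    assert(r == r || (a != a && b != b));
    return r;
}
```

With these definitions c99_style_min (1.0, make_nan ()) is 1.0 in either
argument order, whereas naive_min (1.0, make_nan ()) is a NaN.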

  Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-05-09 Haochen Gui 

gcc/
PR target/103605
* rs6000.md (unspec): Add UNSPEC_FMAX and UNSPEC_FMIN.
(fminmax): New.
(minmax_op): Likewise.
(3): New pattern.  Implemented by UNSPEC_FMAX and
UNSPEC_FMIN.

gcc/testsuite/
PR target/103605
* gcc.target/powerpc/pr103605.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index fdfbc6566a5..8aae3e80bcd 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -158,6 +158,8 @@ (define_c_enum "unspec"
UNSPEC_HASHCHK
UNSPEC_XXSPLTIDP_CONST
UNSPEC_XXSPLTIW_CONST
+   UNSPEC_FMAX
+   UNSPEC_FMIN
   ])

 ;;
@@ -5350,6 +5352,25 @@ (define_insn_and_split "*s3_fpr"
   DONE;
 })

+
+(define_int_iterator FMINMAX [UNSPEC_FMAX UNSPEC_FMIN])
+
+(define_int_attr fminmax [(UNSPEC_FMAX "fmax")
+ (UNSPEC_FMIN "fmin")])
+
+(define_int_attr  minmax_op [(UNSPEC_FMAX "max")
+(UNSPEC_FMIN "min")])
+
+(define_insn "3"
+  [(set (match_operand:SFDF 0 "vsx_register_operand" "=")
+   (unspec:SFDF [(match_operand:SFDF 1 "vsx_register_operand" "")
+ (match_operand:SFDF 2 "vsx_register_operand" "")]
+ FMINMAX))]
+"TARGET_VSX"
+"xsdp %x0,%x1,%x2"
+[(set_attr "type" "fp")]
+)
+
 (define_expand "movcc"
[(set (match_operand:GPR 0 "gpc_reg_operand")
 (if_then_else:GPR (match_operand 1 "comparison_operator")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103605.c 
b/gcc/testsuite/gcc.target/powerpc/pr103605.c
new file mode 100644
index 000..a40da064742
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103605.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O1 -mvsx" } */
+/* { dg-final { scan-assembler-times "xsmaxdp" 2 } } */
+/* { dg-final { scan-assembler-times "xsmindp" 2 } } */
+
+#include 
+
+double test1 (double d0, double d1)
+{
+  return fmin (d0, d1);
+}
+
+float test2 (float d0, float d1)
+{
+  return fmin (d0, d1);
+}
+
+double test3 (double d0, double d1)
+{
+  return fmax (d0, d1);
+}
+
+float test4 (float d0, float d1)
+{
+  return fmax (d0, d1);
+}


[PATCH] [i386] Optimize movzwl + vmovd/vmovq to vmovw.

2022-05-08 Thread liuhongt via Gcc-patches
Similarly optimize movl + vmovq to vmovd.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/104915
* config/i386/sse.md (*vec_set_0_zero_extendhi): New
pre_reload define_insn_and_split.
(*vec_setv2di_0_zero_extendhi_1): Ditto.
(*vec_set_0_zero_extendsi): Ditto.
(*vec_setv2di_0_zero_extendsi_1): Ditto.
(ssewvecmode): New mode attr.
(ssewvecmodelower): Ditto.
(ssepackmodelower): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr104915-vmovd.c: New test.
* gcc.target/i386/pr104915-vmovw.c: New test.
---
 gcc/config/i386/sse.md| 94 +++
 .../gcc.target/i386/pr104915-vmovd.c  | 25 +
 .../gcc.target/i386/pr104915-vmovw.c  | 45 +
 3 files changed, 164 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104915-vmovd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104915-vmovw.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7b791def542..2ad8a2b46b8 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -985,6 +985,15 @@ (define_mode_attr sseintvecmode
(V32HI "V32HI") (V64QI "V64QI")
(V32QI "V32QI") (V16QI "V16QI")])
 
+;; Mapping of vector modes to an V*HImode of the same size
+(define_mode_attr ssewvecmode
+  [(V8DI "V32HI") (V4DI "V16HI") (V2DI "V8HI")
+   (V16SI "V32HI") (V8SI "V16HI") (V4SI "V8HI")])
+
+(define_mode_attr ssewvecmodelower
+  [(V8DI "v32hi") (V4DI "v16hi") (V2DI "v8hi")
+   (V16SI "v32hi") (V8SI "v16hi") (V4SI "v8hi")])
+
 (define_mode_attr sseintvecmode2
   [(V8DF "XI") (V4DF "OI") (V2DF "TI")
(V8SF "OI") (V4SF "TI")
@@ -1194,6 +1203,11 @@ (define_mode_attr ssepackmode
(V16HI "V32QI") (V8SI "V16HI") (V4DI "V8SI")
(V32HI "V64QI") (V16SI "V32HI") (V8DI "V16SI")])
 
+(define_mode_attr ssepackmodelower
+  [(V8HI "v16qi") (V4SI "v8hi") (V2DI "v4si")
+   (V16HI "v32qi") (V8SI "v16hi") (V4DI "v8si")
+   (V32HI "v64qi") (V16SI "v32hi") (V8DI "v16si")])
+
 ;; Mapping of the max integer size for xop rotate immediate constraint
 (define_mode_attr sserotatemax
   [(V16QI "7") (V8HI "15") (V4SI "31") (V2DI "63")])
@@ -10681,6 +10695,46 @@ (define_insn "vec_set_0"
(set_attr "prefix" "evex")
(set_attr "mode" "HF")])
 
+(define_insn_and_split "*vec_set_0_zero_extendhi"
+  [(set (match_operand:VI48_AVX512F 0 "register_operand")
+   (vec_merge:VI48_AVX512F
+(vec_duplicate:VI48_AVX512F
+ (zero_extend:
+   (match_operand:HI 1 "nonimmediate_operand")))
+(match_operand:VI48_AVX512F 2 "const0_operand")
+(const_int 1)))]
+  "TARGET_AVX512FP16 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx dest = gen_reg_rtx (mode);
+  emit_insn (gen_vec_set_0 (dest,
+ CONST0_RTX (mode),
+ operands[1]));
+  emit_move_insn (operands[0],
+ lowpart_subreg (mode, dest, mode));
+  DONE;
+})
+
+(define_insn_and_split "*vec_setv2di_0_zero_extendhi_1"
+  [(set (match_operand:V2DI 0 "register_operand")
+   (vec_concat:V2DI
+ (zero_extend:DI
+   (match_operand:HI 1 "nonimmediate_operand"))
+ (const_int 0)))]
+  "TARGET_AVX512FP16 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx dest = gen_reg_rtx (V8HImode);
+  emit_insn (gen_vec_setv8hi_0 (dest, CONST0_RTX (V8HImode), operands[1]));
+  emit_move_insn (operands[0],
+ lowpart_subreg (V2DImode, dest, V8HImode));
+  DONE;
+})
+
 (define_insn "avx512fp16_movsh"
   [(set (match_operand:V8HF 0 "register_operand" "=v")
(vec_merge:V8HF
@@ -10750,6 +10804,46 @@ (define_insn "vec_set_0"
   ]
   (symbol_ref "true")))])
 
+(define_insn_and_split "*vec_set_0_zero_extendsi"
+  [(set (match_operand:VI8 0 "register_operand")
+   (vec_merge:VI8
+(vec_duplicate:VI8
+ (zero_extend:DI
+   (match_operand:SI 1 "nonimmediate_operand")))
+(match_operand:VI8 2 "const0_operand")
+(const_int 1)))]
+  "TARGET_SSE2 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx dest = gen_reg_rtx (mode);
+  emit_insn (gen_vec_set_0 (dest,
+ CONST0_RTX (mode),
+ operands[1]));
+  emit_move_insn (operands[0],
+ lowpart_subreg (mode, dest, mode));
+  DONE;
+})
+
+(define_insn_and_split "*vec_setv2di_0_zero_extendsi_1"
+  [(set (match_operand:V2DI 0 "register_operand")
+   (vec_concat:V2DI
+ (zero_extend:DI
+   (match_operand:SI 1 "nonimmediate_operand"))
+ (const_int 0)))]
+  "TARGET_SSE2 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx dest = gen_reg_rtx (V4SImode);
+  emit_insn (gen_vec_setv4si_0 (dest, CONST0_RTX (V4SImode), operands[1]));
+  emit_move_insn (operan

[PATCH]rs6000: optimize li+rldicr+cmpd==>rotldi+cmpldi for 16bits cst

2022-05-08 Thread Jiufu Guo via Gcc-patches
Hi!

I would like to ping for trunk:
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591905.html

BR,
Jiufu

--
When comparing for eq/ne against a constant whose significant bits span
only 16 bits, the compare can be done on the rotated value instead, so
the wide constant no longer has to be built in a register.

Take the example from PR103743:
For "in == 0x8000LL", this patch generates:
rotldi %r3,%r3,16
cmpldi %cr0,%r3,32768
instead of:
li %r9,-1
rldicr %r9,%r9,0,0
cmpd %cr0,%r3,%r9
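
The equivalence behind the two sequences can be sketched in C.  This
assumes the compared constant is 0x8000000000000000, which is the value
that "li %r9,-1" followed by "rldicr %r9,%r9,0,0" builds; the function
names are hypothetical and only illustrate the arithmetic, not the RTL
emitted by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate X left by N bits, with 0 < N < 64. */
uint64_t rotl64(uint64_t x, unsigned n)
{
    assert(n > 0 && n < 64);
    return (x << n) | (x >> (64 - n));
}

/* Direct comparison against the wide constant (assumed value). */
int eq_direct(uint64_t in)
{
    return in == UINT64_C(0x8000000000000000);
}

/* Same test after rotating both sides left by 16: the constant becomes
   0x8000 (32768), which fits a 16-bit unsigned immediate. */
int eq_rotated(uint64_t in)
{
    return rotl64(in, 16) == UINT64_C(0x8000);
}
```

Rotation is a bijection on 64-bit values, so eq_direct and eq_rotated
agree for every input; only the constant on the right-hand side gets
cheaper to materialize.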

This patch passes bootstrap and regtest on ppc64 and ppc64le.
Ok for trunk?  Thanks!

BR,
Jiufu


PR target/103743

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rot_to_16bits): New function.
(rs6000_generate_compare): Compare rotated const for eq.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr103743.c: New test.
* gcc.target/powerpc/pr103743_1.c: New test.

---
 gcc/config/rs6000/rs6000.cc   | 62 +
 gcc/testsuite/gcc.target/powerpc/pr103743.c   | 48 ++
 gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 87 +++
 3 files changed, 197 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 3afe78f5d04..4aa1d0ca4ab 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -14860,6 +14860,42 @@ rs6000_reverse_condition (machine_mode mode, enum 
rtx_code code)
 return reverse_condition (code);
 }
 
+/* If SRC is a constant with only 16bits, return the number by which
+   the constant can be rotated to lowest 16bits.
+   Return 0, if SRC does not meet requirements.
+   Set *sgn to 1 if the rotated cst is negative, otherwise set it to 0.
+   Set *res_cst to the rotated constant.  */
+
+static int
+rot_to_16bits (rtx src, machine_mode mode, bool *sgn, rtx *res_cst)
+{
+  if (!(src && CONST_INT_P (src)))
+return 0;
+
+  unsigned HOST_WIDE_INT C = INTVAL (src);
+
+  /* If all 0 except low 16bits.  */
+  int leadz = clz_hwi (C) + 16;
+  rtx cst = simplify_gen_binary (ROTATE, mode, src, GEN_INT (leadz));
+  if (satisfies_constraint_K (cst))
+{
+  *res_cst = cst;
+  return leadz;
+}
+
+  /* If all 1 except low 15bits.  */
+  leadz = clz_hwi (~C) + 15;
+  cst = simplify_gen_binary (ROTATE, mode, src, GEN_INT (leadz));
+  if (satisfies_constraint_I (cst))
+{
+  *sgn = true;
+  *res_cst = cst;
+  return leadz;
+}
+
+  return 0;
+}
+
 /* Generate a compare for CODE.  Return a brand-new rtx that
represents the result of the compare.  */
 
@@ -14889,6 +14925,32 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
   else
 comp_mode = CCmode;
 
+  /* "i == C" ==> "rotl(i,N) == rotl(C,N)" if rotl(C,N) only low 16bits.  */
+  if ((code == NE || code == EQ) && mode == DImode)
+{
+  /* The constant would already been set to a reg in the last insn.  */
+  rtx_insn *last = get_last_insn_anywhere ();
+  rtx set = last ? single_set (last) : NULL_RTX;
+  rtx src = (set && SET_DEST (set) == op1) ? SET_SRC (set) : NULL_RTX;
+
+  /* It constant may be in constant pool. */
+  if (src && MEM_P (src))
+   src = avoid_constant_pool_reference (src);
+
+  rtx cst = NULL_RTX;
+  bool sgn = false;
+  int n = rot_to_16bits (src, mode, &sgn, &cst);
+  if (n >= 16 && n <= 48)
+   {
+ rtx dest = gen_reg_rtx (mode);
+ emit_insn (
+   gen_rtx_SET (dest, gen_rtx_ROTATE (mode, op0, GEN_INT (n;
+ op0 = dest;
+ op1 = cst;
+ comp_mode = sgn ? CCmode : CCUNSmode;
+   }
+}
+
   /* If we have an unsigned compare, make sure we don't have a signed value as
  an immediate.  */
   if (comp_mode == CCUNSmode && CONST_INT_P (op1)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743.c 
b/gcc/testsuite/gcc.target/powerpc/pr103743.c
new file mode 100644
index 000..606e5e51c4f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103743.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "cmpldi" 8  } } */
+/* { dg-final { scan-assembler-times "cmpdi" 4  } } */
+/* { dg-final { scan-assembler-times "rotldi" 8  } } */
+
+int foo (int a);
+
+int __attribute__ ((noinline)) udi_fun (unsigned long long in)
+{
+  if (in == (0x8642ULL))
+return foo (1);
+  if (in == (0x7642ULL))
+return foo (12);
+  if (in == (0x8000ULL))
+return foo (32);
+  if (in == (0x8642ULL))
+return foo (46);
+  if (in == (0x7642ULL))
+return foo (51);
+  if (in == (0x756700ULL))
+return foo (9);
+  if (in == (0xFFF8567FULL))
+return foo (19);
+
+  return 0;
+}
+
+int __attribute__ ((noinline)) di_fun (long long in)
+{
+  if (in == (0x8642LL))
+ 

[PATCH V3 0/3] RISC-V: Add minimal support for Zicbo[mzp]

2022-05-08 Thread shiyulong
From: yulong 

This patchset adds support for three recently ratified RISC-V extensions:

-   Zicbom (Cache-Block Management Instructions)
-   Zicbop (Cache-Block Prefetch Instructions)
-   Zicboz (Cache-Block Zero Instructions)

Patch 1: Add Zicbom/z/p minimal support
Patch 2: Add Zicbom/z/p instructions arch support
Patch 3: Add Zicbom/z/p instructions testcases

Changes from the previous two versions:
1. The builtin names caused oddities, so the builtins have been renamed.
2. Following the spec, the format of the prefetch.i instruction has been
changed.

cf. 
;

yulong (3):
  RISC-V: Add minimal support for Zicbo[mzp]
  RISC-V:Cache Management Operation instructions
  RISC-V:Cache Management Operation instructions testcases

 gcc/common/config/riscv/riscv-common.cc   |  8 +++
 gcc/config/riscv/predicates.md|  4 ++
 gcc/config/riscv/riscv-builtins.cc| 16 ++
 gcc/config/riscv/riscv-cmo.def| 17 ++
 gcc/config/riscv/riscv-ftypes.def |  4 ++
 gcc/config/riscv/riscv-opts.h |  8 +++
 gcc/config/riscv/riscv.md | 52 +++
 gcc/config/riscv/riscv.opt|  3 ++
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c | 21 
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c | 21 
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c | 23 
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 23 
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c |  9 
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c |  9 
 14 files changed, 218 insertions(+)
 create mode 100644 gcc/config/riscv/riscv-cmo.def
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c

-- 
2.17.1



[PATCH V3 1/3] RISC-V: Add mininal support for Zicbo[mzp]

2022-05-08 Thread shiyulong
From: yulong 

This commit adds minimal support for 'Zicbom','Zicboz' and 'Zicbop' extensions.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zicbom, zicboz, zicbop 
extensions.
* config/riscv/riscv-opts.h (MASK_ZICBOZ): New.
(MASK_ZICBOM): New.
(MASK_ZICBOP): New.
(TARGET_ZICBOZ): New.
(TARGET_ZICBOM): New.
(TARGET_ZICBOP): New.
* config/riscv/riscv.opt: New.
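
As a reading aid, the mask/flag scheme this patch adds can be sketched in
standalone C.  This is only an illustration under assumptions: the three mask
values are copied from the riscv-opts.h hunk below, while enable_ext and
has_ext are hypothetical helpers that do not exist in GCC.  They mirror the
shape of the TARGET_ZICBO* tests on the new riscv_zicmo_subext variable.

```c
#include <stdbool.h>

/* Values copied from the riscv-opts.h hunk in this patch.  */
#define MASK_ZICBOZ (1 << 0)
#define MASK_ZICBOM (1 << 1)
#define MASK_ZICBOP (1 << 2)

/* Hypothetical helper (not in GCC): return FLAGS with EXT_MASK set,
   as would happen when an extension name is matched in -march.  */
static int
enable_ext (int flags, int ext_mask)
{
  return flags | ext_mask;
}

/* Hypothetical helper (not in GCC): same shape as the TARGET_ZICBO*
   macros, i.e. test one bit of the sub-extension flag word.  */
static bool
has_ext (int flags, int ext_mask)
{
  return (flags & ext_mask) != 0;
}
```

Each extension owns one bit, so the three can be enabled independently of
each other in a single int.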

---
 gcc/common/config/riscv/riscv-common.cc | 8 
 gcc/config/riscv/riscv-opts.h   | 8 
 gcc/config/riscv/riscv.opt  | 3 +++
 3 files changed, 19 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 1501242e296..bf7a7caabef 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -165,6 +165,10 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zksh",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"zkt",   ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"zicboz",ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zicbop",ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"zk",ISA_SPEC_CLASS_NONE, 1, 0},
   {"zkn",   ISA_SPEC_CLASS_NONE, 1, 0},
   {"zks",   ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1110,6 +1114,10 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zksh",   &gcc_options::x_riscv_zk_subext, MASK_ZKSH},
   {"zkt",&gcc_options::x_riscv_zk_subext, MASK_ZKT},
 
+  {"zicboz", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOZ},
+  {"zicbom", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOM},
+  {"zicbop", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOP},
+
   {"zve32x",   &gcc_options::x_target_flags, MASK_VECTOR},
   {"zve32f",   &gcc_options::x_target_flags, MASK_VECTOR},
   {"zve64x",   &gcc_options::x_target_flags, MASK_VECTOR},
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 15bb5e76854..1e153b3a6e7 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -145,6 +145,14 @@ enum stack_protector_guard {
 #define TARGET_ZVL32768B ((riscv_zvl_flags & MASK_ZVL32768B) != 0)
 #define TARGET_ZVL65536B ((riscv_zvl_flags & MASK_ZVL65536B) != 0)
 
+#define MASK_ZICBOZ   (1 << 0)
+#define MASK_ZICBOM   (1 << 1)
+#define MASK_ZICBOP   (1 << 2)
+
+#define TARGET_ZICBOZ ((riscv_zicmo_subext & MASK_ZICBOZ) != 0)
+#define TARGET_ZICBOM ((riscv_zicmo_subext & MASK_ZICBOM) != 0)
+#define TARGET_ZICBOP ((riscv_zicmo_subext & MASK_ZICBOP) != 0)
+
 /* Bit of riscv_zvl_flags will set contintuly, N-1 bit will set if N-bit is
set, e.g. MASK_ZVL64B has set then MASK_ZVL32B is set, so we can use
popcount to caclulate the minimal VLEN.  */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 492aad12324..d1b3c1840a6 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -209,6 +209,9 @@ int riscv_vector_elen_flags
 TargetVariable
 int riscv_zvl_flags
 
+TargetVariable
+int riscv_zicmo_subext
+
 Enum
 Name(isa_spec_class) Type(enum riscv_isa_spec_class)
 Supported ISA specs (for use with the -misa-spec= option):
-- 
2.17.1



[PATCH V3 2/3] RISC-V:Cache Management Operation instructions

2022-05-08 Thread shiyulong
From: yulong 

This commit adds the cbo.clean, cbo.flush, cbo.inval, cbo.zero, prefetch.i,
prefetch.r and prefetch.w instructions.
Changes from the previous two versions:
1. We change the instruction format from "prefetch.i\t%0" to "prefetch.i\t%a0"
for the prefetch.i, cbo.clean, cbo.flush, cbo.inval and cbo.zero entries in
riscv.md.
2. We change the names of the builtins for the cbo.clean, cbo.flush, cbo.inval,
cbo.zero and prefetch.i instructions in riscv-cmo.def.

gcc/ChangeLog:
* config/riscv/predicates.md (imm5_operand): Add a new operand type for 
prefetch instructions.
* config/riscv/riscv-builtins.cc (AVAIL): Add new AVAILs for CMO ISA 
Extensions.
(RISCV_ATYPE_SI): New.
(RISCV_ATYPE_DI): New.
* config/riscv/riscv-ftypes.def (0): New.
(1): New.
* config/riscv/riscv.md (riscv_clean_): New.
(riscv_flush_): New.
(riscv_inval_): New.
(riscv_zero_): New.
(prefetch): New.
(riscv_prefetchi_): New.
* config/riscv/riscv-cmo.def: New file.

---
 gcc/config/riscv/predicates.md |  4 +++
 gcc/config/riscv/riscv-builtins.cc | 16 +
 gcc/config/riscv/riscv-cmo.def | 17 ++
 gcc/config/riscv/riscv-ftypes.def  |  4 +++
 gcc/config/riscv/riscv.md  | 52 ++
 5 files changed, 93 insertions(+)
 create mode 100644 gcc/config/riscv/riscv-cmo.def

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 97cdbdf053b..3fb4d95ab08 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -239,3 +239,7 @@
 (define_predicate "const63_operand"
   (and (match_code "const_int")
(match_test "INTVAL (op) == 63")))
+
+(define_predicate "imm5_operand"
+  (and (match_code "const_int")
+   (match_test "INTVAL (op) < 5")))
\ No newline at end of file
diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 0658f8d3047..795132a0c16 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -87,6 +87,18 @@ struct riscv_builtin_description {
 
 AVAIL (hard_float, TARGET_HARD_FLOAT)
 
+
+AVAIL (clean32, TARGET_ZICBOM && !TARGET_64BIT)
+AVAIL (clean64, TARGET_ZICBOM && TARGET_64BIT)
+AVAIL (flush32, TARGET_ZICBOM && !TARGET_64BIT)
+AVAIL (flush64, TARGET_ZICBOM && TARGET_64BIT)
+AVAIL (inval32, TARGET_ZICBOM && !TARGET_64BIT)
+AVAIL (inval64, TARGET_ZICBOM && TARGET_64BIT)
+AVAIL (zero32,  TARGET_ZICBOZ && !TARGET_64BIT)
+AVAIL (zero64,  TARGET_ZICBOZ && TARGET_64BIT)
+AVAIL (prefetchi32, TARGET_ZICBOP && !TARGET_64BIT)
+AVAIL (prefetchi64, TARGET_ZICBOP && TARGET_64BIT)
+
 /* Construct a riscv_builtin_description from the given arguments.
 
INSN is the name of the associated instruction pattern, without the
@@ -119,6 +131,8 @@ AVAIL (hard_float, TARGET_HARD_FLOAT)
 /* Argument types.  */
 #define RISCV_ATYPE_VOID void_type_node
 #define RISCV_ATYPE_USI unsigned_intSI_type_node
+#define RISCV_ATYPE_SI intSI_type_node
+#define RISCV_ATYPE_DI intDI_type_node
 
 /* RISCV_FTYPE_ATYPESN takes N RISCV_FTYPES-like type codes and lists
their associated RISCV_ATYPEs.  */
@@ -128,6 +142,8 @@ AVAIL (hard_float, TARGET_HARD_FLOAT)
   RISCV_ATYPE_##A, RISCV_ATYPE_##B
 
 static const struct riscv_builtin_description riscv_builtins[] = {
+  #include "riscv-cmo.def"
+
   DIRECT_BUILTIN (frflags, RISCV_USI_FTYPE, hard_float),
   DIRECT_NO_TARGET_BUILTIN (fsflags, RISCV_VOID_FTYPE_USI, hard_float)
 };
diff --git a/gcc/config/riscv/riscv-cmo.def b/gcc/config/riscv/riscv-cmo.def
new file mode 100644
index 000..01cbf6ad64f
--- /dev/null
+++ b/gcc/config/riscv/riscv-cmo.def
@@ -0,0 +1,17 @@
+// zicbom
+RISCV_BUILTIN (clean_si, "zicbom_cbo_clean", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE, clean32),
+RISCV_BUILTIN (clean_di, "zicbom_cbo_clean", RISCV_BUILTIN_DIRECT, 
RISCV_DI_FTYPE, clean64),
+
+RISCV_BUILTIN (flush_si, "zicbom_cbo_flush", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE, flush32),
+RISCV_BUILTIN (flush_di, "zicbom_cbo_flush", RISCV_BUILTIN_DIRECT, 
RISCV_DI_FTYPE, flush64),
+
+RISCV_BUILTIN (inval_si, "zicbom_cbo_inval", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE, inval32),
+RISCV_BUILTIN (inval_di, "zicbom_cbo_inval", RISCV_BUILTIN_DIRECT, 
RISCV_DI_FTYPE, inval64),
+
+// zicboz
+RISCV_BUILTIN (zero_si, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE, zero32),
+RISCV_BUILTIN (zero_di, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT, 
RISCV_DI_FTYPE, zero64),
+
+// zicbop
+RISCV_BUILTIN (prefetchi_si, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE_SI, prefetchi32),
+RISCV_BUILTIN (prefetchi_di, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, 
RISCV_DI_FTYPE_DI, prefetchi64),
\ No newline at end of file
diff --git a/gcc/config/riscv/riscv-ftypes.def 
b/gcc/config/riscv/riscv-ftypes.def
index 2214c496f9b..62421292ce7 100644
--- a/gcc/config/riscv/riscv-ftypes.def
+++ b/gcc/config/riscv/riscv-ftypes.def
@@ -28,3 +28,7 @@ along with GCC; see the file COPYING3.  If n

[PATCH V3 3/3] RISC-V:Cache Management Operation instructions testcases

2022-05-08 Thread shiyulong
From: yulong 

This commit adds testcases for the CMO instructions.
Changes from the previous two versions:
We change the names of the builtins for the cbo.clean, cbo.flush, cbo.inval,
cbo.zero and prefetch.i instructions in the testcases.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbom-1.c: New test.
* gcc.target/riscv/cmo-zicbom-2.c: New test.
* gcc.target/riscv/cmo-zicbop-1.c: New test.
* gcc.target/riscv/cmo-zicbop-2.c: New test.
* gcc.target/riscv/cmo-zicboz-1.c: New test.
* gcc.target/riscv/cmo-zicboz-2.c: New test.

---
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c | 21 +
 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c | 21 +
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c | 23 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 23 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c |  9 
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c |  9 
 6 files changed, 106 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicboz-2.c

diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
new file mode 100644
index 000..e2ba2183511
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicbom -mabi=lp64" } */
+
+int foo1()
+{
+return __builtin_riscv_zicbom_cbo_clean();
+}
+
+int foo2()
+{
+return __builtin_riscv_zicbom_cbo_flush();
+}
+
+int foo3()
+{
+return __builtin_riscv_zicbom_cbo_inval();
+}
+
+/* { dg-final { scan-assembler-times "cbo.clean" 1 } } */
+/* { dg-final { scan-assembler-times "cbo.flush" 1 } } */
+/* { dg-final { scan-assembler-times "cbo.inval" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
new file mode 100644
index 000..a605e8b1bdc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbom-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zicbom -mabi=ilp32" } */
+
+int foo1()
+{
+return __builtin_riscv_zicbom_cbo_clean();
+}
+
+int foo2()
+{
+return __builtin_riscv_zicbom_cbo_flush();
+}
+
+int foo3()
+{
+return __builtin_riscv_zicbom_cbo_inval();
+}
+
+/* { dg-final { scan-assembler-times "cbo.clean" 1 } } */
+/* { dg-final { scan-assembler-times "cbo.flush" 1 } } */
+/* { dg-final { scan-assembler-times "cbo.inval" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
new file mode 100644
index 000..c5d78c1763d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile target { { rv64-*-*}}} */
+/* { dg-options "-march=rv64gc_zicbop -mabi=lp64" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+int foo1()
+{
+  return __builtin_riscv_zicbop_cbo_prefetchi(1);
+}
+
+/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
+/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
new file mode 100644
index 000..6576365b39c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile target { { rv32-*-*}}} */
+/* { dg-options "-march=rv32gc_zicbop -mabi=ilp32" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+int foo1()
+{
+  return __builtin_riscv_zicbop_cbo_prefetchi(1);
+}
+
+/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
+/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */ 
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
new file mode 100644
index 000..96c1674ef2d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicboz -mabi=lp64" } */
+
+int foo1()
+{
+return __builtin_riscv_zicboz_cbo_zero(

Re: [PATCH, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-05-08 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

Thanks for the patch, some comments are inlined.

on 2022/5/9 09:54, HAO CHEN GUI wrote:
> Hi,
>   This patch implements optab f[min/max]_optab by xs[min/max]dp on rs6000.
> Tests show that outputs of xs[min/max]dp are consistent with the standard
> of C99 fmin/max.
> 
>   Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-05-09 Haochen Gui 
> 
> gcc/
>   PR target/103605
>   * rs6000.md (unspec): Add UNSPEC_FMAX and UNSPEC_FMIN.


Nit: one entry for iterator FMINMAX?

>   (fminmax): New.
>   (minmax_op): Likewise.
>   (3): New pattern.  Implemented by UNSPEC_FMAX and
>   UNSPEC_FMIN.
> 
> gcc/testsuite/
>   PR target/103605
>   * gcc.dg/pr103605.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index fdfbc6566a5..8aae3e80bcd 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -158,6 +158,8 @@ (define_c_enum "unspec"
> UNSPEC_HASHCHK
> UNSPEC_XXSPLTIDP_CONST
> UNSPEC_XXSPLTIW_CONST
> +   UNSPEC_FMAX
> +   UNSPEC_FMIN
>])
> 
>  ;;
> @@ -5350,6 +5352,25 @@ (define_insn_and_split "*s3_fpr"
>DONE;
>  })
> 
> +
> +(define_int_iterator FMINMAX [UNSPEC_FMAX UNSPEC_FMIN])
> +
> +(define_int_attr fminmax [(UNSPEC_FMAX "fmax")
> +   (UNSPEC_FMIN "fmin")])
> +
> +(define_int_attr  minmax_op [(UNSPEC_FMAX "max")
> +  (UNSPEC_FMIN "min")])
> +

Can we use the later one for both?

Like f3.

> +(define_insn "3"
> +  [(set (match_operand:SFDF 0 "vsx_register_operand" "=")
> + (unspec:SFDF [(match_operand:SFDF 1 "vsx_register_operand" "")
> +   (match_operand:SFDF 2 "vsx_register_operand" "")]

Nit: both SF and DF are mapped to constraint wa, just hardcode Fv to wa?

> +   FMINMAX))]
> +"TARGET_VSX"
> +"xsdp %x0,%x1,%x2"
> +[(set_attr "type" "fp")]
> +)

Maybe it's good to put a comment before this pattern noting that
xs{min,max}dp can satisfy all required semantics of fmin/fmax.
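
For reference, the C99 behaviour in question can be written as a small model.
This is a sketch under assumptions, not the rs6000 implementation: ref_fmin,
ref_fmax and is_nan are hypothetical names.  The model covers only the NaN
rule (if exactly one argument is a NaN, the other argument is returned) and
leaves the ordering of +0.0 and -0.0 aside.

```c
#include <math.h>   /* only the NAN macro is used; no libm call is made */

/* A NaN is the only value that compares unequal to itself.  */
static int
is_nan (double x)
{
  return x != x;
}

/* Hypothetical reference model of C99 fmin: a single NaN argument is
   treated as missing data, so the other argument is returned.  */
static double
ref_fmin (double a, double b)
{
  if (is_nan (a))
    return b;
  if (is_nan (b))
    return a;
  return a < b ? a : b;
}

/* Hypothetical reference model of C99 fmax, same NaN rule.  */
static double
ref_fmax (double a, double b)
{
  if (is_nan (a))
    return b;
  if (is_nan (b))
    return a;
  return a > b ? a : b;
}
```

A comment on the pattern could state exactly this rule, since a plain
"a < b ? a : b" does not give it.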

PR103605 also exposes another problem on bif __builtin_vsx_xs{min,max}dp, both 
bifs are
expanded into xs{min,max}cdp instead of xs{min,max}dp starting from power9.

IMHO, it's something we want to fix as well, based on the reasons:
  1) bif names have the corresponding mnemonics, users would expect 1-1 mapping 
here.
  2) clang emits xs{min,max}dp all the time, with cpu type power7/8/9/10.
  3) according to uarch info, xs{min,max}cdp use the same units and have the 
same latency,
 no benefits to replace with xs{min,max}cdp.

So I wonder if it would be more clear with:
  1) add new define_insn for xs{min,max}dp
  2) use them for new define_expand of fmin/fmax
  3) use them for bif expansion pattern

BR,
Kewen

> +
>  (define_expand "movcc"
> [(set (match_operand:GPR 0 "gpc_reg_operand")
>(if_then_else:GPR (match_operand 1 "comparison_operator")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103605.c 
> b/gcc/testsuite/gcc.target/powerpc/pr103605.c
> new file mode 100644
> index 000..a40da064742
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103605.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O1 -mvsx" } */
> +/* { dg-final { scan-assembler-times "xsmaxdp" 2 } } */
> +/* { dg-final { scan-assembler-times "xsmindp" 2 } } */
> +
> +#include <math.h>
> +
> +double test1 (double d0, double d1)
> +{
> +  return fmin (d0, d1);
> +}
> +
> +float test2 (float d0, float d1)
> +{
> +  return fmin (d0, d1);
> +}
> +
> +double test3 (double d0, double d1)
> +{
> +  return fmax (d0, d1);
> +}
> +
> +float test4 (float d0, float d1)
> +{
> +  return fmax (d0, d1);
> +}
>


[PATCH v2] Strip of a vector load which is only used partially.

2022-05-08 Thread liuhongt via Gcc-patches
Here's the adjusted patch.
Ok for trunk?

Optimize

  _4 = VEC_PERM_EXPR <_1, _1, { 4, 5, 6, 7, 4, 5, 6, 7 }>;
  _5 = BIT_FIELD_REF <_4, 128, 0>;

to

  _5 = BIT_FIELD_REF <_1, 128, 128>;
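
Why the two forms are equivalent can be checked with a small array model.
This is a sketch under assumptions: a 256-bit vector is modelled as eight
int32_t lanes, and vec_perm_same and bit_field_ref are hypothetical helpers
standing in for the GIMPLE operations, not GCC functions.

```c
#include <stdint.h>
#include <string.h>

enum { NELTS = 8 };   /* eight 32-bit lanes model one 256-bit vector */

/* Model of VEC_PERM_EXPR <v, v, sel>: both inputs are the same vector,
   so selector value sel[i] picks lane sel[i] % NELTS of v.  */
static void
vec_perm_same (const int32_t v[NELTS], const int sel[NELTS],
               int32_t out[NELTS])
{
  for (int i = 0; i < NELTS; i++)
    out[i] = v[sel[i] % NELTS];
}

/* Model of BIT_FIELD_REF <v, size_bits, pos_bits> for lane-aligned
   positions: copy size_bits starting pos_bits into the vector.  */
static void
bit_field_ref (const int32_t v[NELTS], unsigned size_bits,
               unsigned pos_bits, int32_t *out)
{
  memcpy (out, v + pos_bits / 32, size_bits / 8);
}
```

With selector { 4, 5, 6, 7, 4, 5, 6, 7 }, the low 128 bits of the permuted
vector are lanes 4..7 of the source, which is what a reference at bit
position 128 of the source reads directly.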

gcc/ChangeLog:

PR tree-optimization/102583
* tree-ssa-forwprop.cc (simplify_bitfield_ref): Extended to a
contiguous stride in the VEC_PERM_EXPR.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr102583.c: New test.
* gcc.target/i386/pr92645-2.c: Adjust testcase.
* gcc.target/i386/pr92645-3.c: Ditto.
---
 gcc/testsuite/gcc.target/i386/pr102583.c  | 30 
 gcc/testsuite/gcc.target/i386/pr92645-2.c |  4 +-
 gcc/testsuite/gcc.target/i386/pr92645-3.c |  4 +-
 gcc/tree-ssa-forwprop.cc  | 89 ---
 4 files changed, 96 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr102583.c

diff --git a/gcc/testsuite/gcc.target/i386/pr102583.c 
b/gcc/testsuite/gcc.target/i386/pr102583.c
new file mode 100644
index 000..4ef2f296d0c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr102583.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times {(?n)vcvtdq2ps[ \t]+32\(%.*%ymm} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)vcvtdq2ps[ \t]+16\(%.*%xmm} 1 } } */
+/* { dg-final { scan-assembler-times {(?n)vmovq[ \t]+16\(%.*%xmm} 1 { target { 
! ia32 } } } } */
+/* { dg-final { scan-assembler-not {(?n)vpermd[ \t]+.*%zmm} } } */
+
+typedef int v16si __attribute__((vector_size(64)));
+typedef float v8sf __attribute__((vector_size(32)));
+typedef float v4sf __attribute__((vector_size(16)));
+typedef float v2sf __attribute__((vector_size(8)));
+
+v8sf part (v16si *srcp)
+{
+  v16si src = *srcp;
+  return (v8sf) { (float)src[8], (float) src[9], (float)src[10], 
(float)src[11],
+  (float)src[12], (float)src[13], (float)src[14], (float)src[15] };
+}
+
+v4sf part1 (v16si *srcp)
+{
+  v16si src = *srcp;
+  return (v4sf) { (float)src[4], (float)src[5], (float)src[6], (float)src[7] };
+}
+
+v2sf part2 (v16si *srcp)
+{
+  v16si src = *srcp;
+  return (v2sf) { (float)src[4], (float)src[5] };
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr92645-2.c 
b/gcc/testsuite/gcc.target/i386/pr92645-2.c
index d34ed3aa8e5..f0608de938a 100644
--- a/gcc/testsuite/gcc.target/i386/pr92645-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr92645-2.c
@@ -29,6 +29,6 @@ void odd (v2si *dst, v4si *srcp)
 }
 
 /* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "cddce1" } } */
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 3 "cddce1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 3 "cddce1" { xfail *-*-* 
} } } */
 /* Ideally highpart extraction would elide the permutation as well.  */
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 2 "cddce1" { xfail *-*-* 
} } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 2 "cddce1" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr92645-3.c 
b/gcc/testsuite/gcc.target/i386/pr92645-3.c
index 9c08c9fb632..691011195c9 100644
--- a/gcc/testsuite/gcc.target/i386/pr92645-3.c
+++ b/gcc/testsuite/gcc.target/i386/pr92645-3.c
@@ -32,6 +32,6 @@ void odd (v4sf *dst, v8si *srcp)
 /* Four conversions, on the smaller vector type, to not convert excess
elements.  */
 /* { dg-final { scan-tree-dump-times " = \\\(vector\\\(4\\\) float\\\)" 4 
"cddce1" } } */
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 3 "cddce1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 3 "cddce1" { xfail *-*-* 
} } } */
 /* Ideally highpart extraction would elide the VEC_PERM_EXPR as well.  */
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 2 "cddce1" { xfail *-*-* 
} } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 2 "cddce1" } } */
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 484491fa1c5..f91f738895d 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2334,8 +2334,10 @@ simplify_bitfield_ref (gimple_stmt_iterator *gsi)
   gimple *stmt = gsi_stmt (*gsi);
   gimple *def_stmt;
   tree op, op0, op1;
-  tree elem_type;
-  unsigned idx, size;
+  tree elem_type, type;
+  tree p, m, tem;
+  unsigned HOST_WIDE_INT nelts;
+  unsigned idx, size, elem_size;
   enum tree_code code;
 
   op = gimple_assign_rhs1 (stmt);
@@ -2353,42 +2355,75 @@ simplify_bitfield_ref (gimple_stmt_iterator *gsi)
   op1 = TREE_OPERAND (op, 1);
   code = gimple_assign_rhs_code (def_stmt);
   elem_type = TREE_TYPE (TREE_TYPE (op0));
-  if (TREE_TYPE (op) != elem_type)
+  type = TREE_TYPE (op);
+  /* Also handle vector type.
+   i.e.
+   _7 = VEC_PERM_EXPR <_1, _1, { 2, 3, 2, 3 }>;
+   _11 = BIT_FIELD_REF <_7, 64, 0>;
+
+   to
+
+   _11 = BIT_FIELD_REF <_1, 64, 64>.  */
+  if (type != elem_type
+  && (!VECTOR_TYPE_P (type) || TREE_TYPE (type) != elem_type))
 return false;
 
-  size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
+  elem_size = size = TREE_INT_CST_LOW (TYPE_SIZE (type));
   if (maybe_ne (b

[PATCH] [Middle-end] Enhance final_value_replacement_loop to handle bitwise induction.

2022-05-08 Thread liuhongt via Gcc-patches
This patch will enable below optimization:

 {
-  int bit;
-  long long unsigned int _1;
-  long long unsigned int _2;
-
[local count: 46707768]:
-
-   [local count: 1027034057]:
-  # tmp_11 = PHI 
-  # bit_13 = PHI 
-  _1 = 1 << bit_13;
-  _2 = ~_1;
-  tmp_8 = _2 & tmp_11;
-  bit_9 = bit_13 + -3;
-  if (bit_9 != -3(OVF))
-goto ; [95.65%]
-  else
-goto ; [4.35%]
-
-   [local count: 46707768]:
-  return tmp_8;
+  tmp_12 = tmp_6(D) & 7905747460161236406;
+  return tmp_12;

 }
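
The folded constant can be checked by hand with a standalone sketch.  The
loop below has the same shape as foo in pr103462-1.c; the function names
clear_every_third_bit and folded are hypothetical and exist only for this
illustration.

```c
/* Loop form: clear bits 0, 3, 6, ..., 63 one at a time.  */
static unsigned long long
clear_every_third_bit (unsigned long long tmp)
{
  for (int bit = 0; bit < 64; bit += 3)
    tmp &= ~(1ULL << bit);
  return tmp;
}

/* Folded form: a single AND with the complement of all 22 cleared
   bits.  The mask equals 0x6DB6DB6DB6DB6DB6.  */
static unsigned long long
folded (unsigned long long tmp)
{
  return tmp & 7905747460161236406ULL;
}
```

The 22 cleared bits sum to (2^66 - 1) / 7 = 0x9249249249249249, and its
complement in 64 bits is the constant that appears in the dump above.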


Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ok for trunk?

gcc/ChangeLog:

PR middle-end/103462
* match.pd (bitwise_induction_p): New match.
* tree-scalar-evolution.c (gimple_bitwise_induction_p):
Declare.
(analyze_and_compute_bitwise_induction_effect): New function.
(enum bit_op_kind): New enum.
(final_value_replacement_loop): Enhanced to handle bitwise
induction.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103462-1.c: New test.
* gcc.target/i386/pr103462-2.c: New test.
* gcc.target/i386/pr103462-3.c: New test.
* gcc.target/i386/pr103462-4.c: New test.
* gcc.target/i386/pr103462-5.c: New test.
* gcc.target/i386/pr103462-6.c: New test.
---
 gcc/match.pd   |   7 +
 gcc/testsuite/gcc.target/i386/pr103462-1.c | 111 +
 gcc/testsuite/gcc.target/i386/pr103462-2.c |  45 ++
 gcc/testsuite/gcc.target/i386/pr103462-3.c | 111 +
 gcc/testsuite/gcc.target/i386/pr103462-4.c |  46 ++
 gcc/testsuite/gcc.target/i386/pr103462-5.c | 111 +
 gcc/testsuite/gcc.target/i386/pr103462-6.c |  46 ++
 gcc/tree-scalar-evolution.cc   | 178 -
 8 files changed, 654 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-6.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 6d691d302b3..24ff5f9e6a8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7746,3 +7746,10 @@ and,
   == TYPE_UNSIGNED (TREE_TYPE (@3
&& single_use (@4)
&& single_use (@5
+
+(for bit_op (bit_and bit_ior bit_xor)
+ (match (bitwise_induction_p @0 @2 @3)
+   (bit_op:c (nop_convert1? (bit_not2?@0 (convert3? (lshift integer_onep@1 
@2 @3)))
+
+(match (bitwise_induction_p @0 @2 @3)
+  (bit_not (nop_convert1? (bit_xor@0 (convert2? (lshift integer_onep@1 @2)) 
@3
diff --git a/gcc/testsuite/gcc.target/i386/pr103462-1.c 
b/gcc/testsuite/gcc.target/i386/pr103462-1.c
new file mode 100644
index 000..1dc4c2acad6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103462-1.c
@@ -0,0 +1,111 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-sccp-details" } */
+/* { dg-final { scan-tree-dump-times {final value replacement} 12 "sccp" } } */
+
+unsigned long long
+__attribute__((noipa))
+foo (unsigned long long tmp)
+{
+  for (int bit = 0; bit < 64; bit += 3)
+tmp &= ~(1ULL << bit);
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo1 (unsigned long long tmp)
+{
+  for (int bit = 63; bit >= 0; bit -= 3)
+tmp &= ~(1ULL << bit);
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo2 (unsigned long long tmp)
+{
+  for (int bit = 0; bit < 64; bit += 3)
+tmp &= (1ULL << bit);
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo3 (unsigned long long tmp)
+{
+  for (int bit = 63; bit >= 0; bit -= 3)
+tmp &= (1ULL << bit);
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo4 (unsigned long long tmp)
+{
+  for (int bit = 0; bit < 64; bit += 3)
+tmp |= ~(1ULL << bit);
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo5 (unsigned long long tmp)
+{
+  for (int bit = 63; bit >= 0; bit -= 3)
+tmp |= ~(1ULL << bit);
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo6 (unsigned long long tmp)
+{
+  for (int bit = 0; bit < 64; bit += 3)
+tmp |= (1ULL << bit);
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo7 (unsigned long long tmp)
+{
+  for (int bit = 63; bit >= 0; bit -= 3)
+tmp |= (1ULL << bit);
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo8 (unsigned long long tmp)
+{
+  for (int bit = 0; bit < 64; bit += 3)
+tmp ^= ~(1ULL << bit);
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo9 (unsigned long long tmp)
+{
+  for (int bit = 63; bit >= 0; bit -= 3)
+tmp ^= ~(1ULL << bit);
+  return tmp;
+}
+
+unsigned long long
+__attribute__((noipa))
+foo10 (unsigned long long tmp)
+{
+  for (int bit = 0; bit < 64; bit += 3)
+tmp ^= (1ULL << bit);
+  return tmp;
+}
+
+unsigned long long
+__attribut

[PATCH] [i386] Implement permutation with pslldq + psrldq + por when pshufb is not available.

2022-05-08 Thread liuhongt via Gcc-patches
pand/pandn may be used to clear the upper/lower bits of the operands; in
that case the permutation takes 4-5 instructions, which is
still better than scalar code.
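
The 3-instruction case can be modelled on byte arrays.  This is a sketch
under assumptions, not the code in i386-expand.cc: shr_bytes, shl_bytes and
perm_concat are hypothetical helpers, and the model covers only V16QI-style
permutations of the form { k, ..., 15, 16, ..., 16+k-1 }, where no pand/pandn
masking is needed.

```c
#include <stdint.h>

enum { NBYTES = 16 };   /* one 128-bit SSE register as 16 bytes */

/* Model of psrldq: byte i of the result is byte i+n of the source,
   with zeros shifted in at the top.  */
static void
shr_bytes (const uint8_t src[NBYTES], unsigned n, uint8_t dst[NBYTES])
{
  for (unsigned i = 0; i < NBYTES; i++)
    dst[i] = (i + n < NBYTES) ? src[i + n] : 0;
}

/* Model of pslldq: byte i of the result is byte i-n of the source,
   with zeros shifted in at the bottom.  */
static void
shl_bytes (const uint8_t src[NBYTES], unsigned n, uint8_t dst[NBYTES])
{
  for (unsigned i = 0; i < NBYTES; i++)
    dst[i] = (i >= n) ? src[i - n] : 0;
}

/* Permutation { k, ..., 15, 16, ..., 16+k-1 } of the concatenation
   op0:op1, built as (op0 >> k bytes) | (op1 << (16-k) bytes).  */
static void
perm_concat (const uint8_t op0[NBYTES], const uint8_t op1[NBYTES],
             unsigned k, uint8_t out[NBYTES])
{
  uint8_t lo[NBYTES], hi[NBYTES];
  shr_bytes (op0, k, lo);
  shl_bytes (op1, NBYTES - k, hi);
  for (unsigned i = 0; i < NBYTES; i++)
    out[i] = lo[i] | hi[i];   /* the two halves never overlap */
}
```

For k = 3 this matches the patch's arithmetic: start1 = 3, end1 = 15,
start2 = 16, so the left shift is end1 - start1 + 1 - start2 % nelt = 13
elements.  When the run taken from op0 does not end at its last element, or
the run from op1 does not start at its first, the shifted values carry
unwanted bytes and the extra pand/pandn step is what removes them.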

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?


gcc/ChangeLog:

PR target/105354
* config/i386/i386-expand.cc
(expand_vec_perm_pslldq_psrldq_por): New function.
(ix86_expand_vec_perm_const_1): Try
expand_vec_perm_pslldq_psrldq_por for both 3-instruction and
4/5-instruction sequence.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr105354-1.c: New test.
* gcc.target/i386/pr105354-2.c: New test.
---
 gcc/config/i386/i386-expand.cc | 109 +
 gcc/testsuite/gcc.target/i386/pr105354-1.c | 130 +
 gcc/testsuite/gcc.target/i386/pr105354-2.c | 110 +
 3 files changed, 349 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105354-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105354-2.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index bc806ffa283..49231e964ba 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -20941,6 +20941,108 @@ expand_vec_perm_vpshufb2_vpermq_even_odd (struct 
expand_vec_perm_d *d)
   return true;
 }
 
+/* Implement permutation with pslldq + psrldq + por when pshufb is not
+   available.  */
+static bool
+expand_vec_perm_pslldq_psrldq_por (struct expand_vec_perm_d *d, bool pandn)
+{
+  unsigned i, nelt = d->nelt;
+  unsigned start1, end1 = -1;
+  machine_mode vmode = d->vmode, imode;
+  int start2 = -1;
+  bool clear_op0, clear_op1;
+  unsigned inner_size;
+  rtx op0, op1, dop1;
+  rtx (*gen_vec_shr) (rtx, rtx, rtx);
+  rtx (*gen_vec_shl) (rtx, rtx, rtx);
+
+  /* pshufb is available under TARGET_SSSE3.  */
+  if (TARGET_SSSE3 || !TARGET_SSE2
+  /* pshufd can be used for V4SI/V2DI under TARGET_SSE2.  */
+  || (vmode != E_V16QImode && vmode != E_V8HImode))
+return false;
+
+  start1 = d->perm[0];
+  for (i = 1; i < nelt; i++)
+{
+  if (d->perm[i] != d->perm[i-1] + 1)
+   {
+ if (start2 == -1)
+   {
+ start2 = d->perm[i];
+ end1 = d->perm[i-1];
+   }
+ else
+   return false;
+   }
+  else if (d->perm[i] >= nelt
+  && start2 == -1)
+   {
+ start2 = d->perm[i];
+ end1 = d->perm[i-1];
+   }
+}
+
+  clear_op0 = end1 != nelt - 1;
+  clear_op1 = start2 % nelt != 0;
+  /* pandn/pand is needed to clear upper/lower bits of op0/op1.  */
+  if (!pandn && (clear_op0 || clear_op1))
+return false;
+
+  if (d->testing_p)
+return true;
+
+  gen_vec_shr = vmode == E_V16QImode ? gen_vec_shr_v16qi : gen_vec_shr_v8hi;
+  gen_vec_shl = vmode == E_V16QImode ? gen_vec_shl_v16qi : gen_vec_shl_v8hi;
+  imode = GET_MODE_INNER (vmode);
+  inner_size = GET_MODE_BITSIZE (imode);
+  op0 = gen_reg_rtx (vmode);
+  op1 = gen_reg_rtx (vmode);
+
+  if (start1)
+emit_insn (gen_vec_shr (op0, d->op0, GEN_INT (start1 * inner_size)));
+  else
+emit_move_insn (op0, d->op0);
+
+  dop1 = d->op1;
+  if (d->one_operand_p)
+dop1 = d->op0;
+
+  int shl_offset = end1 - start1 + 1 - start2 % nelt;
+  if (shl_offset)
+emit_insn (gen_vec_shl (op1, dop1, GEN_INT (shl_offset * inner_size)));
+  else
+emit_move_insn (op1, dop1);
+
+  /* Clear lower/upper bits for op0/op1.  */
+  if (clear_op0 || clear_op1)
+{
+  rtx vec[16];
+  rtx const_vec;
+  rtx clear;
+  for (i = 0; i != nelt; i++)
+   {
+ if (i < (end1 - start1 + 1))
+   vec[i] = gen_int_mode ((HOST_WIDE_INT_1U << inner_size) - 1, imode);
+ else
+   vec[i] = CONST0_RTX (imode);
+   }
+  const_vec = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, vec));
+  const_vec = validize_mem (force_const_mem (vmode, const_vec));
+  clear = force_reg (vmode, const_vec);
+
+  if (clear_op0)
+   emit_move_insn (op0, gen_rtx_AND (vmode, op0, clear));
+  if (clear_op1)
+   emit_move_insn (op1, gen_rtx_AND (vmode,
+ gen_rtx_NOT (vmode, clear),
+ op1));
+}
+
+  emit_move_insn (d->target, gen_rtx_IOR (vmode, op0, op1));
+  return true;
+}
+
 /* A subroutine of expand_vec_perm_even_odd_1.  Implement extract-even
    and extract-odd permutations of two V8QI, V8HI, V16QI, V16HI or V32QI
    operands with two "and" and "pack" or two "shift" and "pack" insns.
@@ -21853,6 +21955,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_pshufb2 (d))
     return true;
 
+  if (expand_vec_perm_pslldq_psrldq_por (d, false))
+    return true;
+
   if (expand_vec_perm_interleave3 (d))
     return true;
 
@@ -21891,6 +21996,10 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_even_odd (d))
     return true;
 
+  /* Generate four or

Re: [PATCH] [i386] Implement permutation with pslldq + psrldq + por when pshufb is not available.

2022-05-08 Thread Hongtao Liu via Gcc-patches
On Mon, May 9, 2022 at 1:22 PM liuhongt via Gcc-patches
 wrote:
>
> pand/pandn may be used to clear upper/lower bits of the operands, in
> that case there will be 4-5 instructions for permutation, and it's
> still better than scalar codes.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
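
For readers following the thread, the shift/mask/or idea described above can be modelled in plain C. This is only a rough sketch under stated assumptions, not the code the patch emits: it handles byte elements (the V16QI case) only, the names shift_right_bytes, shift_left_bytes and perm_by_shifts are hypothetical, and it assumes the left-shift amount is non-negative.

```c
#include <assert.h>
#include <stdint.h>

enum { NELT = 16 };  /* element count of a V16QI vector */

/* Model of psrldq: bytes move toward index 0, zeros fill the top.  */
static void
shift_right_bytes (uint8_t *dst, const uint8_t *src, unsigned n)
{
  for (unsigned i = 0; i < NELT; i++)
    dst[i] = (i + n < NELT) ? src[i + n] : 0;
}

/* Model of pslldq: bytes move toward index NELT - 1, zeros fill the bottom.  */
static void
shift_left_bytes (uint8_t *dst, const uint8_t *src, unsigned n)
{
  for (unsigned i = 0; i < NELT; i++)
    dst[i] = (i >= n) ? src[i - n] : 0;
}

/* Hypothetical helper: select a[start1..end1] followed by consecutive
   elements of b starting at index start2 % NELT.  */
static void
perm_by_shifts (uint8_t *out, const uint8_t *a, const uint8_t *b,
                unsigned start1, unsigned end1, unsigned start2)
{
  uint8_t lo[NELT], hi[NELT];
  unsigned len1 = end1 - start1 + 1;
  int shl_offset = (int) len1 - (int) (start2 % NELT);

  /* Assumption of this sketch: only the left-shift case is modelled.  */
  assert (shl_offset >= 0);

  shift_right_bytes (lo, a, start1);
  shift_left_bytes (hi, b, (unsigned) shl_offset);

  for (unsigned i = 0; i < NELT; i++)
    {
      /* pand keeps the first len1 bytes of lo; pandn drops them from hi.  */
      uint8_t mask = i < len1 ? 0xff : 0x00;
      out[i] = (uint8_t) ((lo[i] & mask) | (hi[i] & ~mask));
    }
}
```

With start1 = 4, end1 = 15 and start2 = 16 the mask changes nothing, since both shifts already leave zeros where the other operand lands; that corresponds to the 3-instruction sequence. With start1 = 2, end1 = 9 and start2 = 19 both shifted operands carry stray bytes, so the mask is required, which corresponds to the 4/5-instruction sequence.
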
>
>
> gcc/ChangeLog:
>
> PR target/105354
> * config/i386/i386-expand.cc
> (expand_vec_perm_pslldq_psrldq_por): New function.
> (ix86_expand_vec_perm_const_1): Try
> expand_vec_perm_pslldq_psrldq_por for both 3-instruction and
> 4/5-instruction sequence.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr105354-1.c: New test.
> * gcc.target/i386/pr105354-2.c: New test.
> ---
>  gcc/config/i386/i386-expand.cc | 109 +
>  gcc/testsuite/gcc.target/i386/pr105354-1.c | 130 +
>  gcc/testsuite/gcc.target/i386/pr105354-2.c | 110 +
>  3 files changed, 349 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105354-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr105354-2.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index bc806ffa283..49231e964ba 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -20941,6 +20941,108 @@ expand_vec_perm_vpshufb2_vpermq_even_odd (struct expand_vec_perm_d *d)
>    return true;
>  }
>
> +/* Implement permutation with pslldq + psrldq + por when pshufb is not
> +   available.  */
> +static bool
> +expand_vec_perm_pslldq_psrldq_por (struct expand_vec_perm_d *d, bool pandn)
> +{
> +  unsigned i, nelt = d->nelt;
> +  unsigned start1, end1 = -1;
> +  machine_mode vmode = d->vmode, imode;
> +  int start2 = -1;
> +  bool clear_op0, clear_op1;
> +  unsigned inner_size;
> +  rtx op0, op1, dop1;
> +  rtx (*gen_vec_shr) (rtx, rtx, rtx);
> +  rtx (*gen_vec_shl) (rtx, rtx, rtx);
> +
> +  /* pshufb is available under TARGET_SSSE3.  */
> +  if (TARGET_SSSE3 || !TARGET_SSE2
> +      /* pshufd can be used for V4SI/V2DI under TARGET_SSE2.  */
> +      || (vmode != E_V16QImode && vmode != E_V8HImode))
> +    return false;
> +
> +  start1 = d->perm[0];
> +  for (i = 1; i < nelt; i++)
> +    {
> +      if (d->perm[i] != d->perm[i-1] + 1)
> +        {
> +          if (start2 == -1)
> +            {
> +              start2 = d->perm[i];
> +              end1 = d->perm[i-1];
> +            }
> +          else
> +            return false;
> +        }
> +      else if (d->perm[i] >= nelt
> +               && start2 == -1)
> +        {
> +          start2 = d->perm[i];
> +          end1 = d->perm[i-1];
> +        }
> +    }
> +
> +  clear_op0 = end1 != nelt - 1;
> +  clear_op1 = start2 % nelt != 0;
> +  /* pandn/pand is needed to clear upper/lower bits of op0/op1.  */
> +  if (!pandn && (clear_op0 || clear_op1))
> +    return false;
> +
> +  if (d->testing_p)
> +    return true;
> +
> +  gen_vec_shr = vmode == E_V16QImode ? gen_vec_shr_v16qi : gen_vec_shr_v8hi;
> +  gen_vec_shl = vmode == E_V16QImode ? gen_vec_shl_v16qi : gen_vec_shl_v8hi;
> +  imode = GET_MODE_INNER (vmode);
> +  inner_size = GET_MODE_BITSIZE (imode);
> +  op0 = gen_reg_rtx (vmode);
> +  op1 = gen_reg_rtx (vmode);
> +
> +  if (start1)
> +    emit_insn (gen_vec_shr (op0, d->op0, GEN_INT (start1 * inner_size)));
> +  else
> +    emit_move_insn (op0, d->op0);
> +
> +  dop1 = d->op1;
> +  if (d->one_operand_p)
> +    dop1 = d->op0;
> +
> +  int shl_offset = end1 - start1 + 1 - start2 % nelt;
> +  if (shl_offset)
> +    emit_insn (gen_vec_shl (op1, dop1, GEN_INT (shl_offset * inner_size)));
> +  else
> +    emit_move_insn (op1, dop1);
> +
> +  /* Clear lower/upper bits for op0/op1.  */
> +  if (clear_op0 || clear_op1)
> +    {
> +      rtx vec[16];
> +      rtx const_vec;
> +      rtx clear;
> +      for (i = 0; i != nelt; i++)
> +        {
> +          if (i < (end1 - start1 + 1))
> +            vec[i] = gen_int_mode ((HOST_WIDE_INT_1U << inner_size) - 1, imode);
> +          else
> +            vec[i] = CONST0_RTX (imode);
> +        }
> +      const_vec = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, vec));
> +      const_vec = validize_mem (force_const_mem (vmode, const_vec));
> +      clear = force_reg (vmode, const_vec);
> +
> +      if (clear_op0)
> +        emit_move_insn (op0, gen_rtx_AND (vmode, op0, clear));
> +      if (clear_op1)
> +        emit_move_insn (op1, gen_rtx_AND (vmode,
> +                                          gen_rtx_NOT (vmode, clear),
> +                                          op1));
> +    }
> +
> +  emit_move_insn (d->target, gen_rtx_IOR (vmode, op0, op1));
> +  return true;
> +}
> +
>  /* A subroutine of expand_vec_perm_even_odd_1.  Implement extract-even
>     and extract-odd permutations of two V8QI, V8HI, V16QI, V16HI or V32QI
>     operands with two "and" and "pack" or two "shift" and "pack" insns.
> @@ -21853,6 +21955,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
>  

[PATCH] Optimize vec_setv8{hi,hf}_0 + pmovzxbq to pmovzxbq.

2022-05-08 Thread liuhongt via Gcc-patches
Clean up of 16-bit uppers is not needed for pmovzxbq/pmovsxbq.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
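
The claim above can be illustrated with a small scalar model in C. This is a sketch under stated assumptions rather than anything from the patch itself: model_pmovzxbq, model_vec_set_v8hi_0 and upper_bytes_are_ignored are hypothetical names, and the model covers only the zero-extending form. Because the extension reads just the low two bytes of its source, whatever sits above the low 16 bits cannot affect the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of pmovzxbq: reads only bytes 0 and 1 of the source.  */
static void
model_pmovzxbq (uint64_t dst[2], const uint8_t src[16])
{
  dst[0] = src[0];
  dst[1] = src[1];
}

/* Model of inserting a 16-bit value into element 0 of a zeroed V8HI
   (x86 is little-endian, so the low byte lands at index 0).  */
static void
model_vec_set_v8hi_0 (uint8_t dst[16], uint16_t value)
{
  memset (dst, 0, 16);
  dst[0] = (uint8_t) (value & 0xff);
  dst[1] = (uint8_t) (value >> 8);
}

/* Hypothetical check: does the result depend on bytes 2..15?  */
static int
upper_bytes_are_ignored (uint16_t value)
{
  uint8_t clean[16], dirty[16];
  uint64_t from_clean[2], from_dirty[2];

  model_vec_set_v8hi_0 (clean, value);
  memcpy (dirty, clean, 16);
  memset (dirty + 2, 0xa5, 14);  /* garbage above the low 16 bits */

  model_pmovzxbq (from_clean, clean);
  model_pmovzxbq (from_dirty, dirty);
  return from_clean[0] == from_dirty[0] && from_clean[1] == from_dirty[1];
}
```

In this model upper_bytes_are_ignored returns nonzero for any input, which is why the zeroing step in front of the extension is redundant.
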

gcc/ChangeLog:

PR target/105072
* config/i386/sse.md (*sse4_1_v2qiv2di2_1):
New define_insn.
(*sse4_1_zero_extendv2qiv2di2_2): New pre_reload
define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr105072.c: New test.
---
 gcc/config/i386/sse.md   | 45 +---
 gcc/testsuite/gcc.target/i386/pr105072.c | 24 +
 2 files changed, 65 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105072.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7b791def542..47f8b18b82e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -22297,15 +22297,52 @@ (define_insn "sse4_1_v2qiv2di2"
(set_attr "prefix" "orig,orig,maybe_evex")
(set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_v2qiv2di2_1"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+        (any_extend:V2DI
+         (match_operand:V2QI 1 "memory_operand" "m")))]
+  "TARGET_SSE4_1 && "
+  "%vpmovbq\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_expand "v2qiv2di2"
   [(set (match_operand:V2DI 0 "register_operand")
         (any_extend:V2DI
-         (match_operand:V2QI 1 "register_operand")))]
+         (match_operand:V2QI 1 "nonimmediate_operand")))]
   "TARGET_SSE4_1"
 {
-  rtx op1 = force_reg (V2QImode, operands[1]);
-  op1 = lowpart_subreg (V16QImode, op1, V2QImode);
-  emit_insn (gen_sse4_1_v2qiv2di2 (operands[0], op1));
+  if (!MEM_P (operands[1]))
+    {
+      rtx op1 = force_reg (V2QImode, operands[1]);
+      op1 = lowpart_subreg (V16QImode, op1, V2QImode);
+      emit_insn (gen_sse4_1_v2qiv2di2 (operands[0], op1));
+      DONE;
+    }
+})
+
+(define_insn_and_split "*sse4_1_zero_extendv2qiv2di2_2"
+  [(set (match_operand:V2DI 0 "register_operand")
+        (zero_extend:V2DI
+         (vec_select:V2QI
+          (subreg:V16QI
+           (vec_merge:V8_128
+            (vec_duplicate:V8_128
+             (match_operand: 1 "nonimmediate_operand"))
+            (match_operand:V8_128 2 "const0_operand")
+            (const_int 1)) 0)
+          (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_SSE4_1 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (!MEM_P (operands[1]))
+    operands[1] = force_reg (mode, operands[1]);
+  operands[1] = lowpart_subreg (V2QImode, operands[1], mode);
+  emit_insn (gen_zero_extendv2qiv2di2 (operands[0], operands[1]));
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/pr105072.c b/gcc/testsuite/gcc.target/i386/pr105072.c
new file mode 100644
index 000..54e229731b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105072.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-msse4.1 -O2" } */
+/* { dg-final { scan-assembler-times {(?n)pmovzxbq[ \t]+} "4" } } */
+/* { dg-final { scan-assembler-not {(?n)pinsrw[ \t]+} } } */
+
+#include
+
+__m128i foo (void *p){
+  return _mm_cvtepu8_epi64(_mm_loadu_si16(p));
+}
+
+__m128i foo2 (short a){
+  return _mm_cvtepu8_epi64(_mm_set_epi16(0, 0, 0, 0, 0, 0, 0, a));
+}
+
+__m128i
+foo3 (void *p){
+  return _mm_cvtepu8_epi64((__m128i)__extension__(__m128h) {*(_Float16 const*)p, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f});
+}
+
+__m128i
+foo4 (_Float16 a){
+  return _mm_cvtepu8_epi64((__m128i)__extension__(__m128h) {a, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f});
+}
-- 
2.18.1



Re: [PATCH] Optimize vec_setv8{hi,hf}_0 + pmovzxbq to pmovzxbq.

2022-05-08 Thread Hongtao Liu via Gcc-patches
On Mon, May 9, 2022 at 2:43 PM liuhongt via Gcc-patches
 wrote:
>
> Clean up of 16-bit uppers is not needed for pmovzxbq/pmovsxbq.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?


-- 
BR,
Hongtao