RE: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor

2024-01-01 Thread Li, Pan2
Thanks Jeff for the confirmation and suggestions. It looks like this is not
merely a corner case: it affects -fno-signed-zeros in general.
Consider the two sample functions below, built with the options
"-march=rv64gcv -mabi=lp64d -O2 -fno-signed-zeros".

void
__attribute__ ((noinline))
test_float_zero_assign_0 (float *a)
{
  *a = +0.0;
}

void
__attribute__ ((noinline))
test_float_zero_assign_1 (float *a)
{
  *a = -0.0;
}

For the first one (i.e. float +0.0) we get the following RTL:
(insn 6 3 0 2 (set (mem:SF (reg/v/f:DI 134 [ a ]) [1 *a_2(D)+0 S4 A32])
        (const_double:SF 0.0 [0x0.0p+0])) "test.c":14:6 -1
     (nil))

But for the second one (i.e. float -0.0 with -fno-signed-zeros) we get the
RTL below, which loads the value from the constant pool (.LC0), whereas we
expect a const_double -0.0 here.
(insn 6 3 7 2 (set (reg:DI 135)
        (high:DI (symbol_ref/u:DI ("*.LC0") [flags 0x82]))) "test.c":21:6 -1
     (nil))
(insn 7 6 8 2 (set (reg:SF 136)
        (mem/u/c:SF (lo_sum:DI (reg:DI 135)
                (symbol_ref/u:DI ("*.LC0") [flags 0x82])) [0  S4 A32])) "test.c":21:6 -1
     (nil))
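As background for the difference above, here is a small C++ sketch (not part of any patch; it assumes IEEE-754 single precision) that exposes the raw bits of the two constants. +0.0f is all zeros, so a backend can take it from the zero register, whereas -0.0f has only the sign bit set, which is presumably why it is forced into the constant pool unless the RISC-V predicates learn to treat it as zero under -fno-signed-zeros.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of an IEEE-754 single-precision float as a
// 32-bit unsigned integer (memcpy avoids strict-aliasing problems).
inline std::uint32_t float_bits(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// +0.0f has every bit clear, so a target can take it from a zero register.
// -0.0f has only the sign bit set, so it is a distinct, non-zero bit pattern.
inline bool is_all_zero_bits(float f) { return float_bits(f) == 0u; }
```

For instance, float_bits(-0.0f) evaluates to 0x80000000, while float_bits(+0.0f) is 0.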

I will try to fix it in V3.

Pan

-Original Message-
From: Jeff Law  
Sent: Saturday, December 30, 2023 11:14 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
kito.ch...@gmail.com; richard.guent...@gmail.com
Subject: Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with 
variable factor



On 12/28/23 22:56, Li, Pan2 wrote:
> Thanks Jeff.
> 
> I think I have located where aarch64 performs the trick here.
> 
> 1. In the .final we have rtl like
> 
> (insn:TI 6 8 29 (set (reg:SF 32 v0)
>  (const_double:SF -0.0 [-0x0.0p+0])) 
> "/home/box/panli/gnu-toolchain/gcc/gcc/testsuite/gcc.dg/pr30957-1.c":31:7 79 
> {*movsf_aarch64}
>   (nil))
> 
> 2. The movsf_aarch64 insn comes from the aarch64.md pattern below. That
> is, it will generate movi\t%0.2s, #0 if aarch64_reg_or_fp_zero is true.
> 
> (define_insn "*mov<mode>_aarch64"
>   [(set (match_operand:SFD 0 "nonimmediate_operand")
>         (match_operand:SFD 1 "general_operand"))]
>   "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
>     || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
>   {@ [ cons: =0 , 1   ; attrs: type , arch  ]
>      [ w        , Y   ; neon_move   , simd  ] movi\t%0.2s, #0
> 
> 3. Then aarch64_float_const_zero_rtx_p comes into play: for the -0.0 input
> RTL it returns true at line 10873 because -fno-signed-zeros is given.
> 
> 10863 bool
> 10864 aarch64_float_const_zero_rtx_p (rtx x)
> 10865 {
> 10866   /* 0.0 in Decimal Floating Point cannot be represented by #0 or
> 10867      zr as our callers expect, so no need to check the actual
> 10868      value if X is of Decimal Floating Point type.  */
> 10869   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_DECIMAL_FLOAT)
> 10870     return false;
> 10871
> 10872   if (REAL_VALUE_MINUS_ZERO (*CONST_DOUBLE_REAL_VALUE (x)))
> 10873     return !HONOR_SIGNED_ZEROS (GET_MODE (x));
> 10874   return real_equal (CONST_DOUBLE_REAL_VALUE (x), &dconst0);
> 10875 }
> 
> I think that explains why we get +0.0 on aarch64 here.
Yup.  Thanks a ton for diving into this.  So I think that points us to 
the right fix, specifically we should be turning -0.0 into 0.0 when 
!HONOR_SIGNED_ZEROS rather than xfailing the test.

I think we'd need to adjust reg_or_0_operand and riscv_output_move, 
probably the G constraint as well.   We might also need to adjust 
move_operand and perhaps riscv_legitimize_move.
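A minimal sketch of the check Jeff describes, written as a self-contained C++ model of the aarch64_float_const_zero_rtx_p logic quoted above. fp_const_zero_p and its honor_signed_zeros flag are hypothetical stand-ins (for a CONST_DOUBLE test and HONOR_SIGNED_ZEROS respectively), not existing GCC functions; the point is only that reg_or_0_operand, the G constraint and riscv_output_move would presumably need to agree on such a test.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model: a floating-point constant may be emitted as "zero"
// if it is +0.0, or if it is -0.0 and signed zeros need not be honored
// (-fno-signed-zeros).  honor_signed_zeros stands in for HONOR_SIGNED_ZEROS.
inline bool fp_const_zero_p(double value, bool honor_signed_zeros) {
  if (value != 0.0)
    return false;               // Neither +0.0 nor -0.0 (NaN also lands here).
  if (std::signbit(value))
    return !honor_signed_zeros; // -0.0 counts as zero only if signs don't matter.
  return true;                  // +0.0 is always zero.
}
```

Under this model, -0.0 is accepted as a zero constant only when honor_signed_zeros is false, mirroring the -fno-signed-zeros case that lets aarch64 emit movi.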

jeff


Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2024-01-01 Thread waffl3x
I've been at this for a while, and I'm not sure what the proper way to
fix this is.

```
struct S;

struct B {
  void f(this S&) {}
  void g() {}
};

struct S : B {
  using B::f;
  using B::g;
  void f() {}
  void g(this S&) {}
};

int main()
{
  S s{};
  s.f();
  s.g();
}
```

In short, the call to f is ambiguous, but the call to g is not. I
already know where the problem is, but since I share this code in
places that don't know about whether a function was introduced by a
using declaration (cand_parms_match), I don't want to rely on that to
solve the problem.

```
  /* An iobj member function's object parameter can't be an unrelated type; if
     the xobj member function's object parameter is an unrelated type, we know
     they must not correspond to each other.  If the iobj member function was
     introduced with a using declaration, then the type of its object parameter
     is still that of the class we are currently adding a member function to,
     so this assumption holds true in that case as well.

     [over.match.funcs.general.4]
     For non-conversion functions that are implicit object member
     functions nominated by a using-declaration in a derived class, the
     function is considered to be a member of the derived class for the purpose
     of defining the type of the implicit object parameter.

     We don't get to bail out yet even if the xobj parameter is by-value, as
     elaborated on below.

     This also implicitly handles xobj parameters of type pointer.  */
  if (DECL_CONTEXT (xobj_fn) != TYPE_MAIN_VARIANT (non_reference (xobj_param)))
    return false;
```

I feel like what we are actually supposed to be doing, to follow the
letter of the standard, is creating a new function entirely, with a
decl_context of the original class, which sounds omega silly and
might bring a new set of problems.

I think I might have come up with an unfortunately fairly convoluted
way to solve this just now, but I don't know if it brings yet another
set of problems. The assumptions I had when I originally implemented
this in add_method bled through when I broke it out into its own
function. At the very least I need to better document how the function
is intended to be used; at worst I'll need to consider whether it makes
sense to reuse this logic if the use cases are subtly different.

I don't think the latter is the case now though, I'm noticing GCC just
has a bug in general with constraints and using declarations.

https://godbolt.org/z/EbGvjfG7E

So it might actually be better to rewrite functions that are
introduced by using declarations; I have a feeling that will
introduce the least pain.

I'm not sure where exactly GCC is deciding that a function introduced
by a using declaration is different from an otherwise corresponding one
declared directly in that class, but I have a feeling on where it is.
Obviously it's in joust, but I'm not sure the object parameters are
actually being compared.

I'll investigate this bug and get back to you, I imagine fixing it is
going to be key to actually implementing the xobj case without hacks.

Finding both these issues has slowed down my next revision, as I noticed
the problem while cleaning up my implementation of CWG2789. I want to
note that I am implementing it as if it specifies corresponding object
arguments, not object arguments of the same type, as we previously
discussed. I believe that to be the right resolution, as there are
really bad edge cases with the current wording.

Alex

On Tuesday, December 26th, 2023 at 9:37 AM, Jason Merrill  
wrote:


> 
> 
> On 12/23/23 02:10, waffl3x wrote:
> 
> > On Friday, December 22nd, 2023 at 10:26 AM, Jason Merrill ja...@redhat.com 
> > wrote:
> > 
> > > On 12/22/23 04:01, waffl3x wrote:
> > > 
> > > > int n = 0;
> > > > auto f = [](this Self){
> > > > static_assert(__is_same (decltype(n), int));
> > > > decltype((n)) a; // { dg-error {is not captured} }
> > > > };
> > > > f();
> > > > 
> > > > Could you clarify if this error being removed was intentional. I do
> > > > recall that Patrick Palka wanted to remove this error in his patch, but
> > > > it seemed to me like you stated it would be incorrect to allow it.
> > > > Since the error is no longer present I assume I am misunderstanding the
> > > > exchange.
> > > > 
> > > > In any case, let me know if I need to modify my test case or if this
> > > > error needs to be added back in.
> > > 
> > > Removing the error was correct under
> > > https://eel.is/c++draft/expr.prim#id.unqual-3
> > > Naming n in that lambda would not refer to a capture by copy, so the
> > > decltype is the same as outside the lambda.
> > 
> > Alright, I've fixed my tests to reflect that.
> > 
> > I've got defaulting assignment operators working. Defaulting equality
> > and comparison operators seemed to work out of the box somehow, so I
> > just have to make some fleshed out tests for those cases.
> > 
> > There can always be more tests, I have a few ideas for what still

Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2024-01-01 Thread waffl3x
That was faster than I expected: the problem is simply that we
aren't implementing [over.match.funcs.general.4] at all. The result of
compparms for the two functions is false, which I believe to be wrong. I
believe we have a few choices here, but whichever we go with, it
will be a bit of an overhaul. I will file a PR on Bugzilla shortly,
as this problem feels somewhat out of the scope of my patch now.

I think what I will do is instead of comparing the xobj parameter to
the DECL_CONTEXT of the xobj function, I will compare it to the type of
the iobj member function's object parameter. If I do it like this, it
will work as expected if we rewrite functions that are introduced with
a using declaration.
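To illustrate the comparison proposed above, here is a toy C++ model. Types are represented as plain strings, and MemberFn, implicit_object_type and object_params_may_correspond are hypothetical names, not GCC internals. The xobj parameter type is compared against the iobj function's implicit object parameter type, which [over.match.funcs.general.4] says is the derived class when the function is nominated by a using-declaration.

```cpp
#include <cassert>
#include <string>

// Hypothetical model, not GCC's representation: a member function records
// the class it was declared in and, if it was brought into another class by
// a using-declaration, the class containing that using-declaration.
struct MemberFn {
  std::string decl_context;  // Class the function was declared in.
  std::string nominated_in;  // Derived class with the using-declaration, or "".
};

// [over.match.funcs.general.4]: an implicit object member function nominated
// by a using-declaration is treated as a member of the derived class when
// determining the type of its implicit object parameter.
inline std::string implicit_object_type(const MemberFn& iobj) {
  return iobj.nominated_in.empty() ? iobj.decl_context : iobj.nominated_in;
}

// Compare the xobj function's object parameter type (references already
// stripped, as non_reference does in the quoted check) with the iobj
// function's implicit object parameter type, rather than with DECL_CONTEXT.
inline bool object_params_may_correspond(const MemberFn& iobj,
                                         const std::string& xobj_param_type) {
  return implicit_object_type(iobj) == xobj_param_type;
}
```

With the earlier example, B::g nominated in S gets an implicit object parameter of type S, so this model treats its object parameter as matching that of S::g(this S&), just as S::f() matches B::f(this S&).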

This might still cause problems; I will look again at how the this pointer
for iobj member functions is determined. Depending on how it is
determined, it might be possible to change the function signature of
iobj member functions without altering their behavior. It would be
incorrect, and would change the meaning of code, if changing the
function signature changed the type of the this pointer.

Anyhow, this is a fairly big change to consider so I won't pretend I
know what the right call is. But the way I've decided to implement
correspondence checking will be consistent with how GCC currently
(incorrectly) treats constraints on iobj member functions introduced
with a using declaration, so I think doing it this way is the right
choice for now.

Some days feel really unproductive when the majority is investigation
and thinking. This was one of them, but at least I'm confident that my
conclusions are correct. Aren't edge cases fun?

Alex

On Monday, January 1st, 2024 at 8:17 AM, waffl3x  wrote:



[PATCH] config-ml.in: Fix multi-os-dir search

2024-01-01 Thread YunQiang Su
When building multilib libraries, CC/CXX etc are set with an option
-B*/lib/, instead of -B/lib/.
This causes trouble in some cases, for example when building a
cross toolchain based on Debian's cross packages:

  If we have the libc6-dev-i386-amd64-cross package installed on
  a non-x86 machine, this package will have its files in
  /usr/x86_64-linux-gnu/lib32.  The following configure will fail
  when building libgcc for i386, complaining that the libc is not
  an i386 one:
 ../configure --enable-multilib --enable-multilib \
--target=x86_64-linux-gnu

Let's insert a "-B*/lib/`CC ${flags} --print-multi-os-directory`"
before "-B*/lib/".
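To make the intended rewriting concrete, here is a small C++ model of the new `-B*/lib/)` case arm. It is a sketch only: it ignores the earlier ML_POPDIR case arms, and matches_B_lib and insert_multi_os_B are hypothetical names. Each argument matching the glob is preceded by a copy with the multi-os directory appended, so the compiler would search, for example, -B/usr/x86_64-linux-gnu/lib/../lib32 before -B/usr/x86_64-linux-gnu/lib/. The value ../lib32 is an assumed example of --print-multi-os-directory output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Model of the shell glob -B*/lib/: starts with "-B", ends with "/lib/".
inline bool matches_B_lib(const std::string& arg) {
  const std::string prefix = "-B", suffix = "/lib/";
  return arg.size() >= prefix.size() + suffix.size()
         && arg.compare(0, prefix.size(), prefix) == 0
         && arg.compare(arg.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Mirrors:  -B*/lib/)  CC_="${CC_}${arg}${multi_osdir} ${arg} " ;;
// Matching arguments get a multi-os copy inserted in front of them;
// every other argument is passed through unchanged.
inline std::vector<std::string>
insert_multi_os_B(const std::vector<std::string>& args,
                  const std::string& multi_osdir) {
  std::vector<std::string> out;
  for (const std::string& arg : args) {
    if (matches_B_lib(arg))
      out.push_back(arg + multi_osdir);  // e.g. -B/usr/x86_64-linux-gnu/lib/../lib32
    out.push_back(arg);
  }
  return out;
}
```

Keeping the original -B*/lib/ after the new option preserves the old search path as a fallback, which matches the order used in the shell hunks.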

This patch is based on the patch used by Debian now.

ChangeLog

* config-ml.in: Insert an -B option with multi-os-dir into
compiler commands used to build libraries.
---
 config-ml.in | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/config-ml.in b/config-ml.in
index 68854a4f16c..645cac822fd 100644
--- a/config-ml.in
+++ b/config-ml.in
@@ -514,7 +514,12 @@ multi-do:
else \
  if [ -d ../$${dir}/$${lib} ]; then \
flags=`echo $$i | sed -e 's/^[^;]*;//' -e 's/@/ -/g'`; \
-   if (cd ../$${dir}/$${lib}; $(MAKE) $(FLAGS_TO_PASS) \
+   libsuffix_=`$${compiler} $${flags} --print-multi-os-directory`; 
\
+   if (cd ../$${dir}/$${lib}; $(MAKE) $(subst \
+   -B$(build_tooldir)/lib/, \
+   -B$(build_tooldir)/lib/$${libsuffix_}/ \
+   -B$(build_tooldir)/lib/, \
+   $(FLAGS_TO_PASS)) \
CFLAGS="$(CFLAGS) $${flags}" \
CCASFLAGS="$(CCASFLAGS) $${flags}" \
FCFLAGS="$(FCFLAGS) $${flags}" \
@@ -768,6 +773,7 @@ if [ -n "${multidirs}" ] && [ -z "${ml_norecursion}" ]; then
# Create a regular expression that matches any string as long
# as ML_POPDIR.
popdir_rx=`echo "${ML_POPDIR}" | sed 's,.,.,g'`
+   multi_osdir=`${CC-gcc} ${flags} --print-multi-os-directory 2>/dev/null`
CC_=
for arg in ${CC}; do
  case $arg in
@@ -775,6 +781,8 @@ if [ -n "${multidirs}" ] && [ -z "${ml_norecursion}" ]; then
CC_="${CC_}"`echo "X${arg}" | sed -n 
"s/X\\(-[BIL]${popdir_rx}\\).*/\\1/p"`/${ml_dir}`echo "X${arg}" | sed -n 
"s/X-[BIL]${popdir_rx}\\(.*\\)/\1/p"`' ' ;;
  "${ML_POPDIR}"/*)
CC_="${CC_}"`echo "X${arg}" | sed -n 
"s/X\\(${popdir_rx}\\).*/\\1/p"`/${ml_dir}`echo "X${arg}" | sed -n 
"s/X${popdir_rx}\\(.*\\)/\\1/p"`' ' ;;
+ -B*/lib/)
+   CC_="${CC_}${arg}${multi_osdir} ${arg} " ;;
  *)
CC_="${CC_}${arg} " ;;
  esac
@@ -787,6 +795,8 @@ if [ -n "${multidirs}" ] && [ -z "${ml_norecursion}" ]; then
CXX_="${CXX_}"`echo "X${arg}" | sed -n 
"s/X\\(-[BIL]${popdir_rx}\\).*/\\1/p"`/${ml_dir}`echo "X${arg}" | sed -n 
"s/X-[BIL]${popdir_rx}\\(.*\\)/\\1/p"`' ' ;;
  "${ML_POPDIR}"/*)
CXX_="${CXX_}"`echo "X${arg}" | sed -n 
"s/X\\(${popdir_rx}\\).*/\\1/p"`/${ml_dir}`echo "X${arg}" | sed -n 
"s/X${popdir_rx}\\(.*\\)/\\1/p"`' ' ;;
+ -B*/lib/)
+   CXX_="${CXX_}${arg}${multi_osdir} ${arg} " ;;
  *)
CXX_="${CXX_}${arg} " ;;
  esac
@@ -799,6 +809,8 @@ if [ -n "${multidirs}" ] && [ -z "${ml_norecursion}" ]; then
F77_="${F77_}"`echo "X${arg}" | sed -n 
"s/X\\(-[BIL]${popdir_rx}\\).*/\\1/p"`/${ml_dir}`echo "X${arg}" | sed -n 
"s/X-[BIL]${popdir_rx}\\(.*\\)/\\1/p"`' ' ;;
  "${ML_POPDIR}"/*)
F77_="${F77_}"`echo "X${arg}" | sed -n 
"s/X\\(${popdir_rx}\\).*/\\1/p"`/${ml_dir}`echo "X${arg}" | sed -n 
"s/X${popdir_rx}\\(.*\\)/\\1/p"`' ' ;;
+ -B*/lib/)
+   F77_="${F77_}${arg}${multi_osdir} ${arg} " ;;
  *)
F77_="${F77_}${arg} " ;;
  esac
@@ -811,6 +823,8 @@ if [ -n "${multidirs}" ] && [ -z "${ml_norecursion}" ]; then
GFORTRAN_="${GFORTRAN_}"`echo "X${arg}" | sed -n 
"s/X\\(-[BIL]${popdir_rx}\\).*/\\1/p"`/${ml_dir}`echo "X${arg}" | sed -n 
"s/X-[BIL]${popdir_rx}\\(.*\\)/\\1/p"`' ' ;;
  "${ML_POPDIR}"/*)
GFORTRAN_="${GFORTRAN_}"`echo "X${arg}" | sed -n 
"s/X\\(${popdir_rx}\\).*/\\1/p"`/${ml_dir}`echo "X${arg}" | sed -n 
"s/X${popdir_rx}\\(.*\\)/\\1/p"`' ' ;;
+ -B*/lib/)
+   GFORTRAN_="${GFORTRAN_}${arg}${multi_osdir} ${arg} " ;;
  *)
GFORTRAN_="${GFORTRAN_}${arg} " ;;
  esac
@@ -823,6 +837,8 @@ if [ -n "${multidirs}" ] && [ -z "${ml_norecursion}" ]; then
GOC_="${GOC_}"`echo "X${arg}" | sed -n 
"s/X\\(-[BIL]${popdir_rx}\\).*/\\1/p"`/${ml_dir}`echo "X${arg}" | sed -n 
"s/X-[BIL]${popdir_rx}\\(.*\\)/\\1/p"`' ' ;;
  "${ML_POPDIR}"/*)
GOC_="${GOC_}"`echo "X${arg}" | sed -n 
"s/X\\(${popdir_rx}\\).*/\\1/p"`/${ml_dir}`echo "X${arg}" | se

Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2024-01-01 Thread waffl3x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113191

I've posted the report here, I ended up doing a bit more investigation,
so the contents might interest you. I'm really starting to think that
we want to do a more thorough re-engineering of how using declarations
are handled, instead of handling it with little hacks like the one
added to more_specialized_fn.

As I note in the report, the addition of xobj member functions really
makes [over.match.funcs.general.4] a lot more relevant, and I don't
think we can get away with not following it more closely anymore. I
know I'm wording myself here as if that passage has existed forever,
but I recognize it might be as recent as C++23 that it was added. I
don't mean to imply anything with how I'm wording it, it's just way
easier to express it this way. Especially since we really could get
away with these kinds of hacks if xobj member functions did not exist.
Unfortunately, the type of the implicit object parameter is suddenly
relevant for cases like this.

Anyway, as I've stated a few times, I'm going to implement my function
that checks correspondence of the object parameter of iobj and xobj
member functions assuming that iobj member functions introduced by
using declarations are handled properly. I think that's the best option
for my patch right now.

Well that investigation took the majority of my day. I'm just glad I'm
certain of what direction to take now.

Alex


On Monday, January 1st, 2024 at 8:34 AM, waffl3x  wrote:



[RFA] [V3] new pass for sign/zero extension elimination

2024-01-01 Thread Jeff Law
I know we're deep into stage3 and about to transition to stage4.  So if 
the consensus is for this to wait, I'll understand.


This is the V3 of the ext-dce patch based on Joern's work from last year.

Changes since V2:
  Handle MINUS
  Minor logic cleanup for SUBREGs in ext_dce_process_sets
  Includes Joern's carry_backpropagate work
  Cleaned up and removed some use handling code for STRICT_LOW_PART
  Moved non-local goto special case out of main use handling, similar to
  how we handle CALL_INSN_FUSAGE
  Use df_simple_dataflow rather than custom dataflow handling
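The switch to df_simple_dataflow replaces hand-written iteration with GCC's generic solver. As a rough illustration of the kind of fixed-point problem involved, here is classic backward liveness over tiny register bitsets. This is a toy sketch, not the ext-dce code or the df API: Block, RegSet and solve_live_in are hypothetical, and ext-dce tracks liveness of bits within registers rather than whole registers.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: each block has USE and DEF sets over a few "registers" plus a
// successor list.  Liveness is a backward problem:
//   live_out(b) = union of live_in(s) over successors s
//   live_in(b)  = use(b) | (live_out(b) & ~def(b))
// iterated until nothing changes.
constexpr std::size_t kRegs = 8;
using RegSet = std::bitset<kRegs>;

struct Block {
  RegSet use, def;
  std::vector<std::size_t> succs;
};

inline std::vector<RegSet> solve_live_in(const std::vector<Block>& cfg) {
  std::vector<RegSet> live_in(cfg.size());
  bool changed = true;
  while (changed) {
    changed = false;
    // Reverse order usually converges faster for a backward problem,
    // but any visiting order reaches the same fixed point.
    for (std::size_t i = cfg.size(); i-- > 0;) {
      RegSet out;
      for (std::size_t s : cfg[i].succs)
        out |= live_in[s];
      RegSet in = cfg[i].use | (out & ~cfg[i].def);
      if (in != live_in[i]) {
        live_in[i] = in;
        changed = true;
      }
    }
  }
  return live_in;
}
```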

There are more cleanups we could be doing here, but the question is
whether we commit what we've got and iterate on the trunk, or defer until
gcc-15, in which case we iterate on a branch or something.




This is still enabled at -O1 and above, but only to get as much testing 
as possible.  Assuming the rest is ACK'd for the trunk, we'll put it into 
the list of optimizations enabled by -O2.
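The question the pass answers for each extension can be sketched in a few lines. This is a simplified model under stated assumptions: 64-bit registers, a full per-bit live mask (the real pass may track liveness at a coarser granularity), and hypothetical helpers low_mask and extension_is_dead. A zero or sign extension only affects bits at or above the source width, so if no later use reads those bits, the extension is dead.

```cpp
#include <cassert>
#include <cstdint>

// Mask covering the low `bits` bits of a 64-bit value.
constexpr std::uint64_t low_mask(unsigned bits) {
  return bits >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << bits) - 1;
}

// A zero or sign extension from `from_bits` only changes bits at or above
// `from_bits`.  If later uses (summarized by `live_bits`, the result bits
// that are ever read) never read those upper bits, the extension is dead
// and could be replaced by a plain copy of the narrow value.
constexpr bool extension_is_dead(unsigned from_bits, std::uint64_t live_bits) {
  return (live_bits & ~low_mask(from_bits)) == 0;
}
```

For example, a value extended from 32 to 64 bits whose uses read only the low 32 bits gives extension_is_dead(32, 0xffffffff) == true.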
	PR target/95650
	PR rtl-optimization/96031
	PR rtl-optimization/104387
	PR rtl-optimization/111384

gcc/
	* Makefile.in (OBJS): Add ext-dce.o.
	* common.opt (ext-dce): Add new option.
	* df-scan.cc (df_get_exit_block_use_set): No longer static.
	* df.h (df_get_exit_block_use_set): Prototype.
	* ext-dce.cc: New file.
	* passes.def: Add ext-dce before combine.
	* tree-pass.h (make_pass_ext_dce): Prototype.

gcc/testsuite
	* gcc.target/riscv/core_bench_list.c: New test.
	* gcc.target/riscv/core_init_matrix.c: New test.
	* gcc.target/riscv/core_list_init.c: New test.
	* gcc.target/riscv/matrix_add_const.c: New test.
	* gcc.target/riscv/mem-extend.c: New test.
	* gcc.target/riscv/pr111384.c: New test.


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 754eceb23bb..3450eb860c6 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1451,6 +1451,7 @@ OBJS = \
 	explow.o \
 	expmed.o \
 	expr.o \
+	ext-dce.o \
 	fibonacci_heap.o \
 	file-prefix-map.o \
 	final.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index d263a959df3..8bbcaad2ec4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3846,4 +3846,8 @@ fipa-ra
 Common Var(flag_ipa_ra) Optimization
 Use caller save register across calls if possible.
 
+fext-dce
+Common Var(flag_ext_dce, 1) Optimization Init(0)
+Perform dead code elimination on zero and sign extensions with special dataflow analysis.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/df-scan.cc b/gcc/df-scan.cc
index 934c9ca2d81..93c0ba4e15c 100644
--- a/gcc/df-scan.cc
+++ b/gcc/df-scan.cc
@@ -78,7 +78,6 @@ static void df_get_eh_block_artificial_uses (bitmap);
 
 static void df_record_entry_block_defs (bitmap);
 static void df_record_exit_block_uses (bitmap);
-static void df_get_exit_block_use_set (bitmap);
 static void df_get_entry_block_def_set (bitmap);
 static void df_grow_ref_info (struct df_ref_info *, unsigned int);
 static void df_ref_chain_delete_du_chain (df_ref);
@@ -3642,7 +3641,7 @@ df_epilogue_uses_p (unsigned int regno)
 
 /* Set the bit for regs that are considered being used at the exit. */
 
-static void
+void
 df_get_exit_block_use_set (bitmap exit_block_uses)
 {
   unsigned int i;
diff --git a/gcc/df.h b/gcc/df.h
index 402657a7076..abcbb097734 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -1091,6 +1091,7 @@ extern bool df_epilogue_uses_p (unsigned int);
 extern void df_set_regs_ever_live (unsigned int, bool);
 extern void df_compute_regs_ever_live (bool);
 extern void df_scan_verify (void);
+extern void df_get_exit_block_use_set (bitmap);
 
 
 /*
diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
new file mode 100644
index 000..379264e0bca
--- /dev/null
+++ b/gcc/ext-dce.cc
@@ -0,0 +1,964 @@
+/* RTL dead zero/sign extension (code) elimination.
+   Copyright (C) 2000-2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "insn-config.h"
+#include "emit-rtl.h"
+#include "recog.h"
+#include "cfganal.h"
+#include "tree-pass.h"
+#include "cfgrtl.h"
+#include "rtl-iter.h"
+#include "df.h"
+#include "print-rtl.h"
+
+/* These should probably move into a C++ class.  */
+static vec livein;
+static bitmap all_blocks;
+static bitmap livenow;
+static bitmap c

Re: Fortran: Use non conflicting file extensions for intermediates [PR81615]

2024-01-01 Thread Harald Anlauf

Hi Thomas!

On 30.12.23 at 12:08, Thomas Koenig wrote:

Replying to myself...



I think this also deserves a mention in changes.html.  Here is something
that I came up with.  OK? Or does anybody have suggestions for a better
wording?



Or maybe this is better:

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 4b83037a..d232f631 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -282,8 +282,14 @@ a work-in-progress.

  

-
-
+Fortran
+
+   With the -save-temps option, preprocessed files
+    with the .fii extension will be generated for
+    free-form source files such as .F90 and
+    .fi for fixed-form files such as .F.
+  
+
  


I slightly prefer this variant.
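As a quick cross-check of the proposed wording, here is a hypothetical helper mapping a preprocessed source suffix to the -save-temps intermediate extension. Only .F90 → .fii and .F → .fi come from the text above; the other suffixes follow gfortran's usual free-form/fixed-form conventions and are an assumption here.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical helper: free-form preprocessed sources produce ".fii",
// fixed-form preprocessed sources produce ".fi".  Suffixes beyond .F90 and
// .F are assumed from gfortran's usual conventions.  Any other suffix
// (e.g. sources not preprocessed by default) yields "".
inline std::string save_temps_extension(const std::string& source_suffix) {
  for (const char* s : {".F90", ".F95", ".F03", ".F08"})
    if (source_suffix == s)
      return ".fii";
  for (const char* s : {".F", ".FOR", ".FTN", ".FPP", ".fpp"})
    if (source_suffix == s)
      return ".fi";
  return "";
}
```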

I wonder whether it would be better to write "generated from" instead of
"generated for".  A native speaker might help here.

While at it: gfortran now accepts "-std=f2023", which implies that
the limit for line length in free-form source has been increased to
10,000 characters, and statements may have up to 1 million characters.
(See Tobias' commit r14-5553-gb9eba3baf54b4f).

I'd consider this as important as the other change... ;-)

Thanks,
Harald


  







Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-01 Thread 钟居哲
This is OK from my side.
But before committing this patch, I think we need this patch first:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html 

I am back at work, so I will take a look at other patches today.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2024-01-01 01:43
To: Jun Sha (Joshua); gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; christoph.muellner; 
juzhe.zhong; Jin Ma; Xianmiao Qu
Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of 
XTheadVector.
 
 
On 12/28/23 21:19, Jun Sha (Joshua) wrote:
> This patch adds th. prefix to all XTheadVector instructions by
> implementing new assembly output functions. We only check the
> prefix is 'v', so that no extra attribute is needed.
> 
> gcc/ChangeLog:
> 
> * config/riscv/riscv-protos.h (riscv_asm_output_opcode):
> New function to add assembler insn code prefix/suffix.
> * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
> * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
> 
> Co-authored-by: Jin Ma 
> Co-authored-by: Xianmiao Qu 
> Co-authored-by: Christoph Müllner 
> ---
>   gcc/config/riscv/riscv-protos.h|  1 +
>   gcc/config/riscv/riscv.cc  | 14 ++
>   gcc/config/riscv/riscv.h   |  4 
>   .../gcc.target/riscv/rvv/xtheadvector/prefix.c | 12 
>   4 files changed, 31 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
> 
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 31049ef7523..5ea54b45703 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -102,6 +102,7 @@ struct riscv_address_info {
>   };
>   
>   /* Routines implemented in riscv.cc.  */
> +extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char 
> *p);
>   extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
>   extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
>   extern int riscv_float_const_rtx_index_for_fli (rtx);
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 0d1cbc5cb5f..ea1d59d9cf2 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -5636,6 +5636,20 @@ riscv_get_v_regno_alignment (machine_mode mode)
> return lmul;
>   }
>   
> +/* Define ASM_OUTPUT_OPCODE to do anything special before
> +   emitting an opcode.  */
> +const char *
> +riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
> +{
> +  /* We need to add th. prefix to all the xtheadvector
> +     instructions here.*/
> +  if (TARGET_XTHEADVECTOR && current_output_insn != NULL_RTX &&
> +      p[0] == 'v')
> +    fputs ("th.", asm_out_file);
> +
> +  return p;
Just a formatting nit. The GNU standards break lines before the 
operator, not after.  So
   if (TARGET_XTHEADVECTOR
       && current_output_insn != NULL
       && p[0] == 'v')
 
Note that current_output_insn is "extern rtx_insn *", so use NULL, not 
NULL_RTX.
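A minimal standalone sketch of what the reviewed hook does, written with the line-break style requested above. It assumes two hypothetical stand-in flags, `target_xtheadvector` and `have_current_insn`, in place of GCC's `TARGET_XTHEADVECTOR` and `current_output_insn != NULL`; `render_opcode` is likewise a hypothetical helper that only captures the output in a string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for TARGET_XTHEADVECTOR and
   current_output_insn != NULL.  */
static bool target_xtheadvector = true;
static bool have_current_insn = true;

/* Print "th." before any opcode starting with 'v', then return P
   unchanged so the caller still prints the opcode itself.  */
static const char *
sketch_asm_output_opcode (FILE *asm_out_file, const char *p)
{
  assert (asm_out_file != NULL && p != NULL);
  if (target_xtheadvector
      && have_current_insn
      && p[0] == 'v')
    fputs ("th.", asm_out_file);
  return p;
}

/* Run the hook followed by the opcode print into a temporary stream
   and copy what was written into BUF (at most N - 1 chars).  */
static void
render_opcode (const char *opcode, char *buf, size_t n)
{
  FILE *f = tmpfile ();
  assert (f != NULL && n > 0);
  fputs (sketch_asm_output_opcode (f, opcode), f);
  rewind (f);
  size_t len = fread (buf, 1, n - 1, f);
  buf[len] = '\0';
  fclose (f);
}
```

Because the hook only prints a prefix and returns `p` untouched, the opcode text itself is still emitted by the caller, which is why no per-pattern attribute is needed.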
 
Neither of these nits require a new version for review.  Just fix them.
 
If Juzhe is fine with this, so am I.  We can refine it if necessary later.
 
jeff
 


[committed] RISC-V: Add crypto machine descriptions

2024-01-01 Thread Feng Wang
Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 
gcc/ChangeLog:

* config/riscv/iterators.md: Add rotate insn name.
* config/riscv/riscv.md: Add new insns name for crypto vector.
* config/riscv/vector-iterators.md: Add new iterators for crypto vector.
* config/riscv/vector.md: Add the corresponding attr for crypto vector.
* config/riscv/vector-crypto.md: New file. The machine descriptions for
crypto vector.
---
 gcc/config/riscv/iterators.md|   4 +-
 gcc/config/riscv/riscv.md|  33 +-
 gcc/config/riscv/vector-crypto.md| 654 +++
 gcc/config/riscv/vector-iterators.md |  36 ++
 gcc/config/riscv/vector.md   |  55 ++-
 5 files changed, 761 insertions(+), 21 deletions(-)
 create mode 100755 gcc/config/riscv/vector-crypto.md

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index ecf033f2fa7..f332fba7031 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -304,7 +304,9 @@
 (umax "maxu")
 (clz "clz")
 (ctz "ctz")
-(popcount "cpop")])
+(popcount "cpop")
+(rotate "rol")
+(rotatert "ror")])
 
 ;; ---
 ;; Int Iterators.
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 68f7203b676..52c5ce30115 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -428,6 +428,34 @@
 ;; vcompressvector compress instruction
 ;; vmov whole vector register move
 ;; vector   unknown vector instruction
+;; 17. Crypto Vector instructions
+;; vandn    crypto vector bitwise and-not instructions
+;; vbrev    crypto vector reverse bits in elements instructions
+;; vbrev8   crypto vector reverse bits in bytes instructions
+;; vrev8    crypto vector reverse bytes instructions
+;; vclz     crypto vector count leading zeros instructions
+;; vctz     crypto vector count trailing zeros instructions
+;; vrol     crypto vector rotate left instructions
+;; vror     crypto vector rotate right instructions
+;; vwsll    crypto vector widening shift left logical instructions
+;; vclmul   crypto vector carry-less multiply - return low half instructions
+;; vclmulh  crypto vector carry-less multiply - return high half instructions
+;; vghsh    crypto vector add-multiply over GHASH Galois-Field instructions
+;; vgmul    crypto vector multiply over GHASH Galois-Field instructions
+;; vaesef   crypto vector AES final-round encryption instructions
+;; vaesem   crypto vector AES middle-round encryption instructions
+;; vaesdf   crypto vector AES final-round decryption instructions
+;; vaesdm   crypto vector AES middle-round decryption instructions
+;; vaeskf1  crypto vector AES-128 Forward KeySchedule generation instructions
+;; vaeskf2  crypto vector AES-256 Forward KeySchedule generation instructions
+;; vaesz    crypto vector AES round zero encryption/decryption instructions
+;; vsha2ms  crypto vector SHA-2 message schedule instructions
+;; vsha2ch  crypto vector SHA-2 two rounds of compression instructions
+;; vsha2cl  crypto vector SHA-2 two rounds of compression instructions
+;; vsm4k    crypto vector SM4 KeyExpansion instructions
+;; vsm4r    crypto vector SM4 Rounds instructions
+;; vsm3me   crypto vector SM3 Message Expansion instructions
+;; vsm3c    crypto vector SM3 Compression instructions
 (define_attr "type"
   "unknown,branch,jump,jalr,ret,call,load,fpload,store,fpstore,
mtc,mfc,const,arith,logical,shift,slt,imul,idiv,move,fmove,fadd,fmul,
@@ -447,7 +475,9 @@
vired,viwred,vfredu,vfredo,vfwredu,vfwredo,
vmalu,vmpop,vmffs,vmsfs,vmiota,vmidx,vimovvx,vimovxv,vfmovvf,vfmovfv,
vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,
-   vgather,vcompress,vmov,vector"
+   vgather,vcompress,vmov,vector,vandn,vbrev,vbrev8,vrev8,vclz,vctz,vcpop,vrol,vror,vwsll,
+   vclmul,vclmulh,vghsh,vgmul,vaesef,vaesem,vaesdf,vaesdm,vaeskf1,vaeskf2,vaesz,
+   vsha2ms,vsha2ch,vsha2cl,vsm4k,vsm4r,vsm3me,vsm3c"
   (cond [(eq_attr "got" "load") (const_string "load")
 
 ;; If a doubleword move uses these expensive instructions,
@@ -3786,6 +3816,7 @@
 (include "thead.md")
 (include "generic-ooo.md")
 (include "vector.md")
+(include "vector-crypto.md")
 (include "zicond.md")
 (include "sfb.md")
 (include "zc.md")
diff --git a/gcc/config/riscv/vector-crypto.md b/gcc/config/riscv/vector-crypto.md
new file mode 100755
index 000..e40b1543954
--- /dev/null
+++ b/gcc/config/riscv/vector-crypto.md
@@ -0,0 +1,654 @@
+;; Machine description for the RISC-V Vector Crypto  extensions.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+
+

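For readers new to these instruction types, here is a scalar sketch of two of the pairs listed above (vrol/vror and vclmul/vclmulh) for a single 64-bit element. It is based on our reading of the Zvbb/Zvbc descriptions (Zvbc defines vclmul/vclmulh for SEW=64); the function names are hypothetical and this is not part of the GCC machine description.

```c
#include <assert.h>
#include <stdint.h>

/* vrol model: rotate left by the low 6 bits of N.  */
static uint64_t
rol64 (uint64_t x, unsigned n)
{
  n &= 63;
  return (x << n) | (x >> ((64 - n) & 63));
}

/* vror model: rotate right by the low 6 bits of N.  */
static uint64_t
ror64 (uint64_t x, unsigned n)
{
  n &= 63;
  return (x >> n) | (x << ((64 - n) & 63));
}

/* vclmul model: low 64 bits of the 128-bit carry-less product.  */
static uint64_t
clmul_lo (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      r ^= a << i;
  return r;
}

/* vclmulh model: high 64 bits of the 128-bit carry-less product.  */
static uint64_t
clmul_hi (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 1; i < 64; i++)
    if ((b >> i) & 1)
      r ^= a >> (64 - i);
  return r;
}
```

Carry-less multiplication XORs the shifted partial products instead of adding them, so, for example, 3 times 3 gives 5 rather than 9.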
Re: [PATCH v4] RISC-V: Change csr_operand into

2024-01-01 Thread juzhe.zh...@rivai.ai
LGTM, assuming you have passed the regression tests.



juzhe.zh...@rivai.ai
 
From: Jun Sha (Joshua)
Date: 2023-12-29 12:06
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4] RISC-V: Change csr_operand into
This patch uses vector_length_operand instead of csr_operand for
vsetvl patterns, so that changes for vector will not affect scalar
patterns using csr_operand in riscv.md.
 
gcc/ChangeLog:
 
* config/riscv/vector.md:
Use vector_length_operand for vsetvl patterns.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
gcc/config/riscv/vector.md | 8 
1 file changed, 4 insertions(+), 4 deletions(-)
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f607d768b26..b5a9055cdc4 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1496,7 +1496,7 @@
(define_insn "@vsetvl"
   [(set (match_operand:P 0 "register_operand" "=r")
- (unspec:P [(match_operand:P 1 "csr_operand" "rK")
+ (unspec:P [(match_operand:P 1 "vector_length_operand" "rK")
   (match_operand 2 "const_int_operand" "i")
   (match_operand 3 "const_int_operand" "i")
   (match_operand 4 "const_int_operand" "i")
@@ -1542,7 +1542,7 @@
;; in vsetvl instruction pattern.
(define_insn "@vsetvl_discard_result"
   [(set (reg:SI VL_REGNUM)
- (unspec:SI [(match_operand:P 0 "csr_operand" "rK")
+ (unspec:SI [(match_operand:P 0 "vector_length_operand" "rK")
(match_operand 1 "const_int_operand" "i")
(match_operand 2 "const_int_operand" "i")] UNSPEC_VSETVL))
(set (reg:SI VTYPE_REGNUM)
@@ -1564,7 +1564,7 @@
;; such pattern can allow us gain benefits of these optimizations.
(define_insn_and_split "@vsetvl_no_side_effects"
   [(set (match_operand:P 0 "register_operand" "=r")
- (unspec:P [(match_operand:P 1 "csr_operand" "rK")
+ (unspec:P [(match_operand:P 1 "vector_length_operand" "rK")
   (match_operand 2 "const_int_operand" "i")
   (match_operand 3 "const_int_operand" "i")
   (match_operand 4 "const_int_operand" "i")
@@ -1608,7 +1608,7 @@
   [(set (match_operand:DI 0 "register_operand")
 (sign_extend:DI
   (subreg:SI
- (unspec:DI [(match_operand:P 1 "csr_operand")
+ (unspec:DI [(match_operand:P 1 "vector_length_operand")
(match_operand 2 "const_int_operand")
(match_operand 3 "const_int_operand")
(match_operand 4 "const_int_operand")
-- 
2.17.1
 
 


Re: [PATCH v4] RISC-V: Change csr_operand into vector_length_operand for vsetvl patterns.

2024-01-01 Thread juzhe.zh...@rivai.ai
LGTM, assuming you have passed the regression tests.



juzhe.zh...@rivai.ai
 
From: Jun Sha (Joshua)
Date: 2023-12-29 12:10
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4] RISC-V: Change csr_operand into vector_length_operand for 
vsetvl patterns.
This patch uses vector_length_operand instead of csr_operand for
vsetvl patterns, so that changes for vector will not affect scalar
patterns using csr_operand in riscv.md.
 
gcc/ChangeLog:
 
* config/riscv/vector.md:
Use vector_length_operand for vsetvl patterns.
 
Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
gcc/config/riscv/vector.md | 8 
1 file changed, 4 insertions(+), 4 deletions(-)
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index f607d768b26..b5a9055cdc4 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1496,7 +1496,7 @@
(define_insn "@vsetvl"
   [(set (match_operand:P 0 "register_operand" "=r")
- (unspec:P [(match_operand:P 1 "csr_operand" "rK")
+ (unspec:P [(match_operand:P 1 "vector_length_operand" "rK")
   (match_operand 2 "const_int_operand" "i")
   (match_operand 3 "const_int_operand" "i")
   (match_operand 4 "const_int_operand" "i")
@@ -1542,7 +1542,7 @@
;; in vsetvl instruction pattern.
(define_insn "@vsetvl_discard_result"
   [(set (reg:SI VL_REGNUM)
- (unspec:SI [(match_operand:P 0 "csr_operand" "rK")
+ (unspec:SI [(match_operand:P 0 "vector_length_operand" "rK")
(match_operand 1 "const_int_operand" "i")
(match_operand 2 "const_int_operand" "i")] UNSPEC_VSETVL))
(set (reg:SI VTYPE_REGNUM)
@@ -1564,7 +1564,7 @@
;; such pattern can allow us gain benefits of these optimizations.
(define_insn_and_split "@vsetvl_no_side_effects"
   [(set (match_operand:P 0 "register_operand" "=r")
- (unspec:P [(match_operand:P 1 "csr_operand" "rK")
+ (unspec:P [(match_operand:P 1 "vector_length_operand" "rK")
   (match_operand 2 "const_int_operand" "i")
   (match_operand 3 "const_int_operand" "i")
   (match_operand 4 "const_int_operand" "i")
@@ -1608,7 +1608,7 @@
   [(set (match_operand:DI 0 "register_operand")
 (sign_extend:DI
   (subreg:SI
- (unspec:DI [(match_operand:P 1 "csr_operand")
+ (unspec:DI [(match_operand:P 1 "vector_length_operand")
(match_operand 2 "const_int_operand")
(match_operand 3 "const_int_operand")
(match_operand 4 "const_int_operand")
-- 
2.17.1
 
 


Re: [pushed][PATCH] LoongArch: Added TLS Le Relax support.

2024-01-01 Thread chenglulu

Pushed to r14-6879, with this issue fixed.


On 2023/12/19 at 8:37 PM, Xi Ruoyao wrote:

On Tue, 2023-12-19 at 19:04 +0800, Lulu Cheng wrote:

+(define_insn "@add_tls_le_relax"
+  [(set (match_operand:P 0 "register_operand" "=r")
+   (unspec:P [(match_operand:P 1 "register_operand" "r")
+  (match_operand:P 2 "register_operand" "r")
+     (match_operand:P 3 "symbolic_operand")]
+   UNSPEC_ADD_TLS_LE_RELAX))]
+  ""
+  "add.\t%0,%1,%2,%le_add_r(%3)"

We need a double "%", i.e. "%%le_add_r"; otherwise we'll hit:

t.c:11:1: internal compiler error: output_operand: operand number missing after %-letter


+  [(set_attr "type" "move")]
+)
+
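Why a single "%" fails can be illustrated with a toy expander that follows the same convention as GCC output templates: "%%" emits a literal '%', while "%" followed by a letter must then be followed by an operand number. This is a hypothetical sketch (expand_template and put_char are not GCC functions, and the operand strings are made up), not output_asm_insn itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Append C to OUT (capacity OUTSZ), keeping it NUL-terminated.
   Returns -1 when there is no room.  */
static int
put_char (char *out, size_t outsz, size_t *len, char c)
{
  if (*len + 1 >= outsz)
    return -1;
  out[(*len)++] = c;
  out[*len] = '\0';
  return 0;
}

/* "%%" -> '%', "%N" -> operand N, "%xN" -> operand N (modifier letter x
   ignored here).  A letter not followed by a digit is the error GCC
   reports as "operand number missing after %-letter".
   Returns 0 on success, -1 on error.  */
static int
expand_template (const char *tmpl, const char *const ops[], int nops,
                 char *out, size_t outsz)
{
  size_t len = 0;
  assert (outsz > 0);
  out[0] = '\0';
  for (const char *p = tmpl; *p != '\0'; p++)
    {
      if (*p != '%')
        {
          if (put_char (out, outsz, &len, *p))
            return -1;
          continue;
        }
      p++;                      /* Character after '%'.  */
      if (*p == '%')
        {
          if (put_char (out, outsz, &len, '%'))
            return -1;
          continue;
        }
      if (isalpha ((unsigned char) *p))
        p++;                    /* Skip the modifier letter.  */
      if (!isdigit ((unsigned char) *p))
        return -1;              /* Operand number missing.  */
      int n = *p - '0';
      while (isdigit ((unsigned char) p[1]))
        n = n * 10 + (*++p - '0');
      if (n >= nops)
        return -1;
      for (const char *s = ops[n]; *s != '\0'; s++)
        if (put_char (out, outsz, &len, *s))
          return -1;
    }
  return 0;
}
```

With "%le_add_r", the expander sees the modifier letter 'l' followed by 'e' instead of a digit and fails; doubling the '%' makes it a literal character.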




[Committed] RISC-V: Declare STMT_VINFO_TYPE (...) as local variable

2024-01-01 Thread Juzhe-Zhong
Committed.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc: Move STMT_VINFO_TYPE (...) to local.

---
 gcc/config/riscv/riscv-vector-costs.cc | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index b41a79429d4..1199b3af067 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -279,10 +279,11 @@ compute_local_live_ranges (
  gimple *stmt = program_point.stmt;
  stmt_vec_info stmt_info = program_point.stmt_info;
  tree lhs = gimple_get_lhs (stmt);
+ enum stmt_vec_info_type type
+   = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
  if (lhs != NULL_TREE && is_gimple_reg (lhs)
  && (!POINTER_TYPE_P (TREE_TYPE (lhs))
- || STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
-  != store_vec_info_type))
+ || type != store_vec_info_type))
{
  biggest_mode = get_biggest_mode (biggest_mode,
   TYPE_MODE (TREE_TYPE (lhs)));
@@ -309,9 +310,7 @@ compute_local_live_ranges (
  if (poly_int_tree_p (var)
  || (is_gimple_val (var)
  && (!POINTER_TYPE_P (TREE_TYPE (var))
- || STMT_VINFO_TYPE (
-  vect_stmt_to_vectorize (stmt_info))
-  != load_vec_info_type)))
+ || type != load_vec_info_type)))
{
  biggest_mode
= get_biggest_mode (biggest_mode,
-- 
2.36.3



[committed] RISC-V: Modify copyright year of vector-crypto.md

2024-01-01 Thread Feng Wang
gcc/ChangeLog:
* config/riscv/vector-crypto.md: Modify copyright year.
---
 gcc/config/riscv/vector-crypto.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/vector-crypto.md b/gcc/config/riscv/vector-crypto.md
index e40b1543954..9625014e45e 100755
--- a/gcc/config/riscv/vector-crypto.md
+++ b/gcc/config/riscv/vector-crypto.md
@@ -1,5 +1,5 @@
 ;; Machine description for the RISC-V Vector Crypto  extensions.
-;; Copyright (C) 2023 Free Software Foundation, Inc.
+;; Copyright (C) 2024 Free Software Foundation, Inc.
 
 ;; This file is part of GCC.
 
-- 
2.17.1



Re:[PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

2024-01-01 Thread joshua
For the vsetvl issue, what we want is to remove "t" and "m"
entirely.

If we add TARGET_XTHEADVECTOR logic in case "p" of
riscv_print_operand, how can we remove "t" and "m"? If I
use "break", assembly like "th.vsetvli zero,a5,e8,m1,t,m"
will be emitted.

if (TARGET_XTHEADVECTOR)
  ...
else if (code == CONST_INT)
  {
    /* Tail && Mask policy.  */
    asm_fprintf (file, "%s", IS_AGNOSTIC (UINTVAL (op)) ? "a" : "u");
  }


--
From: juzhe.zh...@rivai.ai
Sent: Tuesday, January 2, 2024 10:00
To: "cooper.joshua";
"gcc-patches"
Cc: Jim Wilson; palmer;
andrew; "philipp.tomsich";
jeffreyalaw;
"christoph.muellner";
"cooper.joshua";
jinma; "cooper.qu"
Subject: Re: [PATCH v4] RISC-V: Handle differences between XTheadvector and Vector


+   if (TARGET_XTHEADVECTOR)
+    return false;



Move it into:
  if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR)
    {
      bool ok = riscv_vector::expand_block_move (dest, src, length);
      if (ok)
        return true;
    }





(define_special_predicate "vector_length_operand"
   (ior (match_operand 0 "pmode_register_operand")
-   (match_operand 0 "const_csr_operand")))
+  (and (match_test "!TARGET_XTHEADVECTOR || rtx_equal_p (op, const0_rtx)")
+    (match_operand 0 "const_csr_operand"



It is hard to follow. Change it into:


(ior
  1. TARGET_XTHEADVECTOR && rtx_equal_p (op, const0_rtx)
  2. !TARGET_XTHEADVECTOR && const_csr_operand)
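The suggested predicate shape can be modelled in plain C. This sketch assumes, as we understand it, that const_csr_operand accepts a CONST_INT in the range 0..31; the struct and all function names are hypothetical stand-ins for RTL operands and predicates, and the register case corresponds to the existing pmode_register_operand branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an RTL operand: either a Pmode register
   or a CONST_INT holding VALUE.  */
struct operand_model
{
  bool is_reg;
  int64_t value;
};

static struct operand_model
reg_op (void)
{
  struct operand_model op = { true, 0 };
  return op;
}

static struct operand_model
const_op (int64_t value)
{
  struct operand_model op = { false, value };
  return op;
}

/* Assumed behaviour of const_csr_operand: a CONST_INT in [0, 31].  */
static bool
const_csr_operand_p (struct operand_model op)
{
  return !op.is_reg && op.value >= 0 && op.value <= 31;
}

/* (ior pmode_register_operand
        (TARGET_XTHEADVECTOR && op == const0_rtx)
        (!TARGET_XTHEADVECTOR && const_csr_operand))  */
static bool
vector_length_operand_p (struct operand_model op, bool xtheadvector)
{
  if (op.is_reg)
    return true;
  if (xtheadvector)
    return op.value == 0;
  return const_csr_operand_p (op);
}
```

Written this way, each target's accepted constants are visible in a single branch, which is the readability point of the review comment.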


+  if (TARGET_XTHEADVECTOR)
+  {
+   emit_insn (gen_pred_th_whole_mov (mode, dest, src,
+     RVV_VLMAX, GEN_INT(VLMAX)));
+   return true;
+  }



Move it outside legitimize_move.
It should look like:


if (TARGET_XTHEADVECTOR)
  {
    emit_th_move...
    DONE;
  }
else if (riscv_vector::legitimize_move (operands[0], &operands[1]))
  DONE;




vsetvli issue:
I wonder whether we can use ASM_OUTPUT_OPCODE to recognize
"ta,ma"/"ta,mu"/"tu,ma"/"tu,mu" and replace these 4 variants
with "", so that the vsetvli ASM string carries no tail or mask policy.
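A rough sketch of the suffix matching behind this first idea, operating on an already expanded instruction string. Note that, per the GCC internals description of ASM_OUTPUT_OPCODE, the hook receives the opcode in its machine-description form (with operands such as %p4 and %p5 not yet substituted), so a real implementation would have to work differently; strip_vsetvl_policy is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy INSN into OUT (capacity N), dropping a trailing
   ",ta,ma" / ",ta,mu" / ",tu,ma" / ",tu,mu" policy suffix if present.  */
static void
strip_vsetvl_policy (const char *insn, char *out, size_t n)
{
  static const char *const suffixes[]
    = { ",ta,ma", ",ta,mu", ",tu,ma", ",tu,mu" };
  size_t len = strlen (insn);
  assert (n > len);
  memcpy (out, insn, len + 1);
  for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++)
    {
      size_t slen = strlen (suffixes[i]);
      if (len >= slen && strcmp (insn + len - slen, suffixes[i]) == 0)
        {
          out[len - slen] = '\0';  /* Cut the policy suffix.  */
          break;
        }
    }
}
```

Instructions without a policy suffix are copied through unchanged.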


Another approach is to change the vsetvli ASM rule:


"vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5"


if (TARGET_XTHEADVECTOR)
  ...
else if (code == CONST_INT)
  {
    /* Tail && Mask policy.  */
    asm_fprintf (file, "%s", IS_AGNOSTIC (UINTVAL (op)) ? "a" : "u");
  }



in riscv.cc.


The benefit is that we can avoid adding all the th_vsetvl patterns and
invasive code changes in the VSETVL pass.




juzhe.zh...@rivai.ai

 
From: Jun Sha (Joshua)
Date: 2023-12-29 12:21
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of XTheadVector instructions that can be taken directly
from the current RVV 1.0 patterns by simply adding the "th." prefix.
For XTheadVector instructions that have different names but share
the same patterns as RVV 1.0 instructions, we will use the ASM target
hook to rewrite the whole instruction string in follow-up patches.
 
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so that we do
not generate instructions that XTheadVector does not support,
such as vmv1r and vsext.vf2.
 
gcc/ChangeLog:
 
* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-v.cc (legitimize_move):
New expansion.
(get_prefer_tail_policy): Give specific value for tail.
(get_prefer_mask_policy): Give specific value for mask.
(vls_mode_valid_p): Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_

[PATCH] testsuite: Reduce gcc.dg/torture/inline-mem-cpy-1.c by 11 for simulators

2024-01-01 Thread Hans-Peter Nilsson
Tested mmix-knuth-mmixware (where all torture variants of
gcc.dg/torture/inline-mem-cpy-1.c now pass) and native
x86_64-pc-linux-gnu.  Also stepped through the test for native,
with and without RUN_FRACTION defined, to see that it worked as intended.

You may wonder about the "sibling" tests inline-mem-cmp-1.c and
inline-mem-cpy-cmp-1.c.  Well, they FAIL, but not because of
timeouts(!).  To be continued.

Ok to commit?

Or, other suggestions?

-- >8 --
The test inline-mem-cpy-1.c takes 16 minutes at -O0 for the mmix
simulator on a 3.5-year-old laptop and thus always times out, despite
the x 2 timeout factor (i.e. 10 minutes); it times out at all optimization
levels.  For the included file (when run as gcc.dg/memcmp-1.c), the
execution time on the same host is 9 minutes 54 seconds, just
within the 10-minute timeout limit.  It seems pragmatically best to reduce
the torture test by a factor of about 10, but there's no obvious small
set of entities to scale down to get the intended effect, and
splitting up the test into several tests seems a bit too intrusive.

Instead, introduce pseudo-random machinery to skip all but each
RUN_FRACTION:th iteration, defaulting to no change when RUN_FRACTION
isn't defined.  Use 11 for RUN_FRACTION, assuming this prime will lead
to even distribution within nested iterations with loops looking like
(0, 1) : (0, 1).  Do this only for the main loop in
test_driver_memcmp; the "outermost" two levels of iterations.

With this, execution time for -O0 as above is down to 1 minute 32
seconds.
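A quick standalone check of the skip logic, copied from the patch but with RUN_FRACTION fixed at 11 and the starting count passed in rather than drawn from rand (); count_runs is a hypothetical helper. For any start value below RUN_FRACTION, exactly one of every RUN_FRACTION consecutive iterations runs; the random start only shifts which one.

```c
#include <assert.h>

#define RUN_FRACTION 11

static unsigned int iteration_count;

/* Same logic as the patch: run on every RUN_FRACTION:th call.  */
static _Bool
skip_iteration (void)
{
  _Bool run = ++iteration_count == RUN_FRACTION;

  if (run)
    iteration_count = 0;

  return !run;
}

/* Count how many of TOTAL iterations actually run when the counter
   starts at START (must be below RUN_FRACTION).  */
static unsigned int
count_runs (unsigned int total, unsigned int start)
{
  assert (start < RUN_FRACTION);
  iteration_count = start;
  unsigned int runs = 0;
  for (unsigned int i = 0; i < total; i++)
    if (!skip_iteration ())
      runs++;
  return runs;
}
```

For example, 110 iterations starting from a count of 0 run exactly 10 times.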

* gcc.dg/torture/inline-mem-cpy-1.c: Pass -DRUN_FRACTION=11
when testing in a simulator.
* gcc.dg/memcmp-1.c [RUN_FRACTION]: Add machinery to run only
for each RUN_FRACTION:th iteration.
(main): Call initialize_skip_iteration_count.
(test_driver_memcmp): Check SKIP_ITERATION for each iteration.
---
 gcc/testsuite/gcc.dg/memcmp-1.c   | 35 +++
 .../gcc.dg/torture/inline-mem-cpy-1.c |  1 +
 2 files changed, 36 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/memcmp-1.c b/gcc/testsuite/gcc.dg/memcmp-1.c
index ea837ca0f577..13ef5b3380d0 100644
--- a/gcc/testsuite/gcc.dg/memcmp-1.c
+++ b/gcc/testsuite/gcc.dg/memcmp-1.c
@@ -34,6 +34,36 @@ int lib_strncmp(const char *a, const char *b, size_t n)
 
 #define MAX_SZ 600
 
+/* A means to run only a fraction of the tests, beginning at a random
+   count.  */
+#ifdef RUN_FRACTION
+
+#define SKIP_ITERATION skip_iteration ()
+static unsigned int iteration_count;
+
+static _Bool
+skip_iteration (void)
+{
+  _Bool run = ++iteration_count == RUN_FRACTION;
+
+  if (run)
+iteration_count = 0;
+
+  return !run;
+}
+
+static void
+initialize_skip_iteration_count ()
+{
+  srand (2024);
+  iteration_count = (unsigned int) (rand ()) % RUN_FRACTION;
+}
+
+#else
+#define SKIP_ITERATION 0
+#define initialize_skip_iteration_count()
+#endif
+
 #define DEF_RS(ALIGN)  \
 static void test_memcmp_runtime_size_ ## ALIGN (const char *str1, \
const char *str2,  \
@@ -110,6 +140,8 @@ static void test_driver_memcmp (void (test_memcmp)(const char *, const char *, i
   int i,j,l;
   for(l=0;lTZONE)?(test_sz-TZONE):0); diff_pos < test_sz+TZONE; diff_pos++)
 for(zero_pos = ((test_sz>TZONE)?(test_sz-TZONE):0); zero_pos < test_sz+TZONE; zero_pos++)
   {
+   if (SKIP_ITERATION)
+ continue;
memset(buf1, 'A', 2*test_sz);
memset(buf2, 'A', 2*test_sz);
buf2[diff_pos] = 'B';
@@ -490,6 +524,7 @@ DEF_TEST(49,1)
 int
 main(int argc, char **argv)
 {
+  initialize_skip_iteration_count ();
 #ifdef TEST_ALL
 RUN_TEST(1,1)
 RUN_TEST(1,2)
diff --git a/gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c b/gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c
index f4952554dd01..f0752349571b 100644
--- a/gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c
+++ b/gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-finline-stringops=memcpy -save-temps -g0 -fno-lto" } */
+/* { dg-additional-options "-DRUN_FRACTION=11" { target simulator } } */
 /* { dg-timeout-factor 2 } */
 
 #include "../memcmp-1.c"
-- 
2.30.2



[PATCH] RISC-V: Make liveness be aware of rgroup number of LENS[dynamic LMUL]

2024-01-01 Thread Juzhe-Zhong
This patch fixes the following situation:
vl4re16.v   v12,0(a5)
...
vl4re16.v   v16,0(a3)
vs4r.v  v12,0(a5)
...
vl4re16.v   v4,0(a0)
vs4r.v  v16,0(a3)
...
vsetvli a3,zero,e16,m4,ta,ma
...
vmv.v.x v8,t6
vmsgeu.vv   v2,v16,v8
vsub.vv v16,v16,v8
vs4r.v  v16,0(a5)
...
vs4r.v  v4,0(a0)
vmsgeu.vv   v1,v4,v8
...
vsub.vv v4,v4,v8
slli    a6,a4,2
vs4r.v  v4,0(a5)
...
vsub.vv v4,v12,v8
vmsgeu.vv   v3,v12,v8
vs4r.v  v4,0(a5)
...

There are many spills ('vs4r.v').  The root cause is that we don't take the
number of rgroup controls into account when counting vector REG liveness.

The statement _29 = _25->iatom[0]; is transformed into the following vector
statements with 4 different loop_lens (loop_len_74, loop_len_75, loop_len_76, loop_len_77):

  vect__29.11_78 = .MASK_LEN_LOAD (vectp_sb.9_72, 32B, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, loop_len_74, 0);
  vect__29.12_80 = .MASK_LEN_LOAD (vectp_sb.9_79, 32B, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, loop_len_75, 0);
  vect__29.13_82 = .MASK_LEN_LOAD (vectp_sb.9_81, 32B, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, loop_len_76, 0);
  vect__29.14_84 = .MASK_LEN_LOAD (vectp_sb.9_83, 32B, { -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 }, loop_len_77, 0);

The number of these loop_lens equals the LENS number (LOOP_VINFO_LENS (loop_vinfo).length ()).

Count liveness according to LOOP_VINFO_LENS (loop_vinfo).length () to compute 
liveness more accurately:

vsetivli    zero,8,e16,m1,ta,ma
vmsgeu.vi   v19,v14,8
vadd.vi v18,v14,-8
vmsgeu.vi   v17,v1,8
vadd.vi v16,v1,-8
vlm.v   v15,0(a5)
...
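The arithmetic of the patched compute_nregs_for_mode can be modelled in isolation. Here sizes are in bytes and rgroup_size stands in for LOOP_VINFO_LENS (loop_vinfo).length () (1 when there are no lens); model_nregs_for_mode is a hypothetical name. With e16 at LMUL 4 and four loop_lens, each live vector variable now counts as 16 registers instead of 4, which, as we read the patch, is what steers the cost model toward the m1 code shown above.

```c
#include <assert.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Registers needed by one live value of size MODE_SIZE when the loop's
   biggest element is BIGGEST_SIZE, at LMUL, replicated once per rgroup
   length control.  */
static unsigned int
model_nregs_for_mode (unsigned int mode_size, unsigned int biggest_size,
                      int lmul, unsigned int rgroup_size)
{
  assert (mode_size > 0 && biggest_size >= mode_size && lmul > 0);
  unsigned int ratio = biggest_size / mode_size;
  unsigned int per_vector = (unsigned int) lmul / ratio;
  return MAX (per_vector, 1u) * rgroup_size;
}
```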

Tested with no regressions. OK for trunk?

PR target/113112

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (compute_nregs_for_mode): Add rgroup info.
(max_number_of_live_regs): Ditto.
(has_unexpected_spills_p): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113112-5.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc| 34 +++
 .../vect/costmodel/riscv/rvv/pr113112-5.c | 24 +
 2 files changed, 52 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113112-5.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 1199b3af067..12d3b57aff6 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -373,13 +373,17 @@ compute_local_live_ranges (
E.g. If mode = SImode, biggest_mode = DImode, LMUL = M4.
Then return RVVM4SImode (LMUL = 4, element mode = SImode).  */
 static unsigned int
-compute_nregs_for_mode (machine_mode mode, machine_mode biggest_mode, int lmul)
+compute_nregs_for_mode (loop_vec_info loop_vinfo, machine_mode mode,
+   machine_mode biggest_mode, int lmul)
 {
+  unsigned int rgroup_size = LOOP_VINFO_LENS (loop_vinfo).is_empty ()
+  ? 1
+  : LOOP_VINFO_LENS (loop_vinfo).length ();
   unsigned int mode_size = GET_MODE_SIZE (mode).to_constant ();
   unsigned int biggest_size = GET_MODE_SIZE (biggest_mode).to_constant ();
   gcc_assert (biggest_size >= mode_size);
   unsigned int ratio = biggest_size / mode_size;
-  return MAX (lmul / ratio, 1);
+  return MAX (lmul / ratio, 1) * rgroup_size;
 }
 
 /* This function helps to determine whether current LMUL will cause
@@ -393,7 +397,7 @@ compute_nregs_for_mode (machine_mode mode, machine_mode biggest_mode, int lmul)
mode.
  - Third, Return the maximum V_REGs are alive of the loop.  */
 static unsigned int
-max_number_of_live_regs (const basic_block bb,
+max_number_of_live_regs (loop_vec_info loop_vinfo, const basic_block bb,
 const hash_map &live_ranges,
 unsigned int max_point, machine_mode biggest_mode,
 int lmul)
@@ -412,7 +416,7 @@ max_number_of_live_regs (const basic_block bb,
{
  machine_mode mode = TYPE_MODE (TREE_TYPE (var));
  unsigned int nregs
-   = compute_nregs_for_mode (mode, biggest_mode, lmul);
+   = compute_nregs_for_mode (loop_vinfo, mode, biggest_mode, lmul);
  live_vars_vec[i] += nregs;
  if (live_vars_vec[i] > max_nregs)
{
@@ -687,6 +691,24 @@ update_local_live_ranges (
dump_printf_loc (MSG_NOTE, vect_location,
 "Add perm indice %T, start = 0, end = %d\n",
 sel, max_point);
+ if (!LOOP_VINFO_LENS (loop_vinfo).is_empty ()
+ && LOOP_VINFO_LENS (loop_vinfo).length () > 1)
+   {
+ /* If we are vectorizing a permutation when the rgroup number
+> 1, we will need additional mask to shuffle the second
+vector.  */
+ tree mask = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+

Re:Re:[PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

2024-01-01 Thread joshua
But riscv_print_operand() returns void, so we cannot
return instruction strings the way riscv_output_move does.
I think the simplest approach is to add some logic to
the vsetvl patterns.

"TARGET_VECTOR"
  { return TARGET_XTHEADVECTOR ? "vsetvli\t%0,%1,e%2,%m3"
                               : "vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5"; }

Only 3 patterns need to be modified and I don't think
it is too invasive.

--
From: juzhe.zh...@rivai.ai
Sent: Tuesday, January 2, 2024 11:10
To: "cooper.joshua";
"gcc-patches"
Cc: Jim Wilson; palmer;
andrew; "philipp.tomsich";
jeffreyalaw;
"christoph.muellner"
Subject: Re: Re:[PATCH v4] RISC-V: Handle differences between XTheadvector and Vector


Like riscv_output_move


if (TARGET_XTHEADVECTOR)
  return vsetvli without tail policy and mask policy;
else
  return 
juzhe.zh...@rivai.ai

 

Re: [x86_64 PATCH] PR target/112992: Optimize mode for broadcast of constants.

2024-01-01 Thread Hongtao Liu
On Fri, Dec 22, 2023 at 6:25 PM Roger Sayle  wrote:
>
>
> This patch resolves the second part of PR target/112992, building upon
> Hongtao Liu's solution to the first part.
>
> The issue addressed by this patch is that when initializing vectors by
> broadcasting integer constants, the compiler has the flexibility to
> select the most appropriate vector mode to perform the broadcast, as
> long as the resulting vector has an identical bit pattern.  For
> example, the following constants are all equivalent:
> V4SImode {0x01010101, 0x01010101, 0x01010101, 0x01010101 }
> V8HImode {0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101 }
> V16QImode {0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, ... 0x01 }
> So instruction sequences that construct any of these can be used to
> construct the others (with a suitable cast/SUBREG).
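To make the equivalence concrete, here is a small C sketch (not part of the patch) that fills the three vectors listed above and compares their bytes; broadcast_patterns_identical is a hypothetical name used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Fill the three 128-bit constants from the example and check that they
   are bit-identical.  Plain arrays stand in for the V4SI, V8HI and V16QI
   registers.  Because every element is the byte 0x01 repeated, the
   comparison holds regardless of the host's byte order.  */
static int
broadcast_patterns_identical (void)
{
  uint32_t v4si[4];
  uint16_t v8hi[8];
  uint8_t v16qi[16];

  for (int i = 0; i < 4; i++)
    v4si[i] = 0x01010101u;
  for (int i = 0; i < 8; i++)
    v8hi[i] = 0x0101u;
  for (int i = 0; i < 16; i++)
    v16qi[i] = 0x01u;

  return memcmp (v4si, v8hi, sizeof v4si) == 0
         && memcmp (v8hi, v16qi, sizeof v8hi) == 0;
}
```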
>
> On x86_64, it turns out that broadcasts of SImode constants are preferred,
> as DImode constants often require a longer movabs instruction, and
> HImode and QImode broadcasts require multiple uops on some architectures.
> Hence, an SImode broadcast is always among the shortest/fastest implementations.
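A simplified C sketch of the test implied above: a 16-byte constant can use an SImode broadcast whenever its bytes repeat with period 4. simode_broadcast_element is hypothetical and is not the logic in ix86_expand_vector_init_duplicate, which works on RTL modes and constants rather than raw bytes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: can a 16-byte vector constant be produced by
   broadcasting one 32-bit (SImode) value?  If so, store that value in
   *ELT and return true.  *ELT is read in host byte order, so on a
   little-endian host such as x86_64 it is the immediate that a
   "mov $imm,%eax; vpbroadcastd" sequence would load.  */
static bool
simode_broadcast_element (const uint8_t bytes[16], uint32_t *elt)
{
  /* Every byte must equal the byte at the same offset in the first
     4-byte group.  */
  for (int i = 4; i < 16; i++)
    if (bytes[i] != bytes[i % 4])
      return false;
  memcpy (elt, bytes, sizeof *elt);
  return true;
}
```

Any constant that passes this check, whether it was written as V8HImode or V16QImode, can be materialized with a 32-bit immediate and a dword broadcast.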
>
> Examples of this improvement can be seen in the testsuite.
>
> gcc.target/i386/pr102021.c
> Before:
>    0:   48 b8 0c 00 0c 00 0c    movabs $0xc000c000c000c,%rax
>    7:   00 0c 00
>    a:   62 f2 fd 28 7c c0       vpbroadcastq %rax,%ymm0
>   10:   c3                      retq
>
> After:
>    0:   b8 0c 00 0c 00          mov    $0xc000c,%eax
>    5:   62 f2 7d 28 7c c0       vpbroadcastd %eax,%ymm0
>    b:   c3                      retq
>
> and
> gcc.target/i386/pr90773-17.c:
> Before:
>    0:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 7
>    7:   b8 0c 00 00 00          mov    $0xc,%eax
>    c:   62 f2 7d 08 7a c0       vpbroadcastb %eax,%xmm0
>   12:   62 f1 7f 08 7f 02       vmovdqu8 %xmm0,(%rdx)
>   18:   c7 42 0f 0c 0c 0c 0c    movl   $0xc0c0c0c,0xf(%rdx)
>   1f:   c3                      retq
>
> After:
>    0:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 7
>    7:   b8 0c 0c 0c 0c          mov    $0xc0c0c0c,%eax
>    c:   62 f2 7d 08 7c c0       vpbroadcastd %eax,%xmm0
>   12:   62 f1 7f 08 7f 02       vmovdqu8 %xmm0,(%rdx)
>   18:   c7 42 0f 0c 0c 0c 0c    movl   $0xc0c0c0c,0xf(%rdx)
>   1f:   c3                      retq
>
> where, according to Agner Fog's instruction tables, vpbroadcastd is slightly
> faster than vpbroadcastb on some microarchitectures, for example Knights Landing.
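Both examples replace a different immediate with the equivalent 32-bit one: the 64-bit 0xc000c000c000c becomes 0xc000c, and the byte 0xc becomes 0xc0c0c0c. This is what the ChangeLog calls trying V4SImode "via widen". The arithmetic can be sketched in C as follows; the helper names are hypothetical, and the patch itself operates on RTL constants rather than C integers.

```c
#include <stdint.h>

/* Hypothetical helpers: widen a QImode or HImode broadcast value to the
   SImode value with the same bit pattern.  Multiplying by 0x01010101
   replicates a byte into all four byte lanes; multiplying by 0x00010001
   replicates a 16-bit value into both halves.  No carries occur because
   each copy lands in its own, non-overlapping lane.  */
static uint32_t
widen_qi_to_si (uint8_t b)
{
  return (uint32_t) b * 0x01010101u;
}

static uint32_t
widen_hi_to_si (uint16_t h)
{
  return (uint32_t) h * 0x00010001u;
}
```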
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-12-21  Roger Sayle  
>
> gcc/ChangeLog
> PR target/112992
> * config/i386/i386-expand.cc
> (ix86_convert_const_wide_int_to_broadcast): Allow call to
> ix86_expand_vector_init_duplicate to fail, and return NULL_RTX.
> (ix86_broadcast_from_constant): Revert recent change; Return a
> suitable MEMREF independently of mode/target combinations.
> (ix86_expand_vector_move): Allow ix86_expand_vector_init_duplicate
> to decide whether expansion is possible/preferable.  Only try
> forcing DImode constants to memory (and trying again) if calling
> ix86_expand_vector_init_duplicate fails with a DImode immediate
> constant.
> (ix86_expand_vector_init_duplicate) : Try using
> V4SImode for suitable immediate constants.
> : Try using V8SImode for suitable constants.
> : Use constant pool for AVX without AVX2.
> : Fail for CONST_INT_P, i.e. use constant pool.
> : Likewise.
> : For CONST_INT_P try using V4SImode via widen.
> : For CONST_INT_P try using V8HImode via widen.
> : Handle CONST_INTs via simplify_binary_operation.
> Allow recursive calls to ix86_expand_vector_init_duplicate to fail.
> : For CONST_INT_P try V8SImode via widen.
> : For CONST_INT_P try V16HImode via widen.
> (ix86_expand_vector_init): Move the attempt to use a broadcast for
> all_same via ix86_expand_vector_init_duplicate before the constant pool.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Update test case.
> * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
> * gcc.target/i386/avx512fp16-13.c: Likewise.
> * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
> * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
> * gcc.target/i386/pr100865-10a.c: Likewise.
> * gcc.target/i386/pr100865-10b.c: Likewise.
> * gcc.target/i386/pr100865-11c.c: Likewise.
> * gcc.target/i386/pr100865-12c.c: Likewise.
> * gcc.target/i386/pr100865-2.c: Likewise.
> * gcc.target/i386/pr100865-3.c: Likewise.
> * gcc.target/i386/pr100865-4a.c: Likewise.
> * gcc.target/i386/pr100865-4b.c: Likewise.
> * gcc.targ

[Committed] RISC-V: Add simplification of dummy len and dummy mask COND_LEN_xxx pattern

2024-01-01 Thread Juzhe-Zhong
In
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d1eacedc6d9ba9f5522f2c8d49ccfdf7939ad72d
I optimized the COND_LEN_xxx pattern with dummy len and dummy mask using an
overly simple solution, which causes a redundant vsetvli in the following case:

vsetvli a5,a2,e8,m1,ta,ma
vle32.v v8,0(a0)
vsetivli zero,16,e32,m4,tu,mu   ---> We should apply VLMAX instead of a CONST_INT AVL
slli a4,a5,2
vand.vv v0,v8,v16
vand.vv v4,v8,v12
vmseq.vi v0,v0,0
sub a2,a2,a5
vneg.v  v4,v8,v0.t
vsetvli zero,a5,e32,m4,ta,ma

The root cause is the following code:

is_vlmax_len_p (...)
  return poly_int_rtx_p (len, &value)
	 && known_eq (value, GET_MODE_NUNITS (mode))
	 && !satisfies_constraint_K (len);   ---> incorrect check.

Actually, we should not exclude VLMAX cases whose AVL is in the range
[0,31].
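Why the old check misfired can be reproduced with plain integers. The C model below is a sketch under stated assumptions: VLEN is 128 (the minimum implied by the V extension), fractional LMUL is not modelled, and the functions are hypothetical stand-ins for is_vlmax_len_p and satisfies_constraint_K, which operate on RTL. With e32/m4, VLMAX is 16, so the AVL of 16 in the listing above is really VLMAX, yet the old check rejected it because 16 fits constraint K.

```c
#include <stdbool.h>

/* VLMAX = VLEN / SEW * LMUL (integer LMUL only in this sketch).  */
static unsigned
vlmax (unsigned vlen, unsigned sew, unsigned lmul)
{
  return vlen / sew * lmul;
}

/* Stand-in for constraint K: a 5-bit unsigned immediate, [0, 31].  */
static bool
satisfies_k (long len)
{
  return len >= 0 && len <= 31;
}

/* Old check: a constant LEN equal to NUNITS only counted as VLMAX when
   it did NOT fit constraint K, so small fixed-vlmax lengths were missed.  */
static bool
is_vlmax_len_old (long len, unsigned nunits)
{
  return len == (long) nunits && !satisfies_k (len);
}

/* New check: equality with NUNITS is sufficient.  */
static bool
is_vlmax_len_new (long len, unsigned nunits)
{
  return len == (long) nunits;
}
```

Under the same assumption, e32/m1 gives VLMAX = 4, which is why the vsetvli in the next listing is redundant.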

After removing the check above, we hit the following issue:

vsetivli zero,4,e32,m1,ta,ma
vlseg4e32.v v4,(a5)
vlseg4e32.v v12,(a3)
vsetvli a5,zero,e32,m1,tu,ma   ---> This is redundant since VLMAX AVL = 4 when it is fixed-vlmax
vfadd.vf v3,v13,fa0
vfadd.vf v1,v12,fa1
vfmul.vv v17,v3,v5
vfmul.vv v16,v1,v5

Since all the following operations (vfadd.vf, etc.) are COND_LEN_xxx with
dummy len and dummy mask, we simplify COND_LEN_xxx operations that have a
dummy len and a dummy mask into VLMAX operations with TA and MA policy.
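The branch order that the patch gives expand_cond_len_op can be summarized as a pure function. The flag names and values below are hypothetical stand-ins for the real insn_flags bits in riscv-v.cc; only the branch structure mirrors the diff.

```c
/* Hypothetical stand-ins for the policy bits of insn_flags.  */
enum
{
  TDEFAULT_POLICY = 1 << 0, /* default tail policy (TA in this mail) */
  TU_POLICY       = 1 << 1, /* tail undisturbed */
  MDEFAULT_POLICY = 1 << 2, /* default mask policy (MA in this mail) */
  MU_POLICY       = 1 << 3  /* mask undisturbed */
};

/* Mirrors the if/else chain of the patched expand_cond_len_op.  */
static unsigned
cond_len_policy (int is_dummy_mask, int is_vlmax_len)
{
  if (is_dummy_mask && is_vlmax_len)
    return TDEFAULT_POLICY | MDEFAULT_POLICY; /* new case in this patch */
  else if (is_dummy_mask)
    return TU_POLICY | MDEFAULT_POLICY;
  else if (is_vlmax_len)
    return TDEFAULT_POLICY | MU_POLICY;
  return 0; /* Remaining case is not shown in the quoted diff.  */
}
```

The new first branch is what lets the dummy-len, dummy-mask operations share the tail-agnostic VLMAX vsetvli instead of forcing a separate tail-undisturbed one.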

So, after this patch, both cases generate optimal code:

case 1:
vsetvli a5,a2,e32,m1,ta,mu
vle32.v v2,0(a0)
slli a4,a5,2
vand.vv v1,v2,v3
vand.vv v0,v2,v4
sub a2,a2,a5
vmseq.vi v0,v0,0
vneg.v  v1,v2,v0.t
vse32.v v1,0(a1)

case 2:
vsetivli zero,4,e32,m1,tu,ma
addi a4,a5,400
vlseg4e32.v v12,(a3)
vfadd.vf v3,v13,fa0
vfadd.vf v1,v12,fa1
vlseg4e32.v v4,(a4)
vfadd.vf v2,v14,fa1
vfmul.vv v17,v3,v5
vfmul.vv v16,v1,v5

This patch is an additional fix to the previously approved patch.
Tested on both RV32 and RV64 newlib no regression. Committed.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (is_vlmax_len_p): Remove 
satisfies_constraint_K.
(expand_cond_len_op): Add simplification of dummy len and dummy mask.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vf_avl-3.c: New test.

---
 gcc/config/riscv/riscv-v.cc| 11 ---
 gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-3.c | 11 +++
 2 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-3.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b4c7e0f0126..3c83be35715 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -74,8 +74,7 @@ is_vlmax_len_p (machine_mode mode, rtx len)
 {
   poly_int64 value;
   return poly_int_rtx_p (len, &value)
-	 && known_eq (value, GET_MODE_NUNITS (mode))
-	 && !satisfies_constraint_K (len);
+	 && known_eq (value, GET_MODE_NUNITS (mode));
 }
 
 /* Helper functions for insn_flags && insn_types */
@@ -3855,7 +3854,13 @@ expand_cond_len_op (unsigned icode, insn_flags op_type, rtx *ops, rtx len)
   bool is_vlmax_len = is_vlmax_len_p (mode, len);
 
   unsigned insn_flags = HAS_DEST_P | HAS_MASK_P | HAS_MERGE_P | op_type;
-  if (is_dummy_mask)
+  /* FIXME: We don't support simplification of COND_LEN_NEG (..., dummy len,
+     dummy mask) into NEG_EXPR in GIMPLE FOLD yet.  So, we do such
+     simplification in RISC-V backend and may do that in middle-end in the
+     future.  */
+  if (is_dummy_mask && is_vlmax_len)
+    insn_flags |= TDEFAULT_POLICY_P | MDEFAULT_POLICY_P;
+  else if (is_dummy_mask)
     insn_flags |= TU_POLICY_P | MDEFAULT_POLICY_P;
   else if (is_vlmax_len)
     insn_flags |= TDEFAULT_POLICY_P | MU_POLICY_P;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-3.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-3.c
new file mode 100644
index 000..116b5b538cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vf_avl-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d --param riscv-autovec-preference=fixed-vlmax" } */
+
+void foo (int *src, int *dst, int size) {
+ int i;
+ for (i = 0; i < size; i++)
+  *dst++ = *src & 0x80 ? (*src++ & 0x7f) : -*src++;
+}
+
+/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*mu} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
-- 
2.36.3



[PATCH v5 1/2] RISC-V: Add crypto vector builtin function.

2024-01-01 Thread Feng Wang
Patch v5: Rebase.
Patch v4: Merge crypto vector function.def into vector.
Patch v3: Define a shape for vaesz and merge vector-crypto-types.def
 into riscv-vector-builtins-types.def.
Patch v2: Optimize function_shape class for crypto_vector.

This patch adds the crypto vector intrinsic functions based on the
intrinsic doc (https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).
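For readers new to Zvbb, the element-wise behaviour behind the vandn intrinsics added here can be written as a scalar C reference model. This is a sketch based on the Zvbb specification's definition (vd = vs2 & ~vs1, or vs2 & ~rs1 for the .vx form); masking, tail/mask policies and other element widths are deliberately left out, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of vandn.vv for 32-bit elements: only the first VL
   elements of VD are written.  */
static void
vandn_vv_u32 (uint32_t *vd, const uint32_t *vs2, const uint32_t *vs1,
              size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs2[i] & ~vs1[i];
}

/* Scalar model of vandn.vx: the scalar RS1 is inverted once and
   applied to every active element.  */
static void
vandn_vx_u32 (uint32_t *vd, const uint32_t *vs2, uint32_t rs1, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vs2[i] & ~rs1;
}
```

Note that the operand being inverted is vs1 (or rs1), not vs2, which is why the intrinsic signatures keep vs2 first.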

Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (class vandn):
Add new function_base for crypto vector.
(class bitmanip): Ditto.
(class b_reverse): Ditto.
(class vwsll): Ditto.
(class clmul): Ditto.
(class vg_nhab): Ditto.
(class crypto_vv): Ditto.
(class crypto_vi): Ditto.
(class vaeskf2_vsm3c): Ditto.
(class vsm3me): Ditto.
(BASE): Add BASE declaration for crypto vector.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
Add crypto vector intrinsic definition.
(vbrev): Ditto.
(vclz): Ditto.
(vctz): Ditto.
(vwsll): Ditto.
(vandn): Ditto.
(vbrev8): Ditto.
(vrev8): Ditto.
(vrol): Ditto.
(vror): Ditto.
(vclmul): Ditto.
(vclmulh): Ditto.
(vghsh): Ditto.
(vgmul): Ditto.
(vaesef): Ditto.
(vaesem): Ditto.
(vaesdf): Ditto.
(vaesdm): Ditto.
(vaesz): Ditto.
(vaeskf1): Ditto.
(vaeskf2): Ditto.
(vsha2ms): Ditto.
(vsha2ch): Ditto.
(vsha2cl): Ditto.
(vsm4k): Ditto.
(vsm4r): Ditto.
(vsm3me): Ditto.
(vsm3c): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
Add new function_shape for crypto vector.
(struct crypto_vi_def): Ditto.
(struct crypto_vv_no_op_type_def): Ditto.
(SHAPE): Add SHAPE declaration of crypto vector.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data type for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
(vuint64m1_t): Ditto.
(vuint64m2_t): Ditto.
(vuint64m4_t): Ditto.
(vuint64m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data struct for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(registered_function::overloaded_hash): Handle size_t uimm for C
overloaded functions.
* config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 264 +-
 .../riscv/riscv-vector-builtins-bases.h   |  28 ++
 .../riscv/riscv-vector-builtins-functions.def |  94 +++
 .../riscv/riscv-vector-builtins-shapes.cc |  87 +-
 .../riscv/riscv-vector-builtins-shapes.h  |   4 +
 .../riscv/riscv-vector-builtins-types.def |  25 ++
 gcc/config/riscv/riscv-vector-builtins.cc | 133 -
 gcc/config/riscv/riscv-vector-builtins.def|   1 +
 gcc/config/riscv/riscv-vector-builtins.h  |   8 +
 9 files changed, 641 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index d70468542ee..d12bb89f91c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2127,6 +2127,212 @@ public:
   }
 };
 
+/* Below are the vector crypto implementations.  */
+/* Implements vandn.[vv,vx].  */
+class vandn : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+    switch (e.op_info->op)
+      {
+      case OP_TYPE_vv:
+	return e.use_exact_insn (code_for_pred_vandn (e.vector_mode ()));
+      case OP_TYPE_vx:
+	return e.use_exact_insn (code_for_pred_vandn_scalar (e.vector_mode ()));
+      default:
+	gcc_unreachable ();
+      }
+  }
+};
+
+/* Implements vrol/vror/clz/ctz.  */
+template
+class bitmanip : public function_base
+{
+public:
+  bool apply_tail_policy_p () const override
+  {
+    return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool apply_mask_policy_p () const override
+  {
+    return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool has_merge_operand_p () const override
+  {
+    return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+    switch (e.op_info->op)
+  

[PATCH v5 2/2] RISC-V: Add crypto vector api-testing cases.

2024-01-01 Thread Feng Wang
Patch v5: Rebase.
Patch v4: Add some RV32 vx constraint testcase.
Patch v3: Refine crypto vector api-testing cases.
Patch v2: Update march info according to the change of riscv-common.c

This patch adds crypto vector api-testing cases based on
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto
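Two of the operations these tests exercise, vbrev8 (reverse the bits within each byte) and vrev8 (reverse the byte order of each element), are easy to confuse. A scalar C sketch for a single 32-bit element, with hypothetical function names and no masking or policy handling:

```c
#include <stdint.h>

/* Reverse the order of the 8 bits in B.  */
static uint8_t
reverse_bits_in_byte (uint8_t b)
{
  uint8_t r = 0;
  for (int i = 0; i < 8; i++)
    if (b & (1u << i))
      r |= (uint8_t) (1u << (7 - i));
  return r;
}

/* vbrev8 on one element: bit-reverse each byte, keep byte positions.  */
static uint32_t
brev8_u32 (uint32_t x)
{
  uint32_t r = 0;
  for (int byte = 0; byte < 4; byte++)
    {
      uint8_t b = (uint8_t) (x >> (8 * byte));
      r |= (uint32_t) reverse_bits_in_byte (b) << (8 * byte);
    }
  return r;
}

/* vrev8 on one element: swap byte order, keep bits within each byte.  */
static uint32_t
rev8_u32 (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}
```

Composing the two gives a full bit reversal of the element, which is what vbrev computes.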
gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvbb-intrinsic.c: New test.
* gcc.target/riscv/zvbb_vandn_vx_constraint.c: New test.
* gcc.target/riscv/zvbc-intrinsic.c: New test.
* gcc.target/riscv/zvbc_vx_constraint-2.c: New test.
* gcc.target/riscv/zvbc_vx_constraint.c: New test.
* gcc.target/riscv/zvkb.c: New test.
* gcc.target/riscv/zvkg-intrinsic.c: New test.
* gcc.target/riscv/zvkned-intrinsic.c: New test.
* gcc.target/riscv/zvknha-intrinsic.c: New test.
* gcc.target/riscv/zvknhb-intrinsic.c: New test.
* gcc.target/riscv/zvksed-intrinsic.c: New test.
* gcc.target/riscv/zvksh-intrinsic.c: New test.
---
 .../gcc.target/riscv/zvbb-intrinsic.c | 179 ++
 .../riscv/zvbb_vandn_vx_constraint.c  |  15 ++
 .../gcc.target/riscv/zvbc-intrinsic.c |  62 ++
 .../gcc.target/riscv/zvbc_vx_constraint-2.c   |  14 ++
 .../gcc.target/riscv/zvbc_vx_constraint.c |  14 ++
 gcc/testsuite/gcc.target/riscv/zvkb.c |  13 ++
 .../gcc.target/riscv/zvkg-intrinsic.c |  24 +++
 .../gcc.target/riscv/zvkned-intrinsic.c   | 105 ++
 .../gcc.target/riscv/zvknha-intrinsic.c   |  33 
 .../gcc.target/riscv/zvknhb-intrinsic.c   |  33 
 .../gcc.target/riscv/zvksed-intrinsic.c   |  33 
 .../gcc.target/riscv/zvksh-intrinsic.c|  24 +++
 12 files changed, 549 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvbb-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvbb_vandn_vx_constraint.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvbc-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvbc_vx_constraint-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvbc_vx_constraint.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvkb.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvkg-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvkned-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvknha-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvknhb-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvksed-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvksh-intrinsic.c

diff --git a/gcc/testsuite/gcc.target/riscv/zvbb-intrinsic.c b/gcc/testsuite/gcc.target/riscv/zvbb-intrinsic.c
new file mode 100644
index 000..7d436d2a43c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zvbb-intrinsic.c
@@ -0,0 +1,179 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvbb_zve64x -mabi=lp64d -Wno-psabi" } */
+#include 
+
+vuint8mf8_t test_vandn_vv_u8mf8(vuint8mf8_t vs2, vuint8mf8_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u8mf8(vs2, vs1, vl);
+}
+
+vuint32m1_t test_vandn_vx_u32m1(vuint32m1_t vs2, uint32_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u32m1(vs2, rs1, vl);
+}
+
+vuint32m2_t test_vandn_vv_u32m2_m(vbool16_t mask, vuint32m2_t vs2, vuint32m2_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m2_m(mask, vs2, vs1, vl);
+}
+
+vuint16mf2_t test_vandn_vx_u16mf2_m(vbool32_t mask, vuint16mf2_t vs2, uint16_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u16mf2_m(mask, vs2, rs1, vl);
+}
+
+vuint32m4_t test_vandn_vv_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, vuint32m4_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m4_tumu(mask, maskedoff, vs2, vs1, vl);
+}
+
+vuint64m4_t test_vandn_vx_u64m4_tumu(vbool16_t mask, vuint64m4_t maskedoff, vuint64m4_t vs2, uint64_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u64m4_tumu(mask, maskedoff, vs2, rs1, vl);
+}
+
+vuint8m8_t test_vbrev_v_u8m8(vuint8m8_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u8m8(vs2, vl);
+}
+
+vuint16m1_t test_vbrev_v_u16m1_m(vbool16_t mask, vuint16m1_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u16m1_m(mask, vs2, vl);
+}
+
+vuint32m4_t test_vbrev_v_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u32m4_tumu(mask, maskedoff, vs2, vl);
+}
+
+vuint16mf4_t test_vbrev8_v_u16mf4(vuint16mf4_t vs2, size_t vl) {
+  return __riscv_vbrev8_v_u16mf4(vs2, vl);
+}
+
+vuint32m1_t test_vbrev8_v_u32m1_m(vbool32_t mask, vuint32m1_t vs2, size_t vl) {
+  return __riscv_vbrev8_v_u32m1_m(mask, vs2, vl);
+}
+
+vuint64m1_t test_vbrev8_v_u64m1_tumu(vbool64_t mask, vuint64m1_t maskedoff, vuint64m1_t vs2, size_t vl) {
+  return __riscv_vbrev8_v_u64m1_tumu(mask, maskedoff, vs2, vl);
+}
+
+vuint16m4_t test_vrev8_v_u16m4(vuint16m4_t vs2, size_t vl) {
+  return __riscv_vrev8_v_u16m4(vs2, vl);
+}
+
+vuint8m4_t test_vrev8_v_u8m4_m(vbool2_t mask, vuint8m4_t vs2, s

Re: [PATCH v5 1/2] RISC-V: Add crypto vector builtin function.

2024-01-01 Thread juzhe.zh...@rivai.ai
+/* Static information about a set of crypto vector functions.  */
+struct crypto_function_group_info
+{
+  struct function_group_info rvv_function_group_info;
+  /* Whether the function is available.  */
+  unsigned int (*avail) (void);
+};

What is this used for?


juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2024-01-02 15:47
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
Subject: [PATCH v5 1/2] RISC-V: Add crypto vector builtin function.
Patch v5: Rebase.
Patch v4: Merge crypto vector function.def into vector.
Patch v3: Define a shape for vaesz and merge vector-crypto-types.def
 into riscv-vector-builtins-types.def.
Patch v2: Optimize function_shape class for crypto_vector.
 
This patch adds the crypto vector intrinsic functions based on the
intrinsic doc (https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).
 
Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc (class vandn):
Add new function_base for crypto vector.
(class bitmanip): Ditto.
(class b_reverse): Ditto.
(class vwsll): Ditto.
(class clmul): Ditto.
(class vg_nhab): Ditto.
(class crypto_vv): Ditto.
(class crypto_vi): Ditto.
(class vaeskf2_vsm3c): Ditto.
(class vsm3me): Ditto.
(BASE): Add BASE declaration for crypto vector.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
Add crypto vector intrinsic definition.
(vbrev): Ditto.
(vclz): Ditto.
(vctz): Ditto.
(vwsll): Ditto.
(vandn): Ditto.
(vbrev8): Ditto.
(vrev8): Ditto.
(vrol): Ditto.
(vror): Ditto.
(vclmul): Ditto.
(vclmulh): Ditto.
(vghsh): Ditto.
(vgmul): Ditto.
(vaesef): Ditto.
(vaesem): Ditto.
(vaesdf): Ditto.
(vaesdm): Ditto.
(vaesz): Ditto.
(vaeskf1): Ditto.
(vaeskf2): Ditto.
(vsha2ms): Ditto.
(vsha2ch): Ditto.
(vsha2cl): Ditto.
(vsm4k): Ditto.
(vsm4r): Ditto.
(vsm3me): Ditto.
(vsm3c): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
Add new function_shape for crypto vector.
(struct crypto_vi_def): Ditto.
(struct crypto_vv_no_op_type_def): Ditto.
(SHAPE): Add SHAPE declaration of crypto vector.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data type for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
(vuint64m1_t): Ditto.
(vuint64m2_t): Ditto.
(vuint64m4_t): Ditto.
(vuint64m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data struct for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(registered_function::overloaded_hash): Handle size_t uimm for C
overloaded functions.
* config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE.
---
.../riscv/riscv-vector-builtins-bases.cc  | 264 +-
.../riscv/riscv-vector-builtins-bases.h   |  28 ++
.../riscv/riscv-vector-builtins-functions.def |  94 +++
.../riscv/riscv-vector-builtins-shapes.cc |  87 +-
.../riscv/riscv-vector-builtins-shapes.h  |   4 +
.../riscv/riscv-vector-builtins-types.def |  25 ++
gcc/config/riscv/riscv-vector-builtins.cc | 133 -
gcc/config/riscv/riscv-vector-builtins.def|   1 +
gcc/config/riscv/riscv-vector-builtins.h  |   8 +
9 files changed, 641 insertions(+), 3 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index d70468542ee..d12bb89f91c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2127,6 +2127,212 @@ public:
   }
 };
 
+/* Below are the vector crypto implementations.  */
+/* Implements vandn.[vv,vx].  */
+class vandn : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+    switch (e.op_info->op)
+      {
+      case OP_TYPE_vv:
+	return e.use_exact_insn (code_for_pred_vandn (e.vector_mode ()));
+      case OP_TYPE_vx:
+	return e.use_exact_insn (code_for_pred_vandn_scalar (e.vector_mode ()));
+      default:
+	gcc_unreachable ();
+      }
+  }
+};
+
+/* Implements vrol/vror/clz/ctz.  */
+template
+class bitmanip : public function_base
+{
+public:
+  bool apply_tail_policy_p () const override
+  {
+    return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool apply_mask_policy_p () const override
+  {
+    return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool has_merge_operand_p () const override
+  {
+    return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+    switch (e.op_info->op)
+      {
+      case OP_TYPE_v:
+      case OP_TYPE_vv:
+	return e.use_exact_insn (code_for_pred_v (CODE, e.vector_mode ()));
+      case OP_TYPE_vx:
+	return e.use_exact_insn