date:20240929

Re: [RFC/RFA][PATCH v2 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-09-29 Thread Jeff Law





On 8/28/24 2:14 AM, Mariam Arutunian wrote:



Yes, I applied some of the changes I made in AArch64 here as well, where 
possible and worked.
Also, in AArch64, I used force_reg in some cases, but here, I didn't 
change, as you had told me to use riscv_expand_* instead of force_reg.
I don't recall avoiding force_reg, I do vaguely recall wanting the 
expanders to be used rather than using emit_move_insn directly.


Anyway, just wanted to check in that if there were any lessons learned 
from the aarch64 work that we should apply to riscv.  Sounds like 
nothing major.  Thanks!


jeff

Re: [PATCH] [tree-optimization/110279] fix testcase pr110279-1.c

2024-09-29 Thread Jeff Law

On 5/23/24 3:55 AM, Di Zhao OS wrote:

-Original Message-
From: Jeff Law 
Sent: Wednesday, May 22, 2024 11:14 PM
To: Di Zhao OS ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] [tree-optimization/110279] fix testcase pr110279-1.c

On 5/22/24 5:46 AM, Di Zhao OS wrote:

The test case is for targets that support FMA. Previously
the "target" selector is missed in dg-final command.

Tested on x86_64-pc-linux-gnu.

Thanks
Di Zhao

gcc/testsuite/ChangeLog:

  * gcc.dg/pr110279-1.c: add target selector.

Rather than list targets explicitly in the test, wouldn't it be better
to have a common routine that could be used in other cases where we have
a test that requires FMA?

So something similar to check_effective_target_scalar_all_fma?

Jeff

Here is an updated version of the patch. Sorry I'm not very familiar
with the testsuite commands.

gcc/testsuite/ChangeLog:

 * gcc.dg/pr110279-1.c: add target selector.
This is OK.   Thanks for your patience.  SOrry it's taken so long.  Just 
never seemed to get near the top of my todo list.

Ideally we'll see maintainers adjust the scalar_all_fma selector and if 
we see other targets that have some fma, but not the full set, then 
we'll see a new target selector for the subsets of fma.

jeff

Re: [PATCH] RISC-V: Implement TARGET_CAN_INLINE_P

2024-09-29 Thread Yangyu Chen



> On Sep 30, 2024, at 10:34, Yangyu Chen  wrote:
>> 
>> 
>> On Sep 30, 2024, at 02:49, Jeff Law  wrote:
>> On 9/9/24 6:11 AM, Yangyu Chen wrote:
>>> Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V
>>> ISA. As a result, certain functions cannot be optimized with inlining
>>> when specific options, such as __attribute__((target("arch=+v"))) .
>>> This can lead to potential performance issues when building
>>> retargetable binaries for RISC-V.
>>> To address this, I have implemented the riscv_can_inline_p function.
>>> This addition enables inlining when the callee either has no special
>>> options or when the some options match, and also ensuring that the
>>> callee's ISA is a subset of the caller's. I also check some other
>>> options when there is no always_inline set.
>>> gcc/ChangeLog:
>>>* config/riscv/riscv.cc (riscv_can_inline_p): New function.
>>>(TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P.
>> I'd kind of hoped not to mess with this as I suspect keeping this up-to-date 
>> is going to end up being somewhat painful and I doubt we're going to see 
>> that many cases where folks are using target attributes and where inlining 
>> is critical for performance.
>> 
>> What's really driving your desire to change this?  Is this showing up in 
>> real code as a performance issue?
>> 
> 
> Yeah. I'm working on function multi-versioning support and found
> that allowing the callee to be inlined in the caller will result
> in performance gains. Without this patch, we will need to CALL the
> function since we can't inline them when two functions have different
> attributes. A simple evaluation of coremarks shows a 13% performance
> degradation.
> 

Specially, we can reproduce the result on BananaPi-F3 Hardware:

Use this GCC branch with my patch:
https://github.com/cyyself/gcc/tree/rv_can_inline

And compile the coremark on this branch:
https://github.com/cyyself/coremark/tree/rva22_v_hotspot

With command `make CC=riscv64-unknown-linux-gnu-gcc compile`

With my patch, we will get the coremark scored `Iterations/Sec   :
5992.917461`. But without this patch after `git reset HEAD^` and
recompile the GCC and then coremark, we will get `Iterations/Sec :
5235.602094`, which is 12.6% slower.

Thanks,
Yangyu Chen

>>> +
>>> +  if (caller_opts->x_riscv_tune_string
>>> +  && callee_opts->x_riscv_tune_string
>>> +  && strcmp (caller_opts->x_riscv_tune_string,
>>> + callee_opts->x_riscv_tune_string) != 0)
>>> +return false;
>> Tuning shouldn't affect correctness of inlining.  I'd just drop this clause 
>> if this patch goes forward.
> 
> OK. I will fix it in the next revision.
> 
>> Jeff
>

Re: [PATCH] RISC-V: Add an implicit dependency for Zawrs

2024-09-29 Thread Kito Cheng

LGTM, and let me know if you need my help to commit that :)

On Mon, Sep 30, 2024 at 9:37 AM Xiao Zeng  wrote:
>
> There is a description in 
> :
>
> "The instructions in the Zawrs extension are only useful in conjunction
> with the LR instruction, which is provided by the Zalrsc component
> of the A extension."
>
> It can be concluded that: zawrs -> zalrsc.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: zawrs -> zalrsc.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/predef-38.c: New test.
> * gcc.target/riscv/predef-39.c: New test.
>
> Signed-off-by: Xiao Zeng 
> ---
>  gcc/common/config/riscv/riscv-common.cc|  1 +
>  gcc/testsuite/gcc.target/riscv/predef-38.c | 31 ++
>  gcc/testsuite/gcc.target/riscv/predef-39.c | 31 ++
>  3 files changed, 63 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-38.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-39.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index bd42fd01532..a6abd903b98 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -96,6 +96,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
>
>{"zabha", "zaamo"},
>{"zacas", "zaamo"},
> +  {"zawrs", "zalrsc"},
>
>{"zcmop", "zca"},
>
> diff --git a/gcc/testsuite/gcc.target/riscv/predef-38.c 
> b/gcc/testsuite/gcc.target/riscv/predef-38.c
> new file mode 100644
> index 000..986c02b451a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/predef-38.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=rv32i_zawrs -mabi=ilp32 -mcmodel=medlow 
> -misa-spec=20191213" } */
> +
> +int main () {
> +
> +#ifndef __riscv_arch_test
> +#error "__riscv_arch_test"
> +#endif
> +
> +#if __riscv_xlen != 32
> +#error "__riscv_xlen"
> +#endif
> +
> +#if !defined(__riscv_i)
> +#error "__riscv_i"
> +#endif
> +
> +#if !defined(__riscv_zawrs)
> +#error "__riscv_zawrs"
> +#endif
> +
> +#if !defined(__riscv_zalrsc)
> +#error "__riscv_zalrsc"
> +#endif
> +
> +#if defined(__riscv_a)
> +#error "__riscv_a"
> +#endif
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/predef-39.c 
> b/gcc/testsuite/gcc.target/riscv/predef-39.c
> new file mode 100644
> index 000..558164de8c4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/predef-39.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=rv64i_zawrs -mabi=lp64 -mcmodel=medlow 
> -misa-spec=20191213" } */
> +
> +int main () {
> +
> +#ifndef __riscv_arch_test
> +#error "__riscv_arch_test"
> +#endif
> +
> +#if __riscv_xlen != 64
> +#error "__riscv_xlen"
> +#endif
> +
> +#if !defined(__riscv_i)
> +#error "__riscv_i"
> +#endif
> +
> +#if !defined(__riscv_zawrs)
> +#error "__riscv_zawrs"
> +#endif
> +
> +#if !defined(__riscv_zalrsc)
> +#error "__riscv_zalrsc"
> +#endif
> +
> +#if defined(__riscv_a)
> +#error "__riscv_a"
> +#endif
> +
> +  return 0;
> +}
> --
> 2.17.1
>

Re: [RFC PATCH] More detailed diagnostics for section type conflicts

2024-09-29 Thread Richard Biener

On Sun, Sep 29, 2024 at 5:13 PM Florian Weimer  wrote:
>
> Sometimes this is a user error, sometimes it is more of an ICE.
> In either case, more information about the conflict is helpful.
>
> I used to this to get a better idea about what is going on with
> PR116887.  The original diagnostics look like this:
>
> dl-find_object.c: In function ‘_dlfo_process_initial’:
> dl-find_object.c:507:1: error: section type conflict with 
> ‘_dlfo_nodelete_mappings’
>   507 | _dlfo_process_initial (void)
>   | ^
> dl-find_object.c:73:40: note: ‘_dlfo_nodelete_mappings’ was declared here
>73 | static struct dl_find_object_internal *_dlfo_nodelete_mappings
>   |^~~
>
>
> I don't quite understand what is going on (the symbol that's being
> flagged for conflict is somewhat unstable), but at least the new
> diagnostics show that the sectio name, and maybe the flags are useful,
> too:
>
> /tmp/bug.i:6798:1: error: section ‘.data.rel.ro’ type conflict with 
> ‘_dlfo_main’
>  6798 | }
>   | ^
> /tmp/bug.i:6190:39: note: ‘_dlfo_main’ was declared here
>  6190 | static struct dl_find_object_internal _dlfo_main __attribute__ 
> ((section (".data.rel.ro")));
>   |   ^~
> /tmp/bug.i:6798:1: error: previous section type: WRITE|NOTYPE|DECLARED|NAMED
>  6798 | }
>   | ^
> /tmp/bug.i:6798:1: error: new section type: WRITE|NOTYPE|NAMED|RELRO
>
> I still need help with producing a non-error, non-anchored diagnostic.
> I don't know how to do that.
>
> Thanks,
> Florian
>
> * varasm.cc (section_flags_to_string):  New function.
> (get_section): Include name of section in diagnostics.
> Print old and new section flags, as rendered by
> section_flags_to_string.
>
> ---
>  gcc/varasm.cc | 80 
> ---
>  1 file changed, 76 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 4426e7ce6c6..deba15933aa 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
> We also output the assembler code for constants stored in memory
> and are responsible for combining constants with the same value.  */
>
> +#define INCLUDE_STRING
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> @@ -276,6 +277,65 @@ get_noswitch_section (unsigned int flags, 
> noswitch_section_callback callback)
>return sect;
>  }
>
> +/* Return a string describing the section flags.  */
> +
> +static std::string
> +section_flags_to_string (unsigned int flags)
> +{
> +  if (flags == 0)
> +return "UNNAMED";
> +  std::string result;
> +  auto append = [&result] (bool f, const char *name)
> +  {
> +if (f)
> +  {
> +   if (!result.empty ())
> + result += '|';
> +   result += name;
> +  }
> +  };
> +
> +  append (flags & SECTION_CODE, "CODE");
> +  append (flags & SECTION_WRITE, "WRITE");
> +  append (flags & SECTION_DEBUG, "DEBUG");
> +  append (flags & SECTION_LINKONCE, "LINKONCE");
> +  append (flags & SECTION_SMALL, "SMALL");
> +  append (flags & SECTION_BSS, "BSS");
> +  append (flags & SECTION_MERGE, "MERGE");
> +  append (flags & SECTION_STRINGS, "STRINGS");
> +  append (flags & SECTION_OVERRIDE, "OVERRIDE");
> +  append (flags & SECTION_TLS, "TLS");
> +  append (flags & SECTION_NOTYPE, "NOTYPE");
> +  append (flags & SECTION_DECLARED, "DECLARED");
> +  append (flags & SECTION_NAMED, "NAMED");
> +  append (flags & SECTION_NOSWITCH, "NOSWITCH");
> +  append (flags & SECTION_COMMON, "COMMON");
> +  append (flags & SECTION_RELRO, "RELRO");
> +  append (flags & SECTION_EXCLUDE, "EXCLUDE");
> +  append (flags & SECTION_RETAIN, "RETAIN");
> +  append (flags & SECTION_LINK_ORDER, "LINK_ORDER");

I'm not sure printing these internal flags is of help to the user.

> +
> +  unsigned int entsize = flags & SECTION_ENTSIZE;
> +  if (entsize != 0)
> +{
> +  if (!result.empty ())
> +   result += ',';
> +  result += "size=";
> +  result += std::to_string (entsize);
> +}
> +
> +  unsigned int mach_dep = flags / SECTION_MACH_DEP;
> +  if (mach_dep != 0)
> +{
> +  if (!result.empty ())
> +   result += ',';
> +  result += "mach=";
> +  result += std::to_string (mach_dep);
> +}
> +
> +  return result;
> +}
> +
>  /* Return the named section structure associated with NAME.  Create
> a new section with the given fields if no such structure exists.
> When NOT_EXISTING, then fail if the section already exists.  Return
> @@ -312,6 +372,9 @@ get_section (const char *name, unsigned int flags, tree 
> decl,
> internal_error ("section already exists: %qs", name);
>
>sect = *slot;
> +  unsigned int original_flags = sect->common.flags;
> +  unsigned int new_flags = flags;
> +
>/* It is fine if one of the sections has SECTION_NOTYPE as long

Re: [RFC/RFA] [PATCH v4 10/12] Verify detected CRC loop with symbolic execution and LFSR matching

2024-09-29 Thread Richard Biener

On Sun, Sep 29, 2024 at 8:01 PM Jeff Law  wrote:
>
>
>
> On 9/13/24 5:06 AM, Mariam Arutunian wrote:
> > Symbolically execute potential CRC loops and check whether the loop
> > actually calculates CRC (uses LFSR matching).
> > Calculated CRC and created LFSR are compared on each iteration of the
> > potential CRC loop.
> >
> >gcc/
> >
> >  * Makefile.in (OBJS): Add crc-verification.o.
> >  * crc-verification.cc: New file.
> >  * crc-verification.h: New file.
> >  * gimple-crc-optimization.cc (loop_calculates_crc): New function.
> >  (is_output_crc): Likewise.
> >  (swap_crc_and_data_if_needed): Likewise.
> >  (validate_crc_and_data): Likewise.
> >  (optimize_crc_loop): Likewise.
> >  (table_based_may_be_generated): Likewise.
> >  (get_output_phi): Likewise.
> >  (execute): Add check whether potential CRC loop calculates CRC.
> >
> >gcc/sym-exec/
> >
> >  * sym-exec-state.cc (create_reversed_lfsr): New function.
> >  (create_forward_lfsr): Likewise.
> >  (last_set_bit): Likewise.
> >  (create_lfsr): Likewise.
> >  * sym-exec-state.h (is_bit_vector): Reorder, make the function
> > public and static.
> >  (create_reversed_lfsr): New static function declaration.
> >  (create_forward_lfsr): New static function declaration.
> >
> > Signed-off-by: Mariam Arutunian  > >
> > Mentored-by: Jeff Law mailto:j...@ventanamicro.com>>
> There's a bit of TARGET_XXX stuff in here too that needs to be removed
> as I noted in the review of a prior patch in this series:
>
> > +/* Returns true if table-based CRC may be generated,
> > +   otherwise, return false.  */
> > +
> > +bool
> > +crc_optimization::table_based_may_be_generated ()
> > +{
> > +  #ifdef TARGET_CRC32
> > +if (TARGET_CRC32
> > +   && (m_polynomial == 0x1EDC6F41 || m_polynomial == 0x04C11DB7))
> > +  return false;
> > +  #endif
> > +/* If CLMUL-based CRC can be generated, return false.  */
> > +if (is_CLMUL_supported ()
> > +   && (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (m_crc_arg))).to_constant ()
> > +   < GET_MODE_SIZE (word_mode)))
> > +  return false;
> > +  return true;
> > +}
>
> Removing this probably will result in minor simplifications elsewhere in
> this file.
>
>
> This is a bit puzzling:
>
> > +bool
> > +assign_calc_vals_to_header_phis (const vec &prev_states,
> > +state *curr_state,
> > +basic_block bb)
> > +{
> > +  for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
> > +   gsi_next (&gsi))
> > +{
> > +  gphi *phi = gsi.phi ();
> > +  tree lhs = gimple_phi_result (phi);
> > +
> > +  /* Don't consider virtual operands.  */
> > +  if (virtual_operand_p (lhs))
> > +   continue;
> > +
> > +  if (TREE_CODE (PHI_ARG_DEF (phi, phi->nargs - 1)) == INTEGER_CST)
>
> The order of elements in the PHI is undefined, it looks like this
> assumes the constant is always last.  So it probably needs to be
> generalized a bit.
>
> Similarly in assign_known_vals_to_header_phis.

As in the other case I noted some time ago you generally want to use
gimple_phi_arg_def_from_edge and use either the latch or the preheader
edge to identify a specifc PHI argument.

> Mostly looks OK.  I suspect with the issues noted above fixed that this
> will be good to go.
>
>
> Jeff

Re: [PATCH] RISC-V: Implement TARGET_CAN_INLINE_P

2024-09-29 Thread Kito Cheng

Hi Yang-Yu:

>
> Specially, we can reproduce the result on BananaPi-F3 Hardware:
>
> Use this GCC branch with my patch:
> https://github.com/cyyself/gcc/tree/rv_can_inline
>
> And compile the coremark on this branch:
> https://github.com/cyyself/coremark/tree/rva22_v_hotspot
>
> With command `make CC=riscv64-unknown-linux-gnu-gcc compile`
>
> With my patch, we will get the coremark scored `Iterations/Sec   :
> 5992.917461`. But without this patch after `git reset HEAD^` and
> recompile the GCC and then coremark, we will get `Iterations/Sec :
> 5235.602094`, which is 12.6% slower.

Could you add a test case to demonstrate that ?

>  /* Callee's ISA should be a subset of the caller's ISA.  */

This check is necessary, but this way may not scalable for longer term,
I mean people may forgot to update this part when adding new extension
variables,
so I would suggest add a new function to construct a riscv_subset_list
from options
e.g.

diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
index dace4de6575..e8b7c0f194b 100644
--- a/gcc/config/riscv/riscv-subset.h
+++ b/gcc/config/riscv/riscv-subset.h
@@ -103,6 +103,7 @@ public:
  riscv_subset_list *clone () const;

  static riscv_subset_list *parse (const char *, location_t);
+  static riscv_subset_list *parse (struct gcc_options *opts);
  const char *parse_single_ext (const char *, bool exact_single_p = true);

  const riscv_subset_t *begin () const {return m_head;};

And then use riscv_subset_list to do the checking

Re: [PATCH v4] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-09-29 Thread Kito Cheng

Hi Yang-Yu

Yeah, I was planning to send another version to update the interface
to the latest, but I'm just too busy (too lazy?) to update, anyway
will send new revision soon,

However...one of our folk are also working on
target_clone/target_versions, not sure what your current status is?
Maybe we can collaborate on that?

On Sun, Sep 29, 2024 at 11:20 PM Jeff Law  wrote:
>
>
>
> On 9/29/24 3:16 AM, Yangyu Chen wrote:
> > Good job. I'm currently working on RISC-V target_clone and
> > target_versions support in GCC and found this patch is needed as my
> > prerequisite.
> >
> > However, I found this tagged as "dropped" on Patchwork. What happened?
> I think Kito was going to revamp this stuff IIRC.
>
> jeff

Re: [PATCH] RISC-V: Implement TARGET_CAN_INLINE_P

2024-09-29 Thread Yangyu Chen




> On Sep 30, 2024, at 02:49, Jeff Law  wrote:
> 
> 
> 
> On 9/9/24 6:11 AM, Yangyu Chen wrote:
>> Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V
>> ISA. As a result, certain functions cannot be optimized with inlining
>> when specific options, such as __attribute__((target("arch=+v"))) .
>> This can lead to potential performance issues when building
>> retargetable binaries for RISC-V.
>> To address this, I have implemented the riscv_can_inline_p function.
>> This addition enables inlining when the callee either has no special
>> options or when the some options match, and also ensuring that the
>> callee's ISA is a subset of the caller's. I also check some other
>> options when there is no always_inline set.
>> gcc/ChangeLog:
>> * config/riscv/riscv.cc (riscv_can_inline_p): New function.
>> (TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P.
> I'd kind of hoped not to mess with this as I suspect keeping this up-to-date 
> is going to end up being somewhat painful and I doubt we're going to see that 
> many cases where folks are using target attributes and where inlining is 
> critical for performance.
> 
> What's really driving your desire to change this?  Is this showing up in real 
> code as a performance issue?
> 

Yeah. I'm working on function multi-versioning support and found
that allowing the callee to be inlined in the caller will result
in performance gains. Without this patch, we will need to CALL the
function since we can't inline them when two functions have different
attributes. A simple evaluation of coremarks shows a 13% performance
degradation.

> 
> 
> 
> 
> 
>> +
>> +  if (caller_opts->x_riscv_tune_string
>> +  && callee_opts->x_riscv_tune_string
>> +  && strcmp (caller_opts->x_riscv_tune_string,
>> + callee_opts->x_riscv_tune_string) != 0)
>> +return false;
> Tuning shouldn't affect correctness of inlining.  I'd just drop this clause 
> if this patch goes forward.
> 

OK. I will fix it in the next revision.

> 
> 
> 
> Jeff

[testcase] Fix absfloat16.c testcase

2024-09-29 Thread Kugan Vivekanandarajah

Hi,

This patch Fixes absfloat16.c testcase to have the dg-add-options float16 at 
the correct order. Due to this mixup, this test is failing for some arm 
variants.

Is this OK for trunk?

Thanks,
Kugan



0001-Fix-absfloat16.c-testcase.patch
Description: 0001-Fix-absfloat16.c-testcase.patch

Re: [PATCH] SH: Document extended asm operand modifers

2024-09-29 Thread Oleg Endo




On Sun, 2024-09-29 at 17:27 -0600, Jeff Law wrote:
> 
> On 9/29/24 5:12 PM, Oleg Endo wrote:
> > 
> > On Sun, 2024-09-29 at 10:40 -0600, Jeff Law wrote:
> > > 
> > > On 9/20/24 3:48 PM, Pietro Monteiro wrote:
> > > > From: Pietro Monteiro 
> > > > 
> > > > SH: Document extended asm operand modifers
> > > > 
> > > > gcc/ChangeLog:
> > > > * doc/extend.texi (SH Operand Modifiers): New.
> > > > 
> > > > Signed-off-by: Pietro Monteiro 
> > > > ---
> > > > Tested by running "make info pdf html" and looking at the pdf and html 
> > > > output. I used the comment on "gcc/config/sh.cc:sh_print_operand()", 
> > > > SH's TARGET_PRINT_OPERAND function, as a guide.
> > > THanks.  I pushed this to the trunk.
> > > jeff
> > > 
> > 
> > I was hesitating on this, because I wasn't sure if it makes sense to have
> > all those officially documented.  But anyway, thanks!
> I'd been hesitating for the same reason.  But finally concluded that 
> worst case we end up documenting something internal and we have to take 
> it out or support it longer than expected.
> 
Yeah, right.  Anyway, now it's in.  All good. :)

Best regards,
Oleg Endo

Re: [PATCH] aarch64: Expand CTZ to RBIT + CLZ for SVE [PR109498]

2024-09-29 Thread Soumya AR

Reworked the patch to substitute immediate register values in the test case with
regular expressions. Apologies for the oversight.

Thanks,
Soumya



> On 24 Sep 2024, at 8:53 AM, Soumya AR  wrote:
>
> Currently, we vectorize CTZ for SVE by using the following operation:
> .CTZ (X) = (PREC - 1) - .CLZ (X & -X)
>
> Instead, this patch expands CTZ to RBIT + CLZ for SVE, as suggested in 
> PR109498.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Soumya AR 
>
> gcc/ChangeLog:
> PR target/109498
> * config/aarch64/aarch64-sve.md (ctz2): Added pattern to expand
>CTZ to RBIT + CLZ for SVE.
>
> gcc/testsuite/ChangeLog:
> PR target/109498
> * gcc.target/aarch64/sve/ctz.c: New test.
>



0001-aarch64-Expand-CTZ-to-RBIT-CLZ-for-SVE-PR109498.patch
Description: 0001-aarch64-Expand-CTZ-to-RBIT-CLZ-for-SVE-PR109498.patch


0001-aarch64-Expand-CTZ-to-RBIT-CLZ-for-SVE-PR109498.patch
Description: 0001-aarch64-Expand-CTZ-to-RBIT-CLZ-for-SVE-PR109498.patch

Re: [PATCH] SH: Document extended asm operand modifers

2024-09-29 Thread Oleg Endo



On Sun, 2024-09-29 at 10:40 -0600, Jeff Law wrote:
> 
> On 9/20/24 3:48 PM, Pietro Monteiro wrote:
> > From: Pietro Monteiro 
> > 
> > SH: Document extended asm operand modifers
> > 
> > gcc/ChangeLog:
> > * doc/extend.texi (SH Operand Modifiers): New.
> > 
> > Signed-off-by: Pietro Monteiro 
> > ---
> > Tested by running "make info pdf html" and looking at the pdf and html 
> > output. I used the comment on "gcc/config/sh.cc:sh_print_operand()", SH's 
> > TARGET_PRINT_OPERAND function, as a guide.
> THanks.  I pushed this to the trunk.
> jeff
> 

I was hesitating on this, because I wasn't sure if it makes sense to have
all those officially documented.  But anyway, thanks!

Best regards,
Oleg Endo

Re: [PATCH] SH: Document extended asm operand modifers

2024-09-29 Thread Jeff Law





On 9/29/24 5:12 PM, Oleg Endo wrote:


On Sun, 2024-09-29 at 10:40 -0600, Jeff Law wrote:


On 9/20/24 3:48 PM, Pietro Monteiro wrote:

From: Pietro Monteiro 

SH: Document extended asm operand modifers

gcc/ChangeLog:
* doc/extend.texi (SH Operand Modifiers): New.

Signed-off-by: Pietro Monteiro 
---
Tested by running "make info pdf html" and looking at the pdf and html output. I used the 
comment on "gcc/config/sh.cc:sh_print_operand()", SH's TARGET_PRINT_OPERAND function, as 
a guide.

THanks.  I pushed this to the trunk.
jeff



I was hesitating on this, because I wasn't sure if it makes sense to have
all those officially documented.  But anyway, thanks!
I'd been hesitating for the same reason.  But finally concluded that 
worst case we end up documenting something internal and we have to take 
it out or support it longer than expected.


Jeff

[PATCH] RISC-V: Add an implicit dependency for Zawrs

2024-09-29 Thread Xiao Zeng

There is a description in 
:

"The instructions in the Zawrs extension are only useful in conjunction
with the LR instruction, which is provided by the Zalrsc component
of the A extension."

It can be concluded that: zawrs -> zalrsc.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: zawrs -> zalrsc.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-38.c: New test.
* gcc.target/riscv/predef-39.c: New test.

Signed-off-by: Xiao Zeng 
---
 gcc/common/config/riscv/riscv-common.cc|  1 +
 gcc/testsuite/gcc.target/riscv/predef-38.c | 31 ++
 gcc/testsuite/gcc.target/riscv/predef-39.c | 31 ++
 3 files changed, 63 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-38.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-39.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index bd42fd01532..a6abd903b98 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -96,6 +96,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
 
   {"zabha", "zaamo"},
   {"zacas", "zaamo"},
+  {"zawrs", "zalrsc"},
 
   {"zcmop", "zca"},
 
diff --git a/gcc/testsuite/gcc.target/riscv/predef-38.c 
b/gcc/testsuite/gcc.target/riscv/predef-38.c
new file mode 100644
index 000..986c02b451a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-38.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv32i_zawrs -mabi=ilp32 -mcmodel=medlow 
-misa-spec=20191213" } */
+
+int main () {
+
+#ifndef __riscv_arch_test
+#error "__riscv_arch_test"
+#endif
+
+#if __riscv_xlen != 32
+#error "__riscv_xlen"
+#endif
+
+#if !defined(__riscv_i)
+#error "__riscv_i"
+#endif
+
+#if !defined(__riscv_zawrs)
+#error "__riscv_zawrs"
+#endif
+
+#if !defined(__riscv_zalrsc)
+#error "__riscv_zalrsc"
+#endif
+
+#if defined(__riscv_a)
+#error "__riscv_a"
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-39.c 
b/gcc/testsuite/gcc.target/riscv/predef-39.c
new file mode 100644
index 000..558164de8c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-39.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64i_zawrs -mabi=lp64 -mcmodel=medlow 
-misa-spec=20191213" } */
+
+int main () {
+
+#ifndef __riscv_arch_test
+#error "__riscv_arch_test"
+#endif
+
+#if __riscv_xlen != 64
+#error "__riscv_xlen"
+#endif
+
+#if !defined(__riscv_i)
+#error "__riscv_i"
+#endif
+
+#if !defined(__riscv_zawrs)
+#error "__riscv_zawrs"
+#endif
+
+#if !defined(__riscv_zalrsc)
+#error "__riscv_zalrsc"
+#endif
+
+#if defined(__riscv_a)
+#error "__riscv_a"
+#endif
+
+  return 0;
+}
-- 
2.17.1

[PATCH v2] RISC-V: Optimize branches with shifted immediate operands

2024-09-29 Thread Jovan Vukic

Based on the feedback I received, the patch now uses a custom code iterator
instead of relying on match_operator.

Since both operands (2 and 3) have trailing zeros, an additional check
was introduced to verify if they remain SMALL_OPERANDs after shifting by
the smaller number of trailing zeros.

To avoid complex inline check, I introduced two new macros, ensuring that the
check is encapsulated and reusable, as suggested in the original feedback.

It was pointed out that this check should be added because expressions like
(x & 0x81000) == 0x8100 or (x & 0x8100) == 0x81000 might lead to
unrecognized instructions during splitting. However, both expressions
immediately evaluate to false (https://godbolt.org/z/Y11EGMb4f) due to
the bit arrangement in constants 0x8100 and 0x81000.

As a result, this pattern won't be applied in such cases, so it cannot
cause any issues. Therefore, it wasn't necessary to include this as a
negative test in the testsuite.

Despite my efforts, I couldn't find another example for a negative test.
All similar cases evaluated to false immediately due to the bit arrangement.
I mentioned this in a follow-up comment on the previous patch version.
Regardless, the necessary check has been added, so if a negative case
exists, an unrecognized instruction will not be created.

2024-09-30  Jovan Vukic  

PR target/115921

gcc/ChangeLog:

* config/riscv/iterators.md (any_eq): New code iterator.
* config/riscv/riscv.h (COMMON_TRAILING_ZEROS): New macro.
(SMALL_AFTER_COMMON_TRAILING_SHIFT): Ditto.
* config/riscv/riscv.md 
(*branch_shiftedarith__shifted):
New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/branch-1.c: Additional tests.

CONFIDENTIALITY: The contents of this e-mail are confidential and intended only 
for the above addressee(s). If you are not the intended recipient, or the 
person responsible for delivering it to the intended recipient, copying or 
delivering it to anyone else or using it in any unauthorized manner is 
prohibited and may be unlawful. If you receive this e-mail by mistake, please 
notify the sender and the systems administrator at straym...@rt-rk.com 
immediately.
---
 gcc/config/riscv/iterators.md |  3 +++
 gcc/config/riscv/riscv.h  | 12 +
 gcc/config/riscv/riscv.md | 32 +++
 gcc/testsuite/gcc.target/riscv/branch-1.c | 16 +---
 4 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 2844cb02ff0..288f11464a2 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -233,6 +233,7 @@
 (define_code_iterator any_ge [ge geu])
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
+(define_code_iterator any_eq [eq ne])
 
 ; atomics code iterator
 (define_code_iterator any_atomic [plus ior xor and])
@@ -283,6 +284,8 @@
 (le "le")
 (gt "gt")
 (lt "lt")
+(eq "eq")
+(ne "ne")
 (ior "ior")
 (xor "xor")
 (and "and")
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 3aecb43f831..c782499b2bf 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -667,6 +667,18 @@ enum reg_class
 /* True if bit BIT is set in VALUE.  */
 #define BITSET_P(VALUE, BIT) (((VALUE) & (1ULL << (BIT))) != 0)
 
+/* Returns the smaller (common) number of trailing zeros for VAL1 and VAL2.  */
+#define COMMON_TRAILING_ZEROS(VAL1, VAL2)  \
+  (ctz_hwi (VAL1) < ctz_hwi (VAL2) \
+   ? ctz_hwi (VAL1)\
+   : ctz_hwi (VAL2))
+
+/* Returns true if both VAL1 and VAL2 are SMALL_OPERANDs after shifting by
+   the common number of trailing zeros.  */
+#define SMALL_AFTER_COMMON_TRAILING_SHIFT(VAL1, VAL2)  \
+  (SMALL_OPERAND ((VAL1) >> COMMON_TRAILING_ZEROS (VAL1, VAL2))
\
+   && SMALL_OPERAND ((VAL2) >> COMMON_TRAILING_ZEROS (VAL1, VAL2)))
+
 /* Stack layout; function entry, exit and calling.  */
 
 #define STACK_GROWS_DOWNWARD 1
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 0410d990ec5..afc534d050f 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3129,6 +3129,38 @@
 }
 [(set_attr "type" "branch")])
 
+(define_insn_and_split "*branch_shiftedarith__shifted"
+  [(set (pc)
+   (if_then_else (any_eq
+   (and:ANYI (match_operand:ANYI 1 "register_operand" "r")
+ (match_operand 2 "shifted_const_arith_operand" "i"))
+   (match_operand 3 "shifted_const_arith_operand" "i"))
+(label_ref (match_operand 0 "" ""))
+(pc)))
+   (clobber (match_scratch:X 4 "=&r"))
+   (clobber

Re: [RFC PATCH] More detailed diagnostics for section type conflicts

2024-09-29 Thread David Malcolm

On Sun, 2024-09-29 at 17:12 +0200, Florian Weimer wrote:
> Sometimes this is a user error, sometimes it is more of an ICE.
> In either case, more information about the conflict is helpful.
> 
> I used to this to get a better idea about what is going on with
> PR116887.  The original diagnostics look like this:
> 
> dl-find_object.c: In function ‘_dlfo_process_initial’:
> dl-find_object.c:507:1: error: section type conflict with
> ‘_dlfo_nodelete_mappings’
>   507 | _dlfo_process_initial (void)
>   | ^
> dl-find_object.c:73:40: note: ‘_dlfo_nodelete_mappings’ was declared
> here
>    73 | static struct dl_find_object_internal
> *_dlfo_nodelete_mappings
>   |   
> ^~~
> 
> 
> I don't quite understand what is going on (the symbol that's being
> flagged for conflict is somewhat unstable), but at least the new
> diagnostics show that the sectio name, and maybe the flags are
> useful,
> too:
> 
> /tmp/bug.i:6798:1: error: section ‘.data.rel.ro’ type conflict with
> ‘_dlfo_main’
>  6798 | }
>   | ^
> /tmp/bug.i:6190:39: note: ‘_dlfo_main’ was declared here
>  6190 | static struct dl_find_object_internal _dlfo_main
> __attribute__ ((section (".data.rel.ro")));
>   |   ^~
> /tmp/bug.i:6798:1: error: previous section type:
> WRITE|NOTYPE|DECLARED|NAMED
>  6798 | }
>   | ^
> /tmp/bug.i:6798:1: error: new section type: WRITE|NOTYPE|NAMED|RELRO
> 
> I still need help with producing a non-error, non-anchored
> diagnostic.
> I don't know how to do that.

I'm not quite sure what you mean by "non-error" and "non-anchored". 

By "non-error", do you mean that this should this be a warning?  If so,
use warning_at.  You can use 0 for the option_id whilst prototyping. 
Or use "inform" to get a note.

By "non-anchored", do you mean "not associated with any particular
source location"?  If so, use error_at/warning_at and use
UNKNOWN_LOCATION for the location_t ("error" and "warning" implicitly
use the "input_location" global variable; it's usually best to instead
specify a location, or use UNKNOWN_LOCATION for "global" problems).

Or am I misunderstanding?

Hope this is helpful
Dave




> 
> Thanks,
> Florian
> 
>   * varasm.cc (section_flags_to_string):  New function.
>   (get_section): Include name of section in diagnostics.
>   Print old and new section flags, as rendered by
>   section_flags_to_string.
> 
> ---
>  gcc/varasm.cc | 80
> ---
>  1 file changed, 76 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 4426e7ce6c6..deba15933aa 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
>     We also output the assembler code for constants stored in memory
>     and are responsible for combining constants with the same value. 
> */
>  
> +#define INCLUDE_STRING
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> @@ -276,6 +277,65 @@ get_noswitch_section (unsigned int flags,
> noswitch_section_callback callback)
>    return sect;
>  }
>  
> +/* Return a string describing the section flags.  */
> +
> +static std::string
> +section_flags_to_string (unsigned int flags)
> +{
> +  if (flags == 0)
> +    return "UNNAMED";
> +  std::string result;
> +  auto append = [&result] (bool f, const char *name)
> +  {
> +    if (f)
> +  {
> + if (!result.empty ())
> +   result += '|';
> + result += name;
> +  }
> +  };
> +
> +  append (flags & SECTION_CODE, "CODE");
> +  append (flags & SECTION_WRITE, "WRITE");
> +  append (flags & SECTION_DEBUG, "DEBUG");
> +  append (flags & SECTION_LINKONCE, "LINKONCE");
> +  append (flags & SECTION_SMALL, "SMALL");
> +  append (flags & SECTION_BSS, "BSS");
> +  append (flags & SECTION_MERGE, "MERGE");
> +  append (flags & SECTION_STRINGS, "STRINGS");
> +  append (flags & SECTION_OVERRIDE, "OVERRIDE");
> +  append (flags & SECTION_TLS, "TLS");
> +  append (flags & SECTION_NOTYPE, "NOTYPE");
> +  append (flags & SECTION_DECLARED, "DECLARED");
> +  append (flags & SECTION_NAMED, "NAMED");
> +  append (flags & SECTION_NOSWITCH, "NOSWITCH");
> +  append (flags & SECTION_COMMON, "COMMON");
> +  append (flags & SECTION_RELRO, "RELRO");
> +  append (flags & SECTION_EXCLUDE, "EXCLUDE");
> +  append (flags & SECTION_RETAIN, "RETAIN");
> +  append (flags & SECTION_LINK_ORDER, "LINK_ORDER");
> +
> +  unsigned int entsize = flags & SECTION_ENTSIZE;
> +  if (entsize != 0)
> +    {
> +  if (!result.empty ())
> + result += ',';
> +  result += "size=";
> +  result += std::to_string (entsize);
> +    }
> +
> +  unsigned int mach_dep = flags / SECTION_MACH_DEP;
> +  if (mach_dep != 0)
> +    {
> +  if (!result.empty ())
> + result += ',';
> +  result += "mach=";
> +  result += std::to_string (mach_dep);
> +    }
> +
> +  return result;
> +}

Re: [PATCH v4] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-09-29 Thread Yangyu Chen

On Sep 30, 2024, at 14:18, Kito Cheng  wrote:
> 
> Hi Yang-Yu
> 
> Yeah, I was planning to send another version to update the interface
> to the latest, but I'm just too busy (too lazy?) to update, anyway
> will send new revision soon,
> 
> However...one of our folk are also working on
> target_clone/target_versions, not sure what your current status is?
> Maybe we can collaborate on that?

My current status is have some basic implementation of some functions
including:
- TARGET_OPTION_FUNCTION_VERSIONS
- TARGET_COMPARE_VERSION_PRIORITY
- TARGET_GENERATE_VERSION_DISPATCHER_BODY
- TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
- TARGET_MANGLE_DECL_ASSEMBLER_NAME

And generated a very simple code with some manual effort.

I’m glad to see you have a folk working on target_clone/target_versions.
I think we can collaborate on that.

I am doing a research project that relies on target_clone support
on RISC-V, so I want a working target_clone GCC to work and want
to devote my time to it.

Thanks,
Yangyu Chen

> 
> On Sun, Sep 29, 2024 at 11:20 PM Jeff Law  wrote:
>> 
>> 
>> 
>> On 9/29/24 3:16 AM, Yangyu Chen wrote:
>>> Good job. I'm currently working on RISC-V target_clone and
>>> target_versions support in GCC and found this patch is needed as my
>>> prerequisite.
>>> 
>>> However, I found this tagged as "dropped" on Patchwork. What happened?
>> I think Kito was going to revamp this stuff IIRC.
>> 
>> jeff

[PATCH] libstdc++-v3: Fix signed-overflow warning for newlib/ctype_base.h, PR116895

2024-09-29 Thread Hans-Peter Nilsson

FWIW, I see "typedef char mask;" also for bionic and
openbsd.  Tested for cris-elf.

Ok to commit?

-- >8 --
There are 100+ regressions when running the g++ testsuite for newlib
targets (probably excepting ARM-based ones) e.g cris-elf after commit
r15-3859-g63a598deb0c9fc "libstdc++: #ifdef out #pragma GCC
system_header", which effectively no longer silences warnings for
gcc-installed system headers.  Some of these regressions are fixed by
r15-3928.  For the remaining ones, there's in g++.log:

FAIL: g++.old-deja/g++.robertl/eb79.C  -std=c++26 (test for excess errors)
Excess errors:
/gccobj/cris-elf/libstdc++-v3/include/cris-elf/bits/ctype_base.h:50:53: \
 warning: overflow in conversion from 'int' to 'std::ctype_base::mask' \
 {aka 'char'} changes value from '151' to '-105' [-Woverflow]

This is because the _B macro in newlib's ctype.h (from where the
"_" macros come) is bit 7, the sign-bit of 8-bit types:

#define _B  0200

Using it in an int-expression that is then truncated to 8 bits will
"change" the value to negative for a default-signed char.  If this
code was created from scratch, it should have been an unsigned type,
however it's not advisable to change the type of mask as this affects
the API.  The least ugly option seems to be to silence the warning by
explict casts in the initializer, and for consistency, doing it for
all members.

PR libstdc++/116895
* config/os/newlib/ctype_base.h: Avoid signed-overflow warnings by
explicitly casting initializer expressions to mask.
---
 libstdc++-v3/config/os/newlib/ctype_base.h | 24 +++---
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/config/os/newlib/ctype_base.h 
b/libstdc++-v3/config/os/newlib/ctype_base.h
index 309fdeea7731..5ec43a0c6803 100644
--- a/libstdc++-v3/config/os/newlib/ctype_base.h
+++ b/libstdc++-v3/config/os/newlib/ctype_base.h
@@ -41,19 +41,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 // NB: Offsets into ctype::_M_table force a particular size
 // on the mask type. Because of this, we don't use an enum.
 typedef char   mask;
-static const mask upper= _U;
-static const mask lower= _L;
-static const mask alpha= _U | _L;
-static const mask digit= _N;
-static const mask xdigit   = _X | _N;
-static const mask space= _S;
-static const mask print= _P | _U | _L | _N | _B;
-static const mask graph= _P | _U | _L | _N;
-static const mask cntrl= _C;
-static const mask punct= _P;
-static const mask alnum= _U | _L | _N;
+static const mask upper= mask (_U);
+static const mask lower= mask (_L);
+static const mask alpha= mask (_U | _L);
+static const mask digit= mask (_N);
+static const mask xdigit   = mask (_X | _N);
+static const mask space= mask (_S);
+static const mask print= mask (_P | _U | _L | _N | _B);
+static const mask graph= mask (_P | _U | _L | _N);
+static const mask cntrl= mask (_C);
+static const mask punct= mask (_P);
+static const mask alnum= mask (_U | _L | _N);
 #if __cplusplus >= 201103L
-static const mask blank= space;
+static const mask blank= mask (space);
 #endif
   };
 
-- 
2.30.2

Re: [PATCH v4] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-09-29 Thread Yangyu Chen

Good job. I'm currently working on RISC-V target_clone and 
target_versions support in GCC and found this patch is needed as my 
prerequisite.


However, I found this tagged as "dropped" on Patchwork. What happened?

Re: [PATCH] doc: Document struct-layout-1.exp for ABI checks

2024-09-29 Thread Jakub Jelinek

On Sun, Sep 08, 2024 at 08:48:57AM +0300, Dimitar Dimitrov wrote:
> This test helped discover PR116621, so it is worth being documented.
> 
> gcc/ChangeLog:
> 
>   * doc/sourcebuild.texi: Document struct-layout-1.exp.
> 
> Signed-off-by: Dimitar Dimitrov 

LGTM.

Jakub

Re: [PATCH] doc: Document struct-layout-1.exp for ABI checks

2024-09-29 Thread Dimitar Dimitrov

On Sun, Sep 08, 2024 at 08:48:57AM +0300, Dimitar Dimitrov wrote:
> This test helped discover PR116621, so it is worth being documented.
> 
> gcc/ChangeLog:
> 
>   * doc/sourcebuild.texi: Document struct-layout-1.exp.
> 
> Signed-off-by: Dimitar Dimitrov 
> ---
>  gcc/doc/sourcebuild.texi | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 


Ping.

Regards,
Dimitar

Re: [PATCH] gdbhooks: Handle references to vec* in VecPrinter

2024-09-29 Thread Jeff Law





On 9/23/24 4:33 AM, Alex Coplan wrote:

On 30/08/2024 18:11, Alex Coplan wrote:

Hi,

vec.h has this method:

   template
   inline T *
   vec_safe_push (vec *&v, const T &obj CXX_MEM_STAT_INFO)

where v is a reference to a pointer to vec.  This matches the regex for
VecPrinter, so gdbhooks.py attempts to print it but chokes on the reference.
I see the following:

   #1  0x02b84b7b in vec_safe_push (v=Traceback (most
   recent call last):
 File "$SRC/gcc/gcc/gdbhooks.py", line 486, in to_string
   return '0x%x' % intptr(self.gdbval)
 File "$SRC/gcc/gcc/gdbhooks.py", line 168, in intptr
   return long(gdbval) if sys.version_info.major == 2 else int(gdbval)
   gdb.error: Cannot convert value to long.

This patch makes VecPrinter handle such references by stripping them
(dereferencing) at the top of the relevant functions.

I thought about trying to make VecPrinter.{to_string,children} robust
against non-pointer values (i.e. actual vec structs) as the current
calls to intptr will fail on those.  However, I then realised that the
current regex only matches pointer types:

   pp.add_printer_for_regex(r'vec<(\S+), (\S+), (\S+)> \*',
'vec',
VecPrinter)

That is somewhat at odds with the (pre-existing) code in
VecPrinter.children which appears to attempt to handle non-pointer
types.  ISTM either we should drop the handling for non-pointer types
(since the regex requires a pointer) or (perhaps more usefully) relax
the regex to allow matching a plain vec<...> struct and fix the member
functions to handle those properly.

Any thoughts on that, Dave?  Is the current patch OK as an intermediate
step (manually tested by verifying both a vec*& and vec* print OK)?


Gentle ping on this.

OK

Jeff

Re: [PATCH] MATCH: Simplify `(trunc)copysign ((extend)x, CST)` to `copysign (x, -1.0/1.0)` [PR112472]

2024-09-29 Thread Jeff Law





On 9/24/24 2:57 AM, Eikansh Gupta wrote:

This patch simplify `(trunc)copysign ((extend)x, CST)` to `copysign (x, 
-1.0/1.0)`
depending on the sign of CST. Previously, it was simplified to `copysign (x, 
CST)`.
It can be optimized as the sign of the CST matters, not the value.

The patch also simplify `(trunc)abs (extend x)` to `abs (x)`.

PR tree-optimization/112472

gcc/ChangeLog:

* match.pd ((trunc)copysign ((extend)x, -CST) --> copysign (x, -1.0)): 
New pattern.
((trunc)abs (extend x) --> abs (x)): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr112472.c: New test.
One might argue that -0.0/+0.0 are even better than -1.0/1.0 as the 0.0 
variants are cheaper to construct.


You'd have to check that we honor signed zeros

jeff

Re: [PATCH v2] RISC-V: Improve code generation for select of consecutive constants

2024-09-29 Thread Jeff Law





On 9/27/24 1:36 AM, Jovan Vukic wrote:

Based on the valuable feedback I received, I decided to implement the patch
in the RTL pipeline. Since a similar optimization already exists in
simplify_binary_operation_1, I chose to generalize my original approach
and place it directly below that code.

The expression (X xor C1) + C2 is simplified to X xor (C1 xor C2) under
the conditions described in the patch. This is a more general optimization,
but it still applies to the RISC-V case, which was my initial goal:

long f1(long x, long y) {
 return (x > y) ? 2 : 3;
}


Before the patch, the generated assembly is:

f1(long, long):
 sgt a0,a0,a1
 xoria0,a0,1
 addia0,a0,2
 ret

After the patch, the generated assembly is:

f1(long, long):
 sgt a0,a0,a1
 xoria0,a0,3
 ret


The patch optimizes cases like x LT/GT y ? 2 : 3 (and x GE/LE y ? 3 : 2),
as initially intended. Since this optimization is more general, I noticed
it also optimizes cases like x < CONST ? 3 : 2 when CONST < 0. I’ve added
tests for these cases as well.

A bit of logic behind the patch: The equality A + B == A ^ B + 2 * (A & B)
always holds true. This can be simplified to A ^ B if 2 * (A & B) == 0.
In our case, we have A == X ^ C1, B == C2 and X is either 0 or 1.

2024-09-27  Jovan Vukic  

 PR target/108038

gcc/ChangeLog:

 * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): New
 simplification.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/slt-1.c: New test.
THanks.  I fixed a tabs vs space issue in the ChangeLog and pushed this 
to the trunk.


jeff

Re: [patch,avr] doc: Adjust more web links

2024-09-29 Thread Jeff Law





On 9/17/24 4:44 AM, Georg-Johann Lay wrote:

This patch updates more web links from nongnu to Github.
The http://www.nongnu.org/avr links still worked, but the
"super project" seems to be deserted.  Instead, it now links:

* https://avrdudes.github.io/avr-libc/avr-libc-user-manual/ 
install_tools.html
* https://github.com/sprintersb/atest?tab=readme-ov-file#running-the- 
avr-gcc-testsuite-using-the-avrtest-simulator


Ok for trunk?

Johann

--

     AVR: doc/install.texi - Update avr specific installation notes.

     gcc/
     * doc/install.texi (Host/Target specific installation notes 
for GCC)

     [avr]: Update web links to AVR-LibC and AVR Options.
     Remove outdated note about Binutils.

OK.
jeff

[RFC PATCH] More detailed diagnostics for section type conflicts

2024-09-29 Thread Florian Weimer

Sometimes this is a user error, sometimes it is more of an ICE.
In either case, more information about the conflict is helpful.

I used to this to get a better idea about what is going on with
PR116887.  The original diagnostics look like this:

dl-find_object.c: In function ‘_dlfo_process_initial’:
dl-find_object.c:507:1: error: section type conflict with 
‘_dlfo_nodelete_mappings’
  507 | _dlfo_process_initial (void)
  | ^
dl-find_object.c:73:40: note: ‘_dlfo_nodelete_mappings’ was declared here
   73 | static struct dl_find_object_internal *_dlfo_nodelete_mappings
  |^~~


I don't quite understand what is going on (the symbol that's being
flagged for conflict is somewhat unstable), but at least the new
diagnostics show that the sectio name, and maybe the flags are useful,
too:

/tmp/bug.i:6798:1: error: section ‘.data.rel.ro’ type conflict with ‘_dlfo_main’
 6798 | }
  | ^
/tmp/bug.i:6190:39: note: ‘_dlfo_main’ was declared here
 6190 | static struct dl_find_object_internal _dlfo_main __attribute__ 
((section (".data.rel.ro")));
  |   ^~
/tmp/bug.i:6798:1: error: previous section type: WRITE|NOTYPE|DECLARED|NAMED
 6798 | }
  | ^
/tmp/bug.i:6798:1: error: new section type: WRITE|NOTYPE|NAMED|RELRO

I still need help with producing a non-error, non-anchored diagnostic.
I don't know how to do that.

Thanks,
Florian

* varasm.cc (section_flags_to_string):  New function.
(get_section): Include name of section in diagnostics.
Print old and new section flags, as rendered by
section_flags_to_string.

---
 gcc/varasm.cc | 80 ---
 1 file changed, 76 insertions(+), 4 deletions(-)

diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 4426e7ce6c6..deba15933aa 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
We also output the assembler code for constants stored in memory
and are responsible for combining constants with the same value.  */
 
+#define INCLUDE_STRING
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -276,6 +277,65 @@ get_noswitch_section (unsigned int flags, 
noswitch_section_callback callback)
   return sect;
 }
 
+/* Return a string describing the section flags.  */
+
+static std::string
+section_flags_to_string (unsigned int flags)
+{
+  if (flags == 0)
+return "UNNAMED";
+  std::string result;
+  auto append = [&result] (bool f, const char *name)
+  {
+if (f)
+  {
+   if (!result.empty ())
+ result += '|';
+   result += name;
+  }
+  };
+
+  append (flags & SECTION_CODE, "CODE");
+  append (flags & SECTION_WRITE, "WRITE");
+  append (flags & SECTION_DEBUG, "DEBUG");
+  append (flags & SECTION_LINKONCE, "LINKONCE");
+  append (flags & SECTION_SMALL, "SMALL");
+  append (flags & SECTION_BSS, "BSS");
+  append (flags & SECTION_MERGE, "MERGE");
+  append (flags & SECTION_STRINGS, "STRINGS");
+  append (flags & SECTION_OVERRIDE, "OVERRIDE");
+  append (flags & SECTION_TLS, "TLS");
+  append (flags & SECTION_NOTYPE, "NOTYPE");
+  append (flags & SECTION_DECLARED, "DECLARED");
+  append (flags & SECTION_NAMED, "NAMED");
+  append (flags & SECTION_NOSWITCH, "NOSWITCH");
+  append (flags & SECTION_COMMON, "COMMON");
+  append (flags & SECTION_RELRO, "RELRO");
+  append (flags & SECTION_EXCLUDE, "EXCLUDE");
+  append (flags & SECTION_RETAIN, "RETAIN");
+  append (flags & SECTION_LINK_ORDER, "LINK_ORDER");
+
+  unsigned int entsize = flags & SECTION_ENTSIZE;
+  if (entsize != 0)
+{
+  if (!result.empty ())
+   result += ',';
+  result += "size=";
+  result += std::to_string (entsize);
+}
+
+  unsigned int mach_dep = flags / SECTION_MACH_DEP;
+  if (mach_dep != 0)
+{
+  if (!result.empty ())
+   result += ',';
+  result += "mach=";
+  result += std::to_string (mach_dep);
+}
+
+  return result;
+}
+
 /* Return the named section structure associated with NAME.  Create
a new section with the given fields if no such structure exists.
When NOT_EXISTING, then fail if the section already exists.  Return
@@ -312,6 +372,9 @@ get_section (const char *name, unsigned int flags, tree 
decl,
internal_error ("section already exists: %qs", name);
 
   sect = *slot;
+  unsigned int original_flags = sect->common.flags;
+  unsigned int new_flags = flags;
+
   /* It is fine if one of the sections has SECTION_NOTYPE as long as
  the other has none of the contrary flags (see the logic at the end
  of default_section_type_flags, below).  */
@@ -355,17 +418,26 @@ get_section (const char *name, unsigned int flags, tree 
decl,
  && decl != sect->named.decl)
{
  if (decl != NULL && DECL_P (decl))
-   error ("%+qD causes a section type conflict with

Re: [PATCH v4] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-09-29 Thread Jeff Law





On 9/29/24 3:16 AM, Yangyu Chen wrote:
Good job. I'm currently working on RISC-V target_clone and 
target_versions support in GCC and found this patch is needed as my 
prerequisite.


However, I found this tagged as "dropped" on Patchwork. What happened?

I think Kito was going to revamp this stuff IIRC.

jeff

Re: [PATCH] MATCH: Simplify `min(a, b) op max(a, b)` to `a op b` [PR109401]

2024-09-29 Thread Jeff Law





On 9/25/24 2:30 AM, Eikansh Gupta wrote:

This patch simplify `min(a,b) op max(a,b)` to `a op b`. This optimization
will work for all the binary commutative operations. So, the `op` here can
be one of {plus, mult, bit_and, bit_xor, bit_ior, eq, ne, min, max}.

PR tree-optimization/109878
PR 109401

gcc/ChangeLog:

* match.pd (min(a,b) op max(a,b) -> a op b): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr109401.c: New test.
 * gcc.dg/tree-ssa/pr109401-1.c: New test.
---
  gcc/match.pd   |  7 +++
  gcc/testsuite/gcc.dg/tree-ssa/pr109401-1.c | 35 +
  gcc/testsuite/gcc.dg/tree-ssa/pr109401.c   | 61 ++
  3 files changed, 103 insertions(+)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109401-1.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109401.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 940292d0d49..4f0453344d6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4463,6 +4463,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  @0
  @2)))
  
+/* min (a, b) op max (a, b) -> a op b */

+(for op (plus mult bit_and bit_xor bit_ior eq ne min max)
+ (simplify
+  (op:c (min:c @0 @1) (max:c @0 @1))
+   (if (!HONOR_NANS (@0))
+(op @0 @1
+

Does it work in the other order  ie max (a, b) op (min (a, b) -> a op b?


What about other OP cases such as subtraction, div/mod?


Jeff

Re: [PATCH v1 2/3] RISC-V: Implement scalar SAT_SUB for signed integer

2024-09-29 Thread Jeff Law





On 9/24/24 8:55 PM, pan2...@intel.com wrote:

From: Pan Li 

This patch would like to implement the sssub form 1.  Aka:

Form 1:
   #define DEF_SAT_S_SUB_FMT_1(T, UT, MIN, MAX) \
   T __attribute__((noinline))  \
   sat_s_sub_##T##_fmt_1 (T x, T y) \
   {\
 T minus = (UT)x - (UT)y;   \
 return (x ^ y) >= 0\
   ? minus  \
   : (minus ^ x) >= 0   \
 ? minus\
 : x < 0 ? MIN : MAX;   \
   }

DEF_SAT_S_SUB_FMT_1(int8_t, uint8_t, INT8_MIN, INT8_MAX)

Before this patch:
   10   │ sat_s_sub_int8_t_fmt_1:
   11   │ subwa5,a0,a1
   12   │ slliw   a5,a5,24
   13   │ sraiw   a5,a5,24
   14   │ xor a1,a0,a1
   15   │ xor a4,a0,a5
   16   │ and a1,a1,a4
   17   │ blt a1,zero,.L4
   18   │ mv  a0,a5
   19   │ ret
   20   │ .L4:
   21   │ sraia0,a0,63
   22   │ xoria5,a0,127
   23   │ mv  a0,a5
   24   │ ret

After this patch:
   10   │ sat_s_sub_int8_t_fmt_1:
   11   │ sub a4,a0,a1
   12   │ xor a5,a0,a4
   13   │ xor a1,a0,a1
   14   │ and a5,a5,a1
   15   │ srlia5,a5,7
   16   │ andia5,a5,1
   17   │ sraia0,a0,63
   18   │ xoria3,a0,127
   19   │ neg a0,a5
   20   │ addia5,a5,-1
   21   │ and a3,a3,a0
   22   │ and a0,a4,a5
   23   │ or  a0,a0,a3
   24   │ slliw   a0,a0,24
   25   │ sraiw   a0,a0,24
   26   │ ret

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_sssub): Add new func
decl for expanding signed SAT_SUB.
* config/riscv/riscv.cc (riscv_expand_sssub): Add new func impl
for expanding signed SAT_SUB.
* config/riscv/riscv.md (sssub3): Add new pattern sssub
for scalar signed integer.

OK
jeff

Re: [PATCH] [PATCH] Avoid integer overflow in gcc.dg/cpp/charconst-3.c (PR testsuite/116806)

2024-09-29 Thread Jeff Law





On 9/22/24 7:00 AM, Mikael Pettersson wrote:

The intermediate expression (unsigned char) '\234' * scale overflows
int on int16 targets, causing the test case to fail there.  Fixed by
performing the arithmetic in unsigned type, as suggested by Andrew Pinski.

Regression tested on x86_64-pc-linux-gnu, and on an out-of-tree 16-bit
target with simulator.  Manually checked the generated code for pdp11
and xstormy16.

Ok for trunk? (I don't have commit rights so I'd need help committing it.)

gcc/testsuite/

PR testsuite/116806
* gcc.dg/cpp/charconst-3.c: Perform arithmetic in unsigned
type to avoid integer overflow.

THanks.  I've pushed this to the trunk.
jeff

Re: [RFC PATCH] More detailed diagnostics for section type conflicts

2024-09-29 Thread Andrew Pinski

On Sun, Sep 29, 2024 at 8:13 AM Florian Weimer  wrote:
>
> Sometimes this is a user error, sometimes it is more of an ICE.
> In either case, more information about the conflict is helpful.
>
> I used to this to get a better idea about what is going on with
> PR116887.  The original diagnostics look like this:
>
> dl-find_object.c: In function ‘_dlfo_process_initial’:
> dl-find_object.c:507:1: error: section type conflict with 
> ‘_dlfo_nodelete_mappings’
>   507 | _dlfo_process_initial (void)
>   | ^
> dl-find_object.c:73:40: note: ‘_dlfo_nodelete_mappings’ was declared here
>73 | static struct dl_find_object_internal *_dlfo_nodelete_mappings
>   |^~~
>
>
> I don't quite understand what is going on (the symbol that's being
> flagged for conflict is somewhat unstable), but at least the new
> diagnostics show that the sectio name, and maybe the flags are useful,
> too:
>
> /tmp/bug.i:6798:1: error: section ‘.data.rel.ro’ type conflict with 
> ‘_dlfo_main’
>  6798 | }
>   | ^
> /tmp/bug.i:6190:39: note: ‘_dlfo_main’ was declared here
>  6190 | static struct dl_find_object_internal _dlfo_main __attribute__ 
> ((section (".data.rel.ro")));
>   |   ^~
> /tmp/bug.i:6798:1: error: previous section type: WRITE|NOTYPE|DECLARED|NAMED
>  6798 | }
>   | ^
> /tmp/bug.i:6798:1: error: new section type: WRITE|NOTYPE|NAMED|RELRO
>
> I still need help with producing a non-error, non-anchored diagnostic.
> I don't know how to do that.
>
> Thanks,
> Florian
>
> * varasm.cc (section_flags_to_string):  New function.
> (get_section): Include name of section in diagnostics.
> Print old and new section flags, as rendered by
> section_flags_to_string.
>
> ---
>  gcc/varasm.cc | 80 
> ---
>  1 file changed, 76 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 4426e7ce6c6..deba15933aa 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
> We also output the assembler code for constants stored in memory
> and are responsible for combining constants with the same value.  */
>
> +#define INCLUDE_STRING
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> @@ -276,6 +277,65 @@ get_noswitch_section (unsigned int flags, 
> noswitch_section_callback callback)
>return sect;
>  }
>
> +/* Return a string describing the section flags.  */
> +
> +static std::string
> +section_flags_to_string (unsigned int flags)
> +{
> +  if (flags == 0)
> +return "UNNAMED";
> +  std::string result;
> +  auto append = [&result] (bool f, const char *name)
> +  {
> +if (f)
> +  {
> +   if (!result.empty ())
> + result += '|';
> +   result += name;
> +  }
> +  };
> +
> +  append (flags & SECTION_CODE, "CODE");

I notice you capture result and it seems like you could also capture
flags and change this to:
append (SECTION_CODE, "CODE");

Thanks,
Andrew Pinski

> +  append (flags & SECTION_WRITE, "WRITE");
> +  append (flags & SECTION_DEBUG, "DEBUG");
> +  append (flags & SECTION_LINKONCE, "LINKONCE");
> +  append (flags & SECTION_SMALL, "SMALL");
> +  append (flags & SECTION_BSS, "BSS");
> +  append (flags & SECTION_MERGE, "MERGE");
> +  append (flags & SECTION_STRINGS, "STRINGS");
> +  append (flags & SECTION_OVERRIDE, "OVERRIDE");
> +  append (flags & SECTION_TLS, "TLS");
> +  append (flags & SECTION_NOTYPE, "NOTYPE");
> +  append (flags & SECTION_DECLARED, "DECLARED");
> +  append (flags & SECTION_NAMED, "NAMED");
> +  append (flags & SECTION_NOSWITCH, "NOSWITCH");
> +  append (flags & SECTION_COMMON, "COMMON");
> +  append (flags & SECTION_RELRO, "RELRO");
> +  append (flags & SECTION_EXCLUDE, "EXCLUDE");
> +  append (flags & SECTION_RETAIN, "RETAIN");
> +  append (flags & SECTION_LINK_ORDER, "LINK_ORDER");
> +
> +  unsigned int entsize = flags & SECTION_ENTSIZE;
> +  if (entsize != 0)
> +{
> +  if (!result.empty ())
> +   result += ',';
> +  result += "size=";
> +  result += std::to_string (entsize);
> +}
> +
> +  unsigned int mach_dep = flags / SECTION_MACH_DEP;
> +  if (mach_dep != 0)
> +{
> +  if (!result.empty ())
> +   result += ',';
> +  result += "mach=";
> +  result += std::to_string (mach_dep);
> +}
> +
> +  return result;
> +}
> +
>  /* Return the named section structure associated with NAME.  Create
> a new section with the given fields if no such structure exists.
> When NOT_EXISTING, then fail if the section already exists.  Return
> @@ -312,6 +372,9 @@ get_section (const char *name, unsigned int flags, tree 
> decl,
> internal_error ("section already exists: %qs", name);
>
>sect = *slot;
> +  unsigned int original_flags = sect->common.flags;
> +  unsigned int new_flags = fl

Re: [PATCH] SH: Document extended asm operand modifers

2024-09-29 Thread Jeff Law





On 9/20/24 3:48 PM, Pietro Monteiro wrote:

From: Pietro Monteiro 

SH: Document extended asm operand modifers

gcc/ChangeLog:
* doc/extend.texi (SH Operand Modifiers): New.

Signed-off-by: Pietro Monteiro 
---
Tested by running "make info pdf html" and looking at the pdf and html output. I used the 
comment on "gcc/config/sh.cc:sh_print_operand()", SH's TARGET_PRINT_OPERAND function, as 
a guide.

THanks.  I pushed this to the trunk.
jeff

Re: [RFC/RFA] [PATCH v4 01/12] Implement internal functions for efficient CRC computation

2024-09-29 Thread Jeff Law





On 9/13/24 5:05 AM, Mariam Arutunian wrote:
Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster 
CRC generation.

One performs bit-forward and the other bit-reversed CRC computation.
If CRC optabs are supported, they are used for the CRC computation.
Otherwise, table-based CRC is generated.
The supported data and CRC sizes are 8, 16, 32, and 64 bits.
The polynomial is without the leading 1.
A table with 256 elements is used to store precomputed CRCs.
For the reflection of inputs and the output, a simple algorithm involving
SHIFT, AND, and OR operations is used.

gcc/

     * doc/md.texi (crc@var{m}@var{n}4,
     crc_rev@var{m}@var{n}4): Document.
     * expr.cc (calculate_crc): New function.
     (assemble_crc_table): Likewise.
     (generate_crc_table): Likewise.
     (calculate_table_based_CRC): Likewise.
     (emit_crc): Likewise.
     (expand_crc_table_based): Likewise.
     (gen_common_operation_to_reflect): Likewise.
     (reflect_64_bit_value): Likewise.
     (reflect_32_bit_value): Likewise.
     (reflect_16_bit_value): Likewise.
     (reflect_8_bit_value): Likewise.
     (generate_reflecting_code_standard): Likewise.
     (expand_reversed_crc_table_based): Likewise.
     * expr.h (generate_reflecting_code_standard): New function declaration.
     (expand_crc_table_based): Likewise.
     (expand_reversed_crc_table_based): Likewise.
     * internal-fn.cc: (crc_direct): Define.
     (direct_crc_optab_supported_p): Likewise.
     (expand_crc_optab_fn): New function
     * internal-fn.def (CRC, CRC_REV): New internal functions.
     * optabs.def (crc_optab, crc_rev_optab): New optabs.

Looks pretty good.  Just one question/comment:



+void
+emit_crc (machine_mode crc_mode, rtx* crc, rtx* op0)
+{
+  if (GET_MODE_BITSIZE (crc_mode).to_constant () == 32
+  && GET_MODE_BITSIZE (word_mode) == 64)
+{
+  rtx a_low = gen_lowpart (crc_mode, *crc);
+  *crc = gen_rtx_SIGN_EXTEND (word_mode, a_low);
+}
+  rtx tgt = *op0;
+  if (word_mode != crc_mode)
+tgt = simplify_gen_subreg (word_mode, *op0, crc_mode, 0);
+  emit_move_insn (tgt, *crc);
+}
It seems to me that first clause ought to apply any time word mode is 
larger than crc mode rather than making it check 32/64 magic constants.



Jeff

Re: [RFC/RFA][PATCH v4 02/12] Add built-ins and tests for bit-forward and bit-reversed CRCs

2024-09-29 Thread Jeff Law





On 9/13/24 5:05 AM, Mariam Arutunian wrote:
This patch introduces new built-in functions to GCC for computing bit- 
forward and bit-reversed CRCs.

These builtins aim to provide efficient CRC calculation capabilities.
When the target architecture supports CRC operations (as indicated by 
the presence of a CRC optab),

the builtins will utilize the expander to generate CRC code.
In the absence of hardware support, the builtins default to generating 
code for a table-based CRC calculation.


The built-ins are defined as follows:
__builtin_rev_crc16_data8,
__builtin_rev_crc32_data8, __builtin_rev_crc32_data16, 
__builtin_rev_crc32_data32
__builtin_rev_crc64_data8, __builtin_rev_crc64_data16, 
  __builtin_rev_crc64_data32, __builtin_rev_crc64_data64,

__builtin_crc8_data8,
__builtin_crc16_data16, __builtin_crc16_data8,
__builtin_crc32_data8, __builtin_crc32_data16, __builtin_crc32_data32,
__builtin_crc64_data8, __builtin_crc64_data16,  __builtin_crc64_data32, 
__builtin_crc64_data64


Each built-in takes three parameters:
crc: The initial CRC value.
data: The data to be processed.
polynomial: The CRC polynomial without the leading 1.

To validate the correctness of these built-ins, this patch also includes 
additions to the GCC testsuite.
This enhancement allows GCC to offer developers high-performance CRC 
computation options

that automatically adapt to the capabilities of the target hardware.

Not complete. May continue the work if these built-ins are needed.

gcc/

  * builtin-types.def (BT_FN_UINT8_UINT8_UINT8_CONST_SIZE): Define.
  (BT_FN_UINT16_UINT16_UINT8_CONST_SIZE): Likewise.
           (BT_FN_UINT16_UINT16_UINT16_CONST_SIZE): Likewise.
           (BT_FN_UINT32_UINT32_UINT8_CONST_SIZE): Likewise.
           (BT_FN_UINT32_UINT32_UINT16_CONST_SIZE): Likewise.
           (BT_FN_UINT32_UINT32_UINT32_CONST_SIZE): Likewise.
           (BT_FN_UINT64_UINT64_UINT8_CONST_SIZE): Likewise.
           (BT_FN_UINT64_UINT64_UINT16_CONST_SIZE): Likewise.
           (BT_FN_UINT64_UINT64_UINT32_CONST_SIZE): Likewise.
           (BT_FN_UINT64_UINT64_UINT64_CONST_SIZE): Likewise.
           * builtins.cc (associated_internal_fn): Handle 
BUILT_IN_CRC8_DATA8,

           BUILT_IN_CRC16_DATA8, BUILT_IN_CRC16_DATA16,
           BUILT_IN_CRC32_DATA8, BUILT_IN_CRC32_DATA16, 
BUILT_IN_CRC32_DATA32,
           BUILT_IN_CRC64_DATA8, BUILT_IN_CRC64_DATA16, 
BUILT_IN_CRC64_DATA32,

           BUILT_IN_CRC64_DATA64,
           BUILT_IN_REV_CRC8_DATA8,
           BUILT_IN_REV_CRC16_DATA8, BUILT_IN_REV_CRC16_DATA16,
           BUILT_IN_REV_CRC32_DATA8, BUILT_IN_REV_CRC32_DATA16, 
BUILT_IN_REV_CRC32_DATA32.

           (expand_builtin_crc_table_based): New function.
           (expand_builtin): Handle BUILT_IN_CRC8_DATA8,
           BUILT_IN_CRC16_DATA8, BUILT_IN_CRC16_DATA16,
           BUILT_IN_CRC32_DATA8, BUILT_IN_CRC32_DATA16, 
BUILT_IN_CRC32_DATA32,
           BUILT_IN_CRC64_DATA8, BUILT_IN_CRC64_DATA16, 
BUILT_IN_CRC64_DATA32,

           BUILT_IN_CRC64_DATA64,
           BUILT_IN_REV_CRC8_DATA8,
           BUILT_IN_REV_CRC16_DATA8, BUILT_IN_REV_CRC16_DATA16,
           BUILT_IN_REV_CRC32_DATA8, BUILT_IN_REV_CRC32_DATA16, 
BUILT_IN_REV_CRC32_DATA32,
           BUILT_IN_REV_CRC64_DATA8, BUILT_IN_REV_CRC64_DATA16, 
BUILT_IN_REV_CRC64_DATA32,

           BUILT_IN_REV_CRC64_DATA64.
           * builtins.def (BUILT_IN_CRC8_DATA8): New builtin.
           (BUILT_IN_CRC16_DATA8): Likewise.
           (BUILT_IN_CRC16_DATA16): Likewise.
           (BUILT_IN_CRC32_DATA8): Likewise.
           (BUILT_IN_CRC32_DATA16): Likewise.
           (BUILT_IN_CRC32_DATA32): Likewise.
           (BUILT_IN_CRC64_DATA8): Likewise.
           (BUILT_IN_CRC64_DATA16): Likewise.
           (BUILT_IN_CRC64_DATA32): Likewise.
           (BUILT_IN_CRC64_DATA64): Likewise.
           (BUILT_IN_REV_CRC8_DATA8): New builtin.
           (BUILT_IN_REV_CRC16_DATA8): Likewise.
           (BUILT_IN_REV_CRC16_DATA16): Likewise.
           (BUILT_IN_REV_CRC32_DATA8): Likewise.
           (BUILT_IN_REV_CRC32_DATA16): Likewise.
           (BUILT_IN_REV_CRC32_DATA32): Likewise.
           (BUILT_IN_REV_CRC64_DATA8): Likewise.
           (BUILT_IN_REV_CRC64_DATA16): Likewise.
           (BUILT_IN_REV_CRC64_DATA32): Likewise.
           (BUILT_IN_REV_CRC64_DATA64): Likewise.
           * builtins.h (expand_builtin_crc_table_based): New function 
declaration.

           * doc/extend.texti (__builtin_rev_crc8_data8,
  __builtin_rev_crc16_data16, __builtin_rev_crc16_data8,
           __builtin_rev_crc32_data32, __builtin_rev_crc32_data8,
           __builtin_rev_crc32_data16, __builtin_rev_crc64_data64,
           __builtin_rev_crc64_data8, __builtin_rev_crc64_data16,
           __builtin_rev_crc64_data32, __builtin_crc8_data8,
           __builtin_crc16_data16, __builtin_crc16_data8,
           __builtin_crc32_data32, __builtin_crc32_data8,
           __builtin_crc32_data16, __builtin_crc64_data64,
           __builtin_crc64_data8, __builtin_crc64_data16,
           _

[patch,gensupport] Support (symbol_ref { code }) in insn attributes.

2024-09-29 Thread Georg-Johann Lay


When md_reader is reading insn attributes from md files, it accepts
expressions like  (symbol_ref { code })  and  (symbol_ref "{ code }")
as valid expressions.  However, when generating insn-attrtab.cc, it
would print the {} block as an expression, which is not valid C++ syntax.
   The conclusion is that to date, no backend is using that syntax, and
hence we can assign a new semantic to such {} blocks:

|  The value of a (symbol_ref { code }) expression is the value returned
|  by the {} code block (when taken as the body of a C++ function).

No changes to the md scanner are required; all what's needed is to print
the code from {} blocks in such a way that it works as the body of a C++
function / lambda expression.  To that end, md_reader::fprint_c_condition()
now issues code as follows to insn-attrtab.cc:

   return [&]() -> attr_desc.cxx_type { code } ();

where the capture provides code access to local variables like "insn".
Without this patch, the issued code was basically  return { code };

The patch bootstraps + regtests with no changes.

Ok for trunk?

Johann

p.s. Some months ago I discussed the feature with Richard, and he said
that the printer should make no attempts whatsoever to detect invalid
syntax (like for example a block without return statement), and all
diagnosis should be performed by the compiler when insn-attrtab.cc is
being compiled.

--

md_reader: Support (symbol_ref { code }) in insn attributes.

When md_reader is reading insn attributes from md files, it accepts
expressions like  (symbol_ref { code })  and  (symbol_ref "{ code }")
as valid expressions.  However, when generating insn-attrtab.cc, it
would print the {} block as an expression, which is not valid C++ syntax.
   The conclusion is that to date, no backend is using that syntax, and
hence we can assign a new semantic to such {} blocks:

|  The value of a (symbol_ref { code }) expression is the value returned
|  by the {} code block (when taken as the body of a C++ function).

No changes to the md scanner are required; all what's needed is to print
the code from {} blocks in such a way that it works as the body of a C++
function / lambda expression.  To that end, md_reader::fprint_c_condition()
now issues code as follows to insn-attrtab.cc:

   return [&]() -> attr_desc.cxx_type { code } ();

where the capture provides code access to local variables like "insn".
Without this patch, the issued code was basically  return { code };

gcc/
* doc/md.texi (Attribute Expressions) : Document it.
* genattrtab.cc (write_attr_value) [SYMBOL_REF]: Pass down
attr->cxx_type to rtx_reader_ptr->fprint_c_condition().
(write_test_expr) [MATCH_TEST]: Pass down "bool" as cxx_type
to rtx_reader_ptr->fprint_c_condition().
* genconditions.cc (write_one_condition): Same.
* genrecog.cc (print_test) [rtx_test::C_TEST]: Same.
* read-md.cc (md_reader::fprint_c_condition): Add cxx_type
argument and pass it down to recursive invocations.
When cond starts with a '{', then use cond as the body of a C++
lambda expression that's evaluated in place.
(md_reader::print_c_condition): Add cxx_type argument.
* read-md.h (md_reader::fprint_c_condition)
(md_reader::print_c_condition): Add cxx_type argument.md_reader: Support (symbol_ref { code }) in insn attributes.

When md_reader is reading insn attributes from md files, it accepts
expressions like  (symbol_ref { code })  and  (symbol_ref "{ code }")
as valid expressions.  However, when generating insn-attrtab.cc, it
would print the {} block as an expression, which is not valid C++ syntax.
   The conclusion is that to date, no backend is using that syntax, and
hence we can assign a new semantic to such {} blocks:

|  The value of a (symbol_ref { code }) expression is the value returned
|  by the {} code block (when taken as the body of a C++ function).

No changes to the md scanner are required; all what's needed is to print
the code from {} blocks in such a way that it works as the body of a C++
function / lambda expression.  To that end, md_reader::fprint_c_condition()
now issues code as follows to insn-attrtab.cc:

   return [&]() -> attr_desc.cxx_type { code } ();

where the capture provides code access to local variables like "insn".
Without this patch, the issued code was basically  return { code };

gcc/
* doc/md.texi (Attribute Expressions) : Document it.
* genattrtab.cc (write_attr_value) [SYMBOL_REF]: Pass down
attr->cxx_type to rtx_reader_ptr->fprint_c_condition().
(write_test_expr) [MATCH_TEST]: Pass down "bool" as cxx_type
to rtx_reader_ptr->fprint_c_condition().
* genconditions.cc (write_one_condition): Same.
* genrecog.cc (print_test) [rtx_test::C_TEST]: Same.
* read-md.cc (md_reader::fprint_c_condition): Add cxx_type
argument and pass it down to recursive invocations.
When cond starts with a '{', then use co

Re: [RFC/RFA][PATCH v4 05/12] i386: Implement new expander for efficient CRC computation

2024-09-29 Thread Jeff Law





On 9/13/24 5:05 AM, Mariam Arutunian wrote:


This patch introduces two new expanders for the i386 backend,
dedicated to generating optimized code for CRC computations.
The new expanders are designed to leverage specific hardware 
capabilities to achieve faster CRC calculations,
particularly using the pclmulqdq or crc32 instructions when supported by 
the target architecture.


Expander 1: Bit-Forward CRC (crc4)
For targets that support both pclmulqdq instruction (TARGET_PCLMUL) and 
are 64-bit (TARGET_64BIT),
the expander will generate code that uses the pclmulqdq instruction for 
CRC computation.


Expander 2: Bit-Reversed CRC (crc_rev4)
The expander first checks if the target supports the CRC32 instruction 
set (TARGET_CRC32)

and the polynomial in use is 0x1EDC6F41 (iSCSI). If the conditions are met,
it emits calls to the corresponding crc32 instruction (crc32b, crc32w, 
or crc32l depending on the data size).
If the target does not support crc32 but supports pclmulqdq, it then 
uses the pclmulqdq instruction for bit-reversed CRC computation.


Otherwise table-based CRC is generated.

   gcc/config/i386/

     * i386-protos.h (ix86_expand_crc_using_pclmul): New extern function 
declaration.

     (ix86_expand_reversed_crc_using_pclmul):  Likewise.
     * i386.cc (ix86_expand_crc_using_pclmul): New function.
     (ix86_expand_reversed_crc_using_pclmul):  Likewise.
     * i386.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.
     (SWI124dup): New iterator.
     (crc4): New expander for bit-forward CRC.
     (crc_rev4): New expander for reversed CRC.

   gcc/testsuite/gcc.target/i386/

     * crc-crc32-data16.c: New test.
     * crc-crc32-data32.c: Likewise.
     * crc-crc32-data8.c: Likewise.
     * crc-1-pclmul.c: Likewise.
     * crc-10-pclmul.c: Likewise.
     * crc-12-pclmul.c: Likewise.
     * crc-13-pclmul.c: Likewise.
     * crc-14-pclmul.c: Likewise.
     * crc-17-pclmul.c: Likewise.
     * crc-18-pclmul.c: Likewise.
     * crc-21-pclmul.c: Likewise.
     * crc-22-pclmul.c: Likewise.
     * crc-23-pclmul.c: Likewise.
     * crc-4-pclmul.c: Likewise.
     * crc-5-pclmul.c: Likewise.
     * crc-6-pclmul.c: Likewise.
     * crc-7-pclmul.c: Likewise.
     * crc-8-pclmul.c: Likewise.
     * crc-9-pclmul.c: Likewise.
     * crc-CCIT-data16-pclmul.c: Likewise.
     * crc-CCIT-data8-pclmul.c: Likewise.
     * crc-coremark-16bitdata-pclmul.c: Likewise.

This is OK.

jeff

Re: [RFC/RFA] [PATCH v4 08/12] Add a new pass for naive CRC loops detection.

2024-09-29 Thread Jeff Law





On 9/13/24 5:06 AM, Mariam Arutunian wrote:
This patch adds a new compiler pass aimed at identifying naive CRC 
implementations,
characterized by the presence of a loop calculating a CRC (polynomial 
long division).

Upon detection of a potential CRC, the pass prints an informational message.

Performs CRC optimization if optimization level is >= 2,
besides optimizations for size and if fno_gimple_crc_optimization given.

This pass is added for the detection and optimization of naive CRC 
implementations,

improving the efficiency of CRC-related computations.

This patch includes only initial fast checks for filtering out non-CRCs,
detected possible CRCs verification and optimization parts will be 
provided in subsequent patches.


   gcc/

     * Makefile.in (OBJS): Add gimple-crc-optimization.o.
     * common.opt (foptimize-crc): New option.
     * common.opt.urls: Regenerate to add foptimize-crc.
     * doc/invoke.texi (-foptimize-crc): Add documentation.
     * gimple-crc-optimization.cc: New file.
     * opts.cc (default_options_table): Add OPT_foptimize_crc.
     (enable_fdo_optimizations): Enable optimize_crc.
     * passes.def (pass_crc_optimization): Add new pass.
     * timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
     * tree-pass.h (make_pass_crc_optimization): New extern function 
declaration.


Signed-off-by: Mariam Arutunian >

Mentored-by: Jeff Law mailto:j...@ventanamicro.com>>
Minor NIT, looks like you added extraenous nwline at the end of 
enable_fdo_optimizations.



+/* Checks whether the CLMUL (Carry-less Multiplication) instruction
+   is supported by evaluating specific target architecture flags.  */
+static bool
+is_CLMUL_supported ()
+{
+  #ifdef TARGET_ZBC
+if (TARGET_ZBC)
+  return true;
+  #endif
+  #ifdef TARGET_ZBKC
+ if (TARGET_ZBKC)
+   return true;
+  #endif
+  #ifdef TARGET_AES
+if (TARGET_AES)
+  return true;
+  #endif
+  #ifdef TARGET_PCLMUL
+if (TARGET_PCLMUL)
+  return true;
+  #endif
+return false;
+}

[ ... ]


+   /* If optimize_size is specified, execute the pass only if CLMUL or CRC
+  instruction is supported.  */ 
+   if (is_CLMUL_supported ())

+ return true;
+   #ifdef TARGET_CRC32 
+ if (TARGET_CRC32) 
+   return true;

+   #endif
Can't do that in a target independent file.  If you find yourself 
writing TARGET_* in a target independent file, then that's a sign we 
need to make some adjustments.


I think to do what you're trying to do here we'd really need to make crc 
and clmul first class citizens with an optab that we could query.


So my suggestion is to drop this code for now and open a bug report to 
track the desire to make optabs for clmul/crc32 and once implemented use 
the optabs-query interface to allow generation of clmul/crc32 based 
sequences when optimize_size is on.  In effect that makes this a 
follow-up item rather than a prerequisite for integration.


I saw a couple things in the early checks for a possible CRC loop that 
could be better, but it's really just meant to be a reasonable first 
pass filter, so I don't think they were worth calling out.




+  /* In CRC implementations with constant polynomial maximum 12 use_def
+ statements may occur.
+ TODO: Find a better solution.  */
+  if (use_defs.length () > 12)
+return false;
Presumably you're just trying to avoid useless work here?  It might be 
useful to indicate where the "12" comes from as a comment.



So basically OK.  You need to avoid the TARGET_* checks and given we 
don't have an interface to check for crc or clmul instruction 
availability, I suggest we defer that problem.


Jeff

Re: [RFC PATCH] More detailed diagnostics for section type conflicts

2024-09-29 Thread Florian Weimer

* Andrew Pinski:

>> +  append (flags & SECTION_CODE, "CODE");
>
> I notice you capture result and it seems like you could also capture
> flags and change this to:
> append (SECTION_CODE, "CODE");

Thanks, I've made the change locally.

Florian

Re: [RFC/RFA] [PATCH v4 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-09-29 Thread Jeff Law





On 9/13/24 5:05 AM, Mariam Arutunian wrote:

  If the target is ZBC or ZBKC, it uses clmul instruction for the CRC
  calculation.         Otherwise, if the target is ZBKB, generates 
table-based

  CRC, but for reversing inputs and the output uses bswap and brev8
  instructions.         Add new tests to check CRC generation for ZBC, 
ZBKC and

  ZBKB targets.

   gcc/

      * expr.cc (gf2n_poly_long_div_quotient): New function.
      * expr.h (gf2n_poly_long_div_quotient):  New function declaration.
      * hwint.cc (reflect_hwi): New function.
      * hwint.h (reflect_hwi): New function declaration.

   gcc/config/riscv/

      * bitmanip.md (crc_rev4): New expander for 
reversed CRC.

      (crc4): New expander for bit-forward CRC.
      * iterators.md (SUBX1, ANYI1): New iterators.
      * riscv-protos.h (generate_reflecting_code_using_brev): New 
function declaration.

      (expand_crc_using_clmul): Likewise.
      (expand_reversed_crc_using_clmul): Likewise.
      * riscv.cc (generate_reflecting_code_using_brev): New function.
      (expand_crc_using_clmul): Likewise.
      (expand_reversed_crc_using_clmul): Likewise.
      * riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.

   gcc/testsuite/gcc.target/riscv/

         * crc-1-zbc.c: New test.
     * crc-1-zbkc.c: Likewise.
         * crc-10-zbc.c: Likewise.
         * crc-10-zbkc.c: Likewise.
         * crc-12-zbc.c: Likewise.
         * crc-12-zbkc.c: Likewise.
         * crc-13-zbc.c: Likewise.
         * crc-13-zbkc.c: Likewise.
         * crc-14-zbc.c: Likewise.
         * crc-14-zbkc.c: Likewise.
         * crc-17-zbc.c: Likewise.
         * crc-17-zbkc.c: Likewise.
         * crc-18-zbc.c: Likewise.
         * crc-18-zbkc.c: Likewise.
         * crc-21-zbc.c: Likewise.
         * crc-21-zbkc.c: Likewise.
         * crc-22-rv64-zbc.c: Likewise.
         * crc-22-rv64-zbkb.c: Likewise.
         * crc-22-rv64-zbkc.c: Likewise.
         * crc-23-zbc.c: Likewise.
         * crc-23-zbkc.c: Likewise.
         * crc-4-zbc.c: Likewise.
         * crc-4-zbkb.c: Likewise.
         * crc-4-zbkc.c: Likewise.
         * crc-5-zbc.c: Likewise.
         * crc-5-zbkb.c: Likewise.
         * crc-5-zbkc.c: Likewise.
         * crc-6-zbc.c: Likewise.
         * crc-6-zbkc.c: Likewise.
         * crc-7-zbc.c: Likewise.
         * crc-7-zbkc.c: Likewise.
         * crc-8-zbc.c: Likewise.
         * crc-8-zbkb.c: Likewise.
         * crc-8-zbkc.c: Likewise.
         * crc-9-zbc.c: Likewise.
         * crc-9-zbkc.c: Likewise.
         * crc-CCIT-data16-zbc.c: Likewise.
         * crc-CCIT-data16-zbkc.c: Likewise.
         * crc-CCIT-data8-zbc.c: Likewise.
         * crc-CCIT-data8-zbkc.c: Likewise.
         * crc-coremark-16bitdata-zbc.c: Likewise.
         * crc-coremark-16bitdata-zbkc.c: Likewise.

Minor nit.  Looks like you added a spurious blank line in hwint.h.

You used camelcase for the "reflectedValue" in hwint.cc.  Just use 
"reflectedvalue".


With those fixed this is OK.

jeff

Re: [RFC/RFA] [PATCH v4 04/12] RISC-V: Add CRC built-ins tests for the target ZBC.

2024-09-29 Thread Jeff Law





On 9/13/24 5:05 AM, Mariam Arutunian wrote:

   gcc/testsuite/gcc.target/riscv/

     * crc-builtin-zbc32.c: New file.
     * crc-builtin-zbc64.c: Likewise.

OK
jeff

Re: [RFC/RFA][PATCH v4 06/12] aarch64: Implement new expander for efficient CRC computation

2024-09-29 Thread Jeff Law





On 9/13/24 5:06 AM, Mariam Arutunian wrote:

This patch introduces two new expanders for the aarch64 backend,
dedicated to generate optimized code for CRC computations.
The new expanders are designed to leverage specific hardware 
capabilities to achieve faster CRC calculations,
particularly using the crc32, crc32c and pmull instructions when 
supported by the target architecture.


Expander 1: Bit-Forward CRC (crc4)
For targets that support pmul instruction (TARGET_AES),
the expander will generate code that uses the pmull (crypto_pmulldi) 
instruction for CRC computation.


Expander 2: Bit-Reversed CRC (crc_rev4)
The expander first checks if the target supports the CRC32* instruction 
set (TARGET_CRC32)
and the polynomial in use is 0x1EDC6F41 (iSCSI) or 0x04C11DB7 (HDLC). If 
the conditions are met,
it emits calls to the corresponding crc32* instruction (depending on the 
data size and the polynomial).
If the target does not support crc32* but supports pmull, it then uses 
the pmull (crypto_pmulldi) instruction for bit-reversed CRC computation.

Otherwise table-based CRC is generated.

   gcc/config/aarch64/

     * aarch64-protos.h (aarch64_expand_crc_using_pmull): New extern 
function declaration.

     (aarch64_expand_reversed_crc_using_pmull):  Likewise.
     * aarch64.cc (aarch64_expand_crc_using_pmull): New function.
     (aarch64_expand_reversed_crc_using_pmull):  Likewise.
     * aarch64.md (crc_rev4): New expander for 
reversed CRC.

     (crc4): New expander for bit-forward CRC.
     * iterators.md (crc_data_type): New mode attribute.

   gcc/testsuite/gcc.target/aarch64/

     * crc-1-pmul.c: New test.
     * crc-10-pmul.c: Likewise.
     * crc-12-pmul.c: Likewise.
     * crc-13-pmul.c: Likewise.
     * crc-14-pmul.c: Likewise.
     * crc-17-pmul.c: Likewise.
     * crc-18-pmul.c: Likewise.
     * crc-21-pmul.c: Likewise.
     * crc-22-pmul.c: Likewise.
     * crc-23-pmul.c: Likewise.
     * crc-4-pmul.c: Likewise.
     * crc-5-pmul.c: Likewise.
     * crc-6-pmul.c: Likewise.
     * crc-7-pmul.c: Likewise.
     * crc-8-pmul.c: Likewise.
     * crc-9-pmul.c: Likewise.
     * crc-CCIT-data16-pmul.c: Likewise.
     * crc-CCIT-data8-pmul.c: Likewise.
     * crc-coremark-16bitdata-pmul.c: Likewise.
     * crc-crc32-data16.c: Likewise.
     * crc-crc32-data32.c: Likewise.
     * crc-crc32-data8.c: Likewise.
     * crc-crc32c-data16.c: Likewise.
     * crc-crc32c-data32.c: Likewise.
     * crc-crc32c-data8.c: Likewise.
I believe you and Richard S have iterated on this.  So assuming all 
prior issues Richard raised have been addressed, this is OK.

jeff

Re: [RFC/RFA] [PATCH v4 07/12] aarch64: Add CRC built-ins test for the target AES.

2024-09-29 Thread Jeff Law





On 9/13/24 5:06 AM, Mariam Arutunian wrote:

   gcc/testsuite/gcc.target/aarch64/

     * crc-builtin-pmul64.c: New test.

OK
jeff

[COMMITTED] testsuite: XFAIL gfortran.dg/initialization_25.f90 properly (again)

2024-09-29 Thread Sam James

dg-error needs an argument for "why" / a comment.

gcc/testsuite/ChangeLog:
PR fortran/116858

* gfortran.dg/initialization_25.f90: Fix dg-error arguments.
---
 gcc/testsuite/gfortran.dg/initialization_25.f90 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/initialization_25.f90 
b/gcc/testsuite/gfortran.dg/initialization_25.f90
index cae3bbd27d53..805814a6 100644
--- a/gcc/testsuite/gfortran.dg/initialization_25.f90
+++ b/gcc/testsuite/gfortran.dg/initialization_25.f90
@@ -8,5 +8,5 @@
 ! XFAIL is for PR35779
 
 !   INTEGER :: J1
-!   INTEGER,PARAMETER :: I2(10) = (/(J1,J1=its_bad,1,-1)/) ! { dg-error "does 
not reduce" { xfail *-*-* } }
+!   INTEGER,PARAMETER :: I2(10) = (/(J1,J1=its_bad,1,-1)/) ! { dg-error "does 
not reduce" "PR35779" { xfail *-*-* } }
 END
-- 
2.46.2

Re: [RFC/RFA] [PATCH v4 09/12] Add symbolic execution support.

2024-09-29 Thread Jeff Law





On 9/13/24 5:06 AM, Mariam Arutunian wrote:

Gives an opportunity to execute the code on bit level,
assigning symbolic values to the variables which don't have initial values.
Supports only CRC specific operations.

Example:

uint8_t crc;
uint8_t pol = 1;
crc = crc ^ pol;

during symbolic execution crc's value will be:
crc(8), crc(7), ... crc(1), crc(0) ^ 1

   gcc/

     * Makefile.in (OBJS): Add sym-exec/sym-exec-expression.o,
     sym-exec/sym-exec-state.o, sym-exec/sym-exec-condition.o.
     * configure (sym-exec): New subdir.

   gcc/sym-exec/

     * sym-exec-condition.cc: New file.
     * sym-exec-condition.h: New file.
     * sym-exec-expression-is-a-helper.h: New file.
     * sym-exec-expression.cc: New file.
     * sym-exec-expression.h: New file.
     * sym-exec-state.cc: New file.
     * sym-exec-state.h: New file.

Patch not attached.

jeff

[patch,avr] Use (symbol_ref { code }) in insn length computation.

2024-09-29 Thread Georg-Johann Lay


This patch uses the new gensupport feature (review pending) that
allows to provide a block of C++ code in a symbol_ref sub-expression
instead of just a C++ expression:

https://gcc.gnu.org/pipermail/gcc-patches/2024-September/664093.html

Ok for trunk (provided the gensupport part is upstream) ?

Johann


--

AVR: Use (symbol_ref { code }) in attribute "length" computation.

The symbol_ref sub-expression of an insn attribute may host the
body of a C++ function provided (symbol_ref { code }) is used.

gcc/
* config/avr/avr-protos.h (avr_len_op8_set_ZN): Remove.
* config/avr/avr.cc (avr_len_op8_set_ZN): Remove.
* config/avr/avr.md (*op8.for.cczn.): Compute attribute
"length" with the help of a symbol_ref that hosts the body
of a C++ function.
diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index 96708eb4db5..71514a37812 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -64,7 +64,6 @@ extern const char *avr_out_extr_not (rtx_insn *, rtx*, int*);
 extern const char *avr_out_plus_set_ZN (rtx*, int*);
 extern const char *avr_out_plus_set_N (rtx*, int*);
 extern const char *avr_out_op8_set_ZN (rtx_code, rtx*, int*);
-extern int avr_len_op8_set_ZN (rtx_code, rtx*);
 extern bool avr_op8_ZN_operator (rtx);
 extern const char *avr_out_cmp_ext (rtx*, rtx_code, int*);
 
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index c0bf1320fdd..bbc9b278511 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -8958,18 +8958,6 @@ avr_out_op8_set_ZN (rtx_code code, rtx *xop, int *plen)
 }
 
 
-/* Used in the "length" attribute of insn "*op8.for.cczn.".  */
-
-int
-avr_len_op8_set_ZN (rtx_code code, rtx *xop)
-{
-  int len;
-  (void) avr_out_op8_set_ZN (code, xop, &len);
-
-  return len;
-}
-
-
 /* Output bit operation (IOR, AND, XOR) with register XOP[0] and compile
time constant XOP[2]:
 
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index aae8a696a63..84023866b30 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -7337,7 +7337,12 @@ (define_insn "*op8.for.cczn."
 return avr_out_op8_set_ZN (, operands, nullptr);
   }
   [(set (attr "length")
-(symbol_ref "avr_len_op8_set_ZN (, operands)"))])
+(symbol_ref
+ {
+   int len;
+   avr_out_op8_set_ZN (, operands, &len);
+   return len;
+ }))])
 
 
 ;; Test a single bit in a QI/HI/SImode register.

Re: [RFC/RFA] [PATCH v4 10/12] Verify detected CRC loop with symbolic execution and LFSR matching

2024-09-29 Thread Jeff Law





On 9/13/24 5:06 AM, Mariam Arutunian wrote:
Symbolically execute potential CRC loops and check whether the loop 
actually calculates CRC (uses LFSR matching).
Calculated CRC and created LFSR are compared on each iteration of the 
potential CRC loop.


   gcc/

     * Makefile.in (OBJS): Add crc-verification.o.
     * crc-verification.cc: New file.
     * crc-verification.h: New file.
     * gimple-crc-optimization.cc (loop_calculates_crc): New function.
     (is_output_crc): Likewise.
     (swap_crc_and_data_if_needed): Likewise.
     (validate_crc_and_data): Likewise.
     (optimize_crc_loop): Likewise.
     (table_based_may_be_generated): Likewise.
     (get_output_phi): Likewise.
     (execute): Add check whether potential CRC loop calculates CRC.

   gcc/sym-exec/

     * sym-exec-state.cc (create_reversed_lfsr): New function.
     (create_forward_lfsr): Likewise.
     (last_set_bit): Likewise.
     (create_lfsr): Likewise.
     * sym-exec-state.h (is_bit_vector): Reorder, make the function 
public and static.

     (create_reversed_lfsr): New static function declaration.
     (create_forward_lfsr): New static function declaration.

Signed-off-by: Mariam Arutunian >

Mentored-by: Jeff Law mailto:j...@ventanamicro.com>>
There's a bit of TARGET_XXX stuff in here too that needs to be removed 
as I noted in the review of a prior patch in this series:



+/* Returns true if table-based CRC may be generated,
+   otherwise, return false.  */
+   
+bool 
+crc_optimization::table_based_may_be_generated ()
+{
+  #ifdef TARGET_CRC32

+if (TARGET_CRC32
+   && (m_polynomial == 0x1EDC6F41 || m_polynomial == 0x04C11DB7))
+  return false;
+  #endif
+/* If CLMUL-based CRC can be generated, return false.  */
+if (is_CLMUL_supported ()
+   && (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (m_crc_arg))).to_constant ()
+   < GET_MODE_SIZE (word_mode)))
+  return false;
+  return true;
+} 


Removing this probably will result in minor simplifications elsewhere in 
this file.



This is a bit puzzling:


+bool
+assign_calc_vals_to_header_phis (const vec &prev_states,
+state *curr_state,
+basic_block bb)
+{
+  for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
+   gsi_next (&gsi))
+{
+  gphi *phi = gsi.phi ();
+  tree lhs = gimple_phi_result (phi);
+
+  /* Don't consider virtual operands.  */
+  if (virtual_operand_p (lhs))
+   continue;
+
+  if (TREE_CODE (PHI_ARG_DEF (phi, phi->nargs - 1)) == INTEGER_CST)


The order of elements in the PHI is undefined, it looks like this 
assumes the constant is always last.  So it probably needs to be 
generalized a bit.


Similarly in assign_known_vals_to_header_phis.

Mostly looks OK.  I suspect with the issues noted above fixed that this 
will be good to go.



Jeff

Re: [RFC/RFA][PATCH v4 12/12] Add tests for CRC detection and generation.

2024-09-29 Thread Jeff Law





On 9/13/24 5:06 AM, Mariam Arutunian wrote:

   gcc/testsuite/gcc.dg/

     * crc-from-fedora-packages-(1-32).c New tests.
     * crc-linux-(1-5).c: Likewise.
     * crc-not-crc-(1-26).c: Likewise.
     * crc-side-instr-(1-17).c: Likewise.

   gcc/testsuite/gcc.dg/torture/

     * crc-(1-29).c: New tests.
     * crc-CCIT-data16-xorOutside_InsideFor.c: Likewise.
     * crc-CCIT-data16.c: Likewise.
     * crc-CCIT-data8.c: Likewise.
     * crc-coremark16-data16.c: Likewise.
     * crc-coremark32-data16.c: Likewise.
     * crc-coremark32-data32.c: Likewise.
     * crc-coremark32-data8.c: Likewise.
     * crc-coremark64-data64.c: Likewise.
     * crc-coremark8-data8.c: Likewise.
     * crc-crc32-data16.c: Likewise.
     * crc-crc32-data24.c: Likewise.
     * crc-crc32-data8.c: Likewise.
     * crc-crc32.c: Likewise.
     * crc-crc64-data32.c: Likewise.
     * crc-crc64-data64.c: Likewise.
     * crc-crc8-data8-loop-xorInFor.c: Likewise.
     * crc-crc8-data8-loop-xorOutsideFor.c: Likewise.
     * crc-crc8-data8-xorOustideFor.c: Likewise.
     * crc-crc8.c: Likewise.

OK.

So I think that's the whole kit.

Can you resend the missing patch (which I think was the bulk of sym-exec 
code?


I think we're just about ready for integration.

Jeff

Re: [RFC/RFA] [PATCH v4 11/12] Replace the original CRC loops with a faster CRC calculation.

2024-09-29 Thread Jeff Law





On 9/13/24 5:06 AM, Mariam Arutunian wrote:

After the loop exit an internal function call (CRC, CRC_REV) is added,
and its result is assigned to the output CRC variable (the variable 
where the calculated CRC is stored

after the loop execution).
The removal of the loop is left to CFG cleanup and DCE.

   gcc/

     * gimple-crc-optimization.cc (optimize_crc_loop): New function.
     (execute): Add optimize_crc_loop function call.
I believe Richi had a few comments on this.  Once those are addressed, 
this is OK.


jeff

Re: [PATCH] cselib: Discard useless locs of preserved VALUEs [PR116627]

2024-09-29 Thread Jeff Law





On 9/11/24 3:26 PM, Jakub Jelinek wrote:

Hi!

remove_useless_values iteratively discards useless locs (locs of
cselib_val which refer to non-preserved VALUEs with no locations),
which in turn can make further values useless until no further VALUEs
are made useless and then discards the useless VALUEs.

Preserved VALUEs (something done during var-tracking only I think)
live in a different hash table, cselib_preserved_hash_table rather
than cselib_hash_table.  cselib_find_slot first looks up slot in
cselib_preserved_hash_table and only if not found looks it up in
cselib_hash_table (and INSERTs only into the latter), whereas preservation
of a VALUE results in move of a cselib_val from the latter to the former
hash table.

The testcase in the PR (apparently too fragile, it only reproduces on 14
branch with various flags on a single arch, not on trunk) ICEs, because
we have a preserved VALUE (QImode with (const_int 0) as one of the locs).
In a different BB SImode r2 is looked up, a non-preserved VALUE is created
for it, and the r13-2916 added code attempts to lookup also SUBREGs of that
in narrower modes, among those QImode, so adds to that SImode r2
non-preserve VALUE a new loc of (subreg:QI (value:SI) 0).  That SImode
value is considered useless, so remove_useless_value discards it, but
nothing discarded it from the preserved VALUE's loc_list, so when looking
something up in the hash table we ICE trying to derevence CSELIB_VAL
of the discarded VALUE.

I think we need to discuard useless locs even from the preserved VALUEs.
That IMHO shouldn't create any further useless VALUEs, the preserved
VALUEs are never useless, so we don't need to iterate with it, can do it
just once, but IMHO it needs to be done because actually
discard_useless_values.

The following patch does that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-11  Jakub Jelinek  

PR target/116627
* cselib.cc (remove_useless_values): Discard useless locs
even from preserved cselib_vals in cselib_preserved_hash_table
hash table.

OK
jeff

Re: [PATCH] RISC-V: Implement TARGET_CAN_INLINE_P

2024-09-29 Thread Jeff Law





On 9/9/24 6:11 AM, Yangyu Chen wrote:

Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V
ISA. As a result, certain functions cannot be optimized with inlining
when specific options, such as __attribute__((target("arch=+v"))) .
This can lead to potential performance issues when building
retargetable binaries for RISC-V.

To address this, I have implemented the riscv_can_inline_p function.
This addition enables inlining when the callee either has no special
options or when the some options match, and also ensuring that the
callee's ISA is a subset of the caller's. I also check some other
options when there is no always_inline set.

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_can_inline_p): New function.
 (TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P.
I'd kind of hoped not to mess with this as I suspect keeping this 
up-to-date is going to end up being somewhat painful and I doubt we're 
going to see that many cases where folks are using target attributes and 
where inlining is critical for performance.


What's really driving your desire to change this?  Is this showing up in 
real code as a performance issue?








+
+  if (caller_opts->x_riscv_tune_string
+  && callee_opts->x_riscv_tune_string
+  && strcmp (caller_opts->x_riscv_tune_string,
+callee_opts->x_riscv_tune_string) != 0)
+return false;
Tuning shouldn't affect correctness of inlining.  I'd just drop this 
clause if this patch goes forward.





Jeff

51 matches

Mail list logo