[PATCH] mh-mingw: drop unused BOOT_CXXFLAGS variable

2023-07-21 Thread Sergei Trofimovich via Gcc-patches
From: Sergei Trofimovich 

gcc's build system has BOOT_CFLAGS and various STAGE_C{,XX}FLAGS
variables. BOOT_CXXFLAGS is not handled anywhere.

config/

* mh-mingw: Drop assignment of unused BOOT_CXXFLAGS variable.
---
 config/mh-mingw | 1 -
 1 file changed, 1 deletion(-)

diff --git a/config/mh-mingw b/config/mh-mingw
index e91367a7112..f5fb064813f 100644
--- a/config/mh-mingw
+++ b/config/mh-mingw
@@ -1,7 +1,6 @@
 # Add -D__USE_MINGW_ACCESS to enable the built compiler to work on Windows
 # Vista (see PR33281 for details).
 BOOT_CFLAGS += -D__USE_MINGW_ACCESS -Wno-pedantic-ms-format
-BOOT_CXXFLAGS += -D__USE_MINGW_ACCESS -Wno-pedantic-ms-format
 CFLAGS += -D__USE_MINGW_ACCESS
 CXXFLAGS += -D__USE_MINGW_ACCESS
 STAGE1_CXXFLAGS += -D__USE_MINGW_ACCESS
-- 
2.41.0



[PATCH] Fix a typo

2023-07-21 Thread Haochen Jiang via Gcc-patches
Hi all,

This patch fixes a typo; it does not cause any behavior difference.

Committed as an obvious change.

Thx,
Haochen

gcc/ChangeLog:

* config/i386/i386.opt: Fix a typo.
---
 gcc/config/i386/i386.opt | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index db9956885e2..1cc8563477a 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1289,11 +1289,6 @@ Target Mask(ISA2_SM3) Var(ix86_isa_flags2) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and
 SM3 built-in functions and code generation.
 
-mvpinsrvpextr
-Target Mask(ISA2_VPINSRVPEXTR) Var(ix86_isa_flags2) Save
-Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F,
-AVX512VL and VPINSRVPEXTR built-in functions and code generation.
-
 msha512
 Target Mask(ISA2_SHA512) Var(ix86_isa_flags2) Save
 Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and
-- 
2.31.1



Improve loop dumping

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi,
we have flow_loop_dump and print_loop.  While print_loop was extended to dump
the fields we added to the loop structure over the years (loop info),
flow_loop_dump was not.
-fdump-tree-all files contain flow_loop_dump output, which makes it hard to see
what metadata we have attached to a loop.

This patch unifies dumping of these fields from both functions.  For example 
for:
int a[100];
int main()
{
  for (int i = 0; i < 10; i++)
    a[i] = i;
}
we now print:
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 5
;;
;; Loop 1
;;  header 4, latch 3
;;  depth 1, outer 0, finite_p
;;  upper_bound 10
;;  likely_upper_bound 10
;;  estimate 10
;;  iterations by profile: 10.001101 (unreliable)

finite_p, upper_bound, likely_upper_bound, estimate, and iterations by profile
are new.
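
For illustration only (not part of the patch), a pass could dump this metadata
for every loop roughly as follows; a minimal sketch assuming the usual GCC
includes, the loops_list iterator from cfgloop.h and the print_loop_info
helper added here:

/* Sketch: dump the new loop metadata for all loops in the current
   function, reusing print_loop_info with a dump-file style prefix.  */
static void
dump_all_loop_info (FILE *file)
{
  for (auto loop : loops_list (cfun, 0))
    {
      fprintf (file, ";; loop %d", loop->num);
      print_loop_info (file, loop, ";;  ");
      fprintf (file, "\n");
    }
}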

Bootstrap/regtest on x86_64 in progress. OK if it passes?

Honza

gcc/ChangeLog:

* cfgloop.cc (flow_loop_dump): Use print_loop_info.
* cfgloop.h (print_loop_info): Declare.
* tree-cfg.cc (print_loop_info): Break out from ...; add
printing of missing fields and profile.
(print_loop): ... here.

diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
index 020e5734d95..9ca85e648a7 100644
--- a/gcc/cfgloop.cc
+++ b/gcc/cfgloop.cc
@@ -135,17 +135,12 @@ flow_loop_dump (const class loop *loop, FILE *file,
   fprintf (file, "\n");
 }
 
-  fprintf (file, ";;  depth %d, outer %ld\n",
+  fprintf (file, ";;  depth %d, outer %ld",
   loop_depth (loop), (long) (loop_outer (loop)
  ? loop_outer (loop)->num : -1));
+  print_loop_info (file, loop, ";;  ");
 
-  bool reliable;
-  sreal iterations;
-  if (loop->num && expected_loop_iterations_by_profile (loop, &iterations, 
&reliable))
-fprintf (file, ";;  profile-based iteration count: %f %s\n",
-iterations.to_double (), reliable ? "(reliable)" : "(unreliable)");
-
-  fprintf (file, ";;  nodes:");
+  fprintf (file, "\n;;  nodes:");
   bbs = get_loop_body (loop);
   for (i = 0; i < loop->num_nodes; i++)
 fprintf (file, " %d", bbs[i]->index);
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 4d2fd4b6af5..269694c7962 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -411,6 +411,7 @@ extern unsigned expected_loop_iterations (class loop *);
 extern rtx doloop_condition_get (rtx_insn *);
 
 void mark_loop_for_removal (loop_p);
+void print_loop_info (FILE *file, const class loop *loop, const char *);
 
 /* Induction variable analysis.  */
 
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 7ccc2a5a5a7..a6c97a04662 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -8479,6 +8479,55 @@ print_loops_bb (FILE *file, basic_block bb, int indent, 
int verbosity)
 }
 }
 
+/* Print loop information.  */
+
+void
+print_loop_info (FILE *file, const class loop *loop, const char *prefix)
+{
+  if (loop->can_be_parallel)
+fprintf (file, ", can_be_parallel");
+  if (loop->warned_aggressive_loop_optimizations)
+fprintf (file, ", warned_aggressive_loop_optimizations");
+  if (loop->dont_vectorize)
+fprintf (file, ", dont_vectorize");
+  if (loop->force_vectorize)
+fprintf (file, ", force_vectorize");
+  if (loop->in_oacc_kernels_region)
+fprintf (file, ", in_oacc_kernels_region");
+  if (loop->finite_p)
+fprintf (file, ", finite_p");
+  if (loop->unroll)
+fprintf (file, "\n%sunroll %d", prefix, loop->unroll);
+  if (loop->nb_iterations)
+{
+  fprintf (file, "\n%sniter ", prefix);
+  print_generic_expr (file, loop->nb_iterations);
+}
+
+  if (loop->any_upper_bound)
+{
+  fprintf (file, "\n%supper_bound ", prefix);
+  print_decu (loop->nb_iterations_upper_bound, file);
+}
+  if (loop->any_likely_upper_bound)
+{
+  fprintf (file, "\n%slikely_upper_bound ", prefix);
+  print_decu (loop->nb_iterations_likely_upper_bound, file);
+}
+
+  if (loop->any_estimate)
+{
+  fprintf (file, "\n%sestimate ", prefix);
+  print_decu (loop->nb_iterations_estimate, file);
+}
+  bool reliable;
+  sreal iterations;
+  if (loop->num && expected_loop_iterations_by_profile (loop, &iterations, 
&reliable))
+fprintf (file, "\n%siterations by profile: %f %s", prefix,
+iterations.to_double (), reliable ? "(reliable)" : "(unreliable)");
+
+}
+
 static void print_loop_and_siblings (FILE *, class loop *, int, int);
 
 /* Pretty print LOOP on FILE, indented INDENT spaces.  Following
@@ -8511,27 +8560,7 @@ print_loop (FILE *file, class loop *loop, int indent, 
int verbosity)
 fprintf (file, ", latch = %d", loop->latch->index);
   else
 fprintf (file, ", multiple latches");
-  fprintf (file, ", niter = ");
-  print_generic_expr (file, loop->nb_iterations);
-
-  if (loop->any_upper_bound)
-{
-  fprintf (file, ", upper_bound = ");
-  print_decu (loop->nb_iterations_upper_bound, file);
-}
-  if (loop->any_likely_upper_bound)
-{
-  fprintf (file, ", likely_upper_bound = ");
-  print_decu (loop->nb_it

Re: [PATCH] mh-mingw: drop unused BOOT_CXXFLAGS variable

2023-07-21 Thread Richard Biener via Gcc-patches
On Fri, Jul 21, 2023 at 9:48 AM Sergei Trofimovich via Gcc-patches
 wrote:
>
> From: Sergei Trofimovich 
>
> gcc's build system has BOOT_CFLAGS and various STAGE_C{,XX}FLAGS
> variables. BOOT_CXXFLAGS is not handled anywhere.

OK.

> config/
>
> * mh-mingw: Drop assignment of unused BOOT_CXXFLAGS variable.
> ---
>  config/mh-mingw | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/config/mh-mingw b/config/mh-mingw
> index e91367a7112..f5fb064813f 100644
> --- a/config/mh-mingw
> +++ b/config/mh-mingw
> @@ -1,7 +1,6 @@
>  # Add -D__USE_MINGW_ACCESS to enable the built compiler to work on Windows
>  # Vista (see PR33281 for details).
>  BOOT_CFLAGS += -D__USE_MINGW_ACCESS -Wno-pedantic-ms-format
> -BOOT_CXXFLAGS += -D__USE_MINGW_ACCESS -Wno-pedantic-ms-format
>  CFLAGS += -D__USE_MINGW_ACCESS
>  CXXFLAGS += -D__USE_MINGW_ACCESS
>  STAGE1_CXXFLAGS += -D__USE_MINGW_ACCESS
> --
> 2.41.0
>


Re: Improve loop dumping

2023-07-21 Thread Richard Biener via Gcc-patches
On Fri, Jul 21, 2023 at 9:57 AM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> we have flow_loop_dump and print_loop. While print_loop was extended to dump
> stuff from loop structure we added over years (loop info), flow_loop_dump was 
> not.
> -fdump-tree-all files contains flow_loop_dump which makes it hard to see what
> metadata we have attached to loop.
>
> This patch unifies dumping of these fields from both functions.  For example 
> for:
> int a[100];
> main()
> {
> for (int i = 0;  i < 10; i++)
> a[i]=i;
> }
> we now print:
> ;; Loop 0
> ;;  header 0, latch 1
> ;;  depth 0, outer -1
> ;;  nodes: 0 1 2 3 4 5
> ;;
> ;; Loop 1
> ;;  header 4, latch 3
> ;;  depth 1, outer 0, finite_p
> ;;  upper_bound 10
> ;;  likely_upper_bound 10
> ;;  estimate 10
> ;;  iterations by profile: 10.001101 (unreliable)
>
> finite_p, upper_bound, likely_upper_bound estimate and iterations by profile 
> is new.
>
> Bootstrap/regtest on x86_64 in progress. OK if it passes?

OK.

> Honza
>
> gcc/ChangeLog:
>
> * cfgloop.cc (flow_loop_dump): Use print_loop_info.
> * cfgloop.h (print_loop_info): Declare.
> * tree-cfg.cc (print_loop_info): Break out from ...; add
> printing of missing fields and profile
> (print_loop): ... here.
>
> diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
> index 020e5734d95..9ca85e648a7 100644
> --- a/gcc/cfgloop.cc
> +++ b/gcc/cfgloop.cc
> @@ -135,17 +135,12 @@ flow_loop_dump (const class loop *loop, FILE *file,
>fprintf (file, "\n");
>  }
>
> -  fprintf (file, ";;  depth %d, outer %ld\n",
> +  fprintf (file, ";;  depth %d, outer %ld",
>loop_depth (loop), (long) (loop_outer (loop)
>   ? loop_outer (loop)->num : -1));
> +  print_loop_info (file, loop, ";;  ");
>
> -  bool reliable;
> -  sreal iterations;
> -  if (loop->num && expected_loop_iterations_by_profile (loop, &iterations, 
> &reliable))
> -fprintf (file, ";;  profile-based iteration count: %f %s\n",
> -iterations.to_double (), reliable ? "(reliable)" : 
> "(unreliable)");
> -
> -  fprintf (file, ";;  nodes:");
> +  fprintf (file, "\n;;  nodes:");
>bbs = get_loop_body (loop);
>for (i = 0; i < loop->num_nodes; i++)
>  fprintf (file, " %d", bbs[i]->index);
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index 4d2fd4b6af5..269694c7962 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -411,6 +411,7 @@ extern unsigned expected_loop_iterations (class loop *);
>  extern rtx doloop_condition_get (rtx_insn *);
>
>  void mark_loop_for_removal (loop_p);
> +void print_loop_info (FILE *file, const class loop *loop, const char *);
>
>  /* Induction variable analysis.  */
>
> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> index 7ccc2a5a5a7..a6c97a04662 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -8479,6 +8479,55 @@ print_loops_bb (FILE *file, basic_block bb, int 
> indent, int verbosity)
>  }
>  }
>
> +/* Print loop information.  */
> +
> +void
> +print_loop_info (FILE *file, const class loop *loop, const char *prefix)
> +{
> +  if (loop->can_be_parallel)
> +fprintf (file, ", can_be_parallel");
> +  if (loop->warned_aggressive_loop_optimizations)
> +fprintf (file, ", warned_aggressive_loop_optimizations");
> +  if (loop->dont_vectorize)
> +fprintf (file, ", dont_vectorize");
> +  if (loop->force_vectorize)
> +fprintf (file, ", force_vectorize");
> +  if (loop->in_oacc_kernels_region)
> +fprintf (file, ", in_oacc_kernels_region");
> +  if (loop->finite_p)
> +fprintf (file, ", finite_p");
> +  if (loop->unroll)
> +fprintf (file, "\n%sunroll %d", prefix, loop->unroll);
> +  if (loop->nb_iterations)
> +{
> +  fprintf (file, "\n%sniter ", prefix);
> +  print_generic_expr (file, loop->nb_iterations);
> +}
> +
> +  if (loop->any_upper_bound)
> +{
> +  fprintf (file, "\n%supper_bound ", prefix);
> +  print_decu (loop->nb_iterations_upper_bound, file);
> +}
> +  if (loop->any_likely_upper_bound)
> +{
> +  fprintf (file, "\n%slikely_upper_bound ", prefix);
> +  print_decu (loop->nb_iterations_likely_upper_bound, file);
> +}
> +
> +  if (loop->any_estimate)
> +{
> +  fprintf (file, "\n%sestimate ", prefix);
> +  print_decu (loop->nb_iterations_estimate, file);
> +}
> +  bool reliable;
> +  sreal iterations;
> +  if (loop->num && expected_loop_iterations_by_profile (loop, &iterations, 
> &reliable))
> +fprintf (file, "\n%siterations by profile: %f %s", prefix,
> +iterations.to_double (), reliable ? "(reliable)" : 
> "(unreliable)");
> +
> +}
> +
>  static void print_loop_and_siblings (FILE *, class loop *, int, int);
>
>  /* Pretty print LOOP on FILE, indented INDENT spaces.  Following
> @@ -8511,27 +8560,7 @@ print_loop (FILE *file, class loop *loop, int indent, 
> int verbosity)
>  fprintf (file, ", latch = %d", loop->latch->index);
>else
>  fprintf (file, ", multiple latches");

Re: [PATCH] cleanup: Change condition order

2023-07-21 Thread Lehua Ding
Committed, thanks Richard.


Bootstrap and regression passed.




-- Original --
From: "Richard Biener"
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625067.html
> 
> len mask stuff should be checked before mask.
> 
> So I reorder all condition order to check LEN MASK stuff before MASK.
> 
> This is the last clean up patch.
> 
> Boostrap and Regression is on the way.

OK.

> gcc/ChangeLog:
> 
>* tree-vect-stmts.cc (check_load_store_for_partial_vectors): Change 
condition order.
>(vectorizable_operation): Ditto.
> 
> ---
>  gcc/tree-vect-stmts.cc | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index d5b4f020332..2fe856db9ab 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1635,17 +1635,17 @@ check_load_store_for_partial_vectors 
(loop_vec_info loop_vinfo, tree vectype,
>    internal_fn len_ifn = (is_load
>    ? 
IFN_MASK_LEN_GATHER_LOAD
>    : 
IFN_MASK_LEN_SCATTER_STORE);
> -  if (internal_gather_scatter_fn_supported_p 
(ifn, vectype,
> +  if (internal_gather_scatter_fn_supported_p 
(len_ifn, vectype,
>     
gs_info->memory_type,
>     
gs_info->offset_vectype,
>     
gs_info->scale))
> -  vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype,
> -     scalar_mask);
> -  else if 
(internal_gather_scatter_fn_supported_p (len_ifn, vectype,
> +  vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
> +  else if 
(internal_gather_scatter_fn_supported_p (ifn, vectype,
>   
   gs_info->memory_type,
>   
   gs_info->offset_vectype,
>   
   gs_info->scale))
> -  vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, 1);
> +  vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype,
> +     scalar_mask);
>    else
>   {
>     if (dump_enabled_p ())
> @@ -6596,16 +6596,16 @@ vectorizable_operation (vec_info *vinfo,
>     && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P 
(loop_vinfo)
>     && mask_out_inactive)
>   {
> -    if (cond_fn != IFN_LAST
> -    && 
direct_internal_fn_supported_p (cond_fn, vectype,
> +    if (cond_len_fn != IFN_LAST
> +    && 
direct_internal_fn_supported_p (cond_len_fn, vectype,
>    OPTIMIZE_FOR_SPEED))
> -      vect_record_loop_mask (loop_vinfo, masks, ncopies * 
vec_num,
> -     vectype, NULL);
> -    else if (cond_len_fn != IFN_LAST
> -     && direct_internal_fn_supported_p 
(cond_len_fn, vectype,
> -    
OPTIMIZE_FOR_SPEED))
>       vect_record_loop_len (loop_vinfo, lens, 
ncopies * vec_num, vectype,
>     1);
> +    else if (cond_fn != IFN_LAST
> +     && direct_internal_fn_supported_p 
(cond_fn, vectype,
> +    
OPTIMIZE_FOR_SPEED))
> +      vect_record_loop_mask (loop_vinfo, masks, ncopies * 
vec_num,
> +     vectype, NULL);
>     else
>       {
>     if (dump_enabled_p ())
> 

-- 
Richard Biener 

Re: [PATCH] cleanup: make all cond_len_* and mask_len_* consistent on the order of mask and len

2023-07-21 Thread Lehua Ding
Committed, thanks Richard.


Bootstrap and regression passed.





-- Original --
From: "Richard Biener"
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625121.html
> 
> Hi, Richard and Richi.
> 
> This patch is to align the order of mask and len.
> 
> Currently, According to this piece code:
>if (final_len && final_mask)
>  call = gimple_build_call_internal (
>    IFN_LEN_MASK_GATHER_LOAD, 7, 
dataref_ptr,
>    vec_offset, scale, zero, final_mask, 
final_len,
>    bias);
> 
> You can see the order of mask and len, is {mask,len,bias}.
> "mask" comes before "len". The reason of this order is that we want to
> reuse the current codes of MASK_GATHER_LOAD/MASK_SCATTER_STORE.
> 
> Same situation for COND_LEN_*, we want to reuse the codes of COND_*.
> 
> Reusing codes from the existing MASK_* or COND_* can allow us not to
> change the codes too much and make the codes elegant and easy to maintain 
&& read.
> 
> To avoid any confusions of auto-vectorization patterns that includes both 
mask and len,
> 
> this patch align the order of mask and len for both Gimple IR and RTL 
pattern into
> 
> {mask, len, bias} to make everything cleaner and more elegant.
> 
> Bootstrap and Regression is on the way.

OK.

> gcc/ChangeLog:
> 
>* config/riscv/autovec.md: Align order of mask and len.
>* config/riscv/riscv-v.cc (expand_load_store): Ditto.
>(expand_gather_scatter): Ditto.
>* doc/md.texi: Ditto.
>* internal-fn.cc (add_len_and_mask_args): Ditto.
>(add_mask_and_len_args): Ditto.
>(expand_partial_load_optab_fn): Ditto.
>(expand_partial_store_optab_fn): Ditto.
>(expand_scatter_store_optab_fn): Ditto.
>(expand_gather_load_optab_fn): Ditto.
>(internal_fn_len_index): Ditto.
>(internal_fn_mask_index): Ditto.
>(internal_len_load_store_bias): Ditto.
>* tree-vect-stmts.cc (vectorizable_store): Ditto.
>(vectorizable_load): Ditto.
> 
> ---
>  gcc/config/riscv/autovec.md | 96 ++---
>  gcc/config/riscv/riscv-v.cc | 12 ++---
>  gcc/doc/md.texi | 36 +++---
>  gcc/internal-fn.cc | 50 +--
>  gcc/tree-vect-stmts.cc |  8 ++--
>  5 files changed, 101 insertions(+), 101 deletions(-)
> 
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 7eb96d42c18..d899922586a 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -25,9 +25,9 @@
>  (define_expand "mask_len_load

Re: [PATCH v4] Introduce attribute sym

2023-07-21 Thread Alexandre Oliva via Gcc-patches
On Jul 20, 2023, Richard Biener  wrote:

> I wonder if we could have shared some of the cgraph/varasm bits
> with the symver attribute handling?  It's just a new 'sym' but
> without the version part?

Possibly.  process_common_attributes could be a good place to create the
alias decl, like symver does.  But that wouldn't cover clones of C++
ctors and dtors, that get variants of the named sym, nor sym attributes
attached to C++ classes.  Aside from these special cases, it is an alias
declaration, without much else to do, which is not very much unlike
symver, but the named sym alias needs to be introduced in the symtab
early enough that other (non-sym) alias declarations can refer to it,
which symver doesn't need to worry about.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[committed] RISC-V: Fix redundant variable declaration.

2023-07-21 Thread Juzhe-Zhong
I noticed there are mistakes in the last RISC-V patch I made.
This fixes them.  Sorry about that.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_gather_scatter): Remove redundant 
variables.

---
 gcc/config/riscv/riscv-v.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index eb92451948e..31575428ecb 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3212,7 +3212,7 @@ prepare_gather_scatter (machine_mode vec_mode, 
machine_mode idx_mode,
 void
 expand_gather_scatter (rtx *ops, bool is_load)
 {
-  rtx ptr, vec_offset, vec_reg, len, mask;
+  rtx ptr, vec_offset, vec_reg;
   bool zero_extend_p;
   int scale_log2;
   rtx mask = ops[5];
-- 
2.36.1



Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Martin Jambor
Hello Lehua,

On Wed, Jul 12 2023, Lehua Ding wrote:
> Hi,
>
> This tiny patch add --append option to mklog.py that support add generated
> ChangeLog to the corresponding patch file. With this option there is no need
> to manually copy the generated ChangeLog to the patch file. e.g.:
>
> Run `mklog.py -a /path/to/this/patch` will add the generated ChangeLog
>
> ```
> contrib/ChangeLog:
>
>   * mklog.py:
> ```

this patch caused flake8 to complain about contrib/mklog.py:

$ flake8 contrib/mklog.py
contrib/mklog.py:377:80: E501 line too long (85 > 79 characters)
contrib/mklog.py:388:26: E127 continuation line over-indented for visual indent
contrib/mklog.py:388:36: W605 invalid escape sequence '\s'
contrib/mklog.py:388:40: W605 invalid escape sequence '\s'
contrib/mklog.py:388:44: W605 invalid escape sequence '\s'
contrib/mklog.py:388:47: W605 invalid escape sequence '\|'
contrib/mklog.py:388:49: W605 invalid escape sequence '\s'
contrib/mklog.py:388:51: W605 invalid escape sequence '\d'
contrib/mklog.py:388:54: W605 invalid escape sequence '\s'
contrib/mklog.py:388:58: W605 invalid escape sequence '\-'

Can you please have a look and ideally fix the issues?

Thanks,

Martin


>
> to the right place of the /path/to/this/patch file.
>
> Best,
> Lehua
>
> contrib/ChangeLog:
>
>   * mklog.py: Add --append option.
>
> ---
>  contrib/mklog.py | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/contrib/mklog.py b/contrib/mklog.py
> index 777212c98d7..26230b9b4f2 100755
> --- a/contrib/mklog.py
> +++ b/contrib/mklog.py
> @@ -358,6 +358,8 @@ if __name__ == '__main__':
>   'file')
>  parser.add_argument('--update-copyright', action='store_true',
>  help='Update copyright in ChangeLog files')
> +parser.add_argument('-a', '--append', action='store_true',
> +help='Append the generate ChangeLog to the patch 
> file')
>  args = parser.parse_args()
>  if args.input == '-':
>  args.input = None
> @@ -370,7 +372,30 @@ if __name__ == '__main__':
>  else:
>  output = generate_changelog(data, args.no_functions,
>  args.fill_up_bug_titles, args.pr_numbers)
> -if args.changelog:
> +if args.append:
> +if (not args.input):
> +raise Exception("`-a or --append` option not support 
> standard input")
> +lines = []
> +with open(args.input, 'r', newline='\n') as f:
> +# 1 -> not find the possible start of diff log
> +# 2 -> find the possible start of diff log
> +# 3 -> finish add ChangeLog to the patch file
> +maybe_diff_log = 1
> +for line in f:
> +if maybe_diff_log == 1 and line == "---\n":
> +maybe_diff_log = 2
> +elif maybe_diff_log == 2 and \
> + re.match("\s[^\s]+\s+\|\s\d+\s[+\-]+\n", line):
> +lines += [output, "---\n", line]
> +maybe_diff_log = 3
> +else:
> +# the possible start is not the true start.
> +if maybe_diff_log == 2:
> +maybe_diff_log = 1
> +lines.append(line)
> +with open(args.input, "w") as f:
> +f.writelines(lines)
> +elif args.changelog:
>  lines = open(args.changelog).read().split('\n')
>  start = list(takewhile(skip_line_in_changelog, lines))
>  end = lines[len(start):]
> -- 
> 2.36.1


[PATCH V2] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

This patch supports floating-point in-order reduction for loop length control.

Consider this following case:

float foo (float *__restrict a, int n)
{
  float result = 1.0;
  for (int i = 0; i < n; i++)
   result += a[i];
  return result;
}

When compiling with **NO** -ffast-math on ARM SVE, we end up with:

loop_mask = WHILE_ULT
result = MASK_FOLD_LEFT_PLUS (...loop_mask...)

For RVV, we use length-based loop control instead of a mask:

So, with this patch, we expect to see:

loop_len = SELECT_VL
result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)

gcc/ChangeLog:

* tree-vect-loop.cc (get_masked_reduction_fn): Add mask_len_fold_left.
(vectorize_fold_left_reduction): Ditto.
(vectorizable_reduction): Ditto.
(vect_transform_reduction): Ditto.

---
 gcc/tree-vect-loop.cc | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b44fb9c7712..59ab7879d55 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6800,11 +6800,13 @@ static internal_fn
 get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
 {
   internal_fn mask_reduc_fn;
+  internal_fn mask_len_reduc_fn;
 
   switch (reduc_fn)
 {
 case IFN_FOLD_LEFT_PLUS:
   mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
+  mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
   break;
 
 default:
@@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree 
vectype_in)
   if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
  OPTIMIZE_FOR_SPEED))
 return mask_reduc_fn;
+  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
+ OPTIMIZE_FOR_SPEED))
+return mask_len_reduc_fn;
   return IFN_LAST;
 }
 
@@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   gimple *reduc_def_stmt,
   tree_code code, internal_fn reduc_fn,
   tree ops[3], tree vectype_in,
-  int reduc_index, vec_loop_masks *masks)
+  int reduc_index, vec_loop_masks *masks,
+  vec_loop_lens *lens)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 {
   gimple *new_stmt;
   tree mask = NULL_TREE;
+  tree len = NULL_TREE;
+  tree bias = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, 
i);
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+   {
+ len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
+  i, 1);
+ signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+ bias = build_int_cst (intQI_type_node, biasval);
+ mask = build_minus_one_cst (truth_type_for (vectype_in));
+   }
 
   /* Handle MINUS by adding the negative.  */
   if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
@@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 the preceding operation.  */
   if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
{
- if (mask && mask_reduc_fn != IFN_LAST)
+ if (len && mask && mask_reduc_fn != IFN_LAST)
+   new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
+  def0, mask, len, bias);
+ else if (mask && mask_reduc_fn != IFN_LAST)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
   def0, mask);
  else
@@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   else if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
 {
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   internal_fn cond_fn = get_conditional_internal_fn (op.code, op.type);
 
   if (reduction_type != FOLD_LEFT_REDUCTION
@@ -8006,8 +8026,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
   else
-   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-  vectype_in, NULL);
+   {
+ internal_fn mask_reduc_fn
+   = get_masked_reduction_fn (reduc_fn, vectype_in);
+
+ if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num,
+ vectype_in, 1);
+ else
+   ve

Re: Re: [PATCH] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe.zh...@rivai.ai
Hi all.  After all the previous cleanup patches,
everything related to "mask && len" is consistent now.

I rebased onto trunk and sent the V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625159.html 

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-20 15:21
To: Robin Dapp
CC: juzhe.zhong; gcc-patches; richard.sandiford
Subject: Re: [PATCH] VECT: Support floating-point in-order reduction for length 
loop control
On Thu, 20 Jul 2023, Robin Dapp wrote:
 
> Hi Juzhe,
> 
> I just noticed that we recently started calling things MASK_LEN
> (instead of LEN_MASK before) with the reductions.  Wouldn't we want
> to be consistent here?  Especially as the length takes precedence.
> I realize the preparational work like optabs is already upstream
> but still wanted to bring it up.
 
Didn't notice that but yes, consistency would be nice to have.
 
Richard.
 


[PATCH] ira: update allocated_hardreg_p[] in improve_allocation() [PR110254]

2023-07-21 Thread Surya Kumari Jangala via Gcc-patches
The improve_allocation() routine does not update the
allocated_hardreg_p[] array after an allocno is assigned a register.

If the register chosen in improve_allocation() is one that already has
been assigned to a conflicting allocno, then allocated_hardreg_p[]
already has the corresponding bit set to TRUE, so nothing needs to be
done.

But improve_allocation() can also choose a register that has not been
assigned to a conflicting allocno, and also has not been assigned to any
other allocno. In this case, allocated_hardreg_p[] has to be updated.

2023-07-21  Surya Kumari Jangala  

gcc/
PR rtl-optimization/110254
* ira-color.cc (improve_allocation): Update the allocated_hardreg_p array.
---

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 1fb2958bddd..5807d6d26f6 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -3340,6 +3340,10 @@ improve_allocation (void)
}
   /* Assign the best chosen hard register to A.  */
   ALLOCNO_HARD_REGNO (a) = best;
+
+  for (j = nregs - 1; j >= 0; j--)
+   allocated_hardreg_p[best + j] = true;
+
   if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
fprintf (ira_dump_file, "Assigning %d to a%dr%d\n",
 best, ALLOCNO_NUM (a), ALLOCNO_REGNO (a));


Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Lehua Ding
Hi Martin,


> this patch caused flake8 to complain about contrib/mklog.py:
> 
> $ flake8 contrib/mklog.py
> contrib/mklog.py:377:80: E501 line too long (85 > 79 characters)
> contrib/mklog.py:388:26: E127 continuation line over-indented for 
visual indent
> contrib/mklog.py:388:36: W605 invalid escape sequence '\s'
> contrib/mklog.py:388:40: W605 invalid escape sequence '\s'
> contrib/mklog.py:388:44: W605 invalid escape sequence '\s'
> contrib/mklog.py:388:47: W605 invalid escape sequence '\|'
> contrib/mklog.py:388:49: W605 invalid escape sequence '\s'
> contrib/mklog.py:388:51: W605 invalid escape sequence '\d'
> contrib/mklog.py:388:54: W605 invalid escape sequence '\s'
> contrib/mklog.py:388:58: W605 invalid escape sequence '\-'
> 
> Can you please have a look and ideally fix the issues?


Thank you for pointing this out.
I will fix these format errors in another fix patch [1].
I tried to fix the following format error but couldn't
find a way; do you know how to fix it?



contrib/mklog.py:388:26: E127 continuation line over-indented for visual indent


Best,
Lehua


[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624880.html

Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Martin Jambor
Hello Lehua,

On Fri, Jul 21 2023, Lehua Ding wrote:
> Hi Martin,
>
>
> > this patch caused flake8 to complain about contrib/mklog.py:
> > 
> > $ flake8 contrib/mklog.py
> > contrib/mklog.py:377:80: E501 line too long (85 > 79 characters)
> > contrib/mklog.py:388:26: E127 continuation line over-indented for 
> visual indent
> > contrib/mklog.py:388:36: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:40: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:44: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:47: W605 invalid escape sequence '\|'
> > contrib/mklog.py:388:49: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:51: W605 invalid escape sequence '\d'
> > contrib/mklog.py:388:54: W605 invalid escape sequence '\s'
> > contrib/mklog.py:388:58: W605 invalid escape sequence '\-'
> > 
> > Can you please have a look and ideally fix the issues?
>
>
> Thank you for pointing out this.
> I will fix these format errors in another fix patch[1].

Thanks!

> I tried to fix the following format error but couldn't
> find a way, do you know how to fix this error?
>
>
> contrib/mklog.py:388:26: E127 continuation line over-indented for visual 
> indent

I am no python expert but the following seems to work:

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 26230b9b4f2..2563d19bc99 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -384,8 +384,8 @@ if __name__ == '__main__':
 for line in f:
 if maybe_diff_log == 1 and line == "---\n":
 maybe_diff_log = 2
-elif maybe_diff_log == 2 and \
- re.match("\s[^\s]+\s+\|\s\d+\s[+\-]+\n", line):
+elif (maybe_diff_log == 2 and
+  re.match("\s[^\s]+\s+\|\s\d+\s[+\-]+\n", line)):
 lines += [output, "---\n", line]
 maybe_diff_log = 3
 else:

Martin


Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Lehua Ding
> I am no python expert but the following seems to work:


Thank you so much, it works for me.


Lehua

Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread Richard Biener via Gcc-patches
On Fri, 21 Jul 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richard and Richi.
> 
> This patch support floating-point in-order reduction for loop length control.
> 
> Consider this following case:
> 
> float foo (float *__restrict a, int n)
> {
>   float result = 1.0;
>   for (int i = 0; i < n; i++)
>result += a[i];
>   return result;
> }
> 
> When compile with **NO** -ffast-math on ARM SVE, we will end up with:
> 
> loop_mask = WHILE_ULT
> result = MASK_FOLD_LEFT_PLUS (...loop_mask...)
> 
> For RVV, we don't use length loop control instead of mask:
> 
> So, with this patch, we expect to see:
> 
> loop_len = SELECT_VL
> result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop.cc (get_masked_reduction_fn): Add mask_len_fold_left.
>   (vectorize_fold_left_reduction): Ditto.
>   (vectorizable_reduction): Ditto.
>   (vect_transform_reduction): Ditto.
> 
> ---
>  gcc/tree-vect-loop.cc | 41 -
>  1 file changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b44fb9c7712..59ab7879d55 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -6800,11 +6800,13 @@ static internal_fn
>  get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
>  {
>internal_fn mask_reduc_fn;
> +  internal_fn mask_len_reduc_fn;
>  
>switch (reduc_fn)
>  {
>  case IFN_FOLD_LEFT_PLUS:
>mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
> +  mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
>break;
>  
>  default:
> @@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree 
> vectype_in)
>if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
> OPTIMIZE_FOR_SPEED))
>  return mask_reduc_fn;
> +  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
> +   OPTIMIZE_FOR_SPEED))
> +return mask_len_reduc_fn;
>return IFN_LAST;
>  }
>  
> @@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
>  gimple *reduc_def_stmt,
>  tree_code code, internal_fn reduc_fn,
>  tree ops[3], tree vectype_in,
> -int reduc_index, vec_loop_masks *masks)
> +int reduc_index, vec_loop_masks *masks,
> +vec_loop_lens *lens)
>  {
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> @@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info 
> loop_vinfo,
>  {
>gimple *new_stmt;
>tree mask = NULL_TREE;
> +  tree len = NULL_TREE;
> +  tree bias = NULL_TREE;
>if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>   mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, 
> i);
> +  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> + {
> +   len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
> +i, 1);
> +   signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> +   bias = build_int_cst (intQI_type_node, biasval);
> +   mask = build_minus_one_cst (truth_type_for (vectype_in));
> + }
>  
>/* Handle MINUS by adding the negative.  */
>if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
> @@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info 
> loop_vinfo,
>the preceding operation.  */
>if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
>   {
> -   if (mask && mask_reduc_fn != IFN_LAST)
> +   if (len && mask && mask_reduc_fn != IFN_LAST)

check mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS instead?

> + new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
> +def0, mask, len, bias);
> +   else if (mask && mask_reduc_fn != IFN_LAST)

Likewise.

Otherwise looks good to me.

Richard.

>   new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
>  def0, mask);
> else
> @@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>else if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
>  {
>vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>internal_fn cond_fn = get_conditional_internal_fn (op.code, op.type);
>  
>if (reduction_type != FOLD_LEFT_REDUCTION
> @@ -8006,8 +8026,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>   }
>else
> - vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
> -  

Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Lehua Ding
Hi Martin,


By the way, is there a standard format required for these Python files?
I see that other Python files have similar format errors when checked
using flake8.  If so, it feels necessary to configure a git hook on the
git server to do this check.


Best,
Lehua

[PATCH V3] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

This patch supports floating-point in-order reduction for loop length control.

Consider this following case:

float foo (float *__restrict a, int n)
{
  float result = 1.0;
  for (int i = 0; i < n; i++)
   result += a[i];
  return result;
}

When compiling with **NO** -ffast-math on ARM SVE, we end up with:

loop_mask = WHILE_ULT
result = MASK_FOLD_LEFT_PLUS (...loop_mask...)

For RVV, we use length-based loop control instead of a mask:

So, with this patch, we expect to see:

loop_len = SELECT_VL
result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)

gcc/ChangeLog:

* tree-vect-loop.cc (get_masked_reduction_fn): Add 
mask_len_fold_left_plus.
(vectorize_fold_left_reduction): Ditto.
(vectorizable_reduction): Ditto.
(vect_transform_reduction): Ditto.

---
 gcc/tree-vect-loop.cc | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b44fb9c7712..9256bc17c9d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6800,11 +6800,13 @@ static internal_fn
 get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
 {
   internal_fn mask_reduc_fn;
+  internal_fn mask_len_reduc_fn;
 
   switch (reduc_fn)
 {
 case IFN_FOLD_LEFT_PLUS:
   mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
+  mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
   break;
 
 default:
@@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree 
vectype_in)
   if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
  OPTIMIZE_FOR_SPEED))
 return mask_reduc_fn;
+  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
+ OPTIMIZE_FOR_SPEED))
+return mask_len_reduc_fn;
   return IFN_LAST;
 }
 
@@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   gimple *reduc_def_stmt,
   tree_code code, internal_fn reduc_fn,
   tree ops[3], tree vectype_in,
-  int reduc_index, vec_loop_masks *masks)
+  int reduc_index, vec_loop_masks *masks,
+  vec_loop_lens *lens)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 {
   gimple *new_stmt;
   tree mask = NULL_TREE;
+  tree len = NULL_TREE;
+  tree bias = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, 
i);
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+   {
+ len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
+  i, 1);
+ signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+ bias = build_int_cst (intQI_type_node, biasval);
+ mask = build_minus_one_cst (truth_type_for (vectype_in));
+   }
 
   /* Handle MINUS by adding the negative.  */
   if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
@@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 the preceding operation.  */
   if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
{
- if (mask && mask_reduc_fn != IFN_LAST)
+ if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
+  def0, mask, len, bias);
+ else if (mask && mask_reduc_fn != IFN_LAST)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
   def0, mask);
  else
@@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   else if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
 {
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   internal_fn cond_fn = get_conditional_internal_fn (op.code, op.type);
 
   if (reduction_type != FOLD_LEFT_REDUCTION
@@ -8006,8 +8026,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
   else
-   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-  vectype_in, NULL);
+   {
+ internal_fn mask_reduc_fn
+   = get_masked_reduction_fn (reduc_fn, vectype_in);
+
+ if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num,
+ vectype_in, 1);
+ else
+   

Re: Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe.zh...@rivai.ai
Thanks Richi.

I addressed the comment in V3:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625167.html

Bootstrap and regression are on the way.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-21 18:51
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V2] VECT: Support floating-point in-order reduction for 
length loop control
On Fri, 21 Jul 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Hi, Richard and Richi.
> 
> This patch support floating-point in-order reduction for loop length control.
> 
> Consider this following case:
> 
> float foo (float *__restrict a, int n)
> {
>   float result = 1.0;
>   for (int i = 0; i < n; i++)
>result += a[i];
>   return result;
> }
> 
> When compile with **NO** -ffast-math on ARM SVE, we will end up with:
> 
> loop_mask = WHILE_ULT
> result = MASK_FOLD_LEFT_PLUS (...loop_mask...)
> 
> For RVV, we don't use length loop control instead of mask:
> 
> So, with this patch, we expect to see:
> 
> loop_len = SELECT_VL
> result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)
> 
> gcc/ChangeLog:
> 
> * tree-vect-loop.cc (get_masked_reduction_fn): Add mask_len_fold_left.
> (vectorize_fold_left_reduction): Ditto.
> (vectorizable_reduction): Ditto.
> (vect_transform_reduction): Ditto.
> 
> ---
>  gcc/tree-vect-loop.cc | 41 -
>  1 file changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b44fb9c7712..59ab7879d55 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -6800,11 +6800,13 @@ static internal_fn
>  get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
>  {
>internal_fn mask_reduc_fn;
> +  internal_fn mask_len_reduc_fn;
>  
>switch (reduc_fn)
>  {
>  case IFN_FOLD_LEFT_PLUS:
>mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
> +  mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
>break;
>  
>  default:
> @@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree 
> vectype_in)
>if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
>OPTIMIZE_FOR_SPEED))
>  return mask_reduc_fn;
> +  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
> +   OPTIMIZE_FOR_SPEED))
> +return mask_len_reduc_fn;
>return IFN_LAST;
>  }
>  
> @@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
> gimple *reduc_def_stmt,
> tree_code code, internal_fn reduc_fn,
> tree ops[3], tree vectype_in,
> -int reduc_index, vec_loop_masks *masks)
> +int reduc_index, vec_loop_masks *masks,
> +vec_loop_lens *lens)
>  {
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> @@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info 
> loop_vinfo,
>  {
>gimple *new_stmt;
>tree mask = NULL_TREE;
> +  tree len = NULL_TREE;
> +  tree bias = NULL_TREE;
>if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>  mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
> +  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> + {
> +   len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
> +i, 1);
> +   signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> +   bias = build_int_cst (intQI_type_node, biasval);
> +   mask = build_minus_one_cst (truth_type_for (vectype_in));
> + }
>  
>/* Handle MINUS by adding the negative.  */
>if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
> @@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info 
> loop_vinfo,
>  the preceding operation.  */
>if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
>  {
> -   if (mask && mask_reduc_fn != IFN_LAST)
> +   if (len && mask && mask_reduc_fn != IFN_LAST)
 
check mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS instead?
 
> + new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
> +def0, mask, len, bias);
> +   else if (mask && mask_reduc_fn != IFN_LAST)
 
Likewise.
 
Otherwise looks good to me.
 
Richard.
 
>  new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
> def0, mask);
>else
> @@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>else if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
>  {
>vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
>internal_fn cond_fn = get_conditional_internal_fn (op.code, op.type);
>  
>if (reduction_type != FOLD_LEFT_REDUCTION
> @@ -8006,8 +8026,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>  }
>else
> - vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
> -vectype_in, NULL);
> + {
> +   internal_fn m

[PATCH V4] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

This patch supports floating-point in-order reduction for loop length control.

Consider this following case:

float foo (float *__restrict a, int n)
{
  float result = 1.0;
  for (int i = 0; i < n; i++)
   result += a[i];
  return result;
}

When compiling with **NO** -ffast-math on ARM SVE, we end up with:

loop_mask = WHILE_ULT
result = MASK_FOLD_LEFT_PLUS (...loop_mask...)

For RVV, we use length-based loop control instead of a mask:

So, with this patch, we expect to see:

loop_len = SELECT_VL
result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)

gcc/ChangeLog:

* tree-vect-loop.cc (get_masked_reduction_fn): Add 
mask_len_fold_left_plus.
(vectorize_fold_left_reduction): Ditto.
(vectorizable_reduction): Ditto.
(vect_transform_reduction): Ditto.

---
 gcc/tree-vect-loop.cc | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b44fb9c7712..3b296d41157 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6800,11 +6800,13 @@ static internal_fn
 get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
 {
   internal_fn mask_reduc_fn;
+  internal_fn mask_len_reduc_fn;
 
   switch (reduc_fn)
 {
 case IFN_FOLD_LEFT_PLUS:
   mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
+  mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
   break;
 
 default:
@@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree 
vectype_in)
   if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
  OPTIMIZE_FOR_SPEED))
 return mask_reduc_fn;
+  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
+ OPTIMIZE_FOR_SPEED))
+return mask_len_reduc_fn;
   return IFN_LAST;
 }
 
@@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   gimple *reduc_def_stmt,
   tree_code code, internal_fn reduc_fn,
   tree ops[3], tree vectype_in,
-  int reduc_index, vec_loop_masks *masks)
+  int reduc_index, vec_loop_masks *masks,
+  vec_loop_lens *lens)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 {
   gimple *new_stmt;
   tree mask = NULL_TREE;
+  tree len = NULL_TREE;
+  tree bias = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, 
i);
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+   {
+ len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
+  i, 1);
+ signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+ bias = build_int_cst (intQI_type_node, biasval);
+ mask = build_minus_one_cst (truth_type_for (vectype_in));
+   }
 
   /* Handle MINUS by adding the negative.  */
   if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
@@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 the preceding operation.  */
   if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
{
- if (mask && mask_reduc_fn != IFN_LAST)
+ if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
+  def0, mask, len, bias);
+ else if (mask_reduc_fn == IFN_MASK_FOLD_LEFT_PLUS)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
   def0, mask);
  else
@@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   else if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
 {
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   internal_fn cond_fn = get_conditional_internal_fn (op.code, op.type);
 
   if (reduction_type != FOLD_LEFT_REDUCTION
@@ -8006,8 +8026,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
   else
-   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-  vectype_in, NULL);
+   {
+ internal_fn mask_reduc_fn
+   = get_masked_reduction_fn (reduc_fn, vectype_in);
+
+ if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num,
+ vectype_in, 1);
+ el

Re: Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe.zh...@rivai.ai
Oh, sorry for missing a fix.  Now I have fixed it as you suggested in V4:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625169.html 

Change it as follows:

  if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
   def0, mask, len, bias);
  else if (mask_reduc_fn == IFN_MASK_FOLD_LEFT_PLUS)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
   def0, mask);
  else
new_stmt = gimple_build_call_internal (reduc_fn, 2, reduc_var,
   def0);

Sorry about that.

Bootstrap && regression are running.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-21 18:51
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V2] VECT: Support floating-point in-order reduction for 
length loop control
On Fri, 21 Jul 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Hi, Richard and Richi.
> 
> This patch support floating-point in-order reduction for loop length control.
> 
> Consider this following case:
> 
> float foo (float *__restrict a, int n)
> {
>   float result = 1.0;
>   for (int i = 0; i < n; i++)
>result += a[i];
>   return result;
> }
> 
> When compile with **NO** -ffast-math on ARM SVE, we will end up with:
> 
> loop_mask = WHILE_ULT
> result = MASK_FOLD_LEFT_PLUS (...loop_mask...)
> 
> For RVV, we don't use length loop control instead of mask:
> 
> So, with this patch, we expect to see:
> 
> loop_len = SELECT_VL
> result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)
> 
> gcc/ChangeLog:
> 
> * tree-vect-loop.cc (get_masked_reduction_fn): Add mask_len_fold_left.
> (vectorize_fold_left_reduction): Ditto.
> (vectorizable_reduction): Ditto.
> (vect_transform_reduction): Ditto.
> 
> ---
>  gcc/tree-vect-loop.cc | 41 -
>  1 file changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b44fb9c7712..59ab7879d55 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -6800,11 +6800,13 @@ static internal_fn
>  get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
>  {
>internal_fn mask_reduc_fn;
> +  internal_fn mask_len_reduc_fn;
>  
>switch (reduc_fn)
>  {
>  case IFN_FOLD_LEFT_PLUS:
>mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
> +  mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
>break;
>  
>  default:
> @@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree 
> vectype_in)
>if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
>OPTIMIZE_FOR_SPEED))
>  return mask_reduc_fn;
> +  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
> +   OPTIMIZE_FOR_SPEED))
> +return mask_len_reduc_fn;
>return IFN_LAST;
>  }
>  
> @@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
> gimple *reduc_def_stmt,
> tree_code code, internal_fn reduc_fn,
> tree ops[3], tree vectype_in,
> -int reduc_index, vec_loop_masks *masks)
> +int reduc_index, vec_loop_masks *masks,
> +vec_loop_lens *lens)
>  {
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
> @@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info 
> loop_vinfo,
>  {
>gimple *new_stmt;
>tree mask = NULL_TREE;
> +  tree len = NULL_TREE;
> +  tree bias = NULL_TREE;
>if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
>  mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
> +  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
> + {
> +   len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
> +i, 1);
> +   signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> +   bias = build_int_cst (intQI_type_node, biasval);
> +   mask = build_minus_one_cst (truth_type_for (vectype_in));
> + }
>  
>/* Handle MINUS by adding the negative.  */
>if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
> @@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info 
> loop_vinfo,
>  the preceding operation.  */
>if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
>  {
> -   if (mask && mask_reduc_fn != IFN_LAST)
> +   if (len && mask && mask_reduc_fn != IFN_LAST)
 
check mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS instead?
 
> + new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
> +def0, mask, len, bias);
> +   else if (mask && mask_reduc_fn != IFN_LAST)
 
Likewise.
 
Otherwise looks good to me.
 
Richard.
 
>  new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
> def0, mask);
>else
> @@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,

[PATCH v2] mklog: handle Signed-Off-By, minor cleanup

2023-07-21 Thread Marc Poulhiès via Gcc-patches
Consider Signed-Off-By lines as part of the ending of the initial
commit to avoid having these in the middle of the log when the
changelog part is injected after.

This is particularly useful with:

 $ git gcc-commit-mklog --amend -s

that can be used to create the changelog and add the Signed-Off-By line.

Also applies most of the shellcheck suggestions on the
prepare-commit-msg hook.

contrib/ChangeLog:

* mklog.py: Leave SOB lines after changelog.
* prepare-commit-msg: Apply most shellcheck suggestions.

Signed-off-by: Marc Poulhiès 
---
Previous version was missing the ChangeLog.

This command is used in particular during development of the Rust
language frontend (see r13-7099-g4b25fc15b925f8 as an example
of a SoB ending up in the middle of the commit message).

Ok for master?

 contrib/mklog.py   | 34 +-
 contrib/prepare-commit-msg | 20 ++--
 2 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 777212c98d7..e5cc69e0d0a 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -41,7 +41,34 @@ from unidiff import PatchSet
 
 LINE_LIMIT = 100
 TAB_WIDTH = 8
-CO_AUTHORED_BY_PREFIX = 'co-authored-by: '
+
+# Initial commit:
+#   +--+
+#   | gccrs: Some title|
+#   |  | This is the "start"
+#   | This is some text explaining the commit. |
+#   | There can be several lines.  |
+#   |  |<--->
+#   | Signed-off-by: My Name  | This is the "end"
+#   +--+
+#
+# Results in:
+#   +--+
+#   | gccrs: Some title|
+#   |  |
+#   | This is some text explaining the commit. | This is the "start"
+#   | There can be several lines.  |
+#   |  |<--->
+#   | gcc/rust/ChangeLog:  |
+#   |  | This is the generated
+#   | * some_file (bla):   | ChangeLog part
+#   | (foo):   |
+#   |  |<--->
+#   | Signed-off-by: My Name  | This is the "end"
+#   +--+
+
+# this regex matches the first line of the "end" in the initial commit message
+FIRST_LINE_OF_END_RE = re.compile('(?i)^(signed-off-by|co-authored-by|#): ')
 
 pr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PPR [a-z+-]+\/[0-9]+)')
 prnum_regex = re.compile(r'PR (?P[a-z+-]+)/(?P[0-9]+)')
@@ -330,10 +357,7 @@ def update_copyright(data):
 
 
 def skip_line_in_changelog(line):
-if line.lower().startswith(CO_AUTHORED_BY_PREFIX) or line.startswith('#'):
-return False
-return True
-
+return FIRST_LINE_OF_END_RE.match(line) == None
 
 if __name__ == '__main__':
 extra_args = os.getenv('GCC_MKLOG_ARGS')
diff --git a/contrib/prepare-commit-msg b/contrib/prepare-commit-msg
index 48c9dad3c6f..1e94706ba40 100755
--- a/contrib/prepare-commit-msg
+++ b/contrib/prepare-commit-msg
@@ -32,11 +32,11 @@ if ! [ -f "$COMMIT_MSG_FILE" ]; then exit 0; fi
 # Don't do anything unless requested to.
 if [ -z "$GCC_FORCE_MKLOG" ]; then exit 0; fi
 
-if [ -z "$COMMIT_SOURCE" ] || [ $COMMIT_SOURCE = template ]; then
+if [ -z "$COMMIT_SOURCE" ] || [ "$COMMIT_SOURCE" = template ]; then
 # No source or "template" means new commit.
 cmd="diff --cached"
 
-elif [ $COMMIT_SOURCE = message ]; then
+elif [ "$COMMIT_SOURCE" = message ]; then
 # "message" means -m; assume a new commit if there are any changes staged.
 if ! git diff --cached --quiet; then
cmd="diff --cached"
@@ -44,23 +44,23 @@ elif [ $COMMIT_SOURCE = message ]; then
cmd="diff --cached HEAD^"
 fi
 
-elif [ $COMMIT_SOURCE = commit ]; then
+elif [ "$COMMIT_SOURCE" = commit ]; then
 # The message of an existing commit.  If it's HEAD, assume --amend;
 # otherwise, assume a new commit with -C.
-if [ $SHA1 = HEAD ]; then
+if [ "$SHA1" = HEAD ]; then
cmd="diff --cached HEAD^"
if [ "$(git config gcc-config.mklog-hook-type)" = "smart-amend" ]; then
# Check if the existing message still describes the staged changes.
f=$(mktemp /tmp/git-commit.XX) || exit 1
-   git log -1 --pretty=email HEAD > $f
-   printf '\n---\n\n' >> $f
-   git $cmd >> $f
+   git log -1 --pretty=email HEAD > "$f"
+   printf '\n---\n\n' >> "$f"
+   git $cmd >> "$f"
if contrib/gcc-changelog/git_email.py "$f" >/dev/null 2>&1; then
  

[C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-07-21 Thread Martin Uecker via Gcc-patches



This patch adds a warning for allocations with insufficient size
based on the "alloc_size" attribute and the type of the pointer 
the result is assigned to. While it is theoretically legal to
assign to the wrong pointer type and cast it to the right type
later, this almost always indicates an error. Since this catches
common mistakes and is simple to diagnose, it is suggested to
add this warning.
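
As a minimal illustration (hand-written here, not taken from the patch
or its testcase), the classic sizeof-of-the-pointer mistake is
diagnosed while the correct form is not:

  #include <stdlib.h>

  struct buf { int data[16]; };

  void f (void)
  {
    struct buf *p = malloc (sizeof (p));   /* warning: allocation of insufficient size */
    struct buf *q = malloc (sizeof (*q));  /* no warning */
  }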
 

Bootstrapped and regression tested on x86. 


Martin



Add option Walloc-type that warns about allocations that have
insufficient storage for the target type of the pointer the
storage is assigned to.

gcc:
* doc/invoke.texi: Document -Walloc-type option.

gcc/c-family:

* c.opt (Walloc-type): New option.

gcc/c:
* c-typeck.cc (convert_for_assignment): Add Walloc-type warning.

gcc/testsuite:

* gcc.dg/Walloc-type-1.c: New test.


diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 4abdc8d0e77..8b9d148582b 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -319,6 +319,10 @@ Walloca
 C ObjC C++ ObjC++ Var(warn_alloca) Warning
 Warn on any use of alloca.
 
+Walloc-type
+C ObjC Var(warn_alloc_type) Warning
+Warn when allocating insufficient storage for the target type of the
assigned pointer.
+
 Walloc-size-larger-than=
 C ObjC C++ LTO ObjC++ Var(warn_alloc_size_limit) Joined Host_Wide_Int
ByteSize Warning Init(HOST_WIDE_INT_MAX)
 -Walloc-size-larger-than=   Warn for calls to allocation
functions that
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 7cf411155c6..2e392f9c952 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -7343,6 +7343,32 @@ convert_for_assignment (location_t location,
location_t expr_loc, tree type,
"request for implicit conversion "
"from %qT to %qT not permitted in C++", rhstype,
type);
 
+  /* Warn if new allocations are not big enough for the target
type.  */
+  tree fndecl;
+  if (warn_alloc_type
+ && TREE_CODE (rhs) == CALL_EXPR
+ && (fndecl = get_callee_fndecl (rhs)) != NULL_TREE
+ && DECL_IS_MALLOC (fndecl))
+   {
+ tree fntype = TREE_TYPE (fndecl);
+ tree fntypeattrs = TYPE_ATTRIBUTES (fntype);
+ tree alloc_size = lookup_attribute ("alloc_size",
fntypeattrs);
+ if (alloc_size)
+   {
+ tree args = TREE_VALUE (alloc_size);
+ int idx = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1;
+ /* For calloc only use the second argument.  */
+ if (TREE_CHAIN (args))
+   idx = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN
(args))) - 1;
+ tree arg = CALL_EXPR_ARG (rhs, idx);
+ if (TREE_CODE (arg) == INTEGER_CST
+ && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
+warning_at (location, OPT_Walloc_type, "allocation of
"
+"insufficient size %qE for type %qT with
"
+"size %qE", arg, ttl, TYPE_SIZE_UNIT
(ttl));
+   }
+   }
+
   /* See if the pointers point to incompatible address spaces.  */
   asl = TYPE_ADDR_SPACE (ttl);
   asr = TYPE_ADDR_SPACE (ttr);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 88e3c625030..6869bed64c3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8076,6 +8076,15 @@ always leads to a call to another @code{cold}
function such as wrappers of
 C++ @code{throw} or fatal error reporting functions leading to
@code{abort}.
 @end table
 
+@opindex Wno-alloc-type
+@opindex Walloc-type
+@item -Walloc-type
+Warn about calls to allocation functions decorated with attribute
+@code{alloc_size} that specify insufficient size for the target type
of
+the pointer the result is assigned to, including those to the built-in
+forms of the functions @code{aligned_alloc}, @code{alloca},
@code{calloc},
+@code{malloc}, and @code{realloc}.
+
 @opindex Wno-alloc-zero
 @opindex Walloc-zero
 @item -Walloc-zero
diff --git a/gcc/testsuite/gcc.dg/Walloc-type-1.c
b/gcc/testsuite/gcc.dg/Walloc-type-1.c
new file mode 100644
index 000..bc62e5e9aa3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Walloc-type-1.c
@@ -0,0 +1,37 @@
+/* Tests the warnings for insufficient allocation size. 
+   { dg-do compile }
+ * { dg-options "-Walloc-type" } 
+ * */
+#include 
+#include 
+
+struct b { int x[10]; };
+
+void fo0(void)
+{
+struct b *p = malloc(sizeof *p);
+}
+
+void fo1(void)
+{
+struct b *p = malloc(sizeof p);/* { dg-
warning "allocation of insufficient size" } */
+}
+
+void fo2(void)
+{
+struct b *p = alloca(sizeof p);/* { dg-
warning "allocation of insufficient size" } */
+}
+
+void fo3(void)
+{
+struct b *p = calloc(1, sizeof p); /* { dg-warning
"allocation of insufficient size" } */
+}
+
+void g(struct b* p);
+
+void fo4(void)
+{
+g(malloc(4));  /* { dg-warning "allocation of
insufficient size" } */
+}
+
+





finite_loop_p tweak

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi,
we have a finite_p flag in the loop structure.  finite_loop_p already knows
to use it, but we can also set the flag when we prove a loop to be finite by
SCEV analysis, to avoid duplicated work.
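
For instance (an illustrative example only), for a simple counted loop like

  for (int i = 0; i < 100; i++)
    a[i] = i;

niter analysis derives an upper bound, and with the patch the result is
also recorded in loop->finite_p, so repeated finite_loop_p queries take
the cheap path instead of redoing the analysis.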

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

* tree-ssa-loop-niter.cc (finite_loop_p): Reorder to do cheap
tests first; update finite_p flag.

diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 3c4e66291fb..e5985bee235 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -3338,24 +3338,6 @@ finite_loop_p (class loop *loop)
   widest_int nit;
   int flags;
 
-  flags = flags_from_decl_or_type (current_function_decl);
-  if ((flags & (ECF_CONST|ECF_PURE)) && !(flags & ECF_LOOPING_CONST_OR_PURE))
-{
-  if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "Found loop %i to be finite: it is within pure or 
const function.\n",
-loop->num);
-  return true;
-}
-
-  if (loop->any_upper_bound
-  || max_loop_iterations (loop, &nit))
-{
-  if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "Found loop %i to be finite: upper bound found.\n",
-loop->num);
-  return true;
-}
-
   if (loop->finite_p)
 {
   unsigned i;
@@ -3368,11 +3350,36 @@ finite_loop_p (class loop *loop)
  {
if (dump_file)
  fprintf (dump_file, "Assume loop %i to be finite: it has an exit "
-  "and -ffinite-loops is on.\n", loop->num);
+  "and -ffinite-loops is on or loop was
+  " previously finite.\n",
+  loop->num);
return true;
  }
 }
 
+  flags = flags_from_decl_or_type (current_function_decl);
+  if ((flags & (ECF_CONST|ECF_PURE)) && !(flags & ECF_LOOPING_CONST_OR_PURE))
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Found loop %i to be finite: it is within "
+"pure or const function.\n",
+loop->num);
+  loop->finite_p = true;
+  return true;
+}
+
+  if (loop->any_upper_bound
+  /* Loop with no normal exit will not pass max_loop_iterations.  */
+  || (!loop->finite_p && max_loop_iterations (loop, &nit)))
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Found loop %i to be finite: upper bound found.\n",
+loop->num);
+  loop->finite_p = true;
+  return true;
+}
+
   return false;
 }
 


Re: finite_loop_p tweak

2023-07-21 Thread Richard Biener via Gcc-patches
On Fri, Jul 21, 2023 at 1:45 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> we have finite_p flag in loop structure.  finite_loop_p already know to
> use it, but we also may set the flag when we prove loop to be finite by
> SCEV analysis to avoid duplicated work.
>
> Bootstrapped/regtested x86_64-linux, OK?

OK

> gcc/ChangeLog:
>
> * tree-ssa-loop-niter.cc (finite_loop_p): Reorder to do cheap
> tests first; update finite_p flag.
>
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 3c4e66291fb..e5985bee235 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -3338,24 +3338,6 @@ finite_loop_p (class loop *loop)
>widest_int nit;
>int flags;
>
> -  flags = flags_from_decl_or_type (current_function_decl);
> -  if ((flags & (ECF_CONST|ECF_PURE)) && !(flags & ECF_LOOPING_CONST_OR_PURE))
> -{
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> -   fprintf (dump_file, "Found loop %i to be finite: it is within pure or 
> const function.\n",
> -loop->num);
> -  return true;
> -}
> -
> -  if (loop->any_upper_bound
> -  || max_loop_iterations (loop, &nit))
> -{
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> -   fprintf (dump_file, "Found loop %i to be finite: upper bound 
> found.\n",
> -loop->num);
> -  return true;
> -}
> -
>if (loop->finite_p)
>  {
>unsigned i;
> @@ -3368,11 +3350,36 @@ finite_loop_p (class loop *loop)
>   {
> if (dump_file)
>   fprintf (dump_file, "Assume loop %i to be finite: it has an 
> exit "
> -  "and -ffinite-loops is on.\n", loop->num);
> +  "and -ffinite-loops is on or loop was
> +  " previously finite.\n",
> +  loop->num);
> return true;
>   }
>  }
>
> +  flags = flags_from_decl_or_type (current_function_decl);
> +  if ((flags & (ECF_CONST|ECF_PURE)) && !(flags & ECF_LOOPING_CONST_OR_PURE))
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file,
> +"Found loop %i to be finite: it is within "
> +"pure or const function.\n",
> +loop->num);
> +  loop->finite_p = true;
> +  return true;
> +}
> +
> +  if (loop->any_upper_bound
> +  /* Loop with no normal exit will not pass max_loop_iterations.  */
> +  || (!loop->finite_p && max_loop_iterations (loop, &nit)))
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file, "Found loop %i to be finite: upper bound 
> found.\n",
> +loop->num);
> +  loop->finite_p = true;
> +  return true;
> +}
> +
>return false;
>  }
>


[PATCH] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Cupertino Miranda via Gcc-patches
Hi everyone,

Looking forward to all your reviews.

Best regards,
Cupertino

New pseudo-c BPF assembly dialect already supported by clang and widely
used in the linux kernel.
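
Roughly, for a feel of the difference (illustrative only; the exact
output templates are the ones added to bpf.md below):

  normal dialect:    mov %r1,%r2       ldxw %r0,[%r1+8]
  pseudo-c dialect:  r1 = r2           r0 = *(u32 *) (r1+8)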

gcc/ChangeLog:

* config/bpf/bpf.opt: Added option -masm=.
	* config/bpf/bpf-opts.h: Likewise.
* config/bpf/bpf.cc: Changed it to conform with new pseudoc
  dialect support.
* config/bpf/bpf.h: Likewise.
* config/bpf/bpf.md: Added pseudo-c templates.
---
 gcc/config/bpf/bpf-opts.h |  6 +++
 gcc/config/bpf/bpf.cc | 46 ---
 gcc/config/bpf/bpf.h  |  5 +-
 gcc/config/bpf/bpf.md | 97 ---
 gcc/config/bpf/bpf.opt| 14 ++
 5 files changed, 114 insertions(+), 54 deletions(-)

diff --git a/gcc/config/bpf/bpf-opts.h b/gcc/config/bpf/bpf-opts.h
index 8282351cf045..92db01ec4d54 100644
--- a/gcc/config/bpf/bpf-opts.h
+++ b/gcc/config/bpf/bpf-opts.h
@@ -60,4 +60,10 @@ enum bpf_isa_version
   ISA_V3,
 };
 
+enum bpf_asm_dialect
+{
+  ASM_NORMAL,
+  ASM_PSEUDOC
+};
+
 #endif /* ! BPF_OPTS_H */
diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index e0324e1e0e08..1d3936871d60 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -873,16 +873,47 @@ bpf_output_call (rtx target)
   return "";
 }
 
+/* Print register name according to assembly dialect.
+   In normal syntax registers are printed like %rN where N is the
+   register number.
+   In pseudoc syntax, the register names do not feature a '%' prefix.
+   Additionally, the code 'w' denotes that the register should be printed
+   as wN instead of rN, where N is the register number, but only when the
+   value stored in the operand OP is 32-bit wide.  */
+static void
+bpf_print_register (FILE *file, rtx op, int code)
+{
+  if(asm_dialect == ASM_NORMAL)
+fprintf (file, "%s", reg_names[REGNO (op)]);
+  else
+{
+  if (code == 'w' && GET_MODE (op) == SImode)
+   {
+ if (REGNO (op) == BPF_FP)
+   fprintf (file, "w10");
+ else
+   fprintf (file, "w%s", reg_names[REGNO (op)]+2);
+   }
+  else
+   {
+ if (REGNO (op) == BPF_FP)
+   fprintf (file, "r10");
+ else
+   fprintf (file, "%s", reg_names[REGNO (op)]+1);
+   }
+}
+}
+
 /* Print an instruction operand.  This function is called in the macro
PRINT_OPERAND defined in bpf.h */
 
 void
-bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
+bpf_print_operand (FILE *file, rtx op, int code)
 {
   switch (GET_CODE (op))
 {
 case REG:
-  fprintf (file, "%s", reg_names[REGNO (op)]);
+  bpf_print_register (file, op, code);
   break;
 case MEM:
   output_address (GET_MODE (op), XEXP (op, 0));
@@ -936,7 +967,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
   switch (GET_CODE (addr))
 {
 case REG:
-  fprintf (file, "[%s+0]", reg_names[REGNO (addr)]);
+  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+  bpf_print_register (file, addr, 0);
+  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
   break;
 case PLUS:
   {
@@ -945,9 +978,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
 
if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
  {
-   fprintf (file, "[%s+", reg_names[REGNO (op0)]);
+   fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+   bpf_print_register (file, op0, 0);
+   fprintf (file, "+");
output_addr_const (file, op1);
-   fputs ("]", file);
+   fprintf (file, asm_dialect == ASM_NORMAL ? "]" : ")");
  }
else
  fatal_insn ("invalid address in operand", addr);
@@ -1816,7 +1851,6 @@ handle_attr_preserve (function *fn)
 }
 }
 
-
 /* This pass finds accesses to structures marked with the BPF target attribute
__attribute__((preserve_access_index)). For every such access, a CO-RE
relocation record is generated, to be output in the .BTF.ext section.  */
diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
index 344aca02d1bb..9561bf59b800 100644
--- a/gcc/config/bpf/bpf.h
+++ b/gcc/config/bpf/bpf.h
@@ -22,7 +22,8 @@
 
 / Controlling the Compilation Driver.  */
 
-#define ASM_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL} %{mxbpf:-mxbpf}"
+#define ASM_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL} %{mxbpf:-mxbpf} " \
+  "%{masm=pseudoc:-mdialect=pseudoc}"
 #define LINK_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL}"
 #define LIB_SPEC ""
 #define STARTFILE_SPEC ""
@@ -503,4 +504,6 @@ enum reg_class
 #define DO_GLOBAL_DTORS_BODY   \
   do { } while (0)
 
+#define ASSEMBLER_DIALECT ((int) asm_dialect)
+
 #endif /* ! GCC_BPF_H */
diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index f6be0a212345..0b8f409db687 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -77,6 +77,8 @@
 
 (define_mode_attr mop [(QI "b") (HI "h") (SI "w") (DI "dw")
(S

Re: [PATCH] vect: Don't vectorize a single scalar iteration loop [PR110740]

2023-07-21 Thread Richard Biener via Gcc-patches
On Fri, Jul 21, 2023 at 8:08 AM Kewen.Lin  wrote:
>
> Hi,
>
> The function vect_update_epilogue_niters, which has been
> removed by r14-2281, had some code taking care that if
> there is only one scalar iteration left for the epilogue then
> we won't try to vectorize it any more.
>
> Although costing should be able to take care of it eventually,
> I think we still want this special casing without costing
> enabled, so this patch adds it back in function
> vect_analyze_loop_costing and makes it more general for
> both main and epilogue loops, as Richard suggested.  It fixes
> some exposed failures on Power10:
>
>  - gcc.target/powerpc/p9-vec-length-epil-{1,8}.c
>  - gcc.dg/vect/slp-perm-{1,5,6,7}.c
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu, powerpc64-linux-gnu P8/P9 and
> powerpc64le-linux-gnu P9/P10.
>
> Is it ok for trunk?

OK.

Thanks,
Richard.

> BR,
> Kewen
> -
> PR tree-optimization/110740
>
> gcc/ChangeLog:
>
> * tree-vect-loop.cc (vect_analyze_loop_costing): Do not vectorize a
> loop with a single scalar iteration.
> ---
>  gcc/tree-vect-loop.cc | 55 ++-
>  1 file changed, 34 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b44fb9c7712..92d2abde094 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -2158,8 +2158,7 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
>   epilogue we can also decide whether the main loop leaves us
>   with enough iterations, prefering a smaller vector epilog then
>   also possibly used for the case we skip the vector loop.  */
> -  if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)
> -  && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> +  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>  {
>widest_int scalar_niters
> = wi::to_widest (LOOP_VINFO_NITERSM1 (loop_vinfo)) + 1;
> @@ -2182,32 +2181,46 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
>% lowest_vf + gap);
> }
> }
> -
> -  /* Check that the loop processes at least one full vector.  */
> -  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> -  if (known_lt (scalar_niters, vf))
> +  /* Reject vectorizing for a single scalar iteration, even if
> +we could in principle implement that using partial vectors.  */
> +  unsigned peeling_gap = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
> +  if (scalar_niters <= peeling_gap + 1)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"loop does not have enough iterations "
> -"to support vectorization.\n");
> +"not vectorized: loop only has a single "
> +"scalar iteration.\n");
>   return 0;
> }
>
> -  /* If we need to peel an extra epilogue iteration to handle data
> -accesses with gaps, check that there are enough scalar iterations
> -available.
> -
> -The check above is redundant with this one when peeling for gaps,
> -but the distinction is useful for diagnostics.  */
> -  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> - && known_le (scalar_niters, vf))
> +  if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> {
> - if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"loop does not have enough iterations "
> -"to support peeling for gaps.\n");
> - return 0;
> + /* Check that the loop processes at least one full vector.  */
> + poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> + if (known_lt (scalar_niters, vf))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"loop does not have enough iterations "
> +"to support vectorization.\n");
> + return 0;
> +   }
> +
> + /* If we need to peel an extra epilogue iteration to handle data
> +accesses with gaps, check that there are enough scalar iterations
> +available.
> +
> +The check above is redundant with this one when peeling for gaps,
> +but the distinction is useful for diagnostics.  */
> + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> + && known_le (scalar_niters, vf))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"loop does not have enough iterations "
> +"to support peeling for gaps.\n");
> + return 0;
> +   }
> }

loop-ch improvements, part 5

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi,
currently loop-ch skips all do-while loops.  But when a loop is not a do-while,
then in addition to the original goal of turning it into a do-while the pass
can do additional things:
 1) move out loop invariant computations
 2) duplicate loop invariant conditionals and eliminate them in the loop body.
 3) prove that some exits are always true in the first iteration
    and can be skipped

Most of the time 1 can be done by lim (the exception is when the invariant
computation is conditional).  For 2 we however don't really have another place
doing it, except for loop unswitching, which is more expensive (it duplicates
the loop and then optimizes out one path to a non-loop).
3 can be done by loop peeling, but that is also more expensive since it
duplicates the full loop body.

This patch improves the heuristics by not giving up on do-while loops and by
trying to find a sequence of BBs to duplicate to reach one of these goals:
 - turn the loop into a do-while
 - eliminate an invariant conditional in the loop body
 - do partial "peeling" as long as the code optimizes enough that this does
   not increase code size.
This can be improved upon, but I think this patch finally gets the heuristics
into a shape where they do not do weird things.
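
As a small hand-written illustration (not from the patch, assuming a
and n are declared elsewhere) of what turning a loop into a do-while
means here, header copying rewrites

  int i = 0;
  while (i < n)        /* header test runs before the first iteration */
    {
      a[i] = 0;
      i++;
    }

into the equivalent of

  int i = 0;
  if (i < n)           /* duplicated header now guards loop entry */
    do
      {
        a[i] = 0;
        i++;
      }
    while (i < n);     /* exit test sits at the latch, as in a do-while */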

The patch requires a few testsuite changes:
 - I disabled ch in loop-unswitch-17.c since it tests unswitching of
   loop invariant conditional.
 - pr103079.c needs ch disabled to trigger vrp situation it tests for
   (otherwise we optimize stuff earlier and better)
 - copy-headers-7.c now gets only 2 basic blocks duplicated since
   last conditional does not seem to benefit from duplicating,
   so I reordered them.
copy-headers-9 tests the new logic.

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

* tree-ssa-loop-ch.cc (enum ch_decision): New enum.
(should_duplicate_loop_header_p): Return info on profitability.
(do_while_loop_p): Watch for constant conditionals.
(update_profile_after_ch): Do not sanity check that all
static exits are taken.
(ch_base::copy_headers): Run on all loops.
(pass_ch::process_loop_p): Improve heuristics by handling also
do_while loop and duplicating shortest sequence containing all
winning blocks.

gcc/testsuite/ChangeLog:

* gcc.dg/loop-unswitch-17.c: Disable ch.
* gcc.dg/pr103079.c: Disable ch.
* gcc.dg/tree-ssa/copy-headers-7.c: Update so ch behaves
as expected.
* gcc.dg/tree-ssa/copy-headers.c: Update template.
* gcc.dg/tree-ssa/copy-headers-9.c: New test.

diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-17.c 
b/gcc/testsuite/gcc.dg/loop-unswitch-17.c
index 8655e09a51c..4b806c475b1 100644
--- a/gcc/testsuite/gcc.dg/loop-unswitch-17.c
+++ b/gcc/testsuite/gcc.dg/loop-unswitch-17.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-optimized" } */
+/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-optimized 
-fno-tree-ch" } */
 
 int foo (int a)
 {
diff --git a/gcc/testsuite/gcc.dg/pr103079.c b/gcc/testsuite/gcc.dg/pr103079.c
index 7f6632fc669..7b107544725 100644
--- a/gcc/testsuite/gcc.dg/pr103079.c
+++ b/gcc/testsuite/gcc.dg/pr103079.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Os -fdump-tree-vrp2" } */
+/* { dg-options "-Os -fdump-tree-vrp2 -fno-tree-ch" } */
 
 int a, b = -2;
 int main() {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c 
b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
index e2a6c75f2e9..b3df3b6398e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
@@ -4,7 +4,7 @@
 int is_sorted(int *a, int n, int m, int k)
 {
   if (k > 0)
-for (int i = 0; i < n - 1 && m && k > i; i++)
+for (int i = 0; k > i && m && i < n - 1 ; i++)
   if (a[i] > a[i + 1])
return 0;
   return 1;
@@ -17,5 +17,4 @@ int is_sorted(int *a, int n, int m, int k)
 /* { dg-final { scan-tree-dump-times "Conditional combines static and 
invariant" 0 "ch2" } } */
 /* { dg-final { scan-tree-dump-times "Will elliminate invariant exit" 1 "ch2" 
} } */
 /* { dg-final { scan-tree-dump-times "Will eliminate peeled conditional" 1 
"ch2" } } */
-/* { dg-final { scan-tree-dump-times "Not duplicating bb .: condition based on 
non-IV loop variant." 1 "ch2" } } */
 /* { dg-final { scan-tree-dump-times "Will duplicate bb" 3 "ch2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c 
b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
new file mode 100644
index 000..7cc162ca94d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ch-details" } */
+int a[100];
+void test (int m, int n)
+{
+   int i = 0;
+   do
+   {
+   if (m)
+   break;
+   i++;
+   a[i]=0;
+   }
+   while (i<10);
+}
+/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win" 1 "ch2" } } */
+/* { dg-final { scan-tree-dump-t

Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Martin Jambor
Hello Lehua,

On Fri, Jul 21 2023, Lehua Ding wrote:
> Hi Martin,
>
>
> By the way, is there a standard format required for these Python files?

Generally, our Python coding conventions are at
https://gcc.gnu.org/codingconventions.html#python

> I see that other Python files have similar format error when checked
> using flake8.

For historic reasons (i.e. Martin Liška set it up that way), we
currently use flake8 to check python formatting of
contrib/gcc-changelog, contrib/mklog.py and
maintainer-scripts/branch_changer.py and use pytest to check
contrib/gcc-changelog and contrib/test_mklog.py.  That is how I found
out.

I guess many of the files predate the coding conventions and so don't
adhere to them.  Patches to fix them are welcome (I guess) but at least
we should not regress (I guess).

> If so, it feels necessary to configure a git hook on git server to do
> this check.

Performing more thorough checks on pushed commits is a much larger topic
than this thread.  FWIW, I would not oppose to checking python scripts
that are known to be OK.

Martin


[PATCH] tree-optimization/41320 - remove bogus XFAILed testcase

2023-07-21 Thread Richard Biener via Gcc-patches
gcc.dg/tree-ssa/forwprop-12.c looks for reconstruction of an
ARRAY_REF from pointer arithmetic and dereference.  That's not
safe because ARRAY_REFs carry special semantics we later exploit
during data dependence analysis.

The following removes the testcase, closing the bug as WONTFIX.

Pushed.

PR tree-optimization/41320
* gcc.dg/tree-ssa/forwprop-12.c: Remove.
---
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c | 21 -
 1 file changed, 21 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c 
b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c
deleted file mode 100644
index de16c6848f2..000
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-12.c
+++ /dev/null
@@ -1,21 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-forwprop1" } */
-
-struct X { int a[256]; };
-
-int foo(struct X *p, __SIZE_TYPE__ i)
-{
-  int *q = &p->a[0];
-  int *q2 = (int *)((void *)q + i*4 + 32);
-  return *q2;
-}
-
-int bar(struct X *p, int i)
-{
-  return *((int *)p + i + 8);
-}
-
-/* We should have propagated the base array address through the
-   address arithmetic into the memory access as an array access.  */
-
-/* { dg-final { scan-tree-dump-times "->a\\\[D\\\." 2 "forwprop1" { xfail 
*-*-* } } } */
-- 
2.35.3


[PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Matthew Malcomson via Gcc-patches
On some AArch64 bootstrapped builds, we were getting a flaky test
because the floating point operations in `get_time` were being fused
with the floating point operations in `timevar_accumulate`.

This meant that the rounding behaviour of our multiplication with
`ticks_to_msec` was different when used in `timer::start` and when
performed in `timer::stop`.  These extra inaccuracies led to the
testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
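
A minimal sketch of the kind of contraction involved (illustrative
only, not the actual timevar code; ticks and ticks_to_msec stand in
for the real values):

  double start   = ticks * ticks_to_msec;          /* product rounded here */
  double elapsed = ticks * ticks_to_msec - start;  /* may contract to fnmsub */

With -ffp-contract=fast the second line's multiply need not be rounded
before the subtraction, so elapsed need not come out as exactly 0.0
even though both expressions look identical.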

This change ensures those operations are not fused and hence stops the test
being flaky on that particular machine.  There is no expected change in the
generated code.
Bootstrap & regtest on AArch64 passes with no regressions.

gcc/ChangeLog:

* timevar.cc (get_time): Make this noinline to avoid fusing
	behaviour and associated test flakiness.


N.b. I didn't know who to include as reviewer -- guessed Richard Biener as the
global reviewer that had the most contributions to this file and Richard
Sandiford since I've asked him for reviews a lot in the past.


### Attachment also inlined for ease of reply###


diff --git a/gcc/timevar.cc b/gcc/timevar.cc
index 
d695297aae7f6b2a6de01a37fe86c2a232338df0..5ea4ec259e114f31f611e7105cd102f4c9552d18
 100644
--- a/gcc/timevar.cc
+++ b/gcc/timevar.cc
@@ -212,6 +212,7 @@ timer::named_items::print (FILE *fp, const timevar_time_def 
*total)
HAVE_WALL_TIME macros.  */
 
 static void
+__attribute__((noinline))
 get_time (struct timevar_time_def *now)
 {
   now->user = 0;



diff --git a/gcc/timevar.cc b/gcc/timevar.cc
index 
d695297aae7f6b2a6de01a37fe86c2a232338df0..5ea4ec259e114f31f611e7105cd102f4c9552d18
 100644
--- a/gcc/timevar.cc
+++ b/gcc/timevar.cc
@@ -212,6 +212,7 @@ timer::named_items::print (FILE *fp, const timevar_time_def 
*total)
HAVE_WALL_TIME macros.  */
 
 static void
+__attribute__((noinline))
 get_time (struct timevar_time_def *now)
 {
   now->user = 0;





Re: loop-ch improvements, part 5

2023-07-21 Thread Richard Biener via Gcc-patches
On Fri, Jul 21, 2023 at 1:53 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> currently loop-ch skips all do-while loops.  But when loop is not do-while
> in addition to original goal of turining it to do-while it can do additional
> things:
>  1) move out loop invariant computations
>  2) duplicate loop invariant conditionals and eliminate them in loop body.
>  3) prove that some exits are always true in first iteration
> and can be skipped
>
> Most of time 1 can be done by lim (exception is when the invariant computation
> is conditional). For 2 we however don't really have other place doing it 
> except
> for loop unswitching that is more expensive (it will duplicate the loop and
> then optimize out one path to non-loop).
> 3 can be done by loop peeling but it is also more expensive by duplicating 
> full
> loop body.
>
> This patch improves heuristics by not giving up on do-while loops and trying
> to find sequence of BBs to duplicate to obtain one of goals:
>  - turn loop to do-while
>  - eliminate invariant conditional in loop body
>  - do partial "peeling" as long as code optimizes enough so this does not
>increase code size.
> This can be improved upon, but I think this patch should finally get
> heuristics into shape that it does not do weird things.
>
> The patch requires bit of testsuite changes
>  - I disabled ch in loop-unswitch-17.c since it tests unswitching of
>loop invariant conditional.
>  - pr103079.c needs ch disabled to trigger vrp situation it tests for
>(otherwise we optimize stuff earlier and better)
>  - copy-headers-7.c now gets only 2 basic blocks duplicated since
>last conditional does not seem to benefit from duplicating,
>so I reordered them.
> copy-headers-9 tests the new logic.
>
> Bootstrapped/regtested x86_64-linux, OK?

OK.  In case the size heuristics are a bit too optimistic we could avoid the
peeling in the -Os case?  Did you do any stats on TUs to see whether code
actually increases in the end?

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-ssa-loop-ch.cc (enum ch_decision): New enum.
> (should_duplicate_loop_header_p): Return info on profitability.
> (do_while_loop_p): Watch for constant conditionals.
> (update_profile_after_ch): Do not sanity check that all
> static exits are taken.
> (ch_base::copy_headers): Run on all loops.
> (pass_ch::process_loop_p): Improve heuristics by handling also
> do_while loop and duplicating shortest sequence containing all
> winning blocks.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/loop-unswitch-17.c: Disable ch.
> * gcc.dg/pr103079.c: Disable ch.
> * gcc.dg/tree-ssa/copy-headers-7.c: Update so ch behaves
> as expected.
> * gcc.dg/tree-ssa/copy-headers.c: Update template.
> * gcc.dg/tree-ssa/copy-headers-9.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-17.c 
> b/gcc/testsuite/gcc.dg/loop-unswitch-17.c
> index 8655e09a51c..4b806c475b1 100644
> --- a/gcc/testsuite/gcc.dg/loop-unswitch-17.c
> +++ b/gcc/testsuite/gcc.dg/loop-unswitch-17.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-optimized" } */
> +/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-optimized 
> -fno-tree-ch" } */
>
>  int foo (int a)
>  {
> diff --git a/gcc/testsuite/gcc.dg/pr103079.c b/gcc/testsuite/gcc.dg/pr103079.c
> index 7f6632fc669..7b107544725 100644
> --- a/gcc/testsuite/gcc.dg/pr103079.c
> +++ b/gcc/testsuite/gcc.dg/pr103079.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-Os -fdump-tree-vrp2" } */
> +/* { dg-options "-Os -fdump-tree-vrp2 -fno-tree-ch" } */
>
>  int a, b = -2;
>  int main() {
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
> index e2a6c75f2e9..b3df3b6398e 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
> @@ -4,7 +4,7 @@
>  int is_sorted(int *a, int n, int m, int k)
>  {
>if (k > 0)
> -for (int i = 0; i < n - 1 && m && k > i; i++)
> +for (int i = 0; k > i && m && i < n - 1 ; i++)
>if (a[i] > a[i + 1])
> return 0;
>return 1;
> @@ -17,5 +17,4 @@ int is_sorted(int *a, int n, int m, int k)
>  /* { dg-final { scan-tree-dump-times "Conditional combines static and 
> invariant" 0 "ch2" } } */
>  /* { dg-final { scan-tree-dump-times "Will elliminate invariant exit" 1 
> "ch2" } } */
>  /* { dg-final { scan-tree-dump-times "Will eliminate peeled conditional" 1 
> "ch2" } } */
> -/* { dg-final { scan-tree-dump-times "Not duplicating bb .: condition based 
> on non-IV loop variant." 1 "ch2" } } */
>  /* { dg-final { scan-tree-dump-times "Will duplicate bb" 3 "ch2" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
> new file mode 100644
> index 000..7cc162ca94d
> --- /

Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Richard Biener via Gcc-patches
On Fri, 21 Jul 2023, Matthew Malcomson wrote:

> On some AArch64 bootstrapped builds, we were getting a flaky test
> because the floating point operations in `get_time` were being fused
> with the floating point operations in `timevar_accumulate`.
> 
> This meant that the rounding behaviour of our multiplication with
> `ticks_to_msec` was different when used in `timer::start` and when
> performed in `timer::stop`.  These extra inaccuracies led to the
> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
> 
> This change ensures those operations are not fused and hence stops the test
> being flaky on that particular machine.  There is no expected change in the
> generated code.
> Bootstrap & regtest on AArch64 passes with no regressions.

I think this is undesirable.  With fused you mean we use FMA?
I think you could use -ffp-contract=off for the TU instead.

Note you can't use __attribute__((noinline)) literally since the
host compiler might not support this.

Richard.

> gcc/ChangeLog:
> 
>   * timevar.cc (get_time): Make this noinline to avoid fusing
>   behaviour and associated test flakyness.
> 
> 
> N.b. I didn't know who to include as reviewer -- guessed Richard Biener as the
> global reviewer that had the most contributions to this file and Richard
> Sandiford since I've asked him for reviews a lot in the past.
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/timevar.cc b/gcc/timevar.cc
> index 
> d695297aae7f6b2a6de01a37fe86c2a232338df0..5ea4ec259e114f31f611e7105cd102f4c9552d18
>  100644
> --- a/gcc/timevar.cc
> +++ b/gcc/timevar.cc
> @@ -212,6 +212,7 @@ timer::named_items::print (FILE *fp, const 
> timevar_time_def *total)
> HAVE_WALL_TIME macros.  */
>  
>  static void
> +__attribute__((noinline))
>  get_time (struct timevar_time_def *now)
>  {
>now->user = 0;
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-07-21 at 13:11 +0100, Matthew Malcomson via Gcc-patches
wrote:
> This change ensures those operations are not fused and hence stops the test
> being flaky on that particular machine.  There is no expected change in the
> generated code.
> Bootstrap & regtest on AArch64 passes with no regressions.
> 
> gcc/ChangeLog:
> 
>   * timevar.cc (get_time): Make this noinline to avoid fusing
>   behaviour and associated test flakyness.

I don't think it's correct.  It will break bootstrapping GCC from other
ISO C++11 compilers, you need to at least guard it with #ifdef __GNUC__.
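
Something like the following (just a sketch of the guard, not an
endorsement of the approach):

  #ifdef __GNUC__
  __attribute__((noinline))
  #endif
  static void
  get_time (struct timevar_time_def *now)
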
And IMO it's just hiding the real problem.

We need more info of the "particular machine".  Is this a hardware bug
(i.e. the machine violates the AArch64 spec) or a GCC code generation
issue?  Or should we generally use -ffp-contract=off in BOOT_CFLAGS?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: loop-ch improvements, part 5

2023-07-21 Thread Jan Hubicka via Gcc-patches
> > The patch requires bit of testsuite changes
> >  - I disabled ch in loop-unswitch-17.c since it tests unswitching of
> >loop invariant conditional.
> >  - pr103079.c needs ch disabled to trigger vrp situation it tests for
> >(otherwise we optimize stuff earlier and better)
> >  - copy-headers-7.c now gets only 2 basic blocks duplicated since
> >last conditional does not seem to benefit from duplicating,
> >so I reordered them.
> > copy-headers-9 tests the new logic.
> >
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> OK.  In case the size heuristics are a bit too optimistic we could avoid the
Thanks!
> peeling in the -Os case?  Did you do any stats on TUs to see whether code
> actually increases in the end?

I only did stats on tramp3d and some GCC source files with -O2, where the
new heuristics actually tend to duplicate fewer BBs overall because of
the logic stopping the duplication chain after the last winning header, while
the previous implementation kept duplicating more of the loop.  The difference
is small (sub 1%) since most loops are very simple and have only one header
BB to duplicate.  We however handle more loops overall and produce more
do-whiles.

I think there is some potential in making the heuristics more speculative
now and allowing more partial peeling, but the code right now is still
on the safe side.

For -Os we set the code growth limit to 0, so we only duplicate if we know
that one of the two copies will be optimized out.  This is stricter
than what we did previously and I need to get more stats on this - we may
want to bump up the limit, or at least increase it to account for the extra
jump saved by the while -> do-while conversion.

Honza


Re: [PATCH] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Jose E. Marchesi via Gcc-patches


Hello Cuper.

Thanks for the patch.

We will need an update for the "eBPF Options" section in the GCC manual,
documenting -masm=@var{dialect} and the supported values.  Can you
please add it and re-submit?


> Hi everyone,
>
> Looking forward to all your reviews.
>
> Best regards,
> Cupertino
>
> New pseudo-c BPF assembly dialect already supported by clang and widely
> used in the linux kernel.
>
> gcc/ChangeLog:
>
>   * config/bpf/bpf.opt: Added option -masm=.
>   * config/bpf/bpf-opts.h: Likewize.
>   * config/bpf/bpf.cc: Changed it to conform with new pseudoc
> dialect support.
>   * config/bpf/bpf.h: Likewise.
>   * config/bpf/bpf.md: Added pseudo-c templates.
> ---
>  gcc/config/bpf/bpf-opts.h |  6 +++
>  gcc/config/bpf/bpf.cc | 46 ---
>  gcc/config/bpf/bpf.h  |  5 +-
>  gcc/config/bpf/bpf.md | 97 ---
>  gcc/config/bpf/bpf.opt| 14 ++
>  5 files changed, 114 insertions(+), 54 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf-opts.h b/gcc/config/bpf/bpf-opts.h
> index 8282351cf045..92db01ec4d54 100644
> --- a/gcc/config/bpf/bpf-opts.h
> +++ b/gcc/config/bpf/bpf-opts.h
> @@ -60,4 +60,10 @@ enum bpf_isa_version
>ISA_V3,
>  };
>  
> +enum bpf_asm_dialect
> +{
> +  ASM_NORMAL,
> +  ASM_PSEUDOC
> +};
> +
>  #endif /* ! BPF_OPTS_H */
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index e0324e1e0e08..1d3936871d60 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -873,16 +873,47 @@ bpf_output_call (rtx target)
>return "";
>  }
>  
> +/* Print register name according to assembly dialect.
> +   In normal syntax registers are printed like %rN where N is the
> +   register number.
> +   In pseudoc syntax, the register names do not feature a '%' prefix.
> +   Additionally, the code 'w' denotes that the register should be printed
> +   as wN instead of rN, where N is the register number, but only when the
> +   value stored in the operand OP is 32-bit wide.  */
> +static void
> +bpf_print_register (FILE *file, rtx op, int code)
> +{
> +  if(asm_dialect == ASM_NORMAL)
> +fprintf (file, "%s", reg_names[REGNO (op)]);
> +  else
> +{
> +  if (code == 'w' && GET_MODE (op) == SImode)
> + {
> +   if (REGNO (op) == BPF_FP)
> + fprintf (file, "w10");
> +   else
> + fprintf (file, "w%s", reg_names[REGNO (op)]+2);
> + }
> +  else
> + {
> +   if (REGNO (op) == BPF_FP)
> + fprintf (file, "r10");
> +   else
> + fprintf (file, "%s", reg_names[REGNO (op)]+1);
> + }
> +}
> +}
> +
>  /* Print an instruction operand.  This function is called in the macro
> PRINT_OPERAND defined in bpf.h */
>  
>  void
> -bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
> +bpf_print_operand (FILE *file, rtx op, int code)
>  {
>switch (GET_CODE (op))
>  {
>  case REG:
> -  fprintf (file, "%s", reg_names[REGNO (op)]);
> +  bpf_print_register (file, op, code);
>break;
>  case MEM:
>output_address (GET_MODE (op), XEXP (op, 0));
> @@ -936,7 +967,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
>switch (GET_CODE (addr))
>  {
>  case REG:
> -  fprintf (file, "[%s+0]", reg_names[REGNO (addr)]);
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
> +  bpf_print_register (file, addr, 0);
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
>break;
>  case PLUS:
>{
> @@ -945,9 +978,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
>  
>   if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
> {
> - fprintf (file, "[%s+", reg_names[REGNO (op0)]);
> + fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
> + bpf_print_register (file, op0, 0);
> + fprintf (file, "+");
>   output_addr_const (file, op1);
> - fputs ("]", file);
> + fprintf (file, asm_dialect == ASM_NORMAL ? "]" : ")");
> }
>   else
> fatal_insn ("invalid address in operand", addr);
> @@ -1816,7 +1851,6 @@ handle_attr_preserve (function *fn)
>  }
>  }
>  
> -
>  /* This pass finds accesses to structures marked with the BPF target 
> attribute
> __attribute__((preserve_access_index)). For every such access, a CO-RE
> relocation record is generated, to be output in the .BTF.ext section.  */
> diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
> index 344aca02d1bb..9561bf59b800 100644
> --- a/gcc/config/bpf/bpf.h
> +++ b/gcc/config/bpf/bpf.h
> @@ -22,7 +22,8 @@
>  
>  / Controlling the Compilation Driver.  */
>  
> -#define ASM_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL} %{mxbpf:-mxbpf}"
> +#define ASM_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL} %{mxbpf:-mxbpf} " \
> +  "%{masm=pseudoc:-mdialect=pseudoc}"
>  #define LINK_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL}"
>  #define LIB_SPEC ""
>  #define STARTFILE_SPEC ""
> @@ -503,4 +

Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Matthew Malcomson via Gcc-patches

Responding to two emails at the same time ;-)

On 7/21/23 13:47, Richard Biener wrote:

On Fri, 21 Jul 2023, Matthew Malcomson wrote:


On some AArch64 bootstrapped builds, we were getting a flaky test
because the floating point operations in `get_time` were being fused
with the floating point operations in `timevar_accumulate`.

This meant that the rounding behaviour of our multiplication with
`ticks_to_msec` was different when used in `timer::start` and when
performed in `timer::stop`.  These extra inaccuracies led to the
testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.

This change ensures those operations are not fused and hence stops the test
being flaky on that particular machine.  There is no expected change in the
generated code.
Bootstrap & regtest on AArch64 passes with no regressions.


I think this is undesirable.  With fused you mean we use FMA?
I think you could use -ffp-contract=off for the TU instead.


Yeah -- we used fused multiply subtract because we combined the multiply 
in `get_time` with the subtract in `timevar_accumulate`.




Note you can't use __attribute__((noinline)) literally since the
host compiler might not support this.

Richard.



On 7/21/23 13:49, Xi Ruoyao wrote:
...

I don't think it's correct.  It will break bootstrapping GCC from other
ISO C++11 compilers, you need to at least guard it with #ifdef __GNUC__.
And IMO it's just hiding the real problem.

We need more info of the "particular machine".  Is this a hardware bug
(i.e. the machine violates the AArch64 spec) or a GCC code generation
issue?  Or should we generally use -ffp-contract=off in BOOT_CFLAGS?



My understanding is that this is not a hardware bug and that it's 
specified that rounding does not happen on the multiply "sub-part" in 
`FNMSUB`, but rounding happens on the `FMUL` that generates some input 
to it.


I was given to understand from discussions with others that this codegen 
is allowed -- though I honestly didn't confirm the line of reasoning 
through all the relevant standards.




W.r.t. both:
Thanks for pointing out bootstrapping from other ISO C++ compilers -- 
(didn't realise that was a concern).


I can look into `-ffp-contract=off` as you both have recommended.
One question -- if we have concerns that the host compiler may not be 
able to handle `attribute((noinline))` would we also be concerned that 
this flag may not be supported?
(Or is the severity of lack of support sufficiently different in the two 
cases that this is fine -- i.e. not compile vs may trigger floating 
point rounding inaccuracies?)





Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Richard Biener via Gcc-patches



> Am 21.07.2023 um 15:12 schrieb Matthew Malcomson :
> 
> Responding to two emails at the same time ;-)
> 
>> On 7/21/23 13:47, Richard Biener wrote:
>>> On Fri, 21 Jul 2023, Matthew Malcomson wrote:
>>> On some AArch64 bootstrapped builds, we were getting a flaky test
>>> because the floating point operations in `get_time` were being fused
>>> with the floating point operations in `timevar_accumulate`.
>>> 
>>> This meant that the rounding behaviour of our multiplication with
>>> `ticks_to_msec` was different when used in `timer::start` and when
>>> performed in `timer::stop`.  These extra inaccuracies led to the
>>> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
>>> 
>>> This change ensures those operations are not fused and hence stops the test
>>> being flaky on that particular machine.  There is no expected change in the
>>> generated code.
>>> Bootstrap & regtest on AArch64 passes with no regressions.
>> I think this is undesirable.  With fused you mean we use FMA?
>> I think you could use -ffp-contract=off for the TU instead.
> 
> Yeah -- we used fused multiply subtract because we combined the multiply in 
> `get_time` with the subtract in `timevar_accumulate`.
> 
>> Note you can't use __attribute__((noinline)) literally since the
>> host compiler might not support this.
>> Richard.
> 
> On 7/21/23 13:49, Xi Ruoyao wrote:
> ...
>> I don't think it's correct.  It will break bootstrapping GCC from other
>> ISO C++11 compilers, you need to at least guard it with #ifdef __GNUC__.
>> And IMO it's just hiding the real problem.
>> We need more info of the "particular machine".  Is this a hardware bug
>> (i.e. the machine violates the AArch64 spec) or a GCC code generation
>> issue?  Or should we generally use -ffp-contract=off in BOOT_CFLAGS?
> 
> My understanding is that this is not a hardware bug and that it's specified 
> that rounding does not happen on the multiply "sub-part" in `FNMSUB`, but 
> rounding happens on the `FMUL` that generates some input to it.
> 
> I was given to understand from discussions with others that this codegen is 
> allowed -- though I honestly didn't confirm the line of reasoning through all 
> the relevant standards.
> 
> 
> 
> W.r.t. both:
> Thanks for pointing out bootstrapping from other ISO C++ compilers -- (didn't 
> realise that was a concern).
> 
> I can look into `-ffp-contract=off` as you both have recommended.
> One question -- if we have concerns that the host compiler may not be able to 
> handle `attribute((noinline))` would we also be concerned that this flag may 
> not be supported?
> (Or is the severity of lack of support sufficiently different in the two 
> cases that this is fine -- i.e. not compile vs may trigger floating point 
> rounding inaccuracies?)

I’d only use it in stage2+ flags where we know we’re dealing with GCC 

Richard 

> 
> 


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-07-21 at 14:11 +0100, Matthew Malcomson wrote:
> My understanding is that this is not a hardware bug and that it's 
> specified that rounding does not happen on the multiply "sub-part" in 
> `FNMSUB`, but rounding happens on the `FMUL` that generates some input
> to it.

AFAIK the C standard only says "A floating *expression* may be
contracted".  I.e.:

double r = a * b + c;

may be compiled to use FMA because "a * b + c" is a floating point
expression.  But

double t = a * b;
double r = t + c;

is not, because "a * b" and "t + c" are two separate floating point
expressions.

So a contraction across two functions is not allowed.  We now have -ffp-
contract=on (https://gcc.gnu.org/r14-2023) to only allow C-standard
contractions.

Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
are building GCC 14 snapshot).  The default is "fast" (if no -std=
option is used), which allows some contractions disallowed by the
standard.

But GCC is in C++ and I'm not sure if the C++ standard has the same
definition for allowed contractions as C.

> I can look into `-ffp-contract=off` as you both have recommended.
> One question -- if we have concerns that the host compiler may not be 
> able to handle `attribute((noinline))` would we also be concerned that
> this flag may not be supported?

Only use it in BOOT_CFLAGS, i. e. 'make BOOT_CFLAGS="-O2 -g -ffp-
contract=on"' (or "off" instead of "on").  In 3-stage bootstrapping it's
only applied in stage 2 and 3, during which GCC is compiled by itself.

> (Or is the severity of lack of support sufficiently different in the two 
> cases that this is fine -- i.e. not compile vs may trigger floating 
> point rounding inaccuracies?)

It's possible that the test itself is flaky.  Can you provide some
detail about how it fails?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Alexander Monakov


On Fri, 21 Jul 2023, Xi Ruoyao via Gcc-patches wrote:

> Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
> are building GCC 14 snapshot).  The default is "fast" (if no -std=
> option is used), which allows some contractions disallowed by the
> standard.

Not fully, see below.

> But GCC is in C++ and I'm not sure if the C++ standard has the same
> definition for allowed contractions as C.

It doesn't, but in GCC we should aim to provide the same semantics in C++
as in C.

> > (Or is the severity of lack of support sufficiently different in the two 
> > cases that this is fine -- i.e. not compile vs may trigger floating 
> > point rounding inaccuracies?)
> 
> It's possible that the test itself is flaky.  Can you provide some
> detail about how it fails?

See also PR 99903 for an earlier known issue which appears due to x87
excess precision and so tweaking -ffp-contract wouldn't help:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903

Now that multiple platforms are hitting this, can we _please_ get rid
of the questionable attempt to compute time in a floating-point variable
and just use an uint64_t storing nanoseconds?

Alexander


Re: [PATCH v2] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Cupertino Miranda via Gcc-patches

Hi Jose,

Thanks for the review.
The new patch is attached inline.

Regards,
Cupertino

Jose E. Marchesi writes:

> Hello Cuper.
>
> Thanks for the patch.
>
> We will need an update for the "eBPF Options" section in the GCC manual,
> documenting -masm=@var{dialect} and the supported values.  Can you
> please add it and re-submit?
>
>
>> Hi everyone,
>>
>> Looking forward to all your reviews.
>>
>> Best regards,
>> Cupertino


>From fa227fefd84e6eaaf8edafed698e9960d7b115e6 Mon Sep 17 00:00:00 2001
From: Cupertino Miranda 
Date: Mon, 17 Jul 2023 17:42:42 +0100
Subject: [PATCH v2] bpf: pseudo-c assembly dialect support

New pseudo-c BPF assembly dialect already supported by clang and widely
used in the linux kernel.

gcc/ChangeLog:

	* config/bpf/bpf.opt: Added option -masm=.
	* config/bpf/bpf-opts.h: Likewise.
	* config/bpf/bpf.cc: Changed it to conform with new pseudoc
	  dialect support.
	* config/bpf/bpf.h: Likewise.
	* config/bpf/bpf.md: Added pseudo-c templates.
	* doc/invoke.texi: (-masm=DIALECT) New eBPF option item.
---
 gcc/config/bpf/bpf-opts.h |  6 +++
 gcc/config/bpf/bpf.cc | 46 ---
 gcc/config/bpf/bpf.h  |  5 +-
 gcc/config/bpf/bpf.md | 97 ---
 gcc/config/bpf/bpf.opt| 14 ++
 gcc/doc/invoke.texi   | 21 -
 6 files changed, 133 insertions(+), 56 deletions(-)

diff --git a/gcc/config/bpf/bpf-opts.h b/gcc/config/bpf/bpf-opts.h
index 8282351cf045..92db01ec4d54 100644
--- a/gcc/config/bpf/bpf-opts.h
+++ b/gcc/config/bpf/bpf-opts.h
@@ -60,4 +60,10 @@ enum bpf_isa_version
   ISA_V3,
 };
 
+enum bpf_asm_dialect
+{
+  ASM_NORMAL,
+  ASM_PSEUDOC
+};
+
 #endif /* ! BPF_OPTS_H */
diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index e0324e1e0e08..1d3936871d60 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -873,16 +873,47 @@ bpf_output_call (rtx target)
   return "";
 }
 
+/* Print register name according to assembly dialect.
+   In normal syntax registers are printed like %rN where N is the
+   register number.
+   In pseudoc syntax, the register names do not feature a '%' prefix.
+   Additionally, the code 'w' denotes that the register should be printed
+   as wN instead of rN, where N is the register number, but only when the
+   value stored in the operand OP is 32-bit wide.  */
+static void
+bpf_print_register (FILE *file, rtx op, int code)
+{
+  if(asm_dialect == ASM_NORMAL)
+fprintf (file, "%s", reg_names[REGNO (op)]);
+  else
+{
+  if (code == 'w' && GET_MODE (op) == SImode)
+	{
+	  if (REGNO (op) == BPF_FP)
+	fprintf (file, "w10");
+	  else
+	fprintf (file, "w%s", reg_names[REGNO (op)]+2);
+	}
+  else
+	{
+	  if (REGNO (op) == BPF_FP)
+	fprintf (file, "r10");
+	  else
+	fprintf (file, "%s", reg_names[REGNO (op)]+1);
+	}
+}
+}
+
 /* Print an instruction operand.  This function is called in the macro
PRINT_OPERAND defined in bpf.h */
 
 void
-bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
+bpf_print_operand (FILE *file, rtx op, int code)
 {
   switch (GET_CODE (op))
 {
 case REG:
-  fprintf (file, "%s", reg_names[REGNO (op)]);
+  bpf_print_register (file, op, code);
   break;
 case MEM:
   output_address (GET_MODE (op), XEXP (op, 0));
@@ -936,7 +967,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
   switch (GET_CODE (addr))
 {
 case REG:
-  fprintf (file, "[%s+0]", reg_names[REGNO (addr)]);
+  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+  bpf_print_register (file, addr, 0);
+  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
   break;
 case PLUS:
   {
@@ -945,9 +978,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
 
 	if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
 	  {
-	fprintf (file, "[%s+", reg_names[REGNO (op0)]);
+	fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+	bpf_print_register (file, op0, 0);
+	fprintf (file, "+");
 	output_addr_const (file, op1);
-	fputs ("]", file);
+	fprintf (file, asm_dialect == ASM_NORMAL ? "]" : ")");
 	  }
 	else
 	  fatal_insn ("invalid address in operand", addr);
@@ -1816,7 +1851,6 @@ handle_attr_preserve (function *fn)
 }
 }
 
-
 /* This pass finds accesses to structures marked with the BPF target attribute
__attribute__((preserve_access_index)). For every such access, a CO-RE
relocation record is generated, to be output in the .BTF.ext section.  */
diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
index 344aca02d1bb..9561bf59b800 100644
--- a/gcc/config/bpf/bpf.h
+++ b/gcc/config/bpf/bpf.h
@@ -22,7 +22,8 @@
 
 / Controlling the Compilation Driver.  */
 
-#define ASM_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL} %{mxbpf:-mxbpf}"
+#define ASM_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL} %{mxbpf:-mxbpf} " \
+  "%{masm=pseudoc:-mdialect=pseudoc}"
 #define LINK_SPEC "%{mbig-endian:-EB} %{!mbig-endian:-EL}"
 #define LIB_SPEC ""
 #define STA

Re: [PATCH v2] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Jose E. Marchesi via Gcc-patches


> gcc/ChangeLog:
>
>   * config/bpf/bpf.opt: Added option -masm=.
>   * config/bpf/bpf-opts.h: Likewize.
>   * config/bpf/bpf.cc: Changed it to conform with new pseudoc
> dialect support.
>   * config/bpf/bpf.h: Likewise.
>   * config/bpf/bpf.md: Added pseudo-c templates.
>   * doc/invoke.texi: (-masm=DIALECT) New eBPF option item.

I think the ChangeLog could be made more useful, and the syntax of the
last entry is not entirely right.  I suggest something like:

* config/bpf/bpf.opt: Added option -masm=.
* config/bpf/bpf-opts.h (enum bpf_asm_dialect): New type.
* config/bpf/bpf.cc (bpf_print_register): New function.
(bpf_print_register): Support pseudo-c syntax for registers.
(bpf_print_operand_address): Likewise.
* config/bpf/bpf.h (ASM_SPEC): handle -msasm.
(ASSEMBLER_DIALECT): Define.
* config/bpf/bpf.md: Added pseudo-c templates.
* doc/invoke.texi (-masm=DIALECT): New eBPF option item.

Please make sure to run the contrib/gcc-changelog/git_check-commit.py
script.

> ---
>  gcc/config/bpf/bpf-opts.h |  6 +++
>  gcc/config/bpf/bpf.cc | 46 ---
>  gcc/config/bpf/bpf.h  |  5 +-
>  gcc/config/bpf/bpf.md | 97 ---
>  gcc/config/bpf/bpf.opt| 14 ++
>  gcc/doc/invoke.texi   | 21 -
>  6 files changed, 133 insertions(+), 56 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf-opts.h b/gcc/config/bpf/bpf-opts.h
> index 8282351cf045..92db01ec4d54 100644
> --- a/gcc/config/bpf/bpf-opts.h
> +++ b/gcc/config/bpf/bpf-opts.h
> @@ -60,4 +60,10 @@ enum bpf_isa_version
>ISA_V3,
>  };
>  
> +enum bpf_asm_dialect
> +{
> +  ASM_NORMAL,
> +  ASM_PSEUDOC
> +};
> +
>  #endif /* ! BPF_OPTS_H */
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index e0324e1e0e08..1d3936871d60 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -873,16 +873,47 @@ bpf_output_call (rtx target)
>return "";
>  }
>  
> +/* Print register name according to assembly dialect.
> +   In normal syntax registers are printed like %rN where N is the
> +   register number.
> +   In pseudoc syntax, the register names do not feature a '%' prefix.
> +   Additionally, the code 'w' denotes that the register should be printed
> +   as wN instead of rN, where N is the register number, but only when the
> +   value stored in the operand OP is 32-bit wide.  */
> +static void
> +bpf_print_register (FILE *file, rtx op, int code)
> +{
> +  if(asm_dialect == ASM_NORMAL)
> +fprintf (file, "%s", reg_names[REGNO (op)]);
> +  else
> +{
> +  if (code == 'w' && GET_MODE (op) == SImode)
> + {
> +   if (REGNO (op) == BPF_FP)
> + fprintf (file, "w10");
> +   else
> + fprintf (file, "w%s", reg_names[REGNO (op)]+2);
> + }
> +  else
> + {
> +   if (REGNO (op) == BPF_FP)
> + fprintf (file, "r10");
> +   else
> + fprintf (file, "%s", reg_names[REGNO (op)]+1);
> + }
> +}
> +}
> +
>  /* Print an instruction operand.  This function is called in the macro
> PRINT_OPERAND defined in bpf.h */
>  
>  void
> -bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
> +bpf_print_operand (FILE *file, rtx op, int code)
>  {
>switch (GET_CODE (op))
>  {
>  case REG:
> -  fprintf (file, "%s", reg_names[REGNO (op)]);
> +  bpf_print_register (file, op, code);
>break;
>  case MEM:
>output_address (GET_MODE (op), XEXP (op, 0));
> @@ -936,7 +967,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
>switch (GET_CODE (addr))
>  {
>  case REG:
> -  fprintf (file, "[%s+0]", reg_names[REGNO (addr)]);
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
> +  bpf_print_register (file, addr, 0);
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
>break;
>  case PLUS:
>{
> @@ -945,9 +978,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
>  
>   if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
> {
> - fprintf (file, "[%s+", reg_names[REGNO (op0)]);
> + fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
> + bpf_print_register (file, op0, 0);
> + fprintf (file, "+");
>   output_addr_const (file, op1);
> - fputs ("]", file);
> + fprintf (file, asm_dialect == ASM_NORMAL ? "]" : ")");
> }
>   else
> fatal_insn ("invalid address in operand", addr);
> @@ -1816,7 +1851,6 @@ handle_attr_preserve (function *fn)
>  }
>  }
>  
> -
>  /* This pass finds accesses to structures marked with the BPF target 
> attribute
> __attribute__((preserve_access_index)). For every such access, a CO-RE
> relocation record is generated, to be output in the .BTF.ext section.  */
> diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
> index 344aca02d1bb..9561bf59b800 100644
> --- a/gcc/config/bpf/bpf.h

Re: [PATCH] mklog: Add --append option to auto add generate ChangeLog to patch file

2023-07-21 Thread Lehua Ding
Hi Martin,


Thank you for telling me about the Python code format specification.
I have no idea how to add checks for pushed commits.
Anyway, I'll first make sure I don't introduce new format errors myself.


Best,
Lehua

Fix sreal::to_int and implement sreal::to_nearest_int

2023-07-21 Thread Jan Hubicka via Gcc-patches
Fix sreal::to_int and implement sreal::to_nearest_int

While exploring the new loop estimate dumps, I noticed that a loop iterating 1.8
times by profile is estimated as iterating once instead of twice by nb_estimate.
While nb_estimate should really be a sreal and I will convert it incrementally,
I found the problem is in the previous patch doing:

+ *nit = (snit + 0.5).to_int ();

this does not work for sreal because it only has a constructor from integer, so
0.5 is first rounded to 0 and then added to snit.

Some code uses sreal (1, -1), which produces 0.5, but that requires an unnecessary
addition, so I decided to add to_nearest_int.  Testing it I noticed that to_int
is buggy:
  (sreal(3)/2).to_int () == 1
while
  (sreal(-3)/2).to_int () == -2
Probably not a big deal in practice, as we do not do conversions on
negative values.

The fix is easy: we need to do the shift on positive values.  This patch fixes
that and adds the to_nearest_int alternative.
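
As an aside, the rounding trick used by to_nearest_int can be illustrated
on plain integers (a standalone sketch, not sreal itself): for a value
sig * 2^exp with exp < 0, adding the bit just below the integer part
rounds to nearest instead of truncating:

  /* shift == -exp > 0; sig is the positive significand.  */
  static long long
  shift_to_nearest (long long sig, int shift)
  {
    return (sig >> shift) + ((sig >> (shift - 1)) & 1);
  }

  /* shift_to_nearest (7, 1) == 4, i.e. 3.5 rounds to 4, while the
     plain truncating shift (7 >> 1) == 3.  */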

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

* sreal.cc (sreal::to_nearest_int): New.
(sreal_verify_basics): Verify also to_nearest_int.
(verify_aritmetics): Likewise.
(sreal_verify_conversions): New.
(sreal_cc_tests): Call sreal_verify_conversions.
* sreal.h: (sreal::to_nearest_int): Declare

diff --git a/gcc/sreal.cc b/gcc/sreal.cc
index 8e99d871420..606a571e339 100644
--- a/gcc/sreal.cc
+++ b/gcc/sreal.cc
@@ -116,7 +116,26 @@ sreal::to_int () const
   if (m_exp > 0)
 return sign * (SREAL_ABS ((int64_t)m_sig) << m_exp);
   if (m_exp < 0)
-return m_sig >> -m_exp;
+return sign * (SREAL_ABS ((int64_t)m_sig) >> -m_exp);
+  return m_sig;
+}
+
+/* Return nearest integer value of *this.  */
+
+int64_t
+sreal::to_nearest_int () const
+{
+  int64_t sign = SREAL_SIGN (m_sig);
+
+  if (m_exp <= -SREAL_BITS)
+return 0;
+  if (m_exp >= SREAL_PART_BITS)
+return sign * INTTYPE_MAXIMUM (int64_t);
+  if (m_exp > 0)
+return sign * (SREAL_ABS ((int64_t)m_sig) << m_exp);
+  if (m_exp < 0)
+return sign * ((SREAL_ABS ((int64_t)m_sig) >> -m_exp)
+  + ((SREAL_ABS (m_sig) >> (-m_exp - 1)) & 1));
   return m_sig;
 }
 
@@ -286,6 +305,8 @@ sreal_verify_basics (void)
 
   ASSERT_EQ (INT_MIN/2, minimum.to_int ());
   ASSERT_EQ (INT_MAX/2, maximum.to_int ());
+  ASSERT_EQ (INT_MIN/2, minimum.to_nearest_int ());
+  ASSERT_EQ (INT_MAX/2, maximum.to_nearest_int ());
 
   ASSERT_FALSE (minus_two < minus_two);
   ASSERT_FALSE (seven < seven);
@@ -315,6 +336,10 @@ verify_aritmetics (int64_t a, int64_t b)
   ASSERT_EQ (a - b, (sreal (a) - sreal (b)).to_int ());
   ASSERT_EQ (b + a, (sreal (b) + sreal (a)).to_int ());
   ASSERT_EQ (b - a, (sreal (b) - sreal (a)).to_int ());
+  ASSERT_EQ (a + b, (sreal (a) + sreal (b)).to_nearest_int ());
+  ASSERT_EQ (a - b, (sreal (a) - sreal (b)).to_nearest_int ());
+  ASSERT_EQ (b + a, (sreal (b) + sreal (a)).to_nearest_int ());
+  ASSERT_EQ (b - a, (sreal (b) - sreal (a)).to_nearest_int ());
 }
 
 /* Verify arithmetics for interesting numbers.  */
@@ -377,6 +402,33 @@ sreal_verify_negative_division (void)
   ASSERT_EQ (sreal (1234567) / sreal (-1234567), sreal (-1));
 }
 
+static void
+sreal_verify_conversions (void)
+{
+  ASSERT_EQ ((sreal (11) / sreal (3)).to_int (), 3);
+  ASSERT_EQ ((sreal (11) / sreal (3)).to_nearest_int (), 4);
+  ASSERT_EQ ((sreal (10) / sreal (3)).to_int (), 3);
+  ASSERT_EQ ((sreal (10) / sreal (3)).to_nearest_int (), 3);
+  ASSERT_EQ ((sreal (9) / sreal (3)).to_int (), 3);
+  ASSERT_EQ ((sreal (9) / sreal (3)).to_nearest_int (), 3);
+  ASSERT_EQ ((sreal (-11) / sreal (3)).to_int (), -3);
+  ASSERT_EQ ((sreal (-11) / sreal (3)).to_nearest_int (), -4);
+  ASSERT_EQ ((sreal (-10) / sreal (3)).to_int (), -3);
+  ASSERT_EQ ((sreal (-10) / sreal (3)).to_nearest_int (), -3);
+  ASSERT_EQ ((sreal (-3)).to_int (), -3);
+  ASSERT_EQ ((sreal (-3)).to_nearest_int (), -3);
+  for (int i = -10 ; i < 10; i += 123)
+for (int j = -1 ; j < 10; j += 71)
+  if (j != 0)
+   {
+ sreal sval = ((sreal)i) / (sreal)j;
+ double val = (double)i / (double)j;
+ ASSERT_EQ ((fabs (sval.to_double () - val) < 0.1), true);
+ ASSERT_EQ (sval.to_int (), (int)val);
+ ASSERT_EQ (sval.to_nearest_int (), lround (val));
+   }
+}
+
 /* Run all of the selftests within this file.  */
 
 void sreal_cc_tests ()
@@ -385,6 +437,7 @@ void sreal_cc_tests ()
   sreal_verify_arithmetics ();
   sreal_verify_shifting ();
   sreal_verify_negative_division ();
+  sreal_verify_conversions ();
 }
 
 } // namespace selftest
diff --git a/gcc/sreal.h b/gcc/sreal.h
index 8700807a131..4dbb83c3005 100644
--- a/gcc/sreal.h
+++ b/gcc/sreal.h
@@ -51,6 +51,7 @@ public:
 
   void dump (FILE *) const;
   int64_t to_int () const;
+  int64_t to_nearest_int () const;
   double to_double () const;
   void stream_out (struct output_block *);
   static sreal stream_in (class lto_input_block *);


[pushed] Darwin: Handle linker '-demangle' option.

2023-07-21 Thread Iain Sandoe via Gcc-patches
Tested with Darwin linker versions that do/do not support the option
and on x86_64-linux-gnu, pushed to trunk, thanks
Iain

--- 8< ---

Most of the Darwin linkers in use support this option, which we will
now pass by default (matching the Xcode clang implementation).

Signed-off-by: Iain Sandoe 

gcc/ChangeLog:

* config.in: Regenerate.
* config/darwin.h (DARWIN_LD_DEMANGLE): New.
(LINK_COMMAND_SPEC_A): Add demangle handling.
* configure: Regenerate.
* configure.ac: Detect linker support for '-demangle'.
---
 gcc/config.in   |  9 -
 gcc/config/darwin.h |  7 +++
 gcc/configure   | 19 +++
 gcc/configure.ac| 14 ++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/gcc/config.in b/gcc/config.in
index 0e62b9fbfc9..5cf51bc1b01 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -2178,6 +2178,12 @@
 #endif
 
 
+/* Define to 1 if ld64 supports '-demangle'. */
+#ifndef USED_FOR_TARGET
+#undef LD64_HAS_DEMANGLE
+#endif
+
+
 /* Define to 1 if ld64 supports '-export_dynamic'. */
 #ifndef USED_FOR_TARGET
 #undef LD64_HAS_EXPORT_DYNAMIC
@@ -2239,7 +2245,8 @@
 #endif
 
 
-/* Define to the sub-directory where libtool stores uninstalled libraries. */
+/* Define to the sub-directory in which libtool stores uninstalled libraries.
+   */
 #ifndef USED_FOR_TARGET
 #undef LT_OBJDIR
 #endif
diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index 1b538c73593..e0e8672a455 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -270,6 +270,12 @@ extern GTY(()) int darwin_ms_struct;
   "%&6; }
 gcc_cv_ld64_major=`echo "$gcc_cv_ld64_version" | sed -e 's/\..*//'`
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld64_major" >&5
 $as_echo "$gcc_cv_ld64_major" >&6; }
+if test "$gcc_cv_ld64_major" -ge 97; then
+  gcc_cv_ld64_demangle=1
+fi
 if test "$gcc_cv_ld64_major" -ge 236; then
   gcc_cv_ld64_export_dynamic=1
 fi
@@ -30517,6 +30521,15 @@ $as_echo_n "checking linker version... " >&6; }
 { $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld64_version" >&5
 $as_echo "$gcc_cv_ld64_version" >&6; }
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking linker for -demangle 
support" >&5
+$as_echo_n "checking linker for -demangle support... " >&6; }
+gcc_cv_ld64_demangle=1
+if $gcc_cv_ld -demangle < /dev/null 2>&1 | grep 'unknown option' > 
/dev/null; then
+  gcc_cv_ld64_demangle=0
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_ld64_demangle" >&5
+$as_echo "$gcc_cv_ld64_demangle" >&6; }
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking linker for 
-export_dynamic support" >&5
 $as_echo_n "checking linker for -export_dynamic support... " >&6; }
 gcc_cv_ld64_export_dynamic=1
@@ -30545,6 +30558,12 @@ _ACEOF
   fi
 
 
+cat >>confdefs.h <<_ACEOF
+#define LD64_HAS_DEMANGLE $gcc_cv_ld64_demangle
+_ACEOF
+
+
+
 cat >>confdefs.h <<_ACEOF
 #define LD64_HAS_EXPORT_DYNAMIC $gcc_cv_ld64_export_dynamic
 _ACEOF
diff --git a/gcc/configure.ac b/gcc/configure.ac
index e91073ba831..46e58a27661 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -6211,6 +6211,7 @@ if test x"$ld64_flag" = x"yes"; then
   # Set defaults for possibly untestable items.
   gcc_cv_ld64_export_dynamic=0
   gcc_cv_ld64_platform_version=0
+  gcc_cv_ld64_demangle=0
 
   if test "$build" = "$host"; then
 darwin_try_test=1
@@ -6232,6 +6233,9 @@ if test x"$ld64_flag" = x"yes"; then
 AC_MSG_CHECKING(ld64 specified version)
 gcc_cv_ld64_major=`echo "$gcc_cv_ld64_version" | sed -e 's/\..*//'`
 AC_MSG_RESULT($gcc_cv_ld64_major)
+if test "$gcc_cv_ld64_major" -ge 97; then
+  gcc_cv_ld64_demangle=1
+fi
 if test "$gcc_cv_ld64_major" -ge 236; then
   gcc_cv_ld64_export_dynamic=1
 fi
@@ -6246,6 +6250,13 @@ if test x"$ld64_flag" = x"yes"; then
 fi
 AC_MSG_RESULT($gcc_cv_ld64_version)
 
+AC_MSG_CHECKING(linker for -demangle support)
+gcc_cv_ld64_demangle=1
+if $gcc_cv_ld -demangle < /dev/null 2>&1 | grep 'unknown option' > 
/dev/null; then
+  gcc_cv_ld64_demangle=0
+fi
+AC_MSG_RESULT($gcc_cv_ld64_demangle)
+
 AC_MSG_CHECKING(linker for -export_dynamic support)
 gcc_cv_ld64_export_dynamic=1
 if $gcc_cv_ld -export_dynamic < /dev/null 2>&1 | grep 'unknown option' > 
/dev/null; then
@@ -6266,6 +6277,9 @@ if test x"$ld64_flag" = x"yes"; then
   [Define to ld64 version.])
   fi
 
+  AC_DEFINE_UNQUOTED(LD64_HAS_DEMANGLE, $gcc_cv_ld64_demangle,
+  [Define to 1 if ld64 supports '-demangle'.])
+
   AC_DEFINE_UNQUOTED(LD64_HAS_EXPORT_DYNAMIC, $gcc_cv_ld64_export_dynamic,
   [Define to 1 if ld64 supports '-export_dynamic'.])
 
-- 
2.39.2 (Apple Git-143)



Re: [PATCH v3] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Cupertino Miranda via Gcc-patches

Thanks for the suggestions/fixes in changelog.
Inlined new patch.

Cupertino

>> gcc/ChangeLog:
>>
>>  * config/bpf/bpf.opt: Added option -masm=.
>>  * config/bpf/bpf-opts.h: Likewize.
>>  * config/bpf/bpf.cc: Changed it to conform with new pseudoc
>>dialect support.
>>  * config/bpf/bpf.h: Likewise.
>>  * config/bpf/bpf.md: Added pseudo-c templates.
>>  * doc/invoke.texi: (-masm=DIALECT) New eBPF option item.
>
> I think the ChangeLog could be made more useful, and the syntax of the
> last entry is not entirely right.  I suggest something like:
>
>   * config/bpf/bpf.opt: Added option -masm=.
>   * config/bpf/bpf-opts.h (enum bpf_asm_dialect): New type.
>   * config/bpf/bpf.cc (bpf_print_register): New function.
>   (bpf_print_register): Support pseudo-c syntax for registers.
>   (bpf_print_operand_address): Likewise.
>   * config/bpf/bpf.h (ASM_SPEC): handle -msasm.
>   (ASSEMBLER_DIALECT): Define.
>   * config/bpf/bpf.md: Added pseudo-c templates.
>   * doc/invoke.texi (-masm=DIALECT): New eBPF option item.
>
> Please make sure to run the contrib/gcc-changelog/git_check-commit.py
> script.
>

From 6ebe3229a59b32ffb2ed24b3a2cf8c360a807c31 Mon Sep 17 00:00:00 2001
From: Cupertino Miranda 
Date: Mon, 17 Jul 2023 17:42:42 +0100
Subject: [PATCH v3] bpf: pseudo-c assembly dialect support

Add support for the pseudo-c BPF assembly dialect, already supported by
clang and widely used in the Linux kernel.

gcc/ChangeLog:

	* config/bpf/bpf.opt: Added option -masm=.
	* config/bpf/bpf-opts.h (enum bpf_asm_dialect): New type.
	* config/bpf/bpf.cc (bpf_print_register): New function.
	(bpf_print_register): Support pseudo-c syntax for registers.
	(bpf_print_operand_address): Likewise.
	* config/bpf/bpf.h (ASM_SPEC): handle -msasm.
	(ASSEMBLER_DIALECT): Define.
	* config/bpf/bpf.md: Added pseudo-c templates.
	* doc/invoke.texi (-masm=): New eBPF option item.
---
 gcc/config/bpf/bpf-opts.h |  6 +++
 gcc/config/bpf/bpf.cc | 46 ---
 gcc/config/bpf/bpf.h  |  5 +-
 gcc/config/bpf/bpf.md | 97 ---
 gcc/config/bpf/bpf.opt| 14 ++
 gcc/doc/invoke.texi   | 21 -
 6 files changed, 133 insertions(+), 56 deletions(-)

diff --git a/gcc/config/bpf/bpf-opts.h b/gcc/config/bpf/bpf-opts.h
index 8282351cf045..92db01ec4d54 100644
--- a/gcc/config/bpf/bpf-opts.h
+++ b/gcc/config/bpf/bpf-opts.h
@@ -60,4 +60,10 @@ enum bpf_isa_version
   ISA_V3,
 };
 
+enum bpf_asm_dialect
+{
+  ASM_NORMAL,
+  ASM_PSEUDOC
+};
+
 #endif /* ! BPF_OPTS_H */
diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index e0324e1e0e08..1d3936871d60 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -873,16 +873,47 @@ bpf_output_call (rtx target)
   return "";
 }
 
+/* Print register name according to assembly dialect.
+   In normal syntax registers are printed like %rN where N is the
+   register number.
+   In pseudoc syntax, the register names do not feature a '%' prefix.
+   Additionally, the code 'w' denotes that the register should be printed
+   as wN instead of rN, where N is the register number, but only when the
+   value stored in the operand OP is 32-bit wide.  */
+static void
+bpf_print_register (FILE *file, rtx op, int code)
+{
+  if(asm_dialect == ASM_NORMAL)
+fprintf (file, "%s", reg_names[REGNO (op)]);
+  else
+{
+  if (code == 'w' && GET_MODE (op) == SImode)
+	{
+	  if (REGNO (op) == BPF_FP)
+	fprintf (file, "w10");
+	  else
+	fprintf (file, "w%s", reg_names[REGNO (op)]+2);
+	}
+  else
+	{
+	  if (REGNO (op) == BPF_FP)
+	fprintf (file, "r10");
+	  else
+	fprintf (file, "%s", reg_names[REGNO (op)]+1);
+	}
+}
+}
+
 /* Print an instruction operand.  This function is called in the macro
PRINT_OPERAND defined in bpf.h */
 
 void
-bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
+bpf_print_operand (FILE *file, rtx op, int code)
 {
   switch (GET_CODE (op))
 {
 case REG:
-  fprintf (file, "%s", reg_names[REGNO (op)]);
+  bpf_print_register (file, op, code);
   break;
 case MEM:
   output_address (GET_MODE (op), XEXP (op, 0));
@@ -936,7 +967,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
   switch (GET_CODE (addr))
 {
 case REG:
-  fprintf (file, "[%s+0]", reg_names[REGNO (addr)]);
+  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+  bpf_print_register (file, addr, 0);
+  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
   break;
 case PLUS:
   {
@@ -945,9 +978,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
 
 	if (GET_CODE (op0) == REG && GET_CODE (op1) == CONST_INT)
 	  {
-	fprintf (file, "[%s+", reg_names[REGNO (op0)]);
+	fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
+	bpf_print_register (file, op0, 0);
+	fprintf (file, "+");
 	output_addr_const (file, op1);
-	fputs ("]", file);
+	fprintf (file, asm_dialect == ASM_NO

Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-21 Thread Ben Boeckel via Gcc-patches
On Thu, Jul 20, 2023 at 17:00:32 -0400, Nathan Sidwell wrote:
> On 7/19/23 20:47, Ben Boeckel wrote:
> > But it is inhibiting distributed builds because the distributing tool
> > would need to know:
> > 
> > - what CMIs are actually imported (here, "read the module mapper file"
> >(in CMake's case, this is only the modules that are needed; a single
> >massive mapper file for an entire project would have extra entries) or
> >"act as a proxy for the socket/program specified" for other
> >approaches);
> 
> This information is in the machine (& human) README section of the CMI.

OK. That leaves it up to distributing build tools to figure out, at
least.

> > - read the CMIs as it sends to the remote side to gather any other CMIs
> >that may be needed (recursively);
> > 
> > Contrast this with the MSVC and Clang (17+) mechanism where the command
> > line contains everything that is needed and a single bolus can be sent.
> 
> um, the build system needs to create that command line? Where does the build 
> system get that information?  IIUC it'll need to read some file(s) to do that.

It's chained through the P1689 information in the collator as needed. No
extra files need to be read (at least with CMake's approach); certainly
not CMI files.

> > And relocatable is probably fine. How does it interact with reproducible
> > builds? Or are GCC CMIs not really something anyone should consider for
> > installation (even as a "here, maybe this can help consumers"
> > mechanism)?
> 
> Module CMIs should be considered a cacheable artifact.  They are neither 
> object 
> files nor source files.

Sure, cacheable sounds fine. What about the installation?

--Ben


[PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

2023-07-21 Thread Drew Ross via Gcc-patches
Simplifies (x << c) >> c where x is a signed integral type of
width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
on x86_64 and x86 targets.
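
A quick worked example of the identity for a 32-bit int (illustrative
only; it relies on GCC's arithmetic right shift of signed values, which
is what the tests below exercise):

  /* (x << 31) >> 31 copies bit 0 of x into every bit position via the
     arithmetic right shift, which is exactly -(x & 1):
     bit 0 == 0  ->  0,  bit 0 == 1  ->  all bits set == -1.  */
  int before (int x) { return (x << 31) >> 31; }
  int after  (int x) { return -(x & 1); }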

PR middle-end/101955

gcc/ChangeLog:

* match.pd (x << c) >> c -> -(x & 1): New simplification.

gcc/testsuite/ChangeLog:

* gcc.dg/pr101955.c: New test.
---
 gcc/match.pd| 10 +
 gcc/testsuite/gcc.dg/pr101955.c | 69 +
 2 files changed, 79 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr101955.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..820fc890e8e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3766,6 +3766,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && (wi::ltu_p (wi::to_wide (@1), element_precision (type
   (bit_and @0 (rshift { build_minus_one_cst (type); } @1
 
+/* Optimize (X << C) >> C where C = precision(type) - 1 and X is signed
+   into -(X & 1).  */
+(simplify
+ (rshift (nop_convert? (lshift @0 uniform_integer_cst_p@1)) @@1)
+ (with { tree cst = uniform_integer_cst_p (@1); }
+ (if (ANY_INTEGRAL_TYPE_P (type)
+  && !TYPE_UNSIGNED (type)
+  && wi::eq_p (wi::to_wide (cst), element_precision (type) - 1))
+  (negate (bit_and (convert @0) { build_one_cst (type); })
+
 /* Optimize x >> x into 0 */
 (simplify
  (rshift @0 @0)
diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
new file mode 100644
index 000..386154911c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101955.c
@@ -0,0 +1,69 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1 -Wno-psabi" } */
+
+typedef int v4si __attribute__((vector_size(4 * sizeof(int))));
+
+__attribute__((noipa)) int
+t1 (int x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) int
+t2 (int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t3 (int x)
+{
+  int w = 31;
+  int y = x << w;
+  int z = y >> w;
+  return z;
+}
+
+__attribute__((noipa)) long long
+t4 (long long x)
+{
+  return (x << 63) >> 63;
+}
+
+__attribute__((noipa)) long long
+t5 (long long x)
+{
+  long long y = x << 63;
+  long long z = y >> 63;
+  return z;
+}
+
+__attribute__((noipa)) long long
+t6 (long long x)
+{
+  int w = 63;
+  long long y = x << w;
+  long long z = y >> w;
+  return z;
+}
+
+__attribute__((noipa)) v4si
+t7 (v4si x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) v4si
+t8 (v4si x)
+{
+  v4si t = {31,31,31,31};
+  return (x << t) >> t;
+}
+
+/* { dg-final { scan-tree-dump-not " >> " "dse1" } } */
+/* { dg-final { scan-tree-dump-not " << " "dse1" } } */
+/* { dg-final { scan-tree-dump-times " -" 8 "dse1" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "dse1" } } */
+
-- 
2.39.3



Re: [PATCH v3] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Jose E. Marchesi via Gcc-patches


> Thanks for the suggestions/fixes in changelog.
> Inlined new patch.
>
> Cupertino
>
>>> gcc/ChangeLog:
>>>
>>> * config/bpf/bpf.opt: Added option -masm=.
>>> * config/bpf/bpf-opts.h: Likewize.
>>> * config/bpf/bpf.cc: Changed it to conform with new pseudoc
>>>   dialect support.
>>> * config/bpf/bpf.h: Likewise.
>>> * config/bpf/bpf.md: Added pseudo-c templates.
>>> * doc/invoke.texi: (-masm=DIALECT) New eBPF option item.
>>
>> I think the ChangeLog could be made more useful, and the syntax of the
>> last entry is not entirely right.  I suggest something like:
>>
>>  * config/bpf/bpf.opt: Added option -masm=.
>>  * config/bpf/bpf-opts.h (enum bpf_asm_dialect): New type.
>>  * config/bpf/bpf.cc (bpf_print_register): New function.
>>  (bpf_print_register): Support pseudo-c syntax for registers.
>>  (bpf_print_operand_address): Likewise.
>>  * config/bpf/bpf.h (ASM_SPEC): handle -msasm.
>>  (ASSEMBLER_DIALECT): Define.
>>  * config/bpf/bpf.md: Added pseudo-c templates.
>>  * doc/invoke.texi (-masm=DIALECT): New eBPF option item.
>>
>> Please make sure to run the contrib/gcc-changelog/git_check-commit.py
>> script.
>>
>
> From 6ebe3229a59b32ffb2ed24b3a2cf8c360a807c31 Mon Sep 17 00:00:00 2001
> From: Cupertino Miranda 
> Date: Mon, 17 Jul 2023 17:42:42 +0100
> Subject: [PATCH v3] bpf: pseudo-c assembly dialect support
>
> Add support for the pseudo-c BPF assembly dialect, already supported by
> clang and widely used in the Linux kernel.
>
> gcc/ChangeLog:
>
>   * config/bpf/bpf.opt: Added option -masm=.
>   * config/bpf/bpf-opts.h (enum bpf_asm_dialect): New type.
>   * config/bpf/bpf.cc (bpf_print_register): New function.
>   (bpf_print_register): Support pseudo-c syntax for registers.
>   (bpf_print_operand_address): Likewise.
>   * config/bpf/bpf.h (ASM_SPEC): handle -msasm.
>   (ASSEMBLER_DIALECT): Define.
>   * config/bpf/bpf.md: Added pseudo-c templates.
>   * doc/invoke.texi (-masm=): New eBPF option item.
> ---
>  gcc/config/bpf/bpf-opts.h |  6 +++
>  gcc/config/bpf/bpf.cc | 46 ---
>  gcc/config/bpf/bpf.h  |  5 +-
>  gcc/config/bpf/bpf.md | 97 ---
>  gcc/config/bpf/bpf.opt| 14 ++
>  gcc/doc/invoke.texi   | 21 -
>  6 files changed, 133 insertions(+), 56 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf-opts.h b/gcc/config/bpf/bpf-opts.h
> index 8282351cf045..92db01ec4d54 100644
> --- a/gcc/config/bpf/bpf-opts.h
> +++ b/gcc/config/bpf/bpf-opts.h
> @@ -60,4 +60,10 @@ enum bpf_isa_version
>ISA_V3,
>  };
>  
> +enum bpf_asm_dialect
> +{
> +  ASM_NORMAL,
> +  ASM_PSEUDOC
> +};
> +
>  #endif /* ! BPF_OPTS_H */
> diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
> index e0324e1e0e08..1d3936871d60 100644
> --- a/gcc/config/bpf/bpf.cc
> +++ b/gcc/config/bpf/bpf.cc
> @@ -873,16 +873,47 @@ bpf_output_call (rtx target)
>return "";
>  }
>  
> +/* Print register name according to assembly dialect.
> +   In normal syntax registers are printed like %rN where N is the
> +   register number.
> +   In pseudoc syntax, the register names do not feature a '%' prefix.
> +   Additionally, the code 'w' denotes that the register should be printed
> +   as wN instead of rN, where N is the register number, but only when the
> +   value stored in the operand OP is 32-bit wide.  */
> +static void
> +bpf_print_register (FILE *file, rtx op, int code)
> +{
> +  if(asm_dialect == ASM_NORMAL)
> +fprintf (file, "%s", reg_names[REGNO (op)]);
> +  else
> +{
> +  if (code == 'w' && GET_MODE (op) == SImode)
> + {
> +   if (REGNO (op) == BPF_FP)
> + fprintf (file, "w10");
> +   else
> + fprintf (file, "w%s", reg_names[REGNO (op)]+2);
> + }
> +  else
> + {
> +   if (REGNO (op) == BPF_FP)
> + fprintf (file, "r10");
> +   else
> + fprintf (file, "%s", reg_names[REGNO (op)]+1);
> + }
> +}
> +}
> +
>  /* Print an instruction operand.  This function is called in the macro
> PRINT_OPERAND defined in bpf.h */
>  
>  void
> -bpf_print_operand (FILE *file, rtx op, int code ATTRIBUTE_UNUSED)
> +bpf_print_operand (FILE *file, rtx op, int code)
>  {
>switch (GET_CODE (op))
>  {
>  case REG:
> -  fprintf (file, "%s", reg_names[REGNO (op)]);
> +  bpf_print_register (file, op, code);
>break;
>  case MEM:
>output_address (GET_MODE (op), XEXP (op, 0));
> @@ -936,7 +967,9 @@ bpf_print_operand_address (FILE *file, rtx addr)
>switch (GET_CODE (addr))
>  {
>  case REG:
> -  fprintf (file, "[%s+0]", reg_names[REGNO (addr)]);
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "[" : "(");
> +  bpf_print_register (file, addr, 0);
> +  fprintf (file, asm_dialect == ASM_NORMAL ? "+0]" : "+0)");
>break;
>  case PLUS:
>{
> @@ -945,9 +978,11 @@ bpf_print_operand_address (FILE *file, rtx addr)
>  
>   if (GET_CODE (op0

Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-07-21 at 16:58 +0300, Alexander Monakov wrote:
> 
> On Fri, 21 Jul 2023, Xi Ruoyao via Gcc-patches wrote:
> 
> > Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
> > are building GCC 14 snapshot).  The default is "fast" (if no -std=
> > option is used), which allows some contractions disallowed by the
> > standard.
> 
> Not fully, see below.
> 
> > But GCC is in C++ and I'm not sure if the C++ standard has the same
> > definition for allowed contractions as C.
> 
> It doesn't, but in GCC we should aim to provide the same semantics in C++
> as in C.
> 
> > > (Or is the severity of lack of support sufficiently different in the two 
> > > cases that this is fine -- i.e. not compile vs may trigger floating 
> > > point rounding inaccuracies?)
> > 
> > It's possible that the test itself is flaky.  Can you provide some
> > detail about how it fails?
> 
> See also PR 99903 for an earlier known issue which appears due to x87
> excess precision and so tweaking -ffp-contract wouldn't help:
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903

Does it affect AArch64 too?

> Now that multiple platforms are hitting this, can we _please_ get rid
> of the questionable attempt to compute time in a floating-point variable
> and just use an uint64_t storing nanoseconds?

To me this is the correct thing to do.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] testsuite/110763: Ensure zero return from test

2023-07-21 Thread Siddhesh Poyarekar
The test deliberately reads beyond bounds to exercise ubsan, and the
return value may be anything, based on previous allocations.  The OFF
test caters for this by ANDing the return with 0; do the same for the DYN
test.

gcc/testsuite/ChangeLog:

PR testsuite/110763
* gcc.dg/ubsan/object-size-dyn.c (dyn): New parameter RET.
(main): Use it.

Signed-off-by: Siddhesh Poyarekar 
---
 gcc/testsuite/gcc.dg/ubsan/object-size-dyn.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/ubsan/object-size-dyn.c 
b/gcc/testsuite/gcc.dg/ubsan/object-size-dyn.c
index 0159f5b9820..49c3abe2e72 100644
--- a/gcc/testsuite/gcc.dg/ubsan/object-size-dyn.c
+++ b/gcc/testsuite/gcc.dg/ubsan/object-size-dyn.c
@@ -5,12 +5,12 @@
 
 int
 __attribute__ ((noinline))
-dyn (int size, int i)
+dyn (int size, int i, int ret)
 {
   __builtin_printf ("dyn\n");
   fflush (stdout);
   int *alloc = __builtin_calloc (size, sizeof (int));
-  int ret = alloc[i];
+  ret = ret & alloc[i];
   __builtin_free (alloc);
   return ret;
 }
@@ -28,7 +28,7 @@ off (int size, int i, int ret)
 int
 main (void)
 {
-  int ret = dyn (2, 2);
+  int ret = dyn (2, 2, 0);
 
   ret |= off (4, 4, 0);
 
-- 
2.41.0



Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Matthew Malcomson via Gcc-patches

On 7/21/23 14:45, Xi Ruoyao wrote:

On Fri, 2023-07-21 at 14:11 +0100, Matthew Malcomson wrote:

My understanding is that this is not a hardware bug and that it's
specified that rounding does not happen on the multiply "sub-part" in
`FNMSUB`, but rounding happens on the `FMUL` that generates some input
to it.


AFAIK the C standard does only say "A floating *expression* may be
contracted".  I.e:

double r = a * b + c;

may be compiled to use FMA because "a * b + c" is a floating point
expression.  But

double t = a * b;
double r = t + c;

is not, because "a * b" and "t + c" are two separate floating point
expressions.

So a contraction across two functions is not allowed.  We now have -ffp-
contract=on (https://gcc.gnu.org/r14-2023) to only allow C-standard
contractions.

Perhaps -ffp-contract=on (not off) is enough to fix the issue (if you
are building GCC 14 snapshot).  The default is "fast" (if no -std=
option is used), which allows some contractions disallowed by the
standard.

But GCC is in C++ and I'm not sure if the C++ standard has the same
definition for allowed contractions as C.



Thanks -- I'll look into whether `-ffp-contract=on` works.



It's possible that the test itself is flaky.  Can you provide some
detail about how it fails?



Sure -- The outline is that `timer::validate_phases` sees the sum of 
sub-part timers as greater than the timer for the "overall" time 
(outside of a tolerance of 1.01).  It then complains and hits 
`gcc_unreachable()`.


While I found it difficult to get enough information out of the test
that is run in the testsuite, I found that when passing an invalid
argument to `cc1plus` all sub-parts would be zero, and sometimes the
"total" would be negative.


This was due to the `times` syscall returning the same clock tick for
the start and end of the "total" timer; the difference in rounding
between FNMSUB and FMUL means that, depending on what that clock tick
is, the "elapsed time" can end up calculated as negative.


I didn't prove it 100%, but I believe the same fundamental difference
(but opposite rounding error) could trigger the testsuite failure -- if
the "end" of one sub-phase timer is greater than the "start" of another
sub-phase timer, then the sum of parts could be greater than the total.


There is a "tolerance" in this test that I considered increasing, but 
since that would not affect the "invalid arguments" thing (where the 
total is negative and hence the tolerance multiplication of 1.01 
would have to be supplemented by a positive offset) I suggested avoiding 
the inline.


W.r.t. the x86 bug that Alexander Monakov has pointed to, it's a very 
similar thing but in this case the problem is not bit-precision of 
values after the inlining, but rather a difference between fused and not 
fused operations after the inlining.


Agreed that using integral arithmetic is the more robust solution.


[WIP RFC] analyzer: Add optional trim of the analyzer diagnostics going too deep [PR110543]

2023-07-21 Thread Benjamin Priour via Gcc-patches
Hi,

Upon David's request I've joined the in progress patch to the below email.
I hope it makes more sense now.

Best,
Benjamin.

-- Forwarded message -
From: Benjamin Priour 
Date: Tue, Jul 18, 2023 at 3:30 PM
Subject: [RFC] analyzer: Add optional trim of the analyzer diagnostics
going too deep [PR110543]
To: , David Malcolm 


Hi,

I'd like to request comments on a patch I am writing for PR110543.
The goal of this patch is to reduce the noise of the analyzer-emitted
diagnostics when dealing with system headers, or simply diagnostic paths
that are too long.  The new option only affects the display of the
diagnostics, but doesn't hinder the actual analysis.

I've defaulted the new option to "system", thus preventing the diagnostic
paths from showing system headers.
"never" corresponds to the pre-patch behavior, whereas you can also specify
an unsigned value 
that prevents paths to go deeper than  frames.

fanalyzer-trim-diagnostics=
> Common Joined RejectNegative ToLower Var(flag_analyzer_trim_diagnostics)
> Init("system")
> -fanalyzer-trim-diagnostics=[never|system|] Trim diagnostics
> path that are too long before emission.
>

Does it sound reasonable and user-friendly?

Regstrapping was a success against trunk, although one of the newly added
test cases fails for c++14.
Note that the test case below was done with "never", thus it behaves
exactly like the pre-patch analyzer on x86_64-linux-gnu.

/* { dg-additional-options "-fdiagnostics-plain-output
> -fdiagnostics-path-format=inline-events -fanalyzer-trim-diagnostics=never"
> } */
> /* { dg-skip-if "" { c++98_only }  } */
>
> #include 
> struct A {int x; int y;};
>
> int main () {
>   std::shared_ptr a;
>   a->x = 4; /* { dg-line deref_a } */
>   /* { dg-warning "dereference of NULL" "" { target *-*-* } deref_a } */
>
>   return 0;
> }
>
> /* { dg-begin-multiline-output "" }
>   'int main()': events 1-2
> |
> |
> +--> 'std::__shared_ptr_access<_Tp, _Lp, , 
> >::element_type* std::__shared_ptr_access<_Tp, _Lp, ,
>  >::operator->() const [with _Tp = A; __gnu_cxx::_Lock_policy
> _Lp = __gnu_cxx::_S_atomic; bool  = false; bool  =
> false]': events 3-4
>|
>|
>+--> 'std::__shared_ptr_access<_Tp, _Lp, ,
>  >::element_type* std::__shared_ptr_access<_Tp, _Lp,
> ,  >::_M_get() const [with _Tp = A;
> __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool  =
> false; bool  = false]': events 5-6
>   |
>   |
>   +--> 'std::__shared_ptr<_Tp, _Lp>::element_type*
> std::__shared_ptr<_Tp, _Lp>::get() const [with _Tp = A;
> __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]': events 7-8
>  |
>  |
>   <--+
>   |
> 'std::__shared_ptr_access<_Tp, _Lp, ,
>  >::element_type* std::__shared_ptr_access<_Tp, _Lp,
> ,  >::_M_get() const [with _Tp = A;
> __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool  =
> false; bool  = false]': event 9
>   |
>   |
><--+
>|
>  'std::__shared_ptr_access<_Tp, _Lp, , 
> >::element_type* std::__shared_ptr_access<_Tp, _Lp, ,
>  >::operator->() const [with _Tp = A; __gnu_cxx::_Lock_policy
> _Lp = __gnu_cxx::_S_atomic; bool  = false; bool  =
> false]': event 10
>|
>|
> <--+
> |
>   'int main()': events 11-12
> |
> |
>{ dg-end-multiline-output "" } */
>


The first events "'int main()': events 1-2" vary in c++14 (which gets events 1-3).

>
> // c++14 with fully detailed output
>   ‘int main()’: events 1-3
> |
> |8 | int main () {
> |  | ^~~~
> |  | |
> |  | (1) entry to ‘main’
> |9 |   std::shared_ptr a;
> |  |  ~
> |  |  |
> |  |  (2)
> ‘a.std::shared_ptr::.std::__shared_ptr __gnu_cxx::_S_atomic>::_M_ptr’ is NULL
> |   10 |   a->x = 4; /* { dg-line deref_a } */
> |  |~~
> |  ||
> |  |(3) calling ‘std::__shared_ptr_access __gnu_cxx::_S_atomic, false, false>::operator->’ from ‘main’
>

whereas c++17 and posterior give

> // c++17 with fully detailed output
>
// ./xg++ -fanalyzer
>  ../../gcc/gcc/testsuite/g++.dg/analyzer/fanalyzer-trim-diagnostics-never.C
>  -B. -shared-libgcc -fanalyzer-trim-diagnostics=never -std=c++17
>
  ‘int main()’: events 1-2
> |
> |8 | int main () {
> |  | ^~~~
> |  | |
> |  | (1) entry to ‘main’
> |9 |   std::shared_ptr a;
> |   10 |   a->x = 4; /* { dg-line deref_a } */
> |  |~~
> |  ||
> |  |(2) calling ‘std::__shared_ptr_access __gnu_cxx::_S_atomic, false, false>::operator->’ from ‘main’
>

Is there a way to make dg-multiline-output check for a regex? Or would
checking the multiline output only for c++17 and c++20 be acceptable?
This 

Implement flat loop profile detection

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi,
this patch adds maybe_flat_loop_profile, which can be used in loop profile updates
to detect situations where the profile may be unrealistically flat and should
not be downscaled after vectorizing, unrolling and other transforms that
assume that the loop has a high iteration count even if the CFG profile says
otherwise.

A profile is flat if it was statically estimated and at that time we had
no idea about the actual number of iterations, or we artificially capped them.
So the function considers flat all profiles whose count has guessed or lower
reliability and for which there is no nb_iterations bound/estimate
which would prove that the profile iteration count is high enough.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* cfgloop.h (maybe_flat_loop_profile): Declare
* cfgloopanal.cc (maybe_flat_loop_profile): New function.
* tree-cfg.cc (print_loop_info): Print info about flat profiles.

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 269694c7962..22293e1c237 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -407,6 +407,7 @@ gcov_type expected_loop_iterations_unbounded (const class 
loop *,
 extern bool expected_loop_iterations_by_profile (const class loop *loop,
 sreal *ret,
 bool *reliable = NULL);
+extern bool maybe_flat_loop_profile (const class loop *);
 extern unsigned expected_loop_iterations (class loop *);
 extern rtx doloop_condition_get (rtx_insn *);
 
diff --git a/gcc/cfgloopanal.cc b/gcc/cfgloopanal.cc
index c86a537f024..d8923b27e5d 100644
--- a/gcc/cfgloopanal.cc
+++ b/gcc/cfgloopanal.cc
@@ -303,6 +303,67 @@ expected_loop_iterations_by_profile (const class loop 
*loop, sreal *ret,
   return true;
 }
 
+/* Return true if loop CFG profile may be unrealistically flat.
+   This is a common case, since average loops iterate only about 5 times.
+   In the case we do not have profile feedback or do not know real number of
+   iterations during profile estimation, we are likely going to predict it with
+   similar low iteration count.  For static loop profiles we also artificially
+   cap profile of loops with known large iteration count so they do not appear
+   significantly more hot than other loops with unknown iteration counts.
+
+   For loop optimization heuristics we ignore CFG profile and instead
+   use get_estimated_loop_iterations API which returns estimate
+   only when it is realistic.  For unknown counts some optimizations,
+   like vectorizer or unroller make guess that iteration count will
+   be large.  In this case we need to avoid scaling down the profile
+   after the loop transform.  */
+
+bool
+maybe_flat_loop_profile (const class loop *loop)
+{
+  bool reliable;
+  sreal ret;
+
+  if (!expected_loop_iterations_by_profile (loop, &ret, &reliable))
+return true;
+
+  /* Reliable CFG estimates ought never be flat.  Sanity check with
+ nb_iterations_estimate.  If those differ, it is a bug in the profile
+ updating code.  */
+  if (reliable)
+{
+  int64_t intret = ret.to_nearest_int ();
+  if (loop->any_estimate
+ && (wi::ltu_p (intret * 2, loop->nb_iterations_estimate)
+ || wi::gtu_p (intret, loop->nb_iterations_estimate * 2)))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+   "Loop %i has inconsistent iterations estimates: "
+   "reliable CFG based iteration estimate is %f "
+   "while nb_iterations_estimate is %i\n",
+   loop->num,
+   ret.to_double (),
+   (int)loop->nb_iterations_estimate.to_shwi ());
+ return true;
+   }
+  return false;
+}
+
+  /* Allow some margin of error and see if we are close to known bounds.
+ sreal (9,-3) is 9/8  */
+  int64_t intret = (ret * sreal (9, -3)).to_nearest_int ();
+  if (loop->any_upper_bound && wi::geu_p (intret, 
loop->nb_iterations_upper_bound))
+return false;
+  if (loop->any_likely_upper_bound
+  && wi::geu_p (intret, loop->nb_iterations_likely_upper_bound))
+return false;
+  if (loop->any_estimate
+  && wi::geu_p (intret, loop->nb_iterations_estimate))
+return false;
+  return true;
+}
+
 /* Returns expected number of iterations of LOOP, according to
measured or guessed profile.
 
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index a6c97a04662..c65af8cc800 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -8523,8 +8523,11 @@ print_loop_info (FILE *file, const class loop *loop, 
const char *prefix)
   bool reliable;
   sreal iterations;
   if (loop->num && expected_loop_iterations_by_profile (loop, &iterations, 
&reliable))
-fprintf (file, "\n%siterations by profile: %f %s", prefix,
-iterations.to_double (), reliable ? "(reliable)" : "(unreliable)");
+{
+  fprintf (file, "\n%siterations by profile: %f (%s%s)", prefix,
+  iterations.to_doubl

Fix gcc.dg/tree-ssa/copy-headers-9.c and gcc.dg/tree-ssa/dce-1.c failures

2023-07-21 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes the templates in the two testcases so they match the output
correctly.  I did not re-test after the last changes in the previous patch,
sorry for that.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/copy-headers-9.c: Fix template for 
tree-ssa-loop-ch.cc changes.
* gcc.dg/tree-ssa/dce-1.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c 
b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
index 7cc162ca94d..b49d1fc9576 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-9.c
@@ -13,8 +13,7 @@ void test (int m, int n)
}
while (i<10);
 }
-/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win" 1 "ch2" } } */
-/* { dg-final { scan-tree-dump-times "May duplicate bb" 1 "ch2" } } */
-/* { dg-final { scan-tree-dump-times "Duplicating additional BB to obtain 
do-while loop" 1 "ch2" } } */
+/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win" 2 "ch2" } } */
+/* { dg-final { scan-tree-dump-times "Duplicating bb . is a win. it has zero" 
1 "ch2" } } */
 /* { dg-final { scan-tree-dump-times "Will duplicate bb" 2 "ch2" } } */
 /* { dg-final { scan-tree-dump "is now do-while loop" "ch2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/dce-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/dce-1.c
index 91c3bcd6c1c..3ebfa988503 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/dce-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/dce-1.c
@@ -13,6 +13,6 @@ int foo (int b, int j)
 }
 /* Check that empty loop is eliminated in this case.  We should no longer have
the exit condition after the loop.  */
-/* { dg-final { scan-tree-dump-not "999)" "cddce1"} } */
-/* { dg-final { scan-tree-dump-not "1000)" "cddce1"} } */
+/* { dg-final { scan-tree-dump-not "999\\)" "cddce1"} } */
+/* { dg-final { scan-tree-dump-not "1000\\)" "cddce1"} } */
 


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Alexander Monakov


On Fri, 21 Jul 2023, Xi Ruoyao wrote:

> > See also PR 99903 for an earlier known issue which appears due to x87
> > excess precision and so tweaking -ffp-contract wouldn't help:
> > 
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99903
> 
> Does it affect AArch64 too?

Well, not literally (AArch64 doesn't have excess precision), but absence
of intermediate rounding in FMA is similar to excess precision.

I'm saying it's the same issue manifesting via different pathways on x86
and aarch64. Sorry if I misunderstood your question.

Alexander


[PATCH] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Cupertino Miranda via Gcc-patches
gcc/ChangeLog:

* config/bpf/bpf.md: fixed template for neg instruction.
---
 gcc/config/bpf/bpf.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 329f62f55c33..bb414d8a4428 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -142,7 +142,7 @@
   [(set (match_operand:AM 0 "register_operand" "=r")
 (neg:AM (match_operand:AM 1 "register_operand" " 0")))]
   ""
-  "neg\t%0"
+  "neg\t%0,%1"
   [(set_attr "type" "")])
 
 ;;; Multiplication
-- 
2.30.2



Re: [PATCH v2] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Cupertino Miranda via Gcc-patches
From 9db2044c1d20bd9f05acf3c910ad0ffc9d5fda8f Mon Sep 17 00:00:00 2001
From: Cupertino Miranda 
Date: Fri, 21 Jul 2023 17:40:07 +0100
Subject: [PATCH v2] bpf: fixed template for neg (added second operand)

gcc/ChangeLog:

	* config/bpf/bpf.md: fixed template for neg instruction.
---
 gcc/config/bpf/bpf.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 329f62f55c33..2ba862f3935a 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -139,10 +139,10 @@
 
 ;;; Negation
 (define_insn "neg2"
-  [(set (match_operand:AM 0 "register_operand" "=r")
-(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
+  [(set (match_operand:AM 0 "register_operand" "=r")
+(neg:AM (match_operand:AM 1 "register_operand" " r")))]
   ""
-  "neg\t%0"
+  "neg\t%0,%1"
   [(set_attr "type" "")])
 
 ;;; Multiplication
-- 
2.30.2



Re: [PATCH] bpf: fixed template for neg (added second operand)

2023-07-21 Thread David Faust via Gcc-patches
Hi Cupertino,

On 7/21/23 09:43, Cupertino Miranda wrote:
> gcc/ChangeLog:
> 
>   * config/bpf/bpf.md: fixed template for neg instruction.
> ---
>  gcc/config/bpf/bpf.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 329f62f55c33..bb414d8a4428 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -142,7 +142,7 @@
>[(set (match_operand:AM 0 "register_operand" "=r")
>  (neg:AM (match_operand:AM 1 "register_operand" " 0")))]
>""
> -  "neg\t%0"
> +  "neg\t%0,%1"
>[(set_attr "type" "")])

I think you will need to update the constraint for the second
operand as well; it could be any register, or a 32-bit immediate.

>  
>  ;;; Multiplication


Re: [PATCH]AArch64 fix regexp for live_1.c sve test

2023-07-21 Thread Jan Hubicka via Gcc-patches
Avoid scaling flat loop profiles of vectorized loops

As discussed, when vectorizing a loop with a static profile, it is not always a
good idea to divide the header frequency by the vectorization factor, because the
profile may not realistically represent the expected number of iterations.  Since
in such cases we default to relatively low iteration counts (based on the average
for spec2k17), this would make the vectorized loop body look cold.

This patch makes the vectorizer look for flat profiles and only possibly reduce
the profile by the known upper bound on iteration counts.

Bootstrap/regtest on x86_64-linux in progress.  I intend to commit this after
testers pick up the other profile-related changes from today.
Tamar, Richard, it would be nice to know if it fixes the testcase you were
looking at, and whether it could possibly be turned into a testcase.

gcc/ChangeLog:

* tree-vect-loop.cc (scale_profile_for_vect_loop): Avoid scaling flat
profiles by vectorization factor.
(vect_transform_loop): Check for flat profiles.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b44fb9c7712..d036a7d4480 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10837,11 +10837,25 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
gimple_stmt_iterator *gsi,
 }
 
 /* Scale profiling counters by estimation for LOOP which is vectorized
-   by factor VF.  */
+   by factor VF.
+   If FLAT is true, the loop we started with had unrealistically flat
+   profile.  */
 
 static void
-scale_profile_for_vect_loop (class loop *loop, unsigned vf)
+scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
 {
+  /* For flat profiles do not scale down proportionally by VF and only
+ cap by known iteration count bounds.  */
+  if (flat)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Vectorized loop profile seems flat; not scaling iteration "
+"count down by the vectorization factor %i\n", vf);
+  scale_loop_profile (loop, profile_probability::always (),
+ get_likely_max_loop_iterations_int (loop));
+  return;
+}
   /* Loop body executes VF fewer times and exit increases VF times.  */
   edge exit_e = single_exit (loop);
   profile_count entry_count = loop_preheader_edge (loop)->count ();
@@ -10852,7 +10866,13 @@ scale_profile_for_vect_loop (class loop *loop, 
unsigned vf)
   while (vf > 1
 && loop->header->count > entry_count
 && loop->header->count < entry_count * vf)
-vf /= 2;
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Vectorization factor %i seems too large for profile "
+"prevoiusly believed to be consistent; reducing.\n", vf);
+  vf /= 2;
+}
 
   if (entry_count.nonzero_p ())
 set_edge_probability_and_rescale_others
@@ -11184,6 +11204,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
   gimple *stmt;
   bool check_profitability = false;
   unsigned int th;
+  bool flat = maybe_flat_loop_profile (loop);
 
   DUMP_VECT_SCOPE ("vec_transform_loop");
 
@@ -11252,7 +11273,6 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
  &step_vector, &niters_vector_mult_vf, th,
  check_profitability, niters_no_overflow,
  &advance);
-
   if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)
   && LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo).initialized_p ())
 scale_loop_frequencies (LOOP_VINFO_SCALAR_LOOP (loop_vinfo),
@@ -11545,7 +11565,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
  assumed_vf) - 1
 : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
   assumed_vf) - 1);
-  scale_profile_for_vect_loop (loop, assumed_vf);
+  scale_profile_for_vect_loop (loop, assumed_vf, flat);
 
   if (dump_enabled_p ())
 {


Re: [PATCH v3] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Cupertino Miranda via Gcc-patches
From 7756a4becd1934e55d6d14ac4a9fd6d408a4797b Mon Sep 17 00:00:00 2001
From: Cupertino Miranda 
Date: Fri, 21 Jul 2023 17:40:07 +0100
Subject: [PATCH v3] bpf: fixed template for neg (added second operand)

gcc/ChangeLog:

	* config/bpf/bpf.md: fixed template for neg instruction.
---
 gcc/config/bpf/bpf.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 329f62f55c33..adf11e151df1 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -139,10 +139,10 @@
 
 ;;; Negation
 (define_insn "neg2"
-  [(set (match_operand:AM 0 "register_operand" "=r")
-(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
+  [(set (match_operand:AM 0 "register_operand" "=r,r")
+(neg:AM (match_operand:AM 1 "register_operand" " r,I")))]
   ""
-  "neg\t%0"
+  "neg\t%0,%1"
   [(set_attr "type" "")])
 
 ;;; Multiplication
-- 
2.30.2



Re: [PATCH]AArch64 fix regexp for live_1.c sve test

2023-07-21 Thread Richard Sandiford via Gcc-patches
Jan Hubicka  writes:
> Avoid scaling flat loop profiles of vectorized loops
>
> As discussed, when vectorizing a loop with a static profile, it is not always a
> good idea to divide the header frequency by the vectorization factor, because the
> profile may not realistically represent the expected number of iterations.  Since
> in such cases we default to relatively low iteration counts (based on the average
> for spec2k17), this would make the vectorized loop body look cold.
>
> This patch makes the vectorizer look for flat profiles and only possibly reduce
> the profile by the known upper bound on iteration counts.
>
> Bootstrap/regtest on x86_64-linux in progress.  I intend to commit this after
> testers pick up the other profile-related changes from today.
> Tamar, Richard, it would be nice to know if it fixes the testcase you were
> looking at, and whether it could possibly be turned into a testcase.

Yeah, it does!  Thanks for the quick fix.

The test was gcc.target/aarch64/sve/live_1.c.  Although it wasn't
originally a profile test, I think it should still be a relatively good
way of testing that the latch is treated as more likely than the exit,
without needing to check for that explicitly.

Richard

>
> gcc/ChangeLog:
>
>   * tree-vect-loop.cc (scale_profile_for_vect_loop): Avoid scaling flat
>   profiles by vectorization factor.
>   (vect_transform_loop): Check for flat profiles.
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index b44fb9c7712..d036a7d4480 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10837,11 +10837,25 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
> gimple_stmt_iterator *gsi,
>  }
>  
>  /* Scale profiling counters by estimation for LOOP which is vectorized
> -   by factor VF.  */
> +   by factor VF.
> +   If FLAT is true, the loop we started with had unrealistically flat
> +   profile.  */
>  
>  static void
> -scale_profile_for_vect_loop (class loop *loop, unsigned vf)
> +scale_profile_for_vect_loop (class loop *loop, unsigned vf, bool flat)
>  {
> +  /* For flat profiles do not scale down proportionally by VF and only
> + cap by known iteration count bounds.  */
> +  if (flat)
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file,
> +  "Vectorized loop profile seems flat; not scaling iteration "
> +  "count down by the vectorization factor %i\n", vf);
> +  scale_loop_profile (loop, profile_probability::always (),
> +   get_likely_max_loop_iterations_int (loop));
> +  return;
> +}
>/* Loop body executes VF fewer times and exit increases VF times.  */
>edge exit_e = single_exit (loop);
>profile_count entry_count = loop_preheader_edge (loop)->count ();
> @@ -10852,7 +10866,13 @@ scale_profile_for_vect_loop (class loop *loop, 
> unsigned vf)
>while (vf > 1
>&& loop->header->count > entry_count
>&& loop->header->count < entry_count * vf)
> -vf /= 2;
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file,
> +  "Vectorization factor %i seems too large for profile "
> +  "prevoiusly believed to be consistent; reducing.\n", vf);
> +  vf /= 2;
> +}
>  
>if (entry_count.nonzero_p ())
>  set_edge_probability_and_rescale_others
> @@ -11184,6 +11204,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
>gimple *stmt;
>bool check_profitability = false;
>unsigned int th;
> +  bool flat = maybe_flat_loop_profile (loop);
>  
>DUMP_VECT_SCOPE ("vec_transform_loop");
>  
> @@ -11252,7 +11273,6 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
> &step_vector, &niters_vector_mult_vf, th,
> check_profitability, niters_no_overflow,
> &advance);
> -
>if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)
>&& LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo).initialized_p ())
>  scale_loop_frequencies (LOOP_VINFO_SCALAR_LOOP (loop_vinfo),
> @@ -11545,7 +11565,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
> *loop_vectorized_call)
> assumed_vf) - 1
>: wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
>  assumed_vf) - 1);
> -  scale_profile_for_vect_loop (loop, assumed_vf);
> +  scale_profile_for_vect_loop (loop, assumed_vf, flat);
>  
>if (dump_enabled_p ())
>  {


Re: [PATCH] match.pd: Implement missed optimization (x << c) >> c -> -(x & 1) [PR101955]

2023-07-21 Thread Andrew Pinski via Gcc-patches
On Fri, Jul 21, 2023 at 8:09 AM Drew Ross via Gcc-patches
 wrote:
>
> Simplifies (x << c) >> c where x is a signed integral type of
> width >= int and c = precision(type) - 1 into -(x & 1). Tested successfully
> on x86_64 and x86 targets.

Thinking about this some more, I think this should be handled in
expand rather than on the gimple level.
It is very much related to PR 110717 even. We are basically truncating
to a signed one bit integer and then sign extending that across the
whole code.
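
To make that concrete, here is a tiny standalone check of the identity being
discussed -- an illustration only, not part of the patch, and it leans on
GCC's behaviour for these signed shifts just like the new testcase in the
quoted patch does:

  #include <assert.h>

  int
  main (void)
  {
    int vals[] = { 0, 1, 2, 3, -1, -2, 12345, -12345 };
    for (unsigned i = 0; i < sizeof vals / sizeof vals[0]; i++)
      {
        int x = vals[i];
        /* Truncate to a signed one-bit value and sign-extend it back:
           odd x gives -1, even x gives 0, i.e. -(x & 1).  */
        assert (((x << 31) >> 31) == -(x & 1));
      }
    return 0;
  }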

Thanks,
Andrew

>
> PR middle-end/101955
>
> gcc/ChangeLog:
>
> * match.pd (x << c) >> c -> -(x & 1): New simplification.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr101955.c: New test.
> ---
>  gcc/match.pd| 10 +
>  gcc/testsuite/gcc.dg/pr101955.c | 69 +
>  2 files changed, 79 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr101955.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..820fc890e8e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3766,6 +3766,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>&& (wi::ltu_p (wi::to_wide (@1), element_precision (type
>(bit_and @0 (rshift { build_minus_one_cst (type); } @1
>
> +/* Optimize (X << C) >> C where C = precision(type) - 1 and X is signed
> +   into -(X & 1).  */
> +(simplify
> + (rshift (nop_convert? (lshift @0 uniform_integer_cst_p@1)) @@1)
> + (with { tree cst = uniform_integer_cst_p (@1); }
> + (if (ANY_INTEGRAL_TYPE_P (type)
> +  && !TYPE_UNSIGNED (type)
> +  && wi::eq_p (wi::to_wide (cst), element_precision (type) - 1))
> +  (negate (bit_and (convert @0) { build_one_cst (type); })
> +
>  /* Optimize x >> x into 0 */
>  (simplify
>   (rshift @0 @0)
> diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
> new file mode 100644
> index 000..386154911c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr101955.c
> @@ -0,0 +1,69 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1 -Wno-psabi" } */
> +
> +typedef int v4si __attribute__((vector_size(4 * sizeof(int;
> +
> +__attribute__((noipa)) int
> +t1 (int x)
> +{
> +  return (x << 31) >> 31;
> +}
> +
> +__attribute__((noipa)) int
> +t2 (int x)
> +{
> +  int y = x << 31;
> +  int z = y >> 31;
> +  return z;
> +}
> +
> +__attribute__((noipa)) int
> +t3 (int x)
> +{
> +  int w = 31;
> +  int y = x << w;
> +  int z = y >> w;
> +  return z;
> +}
> +
> +__attribute__((noipa)) long long
> +t4 (long long x)
> +{
> +  return (x << 63) >> 63;
> +}
> +
> +__attribute__((noipa)) long long
> +t5 (long long x)
> +{
> +  long long y = x << 63;
> +  long long z = y >> 63;
> +  return z;
> +}
> +
> +__attribute__((noipa)) long long
> +t6 (long long x)
> +{
> +  int w = 63;
> +  long long y = x << w;
> +  long long z = y >> w;
> +  return z;
> +}
> +
> +__attribute__((noipa)) v4si
> +t7 (v4si x)
> +{
> +  return (x << 31) >> 31;
> +}
> +
> +__attribute__((noipa)) v4si
> +t8 (v4si x)
> +{
> +  v4si t = {31,31,31,31};
> +  return (x << t) >> t;
> +}
> +
> +/* { dg-final { scan-tree-dump-not " >> " "dse1" } } */
> +/* { dg-final { scan-tree-dump-not " << " "dse1" } } */
> +/* { dg-final { scan-tree-dump-times " -" 8 "dse1" } } */
> +/* { dg-final { scan-tree-dump-times " & " 8 "dse1" } } */
> +
> --
> 2.39.3
>


Re: Fix optimize_mask_stores profile update

2023-07-21 Thread Jan Hubicka via Gcc-patches
> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
>  wrote:
> >
> > Hi,
> > While looking into the sphinx3 regression I noticed that the vectorizer
> > produces BBs with an overall probability count of 120%.  This patch fixes it.
> > Richi, I don't know how to create a testcase, but having one would
> > be nice.
> >
> > Bootstrapped/regtested x86_64-linux, committed last night (sorry for
> > the late email)
> 
> This should trigger with sth like
> 
>   for (i)
> if (cond[i])
>   out[i] = 1.;
> 
> so a masked store and then using AVX2+.  ISTR we disable AVX masked
> stores on zen (but not AVX512).

Richard,
if we know the probability of if (cond[i]) to be p,
then we know that the combined conditional is somewhere between
  low = p  (the strategy packing trues and falses into VF-sized
blocks)
and
  high = min (p*vf,1)
   (the strategy doing only one true per block if possible)
Likely value is

  likely = 1-pow(1-p, vf)

I wonder if we can work out p at least in common cases.
Making the store unlikely, as we do right now, will place it offline with an
extra jump.  Making it likely is better unless p is very small.

I think if p is close to 0 or 1, which may be the common case, the analysis
above may be useful.  If the range [low...high] is small, we can use likely
and keep it as reliable.
If it is high, we can probably just end up with a guessed value close to but
above 50%, so the store stays inline.

Honza
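
A minimal numeric sketch of the three estimates above (plain C, not GCC code;
it assumes the per-iteration conditions are independent, which is what the
"likely" formula models; compile with -lm):

  #include <math.h>
  #include <stdio.h>

  static void
  mask_store_prob (double p, int vf)
  {
    double low    = p;                            /* trues/falses packed into VF-sized blocks */
    double high   = p * vf < 1.0 ? p * vf : 1.0;  /* at most one true per block */
    double likely = 1.0 - pow (1.0 - p, vf);      /* independent conditions */
    printf ("p=%.3f vf=%d: low=%.3f likely=%.3f high=%.3f\n",
            p, vf, low, likely, high);
  }

  int
  main (void)
  {
    mask_store_prob (0.01, 8);
    mask_store_prob (0.5, 8);
    mask_store_prob (0.9, 8);
    return 0;
  }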


Re: [PATCH] Reduce floating-point difficulties in timevar.cc

2023-07-21 Thread Andrew Pinski via Gcc-patches
On Fri, Jul 21, 2023 at 5:13 AM Matthew Malcomson via Gcc-patches
 wrote:
>
> On some AArch64 bootstrapped builds, we were getting a flaky test
> because the floating point operations in `get_time` were being fused
> with the floating point operations in `timevar_accumulate`.
>
> This meant that the rounding behaviour of our multiplication with
> `ticks_to_msec` was different when used in `timer::start` and when
> performed in `timer::stop`.  These extra inaccuracies led to the
> testcase `g++.dg/ext/timevar1.C` being flaky on some hardware.
>
> This change ensures those operations are not fused and hence stops the test
> being flaky on that particular machine.  There is no expected change in the
> generated code.
> Bootstrap & regtest on AArch64 passes with no regressions.

Oh this does explain why powerpc also sees it: https://gcc.gnu.org/PR110316 .
I wonder whether, instead of adding noinline here, we could change the code
to tolerate the fused multiply-subtract instead,
which is kinda related to what I suggested in comment #1 of PR 110316.
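
A minimal sketch of what "tolerating" it could look like (this is not the
actual timevar.cc code, just an assumed shape: the fused rounding can make a
now - start delta come out a hair below zero, so clamp it rather than forcing
get_time out of line):

  /* Hypothetical helper; the name and shape are illustrative only.  */
  static inline double
  elapsed_seconds (double now, double start)
  {
    double d = now - start;
    return d < 0.0 ? 0.0 : d;   /* treat a tiny negative delta as zero */
  }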

Thanks,
Andrew

>
> gcc/ChangeLog:
>
> * timevar.cc (get_time): Make this noinline to avoid fusing
> behaviour and associated test flakyness.
>
>
> N.b. I didn't know who to include as reviewer -- guessed Richard Biener as the
> global reviewer that had the most contributions to this file and Richard
> Sandiford since I've asked him for reviews a lot in the past.
>
>
> ### Attachment also inlined for ease of reply
> ###
>
>
> diff --git a/gcc/timevar.cc b/gcc/timevar.cc
> index 
> d695297aae7f6b2a6de01a37fe86c2a232338df0..5ea4ec259e114f31f611e7105cd102f4c9552d18
>  100644
> --- a/gcc/timevar.cc
> +++ b/gcc/timevar.cc
> @@ -212,6 +212,7 @@ timer::named_items::print (FILE *fp, const 
> timevar_time_def *total)
> HAVE_WALL_TIME macros.  */
>
>  static void
> +__attribute__((noinline))
>  get_time (struct timevar_time_def *now)
>  {
>now->user = 0;
>
>
>


Re: [PATCH] c++: fix ICE with is_really_empty_class [PR110106]

2023-07-21 Thread Jason Merrill via Gcc-patches

On 7/20/23 17:58, Marek Polacek wrote:

On Thu, Jul 20, 2023 at 03:51:32PM -0400, Marek Polacek wrote:

On Thu, Jul 20, 2023 at 02:37:07PM -0400, Jason Merrill wrote:

On 7/20/23 14:13, Marek Polacek wrote:

On Wed, Jul 19, 2023 at 10:11:27AM -0400, Patrick Palka wrote:

On Tue, 18 Jul 2023, Marek Polacek via Gcc-patches wrote:


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk and branches?


Looks reasonable to me.


Thanks.

Though I wonder if we could also fix this by not checking potentiality
at all in this case?  The problematic call to is_rvalue_constant_expression
happens from cp_parser_constant_expression with 'allow_non_constant' != 0
and with 'non_constant_p' being a dummy out argument that comes from
cp_parser_functional_cast, so the result of is_rvalue_constant_expression
is effectively unused in this case, and we should be able to safely elide
it when 'allow_non_constant && non_constant_p == nullptr'.


Sounds plausible.  I think my patch could be applied first since it
removes a tiny bit of code, then I can hopefully remove the flag below,
then maybe go back and optimize the call to is_rvalue_constant_expression.
Does that sound sensible?


Relatedly, ISTM the member cp_parser::non_integral_constant_expression_p
is also effectively unused and could be removed?


It looks that way.  Seems it's only used in cp_parser_constant_expression:
10806   if (allow_non_constant_p)
10807 *non_constant_p = parser->non_integral_constant_expression_p;
but that could be easily replaced by a local var.  I'd be happy to see if
we can actually do away with it.  (I wonder why it was introduced and when
it actually stopped being useful.)


It was for the C++98 notion of constant-expression, which was more of a
parser-level notion, and has been supplanted by the C++11 version.  I'm
happy to remove it, and therefore remove the is_rvalue_constant_expression
call.


Wonderful.  I'll do that next.


I found a use of parser->non_integral_constant_expression_p:
finish_id_expression_1 can set it to true which then makes
a difference in cp_parser_constant_expression in C++98.  In
cp_parser_constant_expression we set n_i_c_e_p to false, call
cp_parser_assignment_expression in which finish_id_expression_1
sets n_i_c_e_p to true, then back in cp_parser_constant_expression
we skip the cxx11 block, and set *non_constant_p to true.  If I
remove n_i_c_e_p, we lose that.  This can be seen in init/array60.C.


Sure, we would need to use the C++11 code for C++98 mode, which is 
likely fine but is more uncertain.


It's probably simpler to just ignore n_i_c_e_p for C++11 and up, along 
with Patrick's suggestion of allowing null non_constant_p with true 
allow_non_constant_p.


Jason



Re: [PATCH] c++: fix ICE with is_really_empty_class [PR110106]

2023-07-21 Thread Jason Merrill via Gcc-patches

On 7/20/23 15:51, Marek Polacek wrote:

On Thu, Jul 20, 2023 at 02:37:07PM -0400, Jason Merrill wrote:

On 7/20/23 14:13, Marek Polacek wrote:

On Wed, Jul 19, 2023 at 10:11:27AM -0400, Patrick Palka wrote:

On Tue, 18 Jul 2023, Marek Polacek via Gcc-patches wrote:


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk and branches?


Looks reasonable to me.


Thanks.

Though I wonder if we could also fix this by not checking potentiality
at all in this case?  The problematic call to is_rvalue_constant_expression
happens from cp_parser_constant_expression with 'allow_non_constant' != 0
and with 'non_constant_p' being a dummy out argument that comes from
cp_parser_functional_cast, so the result of is_rvalue_constant_expression
is effectively unused in this case, and we should be able to safely elide
it when 'allow_non_constant && non_constant_p == nullptr'.


Sounds plausible.  I think my patch could be applied first since it
removes a tiny bit of code, then I can hopefully remove the flag below,
then maybe go back and optimize the call to is_rvalue_constant_expression.
Does that sound sensible?


Relatedly, ISTM the member cp_parser::non_integral_constant_expression_p
is also effectively unused and could be removed?


It looks that way.  Seems it's only used in cp_parser_constant_expression:
10806   if (allow_non_constant_p)
10807 *non_constant_p = parser->non_integral_constant_expression_p;
but that could be easily replaced by a local var.  I'd be happy to see if
we can actually do away with it.  (I wonder why it was introduced and when
it actually stopped being useful.)


It was for the C++98 notion of constant-expression, which was more of a
parser-level notion, and has been supplanted by the C++11 version.  I'm
happy to remove it, and therefore remove the is_rvalue_constant_expression
call.


Wonderful.  I'll do that next.
  

-- >8 --

is_really_empty_class is liable to crash when it gets an incomplete
or dependent type.  Since r11-557, we pass the yet-uninstantiated
class type S<0> of the PARM_DECL s to is_really_empty_class -- because
of the potential_rvalue_constant_expression -> is_rvalue_constant_expression
change in cp_parser_constant_expression.  Here we're not parsing
a template so we did not check COMPLETE_TYPE_P as we should.

PR c++/110106

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1): Check COMPLETE_TYPE_P
even when !processing_template_decl.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept80.C: New test.
---
   gcc/cp/constexpr.cc |  2 +-
   gcc/testsuite/g++.dg/cpp0x/noexcept80.C | 12 
   2 files changed, 13 insertions(+), 1 deletion(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept80.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 6e8f1c2b61e..1f59c5472fb 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9116,7 +9116,7 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
 if (now && want_rval)
{
  tree type = TREE_TYPE (t);
- if ((processing_template_decl && !COMPLETE_TYPE_P (type))
+ if (!COMPLETE_TYPE_P (type)
  || dependent_type_p (type)


There shouldn't be a problem completing the type here, so it seems to me
that we're missing a call to complete_type_p, at least when
!processing_template_decl.  Probably need to move the dependent_type_p check
up as a result.


Like so?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
is_really_empty_class is liable to crash when it gets an incomplete
or dependent type.  Since r11-557, we pass the yet-uninstantiated
class type S<0> of the PARM_DECL s to is_really_empty_class -- because
of the potential_rvalue_constant_expression -> is_rvalue_constant_expression
change in cp_parser_constant_expression.  Here we're not parsing
a template so we did not check COMPLETE_TYPE_P as we should.

It should work to complete the type before checking COMPLETE_TYPE_P.

PR c++/110106

gcc/cp/ChangeLog:

* constexpr.cc (potential_constant_expression_1): Try to complete the
type when !processing_template_decl.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept80.C: New test.
---
  gcc/cp/constexpr.cc |  5 +++--
  gcc/testsuite/g++.dg/cpp0x/noexcept80.C | 12 
  2 files changed, 15 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept80.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 6e8f1c2b61e..fb94f3cefcb 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -9116,8 +9116,9 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
if (now && want_rval)
{
  tree type = TREE_TYPE (t);
- if ((processing_template_decl && !COMPLETE_TYPE_P (type))
- || dependent_type_p (type)
+ if (dependent_type_p (type)
+ || !COM

[PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta
DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *d) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret
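
A small standalone check of the premise (assuming IEEE 754 binary64, which
the RISC-V D extension uses); this is an illustration only, not part of the
patch:

  #include <assert.h>
  #include <stdint.h>
  #include <string.h>

  int
  main (void)
  {
    double d = 0.0;              /* +0.0, not -0.0 */
    uint64_t bits;
    memcpy (&bits, &d, sizeof bits);
    assert (bits == 0);          /* +0.0 is bitwise all zeros */
    return 0;
  }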

This came to light when testing the in-flight f-m-o patch where an ICE
was gettinh triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.

Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output,
which is also fixed in the patch.

gcc/Changelog:

* config/riscv/predicates.md (const_0_operand): Add back
  const_double.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
  patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/predicates.md |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
 gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
(match_test "INTVAL (op) + 1 != 0")))
 
 (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))
 
 (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
-- 
2.34.1



Re: [PATCH v3] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Jose E. Marchesi via Gcc-patches


Hi Cuper.
OK.  Thanks!

> From 7756a4becd1934e55d6d14ac4a9fd6d408a4797b Mon Sep 17 00:00:00 2001
> From: Cupertino Miranda 
> Date: Fri, 21 Jul 2023 17:40:07 +0100
> Subject: [PATCH v3] bpf: fixed template for neg (added second operand)
>
> gcc/ChangeLog:
>
>   * config/bpf/bpf.md: fixed template for neg instruction.
> ---
>  gcc/config/bpf/bpf.md | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 329f62f55c33..adf11e151df1 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -139,10 +139,10 @@
>  
>  ;;; Negation
>  (define_insn "neg2"
> -  [(set (match_operand:AM 0 "register_operand" "=r")
> -(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
> +  [(set (match_operand:AM 0 "register_operand" "=r,r")
> +(neg:AM (match_operand:AM 1 "register_operand" " r,I")))]
>""
> -  "neg\t%0"
> +  "neg\t%0,%1"
>[(set_attr "type" "")])
>  
>  ;;; Multiplication


[COMMITTED] MAINTAINERS: Add myself to write after approval

2023-07-21 Thread Cupertino Miranda via Gcc-patches


Hi everyone,

Just to confirm that I pushed the change to the MAINTAINERS file, adding
myself to the write-after-approval list.

Thanks,
Cupertino


Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Philipp Tomsich
On Fri, 21 Jul 2023 at 19:56, Vineet Gupta  wrote:
>
> DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize 
> it.
>
> void zd(double *d) { *d = 0.0; }
>
> currently:
>
> | fmv.d.x fa5,zero
> | fsd fa5,0(a0)
> | ret
>
> With patch
>
> | sd  zero,0(a0)
> | ret
> This came to light when testing the in-flight f-m-o patch where an ICE
> was gettinh triggered due to lack of this pattern but turns out this

typo: "gettinh" -> "getting"

> is an independent optimization of its own [1]
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
>
> Apparently this is a regression in gcc-13, introduced by commit
> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
> thus is a partial revert of that change.

Should we add a "Fixes: "?

> Ran thru full multilib testsuite, there was 1 false failure due to
> random string "lw" appearing in lto build assembler output,
> which is also fixed in the patch.
>
> gcc/Changelog:

PR target/110748

>
> * config/riscv/predicates.md (const_0_operand): Add back
>   const_double.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr110748-1.c: New Test.
> * gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
>   patterns to avoid random string matches.
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/config/riscv/predicates.md |  2 +-
>  gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
>  gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
>  3 files changed, 15 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c
>
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 5a22c77f0cd0..9db28c2def7e 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -58,7 +58,7 @@
> (match_test "INTVAL (op) + 1 != 0")))
>
>  (define_predicate "const_0_operand"
> -  (and (match_code "const_int,const_wide_int,const_vector")
> +  (and (match_code "const_int,const_wide_int,const_double,const_vector")
> (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
>  (define_predicate "const_1_operand"
> diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> new file mode 100644
> index ..2f5bc08aae72
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
> +
> +
> +void zd(double *d) { *d = 0.0;  }
> +void zf(float *f)  { *f = 0.0;  }
> +
> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
> +/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
> b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> index 1036044291e7..89eb48bed1b9 100644
> --- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> @@ -18,7 +18,7 @@ d2ll (double d)
>  /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
>  /* { dg-final { scan-assembler "fmv.x.w" } } */
>  /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
> -/* { dg-final { scan-assembler-not "sw" } } */
> -/* { dg-final { scan-assembler-not "fld" } } */
> -/* { dg-final { scan-assembler-not "fsd" } } */
> -/* { dg-final { scan-assembler-not "lw" } } */
> +/* { dg-final { scan-assembler-not "\tsw\t" } } */
> +/* { dg-final { scan-assembler-not "\tfld\t" } } */
> +/* { dg-final { scan-assembler-not "\tfsd\t" } } */
> +/* { dg-final { scan-assembler-not "\tlw\t" } } */
> --
> 2.34.1
>


[PATCH] match.pd, v2: Implement missed optimization (~X | Y) ^ X -> ~(X & Y) [PR109986]

2023-07-21 Thread Drew Ross via Gcc-patches
Adds a simplification for (~X | Y) ^ X to be folded into ~(X & Y).
Also adds the macro bitwise_equal_p for generic and gimple which
returns true iff EXPR1 and EXPR2 have the same value. This helps 
to reduce the number of nop_converts necessary to match the pattern. 
Tested successfully on x86_64 and x86 targets.
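
A quick standalone sanity check of the identity itself (illustration only,
not part of the patch or its testsuite):

  #include <assert.h>

  int
  main (void)
  {
    unsigned vals[] = { 0u, 1u, 0x12345678u, 0xdeadbeefu, 0xffffffffu };
    for (unsigned i = 0; i < 5; i++)
      for (unsigned j = 0; j < 5; j++)
        {
          unsigned x = vals[i], y = vals[j];
          assert (((~x | y) ^ x) == ~(x & y));
        }
    return 0;
  }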

PR middle-end/109986

gcc/ChangeLog:

* generic-match-head.cc (bitwise_equal_p): New macro.
* gimple-match-head.cc (bitwise_equal_p): New macro.
(gimple_nop_convert): Declare.
(gimple_bitwise_equal_p): Helper for bitwise_equal_p.
* match.pd ((~X | Y) ^ X -> ~(X & Y)): New simplification.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr109986.c: New test.
* gcc.dg/tree-ssa/pr109986.c: New test.

Co-authored-by: Jakub Jelinek 
---
 gcc/generic-match-head.cc |  17 ++
 gcc/gimple-match-head.cc  |  36 
 gcc/match.pd  |   6 +
 .../gcc.c-torture/execute/pr109986.c  |  41 
 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c  | 177 ++
 5 files changed, 277 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr109986.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109986.c

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index f011204c5be..b4b5bc88f4b 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -102,3 +102,20 @@ optimize_successive_divisions_p (tree, tree)
 {
   return false;
 }
+
+/* Return true if EXPR1 and EXPR2 have the same value, but not necessarily
+   same type.  The types can differ through nop conversions.  */
+
+static inline bool
+bitwise_equal_p (tree expr1, tree expr2)
+{
+  STRIP_NOPS (expr1);
+  STRIP_NOPS (expr2);
+  if (expr1 == expr2)
+return true;
+  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
+return false;
+  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
+return wi::to_wide (expr1) == wi::to_wide (expr2);
+  return operand_equal_p (expr1, expr2, 0);
+}
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index b08cd891a13..f960d6cf0b9 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -224,3 +224,39 @@ optimize_successive_divisions_p (tree divisor, tree 
inner_div)
 }
   return true;
 }
+
+/* Return true if EXPR1 and EXPR2 have the same value, but not necessarily
+   same type.  The types can differ through nop conversions.  */
+#define bitwise_equal_p(expr1, expr2) gimple_bitwise_equal_p (expr1, expr2, 
valueize)
+
+bool gimple_nop_convert (tree, tree *, tree (*)(tree));
+
+/* Helper function for bitwise_equal_p macro.  */
+
+static inline bool
+gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
+{
+  if (expr1 == expr2)
+return true;
+  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
+return false;
+  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
+return wi::to_wide (expr1) == wi::to_wide (expr2);
+  if (operand_equal_p (expr1, expr2, 0))
+return true;
+  tree expr3, expr4;
+  if (!gimple_nop_convert (expr1, &expr3, valueize))
+expr3 = expr1;
+  if (!gimple_nop_convert (expr2, &expr4, valueize))
+expr4 = expr2;
+  if (expr1 != expr3)
+{
+  if (operand_equal_p (expr3, expr2, 0))
+return true;
+  if (expr2 != expr4 && operand_equal_p (expr3, expr4, 0))
+return true;
+}
+  if (expr2 != expr4 && operand_equal_p (expr1, expr4, 0))
+return true;
+  return false;
+}
diff --git a/gcc/match.pd b/gcc/match.pd
index a17d6838c14..367e4fc5517 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1627,6 +1627,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (convert (bit_and @1 (bit_not @0)
 
+/* (~X | Y) ^ X -> ~(X & Y).  */
+(simplify
+ (bit_xor:c (nop_convert1? (bit_ior:c (nop_convert2? (bit_not @0)) @1)) @2)
+ (if (bitwise_equal_p (@0, @2))
+  (convert (bit_not (bit_and @0 (convert @1))
+
 /* Convert ~X ^ ~Y to X ^ Y.  */
 (simplify
  (bit_xor (convert1? (bit_not @0)) (convert2? (bit_not @1)))
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr109986.c 
b/gcc/testsuite/gcc.c-torture/execute/pr109986.c
new file mode 100644
index 000..00ee9888539
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr109986.c
@@ -0,0 +1,41 @@
+/* PR middle-end/109986 */
+
+#include "../../gcc.dg/tree-ssa/pr109986.c"
+
+int 
+main ()
+{
+  if (t1 (29789, 29477) != -28678) __builtin_abort ();
+  if (t2 (20196, -18743) != 4294965567) __builtin_abort ();
+  if (t3 (127, 99) != -100) __builtin_abort ();
+  if (t4 (100, 53) != 219) __builtin_abort ();
+  if (t5 (20100, 1283) != -1025) __builtin_abort ();
+  if (t6 (20100, 10283) != 63487) __builtin_abort ();
+  if (t7 (2136614690L, 1136698390L) != -1128276995L) __builtin_abort ();
+  if (t8 (1136698390L, 2136614690L) != -1128276995UL) __builtin

[PATCH v4] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Cupertino Miranda via Gcc-patches
This patch fixes the define_insn for "neg" to support 2 operands.
The initial implementation assumed the format "neg %0", while the instruction
allows both destination and source operands.  The second operand can
be either a register or an immediate value.

gcc/ChangeLog:

* config/bpf/bpf.md: fixed template for neg instruction.
---
 gcc/config/bpf/bpf.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 329f62f55c33..adf11e151df1 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -139,10 +139,10 @@
 
 ;;; Negation
 (define_insn "neg2"
-  [(set (match_operand:AM 0 "register_operand" "=r")
-(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
+  [(set (match_operand:AM 0 "register_operand" "=r,r")
+(neg:AM (match_operand:AM 1 "register_operand" " r,I")))]
   ""
-  "neg\t%0"
+  "neg\t%0,%1"
   [(set_attr "type" "")])
 
 ;;; Multiplication
-- 
2.38.1



Re: [PATCH v4] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Jose E. Marchesi via Gcc-patches


Better with the commit message.
OK.  Thanks.

> This patch fixes the define_insn for "neg" to support 2 operands.
> The initial implementation assumed the format "neg %0", while the instruction
> allows both destination and source operands.  The second operand can
> be either a register or an immediate value.
>
> gcc/ChangeLog:
>
>   * config/bpf/bpf.md: fixed template for neg instruction.
> ---
>  gcc/config/bpf/bpf.md | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index 329f62f55c33..adf11e151df1 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -139,10 +139,10 @@
>  
>  ;;; Negation
>  (define_insn "neg2"
> -  [(set (match_operand:AM 0 "register_operand" "=r")
> -(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
> +  [(set (match_operand:AM 0 "register_operand" "=r,r")
> +(neg:AM (match_operand:AM 1 "register_operand" " r,I")))]
>""
> -  "neg\t%0"
> +  "neg\t%0,%1"
>[(set_attr "type" "")])
>  
>  ;;; Multiplication


Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta




On 7/21/23 11:15, Philipp Tomsich wrote:

On Fri, 21 Jul 2023 at 19:56, Vineet Gupta  wrote:

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret
This came to light when testing the in-flight f-m-o patch where an ICE
was gettinh triggered due to lack of this pattern but turns out this

typo: "gettinh" -> "getting"


Fixed.


is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.

Should we add a "Fixes: "?


Sure.  Although gcc's usage of the Fixes tag seems slightly different from,
say, the linux kernel's.






Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output,
which is also fixed in the patch.

gcc/Changelog:

PR target/110748


Added.

Thx,
-Vineet





 * config/riscv/predicates.md (const_0_operand): Add back
   const_double.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/pr110748-1.c: New Test.
 * gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
   patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
  gcc/config/riscv/predicates.md |  2 +-
  gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
  gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
  3 files changed, 15 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
 (match_test "INTVAL (op) + 1 != 0")))

  (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
 (match_test "op == CONST0_RTX (GET_MODE (op))")))

  (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
  /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
  /* { dg-final { scan-assembler "fmv.x.w" } } */
  /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
--
2.34.1





[PATCH v2] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta
Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")
(gcc-13 regression)

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *d) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret

This came to light when testing the in-flight f-m-o patch where an ICE
was getting triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output, which is
also fixed in the patch.

gcc/Changelog:

PR target/110748
* config/riscv/predicates.md (const_0_operand): Add back
  const_double.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
  patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
Changes since v1:
  - No code changes
  - Updated commitlog: typo, "Fixes:" tag, mention PR in Changelog entry
---
 gcc/config/riscv/predicates.md |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
 gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
(match_test "INTVAL (op) + 1 != 0")))
 
 (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))
 
 (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
-- 
2.34.1



Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Palmer Dabbelt

On Fri, 21 Jul 2023 10:55:52 PDT (-0700), Vineet Gupta wrote:

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *d) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret

This came to light when testing the in-flight f-m-o patch where an ICE
was gettinh triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.


Given that it can ICE, we should probably backport it to 13.


Ran thru full multilib testsuite, there was 1 false failure due to


Did you run the test with autovec?  There's also a 
pmode_reg_or_0_operand, some of those don't appear protected from FP 
values.  So we might need something like


diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd5b19457f8..d8ce9223343 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -63,7 +63,7 @@ (define_expand "movmisalign"

(define_expand "len_mask_gather_load"
  [(match_operand:VNX1_QHSD 0 "register_operand")
-   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:P 1 "pmode_reg_or_0_operand")
   (match_operand:VNX1_QHSDI 2 "register_operand")
   (match_operand 3 "")
   (match_operand 4 "")

a bunch of times, as there's a ton of them?  I'm not entirely sure if that
could manifest as an actual bug, though...


random string "lw" appearing in lto build assembler output,
which is also fixed in the patch.

gcc/Changelog:

* config/riscv/predicates.md (const_0_operand): Add back
  const_double.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
  patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/predicates.md |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
 gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
(match_test "INTVAL (op) + 1 != 0")))

 (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))

 (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */


IIUC the pattern to emit fmv suffers from the same bug -- it's fixed in the same
way, but I think we might be able to come up with a test for it: `fmv.d.x FREG,
x0` would be the fastest way to generate 0.0, so maybe something like

   double sum(double *d) {
 double sum = 0;
 for (int i = 0; i < 8; ++i)
   sum += d[i];
 return sum;
   }

would do it?  That's generating the fmv on 13 for me, though, so maybe I'm
missing something?


diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */


I think that autovec one is the only possible dependency that might have snuck
in, so we should be safe otherwise.  Thanks!

Reviewed-by: Palmer Dabbelt 


[PATCH 2/1] c++: passing partially inst ttp as ttp [PR110566]

2023-07-21 Thread Patrick Palka via Gcc-patches
(This is a follow-up of
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624951.html)

Bootstrapped and regtested on x86_64-pc-linux-gnu, how does this look?

-- >8 --

The previous fix doesn't work for partially instantiated ttps primarily
because most_general_template doesn't work for them.  This patch fixes
this by giving such ttps a DECL_TEMPLATE_INFO (extending the
r11-734-g2fb595f8348e16 fix) with which we can obtain the original ttp.

This patch additionally makes us more careful about using the correct
number of levels from the scope of a ttp argument during
coerce_template_template_parms.

PR c++/110566

gcc/cp/ChangeLog:

* pt.cc (reduce_template_parm_level): Set DECL_TEMPLATE_INFO
on the DECL_TEMPLATE_RESULT of a reduced template template
parameter.
(add_defaults_to_ttp): Also update DECL_TEMPLATE_INFO of the
ttp's DECL_TEMPLATE_RESULT.
(coerce_template_template_parms): Make sure 'scope_args' has
the right amount of levels for the ttp argument.
(most_general_template): Handle template template parameters.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp39.C: New test.
---
 gcc/cp/pt.cc  | 46 ---
 gcc/testsuite/g++.dg/template/ttp39.C | 16 ++
 2 files changed, 57 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/ttp39.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index e0ed4bc8bbb..be7119dd9a0 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -4570,8 +4570,14 @@ reduce_template_parm_level (tree index, tree type, int 
levels, tree args,
  TYPE_DECL, DECL_NAME (decl), type);
  DECL_TEMPLATE_RESULT (decl) = inner;
  DECL_ARTIFICIAL (inner) = true;
- DECL_TEMPLATE_PARMS (decl) = tsubst_template_parms
-   (DECL_TEMPLATE_PARMS (orig_decl), args, complain);
+ tree parms = tsubst_template_parms (DECL_TEMPLATE_PARMS (orig_decl),
+ args, complain);
+ DECL_TEMPLATE_PARMS (decl) = parms;
+ retrofit_lang_decl (inner);
+ tree orig_inner = DECL_TEMPLATE_RESULT (orig_decl);
+ DECL_TEMPLATE_INFO (inner)
+   = build_template_info (DECL_TI_TEMPLATE (orig_inner),
+  template_parms_to_args (parms));
}
 
   /* Attach the TPI to the decl.  */
@@ -7936,6 +7942,19 @@ add_defaults_to_ttp (tree otmpl)
}
 }
 
+  tree oresult = DECL_TEMPLATE_RESULT (otmpl);
+  tree gen_otmpl = DECL_TI_TEMPLATE (oresult);
+  tree gen_ntmpl;
+  if (gen_otmpl == otmpl)
+gen_ntmpl = ntmpl;
+  else
+gen_ntmpl = add_defaults_to_ttp (gen_otmpl);
+
+  tree nresult = copy_node (oresult);
+  DECL_TEMPLATE_INFO (nresult) = copy_node (DECL_TEMPLATE_INFO (oresult));
+  DECL_TI_TEMPLATE (nresult) = gen_ntmpl;
+  DECL_TEMPLATE_RESULT (ntmpl) = nresult;
+
   hash_map_safe_put (defaulted_ttp_cache, otmpl, ntmpl);
   return ntmpl;
 }
@@ -8121,15 +8140,29 @@ coerce_template_template_parms (tree parm_tmpl,
 OUTER_ARGS are not the right outer levels in this case, as they are
 the args we're building up for PARM, and for the coercion we want the
 args for ARG.  If DECL_CONTEXT isn't set for a template template
-parameter, we can assume that it's in the current scope.  In that case
-we might end up adding more levels than needed, but that shouldn't be
-a problem; any args we need to refer to are at the right level.  */
+parameter, we can assume that it's in the current scope.  */
   tree ctx = DECL_CONTEXT (arg_tmpl);
   if (!ctx && DECL_TEMPLATE_TEMPLATE_PARM_P (arg_tmpl))
ctx = current_scope ();
   tree scope_args = NULL_TREE;
   if (tree tinfo = get_template_info (ctx))
scope_args = TI_ARGS (tinfo);
+  if (DECL_TEMPLATE_TEMPLATE_PARM_P (arg_tmpl))
+   {
+ int level = TEMPLATE_TYPE_LEVEL (TREE_TYPE (gen_arg_tmpl));
+ int scope_depth = TMPL_ARGS_DEPTH (scope_args);
+ if (scope_depth >= level)
+   /* Only use as many levels from the scope as needed (not
+  including the level of ARG).  */
+   scope_args = strip_innermost_template_args
+ (scope_args, scope_depth - (level - 1));
+
+ /* Add the arguments that appear at the level of ARG.  */
+ tree adj_args = DECL_TI_ARGS (DECL_TEMPLATE_RESULT (arg_tmpl));
+ adj_args = TMPL_ARGS_LEVEL (adj_args, TMPL_ARGS_DEPTH (adj_args) - 1);
+ scope_args = add_to_template_args (scope_args, adj_args);
+   }
+
   pargs = add_to_template_args (scope_args, pargs);
 
   pargs = coerce_template_parms (gen_arg_parms, pargs, NULL_TREE, tf_none);
@@ -25985,6 +26018,9 @@ most_general_template (tree decl)
return NULL_TREE;
 }
 
+  if (DECL_TEMPLATE_TEMPLATE_PARM_P (decl))
+return DECL_TI_TEMPLATE (DECL_TEMPLATE_RESULT (decl));
+
   /* Look for more a

Re: [COMMITTED] bpf: fixed template for neg (added second operand)

2023-07-21 Thread Cupertino Miranda via Gcc-patches


This patch fixes the define_insn for "neg" to support 2 operands.
The initial implementation assumed the format "neg %0", while the instruction
allows both destination and source operands.  The second operand can
be either a register or an immediate value.

gcc/ChangeLog:

* config/bpf/bpf.md: fixed template for neg instruction.
---
 gcc/config/bpf/bpf.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index 329f62f55c33..adf11e151df1 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -139,10 +139,10 @@

 ;;; Negation
 (define_insn "neg2"
-  [(set (match_operand:AM 0 "register_operand" "=r")
-(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
+  [(set (match_operand:AM 0 "register_operand" "=r,r")
+(neg:AM (match_operand:AM 1 "register_operand" " r,I")))]
   ""
-  "neg\t%0"
+  "neg\t%0,%1"
   [(set_attr "type" "")])

 ;;; Multiplication
--
2.38.1


Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Jeff Law via Gcc-patches




On 7/21/23 12:31, Palmer Dabbelt wrote:



(define_expand "len_mask_gather_load"
   [(match_operand:VNX1_QHSD 0 "register_operand")
-   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:P 1 "pmode_reg_or_0_operand")
    (match_operand:VNX1_QHSDI 2 "register_operand")
    (match_operand 3 "")
    (match_operand 4 "")

a bunch of times, as there's a ton of them?  I'm not entirely sure if that
could manifest as an actual bug, though...
But won't this cause (const_int 0) to no longer match because CONST_INT 
nodes are modeless (VOIDmode)?


Jeff


Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta




On 7/21/23 11:31, Palmer Dabbelt wrote:

On Fri, 21 Jul 2023 10:55:52 PDT (-0700), Vineet Gupta wrote:
DF +0.0 is bitwise all zeros so int x0 store to mem can be used to 
optimize it.


void zd(double *d) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret

This came to light when testing the in-flight f-m-o patch where an ICE
was gettinh triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.


Given that it can ICE, we should probably backport it to 13.


FWIW the ICE is in an in-flight for-gcc-14 patch, not something already in
tree.  And this will merge ahead of that.

I'm fine with a backport though.




Ran thru full multilib testsuite, there was 1 false failure due to


Did you run the test with autovec?


I have standard 32/64 multilibs, but no 'v' in the arch, so autovec, despite
being enabled at -O2 and above, will not kick in.

I think we should add a 'v' multilib.


There's also a pmode_reg_or_0_operand, some of those don't appear 
protected from FP values.  So we might need something like


diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd5b19457f8..d8ce9223343 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -63,7 +63,7 @@ (define_expand "movmisalign"

(define_expand "len_mask_gather_load"
  [(match_operand:VNX1_QHSD 0 "register_operand")
-   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:P 1 "pmode_reg_or_0_operand")
   (match_operand:VNX1_QHSDI 2 "register_operand")
   (match_operand 3 "")
   (match_operand 4 "")

a bunch of times, as there's a ton of them?  I'm not entirely sure if 
that

could manifest as an actual bug, though...


What does 'P' do here?


+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */


IIUC the pattern to emit fmv suffers from the same bug -- it's fixed 
in the same
way, but I think we might be able to come up with a test for it: 
`fmv.d.x FREG,

x0` would be the fastest way to generate 0.0, so maybe something like

   double sum(double *d) {
 double sum = 0;
 for (int i = 0; i < 8; ++i)
   sum += d[i];
 return sum;
   }

would do it?  That's generating the fmv on 13 for me, though, so maybe 
I'm

missing something?


I need to unpack this first :-)



diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c

index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */


I think that autovec one is the only possible dependency that might 
have snuck

in, so we should be safe otherwise.  Thanks!


I'm not sure if this specific comment is related to the xthead test or a
continuation of the above.
For xthead it is a real issue since I saw a random "lw" in the lto assembler
output.




[PATCH] gcc-13/changes.html: Add and fix URL to -fstrict-flex-array option.

2023-07-21 Thread Qing Zhao via Gcc-patches
Hi,

In the current GCC 13 release notes, the URL for the option -fstrict-flex-array
is wrong (it points to -Wstrict-flex-array).
This change corrects the URL and also adds the URL in another place
where -fstrict-flex-array is mentioned.

I have checked the resulting HTML file; it works well.

Okay for committing?

thanks.

Qing
---
 htdocs/gcc-13/changes.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 68e8c5cc..39b63a84 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -46,7 +46,7 @@ You may also want to check out our
   will no longer issue warnings for out of
   bounds accesses to trailing struct members of one-element array type
   anymore. Instead it diagnoses accesses to trailing arrays according to
-  -fstrict-flex-arrays. 
+  https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C-Dialect-Options.html#index-fstrict-flex-arrays";>-fstrict-flex-arrays.
 
 https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Static-Analyzer-Options.html";>-fanalyzer
   is still only suitable for analyzing C code.
   In particular, using it on C++ is unlikely to give meaningful 
output.
@@ -213,7 +213,7 @@ You may also want to check out our
  flexible array member for the purpose of accessing the elements of such
  an array. By default, all trailing arrays in aggregates are treated as
  flexible array members. Use the new command-line option
- https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Warning-Options.html#index-Wstrict-flex-arrays";>-fstrict-flex-arrays
+ https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C-Dialect-Options.html#index-fstrict-flex-arrays";>-fstrict-flex-arrays
  to control which array members are treated as flexible arrays.
  
 
-- 
2.31.1



Re: [PATCH v3] bpf: pseudo-c assembly dialect support

2023-07-21 Thread Cupertino Miranda via Gcc-patches



>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 3063e71c8906..b3be65d3efae 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -946,8 +946,8 @@ Objective-C and Objective-C++ Dialects}.
>>
>>  @emph{eBPF Options}
>>  @gccoptlist{-mbig-endian -mlittle-endian -mkernel=@var{version}
>> --mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re
>> --mjmpext -mjmp32 -malu32 -mcpu=@var{version}}
>> +-mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re -mjmpext
>> +-mjmp32 -malu32 -mcpu=@var{version} -masm=@var{dialect>}}
>
> There is a spurious > character there.
>
> Other than that, the patch is OK.
> Thanks!

Fixed the extra character and committed.
Thanks !


Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Vineet Gupta




On 7/21/23 11:31, Palmer Dabbelt wrote:


IIUC the pattern to emit fmv suffers from the same bug -- it's fixed 
in the same
way, but I think we might be able to come up with a test for it: 
`fmv.d.x FREG,

x0` would be the fastest way to generate 0.0, so maybe something like

   double sum(double *d) {
 double sum = 0;
 for (int i = 0; i < 8; ++i)
   sum += d[i];
 return sum;
   }

would do it?  That's generating the fmv on 13 for me, though, so maybe 
I'm
missing something?` 


I don't think we can avoid FMV in this case

    fmv.d.x fa0,zero      #1
    addi    a5,a0,64
.L2:
    fld     fa5,0(a0)
    addi    a0,a0,8
    fadd.d  fa0,fa0,fa5   #2
    bne     a0,a5,.L2
    ret

In #1, the zero needs to be set up in an FP reg (possibly using FMV), since
in #2 it will be used for FP math.


If we change your test slightly,

double zadd(double *d) {
 double sum = 0.0;
 for (int i = 0; i < 8; ++i)
   d[i] = sum;
 return sum;
}

We still get the optimal code for writing FP 0.0 to memory.  The last FMV is
unavoidable as we need an FP return reg.

    addi    a5,a0,64
.L2:
    sd      zero,0(a0)
    addi    a0,a0,8
    bne     a0,a5,.L2
    fmv.d.x fa0,zero
    ret


[committed] Require target lra in gcc.c-torture/compile/asmgoto-6.c

2023-07-21 Thread John David Anglin
The asmgoto feature requires LRA support.

Committed to trunk. Tested on hppa64-hp-hpux11.11.

Dave
---

Require target lra in gcc.c-torture/compile/asmgoto-6.c

2023-07-21  John David Anglin  

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/asmgoto-6.c: Require target lra.

diff --git a/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c 
b/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
index 0652bd4e4e1..6799b83c20a 100644
--- a/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
+++ b/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
@@ -1,5 +1,5 @@
 
-/* { dg-do compile } */
+/* { dg-do compile { target lra } } */
 /* PR middle-end/110420 */
 /* PR middle-end/103979 */
 /* PR middle-end/98619 */




Re: [pushed][LRA]: Check and update frame to stack pointer elimination after stack slot allocation

2023-07-21 Thread Vladimir Makarov via Gcc-patches



On 7/20/23 16:45, Rainer Orth wrote:

Hi Vladimir,


The following patch is necessary for porting avr to LRA.

The patch was successfully bootstrapped and tested on x86-64, aarch64, and
ppc64le.

There is still avr poring problem with reloading of subreg of frame
pointer.  I'll address it later on this week.

this patch most likely broke sparc-sun-solaris2.11 bootstrap:

/var/gcc/regression/master/11.4-gcc/build/./gcc/xgcc 
-B/var/gcc/regression/master/11.4-gcc/build/./gcc/ 
-B/vol/gcc/sparc-sun-solaris2.11/bin/ -B/vol/gcc/sparc-sun-solaris2.11/lib/ 
-isystem /vol/gcc/sparc-sun-solaris2.11/include -isystem 
/vol/gcc/sparc-sun-solaris2.11/sys-include   -fchecking=1 -c -g -O2   -W -Wall 
-gnatpg -nostdinc   g-alleve.adb -o g-alleve.o
+===GNAT BUG DETECTED==+
| 14.0.0 20230720 (experimental) [master 
506f068e7d01ad2fb107185b8fb204a0ec23785c] (sparc-sun-solaris2.11) GCC error:|
| in update_reg_eliminate, at lra-eliminations.cc:1179 |
| Error detected around g-alleve.adb:4132:8

This is in stage 3.  I haven't investigated further yet.


Thank you for reporting this.  I'll try to fix it this week.  I have a
patch, but unfortunately bootstrap is too slow.  If the patch does not
work, I'll revert the original patch.





Re: [PATCH] analyzer: Add support of placement new and improved operator new [PR105948]

2023-07-21 Thread David Malcolm via Gcc-patches
On Thu, 2023-07-06 at 16:43 +0200, priour...@gmail.com wrote:
> As per David's suggestion.
> - Improved leading comment of "is_placement_new_p"
> - "kf_operator_new::matches_call_types_p" now checks that arg 0 is of
>   integral type and that arg 1, if any, is of pointer type.
> - Changed ambiguous "int" to "int8_t" and "int64_t" in placement-new-
> size.C
>   to trigger a target independent out-of-bounds warning.
>   Other OOB tests were not based on the size of types, but on the
> number
>   elements, so them using "int" didn't lead to any ambiguity.
> 
> contrib/check_GNU_style.sh still complains about a space before
> square
> brackets in string "operator new []", but as before, this one space
> is
> mandatory for a correct recognition of the function.
> 
> Changes succesfully regstrapped on x86_64-linux-gnu against trunk
> 3c776fdf1a8.
> 
> Is it OK for trunk ?
> Thanks again,
> Benjamin.

Hi Benjamin, thanks for the updated patch.

As before, this looks close to being ready, but I have some further
comments:

[...snip...]

> diff --git a/gcc/analyzer/kf-lang-cp.cc b/gcc/analyzer/kf-lang-cp.cc
> index 393b4f25e79..ef057da863f 100644
> --- a/gcc/analyzer/kf-lang-cp.cc
> +++ b/gcc/analyzer/kf-lang-cp.cc
> @@ -35,6 +35,49 @@ along with GCC; see the file COPYING3.  If not see
>  
>  #if ENABLE_ANALYZER
>  
> +/* Return true if CALL is a non-allocating operator new or operator new []
> +  that contains no user-defined args, i.e. having any signature of:
> +
> +- void* operator new (std::size_t count, void* ptr);
> +- void* operator new[] (std::size_t count, void* ptr);
> +
> +  See https://en.cppreference.com/w/cpp/memory/new/operator_new.  */
> +
> +bool is_placement_new_p (const gcall *call)
> +{
> +  gcc_assert (call);
> +
> +  tree fndecl = gimple_call_fndecl (call);
> +  if (!fndecl)
> +return false;
> +
> +  if (!is_named_call_p (fndecl, "operator new", call, 2)
> +&& !is_named_call_p (fndecl, "operator new []", call, 2))
> +return false;
> +  tree arg1 = gimple_call_arg (call, 1);
> +
> +  if (!POINTER_TYPE_P (TREE_TYPE (arg1)))
> +return false;
> +
> +  /* We must distinguish between an allocating non-throwing new
> +and a non-allocating new.
> +
> +The former might have one of the following signatures :
> +void* operator new (std::size_t count, const std::nothrow_t& tag);
> +void* operator new[] (std::size_t count, const std::nothrow_t& tag);
> +
> +However, debugging has shown that TAG is actually a POINTER_TYPE,
> +not a REFERENCE_TYPE.
> +
> +Thus, we cannot easily differentiate the types, but we instead have to
> +check if the second argument's type identifies as nothrow_t.  */
> +  tree identifier = TYPE_IDENTIFIER (TREE_TYPE (TREE_TYPE (arg1)));
> +  if (!identifier)
> +return true;
> +  const char *name = IDENTIFIER_POINTER (identifier);
> +  return 0 != strcmp (name, "nothrow_t");
> +}
> +

If we're looking for a simple "void *", wouldn't it be simpler and
cleaner to check for arg1 being a pointer to a type that's VOID_TYPE_P,
rather than this name comparison?
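
For concreteness, a minimal sketch of the check being suggested (hypothetical
code, not from the posted patch; it reuses the 'arg1' already fetched above):

  /* Simpler test: placement new's second parameter is literally 'void *',
     so look at the pointed-to type instead of comparing type names.  */
  tree pointee = TREE_TYPE (TREE_TYPE (arg1));
  return VOID_TYPE_P (pointee);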

[...snip...]

> diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
> index a8c63eb1ce8..41c313c07dd 100644
> --- a/gcc/analyzer/sm-malloc.cc
> +++ b/gcc/analyzer/sm-malloc.cc
> @@ -754,7 +754,7 @@ public:
>  override
>{
>  if (change.m_old_state == m_sm.get_start_state ()
> - && unchecked_p (change.m_new_state))
> + && (unchecked_p (change.m_new_state) || nonnull_p (change.m_new_state)))
>// TODO: verify that it's the allocation stmt, not a copy
>return label_text::borrow ("allocated here");
>  if (unchecked_p (change.m_old_state)
> @@ -1910,11 +1910,16 @@ malloc_state_machine::on_stmt (sm_context *sm_ctxt,
>   return true;
> }
>  
> - if (is_named_call_p (callee_fndecl, "operator new", call, 1))
> -   on_allocator_call (sm_ctxt, call, &m_scalar_delete);
> - else if (is_named_call_p (callee_fndecl, "operator new []", call, 1))
> -   on_allocator_call (sm_ctxt, call, &m_vector_delete);
> - else if (is_named_call_p (callee_fndecl, "operator delete", call, 1)
> + if (!is_placement_new_p (call))
> +   {
> +  bool returns_nonnull = !TREE_NOTHROW (callee_fndecl) && flag_exceptions;
> +  if (is_named_call_p (callee_fndecl, "operator new"))
> +on_allocator_call (sm_ctxt, call, &m_scalar_delete, returns_nonnull);
> +  else if (is_named_call_p (callee_fndecl, "operator new []"))
> +on_allocator_call (sm_ctxt, call, &m_vector_delete, returns_nonnull);
> +   }
> +
> + if (is_named_call_p (callee_fndecl, "operator delete", call, 1)
>|| is_named_call_p (callee_fndecl, "operator delete", call, 2))
> {
>   on_deallocator_call (sm_ctxt, node, call,

It looks like something's gone wrong with the indentation in the above:
previously we had tab characters, but now I'm seeing a pair of spaces,
which means this wouldn't line up properly.  This might 

Re: [PATCH v5 4/5] c++modules: report imported CMI files as dependencies

2023-07-21 Thread Nathan Sidwell via Gcc-patches

On 7/21/23 10:57, Ben Boeckel wrote:

On Thu, Jul 20, 2023 at 17:00:32 -0400, Nathan Sidwell wrote:

On 7/19/23 20:47, Ben Boeckel wrote:

But it is inhibiting distributed builds because the distributing tool
would need to know:

- what CMIs are actually imported (here, "read the module mapper file"
(in CMake's case, this is only the modules that are needed; a single
massive mapper file for an entire project would have extra entries) or
"act as a proxy for the socket/program specified" for other
approaches);


This information is in the machine (& human) README section of the CMI.


Ok. That leaves it up to distributing build tools to figure out at
least.


- read the CMIs as it sends to the remote side to gather any other CMIs
that may be needed (recursively);

Contrast this with the MSVC and Clang (17+) mechanism where the command
line contains everything that is needed and a single bolus can be sent.


um, the build system needs to create that command line? Where does the build
system get that information?  IIUC it'll need to read some file(s) to do that.


It's chained through the P1689 information in the collator as needed. No
extra files need to be read (at least with CMake's approach); certainly
not CMI files.


It occurs to me that the model I am envisioning is similar to CMake's object
libraries.  Object libraries are a convenient name for a bunch of object files.
IIUC they're linked by naming the individual object files (or I think they could
be implemented as a static lib linked with --whole-archive path/to/libfoo.a
-no-whole-archive).  But for this conversation consider them a bunch of separate
object files with a convenient group name.


Consider also that object libraries could themselves contain object libraries (I
don't know if they can, but it seems like a useful concept).  Then one could
create an object library from a collection of object files and object libraries
(recursively).  CMake would handle the transitive graph.


Now, allow an object library to itself have some kind of tangible, on-disk 
representation.  *BUT* not like a static library -- it doesn't include the 
object files.



Now that immediately maps onto modules.

CMI: Object library
Direct imports: Direct object libraries of an object library

This is why I don't understand the need to explicitly indicate the indirect imports 
of a CMI.  CMake knows them, because it knows the graph.
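
Editorial illustration (hypothetical file names) of the direct/indirect
distinction that the graph already encodes:

  // a.cppm
  export module A;

  // b.cppm
  export module B;
  import A;    // A is a direct import of B

  // main.cpp
  import B;    // only B is direct here; A is an indirect dependency, yet its
               // CMI must still be locatable when this TU is compiled

A build system that owns the graph already knows to make A's CMI available
when main.cpp is compiled, without the CMI itself having to spell that out.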





And relocatable is probably fine. How does it interact with reproducible
builds? Or are GCC CMIs not really something anyone should consider for
installation (even as a "here, maybe this can help consumers"
mechanism)?


Module CMIs should be considered a cacheable artifact.  They are neither object
files nor source files.


Sure, cachable sounds fine. What about the installation?

--Ben


--
Nathan Sidwell



Re: [PATCH 2/1] c++: passing partially inst ttp as ttp [PR110566]

2023-07-21 Thread Jason Merrill via Gcc-patches

On 7/21/23 14:34, Patrick Palka wrote:

(This is a follow-up of
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624951.html)

Bootstrapped and regtested on x86_64-pc-linux-gnu, how does this look?

-- >8 --

The previous fix doesn't work for partially instantiated ttps primarily
because most_general_template doesn't work for them.  This patch fixes
this by giving such ttps a DECL_TEMPLATE_INFO (extending the
r11-734-g2fb595f8348e16 fix) with which we can obtain the original ttp.

This patch additionally makes us be more careful about using the correct
amount of levels from the scope of a ttp argument during
coerce_template_template_parms.

PR c++/110566

gcc/cp/ChangeLog:

* pt.cc (reduce_template_parm_level): Set DECL_TEMPLATE_INFO
on the DECL_TEMPLATE_RESULT of a reduced template template
parameter.
(add_defaults_to_ttp): Also update DECL_TEMPLATE_INFO of the
ttp's DECL_TEMPLATE_RESULT.
(coerce_template_template_parms): Make sure 'scope_args' has
the right amount of levels for the ttp argument.
(most_general_template): Handle template template parameters.

gcc/testsuite/ChangeLog:

* g++.dg/template/ttp39.C: New test.
---
  gcc/cp/pt.cc  | 46 ---
  gcc/testsuite/g++.dg/template/ttp39.C | 16 ++
  2 files changed, 57 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/ttp39.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index e0ed4bc8bbb..be7119dd9a0 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -4570,8 +4570,14 @@ reduce_template_parm_level (tree index, tree type, int 
levels, tree args,
  TYPE_DECL, DECL_NAME (decl), type);
  DECL_TEMPLATE_RESULT (decl) = inner;
  DECL_ARTIFICIAL (inner) = true;
- DECL_TEMPLATE_PARMS (decl) = tsubst_template_parms
-   (DECL_TEMPLATE_PARMS (orig_decl), args, complain);
+ tree parms = tsubst_template_parms (DECL_TEMPLATE_PARMS (orig_decl),
+ args, complain);
+ DECL_TEMPLATE_PARMS (decl) = parms;
+ retrofit_lang_decl (inner);
+ tree orig_inner = DECL_TEMPLATE_RESULT (orig_decl);
+ DECL_TEMPLATE_INFO (inner)
+   = build_template_info (DECL_TI_TEMPLATE (orig_inner),
+  template_parms_to_args (parms));


Should we assert that orig_inner doesn't have its own 
DECL_TEMPLATE_INFO?  I'm wondering if it's possible to reduce the level 
of a TTP more than once.



}
  
/* Attach the TPI to the decl.  */

@@ -7936,6 +7942,19 @@ add_defaults_to_ttp (tree otmpl)
}
  }
  
+  tree oresult = DECL_TEMPLATE_RESULT (otmpl);

+  tree gen_otmpl = DECL_TI_TEMPLATE (oresult);


Hmm, here we're assuming that all TTPs have DECL_TEMPLATE_INFO?


+  tree gen_ntmpl;
+  if (gen_otmpl == otmpl)
+gen_ntmpl = ntmpl;
+  else
+gen_ntmpl = add_defaults_to_ttp (gen_otmpl);
+
+  tree nresult = copy_node (oresult);
+  DECL_TEMPLATE_INFO (nresult) = copy_node (DECL_TEMPLATE_INFO (oresult));
+  DECL_TI_TEMPLATE (nresult) = gen_ntmpl;
+  DECL_TEMPLATE_RESULT (ntmpl) = nresult;
+
hash_map_safe_put (defaulted_ttp_cache, otmpl, ntmpl);
return ntmpl;
  }
@@ -8121,15 +8140,29 @@ coerce_template_template_parms (tree parm_tmpl,
 OUTER_ARGS are not the right outer levels in this case, as they are
 the args we're building up for PARM, and for the coercion we want the
 args for ARG.  If DECL_CONTEXT isn't set for a template template
-parameter, we can assume that it's in the current scope.  In that case
-we might end up adding more levels than needed, but that shouldn't be
-a problem; any args we need to refer to are at the right level.  */
+parameter, we can assume that it's in the current scope.  */
tree ctx = DECL_CONTEXT (arg_tmpl);
if (!ctx && DECL_TEMPLATE_TEMPLATE_PARM_P (arg_tmpl))
ctx = current_scope ();
tree scope_args = NULL_TREE;
if (tree tinfo = get_template_info (ctx))
scope_args = TI_ARGS (tinfo);
+  if (DECL_TEMPLATE_TEMPLATE_PARM_P (arg_tmpl))
+   {
+ int level = TEMPLATE_TYPE_LEVEL (TREE_TYPE (gen_arg_tmpl));
+ int scope_depth = TMPL_ARGS_DEPTH (scope_args);
+ if (scope_depth >= level)
+   /* Only use as many levels from the scope as needed (not
+  including the level of ARG).  */
+   scope_args = strip_innermost_template_args
+ (scope_args, scope_depth - (level - 1));
+
+ /* Add the arguments that appear at the level of ARG.  */
+ tree adj_args = DECL_TI_ARGS (DECL_TEMPLATE_RESULT (arg_tmpl));
+ adj_args = TMPL_ARGS_LEVEL (adj_args, TMPL_ARGS_DEPTH (adj_args) - 1);
+ scope_args = add_to_template_args (scope_args, adj_args);


Maybe we should add an integer parameter to add_to_template_args so we 
can specify add

Re: [C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-07-21 Thread Qing Zhao via Gcc-patches


> On Jul 21, 2023, at 7:21 AM, Martin Uecker via Gcc-patches 
>  wrote:
> 
> 
> 
> This patch adds a warning for allocations with insufficient size
> based on the "alloc_size" attribute and the type of the pointer 
> the result is assigned to. While it is theoretically legal to
> assign to the wrong pointer type and cast it to the right type
> later, this almost always indicates an error. Since this catches
> common mistakes and is simple to diagnose, it is suggested to
> add this warning.
> 
> 
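For illustration, the mistake the warning targets (hypothetical code, not part
of the patch; it mirrors the fo1 case in the test further down):

  #include <stdlib.h>

  struct b { int x[10]; };

  void f (void)
  {
    struct b *ok  = malloc (sizeof *ok);   /* 40 bytes - no warning */
    struct b *bad = malloc (sizeof bad);   /* pointer size only - warning expected */
  }
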
> Bootstrapped and regression tested on x86. 
> 
> 
> Martin
> 
> 
> 
> Add option Walloc-type that warns about allocations that have
> insufficient storage for the target type of the pointer the
> storage is assigned to.
> 
> gcc:
>   * doc/invoke.texi: Document -Wstrict-flex-arrays option.

The above should be “Document -Walloc-type option”. -:).

Qing
> 
> gcc/c-family:
> 
>   * c.opt (Walloc-type): New option.
> 
> gcc/c:
>   * c-typeck.cc (convert_for_assignment): Add Walloc-type warning.
> 
> gcc/testsuite:
> 
>   * gcc.dg/Walloc-type-1.c: New test.
> 
> 
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 4abdc8d0e77..8b9d148582b 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -319,6 +319,10 @@ Walloca
> C ObjC C++ ObjC++ Var(warn_alloca) Warning
> Warn on any use of alloca.
> 
> +Walloc-type
> +C ObjC Var(warn_alloc_type) Warning
> +Warn when allocating insufficient storage for the target type of the
> assigned pointer.
> +
> Walloc-size-larger-than=
> C ObjC C++ LTO ObjC++ Var(warn_alloc_size_limit) Joined Host_Wide_Int
> ByteSize Warning Init(HOST_WIDE_INT_MAX)
> -Walloc-size-larger-than=  Warn for calls to allocation
> functions that
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 7cf411155c6..2e392f9c952 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -7343,6 +7343,32 @@ convert_for_assignment (location_t location,
> location_t expr_loc, tree type,
>   "request for implicit conversion "
>   "from %qT to %qT not permitted in C++", rhstype,
> type);
> 
> +  /* Warn of new allocations are not big enough for the target
> type.  */
> +  tree fndecl;
> +  if (warn_alloc_type
> +   && TREE_CODE (rhs) == CALL_EXPR
> +   && (fndecl = get_callee_fndecl (rhs)) != NULL_TREE
> +   && DECL_IS_MALLOC (fndecl))
> + {
> +   tree fntype = TREE_TYPE (fndecl);
> +   tree fntypeattrs = TYPE_ATTRIBUTES (fntype);
> +   tree alloc_size = lookup_attribute ("alloc_size",
> fntypeattrs);
> +   if (alloc_size)
> + {
> +   tree args = TREE_VALUE (alloc_size);
> +   int idx = TREE_INT_CST_LOW (TREE_VALUE (args)) - 1;
> +   /* For calloc only use the second argument.  */
> +   if (TREE_CHAIN (args))
> + idx = TREE_INT_CST_LOW (TREE_VALUE (TREE_CHAIN
> (args))) - 1;
> +   tree arg = CALL_EXPR_ARG (rhs, idx);
> +   if (TREE_CODE (arg) == INTEGER_CST
> +   && tree_int_cst_lt (arg, TYPE_SIZE_UNIT (ttl)))
> +  warning_at (location, OPT_Walloc_type, "allocation of
> "
> +  "insufficient size %qE for type %qT with
> "
> +  "size %qE", arg, ttl, TYPE_SIZE_UNIT
> (ttl));
> + }
> + }
> +
>   /* See if the pointers point to incompatible address spaces.  */
>   asl = TYPE_ADDR_SPACE (ttl);
>   asr = TYPE_ADDR_SPACE (ttr);
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 88e3c625030..6869bed64c3 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -8076,6 +8076,15 @@ always leads to a call to another @code{cold}
> function such as wrappers of
> C++ @code{throw} or fatal error reporting functions leading to
> @code{abort}.
> @end table
> 
> +@opindex Wno-alloc-type
> +@opindex Walloc-type
> +@item -Walloc-type
> +Warn about calls to allocation functions decorated with attribute
> +@code{alloc_size} that specify insufficient size for the target type
> of
> +the pointer the result is assigned to, including those to the built-in
> +forms of the functions @code{aligned_alloc}, @code{alloca},
> @code{calloc},
> +@code{malloc}, and @code{realloc}.
> +
> @opindex Wno-alloc-zero
> @opindex Walloc-zero
> @item -Walloc-zero
> diff --git a/gcc/testsuite/gcc.dg/Walloc-type-1.c
> b/gcc/testsuite/gcc.dg/Walloc-type-1.c
> new file mode 100644
> index 000..bc62e5e9aa3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/Walloc-type-1.c
> @@ -0,0 +1,37 @@
> +/* Tests the warnings for insufficient allocation size. 
> +   { dg-do compile }
> + * { dg-options "-Walloc-type" } 
> + * */
> +#include 
> +#include 
> +
> +struct b { int x[10]; };
> +
> +void fo0(void)
> +{
> +struct b *p = malloc(sizeof *p);
> +}
> +
> +void fo1(void)
> +{
> +struct b *p = malloc(sizeof p);  /* { dg-
> warning "allocation of insufficient size" } */
> +}
> +
> +void fo2(void)
> +{
> +struct b *p = alloca(size

Re: [PATCH V3] RISC-V: Add TARGET_MIN_VLEN > 4096 check

2023-07-21 Thread Andreas Schwab
../../gcc/config/riscv/riscv.cc: In function 'void riscv_option_override()':
../../gcc/config/riscv/riscv.cc:6716:7: error: misspelled term 'can not' in 
format; use 'cannot' instead [-Werror=format-diag]
 6716 |   "Current RISC-V GCC can not support VLEN > 4096bit for 'V' 
Extension");
  |   
^   
../../gcc/config/riscv/riscv.cc:6716:7: error: unbalanced punctuation character 
'>' in format [-Werror=format-diag]

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH V3] RISC-V: Add TARGET_MIN_VLEN > 4096 check

2023-07-21 Thread Jeff Law via Gcc-patches




On 7/21/23 15:16, Andreas Schwab wrote:

../../gcc/config/riscv/riscv.cc: In function 'void riscv_option_override()':
../../gcc/config/riscv/riscv.cc:6716:7: error: misspelled term 'can not' in 
format; use 'cannot' instead [-Werror=format-diag]
  6716 |   "Current RISC-V GCC can not support VLEN > 4096bit for 'V' 
Extension");
   |   
^
../../gcc/config/riscv/riscv.cc:6716:7: error: unbalanced punctuation character 
'>' in format [-Werror=format-diag]
Thanks.  There's another similar warning with strong accents.  I'll deal 
with both.


jeff
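
For reference, one hypothetical rewording that would address both complaints
(shown with error (); the routine and final wording of the committed fix may
differ):

  error ("Current RISC-V GCC cannot support VLEN greater than 4096 bits "
         "for the %<V%> extension");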


[PATCH] libstdc++ Add cstdarg to freestanding

2023-07-21 Thread Paul M. Bendixen via Gcc-patches
P1642 adds the header cstdarg to the freestanding implementation.
This was probably left out by accident; this patch puts it in.
Since this is one of the headers that go in whole cloth, there should be no
further actions needed.
This might be related to PR106953, but since that one touches the partial
headers I'm not sure.

/Paul M. Bendixen

-- 
• − − •/• −/• • −/• − • •/− • • •/•/− •/− • •/• •/− • • −/•/− •/• − − •−
•/− − •/− −/• −/• •/• − • •/• − • − • −/− • − •/− − −/− −//
From 5584c194927678067e412aeb19f10b9662e398a6 Mon Sep 17 00:00:00 2001
From: "Paul M. Bendixen" 
Date: Fri, 21 Jul 2023 22:04:23 +0200
Subject: [PATCH] libstdc++: Include cstdarg in freestanding

P1642 includes cstdarg in the full headers to include. Include it.

Signed-off-by: Paul M. Bendixen 
---
 libstdc++-v3/include/Makefile.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 0ff875b280b..f09f97e2f6b 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -1194,6 +1194,7 @@ c_base_builddir = .
 c_base_freestanding = \
 	${c_base_srcdir}/cfloat \
 	${c_base_srcdir}/climits \
+	${c_base_srcdir}/cstdarg \
 	${c_base_srcdir}/cstddef \
 	${c_base_srcdir}/cstdint \
 	${c_base_srcdir}/cstdlib
@@ -1213,7 +1214,6 @@ c_base_freestanding = \
 @GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/csetjmp \
 @GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/csignal \
 @GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/cstdalign \
-@GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/cstdarg \
 @GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/cstdbool \
 @GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/cstdio \
 @GLIBCXX_HOSTED_TRUE@	${c_base_srcdir}/cstring \
-- 
2.34.1



[PATCH] testsuite: Adjust g++.dg/gomp/pr58567.C to new compiler message

2023-07-21 Thread Thiago Jung Bauermann via Gcc-patches
Commit 92d1425ca780 "c++: redundant targ coercion for var/alias tmpls"
changed the compiler error message in this testcase from

: In instantiation of 'void foo() [with T = int]':
:14:11:   required from here
:8:22: error: 'int' is not a class, struct, or union type
:8:22: error: 'int' is not a class, struct, or union type
:8:22: error: 'int' is not a class, struct, or union type
:8:3: error: expected iteration declaration or initialization
compiler exited with status 1

to:

: In instantiation of 'void foo() [with T = int]':
:14:11:   required from here
:8:22: error: 'int' is not a class, struct, or union type
:8:3: error: invalid type for iteration variable 'i'
compiler exited with status 1
Excess errors:
:8:3: error: invalid type for iteration variable 'i'

Andrew Pinski analysed the issue in PR 110756 and considered that it was a
testsuite issue in that the error message changed slightly.  Also, it's a
better error message.

Therefore, we only need to adjust the testcase to expect the new message.

gcc/testsuite/ChangeLog:
PR testsuite/110756
g++.dg/gomp/pr58567.C: Adjust to new compiler error message.
---
 gcc/testsuite/g++.dg/gomp/pr58567.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/gomp/pr58567.C 
b/gcc/testsuite/g++.dg/gomp/pr58567.C
index 35a5bb027ffe..866d831c65e4 100644
--- a/gcc/testsuite/g++.dg/gomp/pr58567.C
+++ b/gcc/testsuite/g++.dg/gomp/pr58567.C
@@ -5,7 +5,7 @@
 template void foo()
 {
   #pragma omp parallel for
-  for (typename T::X i = 0; i < 100; ++i)  /* { dg-error "'int' is not a 
class, struct, or union type|expected iteration declaration or initialization" 
} */
+  for (typename T::X i = 0; i < 100; ++i)  /* { dg-error "'int' is not a 
class, struct, or union type|invalid type for iteration variable 'i'" } */
 ;
 }
 


Re: [PATCH v4 2/3] c++: Improve constexpr error for dangling local variables [PR110619]

2023-07-21 Thread Jason Merrill via Gcc-patches

On 7/21/23 01:39, Nathaniel Shead wrote:

On Thu, Jul 20, 2023 at 11:46:47AM -0400, Jason Merrill wrote:

On 7/20/23 05:36, Nathaniel Shead wrote:

Currently, when typeck discovers that a return statement will refer to a
local variable it rewrites to return a null pointer. This causes the
error messages for using the return value in a constant expression to be
unhelpful, especially for reference return values.

This patch removes this "optimisation".


This isn't an optimization, it's for safety, removing a way for an attacker
to get a handle on other data on the stack (CWE-562).

But I agree that we need to preserve some element of UB for constexpr
evaluation to see.

Perhaps we want to move this transformation to cp_maybe_instrument_return,
so it happens after maybe_save_constexpr_fundef?
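
For readers following along, a hypothetical example of the pattern under
discussion (not taken from the patch's testsuite):

  // typeck currently rewrites the returned address of a local to a null
  // pointer, so constant evaluation reports a null dereference instead of
  // pointing at the dangling local.
  constexpr const int &f ()
  {
    int local = 42;
    return local;            // dangling; warned by -Wreturn-local-addr
  }
  constexpr int use = f ();  // constexpr evaluation should see the real UB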


Hm, OK. I can try giving this a go. I guess I should move the entire
maybe_warn_about_returning_address_of_local function to cp-gimplify.cc
to be able to detect this? Or is there a better way of marking that a
return expression will return a reference to a local for this
transformation? (I guess I can't use whether the warning has been
suppressed or not because the warning might not be enabled at all.)


You could use a TREE_LANG_FLAG, looks like none of them are used on 
RETURN_EXPR.
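
A rough sketch of that idea (hypothetical fragment, names invented):

  /* In cp/typeck.cc, where maybe_warn_about_returning_address_of_local
     fires, mark the statement instead of rewriting it on the spot.  */
  TREE_LANG_FLAG_0 (retval_stmt) = 1;  /* returns the address of a local */

  /* Then in cp_maybe_instrument_return (cp-gimplify.cc), which runs after
     maybe_save_constexpr_fundef, test the flag and only then substitute the
     null pointer, so the saved constexpr body keeps the original UB.  */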



It looks like this warning is raised also by diag_return_locals in
gimple-ssa-isolate-paths, should the transformation also be made here?


Looks like it already is, in warn_return_addr_local:


  tree zero = build_zero_cst (TREE_TYPE (val));
  gimple_return_set_retval (return_stmt, zero);
  update_stmt (return_stmt);


...but, weirdly, only with -fisolate-erroneous-paths-*, even though it 
isn't isolating anything.  Perhaps there should be another flag for this.



I note that the otherwise very similar -Wdangling-pointer warning
doesn't do this transformation either, should that also be something I
look into fixing here?


With that same flag, perhaps.  I wonder if it would make sense to remove 
the isolate-paths handling of locals in favor of the dangling-pointer 
handling?  I don't know either file much at all.


Jason



Re: [WIP RFC] analyzer: Add optional trim of the analyzer diagnostics going too deep [PR110543]

2023-07-21 Thread David Malcolm via Gcc-patches
On Fri, 2023-07-21 at 17:35 +0200, Benjamin Priour wrote:
> Hi,
> 
> Upon David's request I've joined the in progress patch to the below
> email.
> I hope it makes more sense now.
> 
> Best,
> Benjamin.

Thanks for posting the work-in-progress patch; it makes the idea
clearer.

Some thoughts about this:

- I like the idea of defaulting to *not* showing events within system
headers, which the patch achieves
- I don't like the combination of never/system with maxdepth, in that
it seems complicated and I don't think a user is likely to experiment
with different depths.
- Hence I think it would work better as a simple boolean, perhaps
  "-fanalyzer-show-events-in-system-headers"
  or somesuch?  It seems like the sort of thing that we want to provide
a sensible default for, but have the option of turning off for
debugging the analyzer itself, but I don't expect an end-user to touch
that option.

FWIW the patch seems to have been mangled somewhat via email, so I
don't have a sense of what the actual output from the patched analyzer
looks like.  What should we output to the user with -fanalyzer and no
other options for the case in PR 110543?  Currently, for
https://godbolt.org/z/sb9dM9Gqa trunk emits 12 events, of which
probably only this last one is useful:

  (12) dereference of NULL 'a.std::__shared_ptr_access::operator->()'

What does the output look like with your patch?

Thanks
Dave





> 
> -- Forwarded message -
> From: Benjamin Priour 
> Date: Tue, Jul 18, 2023 at 3:30 PM
> Subject: [RFC] analyzer: Add optional trim of the analyzer
> diagnostics
> going too deep [PR110543]
> To: , David Malcolm 
> 
> 
> Hi,
> 
> I'd like to request comments on a patch I am writing for PR110543.
> The goal of this patch is to reduce the noise of the analyzer emitted
> diagnostics when dealing with
> system headers, or simply diagnostic paths that are too long. The new
> option only affects the display
> of the diagnostics, but doesn't hinder the actual analysis.
> 
> I've defaulted the new option to "system", thus preventing the
> diagnostic
> paths from showing system headers.
> "never" corresponds to the pre-patch behavior, whereas you can also
> specify
> an unsigned value 
> that prevents paths to go deeper than  frames.
> 
> fanalyzer-trim-diagnostics=
> > Common Joined RejectNegative ToLower
> > Var(flag_analyzer_trim_diagnostics)
> > Init("system")
> > -fanalyzer-trim-diagnostics=[never|system|] Trim
> > diagnostics
> > path that are too long before emission.
> > 
> 
> Does it sounds reasonable and user-friendly ?
> 
> Regstrapping was a success against trunk, although one of the newly
> added test cases fails for c++14.
> Note that the test case below was done with "never", thus behaves
> exactly
> as the pre-patch analyzer
> on x86_64-linux-gnu.
> 
> /* { dg-additional-options "-fdiagnostics-plain-output
> > -fdiagnostics-path-format=inline-events -fanalyzer-trim-
> > diagnostics=never"
> > } */
> > /* { dg-skip-if "" { c++98_only }  } */
> > 
> > #include 
> > struct A {int x; int y;};
> > 
> > int main () {
> >   std::shared_ptr a;
> >   a->x = 4; /* { dg-line deref_a } */
> >   /* { dg-warning "dereference of NULL" "" { target *-*-* } deref_a
> > } */
> > 
> >   return 0;
> > }
> > 
> > /* { dg-begin-multiline-output "" }
> >   'int main()': events 1-2
> >     |
> >     |
> >     +--> 'std::__shared_ptr_access<_Tp, _Lp, ,
> > 
> > > ::element_type* std::__shared_ptr_access<_Tp, _Lp, ,
> >  >::operator->() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy
> > _Lp = __gnu_cxx::_S_atomic; bool  = false; bool
> >  =
> > false]': events 3-4
> >    |
> >    |
> >    +--> 'std::__shared_ptr_access<_Tp, _Lp, ,
> >  >::element_type* std::__shared_ptr_access<_Tp, _Lp,
> > ,  >::_M_get() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool
> >  =
> > false; bool  = false]': events 5-6
> >   |
> >   |
> >   +--> 'std::__shared_ptr<_Tp, _Lp>::element_type*
> > std::__shared_ptr<_Tp, _Lp>::get() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]': events 7-8
> >  |
> >  |
> >   <--+
> >   |
> >     'std::__shared_ptr_access<_Tp, _Lp, ,
> >  >::element_type* std::__shared_ptr_access<_Tp, _Lp,
> > ,  >::_M_get() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool
> >  =
> > false; bool  = false]': event 9
> >   |
> >   |
> >    <--+
> >    |
> >  'std::__shared_ptr_access<_Tp, _Lp, ,
> > 
> > > ::element_type* std::__shared_ptr_access<_Tp, _Lp, ,
> >  >::operator->() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy
> > _Lp = __gnu_cxx::_S_atomic; bool  = false; bool
> >  =
> > false]': event 10
> >    |
> >    |
> >     <--+
> >     |
> >   'int main()': events 11-12
> >     |
> >     |

[PATCH] c++: fix ICE with constexpr ARRAY_REF [PR110382]

2023-07-21 Thread Marek Polacek via Gcc-patches
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/13?

-- >8 --

This code in cxx_eval_array_reference has been hard to get right.
In r12-2304 I added some code; in r13-5693 I removed some of it.

Here the problematic line is "S s = arr[0];" which causes a crash
on the assert in verify_ctor_sanity:

  gcc_assert (!ctx->object || !DECL_P (ctx->object)
  || ctx->global->get_value (ctx->object) == ctx->ctor);

ctx->object is the VAR_DECL 's', which is correct here.  The second
line points to the problem: we replaced ctx->ctor in
cxx_eval_array_reference:

  new_ctx.ctor = build_constructor (elem_type, NULL); // #1

which I think we shouldn't have; the CONSTRUCTOR we created in
cxx_eval_constant_expression/DECL_EXPR

  new_ctx.ctor = build_constructor (TREE_TYPE (r), NULL);

had the right type.

We still need #1 though.  E.g., in constexpr-96241.C, we never
set ctx.ctor/object before calling cxx_eval_array_reference, so
we have to build a CONSTRUCTOR there.  And in constexpr-101371-2.C
we have a ctx.ctor, but it has the wrong type, so we need a new one.

PR c++/110382

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_array_reference): Create a new constructor
only when we don't already have a matching one.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-110382.C: New test.
---
 gcc/cp/constexpr.cc   |  5 -
 gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C | 17 +
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index fb94f3cefcb..518b7c7a2d5 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -4291,7 +4291,10 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree 
t,
   else
 val = build_value_init (elem_type, tf_warning_or_error);
 
-  if (!SCALAR_TYPE_P (elem_type))
+  if (!SCALAR_TYPE_P (elem_type)
+      /* Create a new constructor only if we don't already have one that
+	 is suitable.  */
+      && !(ctx->ctor && same_type_p (elem_type, TREE_TYPE (ctx->ctor))))
 {
   new_ctx = *ctx;
   new_ctx.ctor = build_constructor (elem_type, NULL);
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C
new file mode 100644
index 000..317c5ecfcd5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-110382.C
@@ -0,0 +1,17 @@
+// PR c++/110382
+// { dg-do compile { target c++14 } }
+
+struct S {
+  double a = 0;
+};
+
+constexpr double
+g ()
+{
+  S arr[1];
+  S s = arr[0];
+  (void) arr[0];
+  return s.a;
+}
+
+int main() { return  g (); }

base-commit: 87516efcbe28884c39a8c68e600d11cc91ed96c7
-- 
2.41.0



[PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens

2023-07-21 Thread Lewis Hyatt via Gcc-patches
Hello-

This is an update to the v2 patch series last sent in January:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609473.html

While I did not receive any feedback on the v2 patches yet, they did need some
rebasing on top of other recent commits to input.cc, so I thought it would be
helpful to send them again now. The patches have not otherwise changed from
v2, and the above-linked message explains how all the patches fit in with the
original v1 series sent last November.

Dave, I would appreciate it very much if you could please let me know what you
think of this approach? I feel like the diagnostics we currently
output for _Pragmas are worth improving. As a reminder, say for this example:

=
 #define S "GCC diagnostic ignored \"oops"
 _Pragma(S)
=

We currently output:

=
file.cpp:2:24: warning: missing terminating " character
2 | _Pragma(S)
  |^
=

While after these patches, we would output:

==
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:2:1: note: in <_Pragma directive>
2 | _Pragma(S)
  | ^~~
==

Thanks!

-Lewis


[PATCH v3 2/4] diagnostics: Handle generated data locations in edit_context

2023-07-21 Thread Lewis Hyatt via Gcc-patches
Class edit_context handles outputting fixit hints in diff form that could be
manually or automatically applied by the user. This will not make sense for
generated data locations, such as the contents of a _Pragma string, because
the text to be modified does not appear in the user's input files. We do not
currently ever generate fixit hints in such a context, but for future-proofing
purposes, ignore such locations in edit context now.

gcc/ChangeLog:

* edit-context.cc (edit_context::apply_fixit): Ignore locations in
generated data.
---
 gcc/edit-context.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/edit-context.cc b/gcc/edit-context.cc
index 6f5bc6b9d8f..ae11b6f2e00 100644
--- a/gcc/edit-context.cc
+++ b/gcc/edit-context.cc
@@ -301,8 +301,12 @@ edit_context::apply_fixit (const fixit_hint *hint)
 return false;
   if (start.column == 0)
 return false;
+  if (start.generated_data)
+return false;
   if (next_loc.column == 0)
 return false;
+  if (next_loc.generated_data)
+return false;
 
   edited_file &file = get_or_insert_file (start.file);
   if (!m_valid)

