Re: [PATCH v2 7/7] Add a unit test for random access in the file cache

2025-02-02 Thread Andi Kleen
> Patch 7 is OK otherwise, and I'm taking a look at the rest of the
> patches now; thanks.

Any comments on the other patches?

Thanks,
-Andi


[PATCH v2] ira: Add a target hook for callee-saved register cost scale

2025-02-02 Thread H.J. Lu
commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
Author: Surya Kumari Jangala 
Date:   Tue Jun 25 08:37:49 2024 -0500

ira: Scale save/restore costs of callee save registers with block frequency

scales the cost of saving/restoring a callee-save hard register in epilogue
and prologue with the entry block frequency, which, if not optimizing for
size, is 1, for all targets.  As the result, callee-saved registers
may not be used to preserve local variable values across calls on some
targets, like x86.  Add a target hook for the callee-saved register cost
scale in epilogue and prologue used by IRA.  The default version of this
target hook returns 1 if optimizing for size, otherwise returns the entry
block frequency.  Add an x86 version of this target hook to restore the
old behavior prior to the above commit.

PR rtl-optimization/111673
PR rtl-optimization/115932
PR rtl-optimization/116028
PR rtl-optimization/117081
PR rtl-optimization/117082
PR rtl-optimization/118497
* ira-color.cc (assign_hard_reg): Call the target hook for the
callee-saved register cost scale in epilogue and prologue.
* target.def (ira_callee_saved_register_cost_scale): New target
hook.
* targhooks.cc (default_ira_callee_saved_register_cost_scale):
New.
* targhooks.h (default_ira_callee_saved_register_cost_scale):
Likewise.
* config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
New.
(TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE):
New.

Signed-off-by: H.J. Lu 
---
 gcc/config/i386/i386.cc | 11 +++
 gcc/doc/tm.texi |  8 
 gcc/doc/tm.texi.in  |  2 ++
 gcc/ira-color.cc|  3 +--
 gcc/target.def  | 12 
 gcc/targhooks.cc|  8 
 gcc/targhooks.h |  1 +
 7 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f89201684a8..3128973ba79 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20600,6 +20600,14 @@ ix86_class_likely_spilled_p (reg_class_t rclass)
   return false;
 }
 
+/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
+
+static int
+ix86_ira_callee_saved_register_cost_scale (int)
+{
+  return 1;
+}
+
 /* Return true if a set of DST by the expression SRC should be allowed.
This prevents complex sets of likely_spilled hard regs before split1.  */
 
@@ -27078,6 +27086,9 @@ ix86_libgcc_floating_mode_supported_p
 #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS ix86_preferred_output_reload_class
 #undef TARGET_CLASS_LIKELY_SPILLED_P
 #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p
+#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
+#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \
+  ix86_ira_callee_saved_register_cost_scale
 
 #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 0de24eda6f0..9f42913a4ef 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3047,6 +3047,14 @@ A target hook which can change allocno class for given 
pseudo from
   The default version of this target hook always returns given class.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE (int 
@var{hard_regno})
+A target hook which returns the callee-saved register @var{hard_regno}
+cost scale in epilogue and prologue used by IRA.
+
+The default version of this target hook returns 1 if optimizing for
+size, otherwise returns the entry block frequency.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_LRA_P (void)
 A target hook which returns true if we use LRA instead of reload pass.
 
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 631d04131e3..6dbe22581ca 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2388,6 +2388,8 @@ in the reload pass.
 
 @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
 
+@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
+
 @hook TARGET_LRA_P
 
 @hook TARGET_REGISTER_PRIORITY
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 0699b349a1a..233060e1587 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2180,8 +2180,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
 + ira_memory_move_cost[mode][rclass][1])
* saved_nregs / hard_regno_nregs (hard_regno,
  mode) - 1)
-  * (optimize_size ? 1 :
- REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+  * targetm.ira_callee_saved_register_cost_scale 
(hard_regno);
cost += add_cost;
full_cost += add_cost;
  }
diff --git a/gcc/target.def b/gcc/target.def
index 4050b2

Patch held up in gcc-patches due to size

2025-02-02 Thread Thomas Koenig

Hi,

I sent https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html
to gcc-patches also, as normal, but got back an e-mail that it
was too large. and that a moderator would look at it.

Maybe the limits can be increased a bit, sometimes patches can
be quite large, especially if they contain large test cases
or a large number of generated files.

(Does anybody actually look at the messages, as promised in the e-mail?=

Best regards

Thomas



Re: [PATCH] ira: Cap callee-saved register cost scale to 300

2025-02-02 Thread H.J. Lu
On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
 wrote:
>
>
>
> > Am 02.02.2025 um 08:00 schrieb H.J. Lu :
> >
> > Don't increase callee-saved register cost by 1000x, which leads to that
> > callee-saved registers aren't used to preserve local variable values
> > across calls, by capping the scale to 300.
>
> >PR rtl-optimization/111673
> >PR rtl-optimization/115932
> >PR rtl-optimization/116028
> >PR rtl-optimization/117081
> >PR rtl-optimization/118497
> >* ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> >scale to 300.
> >
> > Signed-off-by: H.J. Lu 
> > ---
> > gcc/ira-color.cc | 16 ++--
> > 1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> > index 0699b349a1a..707ff188250 100644
> > --- a/gcc/ira-color.cc
> > +++ b/gcc/ira-color.cc
> > @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> >  /* We need to save/restore the hard register in
> > epilogue/prologue.  Therefore we increase the cost.  */
> >  {
> > +int scale;
> > +if (optimize_size)
> > +  scale = 1;
> > +else
> > +  {
> > +scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > +/* Don't increase callee-saved register cost by 1000x,
> > +   which leads to that callee-saved registers aren't
> > +   used to preserve local variable values across calls,
> > +   by capping the scale to 300.  */
> > +if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> > +  scale = 300;
>
> That leads to 300 for 1000 but 999 for 999 which is odd.  I’d have expected 
> to scale this down to [0, 300] or is MAX a magic value?

There are

* The weights for each insn varies from 0 to REG_FREQ_BASE.
   This constant does not need to be high, as in infrequently executed
   regions we want to count instructions equivalently to optimize for
   size instead of speed.  */
#define REG_FREQ_MAX 1000

/* Compute register frequency from the BB frequency.  When optimizing for size,
   or profile driven feedback is available and the function is never executed,
   frequency is always equivalent.  Otherwise rescale the basic block
   frequency.  */
#define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun)\
   || !cfun->cfg->count_max.initialized_p ()) \
  ? REG_FREQ_MAX  \
  : ((bb)->count.to_frequency (cfun)  \
* REG_FREQ_MAX / BB_FREQ_MAX) \
  ? ((bb)->count.to_frequency (cfun)  \
 * REG_FREQ_MAX / BB_FREQ_MAX)\
  : 1)

1000 is the default.  If it isn't 1000, it isn't the default.  I only want
to get a more reasonable default scale, instead of 1000.   Lower
scale will fail the PR rtl-optimization/111673 test on powerpc64.


> > +  }
> >rclass = REGNO_REG_CLASS (hard_regno);
> >add_cost = ((ira_memory_move_cost[mode][rclass][0]
> > + ira_memory_move_cost[mode][rclass][1])
> >* saved_nregs / hard_regno_nregs (hard_regno,
> >  mode) - 1)
> > -   * (optimize_size ? 1 :
> > -  REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> > +   * scale;
> >cost += add_cost;
> >full_cost += add_cost;
> >  }
> > --
> > 2.48.1
> >



-- 
H.J.


Re: [PATCH] ira: Cap callee-saved register cost scale to 300

2025-02-02 Thread Richard Biener



> Am 02.02.2025 um 08:59 schrieb H.J. Lu :
> 
> On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
>  wrote:
>> 
>> 
>> 
 Am 02.02.2025 um 08:00 schrieb H.J. Lu :
>>> 
>>> Don't increase callee-saved register cost by 1000x, which leads to that
>>> callee-saved registers aren't used to preserve local variable values
>>> across calls, by capping the scale to 300.
>> 
>>>   PR rtl-optimization/111673
>>>   PR rtl-optimization/115932
>>>   PR rtl-optimization/116028
>>>   PR rtl-optimization/117081
>>>   PR rtl-optimization/118497
>>>   * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
>>>   scale to 300.
>>> 
>>> Signed-off-by: H.J. Lu 
>>> ---
>>> gcc/ira-color.cc | 16 ++--
>>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
>>> index 0699b349a1a..707ff188250 100644
>>> --- a/gcc/ira-color.cc
>>> +++ b/gcc/ira-color.cc
>>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>>> /* We need to save/restore the hard register in
>>>epilogue/prologue.  Therefore we increase the cost.  */
>>> {
>>> +int scale;
>>> +if (optimize_size)
>>> +  scale = 1;
>>> +else
>>> +  {
>>> +scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>>> +/* Don't increase callee-saved register cost by 1000x,
>>> +   which leads to that callee-saved registers aren't
>>> +   used to preserve local variable values across calls,
>>> +   by capping the scale to 300.  */
>>> +if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
>>> +  scale = 300;
>> 
>> That leads to 300 for 1000 but 999 for 999 which is odd.  I’d have expected 
>> to scale this down to [0, 300] or is MAX a magic value?
> 
> There are
> 
> * The weights for each insn varies from 0 to REG_FREQ_BASE.
>   This constant does not need to be high, as in infrequently executed
>   regions we want to count instructions equivalently to optimize for
>   size instead of speed.  */
> #define REG_FREQ_MAX 1000
> 
> /* Compute register frequency from the BB frequency.  When optimizing for 
> size,
>   or profile driven feedback is available and the function is never executed,
>   frequency is always equivalent.  Otherwise rescale the basic block
>   frequency.  */
> #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun)
> \
>   || !cfun->cfg->count_max.initialized_p ()) \
>  ? REG_FREQ_MAX  \
>  : ((bb)->count.to_frequency (cfun)  \
>* REG_FREQ_MAX / BB_FREQ_MAX) \
>  ? ((bb)->count.to_frequency (cfun)  \
> * REG_FREQ_MAX / BB_FREQ_MAX)\
>  : 1)
> 
> 1000 is the default.  If it isn't 1000, it isn't the default.  I only want
> to get a more reasonable default scale, instead of 1000.   Lower
> scale will fail the PR rtl-optimization/111673 test on powerpc64.

I see.  Why not adjust the above macro then?  That would be a bit more obvious. 
 Like use MAX/2 or so?

> 
> 
>>> +  }
>>>   rclass = REGNO_REG_CLASS (hard_regno);
>>>   add_cost = ((ira_memory_move_cost[mode][rclass][0]
>>>+ ira_memory_move_cost[mode][rclass][1])
>>>   * saved_nregs / hard_regno_nregs (hard_regno,
>>> mode) - 1)
>>> -   * (optimize_size ? 1 :
>>> -  REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>>> +   * scale;
>>>   cost += add_cost;
>>>   full_cost += add_cost;
>>> }
>>> --
>>> 2.48.1
>>> 
> 
> 
> 
> --
> H.J.


Re: [PATCH] ira: Cap callee-saved register cost scale to 300

2025-02-02 Thread H.J. Lu
On Sun, Feb 2, 2025 at 4:20 PM Richard Biener
 wrote:
>
>
>
> > Am 02.02.2025 um 08:59 schrieb H.J. Lu :
> >
> > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> >  wrote:
> >>
> >>
> >>
>  Am 02.02.2025 um 08:00 schrieb H.J. Lu :
> >>>
> >>> Don't increase callee-saved register cost by 1000x, which leads to that
> >>> callee-saved registers aren't used to preserve local variable values
> >>> across calls, by capping the scale to 300.
> >>
> >>>   PR rtl-optimization/111673
> >>>   PR rtl-optimization/115932
> >>>   PR rtl-optimization/116028
> >>>   PR rtl-optimization/117081
> >>>   PR rtl-optimization/118497
> >>>   * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> >>>   scale to 300.
> >>>
> >>> Signed-off-by: H.J. Lu 
> >>> ---
> >>> gcc/ira-color.cc | 16 ++--
> >>> 1 file changed, 14 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> >>> index 0699b349a1a..707ff188250 100644
> >>> --- a/gcc/ira-color.cc
> >>> +++ b/gcc/ira-color.cc
> >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> >>> /* We need to save/restore the hard register in
> >>>epilogue/prologue.  Therefore we increase the cost.  */
> >>> {
> >>> +int scale;
> >>> +if (optimize_size)
> >>> +  scale = 1;
> >>> +else
> >>> +  {
> >>> +scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> >>> +/* Don't increase callee-saved register cost by 1000x,
> >>> +   which leads to that callee-saved registers aren't
> >>> +   used to preserve local variable values across calls,
> >>> +   by capping the scale to 300.  */
> >>> +if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> >>> +  scale = 300;
> >>
> >> That leads to 300 for 1000 but 999 for 999 which is odd.  I’d have 
> >> expected to scale this down to [0, 300] or is MAX a magic value?
> >
> > There are
> >
> > * The weights for each insn varies from 0 to REG_FREQ_BASE.
> >   This constant does not need to be high, as in infrequently executed
> >   regions we want to count instructions equivalently to optimize for
> >   size instead of speed.  */
> > #define REG_FREQ_MAX 1000
> >
> > /* Compute register frequency from the BB frequency.  When optimizing for 
> > size,
> >   or profile driven feedback is available and the function is never 
> > executed,
> >   frequency is always equivalent.  Otherwise rescale the basic block
> >   frequency.  */
> > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun)  
> >   \
> >   || !cfun->cfg->count_max.initialized_p ())
> >  \
> >  ? REG_FREQ_MAX 
> >  \
> >  : ((bb)->count.to_frequency (cfun) 
> >  \
> >* REG_FREQ_MAX / BB_FREQ_MAX)
> >  \
> >  ? ((bb)->count.to_frequency (cfun) 
> >  \
> > * REG_FREQ_MAX / BB_FREQ_MAX)   
> >  \
> >  : 1)
> >
> > 1000 is the default.  If it isn't 1000, it isn't the default.  I only want
> > to get a more reasonable default scale, instead of 1000.   Lower
> > scale will fail the PR rtl-optimization/111673 test on powerpc64.
>
> I see.  Why not adjust the above macro then?  That would be a bit more 
> obvious.  Like use MAX/2 or so?

commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
Author: Surya Kumari Jangala 
Date:   Tue Jun 25 08:37:49 2024 -0500

ira: Scale save/restore costs of callee save registers with block frequency

uses REG_FREQ_FROM_BB as the cost scale.  I don't know if it is a misuse.
I don't want to change REG_FREQ_FROM_BB since it is used in other places,
not as a cost scale.  Maybe the above commit should be reverted and we add
a target hook for callee-saved register cost scale.  Each target can choose
a proper cost scale, install of increasing the cost by 1000x for everyone.

> >
> >
> >>> +  }
> >>>   rclass = REGNO_REG_CLASS (hard_regno);
> >>>   add_cost = ((ira_memory_move_cost[mode][rclass][0]
> >>>+ ira_memory_move_cost[mode][rclass][1])
> >>>   * saved_nregs / hard_regno_nregs (hard_regno,
> >>> mode) - 1)
> >>> -   * (optimize_size ? 1 :
> >>> -  REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> >>> +   * scale;
> >>>   cost += add_cost;
> >>>   full_cost += add_cost;
> >>> }
> >>> --
> >>> 2.48.1
> >>>
> >
> >
> >
> > --
> > H.J.



-- 
H.J.


[PATCH] arm: testsuite: Adapt mve-vabs.c to improved codegen

2025-02-02 Thread Thiago Jung Bauermann
Since commit r15-491-gc290e6a0b7a9de this failure happens on on
armv8l-linux-gnueabihf and arm-eabi:

Running gcc:gcc.target/arm/simd/simd.exp ...
gcc.target/arm/simd/mve-vabs.c: memmove found 0 times
FAIL: gcc.target/arm/simd/mve-vabs.c scan-assembler-times memmove 3

In PR PR target/116010, Andrew Pinski noted that
"gcc.target/arm/simd/mve-vabs.c now calls memcpy because of the restrict
instead of memmove. That should be a simple fix there."

Therefore change the test to expect memcpy rather than memmove.

Another change is that memcpy is inlined rather than called, so also change
the test to check the optimized tree dump rather than the generated
assembly.

Tested on armv8l-linux-gnueabihf and arm-eabi.

gcc/testsuite/ChangeLog:
PR target/116010
* gcc.target/arm/simd/mve-vabs.c: Test tree dump and adjust to new
code.

Suggested-by: Andrew Pinski 
---
 gcc/testsuite/gcc.target/arm/simd/mve-vabs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c 
b/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c
index f2f9ee349906..e85d0b18ee71 100644
--- a/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c
@@ -1,7 +1,7 @@
 /* { dg-do assemble } */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
-/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations 
-fdump-tree-optimized" } */
 
 #include 
 #include 
@@ -35,10 +35,10 @@ FUNC_FLOAT(f, float, 32, 4, vabs)
 FUNC(f, float, 16, 8, vabs)
 
 /* Taking the absolute value of an unsigned value is a no-op, so half of the
-   integer optimizations actually generate a call to memmove, the other ones a
+   integer optimizations actually generate a call to memcpy, the other ones a
'vabs'.  */
 /* { dg-final { scan-assembler-times {vabs.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
 /* { dg-final { scan-assembler-times {vabs.f[0-9]+\tq[0-9]+, q[0-9]+} 2 } } */
 /* { dg-final { scan-assembler-times {vldr[bhw].[0-9]+\tq[0-9]+} 5 } } */
 /* { dg-final { scan-assembler-times {vstr[bhw].[0-9]+\tq[0-9]+} 5 } } */
-/* { dg-final { scan-assembler-times {memmove} 3 } } */
+/* { dg-final { scan-tree-dump-times "memcpy" 3 "optimized" } } */


Re: [PATCH] gcc: Add tree walk case to reach A pack from B in ...B>. [PR118265]

2025-02-02 Thread A J Ryan Solutions Ltd
On Monday, 27 January 2025 at 19:48, Jason Merrill - jason at redhat.com 
 wrote:
>
> On 1/27/25 1:26 PM, Patrick Palka wrote:
>
> > On Wed, 1 Jan 2025, A J Ryan Solutions Ltd wrote:
> >
> > > Hi and happy new year, this patch is to fix a compiler seg-fault as 
> > > encountered in the following example:
> >
> > Hi, thanks for the patch! Your fix makes sense to me, and I believe it
> > also fixes the testcases from PR102626 and at least some of its related
> > PRs.
>
>
> Yes, thanks!
>
> In addition to Patrick's comments, all of which I agree with, the
> subject line should start with "c++" rather than "gcc". We also leave
> out a closing period.
>
> The subject line, and the rest of the commit message, must have lines no
> longer than 76 characters.
>
> For the subject line specifically, under 70 characters is preferable,
> for the sake of git log --oneline.
>
> Also, the testcase should not go in the main g++.dg/ directory, but a
> subdirectory, with a filename indicating what it tests. Perhaps
> cpp1z/variadic-nontype1.C
>
> > > template struct Class1{};
> > >
> > > template class Class2;
> > > template...Un> class Class2
> > > {
> > > public: void apply(){}
> > > };
> > >
> > > Class1 class1_bool;
> > > Class1 class1_char;
> > >
> > > int main()
> > > {
> > > Class2 class2;
> > > class2.apply();
> > > }
> >
> > > where it ends up populating the V argument in the instantiated template 
> > > at the class2 declaration with the unknown placeholder and the segfault 
> > > is when it later tries to get the name for an incompletely
> > > resolved type.
> >
> > Here's a more reduced example that's fixed by your patch:
> >
> > template struct A { };
> >
> > template
> > void f(A...);
> >
> > int main() {
> > f(A<0>{}, A<1u>{}, A<2l>{}); // OK, Ts deduced to {int,unsigned,long}
> > }
> >
> > A related PR that isn't and shouldn't be fixed by this patch is PR84796
> > where the type parameter pack is declared at a different level than
> > the non-type parameter pack, e.g.:
> >
> > template
> > struct A { };
> >
> > template
> > struct B {
> > template
> > static void f(A);
> > };
> >
> > int main() {
> > A<1, true, 'c'> a;
> > B::f(a);
> > }
> >
> > Before your patch we would ICE on this valid testcase, after we now reject 
> > it
> > with a bogus error:
> >
> > : In function ‘int main()’:
> > :12:24: error: no matching function for call to ‘B > char>::f(A<1, true, 'c'>&)’
> > :12:24: note: there is 1 candidate
> > :7:15: note: candidate 1: ‘template static void 
> > B::f(A) [with Ts ...Vs = {Vs ...}; Ts = {int, bool, char}]’
> > :7:15: note: template argument deduction/substitution failed:
> > :12:24: note: ‘Vs’ is not equivalent to ‘1’
> >
> > Which is no worse than ICEing, I suppose :)

It's interesting you've raised this as along with this bug I also investigated 
what I thought was a related failure in PR78216 and in the end that was due to 
the issue your mentioning here with multilevel parameter packs. I created a fix 
for it but when I combined and tested it with this one it led to an ICE further 
down the line and it led me to consider if this fix is correct or if it is 
leading to the construction of invalid chains / nodes. Anyway I had a chance to 
look at it again today and found it was exposing another fault in the 
multi-level param handling. I'll look at putting that one up next.

> >
> > > The incorrect processing happens in unify_pack_expansion where it is 
> > > calling unify_one_argument
> > > here(https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/pt.cc;h=0ffa0b53e26a87df3ed2c7202446c68d2800db64;hb=HEAD#l24718)
> > >  for each argument of the argument pack and recursively unifying the 
> > > inner arguments. In
> > > this example the inner argument against V (bool/char) is set in targs 
> > > however it is not seen as a pack argument in unify_pack_expansion because 
> > > in the unfixed code it does not reach and link this associated
> > > pack and later on when processing the collected arguments to substitute 
> > > it in in instantiate_class_template the argument has not been set as an 
> > > argument pack in targs and it doesn't match the the template
> > > specialisation parameter which is a parameter pack and it falls back to 
> > > the unknown placeholder.
> > > The parameter packs are originally linked in make_pack_expansion 
> > > here(https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/pt.cc;h=0ffa0b53e26a87df3ed2c7202446c68d2800db64;hb=HEAD#l4246)
> > >  and the change is in
> > > this tree walk so that in the example above V is chained to Un and after 
> > > calling unify_one_argument it loops over both packs and extracts their 
> > > associated value set in targs and puts it into it's pack
> > > argument collection.
> > >
> > > Regards
> > > Adam Ryan
> >
> > > Subject: [PATCH] Add tree walk case to obtain A pack in
> > > ...B>.
> > >
> > > For non-type parameter packs when unifying the arguments in 
> > > unify_pack_expansion
> > > it iterates over the associated packs of a param so 

[PATCH v2] c++: Add tree walk case to reach A pack from B in ...B> [PR118265]

2025-02-02 Thread A J Ryan Solutions Ltd
This version has all the updates as per feedback from version 1. It makes a 
minor correction to the code styling, reformats the commit message and moves 
the test into the cpp1z directory.
In addition I've updated the test to conform with c++17 for better coverage. 
Andrew Pinski had put one up on the ticket to use, it would be c++20, I can 
switch it to that if there was another reason to use it that I've overlooked.

Regards
Adam RyanFrom 966d4d2d9cdf7181162f15ff0e167270a8207b32 Mon Sep 17 00:00:00 2001
From: Adam J Ryan 
Date: Wed, 1 Jan 2025 13:31:01 +
Subject: [PATCH] Add tree walk case to obtain A pack in
 ...B>.

For non-type parameter packs when unifying the arguments in
unify_pack_expansion it iterates over the associated packs of a param so
that when it recursively unifies the param with the arguments it knows
which targs have been populated with parameter pack arguments that it can
then collect up. This change adds a tree walk so that in the example above
it reaches ...A and adds it to the associated packs for ...B and therefore
knows it will have been set in targs in unify_pack_expansion and processes
it as per other pack arguments.

PR c++/118265

gcc/cp/ChangeLog:

 	* pt.cc (find_parameter_packs_r) :
 	Walk into the type of a parameter pack.

Signed-off-by: Adam J Ryan 
---
 gcc/cp/pt.cc   |  5 +
 gcc/testsuite/g++.dg/cpp1z/variadic-nontype1.C | 18 ++
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/variadic-nontype1.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 77583dd6bb5..61ba4b98e3d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -4010,6 +4010,11 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, void* data)
 		&find_parameter_packs_r, ppd, ppd->visited);
   return NULL_TREE;
 
+case TEMPLATE_PARM_INDEX:
+  if (parameter_pack_p)
+WALK_SUBTREE (TREE_TYPE (t));
+  return NULL_TREE;
+
 case DECL_EXPR:
   {
 	tree decl = DECL_EXPR_DECL (t);
diff --git a/gcc/testsuite/g++.dg/cpp1z/variadic-nontype1.C b/gcc/testsuite/g++.dg/cpp1z/variadic-nontype1.C
new file mode 100644
index 000..ad2af623b13
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/variadic-nontype1.C
@@ -0,0 +1,18 @@
+// { dg-do compile { target c++17 } }
+struct Class1
+{
+void apply_bool(bool){}
+void apply_char(char){}
+};
+
+template struct Class2;
+template struct Class2
+{
+void apply(){}
+};
+
+int main()
+{
+Class2<&Class1::apply_bool, &Class1::apply_char> class2;
+class2.apply ();
+}
-- 
2.43.0



Re: [PATCH v2 7/7] Add a unit test for random access in the file cache

2025-02-02 Thread Andi Kleen
On Sun, Feb 02, 2025 at 09:35:52PM -0800, Andi Kleen wrote:
> > Patch 7 is OK otherwise, and I'm taking a look at the rest of the
> > patches now; thanks.
> 
> Any comments on the other patches?

nm. I see you already commented. somehow i missed that.

-Andi


Re: [PATCH v2 4/7] Add a cache of recent lines

2025-02-02 Thread Andi Kleen
> 
> If I reading this right, calls to get_next_line lead to insertions into
> the ring buffer whilst the buffer is empty or the last line in the ring
> buffer cache is m_line_num - 1.
> 
> There are a few places where we update m_line_num, but this caching
> code doesn't seem to touch those places.  Should it?  I don't know if
> the lack of a reset is an issue, but it's an aspect of the patch that's
> a bit hazy to me; sorry.

The idea was that there is only a single cursor for this cache,
assuming the main parser algorithm goes linearly through the file.

So even if m_line_num changes temporarily for some warning 
it will eventually go back to where that cursor was. So
I didn't bother to refill the cache on these changes.

There is a good chance that the changed position hits the cache
if it's within its range.

If there is an access pattern where this assumption is not true
the cache will not work, but the normal anchors still do of course.

I will add a comment describing this.

> If I'm reading this right, the caching that this adds is only for the
> final 256 lines read so far in the file, and lets us use a line offset
> relative to the most recent entry to go direct to such a recent line in
> the file.

Right.

-Andi


Re: [PATCH v2 6/7] Enable vectorization for input.cc find_end_of_line function

2025-02-02 Thread Andi Kleen
On Tue, Jan 28, 2025 at 09:50:41AM +0100, Richard Biener wrote:
> On Mon, Jan 27, 2025 at 9:59 PM David Malcolm  wrote:
> >
> > On Sat, 2025-01-25 at 23:31 -0800, Andi Kleen wrote:
> > > From: Andi Kleen 
> > >
> > > This is the hot function in input.cc
> > >
> > > The vectorizer can vectorize it now, but in a generic cpu O2 x86
> > > build it isn't.
> > > Add a automatic target clone to handle it for x86 and build
> > > that function with O3.
> > >
> > > The ifdef here is ugly, perhaps gcc should have a more convenient
> > > "clone for vectorization if possible" attribute to handle this
> > > portably.
> >
> > This patch is very cool (no pun intended); how much does it help?
> >
> > The patch is OK by me, but given that we're in stage 4, does a release
> > manager approve? [CCed]
> 
> I'd like to see good evidence that it helps and doesn't cause issues - IIRC
> target_clones requires IFUNCs so this needs to be likely guarded not only
> on the host architecture but also the host OS.

True.

I don't have data actually. With the look behind cache the
-Wmisleading-indentation test case runs fast with not much searching,
and I currently don't have another test case that is slow.

I could create the situation artificially with some params, 
but that seems wrong.

I'll drop it for now.

I prefer using the vectorizer if possible.

-Andi


[PATCH] c++: Fix up pedwarn for capturing structured bindings in lambdas [PR118719]

2025-02-02 Thread Jakub Jelinek
Hi!

As mentioned in the PR, this pedwarni is desirable for the implicit or
explicit capturing of structured bindings in C++17, but in the case of
init-captures the initializer is just some expression and that can include
structured bindings.

So, the following patch limits the warning to non-explicit_init_p.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2025-02-02  Jakub Jelinek  

PR c++/118719
* lambda.cc (add_capture): Only pedwarn about capturing structured
binding if !explicit_init_p.

* g++.dg/cpp1z/decomp63.C: New test.

--- gcc/cp/lambda.cc.jj 2025-01-24 17:37:49.004457905 +0100
+++ gcc/cp/lambda.cc2025-01-31 23:47:08.907034696 +0100
@@ -613,7 +613,7 @@ add_capture (tree lambda, tree id, tree
return error_mark_node;
}
 
-  if (cxx_dialect < cxx20)
+  if (cxx_dialect < cxx20 && !explicit_init_p)
{
  auto_diagnostic_group d;
  tree stripped_init = tree_strip_any_location_wrapper (initializer);
--- gcc/testsuite/g++.dg/cpp1z/decomp63.C.jj2025-01-31 23:54:15.480699418 
+0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp63.C   2025-01-31 23:53:02.998578507 
+0100
@@ -0,0 +1,18 @@
+// PR c++/118719
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+int
+main ()
+{
+  int a[] = { 42 };
+  auto [x] = a;// { dg-warning 
"structured bindings only available with" "" { target c++14_down } }
+   // { dg-message "declared here" 
"" { target c++17_down } .-1 }
+  [=] () { int b = x; (void) b; }; // { dg-warning "captured 
structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } }
+  [&] () { int b = x; (void) b; }; // { dg-warning "captured 
structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } }
+  [x] () { int b = x; (void) b; }; // { dg-warning "captured 
structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } }
+  [&x] () { int b = x; (void) b; };// { dg-warning "captured 
structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } }
+  [x = x] () { int b = x; (void) b; }; // { dg-warning "lambda capture 
initializers only available with" "" { target c++11_only } }
+  [y = x] () { int b = y; (void) b; }; // { dg-warning "lambda capture 
initializers only available with" "" { target c++11_only } }
+  [y = x * 2] () { int b = y; (void) b; }; // { dg-warning "lambda capture 
initializers only available with" "" { target c++11_only } }
+}

Jakub