Re: [PATCH v2 7/7] Add a unit test for random access in the file cache
> Patch 7 is OK otherwise, and I'm taking a look at the rest of the > patches now; thanks. Any comments on the other patches? Thanks, -Andi
[PATCH v2] ira: Add a target hook for callee-saved register cost scale
commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b Author: Surya Kumari Jangala Date: Tue Jun 25 08:37:49 2024 -0500 ira: Scale save/restore costs of callee save registers with block frequency scales the cost of saving/restoring a callee-save hard register in epilogue and prologue with the entry block frequency, which, if not optimizing for size, is 1, for all targets. As the result, callee-saved registers may not be used to preserve local variable values across calls on some targets, like x86. Add a target hook for the callee-saved register cost scale in epilogue and prologue used by IRA. The default version of this target hook returns 1 if optimizing for size, otherwise returns the entry block frequency. Add an x86 version of this target hook to restore the old behavior prior to the above commit. PR rtl-optimization/111673 PR rtl-optimization/115932 PR rtl-optimization/116028 PR rtl-optimization/117081 PR rtl-optimization/117082 PR rtl-optimization/118497 * ira-color.cc (assign_hard_reg): Call the target hook for the callee-saved register cost scale in epilogue and prologue. * target.def (ira_callee_saved_register_cost_scale): New target hook. * targhooks.cc (default_ira_callee_saved_register_cost_scale): New. * targhooks.h (default_ira_callee_saved_register_cost_scale): Likewise. * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): New. (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise. * doc/tm.texi: Regenerated. * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): New. Signed-off-by: H.J. Lu --- gcc/config/i386/i386.cc | 11 +++ gcc/doc/tm.texi | 8 gcc/doc/tm.texi.in | 2 ++ gcc/ira-color.cc| 3 +-- gcc/target.def | 12 gcc/targhooks.cc| 8 gcc/targhooks.h | 1 + 7 files changed, 43 insertions(+), 2 deletions(-) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index f89201684a8..3128973ba79 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -20600,6 +20600,14 @@ ix86_class_likely_spilled_p (reg_class_t rclass) return false; } +/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE. */ + +static int +ix86_ira_callee_saved_register_cost_scale (int) +{ + return 1; +} + /* Return true if a set of DST by the expression SRC should be allowed. This prevents complex sets of likely_spilled hard regs before split1. */ @@ -27078,6 +27086,9 @@ ix86_libgcc_floating_mode_supported_p #define TARGET_PREFERRED_OUTPUT_RELOAD_CLASS ix86_preferred_output_reload_class #undef TARGET_CLASS_LIKELY_SPILLED_P #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p +#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE +#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \ + ix86_ira_callee_saved_register_cost_scale #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \ diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 0de24eda6f0..9f42913a4ef 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -3047,6 +3047,14 @@ A target hook which can change allocno class for given pseudo from The default version of this target hook always returns given class. @end deftypefn +@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE (int @var{hard_regno}) +A target hook which returns the callee-saved register @var{hard_regno} +cost scale in epilogue and prologue used by IRA. + +The default version of this target hook returns 1 if optimizing for +size, otherwise returns the entry block frequency. +@end deftypefn + @deftypefn {Target Hook} bool TARGET_LRA_P (void) A target hook which returns true if we use LRA instead of reload pass. diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 631d04131e3..6dbe22581ca 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -2388,6 +2388,8 @@ in the reload pass. @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS +@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE + @hook TARGET_LRA_P @hook TARGET_REGISTER_PRIORITY diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc index 0699b349a1a..233060e1587 100644 --- a/gcc/ira-color.cc +++ b/gcc/ira-color.cc @@ -2180,8 +2180,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) + ira_memory_move_cost[mode][rclass][1]) * saved_nregs / hard_regno_nregs (hard_regno, mode) - 1) - * (optimize_size ? 1 : - REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun))); + * targetm.ira_callee_saved_register_cost_scale (hard_regno); cost += add_cost; full_cost += add_cost; } diff --git a/gcc/target.def b/gcc/target.def index 4050b2
Patch held up in gcc-patches due to size
Hi, I sent https://gcc.gnu.org/pipermail/fortran/2025-February/061670.html to gcc-patches also, as normal, but got back an e-mail that it was too large. and that a moderator would look at it. Maybe the limits can be increased a bit, sometimes patches can be quite large, especially if they contain large test cases or a large number of generated files. (Does anybody actually look at the messages, as promised in the e-mail?= Best regards Thomas
Re: [PATCH] ira: Cap callee-saved register cost scale to 300
On Sun, Feb 2, 2025 at 3:33 PM Richard Biener wrote: > > > > > Am 02.02.2025 um 08:00 schrieb H.J. Lu : > > > > Don't increase callee-saved register cost by 1000x, which leads to that > > callee-saved registers aren't used to preserve local variable values > > across calls, by capping the scale to 300. > > >PR rtl-optimization/111673 > >PR rtl-optimization/115932 > >PR rtl-optimization/116028 > >PR rtl-optimization/117081 > >PR rtl-optimization/118497 > >* ira-color.cc (assign_hard_reg): Cap callee-saved register cost > >scale to 300. > > > > Signed-off-by: H.J. Lu > > --- > > gcc/ira-color.cc | 16 ++-- > > 1 file changed, 14 insertions(+), 2 deletions(-) > > > > diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > > index 0699b349a1a..707ff188250 100644 > > --- a/gcc/ira-color.cc > > +++ b/gcc/ira-color.cc > > @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) > > /* We need to save/restore the hard register in > > epilogue/prologue. Therefore we increase the cost. */ > > { > > +int scale; > > +if (optimize_size) > > + scale = 1; > > +else > > + { > > +scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)); > > +/* Don't increase callee-saved register cost by 1000x, > > + which leads to that callee-saved registers aren't > > + used to preserve local variable values across calls, > > + by capping the scale to 300. */ > > +if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX) > > + scale = 300; > > That leads to 300 for 1000 but 999 for 999 which is odd. I’d have expected > to scale this down to [0, 300] or is MAX a magic value? There are * The weights for each insn varies from 0 to REG_FREQ_BASE. This constant does not need to be high, as in infrequently executed regions we want to count instructions equivalently to optimize for size instead of speed. */ #define REG_FREQ_MAX 1000 /* Compute register frequency from the BB frequency. When optimizing for size, or profile driven feedback is available and the function is never executed, frequency is always equivalent. Otherwise rescale the basic block frequency. */ #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun)\ || !cfun->cfg->count_max.initialized_p ()) \ ? REG_FREQ_MAX \ : ((bb)->count.to_frequency (cfun) \ * REG_FREQ_MAX / BB_FREQ_MAX) \ ? ((bb)->count.to_frequency (cfun) \ * REG_FREQ_MAX / BB_FREQ_MAX)\ : 1) 1000 is the default. If it isn't 1000, it isn't the default. I only want to get a more reasonable default scale, instead of 1000. Lower scale will fail the PR rtl-optimization/111673 test on powerpc64. > > + } > >rclass = REGNO_REG_CLASS (hard_regno); > >add_cost = ((ira_memory_move_cost[mode][rclass][0] > > + ira_memory_move_cost[mode][rclass][1]) > >* saved_nregs / hard_regno_nregs (hard_regno, > > mode) - 1) > > - * (optimize_size ? 1 : > > - REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun))); > > + * scale; > >cost += add_cost; > >full_cost += add_cost; > > } > > -- > > 2.48.1 > > -- H.J.
Re: [PATCH] ira: Cap callee-saved register cost scale to 300
> Am 02.02.2025 um 08:59 schrieb H.J. Lu : > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener > wrote: >> >> >> Am 02.02.2025 um 08:00 schrieb H.J. Lu : >>> >>> Don't increase callee-saved register cost by 1000x, which leads to that >>> callee-saved registers aren't used to preserve local variable values >>> across calls, by capping the scale to 300. >> >>> PR rtl-optimization/111673 >>> PR rtl-optimization/115932 >>> PR rtl-optimization/116028 >>> PR rtl-optimization/117081 >>> PR rtl-optimization/118497 >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost >>> scale to 300. >>> >>> Signed-off-by: H.J. Lu >>> --- >>> gcc/ira-color.cc | 16 ++-- >>> 1 file changed, 14 insertions(+), 2 deletions(-) >>> >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc >>> index 0699b349a1a..707ff188250 100644 >>> --- a/gcc/ira-color.cc >>> +++ b/gcc/ira-color.cc >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) >>> /* We need to save/restore the hard register in >>>epilogue/prologue. Therefore we increase the cost. */ >>> { >>> +int scale; >>> +if (optimize_size) >>> + scale = 1; >>> +else >>> + { >>> +scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)); >>> +/* Don't increase callee-saved register cost by 1000x, >>> + which leads to that callee-saved registers aren't >>> + used to preserve local variable values across calls, >>> + by capping the scale to 300. */ >>> +if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX) >>> + scale = 300; >> >> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have expected >> to scale this down to [0, 300] or is MAX a magic value? > > There are > > * The weights for each insn varies from 0 to REG_FREQ_BASE. > This constant does not need to be high, as in infrequently executed > regions we want to count instructions equivalently to optimize for > size instead of speed. */ > #define REG_FREQ_MAX 1000 > > /* Compute register frequency from the BB frequency. When optimizing for > size, > or profile driven feedback is available and the function is never executed, > frequency is always equivalent. Otherwise rescale the basic block > frequency. */ > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) > \ > || !cfun->cfg->count_max.initialized_p ()) \ > ? REG_FREQ_MAX \ > : ((bb)->count.to_frequency (cfun) \ >* REG_FREQ_MAX / BB_FREQ_MAX) \ > ? ((bb)->count.to_frequency (cfun) \ > * REG_FREQ_MAX / BB_FREQ_MAX)\ > : 1) > > 1000 is the default. If it isn't 1000, it isn't the default. I only want > to get a more reasonable default scale, instead of 1000. Lower > scale will fail the PR rtl-optimization/111673 test on powerpc64. I see. Why not adjust the above macro then? That would be a bit more obvious. Like use MAX/2 or so? > > >>> + } >>> rclass = REGNO_REG_CLASS (hard_regno); >>> add_cost = ((ira_memory_move_cost[mode][rclass][0] >>>+ ira_memory_move_cost[mode][rclass][1]) >>> * saved_nregs / hard_regno_nregs (hard_regno, >>> mode) - 1) >>> - * (optimize_size ? 1 : >>> - REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun))); >>> + * scale; >>> cost += add_cost; >>> full_cost += add_cost; >>> } >>> -- >>> 2.48.1 >>> > > > > -- > H.J.
Re: [PATCH] ira: Cap callee-saved register cost scale to 300
On Sun, Feb 2, 2025 at 4:20 PM Richard Biener wrote: > > > > > Am 02.02.2025 um 08:59 schrieb H.J. Lu : > > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener > > wrote: > >> > >> > >> > Am 02.02.2025 um 08:00 schrieb H.J. Lu : > >>> > >>> Don't increase callee-saved register cost by 1000x, which leads to that > >>> callee-saved registers aren't used to preserve local variable values > >>> across calls, by capping the scale to 300. > >> > >>> PR rtl-optimization/111673 > >>> PR rtl-optimization/115932 > >>> PR rtl-optimization/116028 > >>> PR rtl-optimization/117081 > >>> PR rtl-optimization/118497 > >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost > >>> scale to 300. > >>> > >>> Signed-off-by: H.J. Lu > >>> --- > >>> gcc/ira-color.cc | 16 ++-- > >>> 1 file changed, 14 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > >>> index 0699b349a1a..707ff188250 100644 > >>> --- a/gcc/ira-color.cc > >>> +++ b/gcc/ira-color.cc > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) > >>> /* We need to save/restore the hard register in > >>>epilogue/prologue. Therefore we increase the cost. */ > >>> { > >>> +int scale; > >>> +if (optimize_size) > >>> + scale = 1; > >>> +else > >>> + { > >>> +scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)); > >>> +/* Don't increase callee-saved register cost by 1000x, > >>> + which leads to that callee-saved registers aren't > >>> + used to preserve local variable values across calls, > >>> + by capping the scale to 300. */ > >>> +if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX) > >>> + scale = 300; > >> > >> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have > >> expected to scale this down to [0, 300] or is MAX a magic value? > > > > There are > > > > * The weights for each insn varies from 0 to REG_FREQ_BASE. > > This constant does not need to be high, as in infrequently executed > > regions we want to count instructions equivalently to optimize for > > size instead of speed. */ > > #define REG_FREQ_MAX 1000 > > > > /* Compute register frequency from the BB frequency. When optimizing for > > size, > > or profile driven feedback is available and the function is never > > executed, > > frequency is always equivalent. Otherwise rescale the basic block > > frequency. */ > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) > > \ > > || !cfun->cfg->count_max.initialized_p ()) > > \ > > ? REG_FREQ_MAX > > \ > > : ((bb)->count.to_frequency (cfun) > > \ > >* REG_FREQ_MAX / BB_FREQ_MAX) > > \ > > ? ((bb)->count.to_frequency (cfun) > > \ > > * REG_FREQ_MAX / BB_FREQ_MAX) > > \ > > : 1) > > > > 1000 is the default. If it isn't 1000, it isn't the default. I only want > > to get a more reasonable default scale, instead of 1000. Lower > > scale will fail the PR rtl-optimization/111673 test on powerpc64. > > I see. Why not adjust the above macro then? That would be a bit more > obvious. Like use MAX/2 or so? commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b Author: Surya Kumari Jangala Date: Tue Jun 25 08:37:49 2024 -0500 ira: Scale save/restore costs of callee save registers with block frequency uses REG_FREQ_FROM_BB as the cost scale. I don't know if it is a misuse. I don't want to change REG_FREQ_FROM_BB since it is used in other places, not as a cost scale. Maybe the above commit should be reverted and we add a target hook for callee-saved register cost scale. Each target can choose a proper cost scale, install of increasing the cost by 1000x for everyone. > > > > > >>> + } > >>> rclass = REGNO_REG_CLASS (hard_regno); > >>> add_cost = ((ira_memory_move_cost[mode][rclass][0] > >>>+ ira_memory_move_cost[mode][rclass][1]) > >>> * saved_nregs / hard_regno_nregs (hard_regno, > >>> mode) - 1) > >>> - * (optimize_size ? 1 : > >>> - REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun))); > >>> + * scale; > >>> cost += add_cost; > >>> full_cost += add_cost; > >>> } > >>> -- > >>> 2.48.1 > >>> > > > > > > > > -- > > H.J. -- H.J.
[PATCH] arm: testsuite: Adapt mve-vabs.c to improved codegen
Since commit r15-491-gc290e6a0b7a9de this failure happens on on armv8l-linux-gnueabihf and arm-eabi: Running gcc:gcc.target/arm/simd/simd.exp ... gcc.target/arm/simd/mve-vabs.c: memmove found 0 times FAIL: gcc.target/arm/simd/mve-vabs.c scan-assembler-times memmove 3 In PR PR target/116010, Andrew Pinski noted that "gcc.target/arm/simd/mve-vabs.c now calls memcpy because of the restrict instead of memmove. That should be a simple fix there." Therefore change the test to expect memcpy rather than memmove. Another change is that memcpy is inlined rather than called, so also change the test to check the optimized tree dump rather than the generated assembly. Tested on armv8l-linux-gnueabihf and arm-eabi. gcc/testsuite/ChangeLog: PR target/116010 * gcc.target/arm/simd/mve-vabs.c: Test tree dump and adjust to new code. Suggested-by: Andrew Pinski --- gcc/testsuite/gcc.target/arm/simd/mve-vabs.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c b/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c index f2f9ee349906..e85d0b18ee71 100644 --- a/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vabs.c @@ -1,7 +1,7 @@ /* { dg-do assemble } */ /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ /* { dg-add-options arm_v8_1m_mve_fp } */ -/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */ +/* { dg-additional-options "-O3 -funsafe-math-optimizations -fdump-tree-optimized" } */ #include #include @@ -35,10 +35,10 @@ FUNC_FLOAT(f, float, 32, 4, vabs) FUNC(f, float, 16, 8, vabs) /* Taking the absolute value of an unsigned value is a no-op, so half of the - integer optimizations actually generate a call to memmove, the other ones a + integer optimizations actually generate a call to memcpy, the other ones a 'vabs'. */ /* { dg-final { scan-assembler-times {vabs.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */ /* { dg-final { scan-assembler-times {vabs.f[0-9]+\tq[0-9]+, q[0-9]+} 2 } } */ /* { dg-final { scan-assembler-times {vldr[bhw].[0-9]+\tq[0-9]+} 5 } } */ /* { dg-final { scan-assembler-times {vstr[bhw].[0-9]+\tq[0-9]+} 5 } } */ -/* { dg-final { scan-assembler-times {memmove} 3 } } */ +/* { dg-final { scan-tree-dump-times "memcpy" 3 "optimized" } } */
Re: [PATCH] gcc: Add tree walk case to reach A pack from B in ...B>. [PR118265]
On Monday, 27 January 2025 at 19:48, Jason Merrill - jason at redhat.com wrote: > > On 1/27/25 1:26 PM, Patrick Palka wrote: > > > On Wed, 1 Jan 2025, A J Ryan Solutions Ltd wrote: > > > > > Hi and happy new year, this patch is to fix a compiler seg-fault as > > > encountered in the following example: > > > > Hi, thanks for the patch! Your fix makes sense to me, and I believe it > > also fixes the testcases from PR102626 and at least some of its related > > PRs. > > > Yes, thanks! > > In addition to Patrick's comments, all of which I agree with, the > subject line should start with "c++" rather than "gcc". We also leave > out a closing period. > > The subject line, and the rest of the commit message, must have lines no > longer than 76 characters. > > For the subject line specifically, under 70 characters is preferable, > for the sake of git log --oneline. > > Also, the testcase should not go in the main g++.dg/ directory, but a > subdirectory, with a filename indicating what it tests. Perhaps > cpp1z/variadic-nontype1.C > > > > template struct Class1{}; > > > > > > template class Class2; > > > template...Un> class Class2 > > > { > > > public: void apply(){} > > > }; > > > > > > Class1 class1_bool; > > > Class1 class1_char; > > > > > > int main() > > > { > > > Class2 class2; > > > class2.apply(); > > > } > > > > > where it ends up populating the V argument in the instantiated template > > > at the class2 declaration with the unknown placeholder and the segfault > > > is when it later tries to get the name for an incompletely > > > resolved type. > > > > Here's a more reduced example that's fixed by your patch: > > > > template struct A { }; > > > > template > > void f(A...); > > > > int main() { > > f(A<0>{}, A<1u>{}, A<2l>{}); // OK, Ts deduced to {int,unsigned,long} > > } > > > > A related PR that isn't and shouldn't be fixed by this patch is PR84796 > > where the type parameter pack is declared at a different level than > > the non-type parameter pack, e.g.: > > > > template > > struct A { }; > > > > template > > struct B { > > template > > static void f(A); > > }; > > > > int main() { > > A<1, true, 'c'> a; > > B::f(a); > > } > > > > Before your patch we would ICE on this valid testcase, after we now reject > > it > > with a bogus error: > > > > : In function ‘int main()’: > > :12:24: error: no matching function for call to ‘B > char>::f(A<1, true, 'c'>&)’ > > :12:24: note: there is 1 candidate > > :7:15: note: candidate 1: ‘template static void > > B::f(A) [with Ts ...Vs = {Vs ...}; Ts = {int, bool, char}]’ > > :7:15: note: template argument deduction/substitution failed: > > :12:24: note: ‘Vs’ is not equivalent to ‘1’ > > > > Which is no worse than ICEing, I suppose :) It's interesting you've raised this as along with this bug I also investigated what I thought was a related failure in PR78216 and in the end that was due to the issue your mentioning here with multilevel parameter packs. I created a fix for it but when I combined and tested it with this one it led to an ICE further down the line and it led me to consider if this fix is correct or if it is leading to the construction of invalid chains / nodes. Anyway I had a chance to look at it again today and found it was exposing another fault in the multi-level param handling. I'll look at putting that one up next. > > > > > The incorrect processing happens in unify_pack_expansion where it is > > > calling unify_one_argument > > > here(https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/pt.cc;h=0ffa0b53e26a87df3ed2c7202446c68d2800db64;hb=HEAD#l24718) > > > for each argument of the argument pack and recursively unifying the > > > inner arguments. In > > > this example the inner argument against V (bool/char) is set in targs > > > however it is not seen as a pack argument in unify_pack_expansion because > > > in the unfixed code it does not reach and link this associated > > > pack and later on when processing the collected arguments to substitute > > > it in in instantiate_class_template the argument has not been set as an > > > argument pack in targs and it doesn't match the the template > > > specialisation parameter which is a parameter pack and it falls back to > > > the unknown placeholder. > > > The parameter packs are originally linked in make_pack_expansion > > > here(https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/pt.cc;h=0ffa0b53e26a87df3ed2c7202446c68d2800db64;hb=HEAD#l4246) > > > and the change is in > > > this tree walk so that in the example above V is chained to Un and after > > > calling unify_one_argument it loops over both packs and extracts their > > > associated value set in targs and puts it into it's pack > > > argument collection. > > > > > > Regards > > > Adam Ryan > > > > > Subject: [PATCH] Add tree walk case to obtain A pack in > > > ...B>. > > > > > > For non-type parameter packs when unifying the arguments in > > > unify_pack_expansion > > > it iterates over the associated packs of a param so
[PATCH v2] c++: Add tree walk case to reach A pack from B in ...B> [PR118265]
This version has all the updates as per feedback from version 1. It makes a minor correction to the code styling, reformats the commit message and moves the test into the cpp1z directory. In addition I've updated the test to conform with c++17 for better coverage. Andrew Pinski had put one up on the ticket to use, it would be c++20, I can switch it to that if there was another reason to use it that I've overlooked. Regards Adam RyanFrom 966d4d2d9cdf7181162f15ff0e167270a8207b32 Mon Sep 17 00:00:00 2001 From: Adam J Ryan Date: Wed, 1 Jan 2025 13:31:01 + Subject: [PATCH] Add tree walk case to obtain A pack in ...B>. For non-type parameter packs when unifying the arguments in unify_pack_expansion it iterates over the associated packs of a param so that when it recursively unifies the param with the arguments it knows which targs have been populated with parameter pack arguments that it can then collect up. This change adds a tree walk so that in the example above it reaches ...A and adds it to the associated packs for ...B and therefore knows it will have been set in targs in unify_pack_expansion and processes it as per other pack arguments. PR c++/118265 gcc/cp/ChangeLog: * pt.cc (find_parameter_packs_r) : Walk into the type of a parameter pack. Signed-off-by: Adam J Ryan --- gcc/cp/pt.cc | 5 + gcc/testsuite/g++.dg/cpp1z/variadic-nontype1.C | 18 ++ 2 files changed, 23 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp1z/variadic-nontype1.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 77583dd6bb5..61ba4b98e3d 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -4010,6 +4010,11 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, void* data) &find_parameter_packs_r, ppd, ppd->visited); return NULL_TREE; +case TEMPLATE_PARM_INDEX: + if (parameter_pack_p) +WALK_SUBTREE (TREE_TYPE (t)); + return NULL_TREE; + case DECL_EXPR: { tree decl = DECL_EXPR_DECL (t); diff --git a/gcc/testsuite/g++.dg/cpp1z/variadic-nontype1.C b/gcc/testsuite/g++.dg/cpp1z/variadic-nontype1.C new file mode 100644 index 000..ad2af623b13 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/variadic-nontype1.C @@ -0,0 +1,18 @@ +// { dg-do compile { target c++17 } } +struct Class1 +{ +void apply_bool(bool){} +void apply_char(char){} +}; + +template struct Class2; +template struct Class2 +{ +void apply(){} +}; + +int main() +{ +Class2<&Class1::apply_bool, &Class1::apply_char> class2; +class2.apply (); +} -- 2.43.0
Re: [PATCH v2 7/7] Add a unit test for random access in the file cache
On Sun, Feb 02, 2025 at 09:35:52PM -0800, Andi Kleen wrote: > > Patch 7 is OK otherwise, and I'm taking a look at the rest of the > > patches now; thanks. > > Any comments on the other patches? nm. I see you already commented. somehow i missed that. -Andi
Re: [PATCH v2 4/7] Add a cache of recent lines
> > If I reading this right, calls to get_next_line lead to insertions into > the ring buffer whilst the buffer is empty or the last line in the ring > buffer cache is m_line_num - 1. > > There are a few places where we update m_line_num, but this caching > code doesn't seem to touch those places. Should it? I don't know if > the lack of a reset is an issue, but it's an aspect of the patch that's > a bit hazy to me; sorry. The idea was that there is only a single cursor for this cache, assuming the main parser algorithm goes linearly through the file. So even if m_line_num changes temporarily for some warning it will eventually go back to where that cursor was. So I didn't bother to refill the cache on these changes. There is a good chance that the changed position hits the cache if it's within its range. If there is an access pattern where this assumption is not true the cache will not work, but the normal anchors still do of course. I will add a comment describing this. > If I'm reading this right, the caching that this adds is only for the > final 256 lines read so far in the file, and lets us use a line offset > relative to the most recent entry to go direct to such a recent line in > the file. Right. -Andi
Re: [PATCH v2 6/7] Enable vectorization for input.cc find_end_of_line function
On Tue, Jan 28, 2025 at 09:50:41AM +0100, Richard Biener wrote: > On Mon, Jan 27, 2025 at 9:59 PM David Malcolm wrote: > > > > On Sat, 2025-01-25 at 23:31 -0800, Andi Kleen wrote: > > > From: Andi Kleen > > > > > > This is the hot function in input.cc > > > > > > The vectorizer can vectorize it now, but in a generic cpu O2 x86 > > > build it isn't. > > > Add a automatic target clone to handle it for x86 and build > > > that function with O3. > > > > > > The ifdef here is ugly, perhaps gcc should have a more convenient > > > "clone for vectorization if possible" attribute to handle this > > > portably. > > > > This patch is very cool (no pun intended); how much does it help? > > > > The patch is OK by me, but given that we're in stage 4, does a release > > manager approve? [CCed] > > I'd like to see good evidence that it helps and doesn't cause issues - IIRC > target_clones requires IFUNCs so this needs to be likely guarded not only > on the host architecture but also the host OS. True. I don't have data actually. With the look behind cache the -Wmisleading-indentation test case runs fast with not much searching, and I currently don't have another test case that is slow. I could create the situation artificially with some params, but that seems wrong. I'll drop it for now. I prefer using the vectorizer if possible. -Andi
[PATCH] c++: Fix up pedwarn for capturing structured bindings in lambdas [PR118719]
Hi! As mentioned in the PR, this pedwarni is desirable for the implicit or explicit capturing of structured bindings in C++17, but in the case of init-captures the initializer is just some expression and that can include structured bindings. So, the following patch limits the warning to non-explicit_init_p. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2025-02-02 Jakub Jelinek PR c++/118719 * lambda.cc (add_capture): Only pedwarn about capturing structured binding if !explicit_init_p. * g++.dg/cpp1z/decomp63.C: New test. --- gcc/cp/lambda.cc.jj 2025-01-24 17:37:49.004457905 +0100 +++ gcc/cp/lambda.cc2025-01-31 23:47:08.907034696 +0100 @@ -613,7 +613,7 @@ add_capture (tree lambda, tree id, tree return error_mark_node; } - if (cxx_dialect < cxx20) + if (cxx_dialect < cxx20 && !explicit_init_p) { auto_diagnostic_group d; tree stripped_init = tree_strip_any_location_wrapper (initializer); --- gcc/testsuite/g++.dg/cpp1z/decomp63.C.jj2025-01-31 23:54:15.480699418 +0100 +++ gcc/testsuite/g++.dg/cpp1z/decomp63.C 2025-01-31 23:53:02.998578507 +0100 @@ -0,0 +1,18 @@ +// PR c++/118719 +// { dg-do compile { target c++11 } } +// { dg-options "" } + +int +main () +{ + int a[] = { 42 }; + auto [x] = a;// { dg-warning "structured bindings only available with" "" { target c++14_down } } + // { dg-message "declared here" "" { target c++17_down } .-1 } + [=] () { int b = x; (void) b; }; // { dg-warning "captured structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } } + [&] () { int b = x; (void) b; }; // { dg-warning "captured structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } } + [x] () { int b = x; (void) b; }; // { dg-warning "captured structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } } + [&x] () { int b = x; (void) b; };// { dg-warning "captured structured bindings are a C\\\+\\\+20 extension" "" { target c++17_down } } + [x = x] () { int b = x; (void) b; }; // { dg-warning "lambda capture initializers only available with" "" { target c++11_only } } + [y = x] () { int b = y; (void) b; }; // { dg-warning "lambda capture initializers only available with" "" { target c++11_only } } + [y = x * 2] () { int b = y; (void) b; }; // { dg-warning "lambda capture initializers only available with" "" { target c++11_only } } +} Jakub