[PATCH (pushed)] param: document ranger-recompute-depth
gcc/ChangeLog: * doc/invoke.texi: Document new param. --- gcc/doc/invoke.texi | 4 1 file changed, 4 insertions(+) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index def2df4584b..c9482886c5a 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -16170,6 +16170,10 @@ per supernode, before terminating analysis. Maximum depth of logical expression evaluation ranger will look through when evaluating outgoing edge ranges. +@item ranger-recompute-depth +Maximum depth of instruction chains to consider for recomputation +in the outgoing range calculator. + @item relation-block-limit Maximum number of relations the oracle will register in a basic block. -- 2.40.0
[PATCH] driver: drop flag_var_tracking_assignments flag
The revision r13-259-g76db543db88727 moved a condition from one file to another, but now we do not drop x_flag_var_tracking_assignments as it was done before the mentioned revision. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin PR driver/108241 gcc/ChangeLog: * opts.cc (finish_options): Drop also x_flag_var_tracking_assignments. gcc/testsuite/ChangeLog: * gcc.dg/pr108241.c: New test. * gcc.dg/pr79570.c: Add also -g option. --- gcc/opts.cc | 1 + gcc/testsuite/gcc.dg/pr108241.c | 63 + gcc/testsuite/gcc.dg/pr79570.c | 2 +- 3 files changed, 65 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/pr108241.c diff --git a/gcc/opts.cc b/gcc/opts.cc index f102c1328b9..fb2e5388ab1 100644 --- a/gcc/opts.cc +++ b/gcc/opts.cc @@ -1384,6 +1384,7 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set, } opts->x_flag_var_tracking = 0; opts->x_flag_var_tracking_uninit = 0; + opts->x_flag_var_tracking_assignments = 0; } /* One could use EnabledBy, but it would lead to a circular dependency. 
*/ diff --git a/gcc/testsuite/gcc.dg/pr108241.c b/gcc/testsuite/gcc.dg/pr108241.c new file mode 100644 index 000..06d210fae68 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr108241.c @@ -0,0 +1,63 @@ +/* PR driver/108241 */ +/* { dg-options "-Os -frounding-math -fvar-tracking-assignments -fno-dce -fno-trapping-math -fno-tree-dce -fno-tree-dse" } */ + +long int n1; +int n2, n3, n4; +char n5; + +void +foo (long int x1, long int x2, int x3, int x4, int x5, char x6, char x7) +{ + char a01 = n2, a02 = x4, a03 = 0; + short int a04; + unsigned short int a05 = x5; + int a06, a07, a08 = a05, a09 = x3, a10 = 0; + long int a11, a12 = x4; + + if (x1) +{ + a07 = x6 + (float)0x101; + a03 = a12 = a01 = a06 = ~0; + + if (x5) + a11 = n5; +} + else +{ + a10 = x3 = n3; + if (n3) + a06 = a05 = x7; +} + + if (n3 < n5) +{ + n4 = (x2 == x4) + !n1; + if (n4 % (n1 % x3)) + { + a04 = n4; + a02 = n2; + } + + if (x3) + { + a05 = !n1 % n2; + a08 = n1; + a04 = x5 + a06; + } + + if (a12) + a09 = n3 + n4; + + a12 = a07; + n3 = a11 % x1; + n5 += x6; + n1 = a04; +} + + n4 = x2 % x5 % a11; + a06 = a10 + a08 % a02 == n4; + a09 = a09 == a01 * x7; + n4 = x4; + a12 += x4 / 0xc000 + !a03; + a03 = !a05; +} diff --git a/gcc/testsuite/gcc.dg/pr79570.c b/gcc/testsuite/gcc.dg/pr79570.c index 00841b9487a..a15be9f201d 100644 --- a/gcc/testsuite/gcc.dg/pr79570.c +++ b/gcc/testsuite/gcc.dg/pr79570.c @@ -1,6 +1,6 @@ /* PR target/79570 */ /* { dg-do compile { target powerpc*-*-* ia64-*-* i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O2 -fselective-scheduling2 -fvar-tracking-assignments" } */ +/* { dg-options "-O2 -fselective-scheduling2 -fvar-tracking-assignments -g" } */ /* { dg-warning "changes selective scheduling" "" { target *-*-* } 0 } */ #include "pr69956.c" -- 2.40.0
[PATCH 0/2] Support Intel AMX-COMPLEX
Hi all, This patch series aims to add support for the Intel AMX-COMPLEX instructions. We also added AMX-COMPLEX to -march=graniterapids. The information is based on the newly released Intel Architecture Instruction Set Extensions and Future Features document, which is available at: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html Since there are only two instructions under this new ISA, I suppose the risk is low and it might get a chance for GCC 13. So I am sending the patches out now. Tested on x86_64-pc-linux-gnu. Ok for trunk? BRs, Haochen
[PATCH 2/2] i386: Add AMX-COMPLEX to Granite Rapids
gcc/ChangeLog: * config/i386/i386.h (PTA_GRANITERAPIDS): Add PTA_AMX_COMPLEX. --- gcc/config/i386/i386.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index dd9391c492b..1da6dce8e0b 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -2361,7 +2361,7 @@ constexpr wide_int_bitmask PTA_ALDERLAKE = PTA_TREMONT | PTA_ADX | PTA_AVX constexpr wide_int_bitmask PTA_SIERRAFOREST = PTA_ALDERLAKE | PTA_AVXIFMA | PTA_AVXVNNIINT8 | PTA_AVXNECONVERT | PTA_CMPCCXADD; constexpr wide_int_bitmask PTA_GRANITERAPIDS = PTA_SAPPHIRERAPIDS | PTA_AMX_FP16 - | PTA_PREFETCHI; + | PTA_PREFETCHI | PTA_AMX_COMPLEX; constexpr wide_int_bitmask PTA_GRANDRIDGE = PTA_SIERRAFOREST | PTA_RAOINT; constexpr wide_int_bitmask PTA_KNM = PTA_KNL | PTA_AVX5124VNNIW | PTA_AVX5124FMAPS | PTA_AVX512VPOPCNTDQ; -- 2.31.1
[PATCH 1/2] Support Intel AMX-COMPLEX
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Detect AMX-COMPLEX. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AMX_COMPLEX_SET, OPTION_MASK_ISA2_AMX_COMPLEX_UNSET): New. (ix86_handle_option): Handle -mamx-complex. * common/config/i386/i386-cpuinfo.h (enum processor_features): Add FEATURE_AMX_COMPLEX. * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for amx-complex. * config.gcc: Add amxcomplexintrin.h. * config/i386/cpuid.h (bit_AMX_COMPLEX): New. * config/i386/i386-c.cc (ix86_target_macros_internal): Define __AMX_COMPLEX__. * config/i386/i386-isa.def (AMX_COMPLEX): Add DEF_PTA(AMX_COMPLEX). * config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p): Handle amx-complex. * config/i386/i386.opt: Add option -mamx-complex. * config/i386/immintrin.h: Include amxcomplexintrin.h. * doc/extend.texi: Document amx-complex. * doc/invoke.texi: Document -mamx-complex. * doc/sourcebuild.texi: Document target amx-complex. * config/i386/amxcomplexintrin.h: New file. gcc/testsuite/ChangeLog: * g++.dg/other/i386-2.C: Add -mamx-complex. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/amx-check.h: Add cpu check for AMX-COMPLEX. * gcc.target/i386/amx-helper.h: Add amx-complex support. * gcc.target/i386/funcspec-56.inc: Add new target attribute. * gcc.target/i386/sse-12.c: Add -mamx-complex. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add amx-complex. * gcc.target/i386/sse-23.c: Ditto. * lib/target-supports.exp (check_effective_target_amx_complex): New. * gcc.target/i386/amxcomplex-asmatt-1.c: New test. * gcc.target/i386/amxcomplex-asmintel-1.c: Ditto. * gcc.target/i386/amxcomplex-cmmimfp16ps-2.c: Ditto. * gcc.target/i386/amxcomplex-cmmrlfp16ps-2.c: Ditto. 
--- gcc/common/config/i386/cpuinfo.h | 2 + gcc/common/config/i386/i386-common.cc | 19 +- gcc/common/config/i386/i386-cpuinfo.h | 1 + gcc/common/config/i386/i386-isas.h| 2 + gcc/config.gcc| 2 +- gcc/config/i386/amxcomplexintrin.h| 59 +++ gcc/config/i386/cpuid.h | 1 + gcc/config/i386/i386-c.cc | 2 + gcc/config/i386/i386-isa.def | 1 + gcc/config/i386/i386-options.cc | 4 +- gcc/config/i386/i386.opt | 4 ++ gcc/config/i386/immintrin.h | 2 + gcc/doc/extend.texi | 5 ++ gcc/doc/invoke.texi | 11 ++-- gcc/doc/sourcebuild.texi | 3 + gcc/testsuite/g++.dg/other/i386-2.C | 2 +- gcc/testsuite/g++.dg/other/i386-3.C | 2 +- gcc/testsuite/gcc.target/i386/amx-check.h | 3 + gcc/testsuite/gcc.target/i386/amx-helper.h| 4 +- .../gcc.target/i386/amxcomplex-asmatt-1.c | 15 + .../gcc.target/i386/amxcomplex-asmintel-1.c | 12 .../i386/amxcomplex-cmmimfp16ps-2.c | 53 + .../i386/amxcomplex-cmmrlfp16ps-2.c | 53 + gcc/testsuite/gcc.target/i386/funcspec-56.inc | 2 + gcc/testsuite/gcc.target/i386/sse-12.c| 2 +- gcc/testsuite/gcc.target/i386/sse-13.c| 2 +- gcc/testsuite/gcc.target/i386/sse-14.c| 2 +- gcc/testsuite/gcc.target/i386/sse-22.c| 4 +- gcc/testsuite/gcc.target/i386/sse-23.c| 2 +- gcc/testsuite/lib/target-supports.exp | 11 30 files changed, 270 insertions(+), 17 deletions(-) create mode 100644 gcc/config/i386/amxcomplexintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/amxcomplex-asmatt-1.c create mode 100644 gcc/testsuite/gcc.target/i386/amxcomplex-asmintel-1.c create mode 100644 gcc/testsuite/gcc.target/i386/amxcomplex-cmmimfp16ps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/amxcomplex-cmmrlfp16ps-2.c diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h index 5bde0cddb24..61559ed9de2 100644 --- a/gcc/common/config/i386/cpuinfo.h +++ b/gcc/common/config/i386/cpuinfo.h @@ -879,6 +879,8 @@ get_available_features (struct __processor_model *cpu_model, { if (eax & bit_AMX_FP16) set_feature (FEATURE_AMX_FP16); + if (edx & bit_AMX_COMPLEX) + set_feature 
(FEATURE_AMX_COMPLEX); } } diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index 0181e06b1c5..d90c558311b 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -117,6 +117,8 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA2_AMX_TILE | OPTION_MASK_ISA
[PATCH][stage1] gcov: respect -fprofile-prefix-map when it comes to output of .gcda file
Respect the profile prefix map and save .gcda files to a path that is also translated with the -fprofile-prefix-map option (if provided). This is stage 1 material. If you are interested in the fix, please install it, as I won't be able to take care of it at that time. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Thanks, Martin PR gcov-profile/105063 gcc/ChangeLog: * coverage.cc (coverage_init): Combine strings with concat and respect profile path mapping. --- gcc/coverage.cc | 24 +++- 1 file changed, 7 insertions(+), 17 deletions(-) diff --git a/gcc/coverage.cc b/gcc/coverage.cc index 7ed3a5d4ceb..3fd7f6e8e76 100644 --- a/gcc/coverage.cc +++ b/gcc/coverage.cc @@ -112,7 +112,7 @@ static char *bbg_file_name; static unsigned bbg_file_stamp; /* Name of the count data (gcda) file. */ -static char *da_file_name; +static const char *da_file_name; /* The names of merge functions for counters. */ #define STR(str) #str @@ -1259,8 +1259,6 @@ coverage_init (const char *filename) #else const char *separator = "/"; #endif - int len = strlen (filename); - int prefix_len = 0; /* Since coverage_init is invoked very early, before the pass manager, we need to set up the dumping explicitly. This is @@ -1289,26 +1287,19 @@ coverage_init (const char *filename) "prefix %qs", filename, profile_prefix_path); } filename = mangle_path (filename); - len = strlen (filename); } else profile_data_prefix = getpwd (); } - if (profile_data_prefix) -prefix_len = strlen (profile_data_prefix); - /* Name of da file. 
*/ - da_file_name = XNEWVEC (char, len + strlen (GCOV_DATA_SUFFIX) - + prefix_len + 2); - if (profile_data_prefix) -{ - memcpy (da_file_name, profile_data_prefix, prefix_len); - da_file_name[prefix_len++] = *separator; -} - memcpy (da_file_name + prefix_len, filename, len); - strcpy (da_file_name + prefix_len + len, GCOV_DATA_SUFFIX); +da_file_name = concat (profile_data_prefix, separator, filename, + GCOV_DATA_SUFFIX, NULL); + else +da_file_name = concat (filename, GCOV_DATA_SUFFIX, NULL); + + da_file_name = remap_profile_filename (da_file_name); bbg_file_stamp = local_tick; if (flag_auto_profile) @@ -1385,7 +1376,6 @@ coverage_finish (void) coverage_obj_finish (fn_ctor, object_checksum); } - XDELETEVEC (da_file_name); da_file_name = NULL; } -- 2.40.0
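For readers following the coverage.cc change, here is a minimal sketch in plain C of the string assembly that the new concat-based code performs. The helper name build_da_file_name and the DATA_SUFFIX macro are hypothetical stand-ins (the patch itself uses libiberty's concat and GCOV_DATA_SUFFIX), and the final remap_profile_filename step is not modelled here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for GCOV_DATA_SUFFIX.  */
#define DATA_SUFFIX ".gcda"

/* Build "<prefix><separator><filename>.gcda", or just "<filename>.gcda"
   when PREFIX is NULL.  Returns NULL on allocation failure; otherwise
   the caller owns the result and must free it.  */
static char *
build_da_file_name (const char *prefix, const char *separator,
                    const char *filename)
{
  assert (filename != NULL && separator != NULL);

  /* One extra byte for the terminating NUL.  */
  size_t len = strlen (filename) + strlen (DATA_SUFFIX) + 1;
  if (prefix)
    len += strlen (prefix) + strlen (separator);

  char *name = malloc (len);
  if (!name)
    return NULL;

  if (prefix)
    snprintf (name, len, "%s%s%s%s", prefix, separator, filename,
              DATA_SUFFIX);
  else
    snprintf (name, len, "%s%s", filename, DATA_SUFFIX);
  return name;
}
```

Compared with the removed XNEWVEC/memcpy/strcpy sequence, the length bookkeeping lives in one place, which is roughly what switching to concat buys in the patch.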
[PATCH] ipa: propagate attributes for target attribute clones
Hi. The patch propagates noreturn attribute for MV functions. However, I noticed we've got the following ICE when we do the same for TREE_READONLY attr: $ cat tc.c double bar() __attribute__((target_clones("avx,avx2,avx512f,default"))); double bar() { return 1.2f; } int foo() { return (int)bar(); } $ ./xgcc -B. ~/Programming/testcases/tc.c -O3 -c -fprofile-generate /home/marxin/Programming/testcases/tc.c: In function ‘foo’: /home/marxin/Programming/testcases/tc.c:4:5: error: virtual definition of statement not up to date 4 | int foo() { return (int)bar(); } | ^~~ _1 = bar (); during GIMPLE pass: fixup_cfg Thus my ambition is to propagate only noreturn attr. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin PR ipa/106816 gcc/ChangeLog: * config/i386/i386-features.cc (ix86_get_function_versions_dispatcher): Propagate function attributes for clones. gcc/testsuite/ChangeLog: * g++.target/i386/pr106816.C: New test. Co-Authored-By: H.J. Lu --- gcc/config/i386/i386-features.cc | 1 + gcc/testsuite/g++.target/i386/pr106816.C | 23 +++ 2 files changed, 24 insertions(+) create mode 100644 gcc/testsuite/g++.target/i386/pr106816.C diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc index c09abf8fc20..f2b0d59a73c 100644 --- a/gcc/config/i386/i386-features.cc +++ b/gcc/config/i386/i386-features.cc @@ -3379,6 +3379,7 @@ ix86_get_function_versions_dispatcher (void *decl) /* Right now, the dispatching is done via ifunc. 
*/ dispatch_decl = make_dispatcher_decl (default_node->decl); TREE_NOTHROW (dispatch_decl) = TREE_NOTHROW (fn); + TREE_THIS_VOLATILE (dispatch_decl) = TREE_THIS_VOLATILE (fn); dispatcher_node = cgraph_node::get_create (dispatch_decl); gcc_assert (dispatcher_node != NULL); diff --git a/gcc/testsuite/g++.target/i386/pr106816.C b/gcc/testsuite/g++.target/i386/pr106816.C new file mode 100644 index 000..0f5cc1f13dd --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr106816.C @@ -0,0 +1,23 @@ +// PR ipa/106816 + +// { dg-do compile } +// { dg-require-ifunc "" } +// { dg-options "-O2 -fdump-tree-optimized" } + +__attribute__((noreturn, target("default"))) void f() +{ + for (;;) {} +} + +__attribute__((noreturn, target("sse4.2,bmi"))) void f() +{ + for (;;) {} +} + +int main() +{ + f(); + return 12345; +} + +/* { dg-final { scan-tree-dump-not "12345" "optimized" } } */ -- 2.40.0
Re: [PATCH v3] rs6000: Fix vector parity support [PR108699]
On Mon, Mar 20, 2023 at 02:31:31PM +0800, Kewen.Lin wrote: > The failures on the original failed case builtin-bitops-1.c > and the associated test case pr108699.c here show that the > current support of parity vector mode is wrong on Power. > The hardware insns vprtyb[wdq] operate on the least > significant bit of each byte per element, so they do not match > what the RTL opcode parity needs, but the current implementation > wrongly expands it with them. > > This patch fixes the handling with one more insn, vpopcntb. > > Compared to v2 [1]: > - Use rs6000_vprtyb2 rather than parityb2, and > adjust several places with it accordingly. > > Bootstrapped and regtested on powerpc64-linux-gnu P{8,9} > and powerpc64le-linux-gnu P10. > > Is it ok for trunk? Looks good. Thanks! Segher
Re: [PATCH, rs6000] rs6000: correct vector sign extend built-ins on Big Endian [PR108812]
On Mon, Mar 27, 2023 at 03:14:26PM +0800, Kewen.Lin wrote: > on 2023/3/27 14:16, HAO CHEN GUI wrote: > > This patch removes the byte reverse operation before vector integer sign > > extension on Big Endian. These built-ins are required to sign extend the > > rightmost element. So both BE and LE should do the same operation and the byte > > reversal is not needed. This patch fixes it. Now these built-ins have the same behavior > > on all compilers. The test case is modified also. When extending from size A to size B, it is the rightmost A in every B that is extended. That is the same in every endianness, yes -- it is what the machine insns do after all, it has nothing to do with how the elements are numbered in the ABI :-) > I think the whole define_expand can be removed, we can just use the > define_insn names vsx_sign_extend_qi_* in rs6000-builtins.def (just > like what you changed for __builtin_altivec_vsignextsw2d). A very welcome cleanup :-) > One interesting thing is that we used qi/hi/si in the name for > V16QI/V8HI/V4SI but used v2di for V2DI, could you also adjust the > names from vsx_sign_extend_{qi,hi,si}_* to ..._{v16qi,v8hi,v4si}_* > then make them adopt the same naming style? Yes please :-) Segher
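To illustrate the point about the rightmost A in every B, here is a small scalar sketch in C for A = 8 bits and B = 32 bits. The function name sign_extend_low_byte is hypothetical and this only models a single element as a value; it is not the rs6000 expander, and it deliberately says nothing about how elements are numbered in a vector register.

```c
#include <stdint.h>

/* Sign-extend the low-order (rightmost) byte of one 32-bit element.
   The XOR/subtract form avoids relying on implementation-defined
   narrowing conversions.  */
static int32_t
sign_extend_low_byte (uint32_t element)
{
  uint32_t low = element & 0xffu;           /* keep the rightmost 8 bits */
  return (int32_t) (low ^ 0x80u) - 0x80;    /* move bit 7 into the sign */
}
```

Because the function works on the element's value rather than on its bytes in memory, it gives the same answer on BE and LE hosts, which matches the observation that no byte reversal is needed.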
Re: [PATCH] rs6000: Fix vector_set_var_p9 by considering BE [PR108807]
Hi! On Fri, Feb 17, 2023 at 05:55:04PM +0800, Kewen.Lin wrote: > As PR108807 exposes, the current handling in function > rs6000_expand_vector_set_var_p9 doesn't take care of big > endianness. Currently the function is to rotate the > target vector by moving element to-be-set to element 0, > set element 0 with the given val, then rotate back. To > get the permutation control vector for the rotation, it > makes use of lvsr and lvsl, but the element ordering is > different for BE and LE (like element 0 is the most > significant one on BE while the least significant one on > LE), this patch is to add consideration for BE and make > sure permutation control vectors for rotations are expected. > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -7235,22 +7235,26 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, > rtx idx) > >machine_mode shift_mode; >rtx (*gen_ashl)(rtx, rtx, rtx); > - rtx (*gen_lvsl)(rtx, rtx); > - rtx (*gen_lvsr)(rtx, rtx); > + rtx (*gen_pcvr1)(rtx, rtx); > + rtx (*gen_pcvr2)(rtx, rtx); Space before "(" btw, you can fix that at the same time? :-) What does "pcvr" mean? You could put that in a short comment? > + /* Generate one permutation control vector used for rotating the element Ah. Yeah just "/* Permutation control vector */" for the above one prevents all mysteries :-) Patch looks good. Thanks! Segher
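The rotate, set, rotate-back scheme described in the quoted patch can be sketched on a plain C array. This is only a model in array-index order with hypothetical helper names (rotate_left, set_var_by_rotation); it does not model lvsl/lvsr or the BE/LE element numbering that the actual fix is about.

```c
#include <assert.h>
#include <stddef.h>

/* Rotate the N elements of V left by K positions, in place.  */
static void
rotate_left (int *v, size_t n, size_t k)
{
  assert (n > 0);
  k %= n;
  for (size_t r = 0; r < k; r++)
    {
      int first = v[0];
      for (size_t i = 0; i + 1 < n; i++)
        v[i] = v[i + 1];
      v[n - 1] = first;
    }
}

/* Set V[IDX] = VAL by rotating element IDX into position 0, writing
   position 0, and rotating back.  */
static void
set_var_by_rotation (int *v, size_t n, size_t idx, int val)
{
  assert (idx < n);
  rotate_left (v, n, idx);       /* element IDX is now element 0 */
  v[0] = val;
  rotate_left (v, n, n - idx);   /* undo the first rotation */
}
```

The two rotation amounts always add up to a full turn, so every element other than the one being set ends up where it started; in the patch the corresponding amounts come from the permutation control vectors.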
Re: [PATCH] aarch64: update ampere1 vectorization cost
Kyrill, We reran on GCC12 and GCC11, reproducing the same improvements (e.g., on fotonik3d) that prompted the changes. I'll apply the backports later this week, unless you have any further concerns… Thanks, Philipp. On Mon, 27 Mar 2023 at 11:24, Kyrylo Tkachov wrote: > > > > > -Original Message- > > From: Philipp Tomsich > > Sent: Monday, March 27, 2023 9:50 AM > > To: Kyrylo Tkachov > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford > > ; Tamar Christina > > ; Manolis Tsamis > > Subject: Re: [PATCH] aarch64: update ampere1 vectorization cost > > > > On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov > > wrote: > > > > > > Hi Philipp, > > > > > > > -Original Message- > > > > From: Gcc-patches > > > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Philipp > > > > Tomsich > > > > Sent: Monday, March 27, 2023 8:47 AM > > > > To: gcc-patches@gcc.gnu.org > > > > Cc: Richard Sandiford ; Tamar Christina > > > > ; Philipp Tomsich > > ; > > > > Manolis Tsamis > > > > Subject: [PATCH] aarch64: update ampere1 vectorization cost > > > > > > > > The original submission of AmpereOne (-mcpu=ampere1) costs occurred > > > > prior to exhaustive testing of vectorizable workloads against > > > > hardware. > > > > > > > > Adjust the vector costs to achieve the best results and more closely > > > > match the underlying hardware. > > > > > > > > gcc/ChangeLog: > > > > > > > > * config/aarch64/aarch64.cc: Update vector costs for ampere1. > > > > > > > > Co-Authored-By: Manolis Tsamis > > > > > > > > Signed-off-by: Philipp Tomsich > > > > --- > > > > We would like to get this into GCC 13 to avoid having to backport at > > > > the start of the next cycle. > > > > > > > > > > Given this affects only the ampere1 costs that sounds fine to me and > > > fairly > > low risk, you are being trusted that these costs are actually desirable and > > properly validated on the hardware involved. > > > > > > > OK for backports? > > > > > > This is ok for trunk (GCC 13). 
Do you also want to backport this to other > > branches? > > > > Ampere1 (with the older vector costs) are in GCC12 and GCC11. > > I would like to backport to those as well. > > Ok then, though you may want to run the benchmarks on the branches as well to > make sure the costs give the expected benefit there as well. > Thanks, > Kyrill > > > > > Thanks, > > Philipp. > > > > > Thanks, > > > Kyrill > > > > > > > > > > > gcc/config/aarch64/aarch64.cc | 12 ++-- > > > > 1 file changed, 6 insertions(+), 6 deletions(-) > > > > > > > > diff --git a/gcc/config/aarch64/aarch64.cc > > b/gcc/config/aarch64/aarch64.cc > > > > index b27f4354031..661fff65cea 100644 > > > > --- a/gcc/config/aarch64/aarch64.cc > > > > +++ b/gcc/config/aarch64/aarch64.cc > > > > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost > > > > thunderx3t110_vector_cost = > > > > > > > > static const advsimd_vec_cost ampere1_advsimd_vector_cost = > > > > { > > > > - 3, /* int_stmt_cost */ > > > > + 1, /* int_stmt_cost */ > > > >3, /* fp_stmt_cost */ > > > >0, /* ld2_st2_permute_cost */ > > > >0, /* ld3_st3_permute_cost */ > > > > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost > > > > ampere1_advsimd_vector_cost = > > > >8, /* store_elt_extra_cost */ > > > >6, /* vec_to_scalar_cost */ > > > >7, /* scalar_to_vec_cost */ > > > > - 5, /* align_load_cost */ > > > > - 5, /* unalign_load_cost */ > > > > - 2, /* unalign_store_cost */ > > > > - 2 /* store_cost */ > > > > + 4, /* align_load_cost */ > > > > + 4, /* unalign_load_cost */ > > > > + 1, /* unalign_store_cost */ > > > > + 1 /* store_cost */ > > > > }; > > > > > > > > /* Ampere-1 costs for vector insn classes. */ > > > > static const struct cpu_vector_cost ampere1_vector_cost = > > > > { > > > >1, /* scalar_int_stmt_cost */ > > > > - 1, /* scalar_fp_stmt_cost */ > > > > + 3, /* scalar_fp_stmt_cost */ > > > >4, /* scalar_load_cost */ > > > >1, /* scalar_store_cost */ > > > >1, /* cond_taken_branch_cost */ > > > > -- > > > > 2.34.1 > > >
Re: [PATCH (pushed)] param: document ranger-recompute-depth
Bah.. forgot that.. thanks :-) Andrew On 4/3/23 04:04, Martin Liška wrote: gcc/ChangeLog: * doc/invoke.texi: Document new param. --- gcc/doc/invoke.texi | 4 1 file changed, 4 insertions(+) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index def2df4584b..c9482886c5a 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -16170,6 +16170,10 @@ per supernode, before terminating analysis. Maximum depth of logical expression evaluation ranger will look through when evaluating outgoing edge ranges. +@item ranger-recompute-depth +Maximum depth of instruction chains to consider for recomputation +in the outgoing range calculator. + @item relation-block-limit Maximum number of relations the oracle will register in a basic block.
Re: [PATCH] sanitizer: missing signed integer overflow errors [PR109107]
On Tue, Mar 14, 2023 at 06:50:26PM -0400, Marek Polacek via Gcc-patches wrote: > Here we're failing to detect a signed overflow with -O because match.pd, > since r8-1516, transforms > > c = (a + 1) - (int) (short int) b; > > into > > c = (int) ((unsigned int) a + 4294946117); > > wrongly eliding the overflow. This kind of problem is usually > avoided by using TYPE_OVERFLOW_SANITIZED in the appropriate place. > The first match.pd hunk in the patch fixes it. I've constructed > a testcase for each of the surrounding cases as well. Then I > noticed that fold_binary_loc/associate has the same problem, so I've > added a TYPE_OVERFLOW_SANITIZED there as well (it may be too coarse, > sorry). Then I found yet another problem, but instead of fixing it > now I've opened 109134. I could probably go on and find a dozen more. > > Is this worth doing? > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? > > PR sanitizer/109107 > > gcc/ChangeLog: > > * fold-const.cc (fold_binary_loc): Use TYPE_OVERFLOW_SANITIZED > when associating. > * match.pd: Use TYPE_OVERFLOW_SANITIZED. > > gcc/testsuite/ChangeLog: > > * c-c++-common/ubsan/pr109107-2.c: New test. > * c-c++-common/ubsan/pr109107-3.c: New test. > * c-c++-common/ubsan/pr109107-4.c: New test. > * c-c++-common/ubsan/pr109107.c: New test. Please rename the last test to pr109107-1.c. Otherwise LGTM. Jakub
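A rough sketch of why the reassociation hides the overflow, written as two C helpers. The names signed_expr_overflows and unsigned_expr are hypothetical; __builtin_add_overflow and __builtin_sub_overflow are the GCC/Clang overflow-checking built-ins, used here only to make the signed overflow observable without invoking undefined behavior. This is not the ubsan instrumentation itself.

```c
#include <limits.h>   /* INT_MAX, for the usage shown below */
#include <stdbool.h>

/* Evaluate (a + 1) - (int) b step by step in signed arithmetic.
   Returns true if any intermediate step overflows; otherwise stores
   the value in *RESULT and returns false.  */
static bool
signed_expr_overflows (int a, short b, int *result)
{
  int tmp;
  if (__builtin_add_overflow (a, 1, &tmp))
    return true;
  return __builtin_sub_overflow (tmp, (int) b, result);
}

/* The same expression reassociated into unsigned arithmetic: it wraps
   modulo 2^N silently, so there is no overflow left to report.  */
static unsigned int
unsigned_expr (int a, short b)
{
  return (unsigned int) a + 1u - (unsigned int) (int) b;
}
```

With a = INT_MAX and b = 1, the signed form overflows at a + 1, while the unsigned form simply yields INT_MAX. That is the diagnostic lost when the fold happens before the sanitizer sees the expression.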
[og12] OpenACC: Pass pre-allocated 'ptrblock' to 'goacc_noncontig_array_create_ptrblock' [PR76739] (was: [PATCH, OpenACC, v3] Non-contiguous array support for OpenACC data clauses)
Hi! On 2019-11-26T22:49:21+0800, Chung-Lin Tang wrote: > this is a reorg of the last non-contiguous arrays patch. (Sorry, this is still not the master branch integration email...) Just a small clean-up, to simplify other changes that I'm working on: On 2019-11-26T22:49:21+0800, Chung-Lin Tang wrote: > --- libgomp/oacc-parallel.c (revision 278656) > +++ libgomp/oacc-parallel.c (working copy) > +void * > +goacc_noncontig_array_create_ptrblock (struct goacc_ncarray *nca, > +void *tgt_ptrblock_addr) > +{ > + [...] > + void *ptrblock = gomp_malloc (nca->ptrblock_size); > --- libgomp/target.c (revision 278656) > +++ libgomp/target.c (working copy) > @@ -1044,6 +1114,98 @@ gomp_map_vars_internal (struct gomp_device_descr * > + /* Now we have the target memory allocated, and target offsets of > all > + row blocks assigned and calculated, we can construct the > + accelerator side ptrblock and copy it in. */ > + if (nca->ptrblock_size) > + { > + void *ptrblock = goacc_noncontig_array_create_ptrblock > + (nca, target_ptrblock); > + gomp_copy_host2dev (devicep, aq, target_ptrblock, ptrblock, > + nca->ptrblock_size, cbufp); > + free (ptrblock); > + } Pushed to devel/omp/gcc-12 branch commit c58b28cb650995a41e1ab0166169799f3991bdd6 "OpenACC: Pass pre-allocated 'ptrblock' to 'goacc_noncontig_array_create_ptrblock' [PR76739]", see attached. Grüße Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 >From c58b28cb650995a41e1ab0166169799f3991bdd6 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Wed, 15 Mar 2023 14:32:12 +0100 Subject: [PATCH] OpenACC: Pass pre-allocated 'ptrblock' to 'goacc_noncontig_array_create_ptrblock' [PR76739] ... to simplify later changes. No functional change. 
Follow-up for og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f "Merge non-contiguous array support patches". PR other/76739 libgomp/ * target.c (gomp_map_vars_internal): Pass pre-allocated 'ptrblock' to 'goacc_noncontig_array_create_ptrblock'. * oacc-parallel.c (goacc_noncontig_array_create_ptrblock): Adjust. * oacc-int.h (goacc_noncontig_array_create_ptrblock): Adjust. --- libgomp/ChangeLog.omp | 6 ++ libgomp/oacc-int.h | 3 ++- libgomp/oacc-parallel.c | 5 ++--- libgomp/target.c| 5 +++-- 4 files changed, 13 insertions(+), 6 deletions(-) diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp index d8a7e476090..7afb5f43c04 100644 --- a/libgomp/ChangeLog.omp +++ b/libgomp/ChangeLog.omp @@ -1,5 +1,11 @@ 2023-04-03 Thomas Schwinge + PR other/76739 + * target.c (gomp_map_vars_internal): Pass pre-allocated 'ptrblock' + to 'goacc_noncontig_array_create_ptrblock'. + * oacc-parallel.c (goacc_noncontig_array_create_ptrblock): Adjust. + * oacc-int.h (goacc_noncontig_array_create_ptrblock): Adjust. + * libgomp.texi (AMD Radeon, nvptx): Document OpenMP 'pinned' memory. 
diff --git a/libgomp/oacc-int.h b/libgomp/oacc-int.h index d86aeb82dfa..28a6118873a 100644 --- a/libgomp/oacc-int.h +++ b/libgomp/oacc-int.h @@ -213,7 +213,8 @@ struct goacc_ncarray_info struct goacc_ncarray ncarray[]; }; -extern void *goacc_noncontig_array_create_ptrblock (struct goacc_ncarray *, void *); +extern void goacc_noncontig_array_create_ptrblock (struct goacc_ncarray *, + void *, void *); #ifdef HAVE_ATTRIBUTE_VISIBILITY diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c index 136702d6e61..8d1c2cce836 100644 --- a/libgomp/oacc-parallel.c +++ b/libgomp/oacc-parallel.c @@ -165,13 +165,13 @@ goacc_process_noncontiguous_arrays (size_t mapnum, void **hostaddrs, return nca_info; } -void * +void goacc_noncontig_array_create_ptrblock (struct goacc_ncarray *nca, + void *ptrblock, void *tgt_ptrblock_addr) { struct goacc_ncarray_descr_type *descr = nca->descr; void **tgt_data_rows = nca->tgt_data_rows; - void *ptrblock = gomp_malloc (nca->ptrblock_size); void **curr_dim_ptrblock = (void **) ptrblock; size_t n = 1; @@ -210,7 +210,6 @@ goacc_noncontig_array_create_ptrblock (struct goacc_ncarray *nca, curr_dim_ptrblock = next_dim_ptrblock; } assert (n == nca->data_row_num); - return ptrblock; } /* Handle the mapping pair that are presented when a diff --git a/libgomp/target.c b/libgomp/target.c index de3facb6428..b88b1ebaa13 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -1939,8 +1939,9 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep, accelerator side ptrblock and copy it in. */ if (nca->ptrblock_size) { - void *ptrblock = goacc_noncontig_array_create_ptrblock -
[PATCH] c++: satisfaction and ARGUMENT_PACK_SELECT [PR105644]
This testcase demonstrates we can legitimately enter satisfaction with an ARGUMENT_PACK_SELECT argument, which is problematic because we can't store such arguments in the satisfaction cache (or any other hash table). Since this appears to be possible only during constrained auto deduction for a return-type-requirement, the most appropriate spot to fix this seems to be from do_auto_deduction, by calling preserve_args to strip A_P_S args before entering satisfaction. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk/12? PR c++/105644 gcc/cp/ChangeLog: * pt.cc (do_auto_deduction): Call preserve_args before entering satisfaction for adc_requirement contexts. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-requires36.C: New test. --- gcc/cp/pt.cc | 6 ++ gcc/testsuite/g++.dg/cpp2a/concepts-requires36.C | 12 2 files changed, 18 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires36.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 4429ae66b68..821e0035c08 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -30965,6 +30965,12 @@ do_auto_deduction (tree type, tree init, tree auto_node, return type; } + /* We can see an ARGUMENT_PACK_SELECT argument when evaluating +a return-type-requirement. Get rid of them before entering +satisfaction, since the satisfaction cache can't handle them. 
*/ + if (context == adc_requirement) + outer_targs = preserve_args (outer_targs); + if (context == adc_return_type || context == adc_variable_type || context == adc_decomp_type) diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-requires36.C b/gcc/testsuite/g++.dg/cpp2a/concepts-requires36.C new file mode 100644 index 000..7d13b9b3e54 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-requires36.C @@ -0,0 +1,12 @@ +// PR c++/105644 +// { dg-do compile { target c++20 } } + +template +concept same_as = __is_same(T, U); + +template +concept C = (requires { { Ts() } -> same_as; } && ...); + +static_assert(C); +static_assert(!C); +static_assert(!C); -- 2.40.0.153.g6369acd968
[og12] '-foffload-memory=pinned' using offloading device interfaces (was: -foffload-memory=pinned)
Hi! On 2023-02-13T15:20:07+, Andrew Stubbs wrote: > On 13/02/2023 14:38, Thomas Schwinge wrote: >> On 2022-03-08T11:30:55+, Hafiz Abid Qadeer >> wrote: >>> From: Andrew Stubbs >>> >>> Add a new option. It will be used in follow-up patches. >> >>> --- a/gcc/doc/invoke.texi >>> +++ b/gcc/doc/invoke.texi >> >>> +@option{-foffload-memory=pinned} forces all host memory to be pinned (this >>> +mode may require the user to increase the ulimit setting for locked >>> memory). >> >> So, this is currently implemented via 'mlockall', which, as discussed, >> (a) has issues ('ulimit -l'), and (b) doesn't actually achieve what it is >> meant to achieve (because it doesn't register the page-locked memory with >> the GPU driver). >> [...] >> As '-foffload-memory=pinned', per the name >> of the option, concerns itself with memory used in offloading but not >> host execution generally, why are we actually attempting to "[force] all >> host memory to be pinned" -- why not just the memory that's being used >> with offloading? That is, if '-foffload-memory=pinned' is set, register >> as page-locked with the GPU driver all memory that appears in OMP >> offloading data regions, such as OpenMP 'target' 'map' clauses etc. That >> way, this is directed at the offloading data transfers, as intended, but >> at the same time we don't "waste" page-locked memory for generic host >> memory allocations. What do you think -- you, who've spent a lot more >> time on this topic than I have, so it's likely possible that I fail to >> realize some "details"? > > The main reason it is the way it is is because in general it's not > possible to know what memory is going to be offloaded at the time it is > allocated (and stack/static memory is never allocated that way). > > If there's a way to pin it after the fact then maybe that's not a > terrible idea? [...] 
I've now pushed to devel/omp/gcc-12 branch my take on this in commit 43095690ea519205bf56fc148b346edaa43e0f0f "'-foffload-memory=pinned' using offloading device interfaces", and for changes related to og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f "Merge non-contiguous array support patches": commit 694bbd399c1323975b4a6735646e46c6914de63d "'-foffload-memory=pinned' using offloading device interfaces for non-contiguous array support", see attached. Grüße Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 >From 43095690ea519205bf56fc148b346edaa43e0f0f Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Thu, 30 Mar 2023 10:08:12 +0200 Subject: [PATCH 1/2] '-foffload-memory=pinned' using offloading device interfaces Implemented for nvptx offloading via 'cuMemHostAlloc', 'cuMemHostRegister'. gcc/ * doc/invoke.texi (-foffload-memory=pinned): Document. include/ * cuda/cuda.h (CUresult): Add 'CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED'. (CUdevice_attribute): Add 'CU_DEVICE_ATTRIBUTE_READ_ONLY_HOST_REGISTER_SUPPORTED'. (CU_MEMHOSTREGISTER_READ_ONLY): Add. (cuMemHostGetFlags, cuMemHostRegister, cuMemHostUnregister): Add. libgomp/ * libgomp-plugin.h (GOMP_OFFLOAD_page_locked_host_free): Add 'struct goacc_asyncqueue *' formal parameter. (GOMP_OFFLOAD_page_locked_host_register) (GOMP_OFFLOAD_page_locked_host_unregister) (GOMP_OFFLOAD_page_locked_host_p): Add. * libgomp.h (always_pinned_mode) (gomp_page_locked_host_register_dev) (gomp_page_locked_host_unregister_dev): Add. (struct splay_tree_key_s): Add 'page_locked_host_p'. (struct gomp_device_descr): Add 'GOMP_OFFLOAD_page_locked_host_register', 'GOMP_OFFLOAD_page_locked_host_unregister', 'GOMP_OFFLOAD_page_locked_host_p'. * libgomp.texi (-foffload-memory=pinned): Document. 
* plugin/cuda-lib.def (cuMemHostGetFlags, cuMemHostRegister_v2) (cuMemHostRegister, cuMemHostUnregister): Add. * plugin/plugin-nvptx.c (struct ptx_device): Add 'read_only_host_register_supported'. (nvptx_open_device): Initialize it. (free_host_blocks, free_host_blocks_lock) (nvptx_run_deferred_page_locked_host_free) (nvptx_page_locked_host_free_callback, nvptx_page_locked_host_p) (GOMP_OFFLOAD_page_locked_host_register) (nvptx_page_locked_host_unregister_callback) (GOMP_OFFLOAD_page_locked_host_unregister) (GOMP_OFFLOAD_page_locked_host_p) (nvptx_run_deferred_page_locked_host_unregister) (nvptx_move_page_locked_host_unregister_blocks_aq1_aq2_callback): Add. (GOMP_OFFLOAD_fini_device, GOMP_OFFLOAD_page_locked_host_alloc) (GOMP_OFFLOAD_run): Call 'nvptx_run_deferred_page_locked_host_free'. (struct goacc_asyncqueue): Add 'page_locked_host_unregister_blocks_lock', 'page_locked_host_unregister_blocks'. (nvptx_goacc_asyncqueue_construct) (nvptx_goacc_asyncqueue_destruct): Handle those. (GOMP_OFFLOAD_page_locked_host_free): Handle 'struct goacc_asyncqueue *' formal parameter. (GOMP_OFFLOAD_openac
Re: [PATCH] c++: ICE on loopy var tmpl auto deduction [PR109300]
On Wed, 29 Mar 2023, Jason Merrill wrote: > On 3/28/23 13:37, Patrick Palka wrote: > > Now that we resolve non-dependent variable template-ids ahead of time, > > cp_finish_decl needs to handle a new invalid situation: we can end up > > trying to instantiate a variable template with deduced return type > > before we fully parsed (and attached) its initializer. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this OK for > > trunK? > > > > PR c++/109300 > > > > gcc/cp/ChangeLog: > > > > * decl.cc (cp_finish_decl): Diagnose ordinary auto deduction > > with no initializer instead of asserting. > > > > gcc/testsuite/ChangeLog: > > > > * g++.dg/cpp1y/var-templ79.C: New test. > > --- > > gcc/cp/decl.cc | 15 ++- > > gcc/testsuite/g++.dg/cpp1y/var-templ79.C | 5 + > > 2 files changed, 19 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ79.C > > > > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc > > index 20b980f68c8..2c91693b99d 100644 > > --- a/gcc/cp/decl.cc > > +++ b/gcc/cp/decl.cc > > @@ -8276,7 +8276,20 @@ cp_finish_decl (tree decl, tree init, bool > > init_const_expr_p, > > return; > > } > > - gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (auto_node)); > > + if (CLASS_PLACEHOLDER_TEMPLATE (auto_node)) > > + /* Class deduction with no initializer is OK. */; > > + else > > + { > > + /* Ordinary auto deduction without an initializer, a situation > > +which grokdeclarator already catches and rejects for the most > > +part. But we can still get here if we're instantiating a > > +variable template before we've fully parsed (and attached) > > its > > +initializer, e.g. template auto x = x; */ > > In the case of recursively dependent instantiation I'd hope to have an > error_mark_node initializer, rather than none? Do you mean setting the initializer to error_mark_node after the fact, e.g. 
@@ -8288,7 +8297,7 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p, error_at (DECL_SOURCE_LOCATION (decl), "declaration of %q#D has no initializer", decl); TREE_TYPE (decl) = error_mark_node; - return; + init = error_mark_node; } } d_init = init; or before the fact, i.e. setting DECL_INITIAL to error_mark_node as a sentinel value for detecting recursion before we begin parsing a variable initializer? The former should work I suppose, but the latter is problematic because we also call cp_finish_decl with init=error_mark_node when the initializer is generally invalid, so by overloading the meaning of error_mark_node here and checking for it from cp_finish_decl we would end up emitting a bogus extra diagnostic in a bunch of cases e.g. g++.dg/pr53055.C: int i = p ->* p ; // invalid initializer I guess we would need to use a different sentinel value for detecting recursion, or expose and inspect the 'lambda_scope' stack which already keeps track of whether we're in the middle of a variable initializer... Dunno if it's worth it just for sake of a better diagnostic for this corner case, I notice e.g. Clang doesn't give a great diagnostic either: src/gcc/testsuite/g++.dg/cpp1y/var-templ79.C:5:6: error: declaration of variable 'x' with deduced type 'auto' requires an initializer auto x = x; // { dg-error "" } ^ > > > + error_at (DECL_SOURCE_LOCATION (decl), > > + "declaration of %q#D has no initializer", decl); > > + TREE_TYPE (decl) = error_mark_node; > > + return; > > + } > > } > > d_init = init; > > if (d_init) > > diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ79.C > > b/gcc/testsuite/g++.dg/cpp1y/var-templ79.C > > new file mode 100644 > > index 000..3c0d276153a > > --- /dev/null > > +++ b/gcc/testsuite/g++.dg/cpp1y/var-templ79.C > > @@ -0,0 +1,5 @@ > > +// PR c++/109300 > > +// { dg-do compile { target c++14 } } > > + > > +template > > +auto x = x; // { dg-error "" } > >
Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector
On Mon, 13 Mar 2023 at 13:03, Richard Biener wrote: > > On Fri, 10 Mar 2023, Richard Sandiford wrote: > > > Sorry for the slow reply. > > > > Prathamesh Kulkarni writes: > > > Unfortunately it regresses code-gen for the following case: > > > > > > svint32_t f(int32x4_t x) > > > { > > > return svdupq_s32 (x[0], x[1], x[2], x[3]); > > > } > > > > > > -O2 code-gen with trunk: > > > f: > > > dup z0.q, z0.q[0] > > > ret > > > > > > -O2 code-gen with patch: > > > f: > > > dup s1, v0.s[1] > > > movv2.8b, v0.8b > > > ins v1.s[1], v0.s[3] > > > ins v2.s[1], v0.s[2] > > > zip1v0.4s, v2.4s, v1.4s > > > dup z0.q, z0.q[0] > > > ret > > > > > > IIUC, svdupq_impl::expand uses aarch64_expand_vector_init > > > to initialize the "base 128-bit vector" and then use dupq to replicate it. > > > > > > Without patch, aarch64_expand_vector_init generates fallback code, and > > > then > > > combine optimizes a sequence of vec_merge/vec_select pairs into an > > > assignment: > > > > > > (insn 7 3 8 2 (set (reg:SI 99) > > > (vec_select:SI (reg/v:V4SI 97 [ x ]) > > > (parallel [ > > > (const_int 1 [0x1]) > > > ]))) "bar.c":6:10 2592 {aarch64_get_lanev4si} > > > (nil)) > > > > > > (insn 13 9 15 2 (set (reg:V4SI 102) > > > (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 99)) > > > (reg/v:V4SI 97 [ x ]) > > > (const_int 2 [0x2]))) "bar.c":6:10 1794 > > > {aarch64_simd_vec_setv4si} > > > (expr_list:REG_DEAD (reg:SI 99) > > > (expr_list:REG_DEAD (reg/v:V4SI 97 [ x ]) > > > (nil > > > > > > into: > > > Trying 7 -> 13: > > > 7: r99:SI=vec_select(r97:V4SI,parallel) > > >13: r102:V4SI=vec_merge(vec_duplicate(r99:SI),r97:V4SI,0x2) > > > REG_DEAD r99:SI > > > REG_DEAD r97:V4SI > > > Successfully matched this instruction: > > > (set (reg:V4SI 102) > > > (reg/v:V4SI 97 [ x ])) > > > > > > which eventually results into: > > > (note 2 25 3 2 NOTE_INSN_DELETED) > > > (note 3 2 7 2 NOTE_INSN_FUNCTION_BEG) > > > (note 7 3 8 2 NOTE_INSN_DELETED) > > > (note 8 7 9 2 NOTE_INSN_DELETED) > > > (note 9 8 13 2 
NOTE_INSN_DELETED) > > > (note 13 9 15 2 NOTE_INSN_DELETED) > > > (note 15 13 17 2 NOTE_INSN_DELETED) > > > (note 17 15 18 2 NOTE_INSN_DELETED) > > > (note 18 17 22 2 NOTE_INSN_DELETED) > > > (insn 22 18 23 2 (parallel [ > > > (set (reg/i:VNx4SI 32 v0) > > > (vec_duplicate:VNx4SI (reg:V4SI 108))) > > > (clobber (scratch:VNx16BI)) > > > ]) "bar.c":7:1 5202 {aarch64_vec_duplicate_vqvnx4si_le} > > > (expr_list:REG_DEAD (reg:V4SI 108) > > > (nil))) > > > (insn 23 22 0 2 (use (reg/i:VNx4SI 32 v0)) "bar.c":7:1 -1 > > > (nil)) > > > > > > I was wondering if we should add the above special case, of assigning > > > target = vec in aarch64_expand_vector_init, if initializer is { > > > vec[0], vec[1], ... } ? > > > > I'm not sure it will be easy to detect that. Won't the inputs to > > aarch64_expand_vector_init just be plain registers? It's not a > > good idea in general to search for definitions of registers > > during expansion. > > > > It would be nice to fix this by lowering svdupq into: > > > > (a) a constructor for a 128-bit vector > > (b) a duplication of the 128-bit vector to fill an SVE vector > > > > But I'm not sure what the best way of doing (b) would be. > > In RTL we can use vec_duplicate, but I don't think gimple > > has an equivalent construct. Maybe Richi has some ideas. > > On GIMPLE it would be > > _1 = { a, ... }; // (a) > _2 = { _1, ... }; // (b) > > but I'm not sure if (b), a VL CTOR of fixed len(?) sub-vectors is > possible? But at least a CTOR of vectors is what we use to > concat vectors. > > With the recent relaxing of VEC_PERM inputs it's also possible to > express (b) with a VEC_PERM: > > _2 = VEC_PERM <_1, _1, { 0, 1, 2, 3, 0, 1, 2, 3, ... }> > > but again I'm not sure if that repeating 0, 1, 2, 3 is expressible > for VL vectors (maybe we'd allow "wrapping" here, I'm not sure). > Hi, Thanks for the suggestions and sorry for late response in turn. 
The attached patch tries to fix the issue by explicitly constructing a CTOR from svdupq's arguments and then using VEC_PERM_EXPR with VL mask having encoded elements {0, 1, ... nargs-1}, npatterns == nargs, and nelts_per_pattern == 1, to replicate the base vector. So for example, for the above case, svint32_t f_32(int32x4_t x) { return svdupq_s32 (x[0], x[1], x[2], x[3]); } forwprop1 lowers it to: svint32_t _6; vector(4) int _8; : _1 = BIT_FIELD_REF ; _2 = BIT_FIELD_REF ; _3 = BIT_FIELD_REF ; _4 = BIT_FIELD_REF ; _8 = {_1, _2, _3, _4}; _6 = VEC_PERM_EXPR <_8, _8, { 0, 1, 2, 3, ... }>; return _6; which is then eventually optimized to: svint32_t _6; [local count: 1073741824]: _6 = VEC_PERM_EXPR ; return _6; code-gen: f_32: dup z0.q,
Re: [aarch64] Code-gen for vector initialization involving constants
On Mon, 13 Feb 2023 at 11:58, Prathamesh Kulkarni wrote:
>
> On Fri, 3 Feb 2023 at 12:46, Prathamesh Kulkarni wrote:
> >
> > Hi Richard,
> > While digging thru aarch64_expand_vector_init, I noticed it gives
> > priority to loading a constant first:
> >  /* Initialise a vector which is part-variable.  We want to first try
> >     to build those lanes which are constant in the most efficient way we
> >     can.  */
> >
> > which results in suboptimal code-gen for following case:
> > int16x8_t f_s16(int16_t x)
> > {
> >   return (int16x8_t) { x, x, x, x, x, x, x, 1 };
> > }
> >
> > code-gen trunk:
> > f_s16:
> >         movi    v0.8h, 0x1
> >         ins     v0.h[0], w0
> >         ins     v0.h[1], w0
> >         ins     v0.h[2], w0
> >         ins     v0.h[3], w0
> >         ins     v0.h[4], w0
> >         ins     v0.h[5], w0
> >         ins     v0.h[6], w0
> >         ret
> >
> > The attached patch tweaks the following condition:
> > if (n_var == n_elts && n_elts <= 16)
> >   {
> >     ...
> >   }
> >
> > to pass if maxv >= 80% of n_elts, with 80% being an
> > arbitrary "high enough" threshold. The intent is to dup
> > the most repeating variable if its repetition
> > is "high enough" and insert constants which should be "better" than
> > loading constant first and inserting variables like in the above case.
> >
> > Alternatively, I suppose we can remove threshold and for constants,
> > generate both sequences and check which one is more
> > efficient ?
> >
> > code-gen with patch:
> > f_s16:
> >         dup     v0.8h, w0
> >         movi    v1.4h, 0x1
> >         ins     v0.h[7], v1.h[0]
> >         ret
> >
> > The patch is lightly tested to verify that vec[t]-init-*.c tests pass
> > with bootstrap+test
> > in progress.
> > Does this look OK ?
> Hi Richard,
> ping https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611243.html
Hi Richard,
ping * 2: https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611243.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
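The thresholding idea described in the quoted mail can be sketched in plain C. This is only an illustration under assumptions: `most_repeated_count` and `prefer_dup_first` are hypothetical helpers, lanes are modelled as small integer ids, and the 80% cut-off is the arbitrary value mentioned in the mail. None of it is taken from the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each lane holds an id, and equal ids mean
   "same value".  Return how often the most frequent id occurs.  */
static size_t
most_repeated_count (const int *lanes, size_t n_elts)
{
  assert (lanes != NULL || n_elts == 0);

  size_t maxv = 0;
  for (size_t i = 0; i < n_elts; i++)
    {
      size_t count = 0;
      for (size_t j = 0; j < n_elts; j++)
        if (lanes[j] == lanes[i])
          count++;
      if (count > maxv)
        maxv = count;
    }
  return maxv;
}

/* Assumed heuristic: dup the dominant element first when it fills at
   least 80% of the lanes.  maxv * 5 >= n_elts * 4 is the same test in
   integer arithmetic, which avoids floating-point rounding.  */
static bool
prefer_dup_first (const int *lanes, size_t n_elts)
{
  return n_elts != 0
         && most_repeated_count (lanes, n_elts) * 5 >= n_elts * 4;
}
```

For the f_s16 case above, seven of eight lanes hold x, which is 87.5% and so above the assumed threshold. A vector that is half x and half constant would stay on the existing path.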
[PATCH] Less warnings for parameters declared as arrays [PR98541, PR98536]
With the relatively new warnings (11..) affecting VLA bounds, I now get a lot of false positives with -Wall. In general, I find the new warnings very useful, but they seem a bit too aggressive and some minor tweaks are needed, otherwise they are too noisy. This patch suggests two changes:

1. For VLA bounds, non-null is implied only when 'static' is used (similar to clang) and not already when a bound > 0 is specified:

int foo(int n, char buf[static n]);
foo(10, 0); // warning with 'static' but not without.

(It also seems problematic to require a size of 0 to indicate that the pointer may be null, because 0 is not allowed in ISO C as a size. It is also inconsistent with how arrays with static bound behave.) There seems to be agreement about this change in PR98541.

2. GCC always warns when the number of unspecified bounds is different between two declarations:

int foo(int n, char buf[*]);
int foo(int n, char buf[n]);

or

int foo(int n, char buf[n]);
int foo(int n, char buf[*]);

But the first version is useful if the size expression can not be specified in a header (e.g. because it uses a macro or variable not available there) and there is currently no easy way to avoid this. The warning for both cases was by design, but I suggest limiting the warning to the second case.

Note that the logic currently applied by GCC is too simplistic anyway, as GCC does not warn for

int foo(int x, int y, double m[*][y]);
int foo(int x, int y, double m[x][*]);

because the number of specified / unspecified bounds is the same. So I suggest going with the attached patch now and adding more precise warnings later if there is more experience with these warnings in general and if this then still seems desirable.

Martin

Less warnings for parameters declared as arrays [PR98541, PR98536]

To avoid false positives, tune the warnings for parameters declared as arrays with size expressions. Only warn about null arguments with 'static'.
Also do not warn when more bounds are specified in the new declaration than before. PR c/98541 PR c/98536 c-family/ * c-warn.cc (warn_parm_array_mismatch): Do not warn if more bounds are specified. gcc/ * gimple-ssa-warn-access.cc (pass_waccess::maybe_check_access_sizes): For VLA bounds in parameters, only warn about null pointers with 'static'. gcc/testsuite: * gcc.dg/Wnonnull-4: Adapt test. * gcc.dg/Wstringop-overflow-40.c: Adapt test. * gcc.dg/Wvla-parameter-4.c: Adapt test. * gcc.dg/attr-access-2.c: Adapt test. diff --git a/gcc/c-family/c-warn.cc b/gcc/c-family/c-warn.cc index 9ac43a1af6e..f79fb876142 100644 --- a/gcc/c-family/c-warn.cc +++ b/gcc/c-family/c-warn.cc @@ -3599,23 +3599,13 @@ warn_parm_array_mismatch (location_t origloc, tree fndecl, tree newparms) continue; } - if (newunspec != curunspec) + if (newunspec > curunspec) { location_t warnloc = newloc, noteloc = origloc; const char *warnparmstr = newparmstr.c_str (); const char *noteparmstr = curparmstr.c_str (); unsigned warnunspec = newunspec, noteunspec = curunspec; - if (newunspec < curunspec) - { - /* If the new declaration has fewer unspecified bounds -point the warning to the previous declaration to make -it clear that that's the one to change. Otherwise, -point it to the new decl. */ - std::swap (warnloc, noteloc); - std::swap (warnparmstr, noteparmstr); - std::swap (warnunspec, noteunspec); - } if (warning_n (warnloc, OPT_Wvla_parameter, warnunspec, "argument %u of type %s declared with " "%u unspecified variable bound", @@ -3641,16 +3631,11 @@ warn_parm_array_mismatch (location_t origloc, tree fndecl, tree newparms) continue; } } - /* Iterate over the lists of VLA variable bounds, comparing each -pair for equality, and diagnosing mismatches. The case of -the lists having different lengths is handled above so at -this point they do . */ - for (tree newvbl = newa->size, curvbl = cura->size; newvbl; +pair for equality, and diagnosing mismatches. 
*/ + for (tree newvbl = newa->size, curvbl = cura->size; newvbl && curvbl; newvbl = TREE_CHAIN (newvbl), curvbl = TREE_CHAIN (curvbl)) { - gcc_assert (curvbl); - tree newpos = TREE_PURPOSE (newvbl); tree curpos = TREE_PURPOSE (curvbl); @@ -3663,7 +3648,6 @@ warn_parm_array_mismatch (location_t origloc, tree fndecl, tree newparms) and both are the same expression
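The declaration patterns discussed in the cover letter can be written out as a small compilable C file. This is a sketch only: `sum_bytes` and `sum_bytes_static` are made-up names, and which diagnostics (if any) a given GCC version emits for these declarations under -Wall is deliberately not stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Header-style prototype: the bound is left unspecified with [*],
   e.g. because the size expression is not available in the header.  */
int sum_bytes (int n, const unsigned char buf[*]);

/* Later declaration (here the definition) with the bound specified.
   Without 'static' the parameter is just a pointer and may be null,
   so the function checks for that itself.  */
int
sum_bytes (int n, const unsigned char buf[n])
{
  assert (n >= 0);
  if (buf == NULL)
    return 0;

  int total = 0;
  for (int i = 0; i < n; i++)
    total += buf[i];
  return total;
}

/* With 'static n' the caller promises at least n accessible elements,
   which rules out a null pointer when n > 0.  */
int
sum_bytes_static (int n, const unsigned char buf[static n])
{
  assert (n >= 0);

  int total = 0;
  for (int i = 0; i < n; i++)
    total += buf[i];
  return total;
}
```

The first pair is the "[*] first, [n] later" ordering that the cover letter argues should stay quiet, and the second function shows the 'static' form for which a null argument is a genuine contract violation.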
[PATCH] Fortran: reject module variable as character length in PARAMETER [PR104349]
Dear all, the attached patch fixes an ICE-on-invalid for a PARAMETER expression where the character length was a MODULE variable. The ICE seemed strange, as we were catching related erroneous code for declarations in programs or subroutines. Removing a seemingly bogus check of restricted expressions is the simplest way to fix this. (We could also catch this differently in decl.cc). Besides, this also fixes an accepts-invalid, see testcase. :-) Regtested on x86_64-pc-linux-gnu. OK for mainline (13) or rather wait? Thanks, Harald From 37136ce94b44149dd013b3d7fed7adba769241e6 Mon Sep 17 00:00:00 2001 From: Harald Anlauf Date: Mon, 3 Apr 2023 21:34:01 +0200 Subject: [PATCH] Fortran: reject module variable as character length in PARAMETER [PR104349] gcc/fortran/ChangeLog: PR fortran/104349 * expr.cc (check_restricted): Adjust check for valid variables in restricted expressions: make no exception for module variables. gcc/testsuite/ChangeLog: PR fortran/104349 * gfortran.dg/der_charlen_1.f90: Adjust dg-patterns. * gfortran.dg/pr104349.f90: New test. 
--- gcc/fortran/expr.cc | 2 -- gcc/testsuite/gfortran.dg/der_charlen_1.f90 | 2 ++ gcc/testsuite/gfortran.dg/pr104349.f90 | 8 3 files changed, 10 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/pr104349.f90 diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc index 7fb33f81788..02028f993fd 100644 --- a/gcc/fortran/expr.cc +++ b/gcc/fortran/expr.cc @@ -3504,8 +3504,6 @@ check_restricted (gfc_expr *e) || sym->attr.implied_index || sym->attr.flavor == FL_PARAMETER || is_parent_of_current_ns (sym->ns) - || (sym->ns->proc_name != NULL - && sym->ns->proc_name->attr.flavor == FL_MODULE) || (gfc_is_formal_arg () && (sym->ns == gfc_current_ns))) { t = true; diff --git a/gcc/testsuite/gfortran.dg/der_charlen_1.f90 b/gcc/testsuite/gfortran.dg/der_charlen_1.f90 index 9f394c73f25..1246522d516 100644 --- a/gcc/testsuite/gfortran.dg/der_charlen_1.f90 +++ b/gcc/testsuite/gfortran.dg/der_charlen_1.f90 @@ -22,3 +22,5 @@ CONTAINS type(T), intent(in) :: X end subroutine end module another_core + +! { dg-prune-output "cannot appear in the expression" } diff --git a/gcc/testsuite/gfortran.dg/pr104349.f90 b/gcc/testsuite/gfortran.dg/pr104349.f90 new file mode 100644 index 000..2bea4a37214 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr104349.f90 @@ -0,0 +1,8 @@ +! { dg-do compile } +! PR fortran/104349 - reject module variable as character length in PARAMETER +! Contributed by G.Steinmetz + +module m + character(n), parameter :: a(1) = 'b' ! { dg-error "cannot appear" } + character(n), parameter :: c= 'b' ! { dg-error "cannot appear" } +end -- 2.35.3
Re: [PATCH 3/3] Fortran: Fix mpz and mpfr memory leaks
Hi Bernhard, there is neither context nor a related PR with a testcase showing that this patch fixes issues seen there. On 4/2/23 17:05, Bernhard Reutner-Fischer via Gcc-patches wrote: From: Bernhard Reutner-Fischer Cc: fort...@gcc.gnu.org gcc/fortran/ChangeLog: * array.cc (gfc_ref_dimen_size): Free mpz memory before ICEing. * expr.cc (find_array_section): Fix mpz memory leak. * simplify.cc (gfc_simplify_reshape): Fix mpz memory leaks in error paths. (gfc_simplify_set_exponent): Fix mpfr memory leak. --- gcc/fortran/array.cc| 3 +++ gcc/fortran/expr.cc | 8 gcc/fortran/simplify.cc | 7 ++- 3 files changed, 13 insertions(+), 5 deletions(-) diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc index be5eb8b6a0f..8b1e816a859 100644 --- a/gcc/fortran/array.cc +++ b/gcc/fortran/array.cc @@ -2541,6 +2541,9 @@ gfc_ref_dimen_size (gfc_array_ref *ar, int dimen, mpz_t *result, mpz_t *end) return t; default: + mpz_clear (lower); + mpz_clear (stride); + mpz_clear (upper); gfc_internal_error ("gfc_ref_dimen_size(): Bad dimen_type"); } What is the point of clearing variables before issuing a gfc_internal_error? diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc index 7fb33f81788..b4736804eda 100644 --- a/gcc/fortran/expr.cc +++ b/gcc/fortran/expr.cc @@ -1539,6 +1539,7 @@ find_array_section (gfc_expr *expr, gfc_ref *ref) mpz_init_set_ui (delta_mpz, one); mpz_init_set_ui (nelts, one); mpz_init (tmp_mpz); + mpz_init (ptr); /* Do the initialization now, so that we can cleanup without keeping track of where we were. */ @@ -1682,7 +1683,6 @@ find_array_section (gfc_expr *expr, gfc_ref *ref) mpz_mul (delta_mpz, delta_mpz, tmp_mpz); } - mpz_init (ptr); cons = gfc_constructor_first (base); /* Now clock through the array reference, calculating the index in @@ -1735,7 +1735,8 @@ find_array_section (gfc_expr *expr, gfc_ref *ref) "at %L requires an increase of the allowed %d " "upper limit. 
See %<-fmax-array-constructor%> " "option", &expr->where, flag_max_array_constructor); - return false; + t = false; + goto cleanup; } cons = gfc_constructor_lookup (base, limit); @@ -1750,8 +1751,6 @@ find_array_section (gfc_expr *expr, gfc_ref *ref) gfc_copy_expr (cons->expr), NULL); } - mpz_clear (ptr); - cleanup: mpz_clear (delta_mpz); @@ -1765,6 +1764,7 @@ cleanup: mpz_clear (ctr[d]); mpz_clear (stride[d]); } + mpz_clear (ptr); gfc_constructor_free (base); return t; } diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc index ecf0e3558df..d1f06335e79 100644 --- a/gcc/fortran/simplify.cc +++ b/gcc/fortran/simplify.cc @@ -6866,6 +6866,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp, gfc_error ("The SHAPE array for the RESHAPE intrinsic at %L has a " "negative value %d for dimension %d", &shape_exp->where, shape[rank], rank+1); + mpz_clear (index); return &gfc_bad_expr; } @@ -6889,6 +6890,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp, { gfc_error ("Shapes of ORDER at %L and SHAPE at %L are different", &order_exp->where, &shape_exp->where); + mpz_clear (index); return &gfc_bad_expr; } @@ -6902,6 +6904,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp, { gfc_error ("Sizes of ORDER at %L and SHAPE at %L are different", &order_exp->where, &shape_exp->where); + mpz_clear (index); return &gfc_bad_expr; } @@ -6918,6 +6921,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp, "in the range [1, ..., %d] for the RESHAPE intrinsic " "near %L", order[i], &order_exp->where, rank, &shape_exp->where); + mpz_clear (index); return &gfc_bad_expr; } @@ -6926,6 +6930,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp, { gfc_error ("ORDER at %L is not a permutation of the size of " "SHAPE at %L", &order_exp->where, &shape_exp->where); + mpz_clear (index); return &gfc_bad_expr; } x[order[i]] = 1; @@ -7408,7 +7413,7 @@ gfc_simplify_set_exponent (gfc_expr *x, gfc_expr *i) exp2 = (unsigned long) 
mpz_get_d (i->value.integer); mpfr_mul_2exp (result->value.real, frac, exp2, GFC_RND_MODE); - mpfr_clears (absv, log2, pow2, frac, NULL); + mpfr_clears (exp, absv, log2, pow2, frac, NULL); return range_check (result, "SET_EXPONENT"); }
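The shape of the find_array_section change (initialise everything up front, then leave only through a single cleanup label) can be sketched without GMP. The code below is an assumption-laden stand-in: `buf_init`, `buf_clear`, `live_bufs` and `sum_below_limit` are hypothetical, and plain heap ints take the place of mpz_t values so that leaks can be counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Number of stand-in resources currently allocated.  */
static int live_bufs;

/* Stand-in for mpz_init: allocate one zeroed int and count it.  */
static int *
buf_init (void)
{
  int *p = calloc (1, sizeof *p);
  assert (p != NULL);
  live_bufs++;
  return p;
}

/* Stand-in for mpz_clear: release the resource and uncount it.  */
static void
buf_clear (int *p)
{
  free (p);
  live_bufs--;
}

/* Sum VALS[0..N-1]; fail as soon as the running sum exceeds LIMIT.
   Every exit goes through 'cleanup', so the early failure cannot
   skip a release.  */
static bool
sum_below_limit (const int *vals, int n, int limit)
{
  bool ok = true;
  int *acc = buf_init ();
  int *tmp = buf_init ();

  for (int i = 0; i < n; i++)
    {
      *tmp = vals[i];
      *acc += *tmp;
      if (*acc > limit)
        {
          /* A bare 'return false' here would leak acc and tmp.  */
          ok = false;
          goto cleanup;
        }
    }

cleanup:
  buf_clear (tmp);
  buf_clear (acc);
  return ok;
}
```

After either outcome `live_bufs` is back to zero, which is the property the patch aims for on the error paths.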
[PATCH] range-op-float: Fix reverse ops of comparisons [PR109386]
Hi!

I've missed that one of my recent range-op-float.cc changes (likely the r13-6967 one) caused
FAIL: libphobos.phobos/std/math/algebraic.d execution test
FAIL: libphobos.phobos_shared/std/math/algebraic.d execution test
regressions, distilled into a C testcase below.

In the testcase, we have a !(u >= v) condition where both u and v are results of fabs*, which guards the t1 = u u<= __FLT_MAX__; and t2 = v u<= __FLT_MAX__; comparisons. From false u >= v where u and v have [0.0, +Inf] NAN ranges we (incorrectly) deduce that one of them is [nextafterf (0.0, 1.0), +Inf] NAN and the other is [0.0, nextafterf (+Inf, 0.0)] NAN, and from that deduce that one of the comparisons is always true, because UNLE_EXPR with the maximum representable number is false only if the value is +Inf and our ranges say that is not possible.

The bug is that the u >= v comparison determines a sensible range only when it is true: we then know neither operand can be NAN and it behaves correctly. But when the comparison is false, our current code gives sensible answers only if the other op can't be NAN. If it can be NAN, then whenever it is NAN the comparison is false regardless of the other value, so the other value needs to be VARYING. Now, foperator_unordered_lt::op1_range etc. had code to deal with that for op?.known_isnan (), but as the testcase shows, it is enough that it may be a NAN at runtime to make it VARYING.

So, the following patch changes, for all those BRS_FALSE cases of the normal non-equality comparisons,
if (opOTHER.known_isnan ()) r.set_varying (type);
to do it also if maybe_isnan ().

For the unordered or ... comparisons, it is similar for BRS_TRUE. Those comparisons are true whenever either operand is NAN or, if neither is NAN, whenever the corresponding normal comparison is true. So, if those comparisons are true and the other operand might be a NAN, we can't tell (VARYING); if they are false, the current handling is correct.
Bootstrapped/regtested on x86_64-linux and i686-linux, fixes those 2 D testcases and the newly added one. Ok for trunk? 2023-04-03 Jakub Jelinek PR tree-optimization/109386 * range-op-float.cc (foperator_lt::op1_range, foperator_lt::op2_range, foperator_le::op1_range, foperator_le::op2_range, foperator_gt::op1_range, foperator_gt::op2_range, foperator_ge::op1_range, foperator_ge::op2_range): Make r varying for BRS_FALSE case even if the other op is maybe_isnan, not just known_isnan. (foperator_unordered_lt::op1_range, foperator_unordered_lt::op2_range, foperator_unordered_le::op1_range, foperator_unordered_le::op2_range, foperator_unordered_gt::op1_range, foperator_unordered_gt::op2_range, foperator_unordered_ge::op1_range, foperator_unordered_ge::op2_range): Make r varying for BRS_TRUE case even if the other op is maybe_isnan, not just known_isnan. * gcc.c-torture/execute/ieee/pr109386.c: New test. --- gcc/range-op-float.cc.jj2023-04-03 10:42:54.0 +0200 +++ gcc/range-op-float.cc 2023-04-03 13:31:01.163216123 +0200 @@ -889,7 +889,7 @@ foperator_lt::op1_range (frange &r, case BRS_FALSE: // On the FALSE side of x < NAN, we know nothing about x. - if (op2.known_isnan ()) + if (op2.known_isnan () || op2.maybe_isnan ()) r.set_varying (type); else build_ge (r, type, op2); @@ -926,7 +926,7 @@ foperator_lt::op2_range (frange &r, case BRS_FALSE: // On the FALSE side of NAN < x, we know nothing about x. - if (op1.known_isnan ()) + if (op1.known_isnan () || op1.maybe_isnan ()) r.set_varying (type); else build_le (r, type, op1); @@ -1005,7 +1005,7 @@ foperator_le::op1_range (frange &r, case BRS_FALSE: // On the FALSE side of x <= NAN, we know nothing about x. - if (op2.known_isnan ()) + if (op2.known_isnan () || op2.maybe_isnan ()) r.set_varying (type); else build_gt (r, type, op2); @@ -1038,7 +1038,7 @@ foperator_le::op2_range (frange &r, case BRS_FALSE: // On the FALSE side of NAN <= x, we know nothing about x. 
- if (op1.known_isnan ()) + if (op1.known_isnan () || op1.maybe_isnan ()) r.set_varying (type); else if (op1.undefined_p ()) return false; @@ -1122,7 +1122,7 @@ foperator_gt::op1_range (frange &r, case BRS_FALSE: // On the FALSE side of x > NAN, we know nothing about x. - if (op2.known_isnan ()) + if (op2.known_isnan () || op2.maybe_isnan ()) r.set_varying (type); else if (op2.undefined_p ()) return false; @@ -1161,7 +1161,7 @@ foperator_gt::op2_range (frange &r, case BRS_FALSE: // On The FALSE side of NAN > x, we know nothing about x. - if (op1.known_isnan ()) + if (op1.known_isnan () || op1.maybe_isnan ()) r.set_varying (type); else if (op1.undefined_p ()) return false; @@ -1241,7 +1241,7 @@ foperator_ge::op1
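The NaN reasoning in the description above can be checked with a few lines of C. A minimal sketch, assuming IEEE semantics (so no -ffast-math); `ordered_ge`, `unordered_le` and `demo` are illustrative names and not GCC internals.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Ordered comparison: false whenever either operand is a NaN.  */
static bool
ordered_ge (float u, float v)
{
  return u >= v;
}

/* Unordered-or-less-equal, in the spirit of UNLE_EXPR: true when either
   operand is a NaN, otherwise the plain <= result.  */
static bool
unordered_le (float u, float v)
{
  return isunordered (u, v) || u <= v;
}

static void
demo (void)
{
  /* v is a NaN, so 'u >= v' is false for every u, including +Inf.  */
  assert (!ordered_ge (0.0f, NAN));
  assert (!ordered_ge (INFINITY, NAN));

  /* Hence the false edge of 'u >= v' cannot exclude u == +Inf, and
     'u u<= FLT_MAX' is not always true on that edge.  */
  assert (!unordered_le (INFINITY, FLT_MAX));

  /* The unordered comparison holds for a NaN and for finite values.  */
  assert (unordered_le (NAN, FLT_MAX));
  assert (unordered_le (1.0f, FLT_MAX));
}
```

In range terms: once the other operand may be a NaN, a false `u >= v` tells nothing about `u`, which is why the patch falls back to a varying range in that case.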
Re: [PATCH] c++: ICE on loopy var tmpl auto deduction [PR109300]
On 4/3/23 12:28, Patrick Palka wrote: On Wed, 29 Mar 2023, Jason Merrill wrote: On 3/28/23 13:37, Patrick Palka wrote: Now that we resolve non-dependent variable template-ids ahead of time, cp_finish_decl needs to handle a new invalid situation: we can end up trying to instantiate a variable template with deduced return type before we fully parsed (and attached) its initializer. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this OK for trunK? PR c++/109300 gcc/cp/ChangeLog: * decl.cc (cp_finish_decl): Diagnose ordinary auto deduction with no initializer instead of asserting. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/var-templ79.C: New test. --- gcc/cp/decl.cc | 15 ++- gcc/testsuite/g++.dg/cpp1y/var-templ79.C | 5 + 2 files changed, 19 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ79.C diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index 20b980f68c8..2c91693b99d 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -8276,7 +8276,20 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p, return; } - gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (auto_node)); + if (CLASS_PLACEHOLDER_TEMPLATE (auto_node)) + /* Class deduction with no initializer is OK. */; + else + { + /* Ordinary auto deduction without an initializer, a situation +which grokdeclarator already catches and rejects for the most +part. But we can still get here if we're instantiating a +variable template before we've fully parsed (and attached) its +initializer, e.g. template auto x = x; */ In the case of recursively dependent instantiation I'd hope to have an error_mark_node initializer, rather than none? Do you mean setting the initializer to error_mark_node after the fact, e.g. @@ -8288,7 +8297,7 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p, error_at (DECL_SOURCE_LOCATION (decl), "declaration of %q#D has no initializer", decl); TREE_TYPE (decl) = error_mark_node; - return; + init = error_mark_node; } } d_init = init; or before the fact, i.e. 
setting DECL_INITIAL to error_mark_node as a sentinel value for detecting recursion before we begin parsing a variable initializer? The former should work I suppose, but the latter is problematic because we also call cp_finish_decl with init=error_mark_node when the initializer is generally invalid, so by overloading the meaning of error_mark_node here and checking for it from cp_finish_decl we would end up emitting a bogus extra diagnostic in a bunch of cases e.g. g++.dg/pr53055.C: int i = p ->* p ; // invalid initializer I guess we would need to use a different sentinel value for detecting recursion, or expose and inspect the 'lambda_scope' stack which already keeps track of whether we're in the middle of a variable initializer... Dunno if it's worth it just for sake of a better diagnostic for this corner case, I notice e.g. Clang doesn't give a great diagnostic either: src/gcc/testsuite/g++.dg/cpp1y/var-templ79.C:5:6: error: declaration of variable 'x' with deduced type 'auto' requires an initializer auto x = x; // { dg-error "" } ^ Yeah, let's just go with your patch, thanks. + error_at (DECL_SOURCE_LOCATION (decl), + "declaration of %q#D has no initializer", decl); + TREE_TYPE (decl) = error_mark_node; + return; + } } d_init = init; if (d_init) diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ79.C b/gcc/testsuite/g++.dg/cpp1y/var-templ79.C new file mode 100644 index 000..3c0d276153a --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/var-templ79.C @@ -0,0 +1,5 @@ +// PR c++/109300 +// { dg-do compile { target c++14 } } + +template +auto x = x; // { dg-error "" }
Re: [PATCH] c++: satisfaction and ARGUMENT_PACK_SELECT [PR105644]
On 4/3/23 10:49, Patrick Palka wrote:
> This testcase demonstrates we can legitimately enter satisfaction with
> an ARGUMENT_PACK_SELECT argument, which is problematic because we can't
> store such arguments in the satisfaction cache (or any other hash
> table).
>
> Since this appears to be possible only during constrained auto deduction
> for a return-type-requirement, the most appropriate spot to fix this
> seems to be from do_auto_deduction, by calling preserve_args to strip
> A_P_S args before entering satisfaction.
>
> +++ b/gcc/cp/pt.cc
> @@ -30965,6 +30965,12 @@ do_auto_deduction (tree type, tree init, tree auto_node,
>        return type;
>      }
> +  /* We can see an ARGUMENT_PACK_SELECT argument when evaluating
> +     a return-type-requirement.  Get rid of them before entering
> +     satisfaction, since the satisfaction cache can't handle them.  */
> +  if (context == adc_requirement)
> +    outer_targs = preserve_args (outer_targs);

I'd like to get do_auto_deduction out of the business of handling return-type-requirements, since there is no longer any actual deduction involved (as there was in the TS). So I'd prefer not to add any more tweaks there.

Maybe this should happen higher up, in tsubst_requires_expr? Maybe just before the call to add_extra_args?

Jason
Re: [PATCH 3/3] Fortran: Fix mpz and mpfr memory leaks
On 3 April 2023 21:50:49 CEST, Harald Anlauf wrote:
>Hi Bernhard,
>
>there is neither context nor a related PR with a testcase showing
>that this patch fixes issues seen there.

Yes, I forgot to mention the PR:

PR fortran/68800

I did not construct individual test cases, but it should be obvious that we should not leak these.

>
>On 4/2/23 17:05, Bernhard Reutner-Fischer via Gcc-patches wrote:
>> From: Bernhard Reutner-Fischer
>>
>> Cc: fort...@gcc.gnu.org
>>
>> gcc/fortran/ChangeLog:
>>
>> * array.cc (gfc_ref_dimen_size): Free mpz memory before ICEing.
>> * expr.cc (find_array_section): Fix mpz memory leak.
>> * simplify.cc (gfc_simplify_reshape): Fix mpz memory leaks in
>> error paths.
>> (gfc_simplify_set_exponent): Fix mpfr memory leak.
>> ---
>> gcc/fortran/array.cc| 3 +++
>> gcc/fortran/expr.cc | 8
>> gcc/fortran/simplify.cc | 7 ++-
>> 3 files changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc
>> index be5eb8b6a0f..8b1e816a859 100644
>> --- a/gcc/fortran/array.cc
>> +++ b/gcc/fortran/array.cc
>> @@ -2541,6 +2541,9 @@ gfc_ref_dimen_size (gfc_array_ref *ar, int dimen, mpz_t *result, mpz_t *end)
>>        return t;
>>
>>      default:
>> +      mpz_clear (lower);
>> +      mpz_clear (stride);
>> +      mpz_clear (upper);
>>        gfc_internal_error ("gfc_ref_dimen_size(): Bad dimen_type");
>>      }
>
>What is the point of clearing variables before issuing a gfc_internal_error?

To make it obvious that we are aware that we allocated these.

thanks,

>
>> diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
>> index 7fb33f81788..b4736804eda 100644
>> --- a/gcc/fortran/expr.cc
>> +++ b/gcc/fortran/expr.cc
>> @@ -1539,6 +1539,7 @@ find_array_section (gfc_expr *expr, gfc_ref *ref)
>>    mpz_init_set_ui (delta_mpz, one);
>>    mpz_init_set_ui (nelts, one);
>>    mpz_init (tmp_mpz);
>> +  mpz_init (ptr);
>>
>>    /* Do the initialization now, so that we can cleanup without
>>       keeping track of where we were.
*/ >> @@ -1682,7 +1683,6 @@ find_array_section (gfc_expr *expr, gfc_ref *ref) >> mpz_mul (delta_mpz, delta_mpz, tmp_mpz); >> } >> >> - mpz_init (ptr); >> cons = gfc_constructor_first (base); >> >> /* Now clock through the array reference, calculating the index in >> @@ -1735,7 +1735,8 @@ find_array_section (gfc_expr *expr, gfc_ref *ref) >> "at %L requires an increase of the allowed %d " >> "upper limit. See %<-fmax-array-constructor%> " >> "option", &expr->where, flag_max_array_constructor); >> - return false; >> + t = false; >> + goto cleanup; >> } >> >> cons = gfc_constructor_lookup (base, limit); >> @@ -1750,8 +1751,6 @@ find_array_section (gfc_expr *expr, gfc_ref *ref) >> gfc_copy_expr (cons->expr), NULL); >> } >> >> - mpz_clear (ptr); >> - >> cleanup: >> >> mpz_clear (delta_mpz); >> @@ -1765,6 +1764,7 @@ cleanup: >> mpz_clear (ctr[d]); >> mpz_clear (stride[d]); >> } >> + mpz_clear (ptr); >> gfc_constructor_free (base); >> return t; >> } >> diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc >> index ecf0e3558df..d1f06335e79 100644 >> --- a/gcc/fortran/simplify.cc >> +++ b/gcc/fortran/simplify.cc >> @@ -6866,6 +6866,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr >> *shape_exp, >>gfc_error ("The SHAPE array for the RESHAPE intrinsic at %L has a " >> "negative value %d for dimension %d", >> &shape_exp->where, shape[rank], rank+1); >> + mpz_clear (index); >>return &gfc_bad_expr; >> } >> >> @@ -6889,6 +6890,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr >> *shape_exp, >> { >>gfc_error ("Shapes of ORDER at %L and SHAPE at %L are different", >> &order_exp->where, &shape_exp->where); >> + mpz_clear (index); >>return &gfc_bad_expr; >> } >> >> @@ -6902,6 +6904,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr >> *shape_exp, >> { >>gfc_error ("Sizes of ORDER at %L and SHAPE at %L are different", >> &order_exp->where, &shape_exp->where); >> + mpz_clear (index); >>return &gfc_bad_expr; >> } >> >> @@ -6918,6 +6921,7 @@ gfc_simplify_reshape 
(gfc_expr *source, gfc_expr >> *shape_exp, >> "in the range [1, ..., %d] for the RESHAPE intrinsic " >> "near %L", order[i], &order_exp->where, rank, >> &shape_exp->where); >> + mpz_clear (index); >>return &gfc_bad_expr; >> } >> >> @@ -6926,6 +6930,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr >> *shape_exp, >> { >>gfc_error ("ORDER at %L is not a permutation of the size of " >> "SHAPE at %L", &order_exp->where, &shape_
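The find_array_section part of the quoted patch follows a common pattern: initialize every temporary up front and send each exit, including the error path, through one cleanup label. The following is only a minimal sketch of that pattern, not the gfortran code; `fake_mpz`, `fake_init`, `fake_clear`, `process` and `live_count` are hypothetical stand-ins (no GMP is used) so that the leak accounting can be observed directly.

```cpp
#include <cassert>

// Hypothetical stand-in for mpz_t: it only tracks how many values are live.
static int live_values = 0;

struct fake_mpz
{
  bool initialized = false;
};

static void fake_init (fake_mpz &v)
{
  v.initialized = true;
  ++live_values;
}

static void fake_clear (fake_mpz &v)
{
  assert (v.initialized);   // clearing an uninitialized value would be a bug
  v.initialized = false;
  --live_values;
}

// Mirrors the shape of the fix: everything is initialized before the first
// possible exit, and every exit runs through the cleanup label.
bool process (int nelts, int limit)
{
  bool ok = true;
  fake_mpz tmp, ptr;
  fake_init (tmp);
  fake_init (ptr);          // initialized early so cleanup can always clear it

  if (nelts > limit)
    {
      ok = false;           // a bare "return false" here would leak tmp and ptr
      goto cleanup;
    }

  // ... the normal work would go here ...

cleanup:
  fake_clear (ptr);
  fake_clear (tmp);
  return ok;
}

// Number of values that were initialized but not yet cleared.
int live_count ()
{
  return live_values;
}
```

After any call to `process`, successful or not, `live_count ()` should be back at zero, which is the property the leak fixes aim for.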
Re: [PATCH] tree-optimization/109304 - properly handle instrumented aliases
> On Tue, 28 Mar 2023, Richard Biener wrote:
>
> > When adjusting calls to reflect instrumentation we failed to handle
> > calls to aliases since they appear to have no body. Instead resort
> > to symtab node availability. The patch also avoids touching
> > internal function calls in a more obvious way (builtins might
> > have a body available).
> >
> > profiledbootstrap & regtest running on x86_64-unknown-linux-gnu.
> >
> > Honza - does this look OK?
> > PR tree-optimization/109304
> > * tree-profile.cc (tree_profiling): Use symtab node
> > availability to decide whether to skip adjusting calls.
> > Do not adjust calls to internal functions.
> > @@ -842,12 +842,15 @@ tree_profiling (void)
> >       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >         {
> >           gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> > -         if (!call)
> > +         if (!call || gimple_call_internal_p (call))
> >             continue;
> >
> >           /* We do not clear pure/const on decls without body.  */
> >           tree fndecl = gimple_call_fndecl (call);
> > -         if (fndecl && !gimple_has_body_p (fndecl))
> > +         cgraph_node *callee;
> > +         if (fndecl
> > +             && (callee = cgraph_node::get (fndecl))
> > +             && callee->get_availability (node) == AVAIL_NOT_AVAILABLE)

As discussed earlier, the testcase I posted can be adjusted to put the const-declared wrapper into another translation unit, so I think we will need to drop the visibility check completely. But as discussed, it is a wrong-code issue but not a regression, so we may go with the availability check as you suggest.

So the patch is OK. I wonder if we do not want to drop it everywhere (as we plan for next stage1 anyway).

I think an ICE similar to the one in the PR can be produced with LTO. In the normal situation declaration merging will do the right thing: if you have unit A calling const foo externally, it won't get processed by the code above. However, unit B declaring foo will get it downgraded to non-const.

Now at WPA time we will read both A and B, and in declaration merging B's definition will prevail. This won't happen if lto_symtab_merge_p returns false, which can probably be triggered by adding a warning/error attribute to B's declaration but not to A's.

It is, however, really a side case, and I am worried about dropping pure/const from builtin declarations...

Honza
Re: [PATCH v2] RISC-V: Add Z*inx incompatible check in gcc.
On Tue, 28 Mar 2023, Jiawei wrote:

> +  // Zfinx is conflict with float extensions.
> +  if (TARGET_ZFINX && TARGET_HARD_FLOAT)
> +    error ("z*inx is conflict with float extensions");
> +

While I'm not a native English speaker, "is conflict with" doesn't sound grammatically correct. Perhaps "conflicts with" or "is in conflict with"?

brgds, H-P
Re: [GCC14 PATCH] LoongArch: Optimize additions with immediates
/* snip */

diff --git a/gcc/testsuite/gcc.target/loongarch/add-const.c b/gcc/testsuite/gcc.target/loongarch/add-const.c
new file mode 100644
index 000..3a9f72fe83d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/add-const.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mabi=lp64d" } */
+
+/* None of these functions should load the const operand into a temp
+   register.  */
+
+/* { dg-final { scan-assembler-not "add\\.[dw]" } } */
+
+unsigned long f01 (unsigned long x) { return x + 1; }
+unsigned long f02 (unsigned long x) { return x - 1; }
+unsigned long f03 (unsigned long x) { return x + 2047; }
+unsigned long f04 (unsigned long x) { return x + 4094; }
+unsigned long f05 (unsigned long x) { return x - 2048; }
+unsigned long f06 (unsigned long x) { return x - 4096; }
+unsigned long f07 (unsigned long x) { return x + 0x7fff; }
+unsigned long f08 (unsigned long x) { return x - 0x8000l; }
+unsigned long f09 (unsigned long x) { return x + 0x7fffl * 2; }
+unsigned long f10 (unsigned long x) { return x - 0x8000l * 2; }
+unsigned long f11 (unsigned long x) { return x - 0x8000l * 2; }

These two test cases are duplicates.

+unsigned long f12 (unsigned long x) { return x + 0x7fff + 0x1; }
+unsigned long f13 (unsigned long x) { return x + 0x7fff - 0x1; }
+unsigned long f14 (unsigned long x) { return x + 0x7fff + 0x7ff; }
+unsigned long f15 (unsigned long x) { return x + 0x7fff - 0x800; }
+unsigned long f16 (unsigned long x) { return x - 0x8000l - 1; }
+unsigned long f17 (unsigned long x) { return x - 0x8000l + 1; }
+unsigned long f18 (unsigned long x) { return x - 0x8000l - 0x800; }
+unsigned long f19 (unsigned long x) { return x - 0x8000l + 0x7ff; }
+
+unsigned int g01 (unsigned int x) { return x + 1; }
+unsigned int g02 (unsigned int x) { return x - 1; }
+unsigned int g03 (unsigned int x) { return x + 2047; }
+unsigned int g04 (unsigned int x) { return x + 4094; }
+unsigned int g05 (unsigned int x) { return x - 2048; }
+unsigned int g06 (unsigned int x) { return x - 4096; }
+unsigned int g07 (unsigned int x) { return x + 0x7fff; }
+unsigned int g08 (unsigned int x) { return x - 0x8000l; }
+unsigned int g09 (unsigned int x) { return x + 0x7fffl * 2; }
+unsigned int g10 (unsigned int x) { return x - 0x8000l * 2; }
+unsigned int g11 (unsigned int x) { return x - 0x8000l * 2; }

Ditto.

I found that with this patch applied, the tests gcc.target/loongarch/stack-check-cfa-1.c and gcc.target/loongarch/stack-check-cfa-2.c fail. Although the tests fail, the generated assembly code is better, and there is no problem with its logic. I haven't checked the reason for the failures yet.

Otherwise LGTM, thanks!
[pushed] c++: friend template matching [PR107484]
Tested x86_64-pc-linux-gnu, applying to trunk. -- 8< -- Here friend matching tries to find a matching non-template friend and fails, so we mark the friend as a template specialization to be determined later. Then cplus_decl_attributes tries again to find a matching function and gets confused by DECL_TEMPLATE_INSTANTIATION without DECL_TEMPLATE_INFO. But it doesn't make sense for find_last_decl to be trying to match anything with DECL_USE_TEMPLATE set; those are matched elsewhere. PR c++/107484 gcc/cp/ChangeLog: * decl2.cc (find_last_decl): Return early if DECL_USE_TEMPLATE. gcc/testsuite/ChangeLog: * g++.dg/lookup/friend25.C: New test. --- gcc/cp/decl2.cc| 5 + gcc/testsuite/g++.dg/lookup/friend25.C | 9 + 2 files changed, 14 insertions(+) create mode 100644 gcc/testsuite/g++.dg/lookup/friend25.C diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc index 2b195e7..9594be4092c 100644 --- a/gcc/cp/decl2.cc +++ b/gcc/cp/decl2.cc @@ -1613,6 +1613,11 @@ find_last_decl (tree decl) if (tree name = DECL_P (decl) ? DECL_NAME (decl) : NULL_TREE) { + /* Template specializations are matched elsewhere. */ + if (DECL_LANG_SPECIFIC (decl) + && DECL_USE_TEMPLATE (decl)) + return NULL_TREE; + /* Look up the declaration in its scope. */ tree pushed_scope = NULL_TREE; if (tree ctype = DECL_CONTEXT (decl)) diff --git a/gcc/testsuite/g++.dg/lookup/friend25.C b/gcc/testsuite/g++.dg/lookup/friend25.C new file mode 100644 index 000..74cf5dc3431 --- /dev/null +++ b/gcc/testsuite/g++.dg/lookup/friend25.C @@ -0,0 +1,9 @@ +// PR c++/107484 + +namespace qualified_friend_no_match { + void f(int); + template void f(T*); + struct X { +friend void qualified_friend_no_match::f(double); // { dg-error "does not match any template" } + }; +} base-commit: 59b4a555c3f1c3dba376da1c4886a9ea18ad208d -- 2.31.1
[PATCH] Check hard_regno_mode_ok before setting lowest memory move cost for the mode with different reg classes.
There's a potential performance issue when the backend returns an unreasonable value for a mode which can never be allocated with the reg class.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk (or GCC14 stage1)?

gcc/ChangeLog:

PR rtl-optimization/109351
* ira.cc (setup_class_subset_and_memory_move_costs): Check
hard_regno_mode_ok before setting lowest memory move cost for
the mode with different reg classes.
---
gcc/ira.cc | 4
1 file changed, 4 insertions(+)

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 6c7f4901e4c..02dea5d49ee 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -588,6 +588,10 @@ setup_class_subset_and_memory_move_costs (void)
           /* Costs for NO_REGS are used in cost calculation on the
              1st pass when the preferred register classes are not
              known yet.  In this case we take the best scenario.  */
+          if (!targetm.hard_regno_mode_ok (ira_class_hard_regs[cl][0],
+                                           (machine_mode) mode))
+            continue;
+
           if (ira_memory_move_cost[mode][NO_REGS][0]
               > ira_memory_move_cost[mode][cl][0])
             ira_max_memory_move_cost[mode][NO_REGS][0]
--
2.39.1.388.g2fc9e9ca3c
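The idea of the patch can be illustrated outside of IRA: when taking the lowest memory move cost over all register classes, a class that cannot hold the mode at all should not contribute, otherwise a meaningless cost could win the minimum. The following is a simplified sketch, not the ira.cc code; `reg_class_info` and `best_memory_move_cost` are hypothetical names, and the `mode_ok` flag stands in for the result of the hard_regno_mode_ok query on the class's first hard register.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical per-class summary for one machine mode.
struct reg_class_info
{
  int memory_move_cost;  // cost the backend reports for this class
  bool mode_ok;          // whether the class can hold the mode at all
};

// Return the lowest memory move cost among the classes that can actually
// hold the mode.  If no class qualifies, the maximum int value is returned.
int best_memory_move_cost (const std::vector<reg_class_info> &classes)
{
  int best = std::numeric_limits<int>::max ();
  for (const reg_class_info &cl : classes)
    {
      assert (cl.memory_move_cost >= 0);
      if (!cl.mode_ok)
        continue;        // skip costs that describe an impossible allocation
      if (cl.memory_move_cost < best)
        best = cl.memory_move_cost;
    }
  return best;
}
```

With classes `{2, false}`, `{6, true}` and `{8, true}`, the unusable class's cost of 2 is ignored and the function returns 6.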
Re: [PATCH] rs6000: Fix vector_set_var_p9 by considering BE [PR108807]
Hi Segher,

Thanks for the review!

on 2023/4/3 19:44, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Feb 17, 2023 at 05:55:04PM +0800, Kewen.Lin wrote:
>> As PR108807 exposes, the current handling in function
>> rs6000_expand_vector_set_var_p9 doesn't take care of big
>> endianness. Currently the function is to rotate the
>> target vector by moving element to-be-set to element 0,
>> set element 0 with the given val, then rotate back. To
>> get the permutation control vector for the rotation, it
>> makes use of lvsr and lvsl, but the element ordering is
>> different for BE and LE (like element 0 is the most
>> significant one on BE while the least significant one on
>> LE), this patch is to add consideration for BE and make
>> sure permutation control vectors for rotations are expected.
>
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -7235,22 +7235,26 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, rtx idx)
>>
>>    machine_mode shift_mode;
>>    rtx (*gen_ashl)(rtx, rtx, rtx);
>> -  rtx (*gen_lvsl)(rtx, rtx);
>> -  rtx (*gen_lvsr)(rtx, rtx);
>> +  rtx (*gen_pcvr1)(rtx, rtx);
>> +  rtx (*gen_pcvr2)(rtx, rtx);
>
> Space before "(" btw, you can fix that at the same time? :-)
>

Good catch, fixed.

> What does "pcvr" mean? You could put that in a short comment?
>
>> +  /* Generate one permutation control vector used for rotating the element
>
> Ah. Yeah just "/* Permutation control vector */" for the above one
> prevents all mysteries :-)

One comment line added for gen_* function pointers.

>
> Patch looks good. Thanks!
>

Pushed as r13-6994-gd634e6088f139e, thanks!

BR,
Kewen
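The rotate, set, rotate-back scheme described in the quoted commit message can be sketched on a plain array. This is an illustration only, assuming a four-element vector and using std::rotate in place of the permutes driven by lvsl/lvsr; `set_var_by_rotation` is a hypothetical name, and the endianness handling that the actual fix is about does not arise in this model.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Set element IDX of V to VAL by rotating it into position 0, writing
// position 0, and rotating back.  Returns the updated copy of V.
std::array<int, 4> set_var_by_rotation (std::array<int, 4> v,
                                        std::size_t idx, int val)
{
  assert (idx < v.size ());
  // Rotate left so that the element to be set lands in position 0.
  std::rotate (v.begin (), v.begin () + idx, v.end ());
  v[0] = val;
  // Rotate right by the same amount to restore the original order.
  std::rotate (v.begin (), v.end () - idx, v.end ());
  return v;
}
```

For the input `{10, 11, 12, 13}` with index 2 and value 99, the intermediate vectors are `{12, 13, 10, 11}` and `{99, 13, 10, 11}`, and the result is `{10, 11, 99, 13}`.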
[PATCH] testsuite: Adjust powerpc test case pr83677.c for BE [PR108815]
Hi, The test case gcc.target/powerpc/pr83677.c was written for LE environment, this patch is to make it work on BE as well. Tested on BE and LE well, I'm going to push this soon if no objections. BR, Kewen - PR testsuite/108815 gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr83677.c (v_expand_u8, v_expand_u16, v_load_deinterleave_f32, v_store_interleave_f32): Adjust some code by considering BE. --- gcc/testsuite/gcc.target/powerpc/pr83677.c | 30 +++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.target/powerpc/pr83677.c b/gcc/testsuite/gcc.target/powerpc/pr83677.c index c1a09687174..8b1caff3f98 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr83677.c +++ b/gcc/testsuite/gcc.target/powerpc/pr83677.c @@ -9,14 +9,24 @@ void v_expand_u8(vector unsigned char* a, vector unsigned short* b0, vector unsigned short* b1) { +#if __LITTLE_ENDIAN__ *b0 = (vector unsigned short)vec_mergeh(*a, vec_splats((unsigned char)0)); *b1 = (vector unsigned short)vec_mergel(*a, vec_splats((unsigned char)0)); +#else + *b0 = (vector unsigned short)vec_mergeh(vec_splats((unsigned char)0), *a); + *b1 = (vector unsigned short)vec_mergel(vec_splats((unsigned char)0), *a); +#endif } void v_expand_u16(vector unsigned short* a, vector unsigned int* b0, vector unsigned int* b1) { +#if __LITTLE_ENDIAN__ *b0 = (vector unsigned int)vec_mergeh(*a, vec_splats((unsigned short)0)); *b1 = (vector unsigned int)vec_mergel(*a, vec_splats((unsigned short)0)); +#else +*b0 = (vector unsigned int)vec_mergeh(vec_splats((unsigned short)0), *a); +*b1 = (vector unsigned int)vec_mergel(vec_splats((unsigned short)0), *a); +#endif } void v_load_deinterleave_u8(unsigned char *ptr, vector unsigned char* a, vector unsigned char* b, vector unsigned char* c) @@ -44,13 +54,23 @@ void v_load_deinterleave_f32(float *ptr, vector float* a, vector float* b, vecto vector float v2 = vec_xl(16, ptr); vector float v3 = vec_xl(32, ptr); +#if __LITTLE_ENDIAN__ +vector float t1 = vec_sld(v3, v2, 8); 
+vector float t2 = vec_sld(v1, v3, 8); +vector float t3 = vec_sld(v2, v1, 8); +#else +vector float t1 = vec_sld(v2, v3, 8); +vector float t2 = vec_sld(v3, v1, 8); +vector float t3 = vec_sld(v1, v2, 8); +#endif + static const vector unsigned char flp = {0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19, 28, 29, 30, 31}; -*a = vec_perm(v1, vec_sld(v3, v2, 8), flp); +*a = vec_perm(v1, t1, flp); static const vector unsigned char flp2 = {28, 29, 30, 31, 0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19}; -*b = vec_perm(v2, vec_sld(v1, v3, 8), flp2); +*b = vec_perm(v2, t2, flp2); -*c = vec_perm(vec_sld(v2, v1, 8), v3, flp); +*c = vec_perm(t3, v3, flp); } void v_store_interleave_f32(float *ptr, vector float a, vector float b, vector float c) @@ -61,7 +81,11 @@ void v_store_interleave_f32(float *ptr, vector float a, vector float b, vector f vec_xst(vec_perm(a, hbc, ahbc), 0, ptr); vector float lab = vec_mergel(a, b); +#if __LITTLE_ENDIAN__ vec_xst(vec_sld(lab, hbc, 8), 16, ptr); +#else +vec_xst(vec_sld(hbc, lab, 8), 16, ptr); +#endif static const vector unsigned char clab = {8, 9, 10, 11, 24, 25, 26, 27, 28, 29, 30, 31, 12, 13, 14, 15}; vec_xst(vec_perm(c, lab, clab), 32, ptr); -- 2.39.1
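Why the operands of the merge are swapped for BE in v_expand_u8 and v_expand_u16 can be seen in a small byte-level model. This is a simplified sketch, not the AltiVec intrinsics: `merge_bytes` only mimics the interleaving of a merge, `as_u16` reinterprets byte pairs in memory order for a given byte order, and all names here are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class byte_order { little, big };

// Interleave two byte vectors: a0 b0 a1 b1 a2 b2 a3 b3.
std::array<std::uint8_t, 8> merge_bytes (const std::array<std::uint8_t, 4> &a,
                                         const std::array<std::uint8_t, 4> &b)
{
  std::array<std::uint8_t, 8> out{};
  for (std::size_t i = 0; i < 4; ++i)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
  return out;
}

// Reinterpret consecutive byte pairs as 16-bit values.  On little endian
// the first byte of a pair is the low byte, on big endian the high byte.
std::array<std::uint16_t, 4> as_u16 (const std::array<std::uint8_t, 8> &bytes,
                                     byte_order order)
{
  std::array<std::uint16_t, 4> out{};
  for (std::size_t i = 0; i < 4; ++i)
    {
      unsigned first = bytes[2 * i];
      unsigned second = bytes[2 * i + 1];
      unsigned value = order == byte_order::little
                         ? (first | (second << 8))
                         : ((first << 8) | second);
      out[i] = static_cast<std::uint16_t> (value);
    }
  return out;
}

// Zero-extend bytes to 16 bits by merging with zero; the zero vector has
// to supply the high byte, so the operand order depends on the byte order.
std::array<std::uint16_t, 4> zero_extend (const std::array<std::uint8_t, 4> &a,
                                          byte_order order)
{
  const std::array<std::uint8_t, 4> zero{};
  std::array<std::uint16_t, 4> out
    = order == byte_order::little ? as_u16 (merge_bytes (a, zero), order)
                                  : as_u16 (merge_bytes (zero, a), order);
  for (std::size_t i = 0; i < 4; ++i)
    assert (out[i] == a[i]);  // zero extension must preserve each value
  return out;
}
```

In this model, keeping the little-endian operand order on a big-endian layout puts the data byte in the high half, so a source byte of 1 would read back as 0x0100 rather than 1.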
Re: [PATCH] Fortran: reject module variable as character length in PARAMETER [PR104349]
Hi Harald,

OK for mainline. It is sufficiently small that, if there is any fallout in the coming weeks, it can easily be reverted without great impact.

Thanks for the patch.

Paul

On Mon, 3 Apr 2023 at 20:46, Harald Anlauf via Fortran wrote:
> Dear all,
>
> the attached patch fixes an ICE-on-invalid for a PARAMETER expression
> where the character length was a MODULE variable. The ICE seemed
> strange, as we were catching related erroneous code for declarations in
> programs or subroutines. Removing a seemingly bogus check of restricted
> expressions is the simplest way to fix this. (We could also catch this
> differently in decl.cc).
>
> Besides, this also fixes an accepts-invalid, see testcase. :-)
>
> Regtested on x86_64-pc-linux-gnu. OK for mainline (13) or rather wait?
>
> Thanks,
> Harald
>
>

--
"If you can't explain it simply, you don't understand it well enough" - Albert Einstein