Re: [PATCH] RISC-V: Support non-SLP unordered reduction
> @@ -247,6 +248,7 @@ void emit_vlmax_cmp_mu_insn (unsigned, rtx *);
>  void emit_vlmax_masked_mu_insn (unsigned, int, rtx *);
>  void emit_scalar_move_insn (unsigned, rtx *);
>  void emit_nonvlmax_integer_move_insn (unsigned, rtx *, rtx);
> +//void emit_vlmax_reduction_insn (unsigned, rtx *);

Please drop this.

> diff --git a/gcc/config/riscv/riscv-vsetvl.cc
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 586dc8e5379..97a9dad8a77 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -646,7 +646,8 @@ gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
>  }
>
>  static rtx
> -gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
> +gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info,
> +                rtx vl = NULL_RTX)
>  {
>    rtx new_pat;
>    vl_vtype_info new_info = info;
> @@ -657,7 +658,7 @@ gen_vsetvl_pat (rtx_insn *rinsn, const vector_insn_info &info)
>    if (vsetvl_insn_p (rinsn) || vlmax_avl_p (info.get_avl ()))
>      {
>        rtx dest = get_vl (rinsn);

rtx dest = vl ? vl : get_vl (rinsn);

> -      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, dest);
> +      new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl ? vl : dest);

and keep dest here.

>      }
>    else if (INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
>      new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);

Should we handle the case where vl is non-null in the else-if and else
branches?  Add `assert (vl == NULL_RTX)` if it is not handled.

> @@ -818,7 +819,8 @@ change_insn (rtx_insn *rinsn, rtx new_pat)
>        print_rtl_single (dump_file, PATTERN (rinsn));
>      }
>
> -  validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
> +  bool change_p = validate_change (rinsn, &PATTERN (rinsn), new_pat, false);
> +  gcc_assert (change_p);

I think we could create a wrapper for validate_change to make sure that it
returns true, and also use that wrapper for all other call sites?
e.g. validate_change_or_fail?
Re: [PATCH] core: Support heap-based trampolines
> On 17 Jul 2023, at 07:58, Iain Sandoe wrote:
>
>> On 17 Jul 2023, at 07:43, FX Coudert wrote:
>>
>> There is an alternate mechanism relying on system libraries that is possible
>> on darwin specifically (I don’t know for other targets), but it will only
>> work for signed binaries, and would require us to codesign everything
>> produced by gcc.  During development, it was deemed too big an ask and the
>> current strategy was chosen (Iain can surely add more background on that if
>> needed).
>
> I do not think that this solves the setjump/longjump issue - since there’s
> still a notional allocation that takes place (it’s just that the mechanism
> for determining permissions is different).
>
> It is also a big barrier for the general user - and prevents normal folks
> from distributing GCC - since codesigning requires an external certificate
> (i.e. I would really rather avoid it).
>
>>> Was there ever an attempt to provide a "generic" trampoline driven by
>>> a more complex descriptor?
>
> We did look at the “unused address bits” mechanism that Ada has used - but
> that is not really available to a non-private ABI (unless the system vendor
> agrees to change the ABI to leave a bit spare); for the base arch, either
> the bits are not there (e.g. X86) or reserved (e.g. AArch64).
>
> Andrew Burgess did the original work; he might have comments on alternatives
> we tried.

Although I will comment that the main barrier to data / descriptor based
schemes is that we allow recursive use of nested functions, and that means
that each nest level needs a distinct target address to branch to / call.

[that might also make the bytecode scheme hard(er)]

Iain
[COMMITTED] Normalize irange_bitmask before union/intersect.
The bit twiddling in union/intersect for the value/mask pair must be
normalized so that the unknown bits have a value of 0, in order to make
the math simpler.  Normalizing at construction slowed VRP by 1.5%, so I
opted to normalize before updating the bitmask in range-ops, since it
was the only user.  However, with upcoming changes there will be
multiple setters of the mask (IPA and CCP), so we need something more
general.

I played with various alternatives, and settled on normalizing before
union/intersect, which were the ones needing the bits cleared.  With
this patch, there's no noticeable difference in performance either in
VRP or in overall compilation.

gcc/ChangeLog:

	* value-range.cc (irange_bitmask::verify_mask): Mask need not be
	normalized.
	* value-range.h (irange_bitmask::union_): Normalize beforehand.
	(irange_bitmask::intersect): Same.
---
 gcc/value-range.cc |  3 ---
 gcc/value-range.h  | 12 ++--
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 011bdbdeae6..2abf57bcee8 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1953,9 +1953,6 @@ void
 irange_bitmask::verify_mask () const
 {
   gcc_assert (m_value.get_precision () == m_mask.get_precision ());
-  // Unknown bits must have their corresponding value bits cleared as
-  // it simplifies union and intersect.
-  gcc_assert (wi::bit_and (m_mask, m_value) == 0);
 }

 void
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 0170188201b..d8af6fca7d7 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -211,8 +211,12 @@ irange_bitmask::operator== (const irange_bitmask &src) const
 }

 inline bool
-irange_bitmask::union_ (const irange_bitmask &src)
+irange_bitmask::union_ (const irange_bitmask &orig_src)
 {
+  // Normalize mask.
+  irange_bitmask src (orig_src.m_value & ~orig_src.m_mask, orig_src.m_mask);
+  m_value &= ~m_mask;
+
   irange_bitmask save (*this);
   m_mask = (m_mask | src.m_mask) | (m_value ^ src.m_value);
   m_value = m_value & src.m_value;
@@ -222,8 +226,12 @@ irange_bitmask::union_ (const irange_bitmask &src)
 }

 inline bool
-irange_bitmask::intersect (const irange_bitmask &src)
+irange_bitmask::intersect (const irange_bitmask &orig_src)
 {
+  // Normalize mask.
+  irange_bitmask src (orig_src.m_value & ~orig_src.m_mask, orig_src.m_mask);
+  m_value &= ~m_mask;
+
   irange_bitmask save (*this);
   // If we have two known bits that are incompatible, the resulting
   // bit is undefined.  It is unclear whether we should set the entire
--
2.40.1
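To see what the union math above computes, here is a minimal standalone
sketch (not GCC code; plain uint32_t stands in for wide_int, and the
bitmask_pair struct is invented for the example):

#include <cstdint>
#include <cassert>

// Sketch of the value/mask pair from the patch.  A set bit in 'mask'
// means "unknown"; the corresponding bit of 'value' is meaningful only
// when the pair is normalized, i.e. (value & mask) == 0.
struct bitmask_pair
{
  uint32_t value;
  uint32_t mask;
};

// Union as in irange_bitmask::union_: a result bit stays known only if
// it is known and equal on both sides.
static bitmask_pair
union_ (bitmask_pair a, bitmask_pair b)
{
  // Normalize first, mirroring the change in this patch.
  a.value &= ~a.mask;
  b.value &= ~b.mask;
  bitmask_pair r;
  r.mask = (a.mask | b.mask) | (a.value ^ b.value);
  r.value = a.value & b.value;
  return r;
}

int
main ()
{
  bitmask_pair a = { 0b0101, 0b0000 };	// all four bits known: 0101
  bitmask_pair b = { 0b0111, 0b0000 };	// all four bits known: 0111
  bitmask_pair r = union_ (a, b);
  assert (r.mask == 0b0010);		// bit 1 differs, so it becomes unknown
  assert (r.value == 0b0101);
  return 0;
}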
[COMMITTED] Add global setter for value/mask pair for SSA names.
This patch provides a way to set the value/mask pair of known bits
globally, similarly to how we can use set_nonzero_bits for known 0 bits.

This can then be used by CCP and IPA to set value/mask info instead of
throwing away the known 1 bits.  In further clean-ups, I will see if it
makes sense to remove set_nonzero_bits altogether, since it is subsumed
by value/mask.

gcc/ChangeLog:

	* tree-ssanames.cc (set_bitmask): New.
	* tree-ssanames.h (set_bitmask): New.
---
 gcc/tree-ssanames.cc | 15 +++
 gcc/tree-ssanames.h  |  1 +
 2 files changed, 16 insertions(+)

diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index 5fdb6a37e9f..f54394363a0 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -465,6 +465,21 @@ set_nonzero_bits (tree name, const wide_int &mask)
   set_range_info (name, r);
 }

+/* Update the known bits of NAME.
+
+   Zero bits in MASK cover constant values.  Set bits in MASK cover
+   unknown values.  VALUE are the known bits.  */
+
+void
+set_bitmask (tree name, const wide_int &value, const wide_int &mask)
+{
+  gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
+
+  int_range<2> r (TREE_TYPE (name));
+  r.update_bitmask (irange_bitmask (value, mask));
+  set_range_info (name, r);
+}
+
 /* Return a widest_int with potentially non-zero bits in SSA_NAME
    NAME, the constant for INTEGER_CST, or -1 if unknown.  */

diff --git a/gcc/tree-ssanames.h b/gcc/tree-ssanames.h
index f3fa609208a..b5e3f228ee8 100644
--- a/gcc/tree-ssanames.h
+++ b/gcc/tree-ssanames.h
@@ -59,6 +59,7 @@ struct GTY(()) ptr_info_def
 /* Sets the value range to SSA.  */
 extern bool set_range_info (tree, const vrange &);
 extern void set_nonzero_bits (tree, const wide_int &);
+extern void set_bitmask (tree, const wide_int &value, const wide_int &mask);
 extern wide_int get_nonzero_bits (const_tree);
 extern bool ssa_name_has_boolean_range (tree);
 extern void init_ssanames (struct function *, int);
--
2.40.1
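A concrete reading of the documented encoding (a standalone sketch with
plain uint32_t instead of wide_int; the numbers are made up for
illustration):

#include <cstdint>
#include <cassert>

// Sketch of the encoding documented for set_bitmask: zero bits in MASK
// cover constant (known) values, set bits cover unknown values, and
// VALUE holds the known bits.
int
main ()
{
  uint32_t mask  = 0xFFFFFFF0;	// low 4 bits known, rest unknown
  uint32_t value = 0x0000000A;	// the known low bits are 1010

  // Any runtime value x consistent with this pair agrees with VALUE on
  // every known (zero-in-MASK) bit:
  uint32_t x = 0x1234567A;
  assert ((x & ~mask) == (value & ~mask));
  return 0;
}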
RE: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-conv.cc.
> -----Original Message-----
> From: Richard Biener
> Sent: Monday, July 17, 2023 7:19 AM
> To: Roger Sayle
> Cc: gcc-patches@gcc.gnu.org; Tamar Christina
> Subject: Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in
> tree-if-conv.cc.
>
> On Fri, Jul 14, 2023 at 8:56 PM Roger Sayle wrote:
> >
> > This patch fixes the bootstrap failure I'm seeing using gcc 4.8.5 as
> > the host compiler.  Ok for mainline?  [I might be missing something]
>
> OK.  Btw, while I didn't spot this during review, I would appreciate it
> if the code could use vec.[q]sort; this should work with a lambda as
> well, I think.

That was my first choice, but that hits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99469

Regards,
Tamar
Re: [gcc r14-2455] riscv: Prepare backend for index registers
On Jul 17 2023, Christoph Müllner wrote:

> The build process shows a lot of warnings.

Then you are using a bad compiler.  The build is 100% -Werror clean.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-conv.cc.
On Mon, Jul 17, 2023 at 12:21 AM Tamar Christina via Gcc-patches wrote:
>
> > -----Original Message-----
> > From: Richard Biener
> > Sent: Monday, July 17, 2023 7:19 AM
> > To: Roger Sayle
> > Cc: gcc-patches@gcc.gnu.org; Tamar Christina
> > Subject: Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in
> > tree-if-conv.cc.
> >
> > On Fri, Jul 14, 2023 at 8:56 PM Roger Sayle wrote:
> > >
> > > This patch fixes the bootstrap failure I'm seeing using gcc 4.8.5 as
> > > the host compiler.  Ok for mainline?  [I might be missing something]
> >
> > OK.  Btw, while I didn't spot this during review, I would appreciate it
> > if the code could use vec.[q]sort; this should work with a lambda as
> > well, I think.
>
> That was my first choice, but that hits
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99469

That is not hitting PR 99469; rather, it means your comparison is not
correct for an (unstable) sort.  A sort comparator must be a strict weak
ordering: `f(a,b)` implies `!f(b,a)`, and `f(a,a)` must return false.

If you are running into this for qsort here, you will most likely run
into issues with std::sort later on too.

Thanks,
Andrew

>
> Regards,
> Tamar
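For illustration, a comparator that does satisfy these requirements when
sorting pairs (a standalone sketch, not the code from the patch under
discussion):

#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// A strict weak ordering over pairs: irreflexive (cmp(a,a) is false)
// and asymmetric (cmp(a,b) implies !cmp(b,a)), which is what std::sort
// -- and, via a tri-state wrapper, qsort -- require.
static bool
cmp (const std::pair<unsigned, unsigned> &a,
     const std::pair<unsigned, unsigned> &b)
{
  if (a.first != b.first)
    return a.first < b.first;
  return a.second < b.second;
}

int
main ()
{
  std::vector<std::pair<unsigned, unsigned>> v
    = { { 2, 1 }, { 1, 9 }, { 2, 0 }, { 1, 9 } };
  std::sort (v.begin (), v.end (), cmp);
  assert (!cmp (v[0], v[0]));	// irreflexive, even for equal elements
  assert ((v[0] == std::make_pair (1u, 9u)));
  return 0;
}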
Re: [gcc r14-2455] riscv: Prepare backend for index registers
On Sun, Jul 16, 2023 at 11:49 PM Christoph Müllner wrote:
>
> On Fri, Jul 14, 2023 at 12:28 PM Andreas Schwab wrote:
> >
> > Why didn't you test that?
>
> Thanks for reporting, and sorry for introducing this warning.
>
> I test all patches before sending them.
> In the case of RISC-V backend patches, I build a 2-stage
> cross-toolchain and run all regression tests for RV32 and RV64 (using
> QEMU).
> Testing is done with and without patches applied to identify regressions.
>
> The build process shows a lot of warnings.  Therefore I did not
> investigate finding a way to use -Werror.
> This means that looking for compiler warnings is a manual step, and I
> might miss one while scrolling through the logs.

If you are building a cross compiler, and want to clean up warnings,
first build a native compiler and then build the cross using that.

Also, it may be worth finding a way to do native bootstraps on riscv to
test patches, rather than doing just cross builds when testing backend
patches, especially since the bootstrap is more likely than the GCC
testsuite to find backend issues and such.

Thanks,
Andrew

> Sorry for the inconvenience,
> Christoph
>
> > ../../gcc/config/riscv/riscv.cc: In function 'int
> > riscv_regno_ok_for_index_p(int)':
> > ../../gcc/config/riscv/riscv.cc:864:33: error: unused parameter 'regno'
> > [-Werror=unused-parameter]
> >   864 | riscv_regno_ok_for_index_p (int regno)
> >       |                             ^
> > cc1plus: all warnings being treated as errors
> > make[3]: *** [Makefile:2499: riscv.o] Error 1
> >
> > --
> > Andreas Schwab, sch...@linux-m68k.org
> > GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
> > "And now for something completely different."
Re: [gcc r14-2455] riscv: Prepare backend for index registers
On Mon, Jul 17, 2023 at 9:24 AM Andreas Schwab wrote:
>
> On Jul 17 2023, Christoph Müllner wrote:
>
> > The build process shows a lot of warnings.
>
> Then you are using a bad compiler.  The build is 100% -Werror clean.

My host compiler is: gcc version 13.1.1 20230614 (Red Hat 13.1.1-4) (GCC)

Some examples:

> /home/cm/src/gcc/riscv-mainline/gcc/text-art/table.cc: In member function
> ‘int text_art::table_geometry::table_x_to_canvas_x(int) const’:
> /home/cm/src/gcc/riscv-mainline/gcc/text-art/table.cc:561:15: warning:
> comparison of integer expressions of different signedness: ‘int’ and
> ‘std::vector::size_type’ {aka ‘long unsigned int’} [-Wsign-compare]
>   561 |   if (table_x == m_col_start_x.size ())
>       |       ^~~~

> /home/cm/src/gcc/riscv-mainline/gcc/text-art/table.cc: In function ‘void
> selftest::test_spans_3()’:
> /home/cm/src/gcc/riscv-mainline/gcc/text-art/table.cc:947:62: warning:
> unquoted keyword ‘char’ in format [-Wformat-diag]
>   947 |                               "'buf' (char[%i])",
>       |                                      ^~~~

> /home/cm/src/gcc/riscv-mainline/gcc/gengtype-lex.l: In function ‘int
> yylex(const char**)’:
> gengtype-lex.cc:356:15: warning: this statement may fall through
> [-Wimplicit-fallthrough=]
>   356 |    */
>       | ~~ ^

> In file included from /home/cm/src/gcc/riscv-mainline/libgcc/unwind-dw2.c:410:
> ./md-unwind-support.h: In function 'riscv_fallback_frame_state':
> ./md-unwind-support.h:67:6: warning: assignment to 'struct sigcontext *' from
> incompatible pointer type 'mcontext_t *' [-Wincompatible-pointer-types]
>    67 |   sc = &rt_->uc.uc_mcontext;
>       |      ^

> /home/cm/src/gcc/riscv-mainline/libgcc/config/riscv/atomic.c:36:8: warning:
> conflicting types for built-in function '__sync_fetch_and_add_1'; expected
> 'unsigned char(volatile void *, unsigned char)'
> [-Wbuiltin-declaration-mismatch]
>    36 | type __sync_fetch_and_ ## opname ## _ ## size (type *p, type v)
>       |      ^

Please let me know if I am doing something wrong.

BR
Christoph

> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
> "And now for something completely different."
RE: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-conv.cc.
> On Mon, Jul 17, 2023 at 12:21 AM Tamar Christina via Gcc-patches
> patc...@gcc.gnu.org> wrote:
> >
> > > -----Original Message-----
> > > From: Richard Biener
> > > Sent: Monday, July 17, 2023 7:19 AM
> > > To: Roger Sayle
> > > Cc: gcc-patches@gcc.gnu.org; Tamar Christina
> > > Subject: Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in
> > > tree-if-conv.cc.
> > >
> > > On Fri, Jul 14, 2023 at 8:56 PM Roger Sayle wrote:
> > > >
> > > > This patch fixes the bootstrap failure I'm seeing using gcc 4.8.5
> > > > as the host compiler.  Ok for mainline?  [I might be missing
> > > > something]
> > >
> > > OK.  Btw, while I didn't spot this during review, I would appreciate
> > > it if the code could use vec.[q]sort; this should work with a lambda
> > > as well, I think.
> >
> > That was my first choice, but that hits
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99469
>
> That is not hitting PR 99469; rather, it means your comparison is not
> correct for an (unstable) sort.  A sort comparator must be a strict weak
> ordering: `f(a,b)` implies `!f(b,a)`, and `f(a,a)` must return false.

I'm using the standard std::pair comparator, which indicates that f(a,a)
is true:
https://en.cppreference.com/w/cpp/utility/pair/operator_cmp

> If you are running into this for qsort here, you will most likely run
> into issues with std::sort later on too.

I don't see why or how.  It needs to have a consistent relationship,
which std::pair maintains.  So why would using the standard tuple
comparator with a standard std::sort cause problems?

Thanks,
Tamar
Re: [gcc r14-2455] riscv: Prepare backend for index registers
On Jul 17 2023, Christoph Müllner wrote:

> My host compiler is: gcc version 13.1.1 20230614 (Red Hat 13.1.1-4) (GCC)

Too old.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
[PATCH] Remove # from one_cmpl<mode>2 assemble output.
optimize_insn_for_speed () in the assemble output is not aligned with the
splitter condition, and that causes an ICE when building SPEC2017 blender_r.
Not sure if ctrl is supposed to be reliable in assemble output; the patch
just removes it as a workaround.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

libpng/pngread.c: In function ‘png_read_image’:
libpng/pngread.c:786:1: internal compiler error: in final_scan_insn_1, at final.cc:2813
  786 | }
      | ^
0x73ac3d final_scan_insn_1
        ../../gcc/final.cc:2813
0xb3420b final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
        ../../gcc/final.cc:2887
0xb344c4 final_1
        ../../gcc/final.cc:1979
0xb34f64 rest_of_handle_final
        ../../gcc/final.cc:4240
0xb34f64 execute
        ../../gcc/final.cc:4318

gcc/ChangeLog:

	PR target/110438
	* config/i386/sse.md (one_cmpl<mode>2): Remove # from
	assemble output.
---
 gcc/config/i386/sse.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6bf9c99a2c1..e1158c5717a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17220,10 +17220,6 @@ (define_insn_and_split "one_cmpl<mode>2"
 	    || <MODE>mode == SImode || <MODE>mode == DImode)"
 {
-  if (!<mask_applied> && which_alternative
-      && optimize_insn_for_speed_p ())
-    return "#";
-
   if (TARGET_AVX512VL)
     return "vpternlog\t{$0x55, %1, %0, %0|%0, %0, %1, 0x55}";
   else
--
2.39.1.388.g2fc9e9ca3c
RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c
> From: Jiang, Haochen
> Sent: 17 July 2023 02:50
>
> > From: Jiang, Haochen
> > Sent: Friday, July 14, 2023 10:50 AM
> >
> > > The recent change in TImode parameter passing on x86_64 results in
> > > the FAIL of pr91681-1.c.  The issue is that with the extra
> > > flexibility, the combine pass is now spoilt for choice between using
> > > either the *add<dwi>3_doubleword_concat or the
> > > *add<dwi>3_doubleword_zext patterns, when one operand is a *concat
> > > and the other is a zero_extend.  The solution proposed below is to
> > > provide an *add<dwi>3_doubleword_concat_zext define_insn_and_split
> > > that can benefit both from the register allocation of *concat, and
> > > still avoid the xor normally required by zero extension.
> > >
> > > I'm investigating a follow-up refinement to improve register
> > > allocation further by avoiding the early clobber in the =&r, and
> > > handling (custom) reloads explicitly, but this piece resolves the
> > > testcase failure.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > bootstrap and make -k check, both with and without
> > > --target_board=unix{-m32} with no new failures.  Ok for mainline?
> > >
> > > 2023-07-11  Roger Sayle
> > >
> > > gcc/ChangeLog
> > >     PR target/91681
> > >     * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
> > >     define_insn_and_split derived from *add<dwi>3_doubleword_concat
> > >     and *add<dwi>3_doubleword_zext.
> >
> > Hi Roger,
> >
> > This commit currently changed the codegen of testcase p443644-2.c from:
>
> Oops, a typo, I mean pr43644-2.c.
>
> Haochen

I'm working on a fix and hope to have this resolved soon (unfortunately,
fixing things in a post-reload splitter isn't working out due to reload's
choices, so the solution will likely be a peephole2).

The problem is that pr91681-1.c and pr43644-2.c can't both PASS (as
written)!  The operation x = y + 0 can be generated as either
"mov y,x; add $0,x" or as "xor x,x; add y,x".  pr91681-1.c checks that
there isn't an xor; pr43644-2.c checks that there isn't a mov.  Doh!
As the author of both these test cases, I've painted myself into a
corner.  The solution is that "add $0,x" should be generated (optimal)
when y is already in x, and "xor x,x; add y,x" used otherwise (as this
is shorter than "mov y,x; add $0,x", both sequences being approximately
equal performance-wise).

> > 	movq	%rdx, %rax
> > 	xorl	%edx, %edx
> > 	addq	%rdi, %rax
> > 	adcq	%rsi, %rdx
> > to:
> > 	movq	%rdx, %rcx
> > 	movq	%rdi, %rax
> > 	movq	%rsi, %rdx
> > 	addq	%rcx, %rax
> > 	adcq	$0, %rdx
> >
> > which causes the testcase to fail under -m64.
> > Is this within your expectation?

You're right that the original (using xor) is better for pr43644-2.c's
test case:

unsigned __int128 foo(unsigned __int128 x, unsigned long long y)
{
  return x+y;
}

but the closely related (swapping the argument order):

unsigned __int128 bar(unsigned long long y, unsigned __int128 x)
{
  return x+y;
}

is better using "adcq $0" than having a superfluous xor.

Executive summary: This FAIL isn't serious.  I'll silence it soon.

> > BRs,
> > Haochen
> >
> > > Thanks,
> > > Roger
> > > --
[PATCH] Export value/mask known bits from CCP.
Currently CCP throws away the known 1 bits because VRP and irange have
traditionally only had a way of tracking known 0s (set_nonzero_bits).
With the ability to keep all the known bits in the irange, we can now
save this between passes.

OK?

gcc/ChangeLog:

	* tree-ssa-ccp.cc (ccp_finalize): Export value/mask known bits.
---
 gcc/tree-ssa-ccp.cc | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
index 0d0f02a8442..64d5fa81334 100644
--- a/gcc/tree-ssa-ccp.cc
+++ b/gcc/tree-ssa-ccp.cc
@@ -1020,11 +1020,9 @@ ccp_finalize (bool nonzero_p)
 	  else
 	    {
 	      unsigned int precision = TYPE_PRECISION (TREE_TYPE (val->value));
-	      wide_int nonzero_bits
-		= (wide_int::from (val->mask, precision, UNSIGNED)
-		   | wi::to_wide (val->value));
-	      nonzero_bits &= get_nonzero_bits (name);
-	      set_nonzero_bits (name, nonzero_bits);
+	      wide_int value = wi::to_wide (val->value);
+	      wide_int mask = wide_int::from (val->mask, precision, UNSIGNED);
+	      set_bitmask (name, value, mask);
 	    }
 	}

--
2.40.1
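To make the value/mask pair concrete, here is a toy version of the
known-bits propagation that bit-CCP performs (a simplified standalone
sketch, not GCC's implementation; only OR over 32-bit values is shown):

#include <cassert>
#include <cstdint>

// Toy version of the lattice CCP tracks: mask bits are unknown, value
// holds the known bits (normalized so value & mask == 0).
struct bits
{
  uint32_t value;
  uint32_t mask;
};

static bits
known_constant (uint32_t c)
{
  return { c, 0 };
}

// x | y: a result bit is known 1 if it is known 1 on either side, and
// known 0 only if it is known 0 on both sides.
static bits
bit_or (bits a, bits b)
{
  uint32_t known1 = (a.value & ~a.mask) | (b.value & ~b.mask);
  uint32_t known0 = (~a.value & ~a.mask) & (~b.value & ~b.mask);
  uint32_t mask = ~(known1 | known0);
  return { known1, mask };
}

int
main ()
{
  bits unknown = { 0, 0xFFFFFFFF };
  bits r = bit_or (unknown, known_constant (1));
  // After "x | 1" the low bit is known to be 1 -- the "known 1 bits"
  // that set_nonzero_bits alone could not preserve.
  assert ((r.mask & 1) == 0 && (r.value & 1) == 1);
  return 0;
}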
[PATCH] Export value/mask known bits from IPA.
Currently IPA throws away the known 1 bits because VRP and irange have
traditionally only had a way of tracking known 0s (set_nonzero_bits).
With the ability to keep all the known bits in the irange, we can now
save this between passes.

OK?

gcc/ChangeLog:

	* ipa-prop.cc (ipcp_update_bits): Export value/mask known bits.
---
 gcc/ipa-prop.cc | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index d2b998f8af5..5d790ff1265 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -5853,10 +5853,9 @@ ipcp_update_bits (struct cgraph_node *node, ipcp_transformation *ts)
 	{
 	  unsigned prec = TYPE_PRECISION (TREE_TYPE (ddef));
 	  signop sgn = TYPE_SIGN (TREE_TYPE (ddef));
-
-	  wide_int nonzero_bits = wide_int::from (bits[i]->mask, prec, UNSIGNED)
-				  | wide_int::from (bits[i]->value, prec, sgn);
-	  set_nonzero_bits (ddef, nonzero_bits);
+	  wide_int mask = wide_int::from (bits[i]->mask, prec, UNSIGNED);
+	  wide_int value = wide_int::from (bits[i]->value, prec, sgn);
+	  set_bitmask (ddef, value, mask);
 	}
       else
 	{
--
2.40.1
Re: [gcc r14-2455] riscv: Prepare backend for index registers
On Mon, Jul 17, 2023 at 9:31 AM Andrew Pinski wrote:
>
> On Sun, Jul 16, 2023 at 11:49 PM Christoph Müllner wrote:
> >
> > On Fri, Jul 14, 2023 at 12:28 PM Andreas Schwab wrote:
> > >
> > > Why didn't you test that?
> >
> > Thanks for reporting, and sorry for introducing this warning.
> >
> > I test all patches before sending them.
> > In the case of RISC-V backend patches, I build a 2-stage
> > cross-toolchain and run all regression tests for RV32 and RV64 (using
> > QEMU).
> > Testing is done with and without patches applied to identify regressions.
> >
> > The build process shows a lot of warnings.  Therefore I did not
> > investigate finding a way to use -Werror.
> > This means that looking for compiler warnings is a manual step, and I
> > might miss one while scrolling through the logs.
>
> If you are building a cross compiler, and want to clean up warnings,
> first build a native compiler and then build the cross using that.

Ok, will adjust my workflow accordingly.

> Also, it may be worth finding a way to do native bootstraps on riscv to
> test patches, rather than doing just cross builds when testing backend
> patches, especially since the bootstrap is more likely than the GCC
> testsuite to find backend issues and such.

Yes, using the patch-under-testing to build a toolchain can identify
issues that the testsuite can't find.  I did that a couple of times in a
QEMU environment, but I prefer the cross-toolchain approach because it
is faster.  For patches that have a bigger impact, I test the toolchain
with SPEC CPU 2017.

Thanks,
Christoph

> > Sorry for the inconvenience,
> > Christoph
> >
> > > ../../gcc/config/riscv/riscv.cc: In function 'int
> > > riscv_regno_ok_for_index_p(int)':
> > > ../../gcc/config/riscv/riscv.cc:864:33: error: unused parameter 'regno'
> > > [-Werror=unused-parameter]
> > >   864 | riscv_regno_ok_for_index_p (int regno)
> > >       |                             ^
> > > cc1plus: all warnings being treated as errors
> > > make[3]: *** [Makefile:2499: riscv.o] Error 1
> > >
> > > --
> > > Andreas Schwab, sch...@linux-m68k.org
> > > GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
> > > "And now for something completely different."
Re: [gcc r14-2455] riscv: Prepare backend for index registers
On Mon, Jul 17, 2023 at 9:44 AM Andreas Schwab wrote:
>
> On Jul 17 2023, Christoph Müllner wrote:
>
> > My host compiler is: gcc version 13.1.1 20230614 (Red Hat 13.1.1-4) (GCC)
>
> Too old.

Ok, understood.

Thanks,
Christoph

> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
> "And now for something completely different."
[PATCH] Use substituted GDCFLAGS
Use the substituted value for GDCFLAGS instead of hardcoding $(CFLAGS), so
that the subdir configure scripts use the configured value.

	* configure.ac (GDCFLAGS): Set default from ${CFLAGS}.
	* configure: Regenerate.
	* Makefile.in (GDCFLAGS): Substitute @GDCFLAGS@.
---
 Makefile.in  | 2 +-
 configure    | 1 +
 configure.ac | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Makefile.in b/Makefile.in
index 04307ca561b..144bccd2603 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -444,7 +444,7 @@ LIBCFLAGS = $(CFLAGS)
 CXXFLAGS = @CXXFLAGS@
 LIBCXXFLAGS = $(CXXFLAGS) -fno-implicit-templates
 GOCFLAGS = $(CFLAGS)
-GDCFLAGS = $(CFLAGS)
+GDCFLAGS = @GDCFLAGS@
 GM2FLAGS = $(CFLAGS)

 # Pass additional PGO and LTO compiler options to the PGO build.
diff --git a/configure b/configure
index 0d3f5c6455d..3269da9829f 100755
--- a/configure
+++ b/configure
@@ -12947,6 +12947,7 @@ fi

+GDCFLAGS=${GDCFLAGS-${CFLAGS}}

 # Target tools.
diff --git a/configure.ac b/configure.ac
index dddab2a56d8..d07a0fa7698 100644
--- a/configure.ac
+++ b/configure.ac
@@ -3662,6 +3662,7 @@ AC_SUBST(CFLAGS)
 AC_SUBST(CXXFLAGS)
 AC_SUBST(GDC)
 AC_SUBST(GDCFLAGS)
+GDCFLAGS=${GDCFLAGS-${CFLAGS}}

 # Target tools.
 AC_ARG_WITH([build-time-tools],
--
2.41.0

--
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
[PATCH] tree-optimization/110669 - bogus matching of loop bitop
The matching code lacked a check that we end up with a PHI node in the
loop header.  This caused us to match a random PHI argument, now caught
by the extra PHI_ARG_DEF_FROM_EDGE checking.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	PR tree-optimization/110669
	* tree-scalar-evolution.cc
	(analyze_and_compute_bitop_with_inv_effect): Check we matched
	a header PHI.

	* gcc.dg/torture/pr110669.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110669.c | 15 +++++++++++++++
 gcc/tree-scalar-evolution.cc            |  1 +
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110669.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110669.c b/gcc/testsuite/gcc.dg/torture/pr110669.c
new file mode 100644
index 000..b0a9ea448f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110669.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+int g_29, func_47_p_48, func_47_p_51_l_129;
+void func_47_p_51()
+{
+  for (;;)
+    {
+      func_47_p_51_l_129 = 0;
+      for (; func_47_p_51_l_129 <= 1; func_47_p_51_l_129 += 1)
+	{
+	  short *l_160 = (short *)(__UINTPTR_TYPE__)(func_47_p_48 || *l_160);
+	  *l_160 &= g_29;
+	}
+    }
+}
diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
index ba47a684f4b..2abe8fa0b90 100644
--- a/gcc/tree-scalar-evolution.cc
+++ b/gcc/tree-scalar-evolution.cc
@@ -3674,6 +3674,7 @@ analyze_and_compute_bitop_with_inv_effect (class loop* loop, tree phidef,
   if (TREE_CODE (match_op[1]) != SSA_NAME
       || !expr_invariant_in_loop_p (loop, match_op[0])
       || !(header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (match_op[1])))
+      || gimple_bb (header_phi) != loop->header
       || gimple_phi_num_args (header_phi) != 2)
     return NULL_TREE;
--
2.35.3
[PATCH V2] RISC-V: Support non-SLP unordered reduction
This patch adds reduc_*_scal to support reduction auto-vectorization.
It uses COND_LEN_* + reduc_*_scal to support unordered non-SLP
auto-vectorization.

Consider this following case:

int __attribute__ ((noipa))
and_loop (int32_t *__restrict x, int32_t n, int res)
{
  for (int i = 0; i < n; ++i)
    res &= x[i];
  return res;
}

ASM:

and_loop:
	ble	a1,zero,.L4
	vsetvli	a3,zero,e32,m1,ta,ma
	vmv.v.i	v1,-1
.L3:
	vsetvli	a5,a1,e32,m1,tu,ma     --> MUST BE "TU".
	slli	a4,a5,2
	sub	a1,a1,a5
	vle32.v	v2,0(a0)
	add	a0,a0,a4
	vand.vv	v1,v2,v1
	bne	a1,zero,.L3
	vsetivli	zero,1,e32,m1,ta,ma
	vmv.v.i	v2,-1
	vsetvli	a3,zero,e32,m1,ta,ma
	vredand.vs	v1,v1,v2
	vmv.x.s	a5,v1
	and	a0,a2,a5
	ret
.L4:
	mv	a0,a2
	ret

This also fixes a bug in the VSETVL pass which was exposed by the
reduction testcases.  SLP reduction and floating-point in-order
reduction are not supported yet.

gcc/ChangeLog:

	* config/riscv/autovec.md (reduc_plus_scal_<mode>): New pattern.
	(reduc_smax_scal_<mode>): Ditto.
	(reduc_umax_scal_<mode>): Ditto.
	(reduc_smin_scal_<mode>): Ditto.
	(reduc_umin_scal_<mode>): Ditto.
	(reduc_and_scal_<mode>): Ditto.
	(reduc_ior_scal_<mode>): Ditto.
	(reduc_xor_scal_<mode>): Ditto.
	* config/riscv/riscv-protos.h (enum insn_type): Add reduction.
	(expand_reduction): New function.
	* config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
	(emit_vlmax_fp_reduction_insn): Ditto.
	(get_m1_mode): Ditto.
	(expand_cond_len_binop): Fix name.
	(expand_reduction): New function.
	* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix VSETVL bug.
	(validate_change_or_fail): New function.
	(change_insn): Fix VSETVL bug.
	(change_vsetvl_insn): Ditto.
	(pass_vsetvl::backward_demand_fusion): Ditto.
	(pass_vsetvl::df_post_optimization): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
	* gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
	* gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
	* gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
	* gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
	* gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
	* gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.
---
 gcc/config/riscv/autovec.md                   | 138 ++
 gcc/config/riscv/riscv-protos.h               |   2 +
 gcc/config/riscv/riscv-v.cc                   |  84 ++-
 gcc/config/riscv/riscv-vsetvl.cc              |  57 ++--
 .../riscv/rvv/autovec/reduc/reduc-1.c         | 118 +++
 .../riscv/rvv/autovec/reduc/reduc-2.c         | 129 
 .../riscv/rvv/autovec/reduc/reduc-3.c         |  65 +
 .../riscv/rvv/autovec/reduc/reduc-4.c         |  59 
 .../riscv/rvv/autovec/reduc/reduc_run-1.c     |  56 +++
 .../riscv/rvv/autovec/reduc/reduc_run-2.c     |  79 ++
 .../riscv/rvv/autovec/reduc/reduc_run-3.c     |  49 +++
 .../riscv/rvv/autovec/reduc/reduc_run-4.c     |  66 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp    |   2 +
 13 files changed, 887 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 64a41bd7101..8cdec75bacf 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1554,3 +1554,141 @@
   riscv_vector::expand_cond_len_ternop (icode, operands);
   DONE;
 })
+
+;; =
+;; == Reductions
+;; =
+
+;; -
+;; [INT] Tree reductions
+;; -
+;; Includes:
+;; - vredsum.vs
+;; - vredmaxu.vs
+;; - vredmax.vs
+;; - vredminu.vs
+;; - vredmin.vs
+;; - vredand.vs
+;; - vredor.vs
+;; - vredxor.vs
+;; -
Re: [WIP RFC] Add support for keyword-based attributes
Richard Biener writes:
> On Fri, Jul 14, 2023 at 5:58 PM Richard Sandiford via Gcc-patches wrote:
>>
>> Summary: We'd like to be able to specify some attributes using
>> keywords, rather than the traditional __attribute__ or [[...]]
>> syntax.  Would that be OK?
>>
>> In more detail:
>>
>> We'd like to add some new target-specific attributes for Arm SME.
>> These attributes affect semantics and code generation and so they
>> can't simply be ignored.
>>
>> Traditionally we've done this kind of thing by adding GNU attributes,
>> via TARGET_ATTRIBUTE_TABLE in GCC's case.  The problem is that both
>> GCC and Clang have traditionally only warned about unrecognised GNU
>> attributes, rather than raising an error.  Older compilers might
>> therefore be able to look past some uses of the new attributes and
>> still produce object code, even though that object code is almost
>> certainly going to be wrong.  (The compilers will also emit a
>> default-on warning, but that might go unnoticed when building a big
>> project.)
>>
>> There are some existing attributes that similarly affect semantics
>> in ways that cannot be ignored.  vector_size is one obvious example.
>> But that doesn't make it a good thing. :)
>>
>> Also, C++ says this for standard [[...]] attributes:
>>
>>   For an attribute-token (including an attribute-scoped-token)
>>   not specified in this document, the behavior is implementation-defined;
>>   any such attribute-token that is not recognized by the implementation
>>   is ignored.
>>
>> which doubles down on the idea that attributes should not be used
>> for necessary semantic information.
>>
>> One of the attributes we'd like to add provides a new way of compiling
>> existing code.  The attribute doesn't require SME to be available;
>> it just says that the code must be compiled so that it can run in either
>> of two modes.  This is probably the most dangerous attribute of the set,
>> since compilers that ignore it would just produce normal code.  That
>> code might work in some test scenarios, but it would fail in others.
>>
>> The feeling from the Clang community was therefore that these SME
>> attributes should use keywords instead, so that the keywords trigger
>> an error with older compilers.
>>
>> However, it seemed wrong to define new SME-specific grammar rules,
>> since the underlying problem is pretty generic.  We therefore
>> proposed having a type of keyword that can appear exactly where
>> a standard [[...]] attribute can appear and that appertains to
>> exactly what a standard [[...]] attribute would appertain to.
>> No divergence or cherry-picking is allowed.
>>
>> For example:
>>
>>   [[arm::foo]]
>>
>> would become:
>>
>>   __arm_foo
>>
>> and:
>>
>>   [[arm::bar(args)]]
>>
>> would become:
>>
>>   __arm_bar(args)
>>
>> It wouldn't be possible to retrofit arguments to a keyword that
>> previously didn't take arguments, since that could lead to parsing
>> ambiguities.  So when a keyword is first added, a binding decision
>> would need to be made whether the keyword always takes arguments
>> or is always standalone.
>>
>> For that reason, empty argument lists are allowed for keywords,
>> even though they're not allowed for [[...]] attributes.
>>
>> The argument-less version was accepted into Clang, and I have a
>> follow-on patch for handling arguments.  Would the same thing be OK
>> for GCC, in both the C and C++ frontends?
>>
>> The patch below is a proof of concept for the C frontend.  It doesn't
>> bootstrap due to warnings about uninitialised fields.  And it doesn't
>> have tests.  But I did test it locally with various combinations of
>> attribute_spec and it seemed to work as expected.
>>
>> The impact on the C frontend seems to be pretty small.  It looks like
>> the impact on the C++ frontend would be a bit bigger, but not much.
>>
>> The patch contains a logically unrelated change: c-common.h set aside
>> 16 keywords for address spaces, but of the in-tree ports, the maximum
>> number of keywords used is 6 (for amdgcn).  The patch therefore changes
>> the limit to 8 and uses 8 keywords for the new attributes.  This keeps
>> the number of reserved ids <= 256.
>
> If you had added __arm(bar(args)) instead of __arm_bar(args) you would
> only need one additional keyword - we could set aside a similar one for
> each target then.  I realize that double-nesting of arguments might
> prove a bit challenging but still.

Yeah, that would work.

> In any case I also think that attributes are what you want and their
> ugliness/issues are not worse than the ugliness/issues of the keyword
> approach IMHO.

I guess the ugliness of keywords is the double underscore?  What are
the issues with the keyword approach though?  If it's two underscores
vs miscompilation then it's not obvious that two underscores should
lose.

Richard
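To see why the quoted standard wording is a problem for semantics-bearing
attributes, consider this small self-contained example; the scope and
name `acme::important` are invented, so conforming compilers simply drop
the attribute (at most emitting a -Wattributes warning):

// Illustration only: "acme::important" is an invented vendor attribute.
// Compilers that do not recognize it must ignore it, so any semantics
// it was meant to carry are silently lost -- exactly the failure mode
// the keyword proposal is trying to avoid.
[[acme::important]] int
twice (int x)
{
  return 2 * x;
}

int
main ()
{
  return twice (2) == 4 ? 0 : 1;
}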
Re: Re: [PATCH] RISC-V: Support non-SLP unordered reduction
Address comment.  V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624638.html

I added:

+/* Change insn and assert that the change always happens.  */
+static void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}

as you suggested.  Could you take a look again?

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-07-17 15:00
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; rdapp.gcc; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support non-SLP unordered reduction
Re: [PATCH 1/2] [i386] Support type _Float16/__bf16 independent of SSE2.
I'd like to ping for this patch (only patch 1/2; for patch 2/2, I think
that may not be necessary).

On Mon, May 15, 2023 at 9:20 AM Hongtao Liu wrote:
>
> ping.
>
> On Fri, Apr 21, 2023 at 9:55 PM liuhongt wrote:
> >
> > > > +  if (!TARGET_SSE2)
> > > > +    {
> > > > +      if (c_dialect_cxx ()
> > > > +	  && cxx_dialect > cxx20)
> > >
> > > Formatting, both conditions are short, so just put them on one line.
> >
> > Changed.
> >
> > > But for the C++23 macros, more importantly I think we really should
> > > also in ix86_target_macros_internal add
> > >   if (c_dialect_cxx ()
> > >       && cxx_dialect > cxx20
> > >       && (isa_flag & OPTION_MASK_ISA_SSE2))
> > >     {
> > >       def_or_undef (parse_in, "__STDCPP_FLOAT16_T__");
> > >       def_or_undef (parse_in, "__STDCPP_BFLOAT16_T__");
> > >     }
> > > plus associated libstdc++ changes.  It can be done incrementally though.
> >
> > Added in PATCH 2/2.
> >
> > > > +  if (flag_building_libgcc)
> > > > +    {
> > > > +      /* libbid uses __LIBGCC_HAS_HF_MODE__ and __LIBGCC_HAS_BF_MODE__
> > > > +	 to check backend support of _Float16 and __bf16 type.  */
> > >
> > > That is actually the case only for HFmode, but not for BFmode right now.
> > > So, we need further work.  One is to add the BFmode support in there,
> > > and another one is make sure the _Float16 <-> _Decimal* and __bf16 <->
> > > _Decimal* conversions are compiled in also if not -msse2 by default.
> > > One way to do that is wrap the HF and BF mode related functions on x86
> > > #ifndef __SSE2__ into the pragmas like intrin headers use (but then
> > > perhaps we don't need to undef this stuff here), another is not provide
> > > the hf/bf support in that case from the TUs where they are provided now,
> > > but from a different one which would be compiled with -msse2.
> >
> > Add CFLAGS-_hf_to_sd.c += -msse2, similar for other files in libbid,
> > just like we did before for HFtype softfp.  Then no need to undef
> > libgcc macros.
> >
> > > >    /* We allowed the user to turn off SSE for kernel mode.  Don't crash
> > > >       if some less clueful developer tries to use floating-point
> > > >       anyway.  */
> > > > -  if (needed_sseregs && !TARGET_SSE)
> > > > +  if (needed_sseregs
> > > > +      && (!TARGET_SSE
> > > > +	  || (VALID_SSE2_TYPE_MODE (mode)
> > > > +	      && !TARGET_SSE2)))
> > >
> > > Formatting, no need to split this up that much.
> > >   if (needed_sseregs
> > >       && (!TARGET_SSE
> > > 	  || (VALID_SSE2_TYPE_MODE (mode) && !TARGET_SSE2)))
> > > or even better
> > >   if (needed_sseregs
> > >       && (!TARGET_SSE || (VALID_SSE2_TYPE_MODE (mode) && !TARGET_SSE2)))
> > > will do it.
> >
> > Changed.
> >
> > > Instead of this, just use
> > >   if (!float16_type_node)
> > >     {
> > >       float16_type_node = ix86_float16_type_node;
> > >       callback (float16_type_node);
> > >       float16_type_node = NULL_TREE;
> > >     }
> > >   if (!bfloat16_type_node)
> > >     {
> > >       bfloat16_type_node = ix86_bf16_type_node;
> > >       callback (bfloat16_type_node);
> > >       bfloat16_type_node = NULL_TREE;
> > >     }
> >
> > Changed.
> >
> > > > +static const char *
> > > > +ix86_invalid_conversion (const_tree fromtype, const_tree totype)
> > > > +{
> > > > +  if (element_mode (fromtype) != element_mode (totype))
> > > > +    {
> > > > +      /* Do no allow conversions to/from BFmode/HFmode scalar types
> > > > +	 when TARGET_SSE2 is not available.  */
> > > > +      if ((TYPE_MODE (fromtype) == BFmode
> > > > +	   || TYPE_MODE (fromtype) == HFmode)
> > > > +	  && !TARGET_SSE2)
> > >
> > > First of all, not really sure if this should be purely about scalar
> > > modes, not also complex and vector modes involving those inner modes.
> > > Because complex or vector modes with BF/HF elements will be without
> > > TARGET_SSE2 for sure lowered into scalar code and that can't be handled
> > > either.
> > > So if (!TARGET_SSE2 && GET_MODE_INNER (TYPE_MODE (fromtype)) == BFmode)
> > > or even better
> > > if (!TARGET_SSE2 && element_mode (fromtype) == BFmode)
> > > ?
> > > Or even better remember the 2 modes above into machine_mode temporaries
> > > and just use those in the != comparison and for the checks?
> > >
> > > Also, I think it is weird to tell user %<__bf16%> or %<_Float16%> when
> > > we know which one it is.  Just return separate messages?
> >
> > Changed.
> >
> > > > +  /* Reject all single-operand operations on BFmode/HFmode except for &
> > > > +     when TARGET_SSE2 is not available.  */
> > > > +  if ((element_mode (type) == BFmode || element_mode (type) == HFmode)
> > > > +      && !TARGET_SSE2 && op != ADDR_EXPR)
> > > > +    return N_("operation not permitted on type %<__bf16%> "
> > > > +	      "or %<_Float16%> without option %<-msse2%>");
> > >
> > > Similarly.  Also, check !TARGET_SSE2 first as inexpensive one.
> >
> > Changed.
> >
> > Bootstrapped and regtested on
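As a small illustration of the behavior described in the review (a
sketch assuming an x86 compiler with __bf16 support; with the patch, the
type is valid without -msse2, but most operations on it are rejected,
address-taking being the stated exception):

// Illustration only, based on the review above.
__bf16 g;

__bf16 *
addr_of_g ()
{
  return &g;	// ADDR_EXPR is explicitly permitted without -msse2.
}

int
main ()
{
  return addr_of_g () == &g ? 0 : 1;
}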
Re: [PATCH] vect: Initialize new_temp to avoid false positive warning [PR110652]
on 2023/7/17 14:39, Richard Biener wrote:
> On Mon, Jul 17, 2023 at 4:22 AM Kewen.Lin wrote:
>>
>> Hi,
>>
>> As PR110652 and its duplicate PRs show, there could be one build error
>>
>>   error: 'new_temp' may be used uninitialized
>>
>> for some build configurations.  It's a false positive warning (or
>> error at -Werror), but in order to make the build succeed, this patch
>> initializes the reported variable 'new_temp' to NULL_TREE.
>>
>> Confirmed this patch fixed the reported issue in PR110652 (with the
>> same configuration).
>>
>> Is it ok for trunk?
>
> OK.

Thanks Richi, pushed as r14-2560.

BR,
Kewen
[PATCH] riscv: Fix warning in riscv_regno_ok_for_index_p
From: Christoph Müllner

The variable `regno` is currently not used in riscv_regno_ok_for_index_p(),
which triggers a compiler warning.  Let's address this.

Fixes: 423604278ed5 ("riscv: Prepare backend for index registers")
Reported-by: Juzhe Zhong
Reported-by: Andreas Schwab
Signed-off-by: Christoph Müllner

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_regno_ok_for_index_p): Remove
	parameter name from declaration of unused parameter.

Signed-off-by: Christoph Müllner
---
 gcc/config/riscv/riscv.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6ed735d6983..ae3c034e76e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -861,7 +861,7 @@ riscv_index_reg_class ()
    but extensions might support that.  */

 int
-riscv_regno_ok_for_index_p (int regno)
+riscv_regno_ok_for_index_p (int)
 {
   return 0;
 }
--
2.41.0
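The fix relies on a standard C++ idiom: a parameter left unnamed (or
with its name commented out) is understood as intentionally unused, so
-Wunused-parameter stays quiet.  A standalone illustration (not the GCC
code itself):

// Omitting the parameter name silences -Wunused-parameter while
// keeping the function's signature unchanged for callers.
static int
regno_ok_for_index_p (int /* regno */)
{
  return 0;
}

int
main ()
{
  return regno_ok_for_index_p (5);
}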
[PATCH 0/9] Native complex operations
Hi,

I have recently started a discussion about exposing complex operations
directly to the backends, to better exploit ISAs with complex
instructions.  The title of the original message is "[RFC] Exposing
complex numbers to target backends" [1].

This message starts a series of 9 patches of the implementation that
I've done.  8 patches are about generic code, split by features.  The
last one is an experimental update of the x86 backend which exploits the
newly exposed complex operations.

My original work was on the KVX backend from Kalray, where the ISA has
complex instructions.  So I have obtained huge performance gains, on par
with code which uses builtins.  On x86 there are gains without
-ffast-math because fewer calls to helpers are performed, but gains are
marginal with -ffast-math due to the lack of complex instructions.

[1] https://gcc.gnu.org/pipermail/gcc/2023-July/241998.html

Summary of the 9 patches:
1/9: Conditional lowering of complex operations using the backend +
     update on the TREE complex constants
2/9: Move of read_complex_part and write_complex_part to target hooks to
     let the backend decide
3/9: Add a gen_rtx_complex target hook to let the backend use its
     preferred representation of complex in rtl
4/9: Support and optimize the use of classical registers to represent
     complex
5/9: Expose the conjugate operation down to the backend
6/9: Expose and optimize complex rotations using internal functions and
     conditional lowering
7/9: Allow the vectorizer to work on complex types like it does on
     scalars
8/9: Add explicit vectors of complex.  This remains optional
9/9: Experimental update on the x86 backend to exploit some of the
     previous features

The following sections explain the features added by each patch and
illustrate them with examples on KVX, because the backend supports all
the new features.  All examples are compiled with -O2 -ffast-math.
Patches 1 to 4 are required to have the minimal set of features which
allows a backend to exploit native complex operations.

PATCH 1/9:
 - Change the TREE complex constants by adding a new field called "both"
   in the tree_complex struct, which holds a vector of the real and
   imaginary parts.  This makes the handling of constants during the
   cplxlower and expand passes easier.  Any change to one part will also
   affect the vector, so very few changes are needed elsewhere.
 - Check in the optab for a complex pattern for almost all operations in
   the cplxlower pass.  The lowering is done only if an optab code is
   found.  Some conditions on the presence of constants in the operands
   were also added, which can be a subject of discussion.
 - Add a complex component for both parts in the cplxlower pass.  When
   an operation is lowered, the both part is recreated using a
   COMPLEX_EXPR.  When an operation is kept non-lowered, real and
   imaginary parts are extracted using REALPART_EXPR and IMAGPART_EXPR.

PATCH 2/9:
 - Move the inner implementation of read_complex_part and
   write_complex_part to target hooks.  This allows each backend to have
   its own implementation, while the default ones are almost the same as
   before.  Going back to standard functions may be a point to discuss
   if no incompatible changes are done by the target to the default
   implementation.
 - Change the signature of read_complex_part and write_complex_part to
   allow both parts as a part.  This affects all the calls to these
   functions.

PATCH 3/9:
 - Add a new target hook to replace gen_rtx_CONCAT when a complex
   element needs to be created.  The default implementation uses
   gen_rtx_CONCAT, but the KVX implementation simply creates a register
   with a complex type.  A previous attempt was to deal with
   generating_concat_p in gen_rtx_reg, but no good solution was found.

PATCH 4/9:
 - Adapt and optimize for the use of native complex operations in rtl,
   as well as registers of complex types.  After this patch, it's now
   possible to re-implement the three new hooks and write some complex
   patterns.

Considering the following example:

_Complex float mul(_Complex float a, _Complex float b)
{
  return a * b;
}

Previously, the generated code was:

mul:
	copyw $r3 = $r0
	extfz $r5 = $r0, 32+32-1, 32   ; extract imag part
	;;	# (end cycle 0)
	fmulw $r4 = $r3, $r1           ; float mul
	;;	# (end cycle 1)
	fmulw $r2 = $r5, $r1           ; float mul
	extfz $r1 = $r1, 32+32-1, 32   ; extract real part
	;;	# (end cycle 2)
	ffmsw $r4 = $r5, $r1           ; float FMS
	;;	# (end cycle 5)
	ffmaw $r2 = $r3, $r1           ; float FMA
	;;	# (end cycle 6)
	insf $r0 = $r4, 32+0-1, 0      ; insert real part
	;;	# (end cycle 9)
	insf $r0 = $r2, 32+32-1, 32    ; insert imag part
	ret
	;;	# (end cycle 10)

The KV
[PATCH 1/9] Native complex operations: Conditional lowering
Allow the cplxlower pass to identify if an operation does not need to be
lowered through optabs.  In this case, lowering is not performed.  The
cplxlower pass now has to handle a mix of lowered and non-lowered
operations.  A quick access to both parts of a complex constant is also
implemented.

gcc/lto/ChangeLog:

	* lto-common.cc (compare_tree_sccs_1): Handle both parts of a
	complex constant.

gcc/ChangeLog:

	* coretypes.h: Add enum for complex parts.
	* gensupport.cc (match_pattern): Add complex types.
	* lto-streamer-out.cc (DFS::DFS_write_tree_body):
	(hash_tree): Handle both parts of a complex constant.
	* tree-complex.cc (get_component_var): Support handling of both
	parts of a complex.
	(get_component_ssa_name): Likewise.
	(set_component_ssa_name): Likewise.
	(extract_component): Likewise.
	(update_complex_components): Likewise.
	(update_complex_components_on_edge): Likewise.
	(update_complex_assignment): Likewise.
	(update_phi_components): Likewise.
	(expand_complex_move): Likewise.
	(expand_complex_asm): Update with complex_part_t.
	(complex_component_cst_p): New: check if a complex component is
	a constant.
	(target_native_complex_operation): New: check if a complex
	operation is supported natively by the backend, through the
	optab.
	(expand_complex_operations_1): Conditionally lower ops.
	(tree_lower_complex): Support handling of both parts of a
	complex.
	* tree-core.h (struct GTY): Add field for both parts of the
	tree_complex struct.
	* tree-streamer-in.cc (lto_input_ts_complex_tree_pointers):
	Handle both parts of a complex constant.
	* tree-streamer-out.cc (write_ts_complex_tree_pointers):
	Likewise.
	* tree.cc (build_complex): Likewise.
	* tree.h (class auto_suppress_location_wrappers):
	(type_has_mode_precision_p): Add special case for complex.
---
 gcc/coretypes.h          |   9 +
 gcc/gensupport.cc        |   2 +
 gcc/lto-streamer-out.cc  |   2 +
 gcc/lto/lto-common.cc    |   2 +
 gcc/tree-complex.cc      | 434 +--
 gcc/tree-core.h          |   1 +
 gcc/tree-streamer-in.cc  |   1 +
 gcc/tree-streamer-out.cc |   1 +
 gcc/tree.cc              |   8 +
 gcc/tree.h               |  15 +-
 10 files changed, 363 insertions(+), 112 deletions(-)

diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index ca8837cef67..a000c104b53 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -443,6 +443,15 @@ enum optimize_size_level
   OPTIMIZE_SIZE_MAX
 };

+/* part of a complex */
+
+typedef enum
+{
+  REAL_P = 0,
+  IMAG_P = 1,
+  BOTH_P = 2
+} complex_part_t;
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second either NULL if the pointer to
    pointer points into a GC object or the actual pointer address if
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 959d1d9c83c..9aa2ba69fcd 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -3746,9 +3746,11 @@ match_pattern (optab_pattern *p, const char *name, const char *pat)
 	    break;
 	  if (*p == 0
 	      && (! force_int || mode_class[i] == MODE_INT
+		  || mode_class[i] == MODE_COMPLEX_INT
 		  || mode_class[i] == MODE_VECTOR_INT)
 	      && (! force_partial_int
 		  || mode_class[i] == MODE_INT
+		  || mode_class[i] == MODE_COMPLEX_INT
 		  || mode_class[i] == MODE_PARTIAL_INT
 		  || mode_class[i] == MODE_VECTOR_INT)
 	      && (! force_float
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 5ffa8954022..38c48e44867 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -985,6 +985,7 @@ DFS::DFS_write_tree_body (struct output_block *ob,
     {
       DFS_follow_tree_edge (TREE_REALPART (expr));
       DFS_follow_tree_edge (TREE_IMAGPART (expr));
+      DFS_follow_tree_edge (TREE_COMPLEX_BOTH_PARTS (expr));
     }

   if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
@@ -1417,6 +1418,7 @@ hash_tree (struct streamer_tree_cache_d *cache, hash_map<tree, hashval_t> *map,
     {
       visit (TREE_REALPART (t));
       visit (TREE_IMAGPART (t));
+      visit (TREE_COMPLEX_BOTH_PARTS (t));
     }

   if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 703e665b698..f647ee62f9e 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -1408,6 +1408,8 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
     {
       compare_tree_edges (TREE_REALPART (t1), TREE_REALPART (t2));
       compare_tree_edges (TREE_IMAGPART (t1), TREE_IMAGPART (t2));
+      compare_tree_edges (TREE_COMPLEX_BOTH_PARTS (t1),
+			  TREE_COMPLEX_BOTH_PARTS (t2));
     }

   if (CODE_
[PATCH 2/9] Native complex operations: Move functions to hooks
Move read_complex_part and write_complex_part to target hooks. Their signatures also change because the type of the part argument is now complex_part_t. Calls to these functions are updated accordingly.

gcc/ChangeLog:
* target.def: Define hooks for read_complex_part and write_complex_part
* targhooks.cc (default_read_complex_part): New: default implementation of read_complex_part (default_write_complex_part): New: default implementation of write_complex_part
* targhooks.h: Add default_read_complex_part and default_write_complex_part
* doc/tm.texi: Document the new TARGET_READ_COMPLEX_PART and TARGET_WRITE_COMPLEX_PART hooks
* doc/tm.texi.in: Add TARGET_READ_COMPLEX_PART and TARGET_WRITE_COMPLEX_PART
* expr.cc (read_complex_part): Call TARGET_READ_COMPLEX_PART hook (write_complex_part): Call TARGET_WRITE_COMPLEX_PART hook
* expr.h: Update function signatures of read_complex_part and write_complex_part
* builtins.cc (expand_ifn_atomic_compare_exchange_into_call): Update calls to read_complex_part and write_complex_part (expand_ifn_atomic_compare_exchange): Likewise
* expmed.cc (flip_storage_order): Likewise (clear_storage_hints): Likewise and write_complex_part (emit_move_complex_push): Likewise (emit_move_complex_parts): Likewise (expand_assignment): Likewise (expand_expr_real_2): Likewise (expand_expr_real_1): Likewise (const_vector_from_tree): Likewise
* internal-fn.cc (expand_arith_set_overflow): Likewise (expand_arith_overflow_result_store): Likewise (expand_addsub_overflow): Likewise (expand_neg_overflow): Likewise (expand_mul_overflow): Likewise (expand_arith_overflow): Likewise (expand_UADDC): Likewise

--- gcc/builtins.cc| 8 +-- gcc/doc/tm.texi| 10 +++ gcc/doc/tm.texi.in | 4 ++ gcc/expmed.cc | 4 +- gcc/expr.cc| 164 + gcc/expr.h | 5 +- gcc/internal-fn.cc | 20 +++--- gcc/target.def | 18 + gcc/targhooks.cc | 139 ++ gcc/targhooks.h| 5 ++ 10 files changed, 224 insertions(+), 153 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc index 6dff5214ff8..37da6bcae6f 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -6347,8 +6347,8 @@ expand_ifn_atomic_compare_exchange_into_call (gcall *call, machine_mode mode) if (GET_MODE (boolret) != mode) boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1); x = force_reg (mode, x); - write_complex_part (target, boolret, true, true); - write_complex_part (target, x, false, false); + write_complex_part (target, boolret, IMAG_P, true); + write_complex_part (target, x, REAL_P, false); } } @@ -6403,8 +6403,8 @@ expand_ifn_atomic_compare_exchange (gcall *call) rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); if (GET_MODE (boolret) != mode) boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1); - write_complex_part (target, boolret, true, true); - write_complex_part (target, oldval, false, false); + write_complex_part (target, boolret, IMAG_P, true); + write_complex_part (target, oldval, REAL_P, false); } }

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 95ba56e05ae..87997b76338 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -4605,6 +4605,16 @@ to return a nonzero value when it is required, the compiler will run out of spill registers and print a fatal error message. @end deftypefn +@deftypefn {Target Hook} rtx TARGET_READ_COMPLEX_PART (rtx @var{cplx}, complex_part_t @var{part}) +This hook should return the rtx representing the specified @var{part} of the complex given by @var{cplx}. + @var{part} can be the real part, the imaginary part, or both of them.
+@end deftypefn + +@deftypefn {Target Hook} void TARGET_WRITE_COMPLEX_PART (rtx @var{cplx}, rtx @var{val}, complex_part_t @var{part}, bool @var{undefined_p}) +This hook should move the rtx value given by @var{val} to the specified @var{part} of the complex given by @var{cplx}. + @var{part} can be the real part, the imaginary part, or both of them. +@end deftypefn + @node Scalar Return @subsection How Scalar Function Values Are Returned @cindex return values in registers

diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 4ac96dc357d..efbf972e6a7 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3390,6 +3390,10 @@ stack. @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P +@hook TARGET_READ_COMPLEX_PART + +@hook TARGET_WRITE_COMPLEX_PART + @node Scalar Return @subsection How Scalar Function Values Are Returned @cindex return values in registers

diff --git a/gcc/expmed.cc b/gcc/expmed.cc index fbd4ce2d42f..2f787cc
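For illustration, a minimal sketch of the new call convention (assembled from the hunks above, not a hunk from the patch; the complex_part_t argument replaces the old bool imag_p flag at every call site):

  /* Sketch only: copy both parts of OP into TARGET through the hooks.  */
  static void
  copy_complex_parts (rtx target, rtx op)
  {
    rtx re = read_complex_part (op, REAL_P);
    rtx im = read_complex_part (op, IMAG_P);
    write_complex_part (target, re, REAL_P, /* undefined_p */ true);
    write_complex_part (target, im, IMAG_P, /* undefined_p */ false);
  }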
[PATCH 3/9] Native complex operations: Add gen_rtx_complex hook
Add a new target hook for complex element creation during the expand pass, called gen_rtx_complex. The default implementation calls gen_rtx_CONCAT like before. Calls to gen_rtx_CONCAT for complex handling are then replaced by calls to targetm.gen_rtx_complex.

gcc/ChangeLog:
* target.def: Add gen_rtx_complex target hook
* targhooks.cc (default_gen_rtx_complex): New: Default implementation for gen_rtx_complex
* targhooks.h: Add default_gen_rtx_complex
* doc/tm.texi: Document TARGET_GEN_RTX_COMPLEX
* doc/tm.texi.in: Add TARGET_GEN_RTX_COMPLEX
* emit-rtl.cc (gen_reg_rtx): Replace call to gen_rtx_CONCAT by call to gen_rtx_complex (init_emit_once): Likewise
* expmed.cc (flip_storage_order): Likewise
* optabs.cc (expand_doubleword_mod): Likewise

--- gcc/doc/tm.texi| 6 ++ gcc/doc/tm.texi.in | 2 ++ gcc/emit-rtl.cc| 26 +- gcc/expmed.cc | 2 +- gcc/optabs.cc | 12 +++- gcc/target.def | 10 ++ gcc/targhooks.cc | 27 +++ gcc/targhooks.h| 2 ++ 8 files changed, 64 insertions(+), 23 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 87997b76338..b73147aea9f 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -4605,6 +4605,12 @@ to return a nonzero value when it is required, the compiler will run out of spill registers and print a fatal error message. @end deftypefn +@deftypefn {Target Hook} rtx TARGET_GEN_RTX_COMPLEX (machine_mode @var{mode}, rtx @var{real_part}, rtx @var{imag_part}) +This hook should return an rtx representing a complex of mode @var{mode} built from @var{real_part} and @var{imag_part}. + If both arguments are @code{NULL}, create them as registers. + The default is @code{gen_rtx_CONCAT}. +@end deftypefn + @deftypefn {Target Hook} rtx TARGET_READ_COMPLEX_PART (rtx @var{cplx}, complex_part_t @var{part}) This hook should return the rtx representing the specified @var{part} of the complex given by @var{cplx}. @var{part} can be the real part, the imaginary part, or both of them.

diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index efbf972e6a7..dd39e450903 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3390,6 +3390,8 @@ stack. @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P +@hook TARGET_GEN_RTX_COMPLEX + @hook TARGET_READ_COMPLEX_PART @hook TARGET_WRITE_COMPLEX_PART

diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc index f6276a2d0b6..22012bfea13 100644 --- a/gcc/emit-rtl.cc +++ b/gcc/emit-rtl.cc @@ -1190,19 +1190,7 @@ gen_reg_rtx (machine_mode mode) if (generating_concat_p && (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)) -{ - /* For complex modes, don't make a single pseudo. -Instead, make a CONCAT of two pseudos. -This allows noncontiguous allocation of the real and imaginary parts, -which makes much better code. Besides, allocating DCmode -pseudos overstrains reload on some machines like the 386. */ - rtx realpart, imagpart; - machine_mode partmode = GET_MODE_INNER (mode); - - realpart = gen_reg_rtx (partmode); - imagpart = gen_reg_rtx (partmode); - return gen_rtx_CONCAT (mode, realpart, imagpart); -} +return targetm.gen_rtx_complex (mode, NULL, NULL); /* Do not call gen_reg_rtx with uninitialized crtl. 
*/ gcc_assert (crtl->emit.regno_pointer_align_length); @@ -6274,14 +6262,18 @@ init_emit_once (void) FOR_EACH_MODE_IN_CLASS (mode, MODE_COMPLEX_INT) { - rtx inner = const_tiny_rtx[0][(int)GET_MODE_INNER (mode)]; - const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner); + machine_mode imode = GET_MODE_INNER (mode); + rtx inner = const_tiny_rtx[0][(int) imode]; + const_tiny_rtx[0][(int) mode] = + targetm.gen_rtx_complex (mode, inner, inner); } FOR_EACH_MODE_IN_CLASS (mode, MODE_COMPLEX_FLOAT) { - rtx inner = const_tiny_rtx[0][(int)GET_MODE_INNER (mode)]; - const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner); + machine_mode imode = GET_MODE_INNER (mode); + rtx inner = const_tiny_rtx[0][(int) imode]; + const_tiny_rtx[0][(int) mode] = + targetm.gen_rtx_complex (mode, inner, inner); } FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL) diff --git a/gcc/expmed.cc b/gcc/expmed.cc index 2f787cc28f9..8a18161827b 100644 --- a/gcc/expmed.cc +++ b/gcc/expmed.cc @@ -400,7 +400,7 @@ flip_storage_order (machine_mode mode, rtx x) real = flip_storage_order (GET_MODE_INNER (mode), real); imag = flip_storage_order (GET_MODE_INNER (mode), imag); - return gen_rtx_CONCAT (mode, real, imag); + return targetm.gen_rtx_complex (mode, real, imag); } if (UNLIKELY (reverse_storage_order_supported < 0)) diff --git a/gcc/optabs.cc b/gcc/optabs.cc index 4e9f58f8060..18900e8113e 1006
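Reconstructed from the code removed from gen_reg_rtx above and the documented behavior ("If both arguments are NULL, create them as registers"), the default hook plausibly looks like this sketch (not necessarily the committed implementation):

  static rtx
  default_gen_rtx_complex (machine_mode mode, rtx real_part, rtx imag_part)
  {
    machine_mode imode = GET_MODE_INNER (mode);

    /* Preserve the historic behavior: a CONCAT of two part-mode pseudos,
       allowing noncontiguous allocation of the two parts.  */
    if (real_part == NULL_RTX)
      real_part = gen_reg_rtx (imode);
    if (imag_part == NULL_RTX)
      imag_part = gen_reg_rtx (imode);

    return gen_rtx_CONCAT (mode, real_part, imag_part);
  }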
[PATCH 4/9] Native complex operations: Allow native complex regs and ops in rtl
Support registers of complex types in rtl. Also adapt the functions called during the expand pass to support native complex operations.

gcc/ChangeLog:
* explow.cc (trunc_int_for_mode): Allow complex int modes
* expr.cc (emit_move_complex_parts): Move both parts at the same time if it is supported by the backend (emit_move_complex): Do not move via integer if no corresponding integer mode exists. For complex floats, relax the constraint on the number of registers for targets with pairs of registers, and use native moves if it is supported by the backend. (expand_expr_real_2): Move both parts at the same time if it is supported by the backend (expand_expr_real_1): Update the expansion of complex constants (const_vector_from_tree): Add the expansion of both parts of a complex constant
* real.h: Update FLOAT_MODE_FORMAT
* machmode.h: Add COMPLEX_INT_MODE_P and COMPLEX_FLOAT_MODE_P predicates
* optabs-libfuncs.cc (gen_int_libfunc): Add support for complex modes (gen_intv_fp_libfunc): Likewise
* recog.cc (general_operand): Likewise

--- gcc/explow.cc | 2 +- gcc/expr.cc| 84 -- gcc/machmode.h | 6 +++ gcc/optabs-libfuncs.cc | 29 --- gcc/real.h | 3 +- gcc/recog.cc | 1 + 6 files changed, 105 insertions(+), 20 deletions(-)

diff --git a/gcc/explow.cc b/gcc/explow.cc index 6424c0802f0..48572a40eab 100644 --- a/gcc/explow.cc +++ b/gcc/explow.cc @@ -56,7 +56,7 @@ trunc_int_for_mode (HOST_WIDE_INT c, machine_mode mode) int width = GET_MODE_PRECISION (smode); /* You want to truncate to a _what_? */ - gcc_assert (SCALAR_INT_MODE_P (mode)); + gcc_assert (SCALAR_INT_MODE_P (mode) || COMPLEX_INT_MODE_P (mode)); /* Canonicalize BImode to 0 and STORE_FLAG_VALUE. */ if (smode == BImode)

diff --git a/gcc/expr.cc b/gcc/expr.cc index e1a0892b4d9..e94de8a05b5 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -3847,8 +3847,14 @@ emit_move_complex_parts (rtx x, rtx y) && REG_P (x) && !reg_overlap_mentioned_p (x, y)) emit_clobber (x); - write_complex_part (x, read_complex_part (y, REAL_P), REAL_P, true); - write_complex_part (x, read_complex_part (y, IMAG_P), IMAG_P, false); + machine_mode mode = GET_MODE (x); + if (optab_handler (mov_optab, mode) != CODE_FOR_nothing) +write_complex_part (x, read_complex_part (y, BOTH_P), BOTH_P, false); + else +{ + write_complex_part (x, read_complex_part (y, REAL_P), REAL_P, true); + write_complex_part (x, read_complex_part (y, IMAG_P), IMAG_P, false); +} return get_last_insn (); } @@ -3868,14 +3874,14 @@ emit_move_complex (machine_mode mode, rtx x, rtx y) /* See if we can coerce the target into moving both values at once, except for floating point where we favor moving as parts if this is easy. */ - if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT + scalar_int_mode imode; + if (!int_mode_for_mode (mode).exists (&imode)) +try_int = false; + else if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing - && !(REG_P (x) - && HARD_REGISTER_P (x) - && REG_NREGS (x) == 1) - && !(REG_P (y) - && HARD_REGISTER_P (y) - && REG_NREGS (y) == 1)) + && optab_handler (mov_optab, mode) != CODE_FOR_nothing + && !(REG_P (x) && HARD_REGISTER_P (x)) + && !(REG_P (y) && HARD_REGISTER_P (y))) try_int = false; /* Not possible if the values are inherently not adjacent. */ else if (GET_CODE (x) == CONCAT || GET_CODE (y) == CONCAT) @@ -10246,9 +10252,14 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, break; } - /* Move the real (op0) and imaginary (op1) parts to their location. 
*/ - write_complex_part (target, op0, REAL_P, true); - write_complex_part (target, op1, IMAG_P, false); + if ((op0 == op1) && (GET_CODE (op0) == CONST_VECTOR)) + write_complex_part (target, op0, BOTH_P, false); + else + { + /* Move the real (op0) and imaginary (op1) parts to their location. */ + write_complex_part (target, op0, REAL_P, true); + write_complex_part (target, op1, IMAG_P, false); + } return target; @@ -11001,6 +11012,51 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, return original_target; } + else if (original_target && (GET_CODE (original_target) == REG) + && + ((GET_MODE_CLASS (GET_MODE (original_target)) == +MODE_COMPLEX_INT) + || (GET_MODE_CLASS (GET_MODE (original_target)) == + MODE_COMPLEX_FLOAT))) + { + mode = TYPE_MODE (TREE_TYPE (exp)); + + /* Move both parts at the same time if possible */ + if (T
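A user-level illustration of what the emit_move_complex_parts change enables (a sketch; it assumes a target that provides a whole-mode mov pattern, such as the movsc pattern added later in this series):

  _Complex float
  copy (_Complex float *src)
  {
    /* With a mov optab for SCmode this is a single BOTH_P move;
       without one it still falls back to separate REAL_P and IMAG_P
       part moves, as before.  */
    return *src;
  }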
[PATCH 6/9] Native complex operations: Update how complex rotations are handled
Catch complex rotations by 90° and 270° in fold-const.cc like before, but now convert them into the new COMPLEX_ROT90 and COMPLEX_ROT270 internal functions. Also add crot90 and crot270 optabs to expose these operations to the backends. COMPLEX_ROT90/COMPLEX_ROT270 are then lowered conditionally, by checking whether crot90/crot270 are in the optab. Finally, convert a + crot90/270(b) into cadd90/270(a, b) in a similar way to FMAs.

gcc/ChangeLog:
* internal-fn.def: Add COMPLEX_ROT90 and COMPLEX_ROT270
* fold-const.cc (fold_binary_loc): Update the folding of complex rotations to generate calls to COMPLEX_ROT90 and COMPLEX_ROT270
* optabs.def: Add crot90/crot270 optabs
* tree-complex.cc (init_dont_simulate_again): Catch calls to COMPLEX_ROT90 and COMPLEX_ROT270 (expand_complex_rotation): Conditionally lower complex rotations if no pattern is present in the backend (expand_complex_operations_1): Likewise (convert_crot): Likewise
* tree-ssa-math-opts.cc (convert_crot_1): Catch complex rotations with additions in a similar way to FMAs. (math_opts_dom_walker::after_dom_children): Call convert_crot if a COMPLEX_ROT90 or COMPLEX_ROT270 is identified

--- gcc/fold-const.cc | 115 ++--- gcc/internal-fn.def | 2 + gcc/optabs.def| 2 + gcc/tree-complex.cc | 79 ++- gcc/tree-ssa-math-opts.cc | 129 ++ 5 files changed, 302 insertions(+), 25 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index a02ede79fed..f1224b6a548 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -11609,30 +11609,6 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type, } else { - /* Fold z * +-I to __complex__ (-+__imag z, +-__real z). -This is not the same for NaNs or if signed zeros are -involved. */ - if (!HONOR_NANS (arg0) - && !HONOR_SIGNED_ZEROS (arg0) - && COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg0)) - && TREE_CODE (arg1) == COMPLEX_CST - && real_zerop (TREE_REALPART (arg1))) - { - tree rtype = TREE_TYPE (TREE_TYPE (arg0)); - if (real_onep (TREE_IMAGPART (arg1))) - return - fold_build2_loc (loc, COMPLEX_EXPR, type, - negate_expr (fold_build1_loc (loc, IMAGPART_EXPR, -rtype, arg0)), - fold_build1_loc (loc, REALPART_EXPR, rtype, arg0)); - else if (real_minus_onep (TREE_IMAGPART (arg1))) - return - fold_build2_loc (loc, COMPLEX_EXPR, type, - fold_build1_loc (loc, IMAGPART_EXPR, rtype, arg0), - negate_expr (fold_build1_loc (loc, REALPART_EXPR, -rtype, arg0))); - } - /* Optimize z * conj(z) for floating point complex numbers. Guarded by flag_unsafe_math_optimizations as non-finite imaginary components don't produce scalar results. */ @@ -11645,6 +11621,97 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type, && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0)) return fold_mult_zconjz (loc, type, arg0); } + + /* Fold z * +-I to __complex__ (-+__imag z, +-__real z). +This is not the same for NaNs or if signed zeros are +involved. 
*/ + if (!HONOR_NANS (arg0) + && !HONOR_SIGNED_ZEROS (arg0) + && TREE_CODE (arg1) == COMPLEX_CST + && (COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg0)) + && real_zerop (TREE_REALPART (arg1 + { + if (real_onep (TREE_IMAGPART (arg1))) + { + tree rtype = TREE_TYPE (TREE_TYPE (arg0)); + tree cplx_build = fold_build2_loc (loc, COMPLEX_EXPR, type, +negate_expr (fold_build1_loc (loc, IMAGPART_EXPR, + rtype, arg0)), + fold_build1_loc (loc, REALPART_EXPR, rtype, arg0)); + if (cplx_build && TREE_CODE (TREE_OPERAND (cplx_build, 0)) != NEGATE_EXPR) + return cplx_build; + + if ((TREE_CODE (arg0) == COMPLEX_EXPR) && real_zerop (TREE_OPERAND (arg0, 1))) + return fold_build2_loc (loc, COMPLEX_EXPR, type, + TREE_OPERAND (arg0, 1), TREE_OPERAND (arg0, 0)); + + if (TREE_CODE (arg0) == CALL_EXPR) + { + if (CALL_EXPR_IFN (arg0) == IFN_COMPLEX_ROT90) + return negate_expr (CALL_EXPR_ARG (arg0, 0)); + else if (CALL_EXPR_IFN (arg0) == IFN_COMPLEX
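A small example of the rotations this folding now targets (a sketch; the folds only fire under the NaN/signed-zero guards visible in the hunk above, e.g. with -ffast-math):

  #include <complex.h>

  _Complex float rot90 (_Complex float z)  { return z * I; }   /* COMPLEX_ROT90 (z)  */
  _Complex float rot270 (_Complex float z) { return z * -I; }  /* COMPLEX_ROT270 (z) */

  /* a + z*I can then combine into cadd90 (a, z) where the target
     provides the corresponding optab, analogously to FMA formation.  */
  _Complex float addrot (_Complex float a, _Complex float z) { return a + z * I; }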
[PATCH 8/9] Native complex operations: Add explicit vector of complex
Allow the creation and use of built-in vectors of complex types in C, using __attribute__ ((vector_size ()))

gcc/c-family/ChangeLog:
* c-attribs.cc (vector_mode_valid_p): Add cases for vectors of complex (handle_mode_attribute): Likewise (type_valid_for_vector_size): Likewise
* c-common.cc (c_common_type_for_mode): Likewise (vector_types_compatible_elements_p): Likewise

gcc/ChangeLog:
* fold-const.cc (fold_binary_loc): Likewise

gcc/c/ChangeLog:
* c-typeck.cc (build_unary_op): Likewise

--- gcc/c-family/c-attribs.cc | 12 ++-- gcc/c-family/c-common.cc | 20 +++- gcc/c/c-typeck.cc | 8 ++-- gcc/fold-const.cc | 1 + 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc index e2792ca6898..d4de85160c1 100644 --- a/gcc/c-family/c-attribs.cc +++ b/gcc/c-family/c-attribs.cc @@ -2019,6 +2019,8 @@ vector_mode_valid_p (machine_mode mode) /* Doh! What's going on? */ if (mclass != MODE_VECTOR_INT && mclass != MODE_VECTOR_FLOAT + && mclass != MODE_VECTOR_COMPLEX_INT + && mclass != MODE_VECTOR_COMPLEX_FLOAT && mclass != MODE_VECTOR_FRACT && mclass != MODE_VECTOR_UFRACT && mclass != MODE_VECTOR_ACCUM @@ -2125,6 +2127,8 @@ handle_mode_attribute (tree *node, tree name, tree args, case MODE_VECTOR_INT: case MODE_VECTOR_FLOAT: + case MODE_VECTOR_COMPLEX_INT: + case MODE_VECTOR_COMPLEX_FLOAT: case MODE_VECTOR_FRACT: case MODE_VECTOR_UFRACT: case MODE_VECTOR_ACCUM: @@ -4361,9 +4365,13 @@ type_valid_for_vector_size (tree type, tree atname, tree args, if ((!INTEGRAL_TYPE_P (type) && !SCALAR_FLOAT_TYPE_P (type) + && !COMPLEX_INTEGER_TYPE_P (type) + && !COMPLEX_FLOAT_TYPE_P (type) && !FIXED_POINT_TYPE_P (type)) - || (!SCALAR_FLOAT_MODE_P (orig_mode) - && GET_MODE_CLASS (orig_mode) != MODE_INT + || ((!SCALAR_FLOAT_MODE_P (orig_mode) + && GET_MODE_CLASS (orig_mode) != MODE_INT) + && (!COMPLEX_FLOAT_MODE_P (orig_mode) + && GET_MODE_CLASS (orig_mode) != MODE_COMPLEX_INT) && !ALL_SCALAR_FIXED_POINT_MODE_P (orig_mode)) || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type)) || TREE_CODE (type) == BOOLEAN_TYPE)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc index 6ab63dae997..9574c074d26 100644 --- a/gcc/c-family/c-common.cc +++ b/gcc/c-family/c-common.cc @@ -2430,7 +2430,23 @@ c_common_type_for_mode (machine_mode mode, int unsignedp) : make_signed_type (precision)); } - if (COMPLEX_MODE_P (mode)) + if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL + && valid_vector_subparts_p (GET_MODE_NUNITS (mode))) +{ + unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode), + GET_MODE_NUNITS (mode)); + tree bool_type = build_nonstandard_boolean_type (elem_bits); + return build_vector_type_for_mode (bool_type, mode); +} + else if (VECTOR_MODE_P (mode) + && valid_vector_subparts_p (GET_MODE_NUNITS (mode))) +{ + machine_mode inner_mode = GET_MODE_INNER (mode); + tree inner_type = c_common_type_for_mode (inner_mode, unsignedp); + if (inner_type != NULL_TREE) + return build_vector_type_for_mode (inner_type, mode); +} + else if (COMPLEX_MODE_P (mode)) { machine_mode inner_mode; tree inner_type; @@ -8104,9 +8120,11 @@ vector_types_compatible_elements_p (tree t1, tree t2) gcc_assert ((INTEGRAL_TYPE_P (t1) || c1 == REAL_TYPE + || c1 == COMPLEX_TYPE || c1 == FIXED_POINT_TYPE) && (INTEGRAL_TYPE_P (t2) || c2 == REAL_TYPE + || c2 == COMPLEX_TYPE || c2 == FIXED_POINT_TYPE)); t1 = c_common_signed_type (t1);

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc index 7cf411155c6..68a9646cf5b 100644 --- a/gcc/c/c-typeck.cc +++ b/gcc/c/c-typeck.cc @@ -4584,7 +4584,9 @@ build_unary_op 
(location_t location, enum tree_code code, tree xarg, /* ~ works on integer types and non float vectors. */ if (typecode == INTEGER_TYPE || (gnu_vector_type_p (TREE_TYPE (arg)) - && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg + && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg)) + && !COMPLEX_INTEGER_TYPE_P (TREE_TYPE (TREE_TYPE (arg))) + && !COMPLEX_FLOAT_TYPE_P (TREE_TYPE (TREE_TYPE (arg) { tree e = arg; @@ -4607,7 +4609,9 @@ build_unary_op (location_t location, enum tree_code code, tree xarg, if (!noconvert) arg = default_conversion (arg); } - else if (typecode == COMPLEX_TYPE) + else if (typecode == COMPLEX_TYPE + || COMPLEX_INTEGER_TYPE_P (TREE_TYPE (TREE_TYPE (arg))) + || COMPL
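For example, with this patch applied the following plain C (with GNU extensions) is accepted; it is rejected by an unpatched compiler, since vector_size previously required integral, floating-point, or fixed-point element types:

  /* A vector of four complex floats; vector_size counts bytes, as usual.  */
  typedef _Complex float cvecf
    __attribute__ ((vector_size (4 * sizeof (_Complex float))));

  cvecf
  vadd (cvecf a, cvecf b)
  {
    return a + b;
  }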
[PATCH 5/9] Native complex operations: Add the conjugate op in optabs
Add an optab and rtl operation for the conjugate, called conj, to expand CONJ_EXPR. gcc/ChangeLog: * rtl.def: Add a conj operation in rtl * optabs.def: Add a conj optab * optabs-tree.cc (optab_for_tree_code): use the conj_optab to convert a CONJ_EXPR * expr.cc (expand_expr_real_2): Add a case to expand native CONJ_EXPR (expand_expr_real_1): Likewise --- gcc/expr.cc| 17 - gcc/optabs-tree.cc | 3 +++ gcc/optabs.def | 3 +++ gcc/rtl.def| 3 +++ 4 files changed, 25 insertions(+), 1 deletion(-) diff --git a/gcc/expr.cc b/gcc/expr.cc index e94de8a05b5..be153be0b71 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -10498,6 +10498,18 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode, return dst; } +case CONJ_EXPR: + op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL); + if (modifier == EXPAND_STACK_PARM) + target = 0; + temp = expand_unop (mode, + optab_for_tree_code (CONJ_EXPR, type, + optab_default), + op0, target, 0); + gcc_assert (temp); + return REDUCE_BIT_FIELD (temp); + + default: gcc_unreachable (); } @@ -12064,6 +12076,10 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, op0 = expand_normal (treeop0); return read_complex_part (op0, IMAG_P); +case CONJ_EXPR: + op0 = expand_normal (treeop0); + return op0; + case RETURN_EXPR: case LABEL_EXPR: case GOTO_EXPR: @@ -12087,7 +12103,6 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, case VA_ARG_EXPR: case BIND_EXPR: case INIT_EXPR: -case CONJ_EXPR: case COMPOUND_EXPR: case PREINCREMENT_EXPR: case PREDECREMENT_EXPR: diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc index e6ae15939d3..c646b3667d4 100644 --- a/gcc/optabs-tree.cc +++ b/gcc/optabs-tree.cc @@ -271,6 +271,9 @@ optab_for_tree_code (enum tree_code code, const_tree type, return TYPE_UNSIGNED (type) ? usneg_optab : ssneg_optab; return trapv ? negv_optab : neg_optab; +case CONJ_EXPR: + return conj_optab; + case ABS_EXPR: return trapv ? absv_optab : abs_optab; diff --git a/gcc/optabs.def b/gcc/optabs.def index 3dae228fba6..31475c8afcc 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -160,6 +160,9 @@ OPTAB_NL(umax_optab, "umax$I$a3", UMAX, "umax", '3', gen_int_libfunc) OPTAB_NL(neg_optab, "neg$P$a2", NEG, "neg", '2', gen_int_fp_fixed_libfunc) OPTAB_NX(neg_optab, "neg$F$a2") OPTAB_NX(neg_optab, "neg$Q$a2") +OPTAB_NL(conj_optab, "conj$P$a2", CONJ, "conj", '2', gen_int_fp_fixed_libfunc) +OPTAB_NX(conj_optab, "conj$F$a2") +OPTAB_NX(conj_optab, "conj$Q$a2") OPTAB_VL(negv_optab, "negv$I$a2", NEG, "neg", '2', gen_intv_fp_libfunc) OPTAB_VX(negv_optab, "neg$F$a2") OPTAB_NL(ssneg_optab, "ssneg$Q$a2", SS_NEG, "ssneg", '2', gen_signed_fixed_libfunc) diff --git a/gcc/rtl.def b/gcc/rtl.def index 88e2b198503..4280f727286 100644 --- a/gcc/rtl.def +++ b/gcc/rtl.def @@ -460,6 +460,9 @@ DEF_RTL_EXPR(MINUS, "minus", "ee", RTX_BIN_ARITH) /* Minus operand 0. */ DEF_RTL_EXPR(NEG, "neg", "e", RTX_UNARY) +/* Conj operand 0 */ +DEF_RTL_EXPR(CONJ, "conj", "e", RTX_UNARY) + DEF_RTL_EXPR(MULT, "mult", "ee", RTX_COMM_ARITH) /* Multiplication with signed saturation */ -- 2.17.1
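For reference, the kind of source this affects (the GNU ~ operator on a complex operand computes the conjugate and is represented as CONJ_EXPR):

  _Complex float
  conjugate (_Complex float z)
  {
    /* With a conj pattern in the backend this expands as a single
       unary operation instead of copying the real part and negating
       the imaginary part.  */
    return ~z;
  }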
[PATCH 7/9] Native complex operations: Vectorization of native complex operations
Add vectors of complex types so that native operations can be vectorized. Because the vectorizer was designed to work with scalar elements, several functions and target hooks have to be adapted or duplicated to support complex types. After that, the vectorization of native complex operations follows exactly the same flow as scalar operations.

gcc/ChangeLog:
* target.def: Add preferred_simd_mode_complex and related_mode_complex by duplicating their scalar counterparts
* targhooks.h: Add default_preferred_simd_mode_complex and default_vectorize_related_mode_complex
* targhooks.cc (default_preferred_simd_mode_complex): New: Default implementation of preferred_simd_mode_complex (default_vectorize_related_mode_complex): New: Default implementation of related_mode_complex
* doc/tm.texi: Document TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX and TARGET_VECTORIZE_RELATED_MODE_COMPLEX
* doc/tm.texi.in: Add TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX and TARGET_VECTORIZE_RELATED_MODE_COMPLEX
* emit-rtl.cc (init_emit_once): Add the zero constant for vectors of complex modes
* genmodes.cc (vector_class): Add case for vectors of complex (complete_mode): Likewise (make_complex_modes): Likewise
* gensupport.cc (match_pattern): Likewise
* machmode.h: Add vectors of complex in predicates and redefine mode_for_vector and related_vector_mode for complex types
* mode-classes.def: Add MODE_VECTOR_COMPLEX_INT and MODE_VECTOR_COMPLEX_FLOAT classes
* simplify-rtx.cc (simplify_context::simplify_binary_operation): FIXME: do not simplify binary operations with complex vector modes.
* stor-layout.cc (mode_for_vector): Adapt for complex modes using sub-functions calling a common one (related_vector_mode): Implement the function for complex modes
* tree-vect-generic.cc (type_for_widest_vector_mode): Add cases for complex modes
* tree-vect-stmts.cc (get_related_vectype_for_scalar_type): Adapt for complex modes
* tree.cc (build_vector_type_for_mode): Add cases for complex modes

--- gcc/doc/tm.texi | 31 gcc/doc/tm.texi.in | 4 gcc/emit-rtl.cc | 10 gcc/genmodes.cc | 8 +++ gcc/gensupport.cc| 3 +++ gcc/machmode.h | 19 +++ gcc/mode-classes.def | 2 ++ gcc/simplify-rtx.cc | 4 gcc/stor-layout.cc | 43 + gcc/target.def | 39 ++ gcc/targhooks.cc | 29 ++ gcc/targhooks.h | 4 gcc/tree-vect-generic.cc | 4 gcc/tree-vect-stmts.cc | 52 +++- gcc/tree.cc | 2 ++ 15 files changed, 230 insertions(+), 24 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index b73147aea9f..955a1f983d0 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6229,6 +6229,13 @@ equal to @code{word_mode}, because the vectorizer can do some transformations even in absence of specialized @acronym{SIMD} hardware. @end deftypefn +@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX (complex_mode @var{mode}) +This hook should return the preferred mode for vectorizing complex +mode @var{mode}. The default is +equal to @code{word_mode}, because the vectorizer can do some +transformations even in absence of specialized @acronym{SIMD} hardware. +@end deftypefn + @deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_SPLIT_REDUCTION (machine_mode) This hook should return the preferred mode to split the final reduction step on @var{mode} to. The reduction is then carried out reducing upper @@ -6291,6 +6298,30 @@ requested mode, returning a mode with the same size as @var{vector_mode} when @var{nunits} is zero. This is the correct behavior for most targets. 
@end deftypefn +@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_RELATED_MODE_COMPLEX (machine_mode @var{vector_mode}, complex_mode @var{element_mode}, poly_uint64 @var{nunits}) +If a piece of code is using vector mode @var{vector_mode} and also wants +to operate on elements of mode @var{element_mode}, return the vector mode +it should use for those elements. If @var{nunits} is nonzero, ensure that +the mode has exactly @var{nunits} elements, otherwise pick whichever vector +size pairs the most naturally with @var{vector_mode}. Return an empty +@code{opt_machine_mode} if there is no supported vector mode with the +required properties. + +There is no prescribed way of handling the case in which @var{nunits} +is zero. One common choice is to pick a vector mode with the same size +as @var{vector_mode}; this is the natural choice if the target has a +fixed vector size. Another option is to choose
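A minimal sketch of the documented default for the first hook (the signature is taken from the deftypefn line above; the body is an assumption mirroring the scalar default, not the committed code):

  static machine_mode
  default_preferred_simd_mode_complex (complex_mode)
  {
    /* Documented default: word_mode, i.e. no complex SIMD preference.  */
    return word_mode;
  }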
Re: [WIP RFC] Add support for keyword-based attributes
On Mon, Jul 17, 2023 at 10:21 AM Richard Sandiford wrote: > > Richard Biener writes: > > On Fri, Jul 14, 2023 at 5:58 PM Richard Sandiford via Gcc-patches > > wrote: > >> > >> Summary: We'd like to be able to specify some attributes using > >> keywords, rather than the traditional __attribute__ or [[...]] > >> syntax. Would that be OK? > >> > >> In more detail: > >> > >> We'd like to add some new target-specific attributes for Arm SME. > >> These attributes affect semantics and code generation and so they > >> can't simply be ignored. > >> > >> Traditionally we've done this kind of thing by adding GNU attributes, > >> via TARGET_ATTRIBUTE_TABLE in GCC's case. The problem is that both > >> GCC and Clang have traditionally only warned about unrecognised GNU > >> attributes, rather than raising an error. Older compilers might > >> therefore be able to look past some uses of the new attributes and > >> still produce object code, even though that object code is almost > >> certainly going to be wrong. (The compilers will also emit a default-on > >> warning, but that might go unnoticed when building a big project.) > >> > >> There are some existing attributes that similarly affect semantics > >> in ways that cannot be ignored. vector_size is one obvious example. > >> But that doesn't make it a good thing. :) > >> > >> Also, C++ says this for standard [[...]] attributes: > >> > >> For an attribute-token (including an attribute-scoped-token) > >> not specified in this document, the behavior is implementation-defined; > >> any such attribute-token that is not recognized by the implementation > >> is ignored. > >> > >> which doubles down on the idea that attributes should not be used > >> for necessary semantic information. > >> > >> One of the attributes we'd like to add provides a new way of compiling > >> existing code. The attribute doesn't require SME to be available; > >> it just says that the code must be compiled so that it can run in either > >> of two modes. This is probably the most dangerous attribute of the set, > >> since compilers that ignore it would just produce normal code. That > >> code might work in some test scenarios, but it would fail in others. > >> > >> The feeling from the Clang community was therefore that these SME > >> attributes should use keywords instead, so that the keywords trigger > >> an error with older compilers. > >> > >> However, it seemed wrong to define new SME-specific grammar rules, > >> since the underlying problem is pretty generic. We therefore > >> proposed having a type of keyword that can appear exactly where > >> a standard [[...]] attribute can appear and that appertains to > >> exactly what a standard [[...]] attribute would appertain to. > >> No divergence or cherry-picking is allowed. > >> > >> For example: > >> > >> [[arm::foo]] > >> > >> would become: > >> > >> __arm_foo > >> > >> and: > >> > >> [[arm::bar(args)]] > >> > >> would become: > >> > >> __arm_bar(args) > >> > >> It wouldn't be possible to retrofit arguments to a keyword that > >> previously didn't take arguments, since that could lead to parsing > >> ambiguities. So when a keyword is first added, a binding decision > >> would need to be made whether the keyword always takes arguments > >> or is always standalone. > >> > >> For that reason, empty argument lists are allowed for keywords, > >> even though they're not allowed for [[...]] attributes. > >> > >> The argument-less version was accepted into Clang, and I have a follow-on > >> patch for handling arguments. 
Would the same thing be OK for GCC, > >> in both the C and C++ frontends? > >> > >> The patch below is a proof of concept for the C frontend. It doesn't > >> bootstrap due to warnings about uninitialised fields. And it doesn't > >> have tests. But I did test it locally with various combinations of > >> attribute_spec and it seemed to work as expected. > >> > >> The impact on the C frontend seems to be pretty small. It looks like > >> the impact on the C++ frontend would be a bit bigger, but not much. > >> > >> The patch contains a logically unrelated change: c-common.h set aside > >> 16 keywords for address spaces, but of the in-tree ports, the maximum > >> number of keywords used is 6 (for amdgcn). The patch therefore changes > >> the limit to 8 and uses 8 keywords for the new attributes. This keeps > >> the number of reserved ids <= 256. > > > > If you had added __arm(bar(args)) instead of __arm_bar(args) you would only > > need one additional keyword - we could set aside a similar one for each > > target then. I realize that double-nesting of arguments might prove a bit > > challenging but still. > > Yeah, that would work. > > > In any case I also think that attributes are what you want and their > > ugliness/issues are not worse than the ugliness/issues of the keyword > > approach IMHO. > > I guess the ugliness of keywords is the double underscore? > What are the issues with the keyword approach thou
[PATCH 9/9] Native complex operation: Experimental support in x86 backend
Add experimental support for native complex operation handling in the x86 backend. For now it only supports add, sub, mul, conj, neg, mov in SCmode (complex float). Performance gains are still marginal on this target because there are no dedicated instructions to speed up complex operations, apart from some SIMD tricks.

gcc/ChangeLog:
* config/i386/i386.cc (classify_argument): Align complex element to the whole size, not size of the parts (ix86_return_in_memory): Handle complex modes like a scalar with the same size (ix86_class_max_nregs): Likewise (ix86_hard_regno_nregs): Likewise (function_value_ms_64): Add case for SCmode (ix86_build_const_vector): Likewise (ix86_build_signbit_mask): Likewise (x86_gen_rtx_complex): New: Implement the gen_rtx_complex hook, use registers of complex modes to represent complex elements in rtl (x86_read_complex_part): New: Implement the read_complex_part hook, handle registers of complex modes (x86_write_complex_part): New: Implement the write_complex_part hook, handle registers of complex modes
* config/i386/i386.h: Add SCmode in several predicates
* config/i386/sse.md: Add pattern for some complex operations in SCmode. This includes movsc, addsc3, subsc3, negsc2, mulsc3, and conjsc2

--- gcc/config/i386/i386.cc | 296 +++- gcc/config/i386/i386.h | 11 +- gcc/config/i386/sse.md | 144 +++ 3 files changed, 440 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index f0d6167e667..a65ac92a4a9 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -2339,8 +2339,8 @@ classify_argument (machine_mode mode, const_tree type, mode_alignment = 128; else if (mode == XCmode) mode_alignment = 256; - if (COMPLEX_MODE_P (mode)) - mode_alignment /= 2; + /*if (COMPLEX_MODE_P (mode)) + mode_alignment /= 2;*/ /* Misaligned fields are always returned in memory. 
*/ if (bit_offset % mode_alignment) return 0; @@ -3007,6 +3007,7 @@ pass_in_reg: case E_V4BFmode: case E_V2SImode: case E_V2SFmode: +case E_SCmode: case E_V1TImode: case E_V1DImode: if (!type || !AGGREGATE_TYPE_P (type)) @@ -3257,6 +3258,7 @@ pass_in_reg: case E_V4BFmode: case E_V2SImode: case E_V2SFmode: +case E_SCmode: case E_V1TImode: case E_V1DImode: if (!type || !AGGREGATE_TYPE_P (type)) @@ -4158,8 +4160,8 @@ function_value_ms_64 (machine_mode orig_mode, machine_mode mode, && !INTEGRAL_TYPE_P (valtype) && !VECTOR_FLOAT_TYPE_P (valtype)) break; - if ((SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode)) - && !COMPLEX_MODE_P (mode)) + if ((SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode))) +// && !COMPLEX_MODE_P (mode)) regno = FIRST_SSE_REG; break; case 8: @@ -4266,7 +4268,7 @@ ix86_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED) || INTEGRAL_TYPE_P (type) || VECTOR_FLOAT_TYPE_P (type)) && (SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode)) - && !COMPLEX_MODE_P (mode) + //&& !COMPLEX_MODE_P (mode) && (GET_MODE_SIZE (mode) == 16 || size == 16)) return false; @@ -15722,6 +15724,7 @@ ix86_build_const_vector (machine_mode mode, bool vect, rtx value) case E_V8SFmode: case E_V4SFmode: case E_V2SFmode: +case E_SCmode: case E_V8DFmode: case E_V4DFmode: case E_V2DFmode: @@ -15770,6 +15773,7 @@ ix86_build_signbit_mask (machine_mode mode, bool vect, bool invert) case E_V8SFmode: case E_V4SFmode: case E_V2SFmode: +case E_SCmode: case E_V2SImode: vec_mode = mode; imode = SImode; @@ -19821,7 +19825,8 @@ ix86_class_max_nregs (reg_class_t rclass, machine_mode mode) else { if (COMPLEX_MODE_P (mode)) - return 2; + return CEIL (GET_MODE_SIZE (mode), UNITS_PER_WORD); + //return 2; else return 1; } @@ -20157,7 +20162,8 @@ ix86_hard_regno_nregs (unsigned int regno, machine_mode mode) return CEIL (GET_MODE_SIZE (mode), UNITS_PER_WORD); } if (COMPLEX_MODE_P (mode)) -return 2; +return 1; +//return 2; /* Register pair for mask registers. */ if (mode == P2QImode || mode == P2HImode) return 2; @@ -23613,6 +23619,273 @@ ix86_preferred_simd_mode (scalar_mode mode) } } +static rtx +x86_gen_rtx_complex (machine_mode mode, rtx real_part, rtx imag_part) +{ + machine_mode imode = GET_MODE_INNER (mode); + + if ((real_part == imag_part) && (real_part == CONST0_RTX (imode))) +{ + if (CONST_DOUBLE_P (real_part)) + return const_double_from_real_value (dcons
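The kind of code this experimental support targets (a sketch; it assumes the series applied on an x86 target, so the new movsc/addsc3 patterns can match):

  _Complex float
  scadd (_Complex float a, _Complex float b)
  {
    /* Previously expanded as two SFmode additions on the separate parts;
       with the sse.md patterns it can stay a single SCmode operation.  */
    return a + b;
  }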
Re: Loop-ch improvements, part 3
On Fri, 14 Jul 2023, Jan Hubicka wrote:
> Hi,
> loop-ch currently does analysis using ranger for all loops to identify
> candidates and then follows with a phase where headers are duplicated (which
> breaks SSA and ranger). The second stage does more analysis (to see how
> many BBs we want to duplicate) but can't use ranger and thus misses
> information about static conditionals.
>
> This patch pushes all analysis into the first stage. We record how many
> BBs to duplicate and the second stage just duplicates as it is told.
> This makes it possible to also extend the range query to basic
> blocks that are not headers. This is easy to do, since we already do a
> path-specific query, so we only need to extend the path by the headers we
> decided to duplicate earlier.
>
> This makes it possible to track situations where an exit is always
> false in the first iteration for tests not in the original loop header.
> Doing so lets us update the profile better and do better heuristics. In
> particular I changed the logic as follows
> 1) should_duplicate_loop_header_p counts size of duplicated region. When we
> know that a given conditional will be constant true or constant false either
> in the duplicated region, by range query, or in the loop body after
> duplication (since it is loop invariant), we do not account it to code size
> costs
> 2) we don't need to account loop invariant computations that will be duplicated
> as they will become fully invariant
> (maybe we want to have some cap for register pressure eventually?)
> 3) optimize_size logic is now different. Originally we started duplicating
> iff the first conditional was known to be true by ranger query, but then
> we used the same limits as for -O2.
>
> I now simply lower the limits to 0. This means that every conditional
> in the duplicated sequence must be either loop invariant or constant when
> duplicated, and we only duplicate statements computing loop invariants
> and those we account to 0 size anyway.
>
> This makes the code IMO more streamlined (and hopefully will let us merge
> this with the loop peeling logic), but makes little difference in practice.
> The problem is that in the loop:
>
> void test2();
> void test(int n)
> {
> for (int i = 0; n && i < 10; i++)
> test2();
> }
>
> We produce:
>[local count: 1073741824 freq: 9.090909]:
> # i_4 = PHI <0(2), i_9(3)>
> _1 = n_7(D) != 0;
> _2 = i_4 <= 9;
> _3 = _1 & _2;
> if (_3 != 0)
> goto ; [89.00%]
> else
> goto ; [11.00%]
>
> and do not understand that the final conditional is a combination of a conditional
> that is always true in the first iteration and a conditional that is loop
> invariant.
>
> This is also the case of
> void test2();
> void test(int n)
> {
> for (int i = 0; n; i++)
> {
> if (i > 10)
> break;
> test2();
> }
> }
> which we turn into the earlier case in ifcombine.
>
> With ifcombine disabled, things however work as expected. This is something
> I plan to handle incrementally. However, extending the loop-ch and peeling passes
> to understand such combined conditionals is still not good enough: by the time ifcombine
> merged the two conditionals we lost profile information on how often n is 0,
> so we can't recover the correct profile or know what the expected number of
> iterations is after the transform.
>
> Bootstrapped/regtested x86_64-linux, OK?

OK.

Thanks,
Richard.

> Honza
>
> gcc/ChangeLog:
>
> * tree-ssa-loop-ch.cc (edge_range_query): Take loop argument; be ready
> for queries not in headers.
> (static_loop_exit): Add basic block parameter; update use of
> edge_range_query
> (should_duplicate_loop_header_p): Add ranger and static_exits
> parameter. Do not account statements that will be optimized
> out after duplication in overall size. Add ranger query to
> find static exits.
> (update_profile_after_ch): Take static_exits as a set instead of
> a single eliminated_edge.
> (ch_base::copy_headers): Do all analysis in the first pass;
> remember invariant_exits and static_exits.
>
> diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
> index 24e7fbc805a..e0139cb432c 100644
> --- a/gcc/tree-ssa-loop-ch.cc
> +++ b/gcc/tree-ssa-loop-ch.cc
> @@ -49,11 +49,13 @@ along with GCC; see the file COPYING3. If not see
> the range of the solved conditional in R. */
>
> static void
> -edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger)
> +edge_range_query (irange &r, class loop *loop, gcond *cond, gimple_ranger &ranger)
> {
> - auto_vec path (2);
> - path.safe_push (e->dest);
> - path.safe_push (e->src);
> + auto_vec path;
> + for (basic_block bb = gimple_bb (cond); bb != loop->header; bb = single_pred_edge (bb)->src)
> +path.safe_push (bb);
> + path.safe_push (loop->header);
> + path.safe_push (loop_preheader_edge (loop)->src);
>path_range_q
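For reference, a source-level sketch of what copying the header does to the first example (hand-written equivalent, not compiler output):

  void test2 (void);

  void test (int n)
  {
    int i = 0;
    if (n && i < 10)          /* duplicated header: ranger can resolve
                                 i < 10 statically on this copied path */
      do
        {
          test2 ();
          i++;
        }
      while (n && i < 10);    /* n is loop invariant; only i <= 9 varies */
  }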
Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid
On Fri, 14 Jul 2023, Andrew MacLeod wrote:
> On 7/14/23 09:37, Richard Biener wrote:
> > On Fri, 14 Jul 2023, Aldy Hernandez wrote:
> >
> >> I don't know what you're trying to accomplish here, as I haven't been
> >> following the PR, but adding all these helper functions to the ranger header
> >> file seems wrong, especially since there's only one use of them. I see you're
> >> tweaking the irange API, adding helper functions to range-op (which is only
> >> for code dealing with implementing range operators for tree codes), etc etc.
> >>
> >> If you need these helper functions, I suggest you put them closer to their
> >> uses (i.e. wherever the match.pd support machinery goes).
> > Note I suggested the opposite because I thought these kinds of helpers
> > are closer to value-range support than to match.pd.
>
> probably vr-values.{cc,h} and the simplify_using_ranges paradigm would be the
> most sensible place to put these kinds of auxiliary routines?
>
> > But I take away from your answer that there's nothing close in the
> > value-range machinery that answers the question whether A op B may
> > overflow?
>
> we don't track it in ranges themselves. During calculation of a range we
> obviously know, but propagating that generally when we rarely care doesn't
> seem worthwhile. The very first generation of irange 6 years ago had an
> overflow_p() flag, but it was removed as not being worth keeping. It is easier
> to simply ask the question when it matters.
>
> As the routines show, it is pretty easy to figure out when the need arises, so I
> think that should suffice. At least for now.
>
> Should we decide we would like it in general, it wouldn't be hard to add to
> irange. wi_fold() currently returns null, it could easily return a bool
> indicating if an overflow happened, and wi_fold_in_parts and fold_range would
> simply OR together the results of the component wi_fold() calls. It would
> require updating/auditing a number of range-op entries and adding an
> overflowed_p() query to irange.

Ah, yeah - the folding APIs would be a good fit I guess. I was also looking to have the "new" helpers be somewhat consistent with the ranger API.

So if we had a fold_range overload with either an output argument or a flag that makes it return false on possible overflow that would work I guess? Since we have a virtual class setup we might be able to provide a default failing method and implement workers for plus and mult (as needed for this patch) as the need arises?

Thanks,
Richard.
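For the narrow need in this patch, a helper along these lines would already work with the existing wide-int API (a sketch; the name and placement are placeholders, not a committed interface):

  /* Return true if A + B may overflow when interpreted with sign SGN.  */
  static bool
  plus_may_overflow_p (const wide_int &a, const wide_int &b, signop sgn)
  {
    wi::overflow_type ovf;
    wi::add (a, b, sgn, &ovf);
    return ovf != wi::OVF_NONE;
  }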
Re: [PATCH][RFC] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math
On Fri, 14 Jul 2023, Andrew Pinski wrote: > On Thu, Jul 13, 2023 at 2:54 AM Richard Biener via Gcc-patches > wrote: > > > > The following makes sure that FP x > y ? x : y style max/min operations > > are if-converted at the GIMPLE level. While we can neither match > > it to MAX_EXPR nor .FMAX as both have different semantics with IEEE > > than the ternary ?: operation we can make sure to maintain this form > > as a COND_EXPR so backends have the chance to match this to instructions > > their ISA offers. > > > > The patch does this in phiopt where we recognize min/max and instead > > of giving up when we have to honor NaNs we alter the generated code > > to a COND_EXPR. > > > > This resolves PR88540 and we can then SLP vectorize the min operation > > for its testcase. It also resolves part of the regressions observed > > with the change matching bit-inserts of bit-field-refs to vec_perm. > > > > Expansion from a COND_EXPR rather than from compare-and-branch > > regresses gcc.target/i386/pr54855-13.c and gcc.target/i386/pr54855-9.c > > by producing extra moves while the corresponding min/max operations > > are now already synthesized by RTL expansion, register selection > > isn't optimal. This can be also provoked without this change by > > altering the operand order in the source. > > > > It regresses gcc.target/i386/pr110170.c where we end up CSEing the > > condition which makes RTL expansion no longer produce the min/max > > directly and code generation is obfuscated enough to confuse > > RTL if-conversion. > > > > It also regresses gcc.target/i386/ssefp-[12].c where oddly one > > variant isn't if-converted and ix86_expand_fp_movcc doesn't > > match directly (the FP constants get expanded twice). A fix > > could be in emit_conditional_move where both prepare_cmp_insn > > and emit_conditional_move_1 force the constants to (different) > > registers. > > > > Otherwise bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > PR tree-optimization/88540 > > * tree-ssa-phiopt.cc (minmax_replacement): Do not give up > > with NaNs but handle the simple case by if-converting to a > > COND_EXPR. > > One thing which I was thinking about adding to phiopt is having the > last pass do the conversion to COND_EXPR if the target supports a > conditional move for that expression. That should fix this one right? > This was one of the things I was working towards with the moving to use > match-and-simplify too. Note the if-conversion has to happen before BB SLP but the last phiopt is too late for this (yes, BB SLP could also be enhanced to handle conditionals and do if-conversion on-the-fly). For BB SLP there's also usually jump threading making a mess of same condition chain of if-convertible ops ... As for the min + max case that regresses due to CSE (gcc.target/i386/pr110170.c) I wonder whether pre-expanding _1 = _2 < _3; _4 = _1 ? _2 : _3; _5 = _1 ? _3 : _2; to something more clever would be appropriate anyway. We could adjust this to either duplicate _1 or expand the COND_EXPRs back to a single CFG diamond. I suppose force-duplicating non-vector compares of COND_EXPRs to make TER work again would fix similar regressions we might already observe (but I'm not aware of many COND_EXPR generators). Richard. > Thanks, > Andrew > > > > * gcc.target/i386/pr88540.c: New testcase. > > * gcc.target/i386/pr54855-12.c: Adjust. > > * gcc.target/i386/pr54855-13.c: Likewise. 
> > --- > > gcc/testsuite/gcc.target/i386/pr54855-12.c | 2 +- > > gcc/testsuite/gcc.target/i386/pr54855-13.c | 2 +- > > gcc/testsuite/gcc.target/i386/pr88540.c| 10 ++ > > gcc/tree-ssa-phiopt.cc | 21 - > > 4 files changed, 28 insertions(+), 7 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr88540.c > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c > > b/gcc/testsuite/gcc.target/i386/pr54855-12.c > > index 2f8af392c83..09e8ab8ae39 100644 > > --- a/gcc/testsuite/gcc.target/i386/pr54855-12.c > > +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c > > @@ -1,6 +1,6 @@ > > /* { dg-do compile } */ > > /* { dg-options "-O2 -mavx512fp16" } */ > > -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */ > > +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */ > > /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */ > > /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } > > } } */ > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-13.c > > b/gcc/testsuite/gcc.target/i386/pr54855-13.c > > index 87b4f459a5a..a4f25066f81 100644 > > --- a/gcc/testsuite/gcc.target/i386/pr54855-13.c > > +++ b/gcc/testsuite/gcc.target/i386/pr54855-13.c > > @@ -1,6 +1,6 @@ > > /* { dg-do compile } */ > > /* { dg-options "-O2 -mavx512fp16" } */ > > -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */ > > +/* { dg-final { scan-assembler-times "vm\[ai\]\[
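For reference, the shape of code this affects: with IEEE semantics the select below is not a MAX_EXPR (if x is NaN, x > y is false and y must be returned), so keeping the COND_EXPR form still lets the backend match its min/max instructions.

  double
  dmax (double x, double y)
  {
    /* Not a MAX_EXPR under IEEE rules, but if-convertible to a COND_EXPR
       that x86 can match as maxsd.  */
    return x > y ? x : y;
  }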
Re: [PATCH] riscv: Fix warning in riscv_regno_ok_for_index_p
pushed, thanks :) On Mon, Jul 17, 2023 at 4:59 PM Christoph Muellner wrote: > > From: Christoph Müllner > > The variable `regno` is currently not used in riscv_regno_ok_for_index_p(), > which triggers a compiler warning. Let's address this. > > Fixes: 423604278ed5 ("riscv: Prepare backend for index registers") > > Reported-by: Juzhe Zhong > Reported-by: Andreas Schwab > Signed-off-by: Christoph Müllner > > gcc/ChangeLog: > > * config/riscv/riscv.cc (riscv_regno_ok_for_index_p): > Remove parameter name from declaration of unused parameter. > > Signed-off-by: Christoph Müllner > --- > gcc/config/riscv/riscv.cc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index 6ed735d6983..ae3c034e76e 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -861,7 +861,7 @@ riscv_index_reg_class () > but extensions might support that. */ > > int > -riscv_regno_ok_for_index_p (int regno) > +riscv_regno_ok_for_index_p (int) > { >return 0; > } > -- > 2.41.0 >
Re: [PATCH V2] RISC-V: Support non-SLP unordered reduction
LGTM, thanks :)

On Mon, Jul 17, 2023 at 4:20 PM Juzhe-Zhong wrote:
>
> This patch adds reduc_*_scal to support reduction auto-vectorization.
>
> Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization.
>
> Consider the following case:
> int __attribute__((noipa))
> and_loop (int32_t * __restrict x,
> int32_t n, int res)
> {
> for (int i = 0; i < n; ++i)
> res &= x[i];
> return res;
> }
>
> ASM:
> and_loop:
> ble a1,zero,.L4
> vsetvli a3,zero,e32,m1,ta,ma
> vmv.v.i v1,-1
> .L3:
> vsetvli a5,a1,e32,m1,tu,ma
> MUST BE "TU".
> slli a4,a5,2
> sub a1,a1,a5
> vle32.v v2,0(a0)
> add a0,a0,a4
> vand.vv v1,v2,v1
> bne a1,zero,.L3
> vsetivli zero,1,e32,m1,ta,ma
> vmv.v.i v2,-1
> vsetvli a3,zero,e32,m1,ta,ma
> vredand.vs v1,v1,v2
> vmv.x.s a5,v1
> and a0,a2,a5
> ret
> .L4:
> mv a0,a2
> ret
>
> Fix a bug of the VSETVL PASS which is triggered by a reduction testcase.
>
> SLP reduction and floating-point in-order reduction are not supported yet.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (reduc_plus_scal_): New pattern.
> (reduc_smax_scal_): Ditto.
> (reduc_umax_scal_): Ditto.
> (reduc_smin_scal_): Ditto.
> (reduc_umin_scal_): Ditto.
> (reduc_and_scal_): Ditto.
> (reduc_ior_scal_): Ditto.
> (reduc_xor_scal_): Ditto.
> * config/riscv/riscv-protos.h (enum insn_type): Add reduction.
> (expand_reduction): New function.
> * config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
> (emit_vlmax_fp_reduction_insn): Ditto.
> (get_m1_mode): Ditto.
> (expand_cond_len_binop): Fix name.
> (expand_reduction): New function
> * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix VSETVL BUG.
> (validate_change_or_fail): New function.
> (change_insn): Fix VSETVL BUG.
> (change_vsetvl_insn): Ditto.
> (pass_vsetvl::backward_demand_fusion): Ditto.
> (pass_vsetvl::df_post_optimization): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test. 
> > --- > gcc/config/riscv/autovec.md | 138 ++ > gcc/config/riscv/riscv-protos.h | 2 + > gcc/config/riscv/riscv-v.cc | 84 ++- > gcc/config/riscv/riscv-vsetvl.cc | 57 ++-- > .../riscv/rvv/autovec/reduc/reduc-1.c | 118 +++ > .../riscv/rvv/autovec/reduc/reduc-2.c | 129 > .../riscv/rvv/autovec/reduc/reduc-3.c | 65 + > .../riscv/rvv/autovec/reduc/reduc-4.c | 59 > .../riscv/rvv/autovec/reduc/reduc_run-1.c | 56 +++ > .../riscv/rvv/autovec/reduc/reduc_run-2.c | 79 ++ > .../riscv/rvv/autovec/reduc/reduc_run-3.c | 49 +++ > .../riscv/rvv/autovec/reduc/reduc_run-4.c | 66 + > gcc/testsuite/gcc.target/riscv/rvv/rvv.exp| 2 + > 13 files changed, 887 insertions(+), 17 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c > create mode 100644 > gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c > create mode 100644 > gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c > create mode 100644 > gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c > create mode 100644 > gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c > > diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md > index 64a41bd7101..8cdec75bacf 100644 > --- a/gcc/config/riscv/autovec.md > +++ b/gcc/config/riscv/autovec.md > @@ -1554,3 +1554,141 @@ >riscv_vector::expand_cond_len_ternop (icode, operands); >DONE; > }) > + > +;; = > +;; == Reductions > +;; = > + > +;; - >
Re: [PATCH] Export value/mask known bits from IPA.
Hi Aldy, On Mon, Jul 17 2023, Aldy Hernandez wrote: > Currently IPA throws away the known 1 bits because VRP and irange have > traditionally only had a way of tracking known 0s (set_nonzero_bits). > With the ability to keep all the known bits in the irange, we can now > save this between passes. > > OK? > > gcc/ChangeLog: > > * ipa-prop.cc (ipcp_update_bits): Export value/mask known bits. OK, thanks. Martin > --- > gcc/ipa-prop.cc | 7 +++ > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc > index d2b998f8af5..5d790ff1265 100644 > --- a/gcc/ipa-prop.cc > +++ b/gcc/ipa-prop.cc > @@ -5853,10 +5853,9 @@ ipcp_update_bits (struct cgraph_node *node, > ipcp_transformation *ts) > { > unsigned prec = TYPE_PRECISION (TREE_TYPE (ddef)); > signop sgn = TYPE_SIGN (TREE_TYPE (ddef)); > - > - wide_int nonzero_bits = wide_int::from (bits[i]->mask, prec, UNSIGNED) > - | wide_int::from (bits[i]->value, prec, sgn); > - set_nonzero_bits (ddef, nonzero_bits); > + wide_int mask = wide_int::from (bits[i]->mask, prec, UNSIGNED); > + wide_int value = wide_int::from (bits[i]->value, prec, sgn); > + set_bitmask (ddef, value, mask); > } >else > { > -- > 2.40.1
[PATCH] RISC-V: Ensure all implied extensions are included[PR110696]
Hi,

This patch fixes target/PR110696 by recursively adding all implied extensions.

Best, Lehua

PR target/110696

gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_subset_list::handle_implied_ext): Recursively add all implied extensions. (riscv_subset_list::check_implied_ext): Add new method. (riscv_subset_list::parse): Call checker check_implied_ext.
* config/riscv/riscv-subset.h: Add new method.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/attribute-20.c: New test.
* gcc.target/riscv/pr110696.c: New test.

--- gcc/common/config/riscv/riscv-common.cc | 33 +-- gcc/config/riscv/riscv-subset.h | 3 +- gcc/testsuite/gcc.target/riscv/attribute-20.c | 7 gcc/testsuite/gcc.target/riscv/pr110696.c | 7 4 files changed, 46 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-20.c create mode 100644 gcc/testsuite/gcc.target/riscv/pr110696.c

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index 28c8f0c1489..19075c0b241 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -949,14 +949,14 @@ riscv_subset_list::parse_std_ext (const char *p) /* Check any implied extensions for EXT. */ void -riscv_subset_list::handle_implied_ext (riscv_subset_t *ext) +riscv_subset_list::handle_implied_ext (const char *ext) { const riscv_implied_info_t *implied_info; for (implied_info = &riscv_implied_info[0]; implied_info->ext; ++implied_info) { - if (strcmp (ext->name.c_str (), implied_info->ext) != 0) + if (strcmp (ext, implied_info->ext) != 0) continue; /* Skip if implied extension already present. */ @@ -966,6 +966,9 @@ riscv_subset_list::handle_implied_ext (riscv_subset_t *ext) /* Version of implied extension will get from current ISA spec version. */ add (implied_info->implied_ext, true); + + /* Recursively add implied extension by implied_info->implied_ext. */ + handle_implied_ext (implied_info->implied_ext); } /* For RISC-V ISA version 2.2 or earlier version, zicsr and zifence is @@ -980,6 +983,27 @@ riscv_subset_list::handle_implied_ext (riscv_subset_t *ext) } } +/* Check that all implied extensions are included. */ +bool +riscv_subset_list::check_implied_ext () +{ + riscv_subset_t *itr; + for (itr = m_head; itr != NULL; itr = itr->next) +{ + const riscv_implied_info_t *implied_info; + for (implied_info = &riscv_implied_info[0]; implied_info->ext; + ++implied_info) + { + if (strcmp (itr->name.c_str(), implied_info->ext) != 0) + continue; + + if (!lookup (implied_info->implied_ext)) + return false; + } +} + return true; +} + /* Check any combine extensions for EXT. */ void riscv_subset_list::handle_combine_ext () @@ -1194,9 +1218,12 @@ riscv_subset_list::parse (const char *arch, location_t loc) for (itr = subset_list->m_head; itr != NULL; itr = itr->next) { - subset_list->handle_implied_ext (itr); + subset_list->handle_implied_ext (itr->name.c_str ()); } + /* Make sure all implied extensions are included. 
*/ + gcc_assert (subset_list->check_implied_ext ()); + subset_list->handle_combine_ext (); if (subset_list->lookup ("zfinx") && subset_list->lookup ("f")) diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h index 92e4fb31692..84a7a82db63 100644 --- a/gcc/config/riscv/riscv-subset.h +++ b/gcc/config/riscv/riscv-subset.h @@ -67,7 +67,8 @@ private: const char *parse_multiletter_ext (const char *, const char *, const char *); - void handle_implied_ext (riscv_subset_t *); + void handle_implied_ext (const char *); + bool check_implied_ext (); void handle_combine_ext (); public: diff --git a/gcc/testsuite/gcc.target/riscv/attribute-20.c b/gcc/testsuite/gcc.target/riscv/attribute-20.c new file mode 100644 index 000..f7d0b29b71c --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/attribute-20.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv_zvl65536b -mabi=lp64d" } */ +int foo() +{ +} + +/* { dg-final { scan-assembler ".attribute arch, \"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl1024b1p0_zvl128b1p0_zvl16384b1p0_zvl2048b1p0_zvl256b1p0_zvl32768b1p0_zvl32b1p0_zvl4096b1p0_zvl512b1p0_zvl64b1p0_zvl65536b1p0_zvl8192b1p0\"" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/pr110696.c b/gcc/testsuite/gcc.target/riscv/pr110696.c new file mode 100644 index 000..a630f04e74f --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr110696.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv_zvl4096b -mabi=lp64d" } */ +int foo() +{ +} + +/* { dg-final { scan-assembl
RE: [PATCH V2] RISC-V: Support non-SLP unordered reduction
Committed, thanks Kito. Pan -Original Message- From: Gcc-patches On Behalf Of Kito Cheng via Gcc-patches Sent: Monday, July 17, 2023 5:33 PM To: Juzhe-Zhong Cc: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; pal...@dabbelt.com; pal...@rivosinc.com; jeffreya...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH V2] RISC-V: Support non-SLP unordered reduction LGTM, thanks :) On Mon, Jul 17, 2023 at 4:20 PM Juzhe-Zhong wrote: > > This patch add reduc_*_scal to support reduction auto-vectorization. > > Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization. > > Consider this following case: > int __attribute__((noipa)) > and_loop (int32_t * __restrict x, > int32_t n, int res) > { > for (int i = 0; i < n; ++i) > res &= x[i]; > return res; > } > > ASM: > and_loop: > ble a1,zero,.L4 > vsetvli a3,zero,e32,m1,ta,ma > vmv.v.i v1,-1 > .L3: > vsetvli a5,a1,e32,m1,tu,ma > MUST BE "TU". > sllia4,a5,2 > sub a1,a1,a5 > vle32.v v2,0(a0) > add a0,a0,a4 > vand.vv v1,v2,v1 > bne a1,zero,.L3 > vsetivlizero,1,e32,m1,ta,ma > vmv.v.i v2,-1 > vsetvli a3,zero,e32,m1,ta,ma > vredand.vs v1,v1,v2 > vmv.x.s a5,v1 > and a0,a2,a5 > ret > .L4: > mv a0,a2 > ret > > Fix bug of VSETVL PASS which is caused by reduction testcase. > > SLP reduction and floating-point in-order reduction are not supported yet. > > gcc/ChangeLog: > > * config/riscv/autovec.md (reduc_plus_scal_): New pattern. > (reduc_smax_scal_): Ditto. > (reduc_umax_scal_): Ditto. > (reduc_smin_scal_): Ditto. > (reduc_umin_scal_): Ditto. > (reduc_and_scal_): Ditto. > (reduc_ior_scal_): Ditto. > (reduc_xor_scal_): Ditto. > * config/riscv/riscv-protos.h (enum insn_type): Add reduction. > (expand_reduction): New function. > * config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto. > (emit_vlmax_fp_reduction_insn): Ditto. > (get_m1_mode): Ditto. > (expand_cond_len_binop): Fix name. > (expand_reduction): New function > * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix VSETVL BUG. > (validate_change_or_fail): New function. > (change_insn): Fix VSETVL BUG. > (change_vsetvl_insn): Ditto. > (pass_vsetvl::backward_demand_fusion): Ditto. > (pass_vsetvl::df_post_optimization): Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/rvv.exp: Add reduction tests. > * gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test. > * gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test. > * gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test. > * gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test. > * gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test. > * gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test. > * gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test. > * gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test. 
> > --- > gcc/config/riscv/autovec.md | 138 ++ > gcc/config/riscv/riscv-protos.h | 2 + > gcc/config/riscv/riscv-v.cc | 84 ++- > gcc/config/riscv/riscv-vsetvl.cc | 57 ++-- > .../riscv/rvv/autovec/reduc/reduc-1.c | 118 +++ > .../riscv/rvv/autovec/reduc/reduc-2.c | 129 > .../riscv/rvv/autovec/reduc/reduc-3.c | 65 + > .../riscv/rvv/autovec/reduc/reduc-4.c | 59 > .../riscv/rvv/autovec/reduc/reduc_run-1.c | 56 +++ > .../riscv/rvv/autovec/reduc/reduc_run-2.c | 79 ++ > .../riscv/rvv/autovec/reduc/reduc_run-3.c | 49 +++ > .../riscv/rvv/autovec/reduc/reduc_run-4.c | 66 + > gcc/testsuite/gcc.target/riscv/rvv/rvv.exp| 2 + > 13 files changed, 887 insertions(+), 17 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c > create mode 100644 > gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c > create mode 100644 > gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c > create mode 100644 > gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c > create mode 100644 > gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c > > diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md > index 64a41bd7101..8cdec75bacf 100644 > --- a/gcc/config/riscv/autovec.md > +++ b/gcc/config/riscv/autovec.md
Re: [PATCH] RISC-V: Ensure all implied extensions are included[PR110696]
LGTM, thanks for the patch :) On Mon, Jul 17, 2023 at 5:53 PM Lehua Ding wrote: > > Hi, > > This patch fix target/PR110696, recursively add all implied extensions. > > Best, > Lehua > > PR target/110696 > > gcc/ChangeLog: > > * common/config/riscv/riscv-common.cc > (riscv_subset_list::handle_implied_ext): recur add all implied extensions. > (riscv_subset_list::check_implied_ext): Add new method. > (riscv_subset_list::parse): Call checker check_implied_ext. > * config/riscv/riscv-subset.h: Add new method. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/attribute-20.c: New test. > * gcc.target/riscv/pr110696.c: New test. > > --- > gcc/common/config/riscv/riscv-common.cc | 33 +-- > gcc/config/riscv/riscv-subset.h | 3 +- > gcc/testsuite/gcc.target/riscv/attribute-20.c | 7 > gcc/testsuite/gcc.target/riscv/pr110696.c | 7 > 4 files changed, 46 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/attribute-20.c > create mode 100644 gcc/testsuite/gcc.target/riscv/pr110696.c > > diff --git a/gcc/common/config/riscv/riscv-common.cc > b/gcc/common/config/riscv/riscv-common.cc > index 28c8f0c1489..19075c0b241 100644 > --- a/gcc/common/config/riscv/riscv-common.cc > +++ b/gcc/common/config/riscv/riscv-common.cc > @@ -949,14 +949,14 @@ riscv_subset_list::parse_std_ext (const char *p) > > /* Check any implied extensions for EXT. */ > void > -riscv_subset_list::handle_implied_ext (riscv_subset_t *ext) > +riscv_subset_list::handle_implied_ext (const char *ext) > { >const riscv_implied_info_t *implied_info; >for (implied_info = &riscv_implied_info[0]; > implied_info->ext; > ++implied_info) > { > - if (strcmp (ext->name.c_str (), implied_info->ext) != 0) > + if (strcmp (ext, implied_info->ext) != 0) > continue; > >/* Skip if implied extension already present. */ > @@ -966,6 +966,9 @@ riscv_subset_list::handle_implied_ext (riscv_subset_t > *ext) >/* Version of implied extension will get from current ISA spec > version. */ >add (implied_info->implied_ext, true); > + > + /* Recursively add implied extension by implied_info->implied_ext. */ > + handle_implied_ext (implied_info->implied_ext); > } > >/* For RISC-V ISA version 2.2 or earlier version, zicsr and zifence is > @@ -980,6 +983,27 @@ riscv_subset_list::handle_implied_ext (riscv_subset_t > *ext) > } > } > > +/* Check that all implied extensions are included. */ > +bool > +riscv_subset_list::check_implied_ext () > +{ > + riscv_subset_t *itr; > + for (itr = m_head; itr != NULL; itr = itr->next) > +{ > + const riscv_implied_info_t *implied_info; > + for (implied_info = &riscv_implied_info[0]; implied_info->ext; > + ++implied_info) > + { > + if (strcmp (itr->name.c_str(), implied_info->ext) != 0) > + continue; > + > + if (!lookup (implied_info->implied_ext)) > + return false; > + } > +} > + return true; > +} > + > /* Check any combine extensions for EXT. */ > void > riscv_subset_list::handle_combine_ext () > @@ -1194,9 +1218,12 @@ riscv_subset_list::parse (const char *arch, location_t > loc) > >for (itr = subset_list->m_head; itr != NULL; itr = itr->next) > { > - subset_list->handle_implied_ext (itr); > + subset_list->handle_implied_ext (itr->name.c_str ()); > } > > + /* Make sure all implied extensions are included. 
*/ > + gcc_assert (subset_list->check_implied_ext ()); > + >subset_list->handle_combine_ext (); > >if (subset_list->lookup ("zfinx") && subset_list->lookup ("f")) > diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h > index 92e4fb31692..84a7a82db63 100644 > --- a/gcc/config/riscv/riscv-subset.h > +++ b/gcc/config/riscv/riscv-subset.h > @@ -67,7 +67,8 @@ private: >const char *parse_multiletter_ext (const char *, const char *, > const char *); > > - void handle_implied_ext (riscv_subset_t *); > + void handle_implied_ext (const char *); > + bool check_implied_ext (); >void handle_combine_ext (); > > public: > diff --git a/gcc/testsuite/gcc.target/riscv/attribute-20.c > b/gcc/testsuite/gcc.target/riscv/attribute-20.c > new file mode 100644 > index 000..f7d0b29b71c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/attribute-20.c > @@ -0,0 +1,7 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv_zvl65536b -mabi=lp64d" } */ > +int foo() > +{ > +} > + > +/* { dg-final { scan-assembler ".attribute arch, > \"rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl1024b1p0_zvl128b1p0_zvl16384b1p0_zvl2048b1p0_zvl256b1p0_zvl32768b1p0_zvl32b1p0_zvl4096b1p0_zvl512b1p0_zvl64b1p0_zvl65536b1p0_zvl8192b1p0\"" > } } */ > diff --git a/gcc/testsuite
Fix optimize_mask_stores profile update
Hi,
While looking into sphinx3 regression I noticed that vectorizer produces
BBs with overall probability count 120%. This patch fixes it.
Richi, I don't know how to create a testcase, but having one would
be nice.

Bootstrapped/regtested x86_64-linux, committed last night (sorry for
late email)

gcc/ChangeLog:

	PR tree-optimization/110649
	* tree-vect-loop.cc (optimize_mask_stores): Set correctly
	probability of the if-then-else construct.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7d917bfd72c..b44fb9c7712 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11680,6 +11679,7 @@ optimize_mask_stores (class loop *loop)
       efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
       /* Put STORE_BB to likely part.  */
       efalse->probability = profile_probability::unlikely ();
+      e->probability = efalse->probability.invert ();
       store_bb->count = efalse->count ();
       make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);
       if (dom_info_available_p (CDI_DOMINATORS))
Fix profile update in scale_profile_for_vect_loop
Hi,
when vectorizing 4 times, we sometimes do

  for <4x vectorized body>
  for <2x vectorized body>
  for <1x vectorized body>

Here the second two fors handling the epilogue never iterate. Currently the
vectorizer thinks that the middle for iterates twice. This turns out to be
scale_profile_for_vect_loop, which uses niter_for_unrolled_loop. At that time
we know the epilogue will iterate at most 2 times, but niter_for_unrolled_loop
does not know that the last iteration will be taken by the epilogue-of-epilogue
and thus thinks that the loop may iterate once and exit in the middle of the
second iteration. We already do the correct job updating the niter bounds and
this is just an ordering issue. This patch makes us first update the bounds
and then do the updating of the loop.

I re-implemented the function more correctly and precisely. The loop reducing
the iteration factor for overly flat profiles is a bit funny, but the only
other method I can think of is to compute an sreal scale, which would have
similar overhead, I think.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

	PR middle-end/110649
	* tree-vect-loop.cc (scale_profile_for_vect_loop):
	(vect_transform_loop):
	(optimize_mask_stores):

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7d917bfd72c..b44fb9c7712 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10842,31 +10842,30 @@ vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
 static void
 scale_profile_for_vect_loop (class loop *loop, unsigned vf)
 {
-  edge preheader = loop_preheader_edge (loop);
-  /* Reduce loop iterations by the vectorization factor.  */
-  gcov_type new_est_niter = niter_for_unrolled_loop (loop, vf);
-  profile_count freq_h = loop->header->count, freq_e = preheader->count ();
-
-  if (freq_h.nonzero_p ())
-    {
-      profile_probability p;
-
-      /* Avoid dropping loop body profile counter to 0 because of zero count
-	 in loop's preheader.  */
-      if (!(freq_e == profile_count::zero ()))
-	freq_e = freq_e.force_nonzero ();
-      p = (freq_e * (new_est_niter + 1)).probability_in (freq_h);
-      scale_loop_frequencies (loop, p);
-    }
-
+  /* Loop body executes VF fewer times and exit increases VF times.  */
   edge exit_e = single_exit (loop);
-  exit_e->probability = profile_probability::always () / (new_est_niter + 1);
-
-  edge exit_l = single_pred_edge (loop->latch);
-  profile_probability prob = exit_l->probability;
-  exit_l->probability = exit_e->probability.invert ();
-  if (prob.initialized_p () && exit_l->probability.initialized_p ())
-    scale_bbs_frequencies (&loop->latch, 1, exit_l->probability / prob);
+  profile_count entry_count = loop_preheader_edge (loop)->count ();
+
+  /* If we have unreliable loop profile avoid dropping entry
+     count bellow header count.  This can happen since loops
+     has unrealistically low trip counts.  */
+  while (vf > 1
+	 && loop->header->count > entry_count
+	 && loop->header->count < entry_count * vf)
+    vf /= 2;
+
+  if (entry_count.nonzero_p ())
+    set_edge_probability_and_rescale_others
+      (exit_e,
+       entry_count.probability_in (loop->header->count / vf));
+  /* Avoid producing very large exit probability when we do not have
+     sensible profile.  */
+  else if (exit_e->probability < profile_probability::always () / (vf * 2))
+    set_edge_probability_and_rescale_others (exit_e, exit_e->probability * vf);
+  loop->latch->count = single_pred_edge (loop->latch)->count ();
+
+  scale_loop_profile (loop, profile_probability::always () / vf,
+		      get_likely_max_loop_iterations_int (loop));
 }
 
 /* For a vectorized stmt DEF_STMT_INFO adjust all vectorized PHI
@@ -11476,7 +11475,6 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
 				  niters_vector_mult_vf, !niters_no_overflow);
 
       unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
-      scale_profile_for_vect_loop (loop, assumed_vf);
 
       /* True if the final iteration might not handle a full vector's
	 worth of scalar iterations.  */
@@ -11547,6 +11545,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
	   assumed_vf) - 1
	 : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
			   assumed_vf) - 1);
+      scale_profile_for_vect_loop (loop, assumed_vf);
 
       if (dump_enabled_p ())
	{
Avoid double profile update in try_peel_loop
Hi,
try_peel_loop uses gimple_duplicate_loop_body_to_header_edge, which subtracts
the profile from the original loop. However, it then tries to scale the
profile in a wrong way (it forces the header count to be the entry count).

This eliminates two profile misupdates in the internal loop of sphinx3.

gcc/ChangeLog:

	PR middle-end/110649
	* tree-ssa-loop-ivcanon.cc (try_peel_loop): Avoid double profile
	update.

diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index 0117dbfc91b..bdb738af7a8 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -1152,6 +1152,7 @@ try_peel_loop (class loop *loop,
     }
   if (may_be_zero)
     bitmap_clear_bit (wont_exit, 1);
+
   if (!gimple_duplicate_loop_body_to_header_edge (
	loop, loop_preheader_edge (loop), npeel, wont_exit, exit,
	&edges_to_remove, DLTHE_FLAG_UPDATE_FREQ))
@@ -1168,18 +1169,6 @@ try_peel_loop (class loop *loop,
   adjust_loop_info_after_peeling (loop, npeel, true);
 
   profile_count entry_count = profile_count::zero ();
-  edge e;
-  edge_iterator ei;
-  FOR_EACH_EDGE (e, ei, loop->header->preds)
-    if (e->src != loop->latch)
-      {
-	if (e->src->count.initialized_p ())
-	  entry_count += e->src->count;
-	gcc_assert (!flow_bb_inside_loop_p (loop, e->src));
-      }
-  profile_probability p;
-  p = entry_count.probability_in (loop->header->count);
-  scale_loop_profile (loop, p, -1);
   bitmap_set_bit (peeled_loops, loop->num);
   return true;
 }
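To put numbers on the double update described above (figures invented for illustration): suppose the loop entry count is 100, the header count is 400 (four iterations on average), and we peel npeel = 2. gimple_duplicate_loop_body_to_header_edge with DLTHE_FLAG_UPDATE_FREQ already moves roughly the first two iterations' worth of profile out of the loop, leaving a header count of about 200. The removed code then forced the header count back down to the entry count (100), scaling the remaining body profile by another factor of two, hence the misupdate.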
Re: Fix optimize_mask_stores profile update
On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches wrote: > > Hi, > While looking into sphinx3 regression I noticed that vectorizer produces > BBs with overall probability count 120%. This patch fixes it. > Richi, I don't know how to create a testcase, but having one would > be nice. > > Bootstrapped/regtested x86_64-linux, commited last night (sorry for > late email) This should trigger with sth like for (i) if (cond[i]) out[i] = 1.; so a masked store and then using AVX2+. ISTR we disable AVX masked stores on zen (but not AVX512). Richard. > gcc/ChangeLog: > > PR tree-optimization/110649 > * tree-vect-loop.cc (optimize_mask_stores): Set correctly > probability of the if-then-else construct. > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > index 7d917bfd72c..b44fb9c7712 100644 > --- a/gcc/tree-vect-loop.cc > +++ b/gcc/tree-vect-loop.cc > @@ -11680,6 +11679,7 @@ optimize_mask_stores (class loop *loop) >efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE); >/* Put STORE_BB to likely part. */ >efalse->probability = profile_probability::unlikely (); > + e->probability = efalse->probability.invert (); >store_bb->count = efalse->count (); isn't the count also wrong? Or rather efalse should be likely(). We're testing doing if (!mask all zeros) masked-store because a masked store with all zero mask can end up invoking COW page fault handling multiple times (because it doesn't actually write). Note -Ofast allows store data races and thus does RMW instead of a masked store. >make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU); >if (dom_info_available_p (CDI_DOMINATORS))
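For the record, fleshing out the suggested trigger into a compilable unit might look like this (my own untested sketch, not a testcase from the tree; float was chosen so the mask and data element widths match):

/* { dg-do compile } */
/* { dg-options "-O3 -mavx2" } */

void
foo (float *out, int *cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      out[i] = 1.f;
}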
RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
> -Original Message- > From: Richard Biener > Sent: Friday, July 14, 2023 2:35 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com > Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV > updates for early break. > > On Thu, 13 Jul 2023, Tamar Christina wrote: > > > > -Original Message- > > > From: Richard Biener > > > Sent: Thursday, July 13, 2023 6:31 PM > > > To: Tamar Christina > > > Cc: gcc-patches@gcc.gnu.org; nd ; > j...@ventanamicro.com > > > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV > > > updates for early break. > > > > > > On Wed, 28 Jun 2023, Tamar Christina wrote: > > > > > > > Hi All, > > > > > > > > This patch updates the peeling code to maintain LCSSA during peeling. > > > > The rewrite also naturally takes into account multiple exits and so it > > > > didn't > > > > make sense to split them off. > > > > > > > > For the purposes of peeling the only change for multiple exits is that > > > > the > > > > secondary exits are all wired to the start of the new loop preheader > > > > when > > > doing > > > > epilogue peeling. > > > > > > > > When doing prologue peeling the CFG is kept in tact. > > > > > > > > For both epilogue and prologue peeling we wire through between the > two > > > loops any > > > > PHI nodes that escape the first loop into the second loop if flow_loops > > > > is > > > > specified. The reason for this conditionality is because > > > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 > > > > ways: > > > > - prologue peeling > > > > - epilogue peeling > > > > - loop distribution > > > > > > > > for the last case the loops should remain independent, and so not be > > > connected. > > > > Because of this propagation of only used phi nodes get_current_def can > be > > > used > > > > to easily find the previous definitions. However live statements that > > > > are > > > > not used inside the loop itself are not propagated (since if unused, the > > > moment > > > > we add the guard in between the two loops the value across the bypass > edge > > > can > > > > be wrong if the loop has been peeled.) > > > > > > > > This is dealt with easily enough in find_guard_arg. > > > > > > > > For multiple exits, while we are in LCSSA form, and have a correct DOM > tree, > > > the > > > > moment we add the guard block we will change the dominators again. To > > > deal with > > > > this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the > blocks > > > to > > > > update without having to recompute the list of blocks to update again. > > > > > > > > When multiple exits and doing epilogue peeling we will also temporarily > have > > > an > > > > incorrect VUSES chain for the secondary exits as it anticipates the > > > > final > result > > > > after the VDEFs have been moved. This will thus be corrected once the > code > > > > motion is applied. > > > > > > > > Lastly by doing things this way we can remove the helper functions that > > > > previously did lock step iterations to update things as it went along. > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > > > > > Ok for master? > > > > > > Not sure if I get through all of this in one go - so be prepared that > > > the rest of the review follows another day. > > > > No worries, I appreciate the reviews! > > Just giving some quick replies for when you continue. > > Continueing. 
> > > > > > > > Thanks, > > > > Tamar > > > > > > > > gcc/ChangeLog: > > > > > > > > * tree-loop-distribution.cc (copy_loop_before): Pass flow_loops > > > > = > > > false. > > > > * tree-ssa-loop-niter.cc (loop_only_exit_p): Fix bug when > > > > exit==null. > > > > * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add > > > additional > > > > assert. > > > > (vect_set_loop_condition_normal): Skip modifying loop IV for > > > > multiple > > > > exits. > > > > (slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit > > > peeling. > > > > (slpeel_can_duplicate_loop_p): Likewise. > > > > (vect_update_ivs_after_vectorizer): Don't enter this... > > > > (vect_update_ivs_after_early_break): ...but instead enter here. > > > > (find_guard_arg): Update for new peeling code. > > > > (slpeel_update_phi_nodes_for_loops): Remove. > > > > (slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 > > > checks. > > > > (slpeel_update_phi_nodes_for_lcssa): Remove. > > > > (vect_do_peeling): Fix VF for multiple exits and force epilogue. > > > > * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize > > > > non_break_control_flow and early_breaks. > > > > (vect_need_peeling_or_partial_vectors_p): Force partial vector > > > > if > > > > multiple exits and VLA. > > > > (vect_analyze_loop_form): Support inner loop multiple exits.
Re: [PATCH] Add peephole to eliminate redundant comparison after cmpccxadd.
On Mon, Jul 17, 2023 at 8:44 AM Hongtao Liu wrote: > > Ping. > > On Tue, Jul 11, 2023 at 5:16 PM liuhongt via Gcc-patches > wrote: > > > > Similar like we did for CMPXCHG, but extended to all > > ix86_comparison_int_operator since CMPCCXADD set EFLAGS exactly same > > as CMP. > > > > When operand order in CMP insn is same as that in CMPCCXADD, > > CMP insn can be eliminated directly. > > > > When operand order is swapped in CMP insn, only optimize > > cmpccxadd + cmpl + jcc/setcc to cmpccxadd + jcc/setcc when FLAGS_REG is dead > > after jcc/setcc plus adjusting code for jcc/setcc. > > > > gcc/ChangeLog: > > > > PR target/110591 > > * config/i386/sync.md (cmpccxadd_): Adjust the pattern > > to explicitly set FLAGS_REG like *cmp_1, also add extra > > 3 define_peephole2 after the pattern. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr110591.c: New test. > > * gcc.target/i386/pr110591-2.c: New test. LGTM. Thanks, Uros. > > --- > > gcc/config/i386/sync.md| 160 - > > gcc/testsuite/gcc.target/i386/pr110591-2.c | 90 > > gcc/testsuite/gcc.target/i386/pr110591.c | 66 + > > 3 files changed, 315 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr110591-2.c > > create mode 100644 gcc/testsuite/gcc.target/i386/pr110591.c > > > > diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md > > index e1fa1504deb..e84226cf895 100644 > > --- a/gcc/config/i386/sync.md > > +++ b/gcc/config/i386/sync.md > > @@ -1093,7 +1093,9 @@ (define_insn "cmpccxadd_" > > UNSPECV_CMPCCXADD)) > > (set (match_dup 1) > > (unspec_volatile:SWI48x [(const_int 0)] UNSPECV_CMPCCXADD)) > > - (clobber (reg:CC FLAGS_REG))] > > + (set (reg:CC FLAGS_REG) > > + (compare:CC (match_dup 1) > > + (match_dup 2)))] > >"TARGET_CMPCCXADD && TARGET_64BIT" > > { > >char buf[128]; > > @@ -1105,3 +1107,159 @@ (define_insn "cmpccxadd_" > >output_asm_insn (buf, operands); > >return ""; > > }) > > + > > +(define_peephole2 > > + [(set (match_operand:SWI48x 0 "register_operand") > > + (match_operand:SWI48x 1 "x86_64_general_operand")) > > + (parallel [(set (match_dup 0) > > + (unspec_volatile:SWI48x > > +[(match_operand:SWI48x 2 "memory_operand") > > + (match_dup 0) > > + (match_operand:SWI48x 3 "register_operand") > > + (match_operand:SI 4 "const_int_operand")] > > +UNSPECV_CMPCCXADD)) > > + (set (match_dup 2) > > + (unspec_volatile:SWI48x [(const_int 0)] > > UNSPECV_CMPCCXADD)) > > + (set (reg:CC FLAGS_REG) > > + (compare:CC (match_dup 2) > > + (match_dup 0)))]) > > + (set (reg FLAGS_REG) > > + (compare (match_operand:SWI48x 5 "register_operand") > > +(match_operand:SWI48x 6 "x86_64_general_operand")))] > > + "TARGET_CMPCCXADD && TARGET_64BIT > > + && rtx_equal_p (operands[0], operands[5]) > > + && rtx_equal_p (operands[1], operands[6])" > > + [(set (match_dup 0) > > + (match_dup 1)) > > + (parallel [(set (match_dup 0) > > + (unspec_volatile:SWI48x > > +[(match_dup 2) > > + (match_dup 0) > > + (match_dup 3) > > + (match_dup 4)] > > +UNSPECV_CMPCCXADD)) > > + (set (match_dup 2) > > + (unspec_volatile:SWI48x [(const_int 0)] > > UNSPECV_CMPCCXADD)) > > + (set (reg:CC FLAGS_REG) > > + (compare:CC (match_dup 2) > > + (match_dup 0)))]) > > + (set (match_dup 7) > > + (match_op_dup 8 > > + [(match_dup 9) (const_int 0)]))]) > > + > > +(define_peephole2 > > + [(set (match_operand:SWI48x 0 "register_operand") > > + (match_operand:SWI48x 1 "x86_64_general_operand")) > > + (parallel [(set (match_dup 0) > > + (unspec_volatile:SWI48x > > +[(match_operand:SWI48x 2 "memory_operand") > > + (match_dup 0) > > + 
(match_operand:SWI48x 3 "register_operand") > > + (match_operand:SI 4 "const_int_operand")] > > +UNSPECV_CMPCCXADD)) > > + (set (match_dup 2) > > + (unspec_volatile:SWI48x [(const_int 0)] > > UNSPECV_CMPCCXADD)) > > + (set (reg:CC FLAGS_REG) > > + (compare:CC (match_dup 2) > > + (match_dup 0)))]) > > + (set (reg FLAGS_REG) > > + (compare (match_operand:SWI48x 5 "register_operand") > > +(match_operand:SWI48x 6 "x86_64_general_operand"))) > > + (set (match_operand:QI 7 "nonimmediate_operand") > > + (match_operator:QI 8 "ix86_comparison_int_o
Re: [PATCH 1/2] [i386] Support type _Float16/__bf16 independent of SSE2.
On Mon, Jul 17, 2023 at 10:28 AM Hongtao Liu wrote: > > I'd like to ping for this patch (only patch 1/2, for patch 2/2, I > think that may not be necessary). > > On Mon, May 15, 2023 at 9:20 AM Hongtao Liu wrote: > > > > ping. > > > > On Fri, Apr 21, 2023 at 9:55 PM liuhongt wrote: > > > > > > > > + if (!TARGET_SSE2) > > > > > +{ > > > > > + if (c_dialect_cxx () > > > > > + && cxx_dialect > cxx20) > > > > > > > > Formatting, both conditions are short, so just put them on one line. > > > Changed. > > > > > > > But for the C++23 macros, more importantly I think we really should > > > > also in ix86_target_macros_internal add > > > > if (c_dialect_cxx () > > > > && cxx_dialect > cxx20 > > > > && (isa_flag & OPTION_MASK_ISA_SSE2)) > > > > { > > > > def_or_undef (parse_in, "__STDCPP_FLOAT16_T__"); > > > > def_or_undef (parse_in, "__STDCPP_BFLOAT16_T__"); > > > > } > > > > plus associated libstdc++ changes. It can be done incrementally though. > > > Added in PATCH 2/2 > > > > > > > > + if (flag_building_libgcc) > > > > > + { > > > > > + /* libbid uses __LIBGCC_HAS_HF_MODE__ and > > > > > __LIBGCC_HAS_BF_MODE__ > > > > > + to check backend support of _Float16 and __bf16 type. */ > > > > > > > > That is actually the case only for HFmode, but not for BFmode right now. > > > > So, we need further work. One is to add the BFmode support in there, > > > > and another one is make sure the _Float16 <-> _Decimal* and __bf16 <-> > > > > _Decimal* conversions are compiled in also if not -msse2 by default. > > > > One way to do that is wrap the HF and BF mode related functions on x86 > > > > #ifndef __SSE2__ into the pragmas like intrin headers use (but then > > > > perhaps we don't need to undef this stuff here), another is not provide > > > > the hf/bf support in that case from the TUs where they are provided now, > > > > but from a different one which would be compiled with -msse2. > > > Add CFLAGS-_hf_to_sd.c += -msse2, similar for other files in libbid, just > > > like > > > we did before for HFtype softfp. Then no need to undef libgcc macros. > > > > > > > >/* We allowed the user to turn off SSE for kernel mode. Don't > > > > > crash if > > > > > some less clueful developer tries to use floating-point anyway. > > > > > */ > > > > > - if (needed_sseregs && !TARGET_SSE) > > > > > + if (needed_sseregs > > > > > + && (!TARGET_SSE > > > > > + || (VALID_SSE2_TYPE_MODE (mode) > > > > > + && !TARGET_SSE2))) > > > > > > > > Formatting, no need to split this up that much. > > > > if (needed_sseregs > > > > && (!TARGET_SSE > > > > || (VALID_SSE2_TYPE_MODE (mode) && !TARGET_SSE2))) > > > > or even better > > > > if (needed_sseregs > > > > && (!TARGET_SSE || (VALID_SSE2_TYPE_MODE (mode) && !TARGET_SSE2))) > > > > will do it. > > > Changed. > > > > > > > Instead of this, just use > > > > if (!float16_type_node) > > > > { > > > > float16_type_node = ix86_float16_type_node; > > > > callback (float16_type_node); > > > > float16_type_node = NULL_TREE; > > > > } > > > > if (!bfloat16_type_node) > > > > { > > > > bfloat16_type_node = ix86_bf16_type_node; > > > > callback (bfloat16_type_node); > > > > bfloat16_type_node = NULL_TREE; > > > > } > > > Changed. > > > > > > > > > > > +static const char * > > > > > +ix86_invalid_conversion (const_tree fromtype, const_tree totype) > > > > > +{ > > > > > + if (element_mode (fromtype) != element_mode (totype)) > > > > > +{ > > > > > + /* Do no allow conversions to/from BFmode/HFmode scalar types > > > > > + when TARGET_SSE2 is not available. 
*/ > > > > > + if ((TYPE_MODE (fromtype) == BFmode > > > > > +|| TYPE_MODE (fromtype) == HFmode) > > > > > + && !TARGET_SSE2) > > > > > > > > First of all, not really sure if this should be purely about scalar > > > > modes, not also complex and vector modes involving those inner modes. > > > > Because complex or vector modes with BF/HF elements will be without > > > > TARGET_SSE2 for sure lowered into scalar code and that can't be handled > > > > either. > > > > So if (!TARGET_SSE2 && GET_MODE_INNER (TYPE_MODE (fromtype)) == BFmode) > > > > or even better > > > > if (!TARGET_SSE2 && element_mode (fromtype) == BFmode) > > > > ? > > > > Or even better remember the 2 modes above into machine_mode temporaries > > > > and just use those in the != comparison and for the checks? > > > > > > > > Also, I think it is weird to tell user %<__bf16%> or %<_Float16%> when > > > > we know which one it is. Just return separate messages? > > > Changed. > > > > > > > > + /* Reject all single-operand operations on BFmode/HFmode except > > > > > for & > > > > > + when TARGET_SSE2 is not available. */ > > > > > + if ((element_mode (type) == BFmode || element_mode (type) == > > > > > HFmode) >
Re: [PATCH] RISC-V: Ensure all implied extensions are included[PR110696]
Committed to the trunk, thank you. -- Original -- From: "Kito Cheng"
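For readers following the thread, the shape of the fix is a transitive closure over the implication table; a minimal stand-alone sketch of the idea (hypothetical table and names, not the riscv-common.cc code):

#include <set>
#include <string>
#include <utility>
#include <vector>

/* Hypothetical implication table: extension -> implied extension.  */
static const std::vector<std::pair<std::string, std::string>> implied_tab = {
  {"v", "zvl128b"}, {"zvl128b", "zvl64b"}, {"zvl64b", "zvl32b"},
};

/* Recursively add everything EXT implies; the std::set makes the
   recursion terminate, mirroring the "skip if already present" check
   in riscv_subset_list::handle_implied_ext.  */
static void
add_implied (const std::string &ext, std::set<std::string> &subsets)
{
  for (const auto &p : implied_tab)
    if (p.first == ext && subsets.insert (p.second).second)
      add_implied (p.second, subsets);
}

Seeding it with "v" then yields zvl128b, zvl64b and zvl32b in one parse, which is what the new check_implied_ext verifies after the fact.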
Re: [PATCH] Export value/mask known bits from CCP.
On Mon, Jul 17, 2023 at 9:57 AM Aldy Hernandez via Gcc-patches wrote: > > Currently CCP throws away the known 1 bits because VRP and irange have > traditionally only had a way of tracking known 0s (set_nonzero_bits). > With the ability to keep all the known bits in the irange, we can now > save this between passes. > > OK? OK. > gcc/ChangeLog: > > * tree-ssa-ccp.cc (ccp_finalize): Export value/mask known bits. > --- > gcc/tree-ssa-ccp.cc | 8 +++- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc > index 0d0f02a8442..64d5fa81334 100644 > --- a/gcc/tree-ssa-ccp.cc > +++ b/gcc/tree-ssa-ccp.cc > @@ -1020,11 +1020,9 @@ ccp_finalize (bool nonzero_p) >else > { > unsigned int precision = TYPE_PRECISION (TREE_TYPE (val->value)); > - wide_int nonzero_bits > - = (wide_int::from (val->mask, precision, UNSIGNED) > - | wi::to_wide (val->value)); > - nonzero_bits &= get_nonzero_bits (name); > - set_nonzero_bits (name, nonzero_bits); > + wide_int value = wi::to_wide (val->value); > + wide_int mask = wide_int::from (val->mask, precision, UNSIGNED); > + set_bitmask (name, value, mask); > } > } > > -- > 2.40.1 >
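To illustrate what the pass now keeps (example values are mine; recall that in CCP a set mask bit means "unknown"):

unsigned y = (x & 0xf0u) | 0x01u;
/* CCP's lattice for y: value = 0x01, mask = 0xf0 -- bits 4-7 are
   unknown, every other bit is known to equal the value bit.  The
   removed code exported only value | mask = 0xf1 as the possibly
   nonzero bits and so lost the fact that bit 0 is known to be one;
   set_bitmask preserves the whole pair for later passes.  */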
Re: [PATCH] Include insn-opinit.h in PLUGIN_H [PR110610]
On 11/07/2023 23:28, Jeff Law wrote: On 7/11/23 04:37, Andre Vieira (lists) via Gcc-patches wrote: Hi, This patch fixes PR110610 by including OPTABS_H in the INTERNAL_FN_H list, as insn-opinit.h is now required by internal-fn.h. This will lead to insn-opinit.h, among the other OPTABS_H header files, being installed in the plugin directory. Bootstrapped aarch64-unknown-linux-gnu. @Jakub: could you check to see if it also addresses PR 110284? gcc/ChangeLog: PR 110610 * Makefile.in (INTERNAL_FN_H): Add OPTABS_H. Why use OPTABS_H here? Isn't the new dependency just on insn-opinit.h and insn-codes.h and neither of those #include other headers do they? Yeah, there was no particular reason other than I just felt the Makefile structure sort of lend itself that way. I checked genopinit.cc and it seems insn-opinit.h doesn't include any other header files, only the sources do, so I've changed the patch to only add insn-opinit.h to INTERNAL_FN_H. --- This patch fixes PR110610 by including insn-opinit.h in the INTERNAL_FN_H list, as insn-opinit.h is now required by internal-fn.h. This will lead to insn-opinit.h, among the other OPTABS_H header files, being installed in the plugin directory. Bootstrapped aarch64-unknown-linux-gnu. gcc/ChangeLog: PR 110610 * Makefile.in (INTERNAL_FN_H): Add insn-opinit.h.diff --git a/gcc/Makefile.in b/gcc/Makefile.in index c478ec852013eae65b9f3ec0a443e023c7d8b452..683774ad446d545362644d2dbdc37723eea55bc3 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -976,7 +976,7 @@ READ_MD_H = $(OBSTACK_H) $(HASHTAB_H) read-md.h BUILTINS_DEF = builtins.def sync-builtins.def omp-builtins.def \ gtm-builtins.def sanitizer.def INTERNAL_FN_DEF = internal-fn.def -INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF) +INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF) insn-opinit.h TREE_CORE_H = tree-core.h $(CORETYPES_H) all-tree.def tree.def \ c-family/c-common.def $(lang_tree_files) \ $(BUILTINS_DEF) $(INPUT_H) statistics.h \
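For reference, my reading of the dependency chain being encoded: internal-fn.h now requires insn-opinit.h, so the INTERNAL_FN_H make variable, which feeds the plugin header set, must name insn-opinit.h for it to be installed alongside the other plugin headers; listing only the header actually needed, rather than all of OPTABS_H, keeps the dependency minimal, which is what Jeff asked for.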
Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-conv.cc.
On Mon, Jul 17, 2023 at 9:35 AM Tamar Christina wrote: > > > On Mon, Jul 17, 2023 at 12:21 AM Tamar Christina via Gcc-patches > patc...@gcc.gnu.org> wrote: > > > > > > > -Original Message- > > > > From: Richard Biener > > > > Sent: Monday, July 17, 2023 7:19 AM > > > > To: Roger Sayle > > > > Cc: gcc-patches@gcc.gnu.org; Tamar Christina > > > > > > > > Subject: Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if- > > conv.cc. > > > > > > > > On Fri, Jul 14, 2023 at 8:56 PM Roger Sayle > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > This patch fixes the bootstrap failure I'm seeing using gcc 4.8.5 > > > > > as > > > > > > > > > > the host compiler. Ok for mainline? [I might be missing > > > > > something] > > > > > > > > OK. Btw, while I didn't spot this during review I would appreciate > > > > if the code could use vec.[q]sort, this should work with a lambda as > > > > well I think. > > > > > > That was my first use, but that hits > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99469 > > > > That is not hitting PR 99469 but rather it means your comparison is not > > correct for an (unstable) sort. > > That is qsort comparator should have this relationship `f(a,b) == !f(b, a)` > > and > > `f(a,a)` should also return false. > > I'm using the standard std::pair comparator which indicates that f(a,a) is > true, > https://en.cppreference.com/w/cpp/utility/pair/operator_cmp > > > If you are running into this for qsort here, you will most likely run into > > issues > > with std::sort later on too. > > Don't see why or how. It needs to have a consistent relationship which > std::pair > maintains. So why would using the standard tuple comparator with a standard > std::sort cause problem? At least for return left.second < right.second; f(a,a) doesn't hold. Note qsort can end up comparing an element to itself (not sure if GCCs implementation now can). Richard. > Thanks, > Tamar > > > > > Thanks, > > Andrew > > > > > > > > Regards, > > > Tamar
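To make the requirement concrete, the two comparator shapes under discussion look like this (an illustrative sketch of my own, not the tree-if-conv.cc code):

#include <utility>

/* Tri-state comparator of the kind qsort wants: cmp (a, a) == 0, so
   comparing an element against itself (which qsort may do) is well
   defined.  */
static int
cmp_by_second (const void *pa, const void *pb)
{
  const auto &a = *(const std::pair<int, unsigned> *) pa;
  const auto &b = *(const std::pair<int, unsigned> *) pb;
  if (a.second != b.second)
    return a.second < b.second ? -1 : 1;
  return 0;
}

/* The std::sort equivalent must be a strict weak ordering: strictly
   '<', never '<=', so that less (a, a) is false.  */
static bool
less_by_second (const std::pair<int, unsigned> &a,
		const std::pair<int, unsigned> &b)
{
  return a.second < b.second;
}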
[RFC] [v2] Extend fold_vec_perm to handle VLA vectors
Hi Richard, This is reworking of patch to extend fold_vec_perm to handle VLA vectors. The attached patch unifies handling of VLS and VLA vector_csts, while using fallback code for ctors. For VLS vector, the patch ignores underlying encoding, and uses npatterns = nelts, and nelts_per_pattern = 1. For VLA patterns, if sel has a stepped sequence, then it only chooses elements from a particular pattern of a particular input vector. To make things simpler, the patch imposes following constraints: (a) op0_npatterns, op1_npatterns and sel_npatterns are powers of 2. (b) The step size for a stepped sequence is a power of 2, and multiple of npatterns of chosen input vector. (c) Runtime vector length of sel is a multiple of sel_npatterns. So, we don't handle sel.length = 2 + 2x and npatterns = 4. Eg: op0, op1: npatterns = 2, nelts_per_pattern = 3 op0_len = op1_len = 16 + 16x. sel = { 0, 0, 2, 0, 4, 0, ... } npatterns = 2, nelts_per_pattern = 3. For pattern {0, 2, 4, ...} Let, a1 = 2 S = step size = 2 Let Esel denote number of elements per pattern in sel at runtime. Esel = (16 + 16x) / npatterns_sel = (16 + 16x) / 2 = (8 + 8x) So, last element of pattern: ae = a1 + (Esel - 2) * S = 2 + (8 + 8x - 2) * 2 = 14 + 16x a1 /trunc arg0_len = 2 / (16 + 16x) = 0 ae /trunc arg0_len = (14 + 16x) / (16 + 16x) = 0 Since both are equal with quotient = 0, we select elements from op0. Since step size (S) is a multiple of npatterns(op0), we select all elements from same pattern of op0. res_npatterns = max (op0_npatterns, max (op1_npatterns, sel_npatterns)) = max (2, max (2, 2) = 2 res_nelts_per_pattern = max (op0_nelts_per_pattern, max (op1_nelts_per_pattern, sel_nelts_per_pattern)) = max (3, max (3, 3)) = 3 So res has encoding with npatterns = 2, nelts_per_pattern = 3. res: { op0[0], op0[0], op0[2], op0[0], op0[4], op0[0], ... } Unfortunately, this results in an issue for poly_int_cst index: For example, op0, op1: npatterns = 1, nelts_per_pattern = 3 op0_len = op1_len = 4 + 4x sel: { 4 + 4x, 5 + 4x, 6 + 4x, ... } // should choose op1 In this case, a1 = 5 + 4x S = (6 + 4x) - (5 + 4x) = 1 Esel = 4 + 4x ae = a1 + (esel - 2) * S = (5 + 4x) + (4 + 4x - 2) * 1 = 7 + 8x IIUC, 7 + 8x will always be index for last element of op1 ? if x = 0, len = 4, 7 + 8x = 7 if x = 1, len = 8, 7 + 8x = 15, etc. So the stepped sequence will always choose elements from op1 regardless of vector length for above case ? However, ae /trunc op0_len = (7 + 8x) / (4 + 4x) which is not defined because 7/4 != 8/4 and we return NULL_TREE, but I suppose the expected result would be: res: { op1[0], op1[1], op1[2], ... } ? The patch passes bootstrap+test on aarch64-linux-gnu with and without sve, and on x86_64-unknown-linux-gnu. I would be grateful for suggestions on how to proceed. Thanks, Prathamesh diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index a02ede79fed..8028b3e8e9a 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -85,6 +85,10 @@ along with GCC; see the file COPYING3. If not see #include "vec-perm-indices.h" #include "asan.h" #include "gimple-range.h" +#include +#include "tree-pretty-print.h" +#include "gimple-pretty-print.h" +#include "print-tree.h" /* Nonzero if we are folding constants inside an initializer or a C++ manifestly-constant-evaluated context; zero otherwise. 
@@ -10493,15 +10497,9 @@ fold_mult_zconjz (location_t loc, tree type, tree expr) static bool vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts) { - unsigned HOST_WIDE_INT i, nunits; + unsigned HOST_WIDE_INT i; - if (TREE_CODE (arg) == VECTOR_CST - && VECTOR_CST_NELTS (arg).is_constant (&nunits)) -{ - for (i = 0; i < nunits; ++i) - elts[i] = VECTOR_CST_ELT (arg, i); -} - else if (TREE_CODE (arg) == CONSTRUCTOR) + if (TREE_CODE (arg) == CONSTRUCTOR) { constructor_elt *elt; @@ -10519,6 +10517,230 @@ vec_cst_ctor_to_array (tree arg, unsigned int nelts, tree *elts) return true; } +/* Return a vector with (NPATTERNS, NELTS_PER_PATTERN) encoding. */ + +static tree +vector_cst_reshape (tree vec, unsigned npatterns, unsigned nelts_per_pattern) +{ + gcc_assert (pow2p_hwi (npatterns)); + + if (VECTOR_CST_NPATTERNS (vec) == npatterns + && VECTOR_CST_NELTS_PER_PATTERN (vec) == nelts_per_pattern) +return vec; + + tree v = make_vector (exact_log2 (npatterns), nelts_per_pattern); + TREE_TYPE (v) = TREE_TYPE (vec); + + unsigned nelts = npatterns * nelts_per_pattern; + for (unsigned i = 0; i < nelts; i++) +VECTOR_CST_ENCODED_ELT(v, i) = vector_cst_elt (vec, i); + return v; +} + +/* Helper routine for fold_vec_perm_vla to check if ARG is a suitable + operand for VLA vec_perm folding. If arg is VLS, then set +
Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
On Fri, Jul 14, 2023 at 12:18 PM Tejas Belagod wrote: > > On 7/13/23 4:05 PM, Richard Biener wrote: > > On Thu, Jul 13, 2023 at 12:15 PM Tejas Belagod > > wrote: > >> > >> On 7/3/23 1:31 PM, Richard Biener wrote: > >>> On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod > >>> wrote: > > On 6/29/23 6:55 PM, Richard Biener wrote: > > On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod > > wrote: > >> > >> > >> > >> > >> > >> From: Richard Biener > >> Date: Tuesday, June 27, 2023 at 12:58 PM > >> To: Tejas Belagod > >> Cc: gcc-patches@gcc.gnu.org > >> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors > >> > >> On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod > >> wrote: > >>> > >>> > >>> > >>> > >>> > >>> From: Richard Biener > >>> Date: Monday, June 26, 2023 at 2:23 PM > >>> To: Tejas Belagod > >>> Cc: gcc-patches@gcc.gnu.org > >>> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors > >>> > >>> On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches > >>> wrote: > > Hi, > > Packed Boolean Vectors > -- > > I'd like to propose a feature addition to GNU Vector extensions to > add packed > boolean vectors (PBV). This has been discussed in the past here[1] > and a variant has > been implemented in Clang recently[2]. > > With predication features being added to vector architectures (SVE, > MVE, AVX), > it is a useful feature to have to model predication on targets. > This could > find its use in intrinsics or just used as is as a GNU vector > extension being > mapped to underlying target features. For example, the packed > boolean vector > could directly map to a predicate register on SVE. > > Also, this new packed boolean type GNU extension can be used with > SVE ACLE > intrinsics to replace a fixed-length svbool_t. > > Here are a few options to represent the packed boolean vector type. > >>> > >>> The GIMPLE frontend uses a new 'vector_mask' attribute: > >>> > >>> typedef int v8si __attribute__((vector_size(8*sizeof(int; > >>> typedef v8si v8sib __attribute__((vector_mask)); > >>> > >>> it get's you a vector type that's the appropriate (dependent on the > >>> target) vector > >>> mask type for the vector data type (v8si in this case). > >>> > >>> > >>> > >>> Thanks Richard. > >>> > >>> Having had a quick look at the implementation, it does seem to tick > >>> the boxes. > >>> > >>> I must admit I haven't dug deep, but if the target hook allows the > >>> mask to be > >>> > >>> defined in way that is target-friendly (and I don't know how much > >>> effort it will > >>> > >>> be to migrate the attribute to more front-ends), it should do the job > >>> nicely. > >>> > >>> Let me go back and dig a bit deeper and get back with questions if > >>> any. > >> > >> > >> Let me add that the advantage of this is the compiler doesn't need > >> to support weird explicitely laid out packed boolean vectors that do > >> not match what the target supports and the user doesn't need to know > >> what the target supports (and thus have an #ifdef maze around > >> explicitely > >> specified layouts). > >> > >> Sorry for the delayed response – I spent a day experimenting with > >> vector_mask. > >> > >> > >> > >> Yeah, this is what option 4 in the RFC is trying to achieve – be > >> portable enough > >> > >> to avoid having to sprinkle the code with ifdefs. 
> >> > >> > >> It does remove some flexibility though, for example with -mavx512f > >> -mavx512vl > >> you'll get AVX512 style masks for V4SImode data vectors but of course > >> the > >> target sill supports SSE2/AVX2 style masks as well, but those would > >> not be > >> available as "packed boolean vectors", though they are of course in > >> fact > >> equal to V4SImode data vectors with -1 or 0 values, so in this > >> particular > >> case it might not matter. > >> > >> That said, the vector_mask attribute will get you V4SImode vectors with > >> signed boolean elements of 32 bits for V4SImode data vectors with > >> SSE2/AVX2. > >> > >> > >> > >> This sounds very much like what the scenario would be with NEON vs > >> SVE. Coming to think > >> > >> of it, vector_mask resembles option 4 in the proposal with ‘n’ implied > >> by the ‘base’ vector type > >> > >> and a ‘w’ specified for the type. > >> > >> > >> > >> Given its current implementation,
[committed] Restore bootstrap by removing unused variable in tree-ssa-loop-ivcanon.cc
Hi, This restores bootstrap by removing the variable causing: /home/mjambor/gcc/trunk/src/gcc/tree-ssa-loop-ivcanon.cc: In function ‘bool try_peel_loop(loop*, edge, tree, bool, long int)’: /home/mjambor/gcc/trunk/src/gcc/tree-ssa-loop-ivcanon.cc:1170:17: error: variable ‘entry_count’ set but not used [-Werror=unused-but-set-variable] 1170 | profile_count entry_count = profile_count::zero (); | ^~~ cc1plus: all warnings being treated as errors ACKed by Honza in a chat, passed a bootstrap on x86_64-linux, committed. Thanks, Martin gcc/ChangeLog: 2023-07-17 Martin Jambor * tree-ssa-loop-ivcanon.cc (try_peel_loop): Remove unused variable entry_count. --- gcc/tree-ssa-loop-ivcanon.cc | 1 - 1 file changed, 1 deletion(-) diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc index bdb738af7a8..a895e8e65be 100644 --- a/gcc/tree-ssa-loop-ivcanon.cc +++ b/gcc/tree-ssa-loop-ivcanon.cc @@ -1167,7 +1167,6 @@ try_peel_loop (class loop *loop, loop->num, (int) npeel); } adjust_loop_info_after_peeling (loop, npeel, true); - profile_count entry_count = profile_count::zero (); bitmap_set_bit (peeled_loops, loop->num); return true; -- 2.41.0
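As an aside, the warning class reduces to this pattern (a minimal example of my own, not the original code):

/* gcc -Wunused-but-set-variable: 'entry_count' is assigned but its
   value is never read.  */
void
f (void)
{
  int entry_count;
  entry_count = 1;
}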
Re: Fix optimize_mask_stores profile update
> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
> wrote:
> >
> > Hi,
> > While looking into sphinx3 regression I noticed that vectorizer produces
> > BBs with overall probability count 120%. This patch fixes it.
> > Richi, I don't know how to create a testcase, but having one would
> > be nice.
> >
> > Bootstrapped/regtested x86_64-linux, committed last night (sorry for
> > late email)
>
> This should trigger with sth like
>
> for (i)
>   if (cond[i])
>     out[i] = 1.;
>
> so a masked store and then using AVX2+.  ISTR we disable AVX masked
> stores on zen (but not AVX512).

OK, let me see if I can get a testcase out of that.

> >    efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
> >    /* Put STORE_BB to likely part.  */
> >    efalse->probability = profile_probability::unlikely ();
> > +  e->probability = efalse->probability.invert ();
> >    store_bb->count = efalse->count ();
>
> isn't the count also wrong?  Or rather efalse should be likely().  We're
> testing doing
>
> if (!mask all zeros)
>   masked-store
>
> because a masked store with all zero mask can end up invoking COW page fault
> handling multiple times (because it doesn't actually write).

Hmm, I only fixed the profile; efalse was already set to unlikely, but
indeed I think it should be likely.  Maybe we can compute some bound on
the actual probability by knowing the if(cond[i]) probability.
If the loop always does factor many ones or zeros, the probability would
remain the same.
If that is p and they are all independent, the outcome would be
(1-p)^factor,
so we know the conditional should be in the range (1-p)^factor ... (1-p),
right?

Honza

> Note -Ofast allows store data races and thus does RMW instead of a masked
> store.
>
> >   make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);
> >   if (dom_info_available_p (CDI_DOMINATORS))
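To put numbers on that bound (an illustrative calculation, not from the thread): with p = 0.5 and factor = 4 lanes, fully correlated lanes give an all-zero-mask probability of 1 - p = 0.5, while independent lanes give (1 - p)^4 = 0.0625, so the skip-the-store edge would get a probability somewhere in [0.0625, 0.5] rather than a blanket unlikely ().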
[PATCH] RTL_SSA: Relax PHI_MODE in phi_setup
Hi, Richard.

The RISC-V port needs to add a bunch of VLS modes (V16QI, V32QI, V64QI, ...etc.)
which share the same REG_CLASS with the VLA modes (VNx16QI, VNx32QI, ...etc.).
When I add those VLS modes, the RTL_SSA initialization in the VSETVL PASS
(inserted after RA) ICEs:

rvv.c:13:1: internal compiler error: in partial_subreg_p, at rtl.h:3186
13 | }
| ^
0xf7a5b1 partial_subreg_p(machine_mode, machine_mode)
../../../riscv-gcc/gcc/rtl.h:3186
0x1407616 wider_subreg_mode(machine_mode, machine_mode)
../../../riscv-gcc/gcc/rtl.h:3252
0x2a2c6ff rtl_ssa::combine_modes(machine_mode, machine_mode)
../../../riscv-gcc/gcc/rtl-ssa/internals.inl:677
0x2a2b9a4 rtl_ssa::function_info::simplify_phi_setup(rtl_ssa::phi_info*, rtl_ssa::set_info**, bitmap_head*)
../../../riscv-gcc/gcc/rtl-ssa/functions.cc:146
0x2a2c142 rtl_ssa::function_info::simplify_phis()
../../../riscv-gcc/gcc/rtl-ssa/functions.cc:258
0x2a2b3f0 rtl_ssa::function_info::function_info(function*)
../../../riscv-gcc/gcc/rtl-ssa/functions.cc:51
0x1cebab9 pass_vsetvl::init()
../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4578
0x1cec150 pass_vsetvl::execute(function*)
../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716

The reason is that we have V32QImode (size = [32, 0]), which is the mode set
for regno_reg_rtx[97]. When the PHI input def comes from the ENTRY block
(index = 0), def->mode () is V32QImode, but the phi_mode is, for example,
VNx2QI (I used VLA-mode intrinsics to write the code). combine_modes then
reports the ICE. In this situation, I relax it and let it use phi_mode
directly. Is that correct? Thanks.

gcc/ChangeLog:

	* rtl-ssa/functions.cc (function_info::simplify_phi_setup): Relax
	combine in PHI setup.

---
 gcc/rtl-ssa/functions.cc | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/rtl-ssa/functions.cc b/gcc/rtl-ssa/functions.cc
index c35d25dbf8f..0793598ab1d 100644
--- a/gcc/rtl-ssa/functions.cc
+++ b/gcc/rtl-ssa/functions.cc
@@ -143,7 +143,19 @@ function_info::simplify_phi_setup (phi_info *phi, set_info **assumed_values,
       // If the input has a known mode (i.e. not BLKmode), make sure
       // that the phi's mode is at least as large.
       if (def)
-	phi_mode = combine_modes (phi_mode, def->mode ());
+	{
+	  /* For target like RISC-V, it applies both variable-length
+	     and fixed-length to the same REG_CLASS.
+
+	     It will cause ICE for these 2 following cases:
+	       1. phi_mode: variable-length.
+		  def->mode (): fixed-length.
+	       2. phi_mode: fixed-length.
+		  def->mode (): variable-length.  */
+	  if (!(GET_MODE_SIZE (phi_mode).is_constant ()
+		^ GET_MODE_SIZE (def->mode ()).is_constant ()))
+	    phi_mode = combine_modes (phi_mode, def->mode ());
+	}
     }
   if (phi->mode () != phi_mode)
     phi->set_mode (phi_mode);
-- 
2.36.1
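For anyone puzzled by the assertion itself, my reading of the failure: GET_MODE_SIZE of a VLS mode such as V32QImode is the constant poly_int [32, 0], while a VLA mode's size has the shape [a, b], i.e. a + b*x for a runtime factor x >= 0. wider_subreg_mode/partial_subreg_p need the two sizes to be ordered for every x, and 32 vs. a + b*x can flip order depending on x, so the comparison is not decidable at compile time and the gcc_assert fires; skipping combine_modes when exactly one size is_constant () sidesteps that.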
RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
On Mon, 17 Jul 2023, Tamar Christina wrote: > > > > -Original Message- > > From: Richard Biener > > Sent: Friday, July 14, 2023 2:35 PM > > To: Tamar Christina > > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com > > Subject: RE: [PATCH 12/19]middle-end: implement loop peeling and IV > > updates for early break. > > > > On Thu, 13 Jul 2023, Tamar Christina wrote: > > > > > > -Original Message- > > > > From: Richard Biener > > > > Sent: Thursday, July 13, 2023 6:31 PM > > > > To: Tamar Christina > > > > Cc: gcc-patches@gcc.gnu.org; nd ; > > j...@ventanamicro.com > > > > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV > > > > updates for early break. > > > > > > > > On Wed, 28 Jun 2023, Tamar Christina wrote: > > > > > > > > > Hi All, > > > > > > > > > > This patch updates the peeling code to maintain LCSSA during peeling. > > > > > The rewrite also naturally takes into account multiple exits and so > > > > > it didn't > > > > > make sense to split them off. > > > > > > > > > > For the purposes of peeling the only change for multiple exits is > > > > > that the > > > > > secondary exits are all wired to the start of the new loop preheader > > > > > when > > > > doing > > > > > epilogue peeling. > > > > > > > > > > When doing prologue peeling the CFG is kept in tact. > > > > > > > > > > For both epilogue and prologue peeling we wire through between the > > two > > > > loops any > > > > > PHI nodes that escape the first loop into the second loop if > > > > > flow_loops is > > > > > specified. The reason for this conditionality is because > > > > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 > > > > > ways: > > > > > - prologue peeling > > > > > - epilogue peeling > > > > > - loop distribution > > > > > > > > > > for the last case the loops should remain independent, and so not be > > > > connected. > > > > > Because of this propagation of only used phi nodes get_current_def can > > be > > > > used > > > > > to easily find the previous definitions. However live statements > > > > > that are > > > > > not used inside the loop itself are not propagated (since if unused, > > > > > the > > > > moment > > > > > we add the guard in between the two loops the value across the bypass > > edge > > > > can > > > > > be wrong if the loop has been peeled.) > > > > > > > > > > This is dealt with easily enough in find_guard_arg. > > > > > > > > > > For multiple exits, while we are in LCSSA form, and have a correct DOM > > tree, > > > > the > > > > > moment we add the guard block we will change the dominators again. To > > > > deal with > > > > > this slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the > > blocks > > > > to > > > > > update without having to recompute the list of blocks to update again. > > > > > > > > > > When multiple exits and doing epilogue peeling we will also > > > > > temporarily > > have > > > > an > > > > > incorrect VUSES chain for the secondary exits as it anticipates the > > > > > final > > result > > > > > after the VDEFs have been moved. This will thus be corrected once the > > code > > > > > motion is applied. > > > > > > > > > > Lastly by doing things this way we can remove the helper functions > > > > > that > > > > > previously did lock step iterations to update things as it went along. > > > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > > > > > > > Ok for master? 
> > > > > > > > Not sure if I get through all of this in one go - so be prepared that > > > > the rest of the review follows another day. > > > > > > No worries, I appreciate the reviews! > > > Just giving some quick replies for when you continue. > > > > Continueing. > > > > > > > > > > > Thanks, > > > > > Tamar > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > * tree-loop-distribution.cc (copy_loop_before): Pass flow_loops > > > > > = > > > > false. > > > > > * tree-ssa-loop-niter.cc (loop_only_exit_p): Fix bug when > > > > > exit==null. > > > > > * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add > > > > additional > > > > > assert. > > > > > (vect_set_loop_condition_normal): Skip modifying loop IV for > > > > > multiple > > > > > exits. > > > > > (slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit > > > > peeling. > > > > > (slpeel_can_duplicate_loop_p): Likewise. > > > > > (vect_update_ivs_after_vectorizer): Don't enter this... > > > > > (vect_update_ivs_after_early_break): ...but instead enter here. > > > > > (find_guard_arg): Update for new peeling code. > > > > > (slpeel_update_phi_nodes_for_loops): Remove. > > > > > (slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 > > > > checks. > > > > > (slpeel_update_phi_nodes_for_lcssa): Remove. > > > > > (vect_do_peeling): Fix VF for multiple exits and force epilogue. > > > > > * tree-vect-loop.cc (_
[PATCH] Read global value/mask in IPA.
Instead of reading the known zero bits in IPA, read the value/mask pair which is available. There is a slight change of behavior here. I have removed the check for SSA_NAME, as the ranger can calculate the range and value/mask for INTEGER_CST. This simplifies the code a bit, since there's no special casing when setting the jfunc bits. The default range for VR is undefined, so I think it's safe just to check for undefined_p(). OK? gcc/ChangeLog: * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Read global value/mask. --- gcc/ipa-prop.cc | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc index 5d790ff1265..4f6ed7b89bd 100644 --- a/gcc/ipa-prop.cc +++ b/gcc/ipa-prop.cc @@ -2402,8 +2402,7 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi, } else { - if (TREE_CODE (arg) == SSA_NAME - && param_type + if (param_type && Value_Range::supports_type_p (TREE_TYPE (arg)) && Value_Range::supports_type_p (param_type) && irange::supports_p (TREE_TYPE (arg)) @@ -2422,15 +2421,14 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi, gcc_assert (!jfunc->m_vr); } - if (INTEGRAL_TYPE_P (TREE_TYPE (arg)) - && (TREE_CODE (arg) == SSA_NAME || TREE_CODE (arg) == INTEGER_CST)) + if (INTEGRAL_TYPE_P (TREE_TYPE (arg)) && !vr.undefined_p ()) { - if (TREE_CODE (arg) == SSA_NAME) - ipa_set_jfunc_bits (jfunc, 0, - widest_int::from (get_nonzero_bits (arg), - TYPE_SIGN (TREE_TYPE (arg; - else - ipa_set_jfunc_bits (jfunc, wi::to_widest (arg), 0); + irange &r = as_a (vr); + irange_bitmask bm = r.get_bitmask (); + signop sign = TYPE_SIGN (TREE_TYPE (arg)); + ipa_set_jfunc_bits (jfunc, + widest_int::from (bm.value (), sign), + widest_int::from (bm.mask (), sign)); } else if (POINTER_TYPE_P (TREE_TYPE (arg))) { -- 2.40.1
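A one-line sanity check of the simplification (mine): for an INTEGER_CST argument such as 5, ranger yields the singleton range [5, 5], whose bitmask is value = 5, mask = 0 (all bits known), which reproduces exactly what the removed special case computed via ipa_set_jfunc_bits (jfunc, wi::to_widest (arg), 0).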
Re: Fix optimize_mask_stores profile update
> On 17.07.2023 at 14:38, Jan Hubicka wrote:
>
>>> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches wrote:
>>>
>>> Hi,
>>> While looking into the sphinx3 regression I noticed that the vectorizer
>>> produces BBs with an overall probability count of 120%.  This patch
>>> fixes it.  Richi, I don't know how to create a testcase, but having one
>>> would be nice.
>>>
>>> Bootstrapped/regtested x86_64-linux, committed last night (sorry for
>>> the late email)
>>
>> This should trigger with sth like
>>
>>   for (i)
>>     if (cond[i])
>>       out[i] = 1.;
>>
>> so a masked store and then using AVX2+.  ISTR we disable AVX masked
>> stores on zen (but not AVX512).
>
> OK, let me see if I can get a testcase out of that.
>
>>>   efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
>>>   /* Put STORE_BB to likely part.  */
>>>   efalse->probability = profile_probability::unlikely ();
>>> + e->probability = efalse->probability.invert ();
>>>   store_bb->count = efalse->count ();
>>
>> isn't the count also wrong?  Or rather efalse should be likely().  We're
>> testing doing
>>
>> if (!mask all zeros)
>>   masked-store
>>
>> because a masked store with an all-zero mask can end up invoking COW
>> page fault handling multiple times (because it doesn't actually write).
>
> Hmm, I only fixed the profile; efalse was already set to unlikely, but
> indeed I think it should be likely.  Maybe we can compute some bound on
> the actual probability by knowing the if (cond[i]) probability.
> If the loop always does factor many ones or zeros, the probability would
> remain the same.
> If that is p and they are all independent, the outcome would be
> (1-p)^factor,
> so we know the conditional should be in range (1-p)^factor ... (1-p),
> right?

Yes.  I think the heuristic was added for the case of bigger ranges with
all 0/1; for a purely random one I wouldn't expect all zeros ever in
practice.  Maybe the probability was also set with that special case in
mind (which is of course broken).

Richard

> Honza
>
>> Note -Ofast allows store data races and thus does RMW instead of a
>> masked store.
>>
>>>   make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);
>>>   if (dom_info_available_p (CDI_DOMINATORS))
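A quick worked example of the bound discussed above, as a sketch under the stated independence assumption (the p and factor values are illustrative, not from the patch): if each lane's condition is true with probability p, the mask is all-zero with probability (1-p)^factor, so the store block is entered with probability 1 - (1-p)^factor.

#include <cmath>
#include <cstdio>

// Probability that a 'factor'-lane mask is all zero when each lane is
// independently set with probability p (assumed values).
int main ()
{
  double p = 0.5;
  for (int factor : { 4, 8, 16 })
    printf ("factor %2d: P(all-zero mask) = %.6f\n",
	    factor, std::pow (1.0 - p, factor));
  return 0;
}

For p = 0.5 this prints roughly 0.0625, 0.0039 and 0.000015; i.e. for purely random conditions the all-zero case quickly becomes negligible, matching Richard's observation that the heuristic only makes sense for long runs of all-0/all-1 masks.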
[committed] OpenMP/Fortran: Parsing support for 'uses_allocators'
Committed the attached patch as r14-2582-g89d0f082b3c95f.

This is about OpenMP's uses_allocators clause to the 'target' directive. Using the clause with predefined allocators as list arguments is required if those allocators are used in a target region - unless there is an 'omp requires dynamic_allocators' in the compilation unit. While the latter is a no-op (the requirement is fulfilled by all devices), we still had to handle that no-op case when using 'uses_allocators', which this commit does.

However, uses_allocators also permits defining new allocators; for those, this commit stops after parsing and resolving with a 'sorry, unimplemented'. Support for the latter will be added together with the C/C++ support by a re-diffed/updated version of Chung-Lin's patch at https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596587.html (See the thread for pending review issues; the C++ member-variable issue is https://gcc.gnu.org/PR110347 )

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

commit 89d0f082b3c95f68d116d4480126e3ab7fb7f36b
Author: Tobias Burnus
Date:   Mon Jul 17 15:13:44 2023 +0200

    OpenMP/Fortran: Parsing support for 'uses_allocators'

    The 'uses_allocators' clause to the 'target' construct accepts predefined
    allocators and can also be used to define a new allocator for a target
    region.  As predefined allocators in GCC do not require special handling,
    those can be and are ignored after parsing, such that this feature now
    works.  On the other hand, defining a new allocator will fail for now
    with a 'sorry, unimplemented'.  Note that both the OpenMP 5.0/5.1 and
    5.2 syntax for uses_allocators is supported by this commit.

    2023-07-17  Tobias Burnus
                Chung-Lin Tang

    gcc/fortran/ChangeLog:

    	* dump-parse-tree.cc (show_omp_namelist, show_omp_clauses): Dump
    	uses_allocators clause.
    	* gfortran.h (gfc_free_omp_namelist): Add memspace_sym to u union
    	and traits_sym to u2 union.
    	(OMP_LIST_USES_ALLOCATORS): New enum value.
    	(gfc_free_omp_namelist): Add 'bool free_mem_traits_space' arg.
    	* match.cc (gfc_free_omp_namelist): Likewise.
    	* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_variable_list,
    	gfc_match_omp_to_link, gfc_match_omp_doacross_sink,
    	gfc_match_omp_clause_reduction, gfc_match_omp_allocate,
    	gfc_match_omp_flush): Update call.
    	(gfc_match_omp_clauses): Likewise.  Parse uses_allocators clause.
    	(gfc_match_omp_clause_uses_allocators): New.
    	(enum omp_mask2): Add new OMP_CLAUSE_USES_ALLOCATORS.
    	(OMP_TARGET_CLAUSES): Accept it.
    	(resolve_omp_clauses): Resolve uses_allocators clause.
    	* st.cc (gfc_free_statement): Update gfc_free_omp_namelist call.
    	* trans-openmp.cc (gfc_trans_omp_clauses): Handle
    	OMP_LIST_USES_ALLOCATORS; fail with sorry unless predefined
    	allocator.
    	(gfc_split_omp_clauses): Handle uses_allocators.

    libgomp/ChangeLog:

    	* testsuite/libgomp.fortran/uses_allocators_1.f90: New test.
    	* testsuite/libgomp.fortran/uses_allocators_2.f90: New test.
Co-authored-by: Chung-Lin Tang
---
 gcc/fortran/dump-parse-tree.cc                 |  24 +++
 gcc/fortran/gfortran.h                         |   5 +-
 gcc/fortran/match.cc                           |   7 +-
 gcc/fortran/openmp.cc                          | 194 +++++++++++++++++++--
 gcc/fortran/st.cc                              |   2 +-
 gcc/fortran/trans-openmp.cc                    |  11 ++
 .../libgomp.fortran/uses_allocators_1.f90      | 168 ++++++++++++++++++
 .../libgomp.fortran/uses_allocators_2.f90      |  99 +++++++++++
 8 files changed, 491 insertions(+), 19 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index effcebe9325..68122e3e6fd 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -1497,6 +1497,29 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
 	  case OMP_LINEAR_UVAL: fputs ("uval(", dumpfile); break;
 	  default: break;
 	  }
+      else if (list_type == OMP_LIST_USES_ALLOCATORS)
+	{
+	  if (n->u.memspace_sym)
+	    {
+	      fputs ("memspace(", dumpfile);
+	      fputs (n->sym->name, dumpfile);
+	      fputc (')', dumpfile);
+	    }
+	  if (n->u.memspace_sym && n->u2.traits_sym)
+	    fputc (',', dumpfile);
+	  if (n->u2.traits_sym)
+	    {
+	      fputs ("traits(", dumpfile);
+	      fputs (n->u2.traits_sym->name, dumpfile);
+	      fputc (')', dumpfile);
+	    }
+	  if (n->u.memspace_sym || n->u2.traits_sym)
+	    fputc (':', dumpfile);
+	  fputs (n->sym->name, dumpfile);
+	  i
Re: [WIP RFC] Add support for keyword-based attributes
On Sun, Jul 16, 2023 at 6:50 AM Richard Sandiford wrote: > Jakub Jelinek writes: > > On Fri, Jul 14, 2023 at 04:56:18PM +0100, Richard Sandiford via > Gcc-patches wrote: > >> Summary: We'd like to be able to specify some attributes using > >> keywords, rather than the traditional __attribute__ or [[...]] > >> syntax. Would that be OK? > > > > Will defer to C/C++ maintainers, but as you mentioned, there are many > > attributes which really can't be ignored and change behavior > significantly. > > vector_size is one of those, mode attribute another, > > no_unique_address another one (changes ABI in various cases), > > the OpenMP attributes (omp::directive, omp::sequence) can change > > behavior if -fopenmp, etc. > > One can easily error with > > #ifdef __has_cpp_attribute > > #if !__has_cpp_attribute (arm::whatever) > > #error arm::whatever attribute unsupported > > #endif > > #else > > #error __has_cpp_attribute unsupported > > #endif > > Yeah. It's easy to detect whether a particular ACLE feature is supported, > since there are predefined macros for each one. But IMO it's a failing > if we have to recommend that any compilation that uses arm::foo should > also have: > > #ifndef __ARM_FEATURE_FOO > #error arm::foo not supported > #endif > > It ought to be the compiler's job to diagnose its limitations, rather > than the user's. > > The feature macros are more for conditional usage of features, where > there's a fallback available. > > I suppose we could say that users have to include a particular header > file before using an attribute, and use a pragma in that header file to > tell the compiler to enable the attribute. But then there would need to > be a separate header file for each distinct set of attributes (in terms > of historical timeline), which would get ugly. I'm not sure that it's > better than using keywords, or even whether it's better than predefining > the "keyword" as a macro that expands to a compiler-internal attribute. > With a combination of those approaches it can be a single header: #ifdef __ARM_FEATURE_FOO #define __arm_foo [[arm::foo]] // else use of __arm_foo will fail #endif
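To make the trade-off concrete, a hedged sketch of the "conditional usage with a fallback" pattern that the feature macros are meant for; arm::foo, __ARM_FEATURE_FOO and __arm_foo are the thread's placeholders, not a real ACLE interface.

// Hypothetical sketch: conditional use with a genuine fallback path.
#ifdef __ARM_FEATURE_FOO
#define __arm_foo [[arm::foo]]
__arm_foo void fast_path (int *p); // accelerated variant, defined elsewhere
void g (int *p) { fast_path (p); }
#else
void g (int *p) { (void) p; /* generic fallback, no arm::foo needed */ }
#endif

The failure mode Richard describes is the non-conditional case: code that uses arm::foo unconditionally and is silently miscompiled by a compiler that ignores the unknown attribute.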
Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid
Hi,

Richard Biener writes:

> On Fri, 14 Jul 2023, Andrew MacLeod wrote:
>
>> On 7/14/23 09:37, Richard Biener wrote:
>>> On Fri, 14 Jul 2023, Aldy Hernandez wrote:
>>>
>>>> I don't know what you're trying to accomplish here, as I haven't been
>>>> following the PR, but adding all these helper functions to the ranger
>>>> header file seems wrong, especially since there's only one use of
>>>> them.  I see you're tweaking the irange API, adding helper functions
>>>> to range-op (which is only for code dealing with implementing range
>>>> operators for tree codes), etc etc.
>>>>
>>>> If you need these helper functions, I suggest you put them closer to
>>>> their uses (i.e. wherever the match.pd support machinery goes).
>>> Note I suggested the opposite because I thought these kinds of helpers
>>> are closer to value-range support than to match.pd.
>>
>> Probably vr-values.{cc,h} and the simplify_using_ranges paradigm would
>> be the most sensible place to put these kinds of auxiliary routines?
>>
>>> But I take away from your answer that there's nothing close in the
>>> value-range machinery that answers the question whether A op B may
>>> overflow?
>>
>> We don't track it in ranges themselves.  During calculation of a range
>> we obviously know, but propagating that generally when we rarely care
>> doesn't seem worthwhile.  The very first generation of irange 6 years
>> ago had an overflow_p() flag, but it was removed as not being worth
>> keeping.  It's easier to simply ask the question when it matters.
>>
>> As the routines show, it's pretty easy to figure out when the need
>> arises, so I think that should suffice, at least for now.
>>
>> Should we decide we would like it in general, it wouldn't be hard to
>> add to irange.  wi_fold() currently returns null; it could easily
>> return a bool indicating if an overflow happened, and wi_fold_in_parts
>> and fold_range would simply OR together the results of the component
>> wi_fold() calls.  It would require updating/auditing a number of
>> range-op entries and adding an overflowed_p() query to irange.
>
> Ah, yeah - the folding APIs would be a good fit I guess.  I was also
> looking to have the "new" helpers be somewhat consistent with the
> ranger API.
>
> So if we had a fold_range overload with either an output argument or a
> flag that makes it return false on possible overflow, that would work I
> guess?  Since we have a virtual class setup we might be able to provide
> a default failing method and implement workers for plus and mult (as
> needed for this patch) as the need arises?

Thanks for your comments!

Here is a concern.  The patterns in match.pd may be supported by 'vrp'
passes.  At that time, the range info would be computed (via the
value-range machinery) and cached for each SSA_NAME.  In the patterns,
when range_of_expr is called for a capture, the range info is retrieved
from the cache, and there is no need to fold_range again.  This means
the overflow info may also need to be cached together with other range
info.  There may be additional memory and time cost.

BR,
Jeff (Jiufu Guo)

> Thanks,
> Richard.
Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-conv.cc.
On Mon, 17 Jul 2023, Richard Biener wrote:

> > > > > OK.  Btw, while I didn't spot this during review I would
> > > > > appreciate if the code could use vec.[q]sort, this should work
> > > > > with a lambda as well I think.
> > > >
> > > > That was my first use, but that hits
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99469
> > >
> > > That is not hitting PR 99469 but rather it means your comparison is
> > > not correct for an (unstable) sort.
> > > That is, a qsort comparator should have the relationship
> > > `f(a,b) == !f(b,a)`, and `f(a,a)` should also return false.
> >
> > I'm using the standard std::pair comparator, which indicates that
> > f(a,a) is true,
> > https://en.cppreference.com/w/cpp/utility/pair/operator_cmp
> >
> > > If you are running into this for qsort here, you will most likely
> > > run into issues with std::sort later on too.
> >
> > Don't see why or how.  It needs to have a consistent relationship,
> > which std::pair maintains.  So why would using the standard tuple
> > comparator with a standard std::sort cause problems?
>
> At least for
>
> return left.second < right.second;
>
> f(a,a) doesn't hold.  Note qsort can end up comparing an element to
> itself (not sure if GCC's implementation now can).

(it cannot, but that is not important here)

Tamar, while std::sort receives a "less-than" comparison predicate, qsort
needs a tri-state comparator that returns a negative value for the
"less-than" relation, positive for "more-than", and zero when operands
are "equal".  Passing the output of std::pair::operator< straight to
qsort is not correct, and qsort_chk catches that mistake at runtime.

std::sort is not a stable sort and therefore can cause code generation
differences by swapping around elements that are not bitwise-identical
but "equal" according to the comparator.  This is the main reason for
preferring our internal qsort, which yields the same results on all
platforms.

Let me also note that #include <algorithm> is pretty heavy-weight, and so
I'd suggest to avoid it, to avoid needlessly increasing bootstrap times.

Alexander
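For reference, a small sketch of the distinction Alexander draws (qsort's tri-state contract versus std::sort's boolean predicate), with an explicit tie-breaker to keep the order total and platform-independent. The names here are illustrative, not from the patch under review.

#include <cstdlib>
#include <utility>

typedef std::pair<int, int> item;

// qsort-style comparator: negative / zero / positive, total order.
static int
cmp_item (const void *pa, const void *pb)
{
  const item *a = (const item *) pa;
  const item *b = (const item *) pb;
  if (a->second != b->second)
    return a->second < b->second ? -1 : 1;
  if (a->first != b->first)   // tie-breaker: make the order total
    return a->first < b->first ? -1 : 1;
  return 0;                   // f(a,a) must be "equal", i.e. 0
}

// Usage sketch: qsort (items, n, sizeof (item), cmp_item);

Returning the boolean of `left.second < right.second` from such a comparator collapses "greater" and "equal" into the same return value, which is exactly the mistake qsort_chk flags at runtime.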
Re: [WIP RFC] Add support for keyword-based attributes
Hello,

On Mon, 17 Jul 2023, Richard Sandiford via Gcc-patches wrote:

> >> There are some existing attributes that similarly affect semantics
> >> in ways that cannot be ignored.  vector_size is one obvious example.
> >> But that doesn't make it a good thing. :)
> >...
> > If you had added __arm(bar(args)) instead of __arm_bar(args) you would
> > only need one additional keyword - we could set aside a similar one
> > for each target then.  I realize that double-nesting of arguments
> > might prove a bit challenging but still.
>
> Yeah, that would work.

So, essentially you want unignorable attributes, right?  Then implement
exactly that: add one new keyword "__known_attribute__" (invent a better
name, maybe :) ), semantics exactly as with __attribute__ (including
using the same underlying lists in our data structures), with only one
single deviation: instead of the warning you give an error for unhandled
attributes.  Done.  (Old compilers will barf on the unknown new keyword,
new compilers will error on unknown values used within such an attribute
list.)

> > In any case I also think that attributes are what you want and their
> > ugliness/issues are not worse than the ugliness/issues of the keyword
> > approach IMHO.
>
> I guess the ugliness of keywords is the double underscore?  What are
> the issues with the keyword approach though?

There are _always_ problems with new keywords, the more new keywords the
more problems :-)  Is the keyword context-sensitive or not?  What about
existing system headers that use it right now?  Is it recognized in
free-standing or not?  Is it only recognized for some targets?  Is it
recognized only for certain configurations of the target?

So, let's define one new mechanism, for all targets, all configs, and
all language standards.  Let's use __attribute__ with a twist :)

Ciao,
Michael.
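A hypothetical illustration of the proposal; the keyword name is Michael's placeholder and this syntax is not implemented anywhere.

// Same placement and syntax as __attribute__, but an attribute the
// compiler cannot honor becomes a hard error rather than a warning.
// vector_size is one of the "cannot be ignored" attributes cited in
// this thread.
typedef int v4si __known_attribute__ ((vector_size (16)));

An old compiler rejects the unknown keyword outright; a new compiler either honors vector_size or errors, so the attribute can never be silently dropped.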
Re: [WIP RFC] Add support for keyword-based attributes
Jason Merrill writes:
> On Sun, Jul 16, 2023 at 6:50 AM Richard Sandiford wrote:
>> [...]
>> I suppose we could say that users have to include a particular header
>> file before using an attribute, and use a pragma in that header file to
>> tell the compiler to enable the attribute.  But then there would need
>> to be a separate header file for each distinct set of attributes (in
>> terms of historical timeline), which would get ugly.  I'm not sure that
>> it's better than using keywords, or even whether it's better than
>> predefining the "keyword" as a macro that expands to a
>> compiler-internal attribute.
>
> With a combination of those approaches it can be a single header:
>
> #ifdef __ARM_FEATURE_FOO
> #define __arm_foo [[arm::foo]]
> // else use of __arm_foo will fail
> #endif

If we did that, would it be a defined part of the interface that
__arm_foo expands to exactly arm::foo, rather than to an obfuscated or
compiler-dependent attribute?

In other words, would it be a case of providing both the attribute and
the macro, and leaving users to choose whether they use the attribute
directly (and run the risk of miscompilation) or whether they use the
macros, based on their risk appetite?  If so, the risk of miscompilation
is mostly borne by the people who build the deployed code rather than
the people who initially wrote it.

If instead we say that the expansion of the macros is compiler-dependent
and that the macros must always be used, then I'm not sure the header
file provides a better interface than predefining the macros in the
compiler (which was the fallback option if the keywords were rejected).
But the diagnostics using these macros would be worse than diagnostics
based on keywords, not least because the diagnostics about invalid use
of the macros (from compilers that understood them) would refer to the
underlying attribute rather than the macro.

Thanks,
Richard
[PATCH V3] RISC-V: Add TARGET_MIN_VLEN > 4096 check
gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_option_override): Add sorry check.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/zvl-unimplemented-1.c: New test.
	* gcc.target/riscv/rvv/base/zvl-unimplemented-2.c: New test.
---
 gcc/config/riscv/riscv.cc                             | 8 ++++++++
 .../gcc.target/riscv/rvv/base/zvl-unimplemented-1.c   | 4 ++++
 .../gcc.target/riscv/rvv/base/zvl-unimplemented-2.c   | 4 ++++
 3 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 6ed735d6983..82e7c27b057 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6672,6 +6672,14 @@ riscv_option_override (void)
       riscv_stack_protector_guard_offset = offs;
     }

+  /* FIXME: We don't allow TARGET_MIN_VLEN > 4096 since the datatypes of
+     both GET_MODE_SIZE and GET_MODE_BITSIZE are poly_uint16.
+
+     We can only allow TARGET_MIN_VLEN * 8 (LMUL) < 65535.  */
+  if (TARGET_MIN_VLEN > 4096)
+    sorry (
+      "Current RISC-V GCC can not support VLEN > 4096bit for 'V' Extension");
+
   /* Convert -march to a chunks count.  */
   riscv_vector_chunks = riscv_convert_vector_bits ();
 }

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-1.c
new file mode 100644
index 000..03f67035ca4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-1.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv_zvl8192b -mabi=lp64d --param riscv-autovec-preference=fixed-vlmax" } */
+
+void foo () {} // { dg-excess-errors "sorry, unimplemented: Current RISC-V GCC can not support VLEN > 4096bit for 'V' Extension" }

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-2.c
new file mode 100644
index 000..075112f2f81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvl-unimplemented-2.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv_zvl8192b -mabi=lp64d --param riscv-autovec-preference=scalable" } */
+
+void foo () {} // { dg-excess-errors "sorry, unimplemented: Current RISC-V GCC can not support VLEN > 4096bit for 'V' Extension" }
--
2.36.1
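The 4096 bound follows directly from the poly_uint16 representation mentioned in the FIXME. A minimal sketch of the arithmetic, assuming (as the comment states) that the largest RVV mode groups LMUL = 8 vector registers:

#include <cstdint>
#include <cstdio>

// VLEN * 8 (the bit size of an LMUL = 8 mode) must fit in a uint16_t.
int main ()
{
  for (uint32_t vlen : { 4096u, 8192u })
    printf ("VLEN %4u -> LMUL8 bit size %6u: %s\n", vlen, vlen * 8,
	    vlen * 8 <= 65535u ? "fits poly_uint16" : "overflows");
  return 0;
}

So zvl8192b (8192 * 8 = 65536) is the first configuration that no longer fits, which is exactly what the two new tests exercise.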
Re: [PATCH V3] RISC-V: Add TARGET_MIN_VLEN > 4096 check
On 7/17/23 08:20, Juzhe-Zhong wrote:
> gcc/ChangeLog:
>
> 	* config/riscv/riscv.cc (riscv_option_override): Add sorry check.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/riscv/rvv/base/zvl-unimplemented-1.c: New test.
> 	* gcc.target/riscv/rvv/base/zvl-unimplemented-2.c: New test.

OK
jeff
RE: [PATCH v2] tree-optimization/110279- Check for nested FMA chains in reassoc
I think Andrew is listed as maintainer for tree-ssa, or maybe it's on one of the Richards' lists?

> -----Original Message-----
> From: Gcc-patches On Behalf Of Philipp Tomsich
> Sent: Tuesday, July 11, 2023 7:51 AM
> To: Jakub Jelinek
> Cc: gcc-patches@gcc.gnu.org; Di Zhao OS
> Subject: Re: [PATCH v2] tree-optimization/110279- Check for nested FMA
> chains in reassoc
>
> Jakub,
>
> it looks like you did a lot of work on reassoc in the past - could you
> have a quick look and comment?
>
> Thanks,
> Philipp.
>
> On Tue, 11 Jul 2023 at 04:59, Di Zhao OS wrote:
> >
> > Attached is an updated version of the patch.
> >
> > Based on Philipp's review, some changes:
> >
> > 1. Defined new enum fma_state to describe the state of FMA candidates
> >    for a list of operands.  (Since the tests seem simple after the
> >    change, I didn't add predicates on it.)
> > 2. Changed return type of convert_mult_to_fma_1 and convert_mult_to_fma
> >    to tree, to remove the in/out parameter.
> > 3. Added description of the return values of rank_ops_for_fma.
> >
> > ---
> > gcc/ChangeLog:
> >
> > 	* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Added new
> > 	parameter check_only_p.  Changed return type to tree.
> > 	(struct fma_transformation_info): Moved to header.
> > 	(class fma_deferring_state): Moved to header.
> > 	(convert_mult_to_fma): Added new parameter check_only_p.  Changed
> > 	return type to tree.
> > 	* tree-ssa-math-opts.h (struct fma_transformation_info): Moved
> > 	from .cc.
> > 	(class fma_deferring_state): Moved from .cc.
> > 	(convert_mult_to_fma): Add function decl.
> > 	* tree-ssa-reassoc.cc (enum fma_state): Defined new enum to
> > 	describe the state of FMA candidates for a list of operands.
> > 	(rewrite_expr_tree_parallel): Changed boolean parameter to enum
> > 	type.
> > 	(rank_ops_for_fma): Return enum fma_state.
> > 	(reassociate_bb): Avoid rewriting to parallel if nested FMAs are
> > 	found.
> >
> > Thanks,
> > Di Zhao
Re: [PATCH] Include insn-opinit.h in PLUGIN_H [PR110610]
On 7/17/23 05:55, Andre Vieira (lists) wrote: On 11/07/2023 23:28, Jeff Law wrote: On 7/11/23 04:37, Andre Vieira (lists) via Gcc-patches wrote: Hi, This patch fixes PR110610 by including OPTABS_H in the INTERNAL_FN_H list, as insn-opinit.h is now required by internal-fn.h. This will lead to insn-opinit.h, among the other OPTABS_H header files, being installed in the plugin directory. Bootstrapped aarch64-unknown-linux-gnu. @Jakub: could you check to see if it also addresses PR 110284? gcc/ChangeLog: PR 110610 * Makefile.in (INTERNAL_FN_H): Add OPTABS_H. Why use OPTABS_H here? Isn't the new dependency just on insn-opinit.h and insn-codes.h and neither of those #include other headers do they? Yeah, there was no particular reason other than I just felt the Makefile structure sort of lend itself that way. I checked genopinit.cc and it seems insn-opinit.h doesn't include any other header files, only the sources do, so I've changed the patch to only add insn-opinit.h to INTERNAL_FN_H. --- This patch fixes PR110610 by including insn-opinit.h in the INTERNAL_FN_H list, as insn-opinit.h is now required by internal-fn.h. This will lead to insn-opinit.h, among the other OPTABS_H header files, being installed in the plugin directory. Bootstrapped aarch64-unknown-linux-gnu. gcc/ChangeLog: PR 110610 * Makefile.in (INTERNAL_FN_H): Add insn-opinit.h. OK jeff
Re: [PATCH] RTL_SSA: Relax PHI_MODE in phi_setup
Juzhe-Zhong writes: > Hi, Richard. > > RISC-V port needs to add a bunch VLS modes (V16QI,V32QI,V64QI,...etc) > There are sharing same REG_CLASS with VLA modes (VNx16QI,VNx32QI,...etc) > > When I am adding those VLS modes, the RTL_SSA initialization in VSETVL PASS > (inserted after RA) ICE: > rvv.c:13:1: internal compiler error: in partial_subreg_p, at rtl.h:3186 >13 | } > | ^ > 0xf7a5b1 partial_subreg_p(machine_mode, machine_mode) > ../../../riscv-gcc/gcc/rtl.h:3186 > 0x1407616 wider_subreg_mode(machine_mode, machine_mode) > ../../../riscv-gcc/gcc/rtl.h:3252 > 0x2a2c6ff rtl_ssa::combine_modes(machine_mode, machine_mode) > ../../../riscv-gcc/gcc/rtl-ssa/internals.inl:677 > 0x2a2b9a4 rtl_ssa::function_info::simplify_phi_setup(rtl_ssa::phi_info*, > rtl_ssa::set_info**, bitmap_head*) > ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:146 > 0x2a2c142 rtl_ssa::function_info::simplify_phis() > ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:258 > 0x2a2b3f0 rtl_ssa::function_info::function_info(function*) > ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:51 > 0x1cebab9 pass_vsetvl::init() > ../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4578 > 0x1cec150 pass_vsetvl::execute(function*) > ../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716 > > The reason is that we have V32QImode (size = [32,0]) which is the mode set as > regno_reg_rtx[97] > When the PHI input def comes from ENTRY BLOCK (index =0), the def->mode () = > V32QImode. > But the phi_mode = VNx2QI for example (I use VLA modes intrinsic write the > codes). > Then combine_modes report ICE. > > In this situation, I relax it and let it use phi_mode directly. The idea is that phi_mode must be: (a) big enough to store all possible inputs without losing significant bits (b) something that occupies the right number of registers I think the patch loses property (a). I suppose it would be difficult to find a "real" mode that is known to contain both V32QI and VNx2QI without losing property (b). There is some support for using BLKmode as a wildcard mode for registers. Does it work if you add: if (!ordered_p (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2))) return BLKmode; before the call to wider_subreg_mode in combine_modes? Thanks, Richard > > Is it correct ? > > Thanks. > > gcc/ChangeLog: > > * rtl-ssa/functions.cc (function_info::simplify_phi_setup): Relax > combine in PHI setup. > > --- > gcc/rtl-ssa/functions.cc | 14 +- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/gcc/rtl-ssa/functions.cc b/gcc/rtl-ssa/functions.cc > index c35d25dbf8f..0793598ab1d 100644 > --- a/gcc/rtl-ssa/functions.cc > +++ b/gcc/rtl-ssa/functions.cc > @@ -143,7 +143,19 @@ function_info::simplify_phi_setup (phi_info *phi, > set_info **assumed_values, >// If the input has a known mode (i.e. not BLKmode), make sure >// that the phi's mode is at least as large. >if (def) > - phi_mode = combine_modes (phi_mode, def->mode ()); > + { > + /* For target like RISC-V, it applies both variable-length > + and fixed-length to the same REG_CLASS. > + > + It will cause ICE for these 2 following cases: > +1. phi_mode: variable-length. > + def->mode (): fixed-length. > +2. phi_mode: fixed-length. > + def->mode (): variable-length. */ > + if (!(GET_MODE_SIZE (phi_mode).is_constant () > + ^ GET_MODE_SIZE (def->mode ()).is_constant ())) > + phi_mode = combine_modes (phi_mode, def->mode ()); > + } > } >if (phi->mode () != phi_mode) > phi->set_mode (phi_mode);
[PATCH V2] RTL_SSA: Relax PHI_MODE in phi_setup
From: Ju-Zhe Zhong

Hi, Richard.

The RISC-V port needs to add a bunch of VLS modes (V16QI,V32QI,V64QI,...etc.), which share the same REG_CLASS with VLA modes (VNx16QI,VNx32QI,...etc.)

When I am adding those VLS modes, the RTL_SSA initialization in the VSETVL pass (inserted after RA) ICEs:

rvv.c:13:1: internal compiler error: in partial_subreg_p, at rtl.h:3186
   13 | }
      | ^
0xf7a5b1 partial_subreg_p(machine_mode, machine_mode)
	../../../riscv-gcc/gcc/rtl.h:3186
0x1407616 wider_subreg_mode(machine_mode, machine_mode)
	../../../riscv-gcc/gcc/rtl.h:3252
0x2a2c6ff rtl_ssa::combine_modes(machine_mode, machine_mode)
	../../../riscv-gcc/gcc/rtl-ssa/internals.inl:677
0x2a2b9a4 rtl_ssa::function_info::simplify_phi_setup(rtl_ssa::phi_info*, rtl_ssa::set_info**, bitmap_head*)
	../../../riscv-gcc/gcc/rtl-ssa/functions.cc:146
0x2a2c142 rtl_ssa::function_info::simplify_phis()
	../../../riscv-gcc/gcc/rtl-ssa/functions.cc:258
0x2a2b3f0 rtl_ssa::function_info::function_info(function*)
	../../../riscv-gcc/gcc/rtl-ssa/functions.cc:51
0x1cebab9 pass_vsetvl::init()
	../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4578
0x1cec150 pass_vsetvl::execute(function*)
	../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716

The reason is that we have V32QImode (size = [32,0]), which is the mode set as regno_reg_rtx[97].  When the PHI input def comes from the ENTRY block (index = 0), def->mode () is V32QImode, but the phi_mode is, for example, VNx2QI (I used VLA-mode intrinsics to write the code).  Then combine_modes reports the ICE.

gcc/ChangeLog:

	* rtl-ssa/internals.inl: Fix when mode1 and mode2 are not ordered.

---
 gcc/rtl-ssa/internals.inl | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/rtl-ssa/internals.inl b/gcc/rtl-ssa/internals.inl
index 0a61811289d..e49297c12b3 100644
--- a/gcc/rtl-ssa/internals.inl
+++ b/gcc/rtl-ssa/internals.inl
@@ -673,6 +673,9 @@ combine_modes (machine_mode mode1, machine_mode mode2)
   if (mode2 == E_BLKmode)
     return mode1;

+  if (!ordered_p (GET_MODE_SIZE (mode1), GET_MODE_SIZE (mode2)))
+    return BLKmode;
+
   return wider_subreg_mode (mode1, mode2);
 }
--
2.36.1
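To see why combine_modes trips the assert, consider the mode sizes as degree-1 polynomials in the runtime VL parameter x. The sketch below models ordered_p on such sizes; it is an illustration of the idea, not GCC's poly-int API.

#include <cstdio>

struct poly { int a, b; };   // size = a + b*x, with x >= 0

// One size is comparable to the other for all x only if both
// coefficients compare the same way.
static bool ordered_p (poly p, poly q)
{
  bool le = p.a <= q.a && p.b <= q.b;
  bool ge = p.a >= q.a && p.b >= q.b;
  return le || ge;
}

int main ()
{
  poly v32qi  = { 32, 0 };  // fixed-length VLS mode: 32 bytes
  poly vnx2qi = { 2, 2 };   // scalable VLA mode: 2 + 2x bytes
  printf ("ordered_p: %d\n", ordered_p (v32qi, vnx2qi)); // prints 0
  return 0;
}

Since neither size is known to be at least as large as the other for every x, no "wider" mode exists, and falling back to the BLKmode wildcard is the conservative answer.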
Re: Re: [PATCH] RTL_SSA: Relax PHI_MODE in phi_setup
Thanks so much. It works!

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624694.html

Is it OK?

juzhe.zh...@rivai.ai

From: Richard Sandiford
Date: 2023-07-17 22:31
To: Juzhe-Zhong
CC: gcc-patches
Subject: Re: [PATCH] RTL_SSA: Relax PHI_MODE in phi_setup
Re: [PATCH V2] RTL_SSA: Relax PHI_MODE in phi_setup
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> gcc/ChangeLog:
>
> 	* rtl-ssa/internals.inl: Fix when mode1 and mode2 are not ordered.

OK if it passes testing.

Thanks,
Richard
[PATCH] s390: Optimize vec_cmpge followed by vec_sel
A vec_cmpge produces a negation. Replace this negation by swapping the two selection choices of a vec_sel based on the result of the vec_cmpge.

Bootstrapped and regression tested on s390x.

gcc/ChangeLog:

	* config/s390/vx-builtins.md: New vsel pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/vec-cmpge.c: New test.

Signed-off-by: Juergen Christ
---
 gcc/config/s390/vx-builtins.md                   | 11 +++++++++++
 .../gcc.target/s390/vector/vec-cmpge.c           | 18 ++++++++++++++++++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-cmpge.c

diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index f4248c55d4ec..0ce3ff6ef4a6 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -530,6 +530,17 @@
   "vsel\t%v0,%1,%2,%3"
   [(set_attr "op_type" "VRR")])

+(define_insn "vsel_swapped"
+  [(set (match_operand:V_HW_FT 0 "register_operand" "=v")
+	(ior:V_HW_FT
+	 (and:V_HW_FT (not:V_HW_FT (match_operand:V_HW_FT 3 "register_operand" "v"))
+		      (match_operand:V_HW_FT 1 "register_operand" "v"))
+	 (and:V_HW_FT (match_dup 3)
+		      (match_operand:V_HW_FT 2 "register_operand" "v"))))]
+  "TARGET_VX"
+  "vsel\t%v0,%2,%1,%3"
+  [(set_attr "op_type" "VRR")])
+
 ; Vector sign extend to doubleword

diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-cmpge.c b/gcc/testsuite/gcc.target/s390/vector/vec-cmpge.c
new file mode 100644
index ..eb188690ae41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-cmpge.c
@@ -0,0 +1,18 @@
+/* Check that vec_sel absorbs a negation generated by vec_cmpge.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -mzarch -march=z13" } */
+
+typedef __attribute__((vector_size(16))) unsigned char uv16qi;
+
+#include <vecintrin.h>
+
+void f(char *res, uv16qi ctrl)
+{
+  uv16qi a = vec_splat_u8(0xfe);
+  uv16qi b = vec_splat_u8(0x80);
+  uv16qi mask = vec_cmpge(ctrl, b);
+  *(uv16qi *)res = vec_sel(a, b, mask);
+}
+
+/* { dg-final { scan-assembler-not "vno\t" } } */
--
2.39.3
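The correctness of the swap rests on a simple bitwise identity: selecting with an inverted mask equals selecting with the two choices exchanged. A scalar model of this (the operand order of sel() below is an assumption for illustration, mirroring the RTL in the pattern: take x where the mask bit is 1, y where it is 0):

#include <cassert>
#include <cstdint>

static uint8_t sel (uint8_t x, uint8_t y, uint8_t m)
{
  return (uint8_t) ((x & m) | (y & (uint8_t) ~m));
}

int main ()
{
  for (int a = 0; a < 256; a += 5)
    for (int b = 0; b < 256; b += 7)
      for (int m = 0; m < 256; m += 11)
	// sel with an inverted mask == sel with swapped choices
	assert (sel ((uint8_t) a, (uint8_t) b, (uint8_t) ~m)
		== sel ((uint8_t) b, (uint8_t) a, (uint8_t) m));
  return 0;
}

This is what vsel_swapped encodes: the (not ...) on operand 3 is absorbed by emitting vsel with operands 1 and 2 exchanged, so the separate vno instruction disappears.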
Re: PR82943 - Suggested patch to fix
Hello,

I wanted to follow up on this, and ask what the next steps would be to incorporate this patch?

Thanks,

Alexander Westbrooks

On Thu, Jun 29, 2023 at 10:38 PM Alexander Westbrooks wrote:
> Hello,
>
> I have finished my testing, and updated my patch and relevant
> Changelogs.  I added 4 new tests, and all the existing tests in the
> current testsuite for gfortran passed or failed as expected.  Do I need
> to attach the test results here?
>
> The platform I tested on was a Docker container running in Docker
> Desktop, running the "mcr.microsoft.com/devcontainers/universal:2-linux"
> image.
>
> I also made sure that my code changes followed the coding standards.
> Please let me know if there is anything else that I need to do.  I don't
> have write-access to the repository.
>
> Thanks,
>
> Alexander
>
> On Wed, Jun 28, 2023 at 4:14 PM Harald Anlauf wrote:
>> Hi Alex,
>>
>> welcome to the gfortran community.  It is great that you are trying
>> to get actively involved.
>>
>> You already did quite a few things right: patches shall be sent to
>> the gcc-patches ML, but Fortran reviewers usually notice them only
>> when they are copied to the fortran ML.
>>
>> There are some general recommendations on the formatting of C code,
>> like indentation, of the patches, and of the commit log entries.
>>
>> Regarding coding standards, see https://www.gnu.org/prep/standards/ .
>>
>> Regarding testcases, a recommendation is to have a look at
>> existing testcases, e.g. in gcc/testsuite/gfortran.dg/, and then
>> decide if the testcase shall test the compile-time or run-time
>> behaviour, and add the necessary dejagnu directives.
>>
>> You should also verify if your patch passes regression testing.
>> For changes to gfortran, it is usually sufficient to run
>>
>>   make check-fortran -j <n>
>>
>> where <n> is the number of parallel tests.
>> You would also need to report the platform you tested on.
>>
>> There is also a legal issue to consider before non-trivial patches can
>> be accepted for incorporation: https://gcc.gnu.org/contribute.html#legal
>>
>> If your patch is accepted and if you do not have write-access to the
>> repository, one of the maintainers will likely take care of it.
>> If you become a regular contributor, you will probably want to consider
>> getting write access.
>>
>> Cheers,
>> Harald
>>
>> On 6/24/23 19:17, Alexander Westbrooks via Gcc-patches wrote:
>> > Hello,
>> >
>> > I am new to the GFortran community.  Over the past two weeks I created
>> > a patch that should fix PR82943 for GFortran.  I have attached it to
>> > this email.  The patch allows the code below to compile successfully.
>> > I am working on creating test cases next, but I am new to the process
>> > so it may take me some time.  After I make test cases, do I email them
>> > to you as well?  Do I need to make a pull-request on github in order
>> > to get the patch reviewed?
>> >
>> > Thank you,
>> >
>> > Alexander Westbrooks
>> >
>> > module testmod
>> >
>> >     public :: foo
>> >
>> >     type, public :: tough_lvl_0(a, b)
>> >       integer, kind :: a = 1
>> >       integer, len :: b
>> >     contains
>> >       procedure :: foo
>> >     end type
>> >
>> >     type, public, EXTENDS(tough_lvl_0) :: tough_lvl_1 (c)
>> >       integer, len :: c
>> >     contains
>> >       procedure :: bar
>> >     end type
>> >
>> >     type, public, EXTENDS(tough_lvl_1) :: tough_lvl_2 (d)
>> >       integer, len :: d
>> >     contains
>> >       procedure :: foobar
>> >     end type
>> >
>> > contains
>> >     subroutine foo(this)
>> >       class(tough_lvl_0(1,*)), intent(inout) :: this
>> >     end subroutine
>> >
>> >     subroutine bar(this)
>> >       class(tough_lvl_1(1,*,*)), intent(inout) :: this
>> >     end subroutine
>> >
>> >     subroutine foobar(this)
>> >       class(tough_lvl_2(1,*,*,*)), intent(inout) :: this
>> >     end subroutine
>> >
>> > end module
>> >
>> > PROGRAM testprogram
>> >     USE testmod
>> >
>> >     TYPE(tough_lvl_0(1,5))     :: test_pdt_0
>> >     TYPE(tough_lvl_1(1,5,6))   :: test_pdt_1
>> >     TYPE(tough_lvl_2(1,5,6,7)) :: test_pdt_2
>> >
>> >     CALL test_pdt_0%foo()
>> >
>> >     CALL test_pdt_1%foo()
>> >     CALL test_pdt_1%bar()
>> >
>> >     CALL test_pdt_2%foo()
>> >     CALL test_pdt_2%bar()
>> >     CALL test_pdt_2%foobar()
>> >
>> > END PROGRAM testprogram
Re: [PATCH v3 1/3] c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]
On 7/16/23 09:47, Nathaniel Shead wrote: On Fri, Jul 14, 2023 at 11:16:58AM -0400, Jason Merrill wrote: What if, instead of removing the variable from one hash table and adding it to another, we change the value to, say, void_node? I have another patch I'm working on after this which does seem to require the overlapping tables to properly catch uses of aggregates while they are still being constructed (i.e. before their lifetime has begun), as part of PR c++/109518. In that case the 'values' map contains the CONSTRUCTOR node for the aggregate, but it also needs to be in 'outside_lifetime'. I could also explore solving this another way however if you prefer. I'd think to handle this with a flag on the CONSTRUCTOR to indicate that members with no value are out of lifetime (so, a stronger version of CONSTRUCTOR_NO_CLEARING that just indicates uninitialized). Currently all the TREE_LANG_FLAG_* are occupied on CONSTRUCTOR, but there seem to be plenty of spare bits to add a TREE_LANG_FLAG_7. It might make sense to access those two flags with accessor functions so they stay aligned. (I also have vague dreams of at some point making this a map to the location that the object was destroyed for more context in the error messages, but I'm not yet sure if that's feasible or will actually be all that helpful so I'm happy to forgo that.) Hmm, that sounds convenient for debugging, but affected cases would also be straightforward to debug by adding a run-time call, so I'm skeptical it would be worth the overhead for successful compiles. Jason
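For context, a minimal sketch (my example, not code from the patch) of the class of code the lifetime tracking is meant to reject during constant evaluation; cf. PR70331:

constexpr int f ()
{
  int *p = nullptr;
  {
    int x = 42;
    p = &x;
  }          // x's lifetime ends here
  return *p; // use outside x's lifetime: not a constant expression
}

constexpr int i = f (); // with the patch this should be diagnosed

Whether the evaluator records x's death via a second hash table, a void_node value, or a flag on the enclosing CONSTRUCTOR is the implementation question being discussed above.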
Re: [PATCH v1] RISC-V: Support basic floating-point dynamic rounding mode
On 7/16/23 19:02, juzhe.zh...@rivai.ai wrote: LGTM And as of today, that's all we need ;-) Thanks, Jeff
Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid
On 7/17/23 09:45, Jiufu Guo wrote:
> Thanks for your comments!
>
> Here is a concern.  The patterns in match.pd may be supported by 'vrp'
> passes.  At that time, the range info would be computed (via the
> value-range machinery) and cached for each SSA_NAME.  In the patterns,
> when range_of_expr is called for a capture, the range info is retrieved
> from the cache, and there is no need to fold_range again.  This means
> the overflow info may also need to be cached together with other range
> info.  There may be additional memory and time cost.

I've been thinking about this a little bit, and how to make the info
available in a useful way.  I wonder if maybe we just add another entry
point to range-ops that looks a bit like fold_range...

Attached is an (untested) patch which adds
overflow_free_p (op1, op2, relation) to range-ops.  It defaults to
returning false.  If you want to implement it for, say, plus, you'd add
to operator_plus in range-ops.cc something like

operator_plus::overflow_free_p (irange &op1, irange &op2, relation_kind)
{
  // stuff you do in plus_without_overflow
}

I added relation_kind as a param, but you can ignore it.  Maybe it won't
ever help, but it seems like if we know there is a relation between op1
and op2 we might be able to someday determine something else?  If not,
remove it.

Then all you need to do to access it is to go thru range_op_handler, so
for instance:

  range_op_handler (PLUS_EXPR).overflow_free_p (op1, op2)

It'll work for all types and all tree codes.  The dispatch machinery
will return false unless both op1 and op2 are integral ranges, and then
it will invoke the appropriate handler, defaulting to returning false.

I also am not a fan of the get_range routine.  It would be better to
generally just call range_of_expr, get the results, then handle
undefined in the new overflow_free_p() routine and return false.
Varying should not need anything special since it will trigger the
overflow when you do the calculation.

The auxiliary routines could go in vr-values.{h,cc}.  They seem like
things that simplify_using_ranges could utilize, and when we get to
integrating simplify_using_ranges better, what you are doing may end up
there anyway.

Does that work?
Andrew

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index d1c735ee6aa..f2a863db286 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -366,6 +366,24 @@ range_op_handler::op1_op2_relation (const vrange &lhs) const
     }
 }

+bool
+range_op_handler::overflow_free_p (const vrange &lh,
+				   const vrange &rh,
+				   relation_trio rel) const
+{
+  gcc_checking_assert (m_operator);
+  switch (dispatch_kind (lh, lh, rh))
+    {
+      case RO_III:
+	return m_operator->overflow_free_p (as_a <irange> (lh),
+					    as_a <irange> (rh),
+					    rel);
+      default:
+	return false;
+    }
+}
+
+
 // Convert irange bitmasks into a VALUE MASK pair suitable for calling CCP.

@@ -688,6 +706,13 @@ range_operator::op1_op2_relation_effect (irange &lhs_range ATTRIBUTE_UNUSED,
   return false;
 }

+bool
+range_operator::overflow_free_p (const irange &, const irange &,
+				 relation_trio) const
+{
+  return false;
+}
+
 // Apply any known bitmask updates based on this operator.
 void

diff --git a/gcc/range-op.h b/gcc/range-op.h
index af94c2756a7..db3b03f28a5 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -147,6 +147,9 @@ public:
   virtual relation_kind op1_op2_relation (const irange &lhs) const;
   virtual relation_kind op1_op2_relation (const frange &lhs) const;
+
+  virtual bool overflow_free_p (const irange &lh, const irange &rh,
+				relation_trio = TRIO_VARYING) const;
 protected:
   // Perform an integral operation between 2 sub-ranges and return it.
   virtual void wi_fold (irange &r, tree type,
@@ -214,6 +217,8 @@ public:
 		   const vrange &op2,
 		   relation_kind = VREL_VARYING) const;
   relation_kind op1_op2_relation (const vrange &lhs) const;
+  bool overflow_free_p (const vrange &lh, const vrange &rh,
+			relation_trio = TRIO_VARYING) const;
 protected:
   unsigned dispatch_kind (const vrange &lhs, const vrange &op1,
			   const vrange&
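For illustration, a rough sketch of what a plus worker for this hook might look like, building on Andrew's outline; this is an assumption, not committed code. The idea: addition over two ranges is overflow-free iff adding the corresponding extreme bounds does not overflow, since '+' is monotonic in each operand.

bool
operator_plus::overflow_free_p (const irange &lh, const irange &rh,
				relation_trio) const
{
  // Per Andrew's note: handle undefined here and return false.
  if (lh.undefined_p () || rh.undefined_p ())
    return false;

  signop sgn = TYPE_SIGN (lh.type ());
  wi::overflow_type ovf;

  // Monotonicity of '+' means checking the extreme bounds suffices;
  // varying ranges naturally fail here, as discussed above.
  wi::add (lh.upper_bound (), rh.upper_bound (), sgn, &ovf);
  if (ovf != wi::OVF_NONE)
    return false;
  wi::add (lh.lower_bound (), rh.lower_bound (), sgn, &ovf);
  return ovf == wi::OVF_NONE;
}

A caller would then write range_op_handler (PLUS_EXPR).overflow_free_p (op1, op2) as described above, and get false for any type/code combination that lacks a worker.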
Re: [PATCH] c++: non-standalone surrogate call template
On 7/12/23 14:47, Patrick Palka wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?  There might be an existing PR for this issue but Bugzilla search
> seems to be timing out for me currently.

OK.

> -- >8 --
>
> I noticed we were accidentally preventing ourselves from considering a
> pointer/reference-to-function conversion function template if it's not
> the first conversion function that's considered, which for the testcase
> below resulted in us accepting the B call but not the A call despite
> the only difference between A and B being the order of member
> declarations.  This patch fixes this so that the outcome of overload
> resolution doesn't arbitrarily depend on declaration order in this
> situation.
>
> gcc/cp/ChangeLog:
>
> 	* call.cc (add_template_conv_candidate): Don't check for non-empty
> 	'candidates' here.
> 	(build_op_call): Check it here, before we've considered any
> 	conversion functions.
>
> gcc/testsuite/ChangeLog:
>
> 	* g++.dg/overload/conv-op5.C: New test.
> ---
>  gcc/cp/call.cc                           | 24 ++++++++++++++----------
>  gcc/testsuite/g++.dg/overload/conv-op5.C | 18 ++++++++++++++++++
>  2 files changed, 32 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/overload/conv-op5.C
>
> diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> index 81935b83908..119063979fa 100644
> --- a/gcc/cp/call.cc
> +++ b/gcc/cp/call.cc
> @@ -3709,12 +3709,6 @@ add_template_conv_candidate (struct z_candidate **candidates, tree tmpl,
>  			     tree return_type, tree access_path,
>  			     tree conversion_path, tsubst_flags_t complain)
>  {
> -  /* Making this work broke PR 71117 and 85118, so until the committee
> -     resolves core issue 2189, let's disable this candidate if there are
> -     any call operators.  */
> -  if (*candidates)
> -    return NULL;
> -
>    return add_template_candidate_real (candidates, tmpl, NULL_TREE, NULL_TREE,
>  				      NULL_TREE, arglist, return_type,
>  				      access_path,
> @@ -5290,6 +5284,8 @@ build_op_call (tree obj, vec<tree, va_gc> **args, tsubst_flags_t complain)
>  				LOOKUP_NORMAL, &candidates, complain);
>      }
>
> +  bool any_call_ops = candidates != nullptr;
> +
>    convs = lookup_conversions (type);
>
>    for (; convs; convs = TREE_CHAIN (convs))
> @@ -5306,10 +5302,18 @@ build_op_call (tree obj, vec<tree, va_gc> **args, tsubst_flags_t complain)
>  	continue;
>
>        if (TREE_CODE (fn) == TEMPLATE_DECL)
> -	add_template_conv_candidate
> -	  (&candidates, fn, obj, *args, totype,
> -	   /*access_path=*/NULL_TREE,
> -	   /*conversion_path=*/NULL_TREE, complain);
> +	{
> +	  /* Making this work broke PR 71117 and 85118, so until the
> +	     committee resolves core issue 2189, let's disable this
> +	     candidate if there are any call operators.  */
> +	  if (any_call_ops)
> +	    continue;
> +
> +	  add_template_conv_candidate
> +	    (&candidates, fn, obj, *args, totype,
> +	     /*access_path=*/NULL_TREE,
> +	     /*conversion_path=*/NULL_TREE, complain);
> +	}
>        else
>  	add_conv_candidate (&candidates, fn, obj, *args,
>  			    /*conversion_path=*/NULL_TREE,
> diff --git a/gcc/testsuite/g++.dg/overload/conv-op5.C b/gcc/testsuite/g++.dg/overload/conv-op5.C
> new file mode 100644
> index 000..b7724908b62
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/overload/conv-op5.C
> @@ -0,0 +1,18 @@
> +// { dg-do compile { target c++11 } }
> +
> +template<class T> using F = int(*)(T);
> +using G = int(*)(int*);
> +
> +struct A {
> +  template<class T> operator F<T>(); // #1
> +  operator G() = delete; // #2
> +};
> +
> +int i = A{}(0); // selects #1
> +
> +struct B {
> +  operator G() = delete; // #2
> +  template<class T> operator F<T>(); // #1
> +};
> +
> +int j = B{}(0); // selects #1
Re: [PATCH] c++: constrained surrogate calls [PR110535]
On 7/12/23 11:54, Patrick Palka wrote:
> On Wed, 12 Jul 2023, Patrick Palka wrote:
>> We're not checking constraints of pointer/reference-to-function
>> conversion functions during overload resolution, which causes us to ICE
>> on the first testcase and incorrectly reject the second testcase.
>
> Er, I noticed [over.call.object] doesn't exactly say that surrogate call
> functions inherit the constraints of the corresponding conversion
> function, but I reckon that's the intent?

I also assume so.  OK.

>> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
>> for trunk/13?
>>
>> 	PR c++/110535
>>
>> gcc/cp/ChangeLog:
>>
>> 	* call.cc (add_conv_candidate): Check constraints.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* g++.dg/cpp2a/concepts-surrogate1.C: New test.
>> 	* g++.dg/cpp2a/concepts-surrogate2.C: New test.
>> ---
>>  gcc/cp/call.cc                                   |  8 ++++++++
>>  gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C | 12 ++++++++++++
>>  gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C | 14 ++++++++++++++
>>  3 files changed, 34 insertions(+)
>>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C
>>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C
>>
>> diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
>> index 15a3d6f2a1f..81935b83908 100644
>> --- a/gcc/cp/call.cc
>> +++ b/gcc/cp/call.cc
>> @@ -2588,6 +2588,14 @@ add_conv_candidate (struct z_candidate **candidates, tree fn, tree obj,
>>    if (*candidates && (*candidates)->fn == totype)
>>      return NULL;
>>
>> +  if (!constraints_satisfied_p (fn))
>> +    {
>> +      reason = constraint_failure ();
>> +      viable = 0;
>> +      return add_candidate (candidates, fn, obj, arglist, len, convs,
>> +			    access_path, conversion_path, viable, reason, flags);
>> +    }
>> +
>>    for (i = 0; i < len; ++i)
>>      {
>>        tree arg, argtype, convert_type = NULL_TREE;
>> diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C
>> new file mode 100644
>> index 000..e8481a31656
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate1.C
>> @@ -0,0 +1,12 @@
>> +// PR c++/110535
>> +// { dg-do compile { target c++20 } }
>> +
>> +using F = int(int);
>> +
>> +template<bool B>
>> +struct A {
>> +  operator F*() requires B;
>> +};
>> +
>> +int i = A<true>{}(0); // OK
>> +int j = A<false>{}(0); // { dg-error "no match" }
>> diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C
>> new file mode 100644
>> index 000..8bf8364beb7
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-surrogate2.C
>> @@ -0,0 +1,14 @@
>> +// PR c++/110535
>> +// { dg-do compile { target c++20 } }
>> +
>> +using F = int(int);
>> +using G = long(int);
>> +
>> +template<bool B>
>> +struct A {
>> +  operator F&() requires B;
>> +  operator G&() requires (!B);
>> +};
>> +
>> +int i = A<true>{}(0); // { dg-bogus "ambiguous" }
>> +int j = A<false>{}(0); // { dg-bogus "ambiguous" }
>> --
>> 2.41.0.327.gaa9166bcc0
Re: [PATCH ver 3] rs6000, fix vec_replace_unaligned built-in arguments
On Thu, 2023-07-13 at 17:41 +0800, Kewen.Lin wrote:
> Hi Carl,
>
> on 2023/7/8 04:18, Carl Love wrote:
> > GCC maintainers:
> >
> > Version 3, added code to altivec_resolve_overloaded_builtin so the
> > correct instruction is selected for the size of the second argument.
> > This restores the instruction counts to the original values where the
> > correct instructions were originally being generated.  The naming of
>
> Nice, I have some comments inlined below.
>
> > the overloaded builtin instances and builtin definitions was changed
> > to reflect the type of the second argument, since the type of the
> > first argument is now the same for all overloaded instances.  A new
> > builtin test file was added for the case where the first argument is
> > cast to the unsigned long long type.  This test requires the
> > -flax-vector-conversions gcc command line option.  Since the other
> > tests do not require this option, I felt that the new test needed to
> > be in a separate file.  Finally, some formatting fixes were made in
> > the original test file.  The patch has been retested on Power 10 with
> > no regressions.
> >
> > Version 2, fixed various typos.  Updated the change log body to say
> > the instruction counts were updated.  The instruction counts changed
> > as a result of changing the first argument of the
> > vec_replace_unaligned builtin call from vector unsigned long long
> > (vull) to vector unsigned char (vuc).  When the first argument was
> > vull, the builtin call generated the vinsd instruction for the two
> > test cases.  The updated call with vuc as the first argument generates
> > two vinsw instructions instead.  The patch was retested on Power 10
> > with no regressions.
> >
> > The following patch fixes the first argument in the builtin definition
> > and the corresponding test cases.  Initially, the builtin
> > specification was wrong due to a cut-and-paste error.  The
> > documentation was fixed in:
> >
> >    commit ed3fea09b18f67e757b5768b42cb6e816626f1db
> >    Author: Bill Schmidt
> >    Date: Fri Feb 4 13:07:17 2022 -0600
> >
> >        rs6000: Correct function prototypes for vec_replace_unaligned
> >
> >        Due to a pasto error in the documentation, vec_replace_unaligned
> >        was implemented with the same function prototypes as
> >        vec_replace_elt.  It was intended that vec_replace_unaligned
> >        always specify output vectors as having type vector unsigned
> >        char, to emphasize that elements are potentially misaligned by
> >        this built-in function.  This patch corrects the
> >        misimplementation.
> >
> > This patch fixes the arguments in the definitions and updates the
> > testcases accordingly.  Additionally, a few minor spacing issues are
> > fixed.
> >
> > The patch has been tested on Power 10 with no regressions.  Please let
> > me know if the patch is acceptable for mainline.  Thanks.
> >
> > Carl
> >
> > --
> > rs6000, fix vec_replace_unaligned built-in arguments
> >
> > The first argument of the vec_replace_unaligned built-in should always
> > be unsigned char, as specified in gcc/doc/extend.texi.
>
> Maybe "be with type vector unsigned char"?

Changed to:

  The first argument of the vec_replace_unaligned built-in should always
  be of type unsigned char,

> > This patch fixes the builtin definitions and updates the test cases to
> > use the correct arguments.  The original test file is renamed and a
> > second test file is added for a new test case.
> > > > gcc/ChangeLog: > > * config/rs6000/rs6000-builtins.def: Rename > > __builtin_altivec_vreplace_un_uv2di as > > __builtin_altivec_vreplace_un_udi > > __builtin_altivec_vreplace_un_uv4si as > > __builtin_altivec_vreplace_un_usi > > __builtin_altivec_vreplace_un_v2df as > > __builtin_altivec_vreplace_un_df > > __builtin_altivec_vreplace_un_v2di as > > __builtin_altivec_vreplace_un_di > > __builtin_altivec_vreplace_un_v4sf as > > __builtin_altivec_vreplace_un_sf > > __builtin_altivec_vreplace_un_v4si as > > __builtin_altivec_vreplace_un_si. > > Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI > > as > > VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF, > > VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as > > VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI. > > Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as > > vreplace_un_si, vreplace_un_v2df as vreplace_un_df, > > vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as > > vreplace_un_sf, vreplace_un_v4si as vreplace_un_si. > > * config/rs6000/rs6000-c.cc (find_instance): Add new argument > > nargs. Add nargs check. Extend function to handle three > > arguments. > > (altivec_resolve_overloaded_builtin): Add new argument nargs to > > function calls. Add case
[PATCH 0/2] rs6000, fix vec_replace_unaligned built-in arguments
GCC maintainers: In the process of fixing the powerpc/vec-replace-word-runnable.c test I found an existing issue with function find_instance in rs6000-c.cc, per the review comments from Kewen in https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624401.html. The fix for function find_instance was put into a separate patch, followed by a patch for the vec-replace-word-runnable.c test fixes. The two patches have been tested on Power 10 LE with no regression failures.

Carl
[PATCH 1/2] rs6000, add argument to function find_instance
GCC maintainers: The rs6000 function find_instance assumes that it is called for built-ins with only two arguments. There is no checking for the actual number of arguments used in the built-in. This patch adds an additional parameter to the function call containing the number of arguments in the built-in. The function will now do the needed checks for all of the arguments. This fix is needed for the next patch in the series that fixes the vec_replace_unaligned built-in test. Please let me know if this patch is acceptable for mainline. Thanks.

Carl

rs6000, add argument to function find_instance

The function find_instance assumes it is called to check a built-in with only two arguments. This patch extends the function by adding a parameter specifying the number of built-in arguments to check.

gcc/ChangeLog:
	* config/rs6000/rs6000-c.cc (find_instance): Add new parameter
	that specifies the number of built-in arguments to check.
	(altivec_resolve_overloaded_builtin): Update calls to
	find_instance to pass the number of built-in arguments to be
	checked.
---
 gcc/config/rs6000/rs6000-c.cc | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index a353bca19ef..350987b851b 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1679,7 +1679,7 @@ tree
 find_instance (bool *unsupported_builtin, ovlddata **instance,
	       rs6000_gen_builtins instance_code,
	       rs6000_gen_builtins fcode,
-	       tree *types, tree *args)
+	       tree *types, tree *args, int nargs)
 {
   while (*instance && (*instance)->bifid != instance_code)
     *instance = (*instance)->next;
@@ -1691,17 +1691,28 @@ find_instance (bool *unsupported_builtin, ovlddata **instance,
   if (!inst->fntype)
     return error_mark_node;
   tree fntype = rs6000_builtin_info[inst->bifid].fntype;
-  tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
-  tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
+  tree argtype = TYPE_ARG_TYPES (fntype);
+  tree parmtype;
+  int args_compatible = true;

-  if (rs6000_builtin_type_compatible (types[0], parmtype0)
-      && rs6000_builtin_type_compatible (types[1], parmtype1))
+  for (int i = 0; i < nargs; i++)
+    {
+      parmtype = TREE_VALUE (argtype);
+      if (!rs6000_builtin_type_compatible (types[i], parmtype))
+	{
+	  args_compatible = false;
+	  break;
+	}
+      argtype = TREE_CHAIN (argtype);
+    }
+
+  if (args_compatible)
     {
       if (rs6000_builtin_decl (inst->bifid, false) != error_mark_node
	  && rs6000_builtin_is_supported (inst->bifid))
	{
	  tree ret_type = TREE_TYPE (inst->fntype);
-	  return altivec_build_resolved_builtin (args, 2, fntype, ret_type,
+	  return altivec_build_resolved_builtin (args, nargs, fntype, ret_type,
						 inst->bifid, fcode);
	}
       else
@@ -1921,7 +1932,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
	      instance_code = RS6000_BIF_CMPB_32;

	    tree call = find_instance (&unsupported_builtin, &instance,
-				       instance_code, fcode, types, args);
+				       instance_code, fcode, types, args, nargs);
	    if (call != error_mark_node)
	      return call;
	    break;
@@ -1958,7 +1969,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
	      }

	    tree call = find_instance (&unsupported_builtin, &instance,
-				       instance_code, fcode, types, args);
+				       instance_code, fcode, types, args, nargs);
	    if (call != error_mark_node)
	      return call;
	    break;
--
2.37.2
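For reference, the three-argument case that motivates the new nargs parameter is vec_replace_unaligned itself. A minimal call-site sketch of mine (illustrative only; assumes -mcpu=power10, with byte-offset semantics as described in the PVIPR):

  #include <altivec.h>

  /* The instance is selected by the type of the second argument
     (unsigned int here, so a 4-byte replacement); the first argument
     and the result are always vector unsigned char.  */
  vector unsigned char
  replace_word (vector unsigned char v, unsigned int w)
  {
    return vec_replace_unaligned (v, w, 3);  /* byte offset 3.  */
  }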
[PATCH 2/2 ver 4] rs6000, fix vec_replace_unaligned built-in arguments
GCC maintainers:

Version 4, changed the new RS6000_OVLD_VEC_REPLACE_UN case statement in rs6000/rs6000-c.cc. The existing REPLACE_ELT iterator name was changed to REPLACE_ELT_V along with the associated define_mode_attr. Renamed VEC_RU to REPLACE_ELT for the iterator name and VEC_RU_char to REPLACE_ELT_char. Fixed the double test in vec-replace-word-runnable_1.c to be consistent with the other tests. Removed the "dg-do link" from both tests. Put in an explicit cast in test vec-replace-word-runnable_2.c to eliminate the need for the -flax-vector-conversions dg-option.

Version 3, added code to altivec_resolve_overloaded_builtin so the correct instruction is selected for the size of the second argument. This restores the instruction counts to the original values where the correct instructions were originally being generated. The naming of the overloaded builtin instances and builtin definitions was changed to reflect the type of the second argument, since the type of the first argument is now the same for all overloaded instances. A new builtin test file was added for the case where the first argument is cast to the unsigned long long type. This test requires the -flax-vector-conversions gcc command line option. Since the other tests do not require this option, I felt that the new test needed to be in a separate file. Finally, some formatting fixes were made in the original test file. Patch has been retested on Power 10 with no regressions.

Version 2, fixed various typos. Updated the change log body to say the instruction counts were updated. The instruction counts changed as a result of changing the first argument of the vec_replace_unaligned builtin call from vector unsigned long long (vull) to vector unsigned char (vuc). When the first argument was vull the builtin call generated the vinsd instruction for the two test cases. The updated call with vuc as the first argument generates two vinsw instructions instead. Patch was retested on Power 10 with no regressions.

The following patch fixes the first argument in the builtin definition and the corresponding test cases. Initially, the builtin specification was wrong due to a cut and paste error. The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt
   Date: Fri Feb 4 13:07:17 2022 -0600

       rs6000: Correct function prototypes for vec_replace_unaligned

       Due to a pasto error in the documentation, vec_replace_unaligned
       was implemented with the same function prototypes as
       vec_replace_elt. It was intended that vec_replace_unaligned always
       specify output vectors as having type vector unsigned char, to
       emphasize that elements are potentially misaligned by this
       built-in function. This patch corrects the misimplementation.

This patch fixes the arguments in the definitions and updates the testcases accordingly. Additionally, a few minor spacing issues are fixed. The patch has been tested on Power 10 with no regressions. Please let me know if the patch is acceptable for mainline. Thanks.

Carl

rs6000, fix vec_replace_unaligned built-in arguments

The first argument of the vec_replace_unaligned built-in should always be of type vector unsigned char, as specified in gcc/doc/extend.texi. This patch fixes the builtin definitions and updates the test cases to use the correct arguments. The original test file is renamed and a second test file is added for a new test case.
gcc/ChangeLog: * config/rs6000/rs6000-builtins.def: Rename __builtin_altivec_vreplace_un_uv2di as __builtin_altivec_vreplace_un_udi __builtin_altivec_vreplace_un_uv4si as __builtin_altivec_vreplace_un_usi __builtin_altivec_vreplace_un_v2df as __builtin_altivec_vreplace_un_df __builtin_altivec_vreplace_un_v2di as __builtin_altivec_vreplace_un_di __builtin_altivec_vreplace_un_v4sf as __builtin_altivec_vreplace_un_sf __builtin_altivec_vreplace_un_v4si as __builtin_altivec_vreplace_un_si. Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI as VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF, VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI. Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as vreplace_un_si, vreplace_un_v2df as vreplace_un_df, vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as vreplace_un_sf, vreplace_un_v4si as vreplace_un_si. * config/rs6000/rs6000-c.cc (find_instance): Add case RS6000_OVLD_VEC_REPLACE_UN. * config/rs6000/rs6000-overload.def (__builtin_vec_replace_un): Fix first argument type. Rename VREPLACE_UN_UV4SI as VREPLACE_UN_USI, VREPLACE_UN_V4SI as VREPLACE_UN_SI, VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_V2DI as VREPLACE_UN_DI, VR
Re: [PATCH][RFC] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math
On Mon, Jul 17, 2023 at 2:30 AM Richard Biener wrote: > > On Fri, 14 Jul 2023, Andrew Pinski wrote: > > > On Thu, Jul 13, 2023 at 2:54?AM Richard Biener via Gcc-patches > > wrote: > > > > > > The following makes sure that FP x > y ? x : y style max/min operations > > > are if-converted at the GIMPLE level. While we can neither match > > > it to MAX_EXPR nor .FMAX as both have different semantics with IEEE > > > than the ternary ?: operation we can make sure to maintain this form > > > as a COND_EXPR so backends have the chance to match this to instructions > > > their ISA offers. > > > > > > The patch does this in phiopt where we recognize min/max and instead > > > of giving up when we have to honor NaNs we alter the generated code > > > to a COND_EXPR. > > > > > > This resolves PR88540 and we can then SLP vectorize the min operation > > > for its testcase. It also resolves part of the regressions observed > > > with the change matching bit-inserts of bit-field-refs to vec_perm. > > > > > > Expansion from a COND_EXPR rather than from compare-and-branch > > > regresses gcc.target/i386/pr54855-13.c and gcc.target/i386/pr54855-9.c > > > by producing extra moves while the corresponding min/max operations > > > are now already synthesized by RTL expansion, register selection > > > isn't optimal. This can be also provoked without this change by > > > altering the operand order in the source. > > > > > > It regresses gcc.target/i386/pr110170.c where we end up CSEing the > > > condition which makes RTL expansion no longer produce the min/max > > > directly and code generation is obfuscated enough to confuse > > > RTL if-conversion. > > > > > > It also regresses gcc.target/i386/ssefp-[12].c where oddly one > > > variant isn't if-converted and ix86_expand_fp_movcc doesn't > > > match directly (the FP constants get expanded twice). A fix > > > could be in emit_conditional_move where both prepare_cmp_insn > > > and emit_conditional_move_1 force the constants to (different) > > > registers. > > > > > > Otherwise bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > > > PR tree-optimization/88540 > > > * tree-ssa-phiopt.cc (minmax_replacement): Do not give up > > > with NaNs but handle the simple case by if-converting to a > > > COND_EXPR. > > > > One thing which I was thinking about adding to phiopt is having the > > last pass do the conversion to COND_EXPR if the target supports a > > conditional move for that expression. That should fix this one right? > > This was one of things I was working towards with the moving to use > > match-and-simplify too. > > Note the if-conversion has to happen before BB SLP but the last > phiopt is too late for this (yes, BB SLP could also be enhanced > to handle conditionals and do if-conversion on-the-fly). For > BB SLP there's also usually jump threading making a mess of > same condition chain of if-convertible ops ... Oh, I didn't think about that. I was thinking more of PR 110170 and PR 106952 when I saw this patch rather than thinking of SLP vectorizer related stuff. > > As for the min + max case that regresses due > to CSE (gcc.target/i386/pr110170.c) I wonder whether pre-expanding > > _1 = _2 < _3; > _4 = _1 ? _2 : _3; > _5 = _1 ? _3 : _2; > > to something more clever would be appropriate anyway. We could > adjust this to either duplicate _1 or expand the COND_EXPRs back > to a single CFG diamond. 
I suppose force-duplicating non-vector > compares of COND_EXPRs to make TER work again would fix similar > regressions we might already observe (but I'm not aware of many > COND_EXPR generators). Oh yes you had already recorded as PR 105715 too. Thanks, Andrew Pinski > > Richard. > > > Thanks, > > Andrew > > > > > > > > * gcc.target/i386/pr88540.c: New testcase. > > > * gcc.target/i386/pr54855-12.c: Adjust. > > > * gcc.target/i386/pr54855-13.c: Likewise. > > > --- > > > gcc/testsuite/gcc.target/i386/pr54855-12.c | 2 +- > > > gcc/testsuite/gcc.target/i386/pr54855-13.c | 2 +- > > > gcc/testsuite/gcc.target/i386/pr88540.c| 10 ++ > > > gcc/tree-ssa-phiopt.cc | 21 - > > > 4 files changed, 28 insertions(+), 7 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr88540.c > > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c > > > b/gcc/testsuite/gcc.target/i386/pr54855-12.c > > > index 2f8af392c83..09e8ab8ae39 100644 > > > --- a/gcc/testsuite/gcc.target/i386/pr54855-12.c > > > +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c > > > @@ -1,6 +1,6 @@ > > > /* { dg-do compile } */ > > > /* { dg-options "-O2 -mavx512fp16" } */ > > > -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */ > > > +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */ > > > /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */ > > > /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } > > > } }
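For context, the shape of source this patch targets is essentially the following (a sketch along the lines of the pr88540 testcase, not the committed file):

  /* With IEEE semantics, x > y ? x : y is neither MAX_EXPR nor .FMAX
     (NaN and signed-zero behavior differ), but keeping it as a
     COND_EXPR lets targets match e.g. maxsd without -ffast-math.  */
  double
  fmax_like (double x, double y)
  {
    return x > y ? x : y;
  }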
[committed] combine: Change return type of predicate functions from int to bool
Also change some internal variables and function arguments from int to bool. gcc/ChangeLog: * combine.cc (struct reg_stat_type): Change last_set_invalid to bool. (cant_combine_insn_p): Change return type from int to bool and adjust function body accordingly. (can_combine_p): Ditto. (combinable_i3pat): Ditto. Change "i1_not_in_src" and "i0_not_in_src" function arguments from int to bool. (contains_muldiv): Change return type from int to bool and adjust function body accordingly. (try_combine): Ditto. Change "new_direct_jump" pointer function argument from int to bool. Change "substed_i2", "substed_i1", "substed_i0", "added_sets_0", "added_sets_1", "added_sets_2", "i2dest_in_i2src", "i1dest_in_i1src", "i2dest_in_i1src", "i0dest_in_i0src", "i1dest_in_i0src", "i2dest_in_i0src", "i2dest_killed", "i1dest_killed", "i0dest_killed", "i1_feeds_i2_n", "i0_feeds_i2_n", "i0_feeds_i1_n", "i3_subst_into_i2", "have_mult", "swap_i2i3", "split_i2i3" and "changed_i3_dest" variables from int to bool. (subst): Change "in_dest", "in_cond" and "unique_copy" function arguments from int to bool. (combine_simplify_rtx): Change "in_dest" and "in_cond" function arguments from int to bool. (make_extraction): Change "unsignedp", "in_dest" and "in_compare" function argument from int to bool. (force_int_to_mode): Change "just_select" function argument from int to bool. Change "next_select" variable to bool. (rtx_equal_for_field_assignment_p): Change return type from int to bool and adjust function body accordingly. (merge_outer_ops): Ditto. Change "pcomp_p" pointer function argument from int to bool. (get_last_value_validate): Change return type from int to bool and adjust function body accordingly. (reg_dead_at_p): Ditto. (reg_bitfield_target_p): Ditto. (combine_instructions): Ditto. Change "new_direct_jump" variable to bool. (can_combine_p): Change return type from int to bool and adjust function body accordingly. (likely_spilled_retval_p): Ditto. (can_change_dest_mode): Change "added_sets" function argument from int to bool. (find_split_point): Change "unsignedp" variable to bool. (simplify_if_then_else): Change "comparison_p" and "swapped" variables to bool. (simplify_set): Change "other_changed" variable to bool. (expand_compound_operation): Change "unsignedp" variable to bool. (force_to_mode): Change "just_select" function argument from int to bool. Change "next_select" variable to bool. (extended_count): Change "unsignedp" function argument to bool. (simplify_shift_const_1): Change "complement_p" variable to bool. (simplify_comparison): Change "changed" variable to bool. (rest_of_handle_combine): Change return type to void. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/combine.cc b/gcc/combine.cc index 304c020ec79..d9161b257e8 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -156,7 +156,7 @@ struct reg_stat_type { register was assigned last_set_table_tick records the value of label_tick when a value using the register is assigned - last_set_invalid set to nonzero when it is not valid + last_set_invalid set to true when it is not valid to use the value of this register in some register's value @@ -202,11 +202,11 @@ struct reg_stat_type { char last_set_sign_bit_copies; ENUM_BITFIELD(machine_mode) last_set_mode : MACHINE_MODE_BITSIZE; - /* Set nonzero if references to register n in expressions should not be + /* Set to true if references to register n in expressions should not be used. 
last_set_invalid is set nonzero when this register is being assigned to and last_set_table_tick == label_tick. */ - char last_set_invalid; + bool last_set_invalid; /* Some registers that are set more than once and used in more than one basic block are nevertheless always set in similar ways. For example, @@ -416,35 +416,36 @@ static void do_SUBST_INT (int *, int); static void init_reg_last (void); static void setup_incoming_promotions (rtx_insn *); static void set_nonzero_bits_and_sign_copies (rtx, const_rtx, void *); -static int cant_combine_insn_p (rtx_insn *); -static int can_combine_p (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn *, - rtx_insn *, rtx_insn *, rtx *, rtx *); -static int combinable_i3pat (rtx_insn *, rtx *, rtx, rtx, rtx, int, int, rtx *); -static int contains_muldiv (rtx); +static bool cant_combine_insn_p (rtx_insn *); +static bool can_combine_p (rtx_insn *, rtx_insn *, rtx_insn *, rtx_insn *, + rtx_insn *, rtx_insn *, rtx *, rtx *); +static bool combin
Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)
> On Jul 13, 2023, at 4:31 PM, Kees Cook wrote:
>
> In the bug, the problem is that "p" isn't known to be allocated, if I'm
> reading that correctly?

I think that the major point in PR109557 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557) is: for the following pointer p.3_1,

p.3_1 = p;
_2 = __builtin_object_size (p.3_1, 0);

Question: why can the size of p.3_1 not use the TYPE_SIZE of the pointee of p when the TYPE_SIZE can be determined at compile time?

Answer: from just knowing the type of the pointee of p, the compiler cannot determine the size of the object. Therefore the bug has been closed.

In your following test 5:

> I'm not sure this is a reasonable behavior, but
> let me get into the specific test, which looks like this:
>
> TEST(counted_by_seen_by_bdos)
> {
>	struct annotated *p;
>	int index = MAX_INDEX + unconst;
>
>	p = alloc_annotated(index);
>
>	REPORT_SIZE(p->array);
> /* 1 */ EXPECT_EQ(sizeof(*p), offsetof(typeof(*p), array));
>	/* Check array size alone. */
> /* 2 */ EXPECT_EQ(__builtin_object_size(p->array, 1), SIZE_MAX);
> /* 3 */ EXPECT_EQ(__builtin_dynamic_object_size(p->array, 1),
>		    p->foo * sizeof(*p->array));
>	/* Check the entire object size. */
> /* 4 */ EXPECT_EQ(__builtin_object_size(p, 1), SIZE_MAX);
> /* 5 */ EXPECT_EQ(__builtin_dynamic_object_size(p, 1),
>		    sizeof(*p) + p->foo * sizeof(*p->array));
> }
>
> Test 5 should pass as well, since, again, p can be examined. Passing p
> to __bdos implies it is allocated and the __counted_by annotation can be
> examined.

Since the call to the routine "alloc_annotated" cannot be inlined, GCC does not see any allocation calls for the pointer p. At the same time, for the same reason as PR109986, GCC cannot determine the size of the object by just knowing the TYPE_SIZE of the pointee of p.

So this is exactly the same issue as PR109557. It's an existing behavior per the current __builtin_object_size algorithm. I am still not very sure whether the situation in PR109557 can be improved or not, but either way, it's a separate issue. Please see the new testing case I added for PR109557; you will see that the above case 5 is a similar case to the new testing case in PR109557.

> If "return p->array[index];" would be caught by the sanitizer, then
> it follows that __builtin_dynamic_object_size(p, 1) must also know the
> size. i.e. both must examine "p" and its trailing flexible array and
> __counted_by annotation.
>
>> 2. The common issue for the failed tests 3, 4, 9, 10 is:
>>
>> for the following annotated structure:
>>
>> struct annotated {
>>	unsigned long flags;
>>	size_t foo;
>>	int array[] __attribute__((counted_by (foo)));
>> };
>>
>> struct annotated *p;
>> int index = 16;
>>
>> p = malloc(sizeof(*p) + index * sizeof(*p->array)); // allocated real size
>>
>> p->foo = index + 2; // p->foo was set to a different value than the real
>> size of p->array, as in tests 9 and 10

> Right, tests 9 and 10 are checking that the _smallest_ possible value of
> the array is used. (There are two sources of information: the allocation
> size and the size calculated by counted_by. The smaller of the two
> should be used when both are available.)

The counted_by attribute is used to annotate a flexible array member with how many elements it will have. However, if this information cannot accurately reflect the real number of elements allocated for the array, what's the purpose of such information?

>> or
>> p->foo was not set to any value as in tests 3 and 4

> For tests 3 and 4, yes, this was my mistake.
> I have fixed up the tests now. Bill noticed the same issue. Sorry for
> the think-o!

>> i.e., the value of p->foo is NOT synced with the number of elements
>> allocated for the array p->array.
>>
>> I think that this should be considered a user error, and the
>> documentation of the attribute should include this requirement. (In
>> LLVM's RFC, such a requirement was included in the programming model:
>> https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854#maintaining-correctness-of-bounds-annotations-18)
>>
>> We can add a new warning option -Wcounted-by to report such user
>> errors if needed.
>>
>> What's your opinion on this?

> I think it is correct to allow mismatch between allocation and
> counted_by as long as only the least of the two is used.

What do you mean by "only the least of the two is used"?

> This may be
> desirable in a few situations. One example would be a large allocation
> that is slowly filled up by the program.

So, for such a situation, whenever the allocation is filled up, the field that holds the "counted_by" attribute should be increased at the same time; then the "counted_by" value always syncs with the real allocation.

> I.e. the counted_by member is
> slowly increased during runtime (but not beyond the true allocation
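To make the PR109557 point in this thread concrete, a reduced sketch (hypothetical code of mine, not one of the tests under discussion):

  #include <stddef.h>

  struct P { int a[10]; };

  size_t
  f (struct P *p)
  {
    /* Knowing only the type of *p is not enough: p may point into the
       middle of a larger object, e.g. an array of struct P, so the
       maximum object size is unknown and this folds to (size_t)-1.  */
    return __builtin_object_size (p, 0);
  }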
[pushed] c++: only cache constexpr calls that are constant exprs
Tested x86_64-pc-linux-gnu, applying to trunk. -- 8< -- In reviewing Nathaniel's patch for PR70331, it occurred to me that instead of looking for various specific problematic things in the result of a constexpr call to decide whether to cache it, we should use reduced_constant_expression_p. The change to that function is to avoid crashing on uninitialized objects of non-class type. In a trial version of this patch I checked to see what cases this stopped caching; some were instances of partially-initialized return values, which seem fine to not cache. Some were returning pointers to expiring local variables, which we definitely want not to cache. And one was bit-cast3.C, which will be handled in a follow-up patch. gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_call_expression): Only cache reduced_constant_expression_p results. (reduced_constant_expression_p): Handle CONSTRUCTOR of scalar type. (cxx_eval_constant_expression): Fold vectors here. (cxx_eval_bare_aggregate): Not here. --- gcc/cp/constexpr.cc | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index c6f323ebf43..9d85c3be5cc 100644 --- a/gcc/cp/constexpr.cc +++ b/gcc/cp/constexpr.cc @@ -3033,7 +3033,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, } else { - bool cacheable = true; + bool cacheable = !!entry; if (result && result != error_mark_node) /* OK */; else if (!DECL_SAVED_TREE (fun)) @@ -3185,7 +3185,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, for the constexpr evaluation and should not be cached. It is fine if the call allocates something and deallocates it too. */ - if (entry + if (cacheable && (save_heap_alloc_count != ctx->global->heap_vars.length () || (save_heap_dealloc_count != ctx->global->heap_dealloc_count))) @@ -3204,10 +3204,6 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, cacheable = false; break; } - /* Also don't cache a call that returns a deallocated pointer. */ - if (cacheable && (cp_walk_tree_without_duplicates - (&result, find_heap_var_refs, NULL))) - cacheable = false; } /* Rewrite all occurrences of the function's RESULT_DECL with the @@ -3217,6 +3213,10 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t, && !is_empty_class (TREE_TYPE (res))) if (replace_decl (&result, res, ctx->object)) cacheable = false; + + /* Only cache a permitted result of a constant expression. */ + if (cacheable && !reduced_constant_expression_p (result)) + cacheable = false; } else /* Couldn't get a function copy to evaluate. */ @@ -3268,8 +3268,9 @@ reduced_constant_expression_p (tree t) case CONSTRUCTOR: /* And we need to handle PTRMEM_CST wrapped in a CONSTRUCTOR. */ tree field; - if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE) - /* An initialized vector would have a VECTOR_CST. */ + if (!AGGREGATE_TYPE_P (TREE_TYPE (t))) + /* A constant vector would be folded to VECTOR_CST. + A CONSTRUCTOR of scalar type means uninitialized. */ return false; if (CONSTRUCTOR_NO_CLEARING (t)) { base-commit: caabf0973a4e9a26421c94d540e3e20051e93e77 -- 2.39.3
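As an illustration of the "partially-initialized return value" case mentioned above, a C++20 sketch of my own (not one of the cases from the trial run):

  struct S { int a; int b; };

  // The returned CONSTRUCTOR leaves s.b uninitialized, so it fails
  // reduced_constant_expression_p and the call is no longer cached;
  // reading only .a is still a valid constant expression.
  constexpr S partial () { S s; s.a = 1; return s; }
  constexpr int ok = partial ().a;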
[PATCH RFA (fold)] c++: constexpr bit_cast with empty field
Tested x86_64-pc-linux-gnu, OK for trunk? -- 8< -- The change to only cache constexpr calls that are reduced_constant_expression_p tripped on bit-cast3.C, which failed that predicate due to the presence of an empty field in the result of native_interpret_aggregate, which reduced_constant_expression_p rejects to avoid confusing output_constructor. This patch proposes to skip such fields in native_interpret_aggregate, since they aren't actually involved in the value representation. gcc/ChangeLog: * fold-const.cc (native_interpret_aggregate): Skip empty fields. gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_bit_cast): Check that the result of native_interpret_aggregate doesn't need more evaluation. --- gcc/cp/constexpr.cc | 9 + gcc/fold-const.cc | 3 ++- 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc index 9d85c3be5cc..6e8f1c2b61e 100644 --- a/gcc/cp/constexpr.cc +++ b/gcc/cp/constexpr.cc @@ -1440,6 +1440,8 @@ enum value_cat { static tree cxx_eval_constant_expression (const constexpr_ctx *, tree, value_cat, bool *, bool *, tree * = NULL); +static tree cxx_eval_bare_aggregate (const constexpr_ctx *, tree, +value_cat, bool *, bool *); static tree cxx_fold_indirect_ref (const constexpr_ctx *, location_t, tree, tree, bool * = NULL); static tree find_heap_var_refs (tree *, int *, void *); @@ -4803,6 +4805,13 @@ cxx_eval_bit_cast (const constexpr_ctx *ctx, tree t, bool *non_constant_p, { clear_type_padding_in_mask (TREE_TYPE (t), mask); clear_uchar_or_std_byte_in_mask (loc, r, mask); + if (CHECKING_P) + { + tree e = cxx_eval_bare_aggregate (ctx, r, vc_prvalue, + non_constant_p, overflow_p); + gcc_checking_assert (e == r); + r = e; + } } } diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index a02ede79fed..db8f7de5680 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -8935,7 +8935,8 @@ native_interpret_aggregate (tree type, const unsigned char *ptr, int off, return NULL_TREE; for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) { - if (TREE_CODE (field) != FIELD_DECL || DECL_PADDING_P (field)) + if (TREE_CODE (field) != FIELD_DECL || DECL_PADDING_P (field) + || integer_zerop (DECL_SIZE (field))) continue; tree fld = field; HOST_WIDE_INT bitoff = 0, pos = 0, sz = 0; base-commit: caabf0973a4e9a26421c94d540e3e20051e93e77 -- 2.39.3
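A sketch of the shape involved (my guess at the bit-cast3.C pattern, not the actual test):

  #include <bit>

  struct E {};
  struct S { [[no_unique_address]] E e; int i; };

  // The empty field contributes no bytes to the value representation,
  // so native_interpret_aggregate can simply skip it when building the
  // constant result of the bit_cast.
  constexpr S s = std::bit_cast<S> (42);
  static_assert (s.i == 42);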
[pushed] extend.texi: index __auto_type
gcc/ChangeLog: * doc/extend.texi: Add @cindex on __auto_type. --- Pushed as obvious. gcc/doc/extend.texi | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 093bd97ba4d..ec9ffa3c86e 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -843,6 +843,7 @@ Thus, @code{array (pointer (char), 4)} is the type of arrays of 4 pointers to @code{char}. @end itemize +@cindex @code{__auto_type} in GNU C In GNU C, but not GNU C++, you may also declare the type of a variable as @code{__auto_type}. In that case, the declaration must declare only one variable, whose declarator must just be an identifier, the -- 2.41.0
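For context, the construct being indexed; the classic use is a max macro like the one that section of extend.texi discusses (a GNU C sketch):

  /* Compared with a typeof-based version, the operands are not
     duplicated in the macro expansion, and each is evaluated once.  */
  #define max(a, b)              \
    ({ __auto_type _a = (a);     \
       __auto_type _b = (b);     \
       _a > _b ? _a : _b; })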
[RFC v2] RISC-V: Add Ztso atomic mappings
The RISC-V Ztso extension currently has no effect on generated code. With the additional ordering constraints guaranteed by Ztso, we can emit more optimized atomic mappings than the RVWMO mappings. This patch defines a strengthened version of Andrea Parri's proposed Ztso mappings ("Proposed Mapping") [1]. The changes were discussed by Andrea Parri and Hans Boehm on the GCC mailing list and are required in order to be compatible with the RVWMO ABI [2]. This change corresponds to the Ztso psABI proposal [3].

[1] https://github.com/preames/public-notes/blob/master/riscv-tso-mappings.rst
[2] https://inbox.sourceware.org/gcc-patches/ZFV8pNAstwrF2qBb@andrea/T/#t
[3] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/391

gcc/ChangeLog:

2023-07-17  Patrick O'Neill

	* common/config/riscv/riscv-common.cc: Add Ztso and mark Ztso as
	dependent on 'a' extension.
	* config/riscv/riscv-opts.h (MASK_ZTSO): New mask.
	(TARGET_ZTSO): New target.
	* config/riscv/riscv.cc (riscv_memmodel_needs_amo_acquire): Add
	Ztso case.
	(riscv_memmodel_needs_amo_release): Add Ztso case.
	(riscv_print_operand): Add Ztso case for LR/SC annotations.
	* config/riscv/riscv.md: Import sync-rvwmo.md and sync-ztso.md.
	* config/riscv/riscv.opt: Add Ztso target variable.
	* config/riscv/sync.md (mem_thread_fence_1): Expand to RVWMO or
	Ztso specific insn.
	(atomic_load): Expand to RVWMO or Ztso specific insn.
	(atomic_store): Expand to RVWMO or Ztso specific insn.
	* config/riscv/sync-rvwmo.md: New file. Separate out RVWMO
	specific load/store/fence mappings.
	* config/riscv/sync-ztso.md: New file. Separate out Ztso
	specific load/store/fence mappings.

gcc/testsuite/ChangeLog:

2023-07-17  Patrick O'Neill

	* gcc.target/riscv/amo-table-ztso-amo-add-1.c: New test.
	* gcc.target/riscv/amo-table-ztso-amo-add-2.c: New test.
	* gcc.target/riscv/amo-table-ztso-amo-add-3.c: New test.
	* gcc.target/riscv/amo-table-ztso-amo-add-4.c: New test.
	* gcc.target/riscv/amo-table-ztso-amo-add-5.c: New test.
	* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: New test.
	* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: New test.
	* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: New test.
	* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: New test.
	* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: New test.
	* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: New test.
	* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: New test.
	* gcc.target/riscv/amo-table-ztso-fence-1.c: New test.
	* gcc.target/riscv/amo-table-ztso-fence-2.c: New test.
	* gcc.target/riscv/amo-table-ztso-fence-3.c: New test.
	* gcc.target/riscv/amo-table-ztso-fence-4.c: New test.
	* gcc.target/riscv/amo-table-ztso-fence-5.c: New test.
	* gcc.target/riscv/amo-table-ztso-load-1.c: New test.
	* gcc.target/riscv/amo-table-ztso-load-2.c: New test.
	* gcc.target/riscv/amo-table-ztso-load-3.c: New test.
	* gcc.target/riscv/amo-table-ztso-store-1.c: New test.
	* gcc.target/riscv/amo-table-ztso-store-2.c: New test.
	* gcc.target/riscv/amo-table-ztso-store-3.c: New test.
	* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: New test.
	* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: New test.
	* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: New test.
	* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: New test.
	* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: New test.
Signed-off-by: Patrick O'Neill --- gcc/common/config/riscv/riscv-common.cc | 4 + gcc/config/riscv/riscv-opts.h | 4 + gcc/config/riscv/riscv.cc | 20 +++- gcc/config/riscv/riscv.md | 2 + gcc/config/riscv/riscv.opt| 3 + gcc/config/riscv/sync-rvwmo.md| 96 gcc/config/riscv/sync-ztso.md | 80 + gcc/config/riscv/sync.md | 107 ++ .../riscv/amo-table-ztso-amo-add-1.c | 15 +++ .../riscv/amo-table-ztso-amo-add-2.c | 15 +++ .../riscv/amo-table-ztso-amo-add-3.c | 15 +++ .../riscv/amo-table-ztso-amo-add-4.c | 15 +++ .../riscv/amo-table-ztso-amo-add-5.c | 15 +++ .../riscv/amo-table-ztso-compare-exchange-1.c | 10 ++ .../riscv/amo-table-ztso-compare-exchange-2.c | 10 ++ .../riscv/amo-table-ztso-compare-exchange-3.c | 10 ++ .../riscv/amo-table-ztso-compare-exchange-4.c | 10 ++ .../riscv/amo-table-ztso-compare-exchange-5.c | 10 ++ .../riscv/amo-table-ztso-compare-exchange-6.c | 10 ++ .../riscv/amo-table-ztso-compare-exchange-7.c | 10 ++ .../gcc.target/riscv/amo-table-ztso-fence-1.c | 14 +++ .../g
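To illustrate the intended codegen difference, a small example (the assembly in the comment is my paraphrase of the proposed mapping table, not output captured from the patch):

  #include <stdatomic.h>

  int
  load_acquire (atomic_int *p)
  {
    /* RVWMO mapping: lw followed by fence r,rw.
       Ztso mapping: a plain lw already provides acquire ordering.  */
    return atomic_load_explicit (p, memory_order_acquire);
  }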
Re: [PATCH] c++: redundant targ coercion for var/alias tmpls
On Fri, 14 Jul 2023, Jason Merrill wrote: > On 7/14/23 14:07, Patrick Palka wrote: > > On Thu, 13 Jul 2023, Jason Merrill wrote: > > > > > On 7/13/23 11:48, Patrick Palka wrote: > > > > On Wed, 28 Jun 2023, Patrick Palka wrote: > > > > > > > > > On Wed, Jun 28, 2023 at 11:50 AM Jason Merrill > > > > > wrote: > > > > > > > > > > > > On 6/23/23 12:23, Patrick Palka wrote: > > > > > > > On Fri, 23 Jun 2023, Jason Merrill wrote: > > > > > > > > > > > > > > > On 6/21/23 13:19, Patrick Palka wrote: > > > > > > > > > When stepping through the variable/alias template > > > > > > > > > specialization > > > > > > > > > code > > > > > > > > > paths, I noticed we perform template argument coercion twice: > > > > > > > > > first from > > > > > > > > > instantiate_alias_template / finish_template_variable and > > > > > > > > > again > > > > > > > > > from > > > > > > > > > tsubst_decl (during instantiate_template). It should suffice > > > > > > > > > to > > > > > > > > > perform > > > > > > > > > coercion once. > > > > > > > > > > > > > > > > > > To that end patch elides this second coercion from tsubst_decl > > > > > > > > > when > > > > > > > > > possible. We can't get rid of it completely because we don't > > > > > > > > > always > > > > > > > > > specialize a variable template from finish_template_variable: > > > > > > > > > we > > > > > > > > > could > > > > > > > > > also be doing so directly from instantiate_template during > > > > > > > > > variable > > > > > > > > > template partial specialization selection, in which case the > > > > > > > > > coercion > > > > > > > > > from tsubst_decl would be the first and only coercion. > > > > > > > > > > > > > > > > Perhaps we should be coercing in lookup_template_variable rather > > > > > > > > than > > > > > > > > finish_template_variable? > > > > > > > > > > > > > > Ah yes, there's a patch for that at > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617377.html :) > > > > > > > > > > > > So after that patch, can we get rid of the second coercion > > > > > > completely? > > > > > > > > > > On second thought it should be possible to get rid of it, if we > > > > > rearrange things to always pass the primary arguments to tsubst_decl, > > > > > and perform partial specialization selection from there instead of > > > > > instantiate_template. Let me try... > > > > > > > > Like so? Bootstrapped and regtested on x86_64-pc-linux-gnu. > > > > > > > > -- >8 -- > > > > > > > > When stepping through the variable/alias template specialization code > > > > paths, I noticed we perform template argument coercion twice: first from > > > > instantiate_alias_template / finish_template_variable and again from > > > > tsubst_decl (during instantiate_template). It'd be good to avoid this > > > > redundant coercion. > > > > > > > > It turns out that this coercion could be safely elided whenever > > > > specializing a primary variable/alias template, because we can rely on > > > > lookup_template_variable and instantiate_alias_template to already have > > > > coerced the arguments. > > > > > > > > The other situation to consider is when fully specializing a partial > > > > variable template specialization (from instantiate_template), in which > > > > case the passed 'args' are the (already coerced) arguments relative to > > > > the partial template and 'argvec', the result of substitution into > > > > DECL_TI_ARGS, are the (uncoerced) arguments relative to the primary > > > > template, so coercion is still necessary. 
> > > > We can still avoid this coercion however if we always pass the
> > > > primary variable template to tsubst_decl from instantiate_template,
> > > > and instead perform partial specialization selection directly from
> > > > tsubst_decl. This patch implements this approach.
> > >
> > > The relationship between instantiate_template and tsubst_decl is pretty
> > > tangled. We use the former to substitute (often deduced) template
> > > arguments into a template, and the latter to substitute template
> > > arguments into a use of a template...and also to implement the former.
> > >
> > > For substitution of uses of a template, we expect to need to coerce the
> > > arguments after substitution. But we avoid this issue for variable
> > > templates by keeping them as TEMPLATE_ID_EXPR until substitution time,
> > > so if we see a VAR_DECL in tsubst_decl it's either a non-template
> > > variable or under instantiate_template.
> >
> > FWIW it seems we could also be in tsubst_decl for a VAR_DECL if
> >
> >   * we're partially instantiating a class-scope variable template
> >     during instantiation of the class
>
> Hmm, why don't partial instantiations stay as TEMPLATE_ID_EXPR?

Whoops, I accidentally omitted a crucial word. The situation is when partially instantiating a class-scope variable template _declaration_, e.g. for

  template<class T> struct A { template<class U> static int v; };
  template struct A<int>;

we call ts
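For reference, a toy example of mine showing the other path discussed in this thread, where specializing a variable template has to go through partial specialization selection:

  template<class T> constexpr int v = 0;
  template<class T> constexpr int v<T*> = 1;  // partial specialization

  // Picking v<int*> uses arguments relative to the partial template;
  // substituting DECL_TI_ARGS yields arguments relative to the primary.
  static_assert (v<int> == 0 && v<int*> == 1);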
Re: [PATCH v2 0/2] ifcvt: Allow if conversion of arithmetic in basic blocks with multiple sets
Manolis Tsamis writes:
> noce_convert_multiple_sets has been introduced and extended over time to
> handle if conversion for blocks with multiple sets. Currently this is
> focused on register moves and rejects any sort of arithmetic operations.
>
> This series is an extension to allow more sequences to take part in if
> conversion. The first patch is a required change to emit correct code and
> the second patch whitelists a larger number of operations through
> bb_ok_for_noce_convert_multiple_sets.
>
> For targets that have a rich selection of conditional instructions,
> like aarch64, I have seen an ~5x increase of profitable if conversions for
> multiple set blocks in SPEC benchmarks. Also tested with a wide variety of
> benchmarks and I have not seen performance regressions on either x64 /
> aarch64.

Interesting results. Are you free to say which target you used for aarch64?

If I've understood the cost heuristics correctly, we'll allow a "predictable" branch to be replaced by up to 5 simple conditional instructions and an "unpredictable" branch to be replaced by up to 10 simple conditional instructions. That seems pretty high. And I'm not sure how well we guess predictability in the absence of real profile information.

So my gut instinct was that the limitations of the current code might be saving us from overly generous limits. It sounds from your results like that might not be the case though. Still, if it does turn out to be the case in future, I agree we should fix the costs rather than hamstring the code.

> Some samples that previously resulted in a branch but now better use these
> instructions can be seen in the provided test case.
>
> Tested on aarch64 and x64; On x64 some tests that use __builtin_rint are
> failing with an ICE but I believe that it's not an issue of this change.
> force_operand crashes when (and:DF (not:DF (reg:DF 88)) (reg/v:DF 83 [ x ]))
> is provided through emit_conditional_move.

I guess that needs to be fixed first though. (Thanks for checking both targets.)

My main comments on the series are:

(1) It isn't obvious which operations should be included in the list in patch 2 and which shouldn't. Also, the patch only checks the outermost operation, and so it allows the inner rtxes to be arbitrarily complex. Because of that, it might be better to remove the condition altogether and just rely on the other routines to do costing and correctness checks.

(2) Don't you also need to update the "rewiring" mechanism, to cope with cases where the then block has something like:

   if (a == 0)
     {
       a = b op c;   ->   a' = a == 0 ? b op c : a;
       d = a op b;   ->   d  = a == 0 ? a' op b : d;
     }
                          a = a'

At the moment the code only handles regs and subregs, whereas IIUC it should now iterate over all the regs in the SET_SRC. And I suppose that creates the need for multiple possible rewirings in the same insn, so that it isn't a simple insn -> index mapping any more.

Thanks,
Richard

>
> Changes in v2:
> - Change "conditional moves" to "conditional instructions"
>   in bb_ok_for_noce_convert_multiple_sets's comment.
>
> Manolis Tsamis (2):
>   ifcvt: handle sequences that clobber flags in
>     noce_convert_multiple_sets
>   ifcvt: Allow more operations in multiple set if conversion
>
>  gcc/ifcvt.cc                                  | 109 ++++++++++++------
>  .../aarch64/ifcvt_multiple_sets_arithm.c      |  67 +++++++++++
>  2 files changed, 127 insertions(+), 49 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
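As a concrete example of the blocks now converted, a sketch of mine in the spirit of the new ifcvt_multiple_sets_arithm.c test (not copied from the patch):

  /* Both sets in the then-block can become conditional instructions
     (e.g. csel/csinc style on aarch64) instead of a branch.  */
  int
  f (int a, int b, int c, int d)
  {
    if (a > b)
      {
        c = a - b;
        d = d + 1;
      }
    return c * d;
  }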
Re: [x86-64] RFC: Add nosse abi attribute
Michael Matz via Gcc-patches writes: > Hello, > > the ELF psABI for x86-64 doesn't have any callee-saved SSE > registers (there were actual reasons for that, but those don't > matter anymore). This starts to hurt some uses, as it means that > as soon as you have a call (say to memmove/memcpy, even if > implicit as libcall) in a loop that manipulates floating point > or vector data you get saves/restores around those calls. > > But in reality many functions can be written such that they only need > to clobber a subset of the 16 XMM registers (or do the save/restore > themself in the codepaths that needs them, hello memcpy again). > So we want to introduce a way to specify this, via an ABI attribute > that basically says "doesn't clobber the high XMM regs". > > I've opted to do only the obvious: do something special only for > xmm8 to xmm15, without a way to specify the clobber set in more detail. > I think such half/half split is reasonable, and as I don't want to > change the argument passing anyway (whose regs are always clobbered) > there isn't that much wiggle room anyway. > > I chose to make it possible to write function definitions with that > attribute with GCC adding the necessary callee save/restore code in > the xlogue itself. Carefully note that this is only possible for > the SSE2 registers, as other parts of them would need instructions > that are only optional. When a function doesn't contain calls to > unknown functions we can be a bit more lenient: we can make it so that > GCC simply doesn't touch xmm8-15 at all, then no save/restore is > necessary. If a function contains calls then GCC can't know which > parts of the XMM regset is clobbered by that, it may be parts > which don't even exist yet (say until avx2048 comes out), so we must > restrict ourself to only save/restore the SSE2 parts and then of course > can only claim to not clobber those parts. > > To that end I introduce actually two related attributes (for naming > see below): > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered > * noanysseclobber: claims (and ensures) that nothing of any of the > registers overlapping xmm8-15 is clobbered (not even future, as of > yet unknown, parts) > > Ensuring the first is simple: potentially add saves/restore in xlogue > (e.g. when xmm8 is either used explicitely or implicitely by a call). > Ensuring the second comes with more: we must also ensure that no > functions are called that don't guarantee the same thing (in addition > to just removing all xmm8-15 parts alltogether from the available > regsters). > > See also the added testcases for what I intended to support. > > I chose to use the new target independend function-abi facility for > this. I need some adjustments in generic code: > * the "default_abi" is actually more like a "current" abi: it happily > changes its contents according to conditional_register_usage, > and other code assumes that such changes do propagate. > But if that conditonal_reg_usage is actually done because the current > function is of a different ABI, then we must not change default_abi. > * in insn_callee_abi we do look at a potential fndecl for a call > insn (only set when -fipa-ra), but doesn't work for calls through > pointers and (as said) is optional. So, also always look at the > called functions type (it's always recorded in the MEM_EXPR for > non-libcalls), before asking the target. > (The function-abi accessors working on trees were already doing that, > its just the RTL accessor that missed this) > [...] 
> diff --git a/gcc/function-abi.cc b/gcc/function-abi.cc > index 2ab9b2c5649..efbe114218c 100644 > --- a/gcc/function-abi.cc > +++ b/gcc/function-abi.cc > @@ -42,6 +42,26 @@ void > predefined_function_abi::initialize (unsigned int id, >const_hard_reg_set full_reg_clobbers) > { > + /* Don't reinitialize an ABI struct. We might be called from reinit_regs > + from the targets conditional_register_usage hook which might depend > + on cfun and might have changed the global register sets according > + to that functions ABI already. That's not the default ABI anymore. > + > + XXX only avoid this if we're reinitializing the default ABI, and the > + current function is _not_ of the default ABI. That's for > + backward compatibility where some backends modify the regsets with > + the exception that those changes are then reflected also in the default > + ABI (which rather is then the "current" ABI). E.g. x86_64 with the > + ms_abi vs sysv attribute. They aren't reflected by separate ABI > + structs, but handled different. The "default" ABI hence changes > + back and forth (and is expected to!) between a ms_abi and a sysv > + function. */ The default ABI is also the eh_edge_abi, and so describes the set of registers that are preserved or clobbered across EH edges. If changing between ms_abi and sysv changes the "default" A
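For reference, usage as sketched in the RFC (both attribute names are the RFC's proposals, not a committed API):

  /* The callee promises that xmm8-xmm15 survive the call (GCC adds
     xlogue saves/restores where needed), so FP/vector-heavy callers
     need not spill those registers around it.  */
  __attribute__((nosseclobber))
  void *my_memcpy (void *dst, const void *src, unsigned long n);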
Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)
On Mon, Jul 17, 2023 at 09:17:48PM +, Qing Zhao wrote: > > > On Jul 13, 2023, at 4:31 PM, Kees Cook wrote: > > > > In the bug, the problem is that "p" isn't known to be allocated, if I'm > > reading that correctly? > > I think that the major point in PR109557 > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109557): > > for the following pointer p.3_1, > > p.3_1 = p; > _2 = __builtin_object_size (p.3_1, 0); > > Question: why the size of p.3_1 cannot use the TYPE_SIZE of the pointee of p > when the TYPE_SIZE can be determined at compile time? > > Answer: From just knowing the type of the pointee of p, the compiler cannot > determine the size of the object. Why is that? "p" points to "struct P", which has a fixed size. There must be an assumption somewhere that a pointer is allocated, otherwise __bos would almost never work? > Therefore the bug has been closed. > > In your following testing 5: > > > I'm not sure this is a reasonable behavior, but > > let me get into the specific test, which looks like this: > > > > TEST(counted_by_seen_by_bdos) > > { > >struct annotated *p; > >int index = MAX_INDEX + unconst; > > > >p = alloc_annotated(index); > > > >REPORT_SIZE(p->array); > > /* 1 */ EXPECT_EQ(sizeof(*p), offsetof(typeof(*p), array)); > >/* Check array size alone. */ > > /* 2 */ EXPECT_EQ(__builtin_object_size(p->array, 1), SIZE_MAX); > > /* 3 */ EXPECT_EQ(__builtin_dynamic_object_size(p->array, 1), p->foo * > > sizeof(*p->array)); > >/* Check check entire object size. */ > > /* 4 */ EXPECT_EQ(__builtin_object_size(p, 1), SIZE_MAX); > > /* 5 */ EXPECT_EQ(__builtin_dynamic_object_size(p, 1), sizeof(*p) + p->foo > > * sizeof(*p->array)); > > } > > > > Test 5 should pass as well, since, again, p can be examined. Passing p > > to __bdos implies it is allocated and the __counted_by annotation can be > > examined. > > Since the call to the routine “alloc_annotated" cannot be inlined, GCC does > not see any allocation calls for the pointer p. Correct. > At the same time, due to the same reason as PR109986, GCC cannot determine > the size of the object by just knowing the TYPE_SIZE > of the pointee of p. So the difference between test 3 and test 5 is that "p" is explicitly dereferenced to find "array", and therefore an assumption is established that "p" is allocated? > So, this is exactly the same issue as PR109557. It’s an existing behavior > per the current __buitlin_object_size algorithm. > I am still not very sure whether the situation in PR109557 can be improved or > not, but either way, it’s a separate issue. Okay, so the issue is one of object allocation visibility (or assumptions there in)? > Please see the new testing case I added for PR109557, you will see that the > above case 5 is a similar case as the new testing case in PR109557. I will ponder this a bit more to see if I can come up with a useful test case to replace the part from "test 5" above. > > > > If "return p->array[index];" would be caught by the sanitizer, then > > it follows that __builtin_dynamic_object_size(p, 1) must also know the > > size. i.e. both must examine "p" and its trailing flexible array and > > __counted_by annotation. > > > >> > >> 2. 
The common issue for the failed testing 3, 4, 9, 10 is: > >> > >> for the following annotated structure: > >> > >> > >> struct annotated { > >>unsigned long flags; > >>size_t foo; > >>int array[] __attribute__((counted_by (foo))); > >> }; > >> > >> > >> struct annotated *p; > >> int index = 16; > >> > >> p = malloc(sizeof(*p) + index * sizeof(*p->array)); // allocated real > >> size > >> > >> p->foo = index + 2; // p->foo was set by a different value than the real > >> size of p->array as in test 9 and 10 > > > > Right, tests 9 and 10 are checking that the _smallest_ possible value of > > the array is used. (There are two sources of information: the allocation > > size and the size calculated by counted_by. The smaller of the two > > should be used when both are available.) > > The counted_by attribute is used to annotate a Flexible array member on how > many elements it will have. > However, if this information can not accurately reflect the real number of > elements for the array allocated, > What’s the purpose of such information? For example, imagine code that allocates space for 100 elements since the common case is that the number of elements will grow over time. Elements are added as it goes. For example: struct grows { int alloc_count; int valid_count; struct element item[] __counted_by(valid_count); } *p; void something(void) { p = malloc(sizeof(*p) + sizeof(*p->item) * 100); p->alloc_count = 100; p->valid_count = 0; /* this loop doesn't check that we don't go over 100. */ while (items_to_copy) { struct element *item_ptr = get_next_item(); /* __