[gcc r16-1406] diagnostics: make experimental-html sink prettier [PR116792]
https://gcc.gnu.org/g:cb1d203445c923aa64bca01b0ffb6d3d16a82130 commit r16-1406-gcb1d203445c923aa64bca01b0ffb6d3d16a82130 Author: David Malcolm Date: Tue Jun 10 20:06:38 2025 -0400 diagnostics: make experimental-html sink prettier [PR116792] This patch to the "experimental-html" diagnostic sink: * adds use of the PatternFly 3 CSS library (via an optional link in the generated html to a copy in a CDN) * uses PatternFly's "alert" pattern to show severities for diagnostics, properly nesting "note" diagnostics for diagnostic groups. Example: before: https://dmalcolm.fedorapeople.org/gcc/2025-06-10/before/diagnostic-ranges.c.html after: https://dmalcolm.fedorapeople.org/gcc/2025-06-10/after/diagnostic-ranges.c.html * adds initial support for logical locations and physical locations * adds initial support for multi-level nested diagnostics such as those for C++ concepts diagnostics. Ideally this would show a clickable disclosure widget to expand/collapse a level, but for now it uses nested elements with for the child diagnostics. Example: before: https://dmalcolm.fedorapeople.org/gcc/2025-06-10/before/nested-diagnostics-1.C.html after: https://dmalcolm.fedorapeople.org/gcc/2025-06-10/after/nested-diagnostics-1.C.html gcc/ChangeLog: PR other/116792 * diagnostic-format-html.cc: Include "diagnostic-path.h" and "diagnostic-client-data-hooks.h". (html_builder::m_logical_loc_mgr): New field. (html_builder::m_cur_nesting_levels): New field. (html_builder::m_last_logical_location): New field. (html_builder::m_last_location): New field. (html_builder::m_last_expanded_location): New field. (HTML_STYLE): Add "white-space: pre;" to .source and .annotation. Add "gcc-quoted-text" CSS class. (html_builder::html_builder): Initialize the new fields. If CSS is enabled, add CDN links to PatternFly 3 stylesheets. (html_builder::add_stylesheet): New. 
(html_builder::on_report_diagnostic): Add "alert" param to make_element_for_diagnostic, setting it by default, but unsetting it for nested diagnostics below the top level. Use add_at_nesting_level for nested diagnostics. (add_nesting_level_attr): New. (html_builder::add_at_nesting_level): New. (get_pf_class_for_alert_div): New. (get_pf_class_for_alert_icon): New. (get_label_for_logical_location_kind): New. (add_labelled_value): New. (html_builder::make_element_for_diagnostic): Add leading comment. Add "alert" param. Drop class="gcc-diagnostic" from tag, instead adding the class for a PatternFly 3 alert if "alert" is true, and adding a with an alert icon, both according to the diagnostic severity. Add a severity prefix to the message for alerts. Add any metadata/option text as suffixes to the message. Show any logical location. Show any physical location. Don't show the locus if the last location is unchanged within the diagnostic_group. Wrap any execution path element in a and add a label to it. Wrap any generated patch in a and add a label to it. (selftest::test_simple_log): Update expected HTML. gcc/testsuite/ChangeLog: PR other/116792 * gcc.dg/html-output/missing-semicolon.py: Update for changes to diagnostic elements. * gcc.dg/format/diagnostic-ranges-html.py: Likewise. * gcc.dg/plugin/diagnostic-test-metadata-html.py: Likewise. Drop out-of-date comment. * gcc.dg/plugin/diagnostic-test-paths-2.py: Likewise. * gcc.dg/plugin/diagnostic-test-paths-4.py: Likewise. Drop out-of-date comment. * gcc.dg/plugin/diagnostic-test-show-locus.py: Likewise. * lib/htmltest.py (get_diag_by_index): Update to use search by id. (get_message_within_diag): Update to use search by class. libcpp/ChangeLog: PR other/116792 * include/line-map.h (typedef expanded_location): Convert to... (struct expanded_location): ...this. (operator==): New decl, for expanded_location. (operator!=): Likewise. * line-map.cc (operator==): New decl, for expanded_location. 
Signed-off-by: David Malcolm Diff: --- gcc/diagnostic-format-html.cc | 433 +++-- .../gcc.dg/format/diagnostic-ranges-html.py| 15 +- .../gcc.dg/html-output/missing-semicolon.py| 68 +++- .../gcc.dg/plugin/diagnostic-test-metadata-html.py | 38 +- .../gcc.dg/plugin/diagnostic-test-paths-2.py |
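The ChangeLog above adds operator== and operator!= for expanded_location and says the sink does not show the locus again when the last location is unchanged within a diagnostic group. A minimal sketch of that idea follows, assuming a hypothetical toy_expanded_location with only file, line and column fields, and a hypothetical toy_print_group helper; none of these names come from the patch.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for libcpp's expanded_location.  Only three
// fields are modelled here; that is an assumption made for brevity.
struct toy_expanded_location
{
  const char *file;
  int line;
  int column;
};

inline bool
operator== (const toy_expanded_location &a, const toy_expanded_location &b)
{
  // Compare file names by content; two null names count as equal.
  const bool same_file
    = (a.file == b.file
       || (a.file && b.file && std::strcmp (a.file, b.file) == 0));
  return same_file && a.line == b.line && a.column == b.column;
}

inline bool
operator!= (const toy_expanded_location &a, const toy_expanded_location &b)
{
  return !(a == b);
}

// Return the locations that would be printed for one group: a location
// is skipped when it equals the previously printed one.
inline std::vector<std::string>
toy_print_group (const std::vector<toy_expanded_location> &locs)
{
  std::vector<std::string> printed;
  const toy_expanded_location *last = nullptr;
  for (const toy_expanded_location &loc : locs)
    {
      if (last && *last == loc)
        continue;
      printed.push_back (std::string (loc.file ? loc.file : "<unknown>")
                         + ":" + std::to_string (loc.line)
                         + ":" + std::to_string (loc.column));
      last = &loc;
    }
  return printed;
}

// Example: a warning and a note at the same place, then a note elsewhere.
inline std::vector<std::string>
example_group ()
{
  const std::vector<toy_expanded_location> locs
    = {{"t.c", 1, 5}, {"t.c", 1, 5}, {"t.c", 2, 1}};
  std::vector<std::string> printed = toy_print_group (locs);
  assert (printed.size () <= locs.size ());
  return printed;
}
```

In this toy model the repeated t.c:1:5 location is printed once, which is the effect the commit message describes for notes that share the location of their parent diagnostic.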
[gcc r16-1405] diagnostics: xml: add add_text_from_pp
https://gcc.gnu.org/g:896edb1d0ae90ff1f60a6b894f04eb3c436790f5 commit r16-1405-g896edb1d0ae90ff1f60a6b894f04eb3c436790f5 Author: David Malcolm Date: Tue Jun 10 20:06:38 2025 -0400 diagnostics: xml: add add_text_from_pp Various places use xp.add_text (pp_formatted_text (&pp)) Add a helper function for this. No functional change intended. gcc/ChangeLog: * diagnostic-path-output.cc: Use xml::printer::add_text_from_pp. * diagnostic-show-locus.cc: Likewise. * xml-printer.h (xml::printer::add_text_from_pp): New decl. * xml.cc (xml::node_with_children::add_text_from_pp): New. (xml::printer::add_text_from_pp): New. * xml.h (xml::node_with_children::add_text_from_pp): New decl. Signed-off-by: David Malcolm Diff: --- gcc/diagnostic-path-output.cc | 6 +++--- gcc/diagnostic-show-locus.cc | 2 +- gcc/xml-printer.h | 1 + gcc/xml.cc| 12 gcc/xml.h | 1 + 5 files changed, 18 insertions(+), 4 deletions(-) diff --git a/gcc/diagnostic-path-output.cc b/gcc/diagnostic-path-output.cc index 199028ea7f3a..bae24bf01a70 100644 --- a/gcc/diagnostic-path-output.cc +++ b/gcc/diagnostic-path-output.cc @@ -689,7 +689,7 @@ struct event_range iter_event.print_desc (pp); if (event_label_writer) event_label_writer->begin_label (); - xp.add_text (pp_formatted_text (&pp)); + xp.add_text_from_pp (pp); if (event_label_writer) event_label_writer->end_label (); } @@ -1243,7 +1243,7 @@ print_path_summary_as_html (const path_summary &ps, else pp_printf (&pp, "events %i-%i", range->m_start_idx + 1, range->m_end_idx + 1); - xp.add_text (pp_formatted_text (&pp)); + xp.add_text_from_pp (pp); xp.pop_tag ("span"); } if (show_depths) @@ -1252,7 +1252,7 @@ print_path_summary_as_html (const path_summary &ps, xp.push_tag_with_class ("span", "depth", true); pretty_printer pp; pp_printf (&pp, "(depth %i)", range->m_stack_depth); - xp.add_text (pp_formatted_text (&pp)); + xp.add_text_from_pp (pp); xp.pop_tag ("span"); } xp.pop_tag ("div"); diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc index 
575c7ec8d709..ffb72da138d9 100644 --- a/gcc/diagnostic-show-locus.cc +++ b/gcc/diagnostic-show-locus.cc @@ -614,7 +614,7 @@ struct to_html { pp_clear_output_area (&m_scratch_pp); pp_unicode_character (&m_scratch_pp, ch); -m_xp.add_text (pp_formatted_text (&m_scratch_pp)); +m_xp.add_text_from_pp (m_scratch_pp); } void add_utf8_byte (char b) diff --git a/gcc/xml-printer.h b/gcc/xml-printer.h index 24ac2f42e735..428da0a4245d 100644 --- a/gcc/xml-printer.h +++ b/gcc/xml-printer.h @@ -44,6 +44,7 @@ public: void set_attr (const char *name, std::string value); void add_text (std::string text); + void add_text_from_pp (pretty_printer &pp); void add_raw (std::string text); diff --git a/gcc/xml.cc b/gcc/xml.cc index 0a925619f5d3..9077c1ab1300 100644 --- a/gcc/xml.cc +++ b/gcc/xml.cc @@ -121,6 +121,11 @@ node_with_children::add_text (std::string str) add_child (std::make_unique (std::move (str))); } +void +node_with_children::add_text_from_pp (pretty_printer &pp) +{ + add_text (pp_formatted_text (&pp)); +} /* struct document : public node_with_children. */ @@ -251,6 +256,13 @@ printer::add_text (std::string text) parent->add_text (std::move (text)); } +void +printer::add_text_from_pp (pretty_printer &pp) +{ + element *parent = m_open_tags.back (); + parent->add_text_from_pp (pp); +} + void printer::add_raw (std::string text) { diff --git a/gcc/xml.h b/gcc/xml.h index 3c5813a22862..952cfa4376b0 100644 --- a/gcc/xml.h +++ b/gcc/xml.h @@ -65,6 +65,7 @@ struct node_with_children : public node { void add_child (std::unique_ptr node); void add_text (std::string str); + void add_text_from_pp (pretty_printer &pp); std::vector> m_children; };
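The helper added by this commit is a thin forwarding wrapper. The sketch below models the same shape with hypothetical stand-ins (toy_pretty_printer, toy_formatted_text, toy_element, toy_printer); they are not GCC's real classes, and the real pretty_printer does considerably more than accumulate a string.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for GCC's pretty_printer: it only accumulates text.
struct toy_pretty_printer
{
  std::string m_buf;
};

// Stand-in for pp_formatted_text: the text accumulated so far.
inline const char *
toy_formatted_text (const toy_pretty_printer &pp)
{
  return pp.m_buf.c_str ();
}

// Stand-in for xml::node_with_children, holding only text children.
struct toy_element
{
  std::vector<std::string> m_text;

  void add_text (std::string str) { m_text.push_back (std::move (str)); }

  // The new helper: callers no longer spell out the formatted-text call.
  void add_text_from_pp (const toy_pretty_printer &pp)
  {
    add_text (toy_formatted_text (pp));
  }
};

// Stand-in for xml::printer: forwards to the innermost open element.
class toy_printer
{
public:
  explicit toy_printer (toy_element &root) { m_open_tags.push_back (&root); }

  void add_text_from_pp (const toy_pretty_printer &pp)
  {
    assert (!m_open_tags.empty ());
    m_open_tags.back ()->add_text_from_pp (pp);
  }

private:
  std::vector<toy_element *> m_open_tags;
};

inline std::string
example_add_text_from_pp ()
{
  toy_pretty_printer pp;
  pp.m_buf = "events 1-3";
  toy_element root;
  toy_printer xp (root);
  xp.add_text_from_pp (pp);
  return root.m_text.back ();
}
```

The behaviour is unchanged from calling add_text on the formatted text directly, matching the "no functional change intended" note in the commit message.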
[gcc r16-1403] gimple-ssa-warn-access: add missing auto_diagnostic_group
https://gcc.gnu.org/g:b619b4d7e7a5078d4fe8b1c4e89258ce4d21be4d

commit r16-1403-gb619b4d7e7a5078d4fe8b1c4e89258ce4d21be4d
Author: David Malcolm
Date: Tue Jun 10 20:06:37 2025 -0400

    gimple-ssa-warn-access: add missing auto_diagnostic_group

    Spotted whilst implementing nesting support in the experimental-html
    diagnostic sink.

    gcc/ChangeLog:

            * gimple-ssa-warn-access.cc
            (pass_waccess::maybe_check_dealloc_call): Add missing
            auto_diagnostic_group to nest the "returned from %qD" note
            within the warning.

    Signed-off-by: David Malcolm

Diff:
---
 gcc/gimple-ssa-warn-access.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 305b63567fea..0f4aff6b59b5 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -3767,6 +3767,7 @@ pass_waccess::maybe_check_dealloc_call (gcall *call)
   if (is_gimple_call (def_stmt))
     {
+      auto_diagnostic_group d;
       bool warned = false;
       if (gimple_call_alloc_p (def_stmt))
         {
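The one-line fix relies on an RAII guard: while the guard is alive, a warning and its follow-up note belong to the same group, so a sink can nest the note under the warning. The sketch below illustrates the pattern with hypothetical toy_context and toy_auto_group types; it is not GCC's auto_diagnostic_group, and the group-id bookkeeping is an assumption made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical diagnostic context that records which group each
// diagnostic was emitted in.
struct toy_context
{
  int m_nesting_depth = 0;  // number of live group guards
  int m_next_group_id = 0;  // id handed to the next new group
  int m_cur_group_id = -1;  // -1 means "not inside any group"
  std::vector<std::string> m_log;

  void emit (const std::string &kind, const std::string &msg)
  {
    // Outside a guard, every diagnostic forms its own implicit group.
    const int id
      = (m_nesting_depth > 0) ? m_cur_group_id : m_next_group_id++;
    m_log.push_back ("group " + std::to_string (id) + ": "
                     + kind + ": " + msg);
  }
};

// RAII guard in the spirit of auto_diagnostic_group.
class toy_auto_group
{
public:
  explicit toy_auto_group (toy_context &ctxt) : m_ctxt (ctxt)
  {
    if (m_ctxt.m_nesting_depth++ == 0)
      m_ctxt.m_cur_group_id = m_ctxt.m_next_group_id++;
  }

  ~toy_auto_group ()
  {
    assert (m_ctxt.m_nesting_depth > 0);
    if (--m_ctxt.m_nesting_depth == 0)
      m_ctxt.m_cur_group_id = -1;
  }

  toy_auto_group (const toy_auto_group &) = delete;
  toy_auto_group &operator= (const toy_auto_group &) = delete;

private:
  toy_context &m_ctxt;
};

inline std::vector<std::string>
example_grouping ()
{
  toy_context ctxt;
  {
    toy_auto_group d (ctxt);
    ctxt.emit ("warning", "bad deallocation");
    ctxt.emit ("note", "returned from 'f'");
  }
  // Without a guard, this note ends up in a group of its own.
  ctxt.emit ("note", "unrelated");
  return ctxt.m_log;
}
```

In this model, forgetting the guard would put the "returned from" note into a separate group from its warning, which is the situation the commit corrects.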
[gcc r16-1404] diagnostics: fix tag nesting issues in experimental-html sink [PR120610]
https://gcc.gnu.org/g:3dcce649a1e0833a4c3bb9ced4b9c0b38c3fb8a5 commit r16-1404-g3dcce649a1e0833a4c3bb9ced4b9c0b38c3fb8a5 Author: David Malcolm Date: Tue Jun 10 20:06:37 2025 -0400 diagnostics: fix tag nesting issues in experimental-html sink [PR120610] I've been seeing issues in the experimental-html sink where the nesting of tags goes wrong. The two issues I've seen are: * the pp_token_list from the diagnostic message that reaches the html_token_printer doesn't always have matching pairs of begin/end tokens (PR other/120610) * a bug in diagnostic-show-locus where there was a stray xp.pop_tag, in print_trailing_fixits. This patch: * changes the xml::printer::pop_tag API so that it now takes the expected name of the element being popped (rather than expressing this in comments), and that, by default, the xml::printer asserts that this matches. * gives the html_token_printer its own xml::printer instance to restrict the affected area of the DOM tree; this xml::printer doesn't enforce nesting (PR other/120610) * adds RAII sentinel classes that automatically check for pushes/pops being balanced within a scope, using them in various places * fixes the bug in print_trailing_fixits for html output gcc/ChangeLog: PR other/120610 * diagnostic-format-html.cc (html_builder::html_builder): Update for new param of xml::printer::pop_tag. (html_path_label_writer::end_label): Likewise. (html_builder::make_element_for_diagnostic::html_token_printer): Give the instance its own xml::printer. Update for new param of xml::printer::pop_tag. (html_builder::make_element_for_diagnostic): Give the instance its own xml::printer. (html_builder::make_metadata_element): Update for new param of xml::printer::pop_tag. (html_builder::flush_to_file): Likewise. * diagnostic-path-output.cc (begin_html_stack_frame): Likewise. (begin_html_stack_frame): Likewise. (end_html_stack_frame): Likewise. (print_path_summary_as_html): Likewise. 
* diagnostic-show-locus.cc (struct to_text::auto_check_tag_nesting): New. (struct to_html:: auto_check_tag_nesting): New. (to_text::pop_html_tag): Change param to const char *. (to_html::pop_html_tag): Likewise; rename param to "expected_name". (default_diagnostic_start_span_fn): Update for new param of xml::printer::pop_tag. (layout_printer::end_label): Likewise. (layout_printer::print_trailing_fixits): Add RAII sentinel to check tag nesting for the HTML case. Delete stray popping of "td" in the presence of fix-it hints. (layout_printer::print_line): Add RAII sentinel to check tag nesting for the HTML case. (diagnostic_source_print_policy::print_as_html): Likewise. (layout_printer::print): Likewise. * xml-printer.h (xml::printer::printer): Add optional "check_popped_tags" param. (xml::printer::pop_tag): Add "expected_name" param. (xml::printer::get_num_open_tags): New accessor. (xml::printer::dump): New decl. (xml::printer::m_check_popped_tags): New field. (class xml::auto_check_tag_nesting): New. (class xml::auto_print_element): Update for new param of pop_tag. * xml.cc: Move pragma pop so that the pragma also covers xml::printer's member functions, "dump" in particular. (xml::printer::printer): Add param "check_popped_tags". (xml::printer::pop_tag): Add param "expected_name" and use it to assert that the popped tag is as expected. Assert that we have a tag to pop. (xml::printer::dump): New. (selftest::test_printer): Update for new param of pop_tag. (selftest::test_attribute_ordering): Likewise. gcc/testsuite/ChangeLog: PR other/120610 * gcc.dg/format/diagnostic-ranges-html.py: Remove out-of-date comment. 
Signed-off-by: David Malcolm Diff: --- gcc/diagnostic-format-html.cc | 29 -- gcc/diagnostic-path-output.cc | 28 ++--- gcc/diagnostic-show-locus.cc | 36 + .../gcc.dg/format/diagnostic-ranges-html.py| 23 --- gcc/xml-printer.h | 41 --- gcc/xml.cc | 46 -- 6 files changed, 131 insertions(+), 72 deletions(-) diff --git a/gcc/diagnostic-format-html.cc b/gcc/diagnostic-format-html.cc index ddf6ba0f4cfc..6010712b8a5b 100644 --- a/
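Two ideas from this commit message can be sketched in isolation: pop_tag taking the expected element name, and a scoped sentinel that checks pushes and pops are balanced. The code below is a toy model with hypothetical names (toy_xml_printer, toy_auto_check_tag_nesting). It throws std::logic_error on a mismatch so the failure can be observed, whereas the commit message says the real xml::printer asserts.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy printer that tracks only the stack of open tag names.
class toy_xml_printer
{
public:
  explicit toy_xml_printer (bool check_popped_tags = true)
  : m_check_popped_tags (check_popped_tags)
  {
  }

  void push_tag (std::string name)
  {
    m_open_tags.push_back (std::move (name));
  }

  // The caller states which element it believes it is closing.
  void pop_tag (const char *expected_name)
  {
    if (m_open_tags.empty ())
      throw std::logic_error ("pop_tag with no open tag");
    if (m_check_popped_tags && m_open_tags.back () != expected_name)
      throw std::logic_error ("popped <" + m_open_tags.back ()
                              + "> but expected <" + expected_name + ">");
    m_open_tags.pop_back ();
  }

  std::size_t get_num_open_tags () const { return m_open_tags.size (); }

private:
  std::vector<std::string> m_open_tags;
  bool m_check_popped_tags;
};

// Scoped sentinel: the number of open tags on scope exit must equal
// the number on entry.
class toy_auto_check_tag_nesting
{
public:
  explicit toy_auto_check_tag_nesting (const toy_xml_printer &xp)
  : m_xp (xp), m_initial (xp.get_num_open_tags ())
  {
  }

  ~toy_auto_check_tag_nesting ()
  {
    assert (m_xp.get_num_open_tags () == m_initial);
  }

private:
  const toy_xml_printer &m_xp;
  std::size_t m_initial;
};

// Balanced pushes and pops inside a checked scope.
inline std::size_t
example_balanced ()
{
  toy_xml_printer xp;
  {
    toy_auto_check_tag_nesting sentinel (xp);
    xp.push_tag ("table");
    xp.push_tag ("td");
    xp.pop_tag ("td");
    xp.pop_tag ("table");
  }
  return xp.get_num_open_tags ();
}

// Popping the wrong element is reported rather than silently accepted.
inline bool
example_mismatch_detected ()
{
  toy_xml_printer xp;
  xp.push_tag ("tr");
  try
    {
      xp.pop_tag ("td");
    }
  catch (const std::logic_error &)
    {
      return true;
    }
  return false;
}
```

Constructing the toy printer with check_popped_tags set to false mirrors the unchecked instance the commit gives to the html_token_printer, where begin/end tokens are not guaranteed to pair up.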
[gcc r15-9818] libstdc++: Fix std::format thousands separators when sign present [PR120548]
https://gcc.gnu.org/g:974d59aec69b35bd4f7f8464a3bcfc55e849ed1f

commit r15-9818-g974d59aec69b35bd4f7f8464a3bcfc55e849ed1f
Author: Jonathan Wakely
Date: Wed Jun 4 18:22:28 2025 +0100

    libstdc++: Fix std::format thousands separators when sign present [PR120548]

    The leading sign character should be skipped when deciding whether to
    insert thousands separators into a floating-point format.

    libstdc++-v3/ChangeLog:

            PR libstdc++/120548
            * include/std/format (__formatter_fp::_M_localize): Do not
            include a leading sign character in the string to be grouped.
            * testsuite/std/format/functions/format.cc: Check grouping when
            sign is present in the output.

    Reviewed-by: Tomasz Kamiński
    (cherry picked from commit 2c3559839d70df6311da18fd93237050405580c3)

Diff:
---
 libstdc++-v3/include/std/format                       | 11 +++++++++--
 libstdc++-v3/testsuite/std/format/functions/format.cc | 10 ++++++++++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 8beef93c7809..d25df1224ffc 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -2362,9 +2362,16 @@ namespace __format
           const size_t __r = __str.size() - __e; // Length of remainder.
           auto __overwrite = [&](_CharT* __p, size_t) {
             // Apply grouping to the digits before the radix or exponent.
-            auto __end = std::__add_grouping(__p, __np.thousands_sep(),
+            int __off = 0;
+            if (auto __c = __str.front(); __c == '-' || __c == '+' || __c == ' ')
+              {
+                *__p = __c;
+                __off = 1;
+              }
+            auto __end = std::__add_grouping(__p + __off, __np.thousands_sep(),
                                              __grp.data(), __grp.size(),
-                                             __str.data(), __str.data() + __e);
+                                             __str.data() + __off,
+                                             __str.data() + __e);
             if (__r) // If there's a fractional part or exponent
               {
                 if (__d != __str.npos)
diff --git a/libstdc++-v3/testsuite/std/format/functions/format.cc b/libstdc++-v3/testsuite/std/format/functions/format.cc
index 93c33b456e64..6f6f1f15b35b 100644
--- a/libstdc++-v3/testsuite/std/format/functions/format.cc
+++ b/libstdc++-v3/testsuite/std/format/functions/format.cc
@@ -260,6 +260,16 @@ test_locale()
   s = std::format(eloc, "{0:Le} {0:Lf} {0:Lg}", -nan);
   VERIFY( s == "-nan -nan -nan" );

+  // PR libstdc++/120548 format confuses a negative sign for a thousands digit
+  s = std::format(bloc, "{:L}", -123.45);
+  VERIFY( s == "-123.45" );
+  s = std::format(bloc, "{:-L}", -876543.21);
+  VERIFY( s == "-876,543.21" );
+  s = std::format(bloc, "{:+L}", 333.22);
+  VERIFY( s == "+333.22" );
+  s = std::format(bloc, "{: L}", 999.44);
+  VERIFY( s == " 999.44" );
+
   // Restore
   std::locale::global(cloc);
 }
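The effect of skipping the sign can be reproduced with a small standalone model. The sketch below is not libstdc++'s __add_grouping: toy_group_integer_part is a hypothetical helper that assumes a fixed group size of three and a caller-supplied separator, and its skip_sign flag exists only to contrast the old and new behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Insert SEP between groups of three characters in the part of STR
// that precedes the radix point or exponent.  When SKIP_SIGN is true,
// a leading '-', '+' or ' ' is copied through and not counted as a
// digit, which is the behaviour the patch introduces.
inline std::string
toy_group_integer_part (const std::string &str, char sep, bool skip_sign)
{
  // End of the part to be grouped: first '.' or 'e', else the whole string.
  std::size_t e = str.find_first_of (".e");
  if (e == std::string::npos)
    e = str.size ();

  std::size_t off = 0;
  if (skip_sign && !str.empty ()
      && (str[0] == '-' || str[0] == '+' || str[0] == ' '))
    off = 1;
  assert (e >= off);

  std::string out = str.substr (0, off);
  const std::size_t ndigits = e - off;
  for (std::size_t i = 0; i < ndigits; ++i)
    {
      // A separator goes before every position that leaves a whole
      // number of three-character groups to its right.
      if (i != 0 && (ndigits - i) % 3 == 0)
        out += sep;
      out += str[off + i];
    }
  out.append (str, e, std::string::npos);
  return out;
}
```

With skip_sign set to false the model counts "-123" as four characters and produces "-,123.45", which illustrates the kind of confusion the PR title describes; with skip_sign set to true the inputs from the new test cases come out as the test expects.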
[gcc r15-9819] libstdc++: Make system_clock::to_time_t always_inline [PR99832]
https://gcc.gnu.org/g:5327eef7b003f66b90841af77c5095eebfa53938

commit r15-9819-g5327eef7b003f66b90841af77c5095eebfa53938
Author: Jonathan Wakely
Date: Wed May 28 15:19:18 2025 +0100

    libstdc++: Make system_clock::to_time_t always_inline [PR99832]

    For some 32-bit targets Glibc supports changing the size of time_t to be
    64 bits by defining _TIME_BITS=64. That causes an ABI change which
    would affect std::chrono::system_clock::to_time_t. Because to_time_t is
    not a function template, its mangled name does not depend on the return
    type, so it has the same mangled name whether it returns a 32-bit time_t
    or a 64-bit time_t. On targets where the size of time_t can be selected
    at preprocessing time, that can cause ODR violations, e.g. the linker
    selects a definition of to_time_t that returns a 32-bit value but a
    caller expects 64-bit and so reads 32 bits of garbage from the stack.

    This commit adds always_inline to to_time_t so that all callers inline
    the conversion to time_t, and will do so using whatever type time_t
    happens to be in that translation unit.

    Existing objects compiled before this change will either have inlined
    the function anyway (which is likely if compiled with any optimization
    enabled) or will contain a COMDAT definition of the inline function and
    so still be able to find it at link-time.

    The attribute is also added to system_clock::from_time_t, because that's
    an equally simple function and it seems reasonable for them to both be
    always inlined.

    libstdc++-v3/ChangeLog:

            PR libstdc++/99832
            * include/bits/chrono.h (system_clock::to_time_t): Add
            always_inline attribute to be agnostic to the underlying type of
            time_t.
            (system_clock::from_time_t): Add always_inline for consistency
            with to_time_t.
            * testsuite/20_util/system_clock/99832.cc: New test.

    (cherry picked from commit d045eb13b0b42870a1f081895df3901112a358f0)

Diff:
---
 libstdc++-v3/include/bits/chrono.h                   |  2 ++
 libstdc++-v3/testsuite/20_util/system_clock/99832.cc | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/libstdc++-v3/include/bits/chrono.h b/libstdc++-v3/include/bits/chrono.h
index fad216203d2f..8de8e756c714 100644
--- a/libstdc++-v3/include/bits/chrono.h
+++ b/libstdc++-v3/include/bits/chrono.h
@@ -1244,6 +1244,7 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
       now() noexcept;

       // Map to C API
+      [[__gnu__::__always_inline__]]
       static std::time_t
       to_time_t(const time_point& __t) noexcept
       {
@@ -1251,6 +1252,7 @@ _GLIBCXX_BEGIN_INLINE_ABI_NAMESPACE(_V2)
                    (__t.time_since_epoch()).count());
       }

+      [[__gnu__::__always_inline__]]
       static time_point
       from_time_t(std::time_t __t) noexcept
       {
diff --git a/libstdc++-v3/testsuite/20_util/system_clock/99832.cc b/libstdc++-v3/testsuite/20_util/system_clock/99832.cc
new file mode 100644
index ..693d4d647d9b
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/system_clock/99832.cc
@@ -0,0 +1,14 @@
+// { dg-options "-O0 -g0" }
+// { dg-do compile { target c++20 } }
+// { dg-final { scan-assembler-not "system_clock9to_time_t" } }
+
+// Bug libstdc++/99832
+// std::chrono::system_clock::to_time_t needs ABI tag for 32-bit time_t
+
+#include <chrono>
+
+std::time_t
+test_pr99832(std::chrono::system_clock::time_point t)
+{
+  return std::chrono::system_clock::to_time_t(t);
+}
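The technique can be tried outside libstdc++ with the GNU always_inline attribute on an ordinary inline function. The sketch below is an illustration only: toy_to_time_t and example_round_trip are hypothetical names, the attribute spelling [[gnu::always_inline]] is specific to GCC-compatible compilers, and the conversion is written out by hand instead of reusing the library's definition.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Forced inlining means each caller converts using whatever time_t is
// in its own translation unit, so no out-of-line symbol with an
// ambiguous return size needs to be shared between objects.
[[gnu::always_inline]] inline std::time_t
toy_to_time_t (std::chrono::system_clock::time_point t) noexcept
{
  return std::time_t (std::chrono::duration_cast<std::chrono::seconds>
                        (t.time_since_epoch ()).count ());
}

// Whole seconds survive a round trip through the clock's time_point.
inline std::time_t
example_round_trip (std::time_t secs)
{
  const auto tp = std::chrono::system_clock::from_time_t (secs);
  const std::time_t back = toy_to_time_t (tp);
  assert (back == secs);
  return back;
}
```

The new test in the commit checks the same property from the other side: with -O0 it scans the assembly to confirm that no out-of-line to_time_t symbol is emitted.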
[gcc r13-9750] libstdc++: Fix incorrect links to archived SGI STL docs
https://gcc.gnu.org/g:d97de614f7a3729f7a841f48748d0b3bf746d0a2

commit r13-9750-gd97de614f7a3729f7a841f48748d0b3bf746d0a2
Author: Jonathan Wakely
Date: Tue May 20 10:53:41 2025 +0100

    libstdc++: Fix incorrect links to archived SGI STL docs

    In r8--g25949ee33201f2 I updated some URLs to point to copies of the SGI
    STL docs in the Wayback Machine, because the original pages were no
    longer hosted on sgi.com. However, I incorrectly assumed that if one
    archived page was at https://web.archive.org/web/20171225062613/... then
    all the other pages would be too. Apparently that's not how the Wayback
    Machine works, and each page is archived on a different date. That meant
    that some of our links were redirecting to archived copies of the
    announcement that the SGI STL docs have gone away.

    This fixes each URL to refer to a correctly archived copy of the
    original docs.

    libstdc++-v3/ChangeLog:

            * doc/xml/faq.xml: Update URL for archived SGI STL docs.
            * doc/xml/manual/containers.xml: Likewise.
            * doc/xml/manual/extensions.xml: Likewise.
            * doc/xml/manual/using.xml: Likewise.
            * doc/xml/manual/utilities.xml: Likewise.
            * doc/html/*: Regenerate.

(cherry picked from commit 501e6e786652748ff0ad9a322f74b9b47970031f) Diff: --- libstdc++-v3/doc/html/faq.html | 2 +- libstdc++-v3/doc/html/manual/containers.html| 2 +- libstdc++-v3/doc/html/manual/ext_numerics.html | 2 +- libstdc++-v3/doc/html/manual/ext_sgi.html | 4 ++-- libstdc++-v3/doc/html/manual/using_concurrency.html | 10 +- libstdc++-v3/doc/html/manual/utilities.html | 4 ++-- libstdc++-v3/doc/xml/faq.xml| 2 +- libstdc++-v3/doc/xml/manual/containers.xml | 2 +- libstdc++-v3/doc/xml/manual/extensions.xml | 6 +++--- libstdc++-v3/doc/xml/manual/using.xml | 10 +- libstdc++-v3/doc/xml/manual/utilities.xml | 4 ++-- 11 files changed, 24 insertions(+), 24 deletions(-) diff --git a/libstdc++-v3/doc/html/faq.html b/libstdc++-v3/doc/html/faq.html index 2e913920d05d..d7ad0934cfb9 100644 --- a/libstdc++-v3/doc/html/faq.html +++ b/libstdc++-v3/doc/html/faq.html @@ -797,7 +797,7 @@ Libstdc++-v3 incorporates a lot of code from https://web.archive.org/web/20171225062613/http://www.sgi.com/tech/stl/"; target="_top">the SGI STL (the final merge was from -https://web.archive.org/web/20171225062613/http://www.sgi.com/tech/stl/whats_new.html"; target="_top">release 3.3). +https://web.archive.org/web/20171206110416/http://www.sgi.com/tech/stl/whats_new.html"; target="_top">release 3.3). The code in libstdc++ contains many fixes and changes compared to the original SGI code. diff --git a/libstdc++-v3/doc/html/manual/containers.html b/libstdc++-v3/doc/html/manual/containers.html index 7035a949074d..dcd609a6000d 100644 --- a/libstdc++-v3/doc/html/manual/containers.html +++ b/libstdc++-v3/doc/html/manual/containers.html @@ -11,7 +11,7 @@ Yes it is, at least using the old ABI, and that's okay. This is a decision that we preserved when we imported SGI's STL implementation. 
The following is - quoted from https://web.archive.org/web/20171225062613/http://www.sgi.com/tech/stl/FAQ.html"; target="_top">their FAQ: + quoted from https://web.archive.org/web/20161222192301/http://www.sgi.com/tech/stl/FAQ.html"; target="_top">their FAQ: The size() member function, for list and slist, takes time proportional to the number of elements in the list. This was a diff --git a/libstdc++-v3/doc/html/manual/ext_numerics.html b/libstdc++-v3/doc/html/manual/ext_numerics.html index 9b864e1dcf4a..c3a5623d1752 100644 --- a/libstdc++-v3/doc/html/manual/ext_numerics.html +++ b/libstdc++-v3/doc/html/manual/ext_numerics.html @@ -14,7 +14,7 @@ The operation functor must be associative. The iota function wins the award for Extension With the Coolest Name (the name comes from Ken Iverson's APL language.) As - described in the https://web.archive.org/web/20171225062613/http://www.sgi.com/tech/stl/iota.html"; target="_top">SGI + described in the https://web.archive.org/web/20170201044840/http://www.sgi.com/tech/stl/iota.html"; target="_top">SGI documentation, it "assigns sequentially increasing values to a range. That is, it assigns value to *first, value + 1 to *(first + 1) and so on." diff --git a/libstdc++-v3/doc/html/manual/ext_sgi.html b/libstdc++-v3/doc/html/manual/ext_sgi.html index ae2062954f4f..2310857804b3 100644 --- a/libstdc++-v3/doc/html/manual/ext_sgi.html +++ b/libstdc++-v3/doc/html/manual/ext_sgi.html @@ -28,12 +28,12 @@ and sets. Each of the associative containers map, multimap, set, and multiset have a counterpart which uses a - https://web.archive.org/web/20171225062613/http:
[gcc r14-11834] libstdc++: Fix incorrect links to archived SGI STL docs
https://gcc.gnu.org/g:e1cbf566970f02a7ac110df58c412be11604b278

commit r14-11834-ge1cbf566970f02a7ac110df58c412be11604b278
Author: Jonathan Wakely
Date: Tue May 20 10:53:41 2025 +0100

    libstdc++: Fix incorrect links to archived SGI STL docs

    In r8--g25949ee33201f2 I updated some URLs to point to copies of the SGI
    STL docs in the Wayback Machine, because the original pages were no
    longer hosted on sgi.com. However, I incorrectly assumed that if one
    archived page was at https://web.archive.org/web/20171225062613/... then
    all the other pages would be too. Apparently that's not how the Wayback
    Machine works, and each page is archived on a different date. That meant
    that some of our links were redirecting to archived copies of the
    announcement that the SGI STL docs have gone away.

    This fixes each URL to refer to a correctly archived copy of the
    original docs.

    libstdc++-v3/ChangeLog:

            * doc/xml/faq.xml: Update URL for archived SGI STL docs.
            * doc/xml/manual/containers.xml: Likewise.
            * doc/xml/manual/extensions.xml: Likewise.
            * doc/xml/manual/using.xml: Likewise.
            * doc/xml/manual/utilities.xml: Likewise.
            * doc/html/*: Regenerate.

(cherry picked from commit 501e6e786652748ff0ad9a322f74b9b47970031f) Diff: --- libstdc++-v3/doc/html/faq.html | 2 +- libstdc++-v3/doc/html/manual/containers.html| 2 +- libstdc++-v3/doc/html/manual/ext_numerics.html | 2 +- libstdc++-v3/doc/html/manual/ext_sgi.html | 4 ++-- libstdc++-v3/doc/html/manual/using_concurrency.html | 10 +- libstdc++-v3/doc/html/manual/utilities.html | 4 ++-- libstdc++-v3/doc/xml/faq.xml| 2 +- libstdc++-v3/doc/xml/manual/containers.xml | 2 +- libstdc++-v3/doc/xml/manual/extensions.xml | 6 +++--- libstdc++-v3/doc/xml/manual/using.xml | 10 +- libstdc++-v3/doc/xml/manual/utilities.xml | 4 ++-- 11 files changed, 24 insertions(+), 24 deletions(-) diff --git a/libstdc++-v3/doc/html/faq.html b/libstdc++-v3/doc/html/faq.html index bbe716d5e233..622137939335 100644 --- a/libstdc++-v3/doc/html/faq.html +++ b/libstdc++-v3/doc/html/faq.html @@ -796,7 +796,7 @@ Libstdc++-v3 incorporates a lot of code from https://web.archive.org/web/20171225062613/http://www.sgi.com/tech/stl/"; target="_top">the SGI STL (the final merge was from -https://web.archive.org/web/20171225062613/http://www.sgi.com/tech/stl/whats_new.html"; target="_top">release 3.3). +https://web.archive.org/web/20171206110416/http://www.sgi.com/tech/stl/whats_new.html"; target="_top">release 3.3). The code in libstdc++ contains many fixes and changes compared to the original SGI code. diff --git a/libstdc++-v3/doc/html/manual/containers.html b/libstdc++-v3/doc/html/manual/containers.html index 7035a949074d..dcd609a6000d 100644 --- a/libstdc++-v3/doc/html/manual/containers.html +++ b/libstdc++-v3/doc/html/manual/containers.html @@ -11,7 +11,7 @@ Yes it is, at least using the old ABI, and that's okay. This is a decision that we preserved when we imported SGI's STL implementation. 
The following is - quoted from https://web.archive.org/web/20171225062613/http://www.sgi.com/tech/stl/FAQ.html"; target="_top">their FAQ: + quoted from https://web.archive.org/web/20161222192301/http://www.sgi.com/tech/stl/FAQ.html"; target="_top">their FAQ: The size() member function, for list and slist, takes time proportional to the number of elements in the list. This was a diff --git a/libstdc++-v3/doc/html/manual/ext_numerics.html b/libstdc++-v3/doc/html/manual/ext_numerics.html index 9b864e1dcf4a..c3a5623d1752 100644 --- a/libstdc++-v3/doc/html/manual/ext_numerics.html +++ b/libstdc++-v3/doc/html/manual/ext_numerics.html @@ -14,7 +14,7 @@ The operation functor must be associative. The iota function wins the award for Extension With the Coolest Name (the name comes from Ken Iverson's APL language.) As - described in the https://web.archive.org/web/20171225062613/http://www.sgi.com/tech/stl/iota.html"; target="_top">SGI + described in the https://web.archive.org/web/20170201044840/http://www.sgi.com/tech/stl/iota.html"; target="_top">SGI documentation, it "assigns sequentially increasing values to a range. That is, it assigns value to *first, value + 1 to *(first + 1) and so on." diff --git a/libstdc++-v3/doc/html/manual/ext_sgi.html b/libstdc++-v3/doc/html/manual/ext_sgi.html index ae2062954f4f..2310857804b3 100644 --- a/libstdc++-v3/doc/html/manual/ext_sgi.html +++ b/libstdc++-v3/doc/html/manual/ext_sgi.html @@ -28,12 +28,12 @@ and sets. Each of the associative containers map, multimap, set, and multiset have a counterpart which uses a - https://web.archive.org/web/20171225062613/http
[gcc r16-1401] More API for IPA profile manipulation
https://gcc.gnu.org/g:e416c8097fc87513e05c2d104c63488f733758c0

commit r16-1401-ge416c8097fc87513e05c2d104c63488f733758c0
Author: Jan Hubicka
Date: Tue Jun 10 21:32:40 2025 +0200

    More API for IPA profile manipulation

    This patch attempts to make IPA profile manipulation easier. It
    introduces

      node->scale_profile_to (count)
        which can be used to scale the profile to a given IPA count. If the
        IPA count is zero, then the local profile is preserved and the
        proper variant of global0 count is used.
      node->make_profile_local
        which can be used to drop the IPA profile but keep the local profile.
      node->make_profile_global0
        which can be used to make the IPA profile 0 but keep the local
        profile.

    Most of this can be accomplished by the existing apply_scale, i.e.

     - node->scale_profile_to (count) corresponds to
       node->apply_scale (count, node->count),
     - node->make_profile_local corresponds to
       node->apply_scale (node->count.guessed_local (), node->count)

    I think the new API is clearer about the intention and less error
    prone. It also handles some corner cases where the entry block of the
    profile happens to be 0 but the body is non-zero (because of profile
    inconsistencies). In this case the scaling API did more or less random
    things.

    I noticed three bugs in ipa-cp (two already in released GCCs, and one of
    mine introduced by the last patch):

    @@ -4528,7 +4528,7 @@ lenient_count_portion_handling (profile_count remainder, cgraph_node *orig_node)
       if (remainder.ipa_p () && !remainder.ipa ().nonzero_p ()
           && orig_node->count.ipa_p () && orig_node->count.ipa ().nonzero_p ()
           && opt_for_fn (orig_node->decl, flag_profile_partial_training))
    -    remainder = remainder.guessed_local ();
    +    remainder = orig_node->count.guessed_local ();

    The code was intended to drop the IPA profile to local when remainder
    is 0. In this case orig_node->count is some non-zero count, but all of
    the control flow was redirected to a clone, which means that remainder
    is 0 (adjusted). Doing

      remainder = remainder.guessed_local ();

    will turn it into 0 (guessed_local), and the scaling will then multiply
    all counts by 0 and turn them into guessed local. We want to keep the
    original count but reduce the quality, i.e.

      remainder = orig_node->count.guessed_local ();

    The second problem is:

      /* TODO: Profile has alreay gone astray, keep what we have but lower it
         to global0 category.  */
      remainder = orig_node->count.global0 ();

    global0 means that converting to an IPA count will give a precise 0.
    Since we lost track, it should be adjusted 0 :)

    Finally in:

      new_sum = orig_node_count.combine_with_ipa_count (new_sum);
      orig_node->count = remainder;

      new_node->apply_scale (new_sum, new_node->count);
      if (!orig_edges_processed)
        orig_node->apply_scale (remainder, orig_node->count);
        orig_node->scale_profile_to (remainder);

    orig_node->count is first set to remainder and then the scaling is done
    (which in turn does nothing). This is a bug I introduced in the last
    patch, which should have removed orig_node->count = remainder. As a
    result, the counts of cgraph edges are now not adjusted correctly. I am
    sorry for that.

    gcc/ChangeLog:

            * cgraph.cc (cgraph_node::make_profile_local): New member function.
            (cgraph_node::make_profile_global0): New member function.
            (cgraph_node::apply_scale): Do not call adjust_for_ipa_scalling.
            (cgraph_node::scale_profile_to): New member function.
            * cgraph.h (cgraph_node::make_profile_local,
            cgraph_node::make_profile_global0,
            cgraph_node::scale_profile_to): Declare.
            * ipa-cp.cc (lenient_count_portion_handling): Fix logic dropping
            count to local.
            (update_counts_for_self_gen_clones): Use scale_profile_to.
            (update_profiling_info): Use make_profile_local,
            make_profile_global0 and scale_profile_to.
            (update_specialized_profile): Likewise.
            * ipa-inline-transform.cc (clone_inlined_nodes): Call
            adjust_for_ipa_scalling.

Diff: --- gcc/cgraph.cc | 114 +--- gcc/cgraph.h| 14 +- gcc/ipa-cp.cc | 53 ++-- gcc/ipa-inline-transform.cc | 5 +- 4 files changed, 140 insertions(+), 46 deletions(-) diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc index 4a037a7bab10..2f31260207df 100644 --- a/gcc/cgraph.cc +++ b/gcc/cgraph.cc @@ -179,26 +179,128 @@ cgraph_node::function_version (void) return cgraph_fnver_htab->find (&key); } -/* Scale profile by NUM/DEN. Walk into inlined clones. */ +/* If profile is IPA, turn it into local one. */ +void +cgraph_node::make_pr
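The first ipa-cp bug described in this commit message comes down to arithmetic: scaling by a zero numerator wipes out every count, whereas scaling by the original count with lowered quality keeps the values. The sketch below is a deliberately simplified toy model and not GCC's profile_count; toy_count, toy_quality and the rule that a scaled count takes the numerator's quality are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class toy_quality { guessed_local, precise };

// Toy count: a value plus a quality tag.
struct toy_count
{
  std::uint64_t m_val;
  toy_quality m_quality;

  // Same value, lowered to local quality.
  toy_count guessed_local () const
  {
    return {m_val, toy_quality::guessed_local};
  }

  // Scale by NUM/DEN.  In this toy model the result simply takes
  // NUM's quality.
  toy_count apply_scale (toy_count num, toy_count den) const
  {
    assert (den.m_val != 0);
    return {m_val * num.m_val / den.m_val, num.m_quality};
  }
};

// Scale a function body whose entry count is 100 although the IPA
// remainder is 0 because all calls were redirected to a clone.
inline std::vector<std::uint64_t>
example_scaling (bool use_fixed_remainder)
{
  const toy_count orig_count = {100, toy_quality::precise};
  const toy_count remainder = {0, toy_quality::precise};
  const std::vector<toy_count> body
    = {{100, toy_quality::precise},
       {60, toy_quality::precise},
       {40, toy_quality::precise}};

  // Old code: 0 (guessed_local).  Fixed code: 100 (guessed_local).
  const toy_count num = use_fixed_remainder
                          ? orig_count.guessed_local ()
                          : remainder.guessed_local ();

  std::vector<std::uint64_t> scaled;
  for (const toy_count &c : body)
    scaled.push_back (c.apply_scale (num, orig_count).m_val);
  return scaled;
}
```

In this model the old remainder zeroes the whole body, while the fixed remainder leaves the values at 100, 60 and 40 and only lowers their quality, which is the outcome the commit message asks for.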
[gcc/devel/omp/gcc-15] libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async
https://gcc.gnu.org/g:7704131525574cd28bf3f779da1e1057c46a1f25 commit 7704131525574cd28bf3f779da1e1057c46a1f25 Author: Tobias Burnus Date: Mon Jun 2 17:43:57 2025 +0200 libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async PR libgomp/120444 include/ChangeLog: * cuda/cuda.h (cuMemsetD8, cuMemsetD8Async): Declare. libgomp/ChangeLog: * libgomp-plugin.h (GOMP_OFFLOAD_memset): Declare. * libgomp.h (struct gomp_device_descr): Add memset_func. * libgomp.map (GOMP_6.0.1): Add omp_target_memset{,_async}. * libgomp.texi (Device Memory Routines): Document them. * omp.h.in (omp_target_memset, omp_target_memset_async): Declare. * omp_lib.f90.in (omp_target_memset, omp_target_memset_async): Add interfaces. * omp_lib.h.in (omp_target_memset, omp_target_memset_async): Likewise. * plugin/cuda-lib.def: Add cuMemsetD8. * plugin/plugin-gcn.c (struct hsa_runtime_fn_info): Add hsa_amd_memory_fill_fn. (init_hsa_runtime_functions): DLSYM_OPT_FN load it. (GOMP_OFFLOAD_memset): New. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memset): New. * target.c (omp_target_memset_int, omp_target_memset, omp_target_memset_async_helper, omp_target_memset_async): New. (gomp_load_plugin_for_device): Add DLSYM (memset). * testsuite/libgomp.c-c++-common/omp_target_memset.c: New test. * testsuite/libgomp.c-c++-common/omp_target_memset-2.c: New test. * testsuite/libgomp.c-c++-common/omp_target_memset-3.c: New test. * testsuite/libgomp.fortran/omp_target_memset.f90: New test. * testsuite/libgomp.fortran/omp_target_memset-2.f90: New test. 
(cherry picked from commit 4e47e2f833732c5d9a3c3e69dc753f99b3a56737) Diff: --- include/cuda/cuda.h| 3 + libgomp/libgomp-plugin.h | 1 + libgomp/libgomp.h | 3 +- libgomp/libgomp.map| 6 ++ libgomp/libgomp.texi | 98 +- libgomp/omp.h.in | 4 + libgomp/omp_lib.f90.in | 23 + libgomp/omp_lib.h.in | 25 ++ libgomp/plugin/cuda-lib.def| 1 + libgomp/plugin/plugin-gcn.c| 80 ++ libgomp/plugin/plugin-nvptx.c | 9 ++ libgomp/target.c | 83 ++ .../libgomp.c-c++-common/omp_target_memset-2.c | 62 ++ .../libgomp.c-c++-common/omp_target_memset-3.c | 80 ++ .../libgomp.c-c++-common/omp_target_memset.c | 62 ++ .../libgomp.fortran/omp_target_memset-2.f90| 67 +++ .../libgomp.fortran/omp_target_memset.f90 | 39 + 17 files changed, 642 insertions(+), 4 deletions(-) diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h index 5e4b7f190ebf..6be1ac0ab438 100644 --- a/include/cuda/cuda.h +++ b/include/cuda/cuda.h @@ -279,6 +279,9 @@ CUresult cuMemcpy3D (const CUDA_MEMCPY3D *); CUresult cuMemcpy3DAsync (const CUDA_MEMCPY3D *, CUstream); CUresult cuMemcpy3DPeer (const CUDA_MEMCPY3D_PEER *); CUresult cuMemcpy3DPeerAsync (const CUDA_MEMCPY3D_PEER *, CUstream); +#define cuMemsetD8 cuMemsetD8_v2 +CUresult cuMemsetD8 (CUdeviceptr, unsigned char, size_t); +CUresult cuMemsetD8Async (CUdeviceptr, unsigned char, size_t, CUstream); #define cuMemFree cuMemFree_v2 CUresult cuMemFree (CUdeviceptr); CUresult cuMemFreeHost (void *); diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h index 3c7741bcef88..d0bcc237d7fe 100644 --- a/libgomp/libgomp-plugin.h +++ b/libgomp/libgomp-plugin.h @@ -179,6 +179,7 @@ extern int GOMP_OFFLOAD_memcpy3d (int, int, size_t, size_t, size_t, void *, size_t, size_t, size_t, size_t, size_t, const void *, size_t, size_t, size_t, size_t, size_t); +extern bool GOMP_OFFLOAD_memset (int, void *, int, size_t); extern bool GOMP_OFFLOAD_can_run (void *); extern void GOMP_OFFLOAD_run (int, void *, void *, void **); extern void GOMP_OFFLOAD_async_run (int, void *, void *, void **, void 
*); diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 571ac62ca998..465f7c1b4ea5 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -1441,9 +1441,10 @@ struct gomp_device_descr __typeof (GOMP_OFFLOAD_page_locked_host_free) *page_locked_host_free_func; __typeof (GOMP_OFFLOAD_dev2host) *dev2host_func; __typeof (GOMP_OFFLOAD_host2dev) *host2dev_func; + __typeof (GOMP_OFFLOAD_dev2dev) *dev2dev_func; __typeof (GOMP_OFFLOAD_memcpy2d) *memcpy2d_func; __typeof (GOMP_OFFLOAD_m
[gcc/devel/omp/gcc-15] gcn: Add experimental MI300 (gfx942) support
https://gcc.gnu.org/g:5e75ec7168fd3ea5b7791ed67f25a29b44967fc3 commit 5e75ec7168fd3ea5b7791ed67f25a29b44967fc3 Author: Tobias Burnus Date: Tue Jun 10 15:12:47 2025 +0200

gcn: Add experimental MI300 (gfx942) support

As gfx942 and gfx950 belong to gfx9-4-generic, the latter two are also added. Note that there are no specific optimizations for MI300 yet.

No multilib is built by default for any of the mentioned devices; use '--with-multilib-list=' when configuring GCC to build them alongside. gfx942 was added in LLVM (and its mc assembler, used by GCC) in version 18, generic support in LLVM 19 and gfx950 in LLVM 20.

gcc/ChangeLog:

* config/gcn/gcn-devices.def: Add gfx942, gfx950 and gfx9-4-generic.
* config/gcn/gcn-opts.h (TARGET_CDNA3, TARGET_CDNA3_PLUS, TARGET_GLC_NAME, TARGET_TARGET_SC_CACHE): Define.
(TARGET_ARCHITECTED_FLAT_SCRATCH): Use also for CDNA3.
* config/gcn/gcn.h (gcn_isa): Add ISA_CDNA3 to the enum.
* config/gcn/gcn.cc (print_operand): Update 'g' to use TARGET_GLC_NAME; add 'G' to print TARGET_GLC_NAME unconditionally.
* config/gcn/gcn-valu.md (scatter, gather): Use TARGET_GLC_NAME.
* config/gcn/gcn.md: Use %G instead of glc; use 'buffer_inv sc1' for TARGET_TARGET_SC_CACHE.
* doc/invoke.texi (march): Add gfx942, gfx950 and gfx9-4-generic.
* doc/install.texi (amdgcn*-*-*): Add gfx942, gfx950 and gfx9-4-generic.
* config/gcn/gcn-tables.opt: Regenerate.

libgomp/ChangeLog:

* testsuite/libgomp.c/declare-variant-4.h (gfx942): New variant function.
* testsuite/libgomp.c/declare-variant-4-gfx942.c: New test.
(cherry picked from commit 37b454b7e171bd8a792cbe4c57ea0f9702afa22d) Diff: --- gcc/config/gcn/gcn-devices.def | 33 gcc/config/gcn/gcn-opts.h | 13 +- gcc/config/gcn/gcn-tables.opt | 9 ++ gcc/config/gcn/gcn-valu.md | 8 +- gcc/config/gcn/gcn.cc | 8 +- gcc/config/gcn/gcn.h | 2 + gcc/config/gcn/gcn.md | 168 + gcc/doc/install.texi | 17 ++- gcc/doc/invoke.texi| 10 ++ .../testsuite/libgomp.c/declare-variant-4-gfx942.c | 8 + libgomp/testsuite/libgomp.c/declare-variant-4.h| 8 + 11 files changed, 208 insertions(+), 76 deletions(-) diff --git a/gcc/config/gcn/gcn-devices.def b/gcc/config/gcn/gcn-devices.def index af1420382e2f..426acf0cb7a5 100644 --- a/gcc/config/gcn/gcn-devices.def +++ b/gcc/config/gcn/gcn-devices.def @@ -171,6 +171,28 @@ GCN_DEVICE(gfx90c, GFX90C, 0x32, ISA_GCN5, /* Generic Name */ GFX9_GENERIC ) +GCN_DEVICE(gfx942, GFX942, 0x4c, ISA_CDNA3, + /* XNACK default */ HSACO_ATTR_ANY, + /* SRAM_ECC default */ HSACO_ATTR_ANY, + /* WAVE64 mode */ HSACO_ATTR_UNSUPPORTED, + /* CU mode */ HSACO_ATTR_UNSUPPORTED, + /* Max ISA VGPRs */ 512, + /* Generic code obj version */ 0, /* non-generic */ + /* Architecture Family */ GFX9, + /* Generic Name */ NONE + ) + +GCN_DEVICE(gfx950, GFX950, 0x4f, ISA_CDNA3, + /* XNACK default */ HSACO_ATTR_ANY, + /* SRAM_ECC default */ HSACO_ATTR_ANY, + /* WAVE64 mode */ HSACO_ATTR_UNSUPPORTED, + /* CU mode */ HSACO_ATTR_UNSUPPORTED, + /* Max ISA VGPRs */ 512, + /* Generic code obj version */ 0, /* non-generic */ + /* Architecture Family */ GFX9, + /* Generic Name */ NONE + ) + GCN_DEVICE(gfx9-generic, GFX9_GENERIC, 0x051, ISA_GCN5, /* XNACK default */ HSACO_ATTR_ANY, /* SRAM_ECC default */ HSACO_ATTR_UNSUPPORTED, @@ -182,6 +204,17 @@ GCN_DEVICE(gfx9-generic, GFX9_GENERIC, 0x051, ISA_GCN5, /* Generic Name */ NONE ) +GCN_DEVICE(gfx9-4-generic, GFX9_4_GENERIC, 0x05f, ISA_CDNA3, + /* XNACK default */ HSACO_ATTR_ANY, + /* SRAM_ECC default */ HSACO_ATTR_UNSUPPORTED, + /* WAVE64 mode */ HSACO_ATTR_UNSUPPORTED, + /* CU mode */ HSACO_ATTR_UNSUPPORTED, 
+ /* Max ISA VGPRs */ 256, + /* Generic code obj version */ 1, + /* Architecture Family */ GFX9, + /* Generic Name */ NONE + ) + /* GCN GFX10.3 (RDNA 2) */ GCN_DEVICE(gfx1030, GFX1030, 0x36, ISA_RDNA2, diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h index 88f562dfc1e1..bcea14f3fe7a 100644 --- a/gcc/config/gcn/gcn-opts.h +++ b/gcc/config/gcn/gcn-opts.h @@ -33,7 +33,8 @@ extern enum gcn_isa { ISA_RDNA2, ISA_RDNA3, ISA_CDNA1, - ISA_CDNA2 + ISA_CDNA2, + ISA_CDNA3 } gcn_isa; #define TARGET_GCN5 (gcn_isa == ISA_GCN5) @@ -41,6 +42,8 @@ exter
[gcc/devel/omp/gcc-15] Merge branch 'releases/gcc-15' into devel/omp/gcc-15
https://gcc.gnu.org/g:682e7678f3d2b5b974bf564deea7a405f0fd37bf commit 682e7678f3d2b5b974bf564deea7a405f0fd37bf Merge: f34abf47bf57 5327eef7b003 Author: Tobias Burnus Date: Tue Jun 10 21:56:49 2025 +0200 Merge branch 'releases/gcc-15' into devel/omp/gcc-15 Merge up to r15-9819-g5327eef7b003f6 (June 10, 2025) Diff: gcc/ChangeLog | 95 +++ gcc/DATESTAMP | 2 +- gcc/ada/ChangeLog | 112 +++ gcc/ada/checks.adb | 15 +- gcc/ada/contracts.adb | 103 +-- gcc/ada/einfo.ads | 2 +- gcc/ada/exp_aggr.adb | 16 +- gcc/ada/exp_attr.adb | 9 +- gcc/ada/exp_ch4.adb| 62 +- gcc/ada/exp_ch5.adb| 24 +- gcc/ada/exp_util.adb | 148 +++- gcc/ada/exp_util.ads | 18 +- gcc/ada/freeze.adb | 11 +- gcc/ada/libgnarl/s-stusta.adb | 5 +- gcc/ada/sem_case.adb | 8 +- gcc/ada/sem_ch10.adb | 2 + gcc/ada/sem_ch12.adb | 15 +- gcc/ada/sem_ch3.adb| 4 +- gcc/ada/sem_ch4.adb| 911 +++-- gcc/ada/sem_prag.adb | 9 +- gcc/cp/ChangeLog | 16 + gcc/cp/constexpr.cc| 3 +- gcc/cp/cp-gimplify.cc | 21 +- gcc/cp/decl2.cc| 33 +- gcc/ext-dce.cc | 17 +- gcc/testsuite/ChangeLog| 75 ++ gcc/testsuite/g++.dg/cpp1z/constexpr-if39.C| 30 + gcc/testsuite/g++.dg/cpp2a/constexpr-prvalue2.C| 26 + gcc/tree-vectorizer.h | 1 + libstdc++-v3/ChangeLog | 19 + libstdc++-v3/include/bits/chrono.h | 2 + libstdc++-v3/include/std/format| 11 +- .../testsuite/20_util/system_clock/99832.cc| 14 + .../testsuite/std/format/functions/format.cc | 10 + 34 files changed, 1414 insertions(+), 435 deletions(-)
[gcc/devel/omp/gcc-15] ChangeLog.omp bump
https://gcc.gnu.org/g:a6a5a2674c5c7f2ae64277c7f79a3b8c20a87fc6 commit a6a5a2674c5c7f2ae64277c7f79a3b8c20a87fc6 Author: Tobias Burnus Date: Tue Jun 10 21:57:52 2025 +0200 ChangeLog.omp bump Diff: --- gcc/ChangeLog.omp | 19 +++ gcc/DATESTAMP.omp | 2 +- include/ChangeLog.omp | 8 libgomp/ChangeLog.omp | 37 + 4 files changed, 65 insertions(+), 1 deletion(-) diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp index 6ac795bf4c33..9934978ef5b4 100644 --- a/gcc/ChangeLog.omp +++ b/gcc/ChangeLog.omp @@ -1,3 +1,22 @@ +2025-06-10 Tobias Burnus + + Backported from master: + 2025-06-10 Tobias Burnus + + * config/gcn/gcn-devices.def: Add gfx942, gfx950 and gfx9-4-generic. + * config/gcn/gcn-opts.h (TARGET_CDNA3, TARGET_CDNA3_PLUS, + TARGET_GLC_NAME, TARGET_TARGET_SC_CACHE): Define. + (TARGET_ARCHITECTED_FLAT_SCRATCH): Use also for CDNA3. + * config/gcn/gcn.h (gcn_isa): Add ISA_CDNA3 to the enum. + * config/gcn/gcn.cc (print_operand): Update 'g' to use + TARGET_GLC_NAME; add 'G' to print TARGET_GLC_NAME unconditionally. + * config/gcn/gcn-valu.md (scatter, gather): Use TARGET_GLC_NAME. + * config/gcn/gcn.md: Use %G instead of glc; use 'buffer_inv sc1' + for TARGET_TARGET_SC_CACHE. + * doc/invoke.texi (march): Add gfx942, gfx950 and gfx9-4-generic. + * doc/install.texi (amdgcn*-*-*): Add gfx942, gfx950 and gfx9-4-generic. + * config/gcn/gcn-tables.opt: Regenerate. + 2025-06-06 Tobias Burnus Backported from master: diff --git a/gcc/DATESTAMP.omp b/gcc/DATESTAMP.omp index c6de4e349988..52988ae3b03d 100644 --- a/gcc/DATESTAMP.omp +++ b/gcc/DATESTAMP.omp @@ -1 +1 @@ -20250606 +20250610 diff --git a/include/ChangeLog.omp b/include/ChangeLog.omp index 74413c262e62..7a8f2810132f 100644 --- a/include/ChangeLog.omp +++ b/include/ChangeLog.omp @@ -1,3 +1,11 @@ +2025-06-10 Tobias Burnus + + Backported from master: + 2025-06-02 Tobias Burnus + + PR libgomp/120444 + * cuda/cuda.h (cuMemsetD8, cuMemsetD8Async): Declare. 
+ 2025-05-15 Julian Brown * gomp-constants.h (gomp_map_kind): Add GOMP_MAP_TO_GRID, diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp index e25761590956..2bf31a9d2180 100644 --- a/libgomp/ChangeLog.omp +++ b/libgomp/ChangeLog.omp @@ -1,3 +1,40 @@ +2025-06-10 Tobias Burnus + + Backported from master: + 2025-06-10 Tobias Burnus + + * testsuite/libgomp.c/declare-variant-4.h (gfx942): New variant function. + * testsuite/libgomp.c/declare-variant-4-gfx942.c: New test. + +2025-06-10 Tobias Burnus + + Backported from master: + 2025-06-02 Tobias Burnus + + PR libgomp/120444 + * libgomp-plugin.h (GOMP_OFFLOAD_memset): Declare. + * libgomp.h (struct gomp_device_descr): Add memset_func. + * libgomp.map (GOMP_6.0.1): Add omp_target_memset{,_async}. + * libgomp.texi (Device Memory Routines): Document them. + * omp.h.in (omp_target_memset, omp_target_memset_async): Declare. + * omp_lib.f90.in (omp_target_memset, omp_target_memset_async): + Add interfaces. + * omp_lib.h.in (omp_target_memset, omp_target_memset_async): Likewise. + * plugin/cuda-lib.def: Add cuMemsetD8. + * plugin/plugin-gcn.c (struct hsa_runtime_fn_info): Add + hsa_amd_memory_fill_fn. + (init_hsa_runtime_functions): DLSYM_OPT_FN load it. + (GOMP_OFFLOAD_memset): New. + * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memset): New. + * target.c (omp_target_memset_int, omp_target_memset, + omp_target_memset_async_helper, omp_target_memset_async): New. + (gomp_load_plugin_for_device): Add DLSYM (memset). + * testsuite/libgomp.c-c++-common/omp_target_memset.c: New test. + * testsuite/libgomp.c-c++-common/omp_target_memset-2.c: New test. + * testsuite/libgomp.c-c++-common/omp_target_memset-3.c: New test. + * testsuite/libgomp.fortran/omp_target_memset.f90: New test. + * testsuite/libgomp.fortran/omp_target_memset-2.f90: New test. + 2025-06-06 Tobias Burnus Backported from master:
[gcc/devel/omp/gcc-15] (33 commits) ChangeLog.omp bump
The branch 'devel/omp/gcc-15' was updated to point to:

 a6a5a2674c5c... ChangeLog.omp bump

It previously pointed to:

 f34abf47bf57... ChangeLog.omp bump

Diff:

Summary of changes (added commits):
---

  a6a5a26... ChangeLog.omp bump
  5e75ec7... gcn: Add experimental MI300 (gfx942) support
  7704131... libgomp: Add OpenMP's omp_target_memset/omp_target_memset_a
  682e767... Merge branch 'releases/gcc-15' into devel/omp/gcc-15
  5327eef... libstdc++: Make system_clock::to_time_t always_inline [PR99 (*)
  974d59a... libstdc++: Fix std::format thousands separators when sign p (*)
  615a92a... vectorizer: Fix riscv build [PR120042] (*)
  a35f642... ada: Error on subtype with static predicate used in case_ex (*)
  e249cec... ada: Fix fallout of latest change (*)
  d02a2fe... ada: Fix wrong initialization of library-level object by co (*)
  4aca5bc... ada: Storage_Error on Ordered_Maps container aggregate with (*)
  8a4b72a... ada: Fix infinite loop with aggregate in generic unit (*)
  2859883... ada: Fix use-after-free in Compute_All_Tasks (*)
  ba729e2... ext-dce: Don't refine live width with SUBREG mode if !TRULY (*)
  62724ea... Daily bump. (*)
  e4940c0... c++: recursive template with deduced return [PR120555] (*)
  4e4684c... c++: constexpr prvalues vs genericize [PR120502] (*)
  d96603a... ada: Support fixed-lower-bound array types as generic actua (*)
  6cc5c01... ada: Reject component-related aspects on formal non-array t (*)
  2fd267b... ada: Fix glitch in handling of Atomic_Components on generic (*)
  f59c4d4... ada: Missing discriminant check on assignment of Bounded_Ve (*)
  e68026c... ada: Check validity using signedness from the type and not (*)
  8a63f6b... ada: Incorrect creation of corresponding expression of clas (*)
  823e973... ada: Fix spurious error on anonymous array initialized by c (*)
  c8934b1... Daily bump. (*)
  4caedcd... Daily bump. (*)
  69eb171... Daily bump. (*)
  f59d33a... ada: Constant_Indexing used when context requires a variabl (*)
  e0777e7... ada: Fix libgpr2 build failure with compiler built with ass (*)
  cb3e765... ada: Fix wrong initialization of library-level object by co (*)
  1189522... ada: Incorrect unresolved operator name in an instantiation (*)
  855fe36... ada: Fix internal error on allocator involving interface ty (*)
  649bde8... ada: Fix for validity checking of limited scalar types (*)

(*) This commit already exists in another branch. Because the reference `refs/heads/devel/omp/gcc-15' matches your hooks.email-new-commits-only configuration, no separate email is sent for this commit.
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:141ce9e514fba385685c8c46587719fc2ddf464e commit 141ce9e514fba385685c8c46587719fc2ddf464e Author: Michael Meissner Date: Tue Jun 10 16:05:20 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector/vector nor/and fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index 161419b7f586..ed15fccdf760 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -1949,20 +1949,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vnor -> vand (define_insn "*fuse_vnor_vand" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(and:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")) - (not:VM (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))) - (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(and:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")) + (not:VM (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))) + (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))) + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vnor %3,%1,%0\;vand %3,%3,%2 vnor %3,%1,%0\;vand %3,%3,%2 vnor %3,%1,%0\;vand %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,8 vnor %4,%1,%0\;vand %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by 
gen_logical_addsubf ;; vector vor -> vand diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 9d3a01a4704a..40d62ae8e9c1 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -219,6 +219,7 @@ sub gen_logical_addsubf "vandc_vand" => 2, "vxor_vand" => 6, "vor_vand"=> 7, + "vnor_vand" => 8, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:6ee9c4088f72ad5a26d56e8e405ddfcea3aba769 commit 6ee9c4088f72ad5a26d56e8e405ddfcea3aba769 Author: Michael Meissner Date: Tue Jun 10 16:02:26 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector/vector or/and fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index 6375cd3a8970..161419b7f586 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -1967,20 +1967,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vor -> vand (define_insn "*fuse_vor_vand" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(and:VM (ior:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v") - (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")) - (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(and:VM (ior:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v") + (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")) + (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))) + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vor %3,%1,%0\;vand %3,%3,%2 vor %3,%1,%0\;vand %3,%3,%2 vor %3,%1,%0\;vand %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,7 vor %4,%1,%0\;vand %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vorc -> vand diff 
--git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 2c631b944587..9d3a01a4704a 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -218,6 +218,7 @@ sub gen_logical_addsubf "vand_vand" => 1, "vandc_vand" => 2, "vxor_vand" => 6, + "vor_vand"=> 7, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:fd2acd0a00a7ce79e474e5cc55d071601c54d2db commit fd2acd0a00a7ce79e474e5cc55d071601c54d2db Author: Michael Meissner Date: Tue Jun 10 15:59:17 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector/vector xor/and fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index b9590b6d1104..6375cd3a8970 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -2003,20 +2003,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vxor -> vand (define_insn "*fuse_vxor_vand" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(and:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v") - (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")) - (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(and:VM (xor:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v") + (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")) + (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))) + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vxor %3,%1,%0\;vand %3,%3,%2 vxor %3,%1,%0\;vand %3,%3,%2 vxor %3,%1,%0\;vand %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,6 vxor %4,%1,%0\;vand %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vand -> vandc 
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 23adf98c4056..2c631b944587 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -217,6 +217,7 @@ sub gen_logical_addsubf my %xxeval_fusions = ( "vand_vand" => 1, "vandc_vand" => 2, + "vxor_vand" => 6, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:742340663eeb49c659420603e1d5a73579316004 commit 742340663eeb49c659420603e1d5a73579316004 Author: Michael Meissner Date: Tue Jun 10 15:45:21 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector/vector and/and fusion if XXEVAL is supported. * config/rs6000/predicates.md (vector_fusion_operand): New predicate. * config/rs6000/rs6000.h (TARGET_XXEVAL): New macro. * config/rs6000/rs6000.md (isa attribute): Add xxeval. (enabled attribute): Add support for XXEVAL support. Diff: --- gcc/config/rs6000/fusion.md | 15 ++- gcc/config/rs6000/genfusion.pl | 58 ++--- gcc/config/rs6000/predicates.md | 12 + gcc/config/rs6000/rs6000.h | 4 +++ gcc/config/rs6000/rs6000.md | 7 - 5 files changed, 85 insertions(+), 11 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index 621b346f9eb9..d24837d68d83 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -1871,20 +1871,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vand -> vand (define_insn "*fuse_vand_vand" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(and:VM (and:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v") - (match_operand:VM 1 "altivec_register_operand" "%v,v,v,v")) - (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(and:VM (and:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v") + (match_operand:VM 1 "vector_fusion_operand" "%v,v,v,wa,v")) + (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))) + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vand %3,%1,%0\;vand %3,%3,%2 vand %3,%1,%0\;vand %3,%3,%2 vand 
%3,%1,%0\;vand %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,1 vand %4,%1,%0\;vand %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vandc -> vand diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index e5d3b1ee449d..351a4d914a4a 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -211,25 +211,33 @@ sub gen_logical_addsubf $inner_comp, $inner_inv, $inner_rtl, $inner_op, $both_commute, $c4, $bc, $inner_arg0, $inner_arg1, $inner_exp, $outer_arg2, $outer_exp, $ftype, $insn, $is_subf, $is_rsubf, $outer_32, $outer_42,$outer_name, - $fuse_type); - KIND: foreach $kind ('scalar','vector') { + $fuse_type, $xxeval, $c5, $vect_pred, $vect_inner_arg0, $vect_inner_arg1, + $vect_inner_exp, $vect_outer_arg2, $vect_outer_exp); + +my %xxeval_fusions = ( + "vand_vand" => 1, +); + +KIND: foreach $kind ('scalar','vector') { @outer_ops = @logicals; if ( $kind eq 'vector' ) { $vchr = "v"; $mode = "VM"; $pred = "altivec_register_operand"; + $vect_pred = "vector_fusion_operand"; $constraint = "v"; $fuse_type = "fused_vector"; } else { $vchr = ""; $mode = "GPR"; - $pred = "gpc_reg_operand"; + $vect_pred = $pred = "gpc_reg_operand"; $constraint = "r"; $fuse_type = "fused_arith_logical"; push (@outer_ops, @addsub); push (@outer_ops, ( "rsubf" )); } $c4 = "${constraint},${constraint},${constraint},${constraint}"; + $c5 = "${constraint},${constraint},${constraint},wa,${constraint}"; OUTER: foreach $outer ( @outer_ops ) { $outer_name = "${vchr}${outer}"; $is_subf = ( $outer eq "subf" ); @@ -263,23 +271,33 @@ sub gen_logical_addsubf $bc = ""; if ( $both_commute ) { $bc = "%"; } $inner_arg0 = "(match_operand:${mode} 0 \"${pred}\" \"${c4}\")"; $inner_arg1 = "(match_operand:${mode} 1 \"${pred}\" \"${bc}${c4}\")"; + $vect_inner_arg0 = 
"(match_operand:${mode} 0 \"${vect_pred}\" \"${c5}\")"; + $vect_inner_arg1 = "(match_operand:${mode} 1 \"${vect_pred}\" \"${bc}${c5}\")"; if ( ($inner_comp & 1) == 1 ) { $inner_arg0 = "(not:${mode} $inner_arg0)"; + $vect_inner_arg0 = "(not:${mode} $vect_inner_arg0)"; } if ( ($inn
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:2f541cae69f67d9e61b6bafa6f4847fa27fdaf05 commit 2f541cae69f67d9e61b6bafa6f4847fa27fdaf05 Author: Michael Meissner Date: Tue Jun 10 18:24:33 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/testsuite/ PR target/117251 * gcc.target/powerpc/p10-vector-fused-1.c: New test. * gcc.target/powerpc/p10-vector-fused-2.c: Likewise. Diff: --- .../gcc.target/powerpc/p10-vector-fused-1.c| 409 + .../gcc.target/powerpc/p10-vector-fused-2.c| 936 + 2 files changed, 1345 insertions(+) diff --git a/gcc/testsuite/gcc.target/powerpc/p10-vector-fused-1.c b/gcc/testsuite/gcc.target/powerpc/p10-vector-fused-1.c new file mode 100644 index ..28e0874b3454 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/p10-vector-fused-1.c @@ -0,0 +1,409 @@ +/* { dg-do run } */ +/* { dg-require-effective-target power10_hw } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Generate and check most of the vector logical instruction combinations that + may or may not generate xxeval to do a fused operation on power10. */ + +#include +#include +#include + +#ifdef DEBUG +#include + +static int errors = 0; +static int tests = 0; +#endif + +typedef vector unsigned int vector_t; +typedef unsigned int scalar_t; + +/* Vector logical functions.
*/ +static inline vector_t +vector_and (vector_t x, vector_t y) +{ + return x & y; +} + +static inline vector_t +vector_or (vector_t x, vector_t y) +{ + return x | y; +} + +static inline vector_t +vector_xor (vector_t x, vector_t y) +{ + return x ^ y; +} + +static inline vector_t +vector_andc (vector_t x, vector_t y) +{ + return x & ~y; +} + +static inline vector_t +vector_orc (vector_t x, vector_t y) +{ + return x | ~y; +} + +static inline vector_t +vector_nand (vector_t x, vector_t y) +{ + return ~(x & y); +} + +static inline vector_t +vector_nor (vector_t x, vector_t y) +{ + return ~(x | y); +} + +static inline vector_t +vector_eqv (vector_t x, vector_t y) +{ + return ~(x ^ y); +} + +/* Scalar logical functions. */ +static inline scalar_t +scalar_and (scalar_t x, scalar_t y) +{ + return x & y; +} + +static inline scalar_t +scalar_or (scalar_t x, scalar_t y) +{ + return x | y; +} + +static inline scalar_t +scalar_xor (scalar_t x, scalar_t y) +{ + return x ^ y; +} + +static inline scalar_t +scalar_andc (scalar_t x, scalar_t y) +{ + return x & ~y; +} + +static inline scalar_t +scalar_orc (scalar_t x, scalar_t y) +{ + return x | ~y; +} + +static inline scalar_t +scalar_nand (scalar_t x, scalar_t y) +{ + return ~(x & y); +} + +static inline scalar_t +scalar_nor (scalar_t x, scalar_t y) +{ + return ~(x | y); +} + +static inline scalar_t +scalar_eqv (scalar_t x, scalar_t y) +{ + return ~(x ^ y); +} + + +/* + * Generate one function for each combination that we are checking. Do 4 + * operations: + * + * Use FPR regs that should generate either XXEVAL or XXL* insns; + * Use Altivec registers than may generated fused V* insns; + * Use VSX registers, insure fusing it not done via asm; (and) + * Use GPR registers on scalar operations. 
+ */ + +#ifdef DEBUG +#define TRACE(INNER, OUTER)\ + do { \ +tests++; \ +printf ("%s_%s\n", INNER, OUTER); \ +fflush (stdout); \ + } while (0) \ + +#define FAILED(INNER, OUTER) \ + do { \ +errors++; \ +printf ("%s_%s failed\n", INNER, OUTER); \ +fflush (stdout); \ + } while (0) \ + +#else +#define TRACE(INNER, OUTER) +#define FAILED(INNER, OUTER) abort () +#endif + +#define FUSED_FUNC(INNER, OUTER) \ +static void\ +INNER ## _ ## OUTER (vector_t a, vector_t b, vector_t c) \ +{ \ + vector_t f_a, f_b, f_c, f_r, f_t;\ + vector_t v_a, v_b, v_c, v_r, v_t;\ + vector_t w_a, w_b, w_c, w_r, w_t;\ + scalar_t s_a, s_b, s_c, s_r, s_t;\ + \ + TRACE (#INNER, #OUTER); \ +
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:a5bdfca7a952be8e1dd86f5e5a026197bf08fd98

commit a5bdfca7a952be8e1dd86f5e5a026197bf08fd98
Author: Michael Meissner
Date:   Tue Jun 10 16:34:00 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector andc => xor fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index fccea39d0aae..6e5c88b81b44 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2933,20 +2933,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vxor
 (define_insn "*fuse_vandc_vxor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (xor:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (xor:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vandc %3,%1,%0\;vxor %3,%3,%2
    vandc %3,%1,%0\;vxor %3,%3,%2
    vandc %3,%1,%0\;vxor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,45
    vandc %4,%1,%0\;vxor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vxor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index ab714b10f622..d15208a4ad3e 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -227,6 +227,7 @@ sub gen_logical_addsubf
       "vnand_vnor" => 16,
       "vand_vxor" => 30,
       "vand_vor"=> 31,
+      "vandc_vxor" => 45,
     );

     KIND: foreach $kind ('scalar','vector') {
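
The immediate in the new `xxeval` alternative can be read as an 8-entry truth table. The sketch below is not part of the patch; it assumes the usual XXEVAL convention (A is the first source operand, and the most significant immediate bit corresponds to A=B=C=0) and takes the operand order from the `xxeval %x3,%x2,%x1,%x0` template above, so A is operand 2, B is operand 1 and C is operand 0. The helper name `xxeval_imm` is hypothetical.

```python
def xxeval_imm(fn):
    """Build the 8-bit immediate for a 3-input boolean function fn(a, b, c).

    Assumption: the input combination (a, b, c) selects table entry
    index = a*4 + b*2 + c, and entry 0 is the most significant bit.
    """
    imm = 0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                index = (a << 2) | (b << 1) | c
                if fn(a, b, c) & 1:
                    imm |= 1 << (7 - index)
    return imm


# vandc -> vxor computes (op1 & ~op0) ^ op2, with a=op2, b=op1, c=op0.
vandc_vxor = xxeval_imm(lambda a, b, c: (b & ~c) ^ a)
print(vandc_vxor)  # 45, the value added to %xxeval_fusions
```

Under these assumptions the computed value agrees with the `"vandc_vxor" => 45` entry in the genfusion.pl hunk.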
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:a033ea0c0d3822b7b2f43b244d17ae3c04173e61

commit a033ea0c0d3822b7b2f43b244d17ae3c04173e61
Author: Michael Meissner
Date:   Tue Jun 10 18:19:39 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector and => nand fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 129f7dfb26ed..61d66129da65 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2336,20 +2336,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vand -> vnand
 (define_insn "*fuse_vand_vnand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (ior:VM (not:VM (and:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
-                 (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (ior:VM (not:VM (and:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")))
+                 (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vand %3,%1,%0\;vnand %3,%3,%2
    vand %3,%1,%0\;vnand %3,%3,%2
    vand %3,%1,%0\;vnand %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,254
    vand %4,%1,%0\;vnand %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vnand
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 1d31c242042e..9261dd369340 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -258,6 +258,7 @@ sub gen_logical_addsubf
       "vor_vnand" => 248,
       "vxor_vnand" => 249,
       "vandc_vnand" => 253,
+      "vand_vnand" => 254,
     );

     KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:303ab25c03790bd8cc2be68240371e41e5e75c7f

commit 303ab25c03790bd8cc2be68240371e41e5e75c7f
Author: Michael Meissner
Date:   Tue Jun 10 17:36:30 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector orc => or fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 3d7e6502b027..f6dc26e9c1f2 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2762,20 +2762,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vorc -> vor
 (define_insn "*fuse_vorc_vor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (ior:VM (ior:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (ior:VM (ior:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vorc %3,%1,%0\;vor %3,%3,%2
    vorc %3,%1,%0\;vor %3,%3,%2
    vorc %3,%1,%0\;vor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,191
    vorc %4,%1,%0\;vor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vxor -> vor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 15f931baad33..62f2b9e36d89 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -245,6 +245,7 @@ sub gen_logical_addsubf
       "veqv_vxor" => 150,
       "veqv_vor"=> 159,
       "vorc_vxor" => 180,
+      "vorc_vor"=> 191,
     );

     KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:8228e5b4ab9a58e16b692a4f4ca4fa83c688c222

commit 8228e5b4ab9a58e16b692a4f4ca4fa83c688c222
Author: Michael Meissner
Date:   Tue Jun 10 17:45:02 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector andc => eqv fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index dd8401d48228..e3d9f7376a8d 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2204,20 +2204,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> veqv
 (define_insn "*fuse_vandc_veqv"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (not:VM (xor:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (not:VM (xor:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vandc %3,%1,%0\;veqv %3,%3,%2
    vandc %3,%1,%0\;veqv %3,%3,%2
    vandc %3,%1,%0\;veqv %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,210
    vandc %4,%1,%0\;veqv %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> veqv
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index d89e78d4da03..3a603eb09675 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -247,6 +247,7 @@ sub gen_logical_addsubf
       "vorc_vxor" => 180,
       "vorc_vor"=> 191,
       "vandc_vnor" => 208,
+      "vandc_veqv" => 210,
     );

     KIND: foreach $kind ('scalar','vector') {
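
One way to check that the single `xxeval` alternative computes the same value as the two-instruction `vandc`/`veqv` sequence is to model both on plain integers. The following is only an illustrative model, not GCC or hardware code: 128-bit registers are represented as Python ints, the truth-table bit order (most significant immediate bit for A=B=C=0) is an assumption, and the function names are hypothetical.

```python
import random

MASK = (1 << 128) - 1  # model a 128-bit vector register as a Python int


def xxeval(a, b, c, imm):
    """Bitwise ternary logic: imm is an 8-entry truth table indexed by (a, b, c)."""
    result = 0
    for index in range(8):
        if (imm >> (7 - index)) & 1:
            # Select the bits where (a, b, c) equals this table index.
            ta = a if index & 4 else ~a
            tb = b if index & 2 else ~b
            tc = c if index & 1 else ~c
            result |= ta & tb & tc
    return result & MASK


def vandc_veqv(op0, op1, op2):
    """Two-instruction form: vandc %3,%1,%0 followed by veqv %3,%3,%2."""
    tmp = op1 & ~op0
    return ~(tmp ^ op2) & MASK


rng = random.Random(0)
for _ in range(1000):
    op0, op1, op2 = (rng.getrandbits(128) for _ in range(3))
    # Operand order follows the template: xxeval %x3,%x2,%x1,%x0,210
    assert xxeval(op2, op1, op0, 210) == vandc_veqv(op0, op1, op2)
```

Because `veqv` is the complement of `vxor`, the immediate here is also the bitwise complement of the `vandc_vxor` entry: 255 - 45 = 210.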
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:2b8a8f1a9e461de365f0de90374398833adb8b74

commit 2b8a8f1a9e461de365f0de90374398833adb8b74
Author: Michael Meissner
Date:   Tue Jun 10 17:55:21 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector nand => or fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index e6d13b38415a..ba3a5a52b990 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2711,20 +2711,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnand -> vor
 (define_insn "*fuse_vnand_vor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (ior:VM (ior:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (not:VM (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (ior:VM (ior:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (not:VM (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vnand %3,%1,%0\;vor %3,%3,%2
    vnand %3,%1,%0\;vor %3,%3,%2
    vnand %3,%1,%0\;vor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,239
    vnand %4,%1,%0\;vor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnor -> vor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 94eae471c64b..54699d199fc5 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -250,6 +250,7 @@ sub gen_logical_addsubf
       "vandc_veqv" => 210,
       "vand_vnor" => 224,
       "vnand_vxor" => 225,
+      "vnand_vor" => 239,
     );

     KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:1b6089173e851ffae80699d949666f4c27032e1c

commit 1b6089173e851ffae80699d949666f4c27032e1c
Author: Michael Meissner
Date:   Tue Jun 10 16:20:16 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector/vector nand/nor fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index f70422616ffd..c8a27a9e5471 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2528,20 +2528,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnand -> vnor
 (define_insn "*fuse_vnand_vnor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (and:VM (not:VM (ior:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (not:VM (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))))
-                 (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (and:VM (not:VM (ior:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (not:VM (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))))
+                 (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vnand %3,%1,%0\;vnor %3,%3,%2
    vnand %3,%1,%0\;vnor %3,%3,%2
    vnand %3,%1,%0\;vnor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,16
    vnand %4,%1,%0\;vnor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnor -> vnor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 5beabe530a67..078bc6ca0dab 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -224,6 +224,7 @@ sub gen_logical_addsubf
       "vorc_vand" => 11,
       "vandc_vandc" => 13,
       "vnand_vand" => 14,
+      "vnand_vnor" => 16,
     );

     KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:50059b8faffe92785770f269feca74119ddbd249

commit 50059b8faffe92785770f269feca74119ddbd249
Author: Michael Meissner
Date:   Tue Jun 10 16:41:51 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector orc => eqv fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index f45e65f0217c..f84d0aee5d79 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2294,20 +2294,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vorc -> veqv
 (define_insn "*fuse_vorc_veqv"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (not:VM (xor:VM (ior:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (not:VM (xor:VM (ior:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vorc %3,%1,%0\;veqv %3,%3,%2
    vorc %3,%1,%0\;veqv %3,%3,%2
    vorc %3,%1,%0\;veqv %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,75
    vorc %4,%1,%0\;veqv %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vxor -> veqv
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 720e8d440c2d..8ba1aa081f75 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -230,6 +230,7 @@ sub gen_logical_addsubf
       "vandc_vxor" => 45,
       "vandc_vor" => 47,
       "vorc_vnor" => 64,
+      "vorc_veqv" => 75,
     );

     KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:ae5fc7bc755592194e73535df4f28dddf58cf517

commit ae5fc7bc755592194e73535df4f28dddf58cf517
Author: Michael Meissner
Date:   Tue Jun 10 18:10:38 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector or => nand fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 01b7fda17ecc..39b586918c17 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2435,20 +2435,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vor -> vnand
 (define_insn "*fuse_vor_vnand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (ior:VM (not:VM (ior:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
-                 (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (ior:VM (not:VM (ior:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")))
+                 (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vor %3,%1,%0\;vnand %3,%3,%2
    vor %3,%1,%0\;vnand %3,%3,%2
    vor %3,%1,%0\;vnand %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,248
    vor %4,%1,%0\;vnand %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vorc -> vnand
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index d4965b6df864..86bca81286ca 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -255,6 +255,7 @@ sub gen_logical_addsubf
       "vorc_vnand" => 244,
       "veqv_vnand" => 246,
       "vnor_vnand" => 247,
+      "vor_vnand" => 248,
     );

     KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:b144edb81dc7fac54df459d2a5dfbd36ef0daf51

commit b144edb81dc7fac54df459d2a5dfbd36ef0daf51
Author: Michael Meissner
Date:   Tue Jun 10 18:13:21 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector xor => nand fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 39b586918c17..e0f9ac17659a 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2477,20 +2477,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vxor -> vnand
 (define_insn "*fuse_vxor_vnand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (ior:VM (not:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
-                 (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (ior:VM (not:VM (xor:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")))
+                 (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vxor %3,%1,%0\;vnand %3,%3,%2
    vxor %3,%1,%0\;vnand %3,%3,%2
    vxor %3,%1,%0\;vnand %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,249
    vxor %4,%1,%0\;vnand %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vand -> vnor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 86bca81286ca..5d22a0732df6 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -256,6 +256,7 @@ sub gen_logical_addsubf
       "veqv_vnand" => 246,
       "vnor_vnand" => 247,
       "vor_vnand" => 248,
+      "vxor_vnand" => 249,
     );

     KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:fa43d1cdb8be87fe0f2e6259ce17d8803806115a

commit fa43d1cdb8be87fe0f2e6259ce17d8803806115a
Author: Michael Meissner
Date:   Tue Jun 10 16:55:47 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector nor => nor fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 1d4b3c970c7f..032c87ac5765 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2555,20 +2555,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnor -> vnor
 (define_insn "*fuse_vnor_vnor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (and:VM (not:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (not:VM (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))))
-                 (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (and:VM (not:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (not:VM (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))))
+                 (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vnor %3,%1,%0\;vnor %3,%3,%2
    vnor %3,%1,%0\;vnor %3,%3,%2
    vnor %3,%1,%0\;vnor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,112
    vnor %4,%1,%0\;vnor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vor -> vnor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 4ec38beccb9c..6af4c5d7a182 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -235,6 +235,7 @@ sub gen_logical_addsubf
       "veqv_vnor" => 96,
       "vxor_vxor" => 105,
       "vxor_vor"=> 111,
+      "vnor_vnor" => 112,
     );

     KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:c58f75947d43a31a787e4028e15e37f058fa121c

commit c58f75947d43a31a787e4028e15e37f058fa121c
Author: Michael Meissner
Date:   Tue Jun 10 18:16:42 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector andc => nand fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index e0f9ac17659a..129f7dfb26ed 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2354,20 +2354,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vnand
 (define_insn "*fuse_vandc_vnand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (ior:VM (not:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
-                 (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (ior:VM (not:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")))
+                 (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vandc %3,%1,%0\;vnand %3,%3,%2
    vandc %3,%1,%0\;vnand %3,%3,%2
    vandc %3,%1,%0\;vnand %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,253
    vandc %4,%1,%0\;vnand %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vnand
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 5d22a0732df6..1d31c242042e 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -257,6 +257,7 @@ sub gen_logical_addsubf
       "vnor_vnand" => 247,
       "vor_vnand" => 248,
       "vxor_vnand" => 249,
+      "vandc_vnand" => 253,
     );

     KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:c25a0ce44f75ec67528a2d855ef5229abfe4cf6d

commit c25a0ce44f75ec67528a2d855ef5229abfe4cf6d
Author: Michael Meissner
Date:   Tue Jun 10 15:56:02 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector/vector andc/and fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d24837d68d83..b9590b6d1104 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -1892,20 +1892,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vand
 (define_insn "*fuse_vandc_vand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (and:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (and:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vandc %3,%1,%0\;vand %3,%3,%2
    vandc %3,%1,%0\;vand %3,%3,%2
    vandc %3,%1,%0\;vand %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,2
    vandc %4,%1,%0\;vand %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vand
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 351a4d914a4a..23adf98c4056 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -216,6 +216,7 @@ sub gen_logical_addsubf
     my %xxeval_fusions = (
       "vand_vand" => 1,
+      "vandc_vand" => 2,
     );

     KIND: foreach $kind ('scalar','vector') {
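
To read an entry of `%xxeval_fusions` back as a boolean expression, the immediate can be expanded into its minterms. This is a small illustrative helper, not code from genfusion.pl; the name `minterms` is hypothetical, and it assumes the most significant immediate bit corresponds to A=B=C=0, with A, B, C being the three source operands of `xxeval` in order (operands 2, 1 and 0 in the template above).

```python
def minterms(imm):
    """List the (A, B, C) product terms selected by an 8-bit xxeval immediate."""
    names = ("A", "B", "C")
    terms = []
    for index in range(8):
        if (imm >> (7 - index)) & 1:
            bits = ((index >> 2) & 1, (index >> 1) & 1, index & 1)
            terms.append(
                " & ".join(n if bit else "~" + n for n, bit in zip(names, bits))
            )
    return terms


print(minterms(1))  # ['A & B & C']
print(minterms(2))  # ['A & B & ~C']
```

Read this way, `"vandc_vand" => 2` is op2 & op1 & ~op0, which matches the `(and (and (not op0) op1) op2)` RTL of `*fuse_vandc_vand`.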
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:8bfffbaaf94907d12c535486cd9d8a3a53a1aea9

commit 8bfffbaaf94907d12c535486cd9d8a3a53a1aea9
Author: Michael Meissner
Date:   Tue Jun 10 16:14:34 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector/vector andc/andc fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index e27f05f85f12..810d97963fb9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2054,20 +2054,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vandc
 (define_insn "*fuse_vandc_vandc"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (and:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
-                 (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (and:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
+                 (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vandc %3,%1,%0\;vandc %3,%3,%2
    vandc %3,%1,%0\;vandc %3,%3,%2
    vandc %3,%1,%0\;vandc %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,13
    vandc %4,%1,%0\;vandc %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])

 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vandc
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index a3cc8b121eab..929257d6c03e 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -222,6 +222,7 @@ sub gen_logical_addsubf
       "vnor_vand" => 8,
       "veqv_vand" => 9,
       "vorc_vand" => 11,
+      "vandc_vandc" => 13,
     );

     KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:728f505faf021f60c6157433af58bf1c7499

commit 728f505faf021f60c6157433af58bf1c7499
Author: Michael Meissner
Date:   Tue Jun 10 16:23:42 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector/vector and/xor fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index c8a27a9e5471..789a4d592419 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2909,20 +2909,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vand -> vxor
 (define_insn "*fuse_vand_vxor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (xor:VM (and:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (xor:VM (and:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vand %3,%1,%0\;vxor %3,%3,%2
    vand %3,%1,%0\;vxor %3,%3,%2
    vand %3,%1,%0\;vxor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,30
    vand %4,%1,%0\;vxor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vxor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 078bc6ca0dab..e6d44d430b3a 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -225,6 +225,7 @@ sub gen_logical_addsubf
       "vandc_vandc" => 13,
       "vnand_vand" => 14,
       "vnand_vnor" => 16,
+      "vand_vxor" => 30,
   );
 
   KIND: foreach $kind ('scalar','vector') {
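Note (illustrative, not part of the commit): the constant 30 added for "vand_vxor" can be read as an 8-bit truth table. The sketch below assumes the XXEVAL convention that, for input bits A, B and C, the result is bit number 4*A + 2*B + C of the immediate, counted from the most significant bit, and that the operands %x2, %x1, %x0 of the new alternative map to A, B, C. The helper name xxeval_imm is hypothetical.

```python
def xxeval_imm(func):
    """Build the 8-bit XXEVAL immediate for a 3-input boolean function.

    Assumption: immediate bit (4*A + 2*B + C), numbered from the most
    significant bit, holds the result for input bits A, B, C.
    """
    imm = 0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                index = 4 * a + 2 * b + c
                if func(a, b, c) & 1:
                    imm |= 1 << (7 - index)
    return imm


# vand %3,%1,%0 ; vxor %3,%3,%2 computes (op1 & op0) ^ op2.
# With A = op2, B = op1, C = op0 this is (B & C) ^ A.
vand_vxor_imm = xxeval_imm(lambda a, b, c: (b & c) ^ a)
print(vand_vxor_imm)  # 30, matching the "vand_vxor" table entry
```

Under the same assumptions, any other entry in the table can be checked by swapping in the boolean expression of the corresponding two-instruction sequence.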
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:4a0c4551f0f9df5a312141b8a5bff58645f5115f

commit 4a0c4551f0f9df5a312141b8a5bff58645f5115f
Author: Michael Meissner
Date:   Tue Jun 10 16:11:38 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector/vector orc/and fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index cce179e0c974..e27f05f85f12 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -1994,20 +1994,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vorc -> vand
 (define_insn "*fuse_vorc_vand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (and:VM (ior:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (and:VM (ior:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vorc %3,%1,%0\;vand %3,%3,%2
    vorc %3,%1,%0\;vand %3,%3,%2
    vorc %3,%1,%0\;vand %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,11
    vorc %4,%1,%0\;vand %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vxor -> vand
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 268b94089484..a3cc8b121eab 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -221,6 +221,7 @@ sub gen_logical_addsubf
       "vor_vand" => 7,
       "vnor_vand" => 8,
       "veqv_vand" => 9,
+      "vorc_vand" => 11,
   );
 
   KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:a75dd22985863e77b051d70f250e939214ba32dc

commit a75dd22985863e77b051d70f250e939214ba32dc
Author: Michael Meissner
Date:   Tue Jun 10 16:07:51 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector/vector nor/and fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index ed15fccdf760..cce179e0c974 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -1913,20 +1913,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vand
 (define_insn "*fuse_veqv_vand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (and:VM (not:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (and:VM (not:VM (xor:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    veqv %3,%1,%0\;vand %3,%3,%2
    veqv %3,%1,%0\;vand %3,%3,%2
    veqv %3,%1,%0\;vand %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,9
    veqv %4,%1,%0\;vand %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnand -> vand
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 40d62ae8e9c1..268b94089484 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -220,6 +220,7 @@ sub gen_logical_addsubf
       "vxor_vand" => 6,
       "vor_vand" => 7,
       "vnor_vand" => 8,
+      "veqv_vand" => 9,
   );
 
   KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:31416ca5a3c12ca137ce74887f2e6bcaed62e4ca

commit 31416ca5a3c12ca137ce74887f2e6bcaed62e4ca
Author: Michael Meissner
Date:   Tue Jun 10 16:39:17 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector orc => nor fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index ed70ac059dfc..f45e65f0217c 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2585,20 +2585,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vorc -> vnor
 (define_insn "*fuse_vorc_vnor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (and:VM (not:VM (ior:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
-                 (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (and:VM (not:VM (ior:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")))
+                 (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vorc %3,%1,%0\;vnor %3,%3,%2
    vorc %3,%1,%0\;vnor %3,%3,%2
    vorc %3,%1,%0\;vnor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,64
    vorc %4,%1,%0\;vnor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vxor -> vnor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 69fa544f0317..720e8d440c2d 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -229,6 +229,7 @@ sub gen_logical_addsubf
       "vand_vor" => 31,
       "vandc_vxor" => 45,
       "vandc_vor" => 47,
+      "vorc_vnor" => 64,
   );
 
   KIND: foreach $kind ('scalar','vector') {
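Note (illustrative, not part of the commit): the new fourth alternative replaces the two-instruction vorc/vnor sequence with a single xxeval using immediate 64. The sketch below emulates both forms on 128-bit Python integers and checks that they agree. It assumes the truth-table convention that the result for input bits A, B, C is immediate bit 4*A + 2*B + C counted from the most significant bit; the function names are hypothetical.

```python
import random

MASK128 = (1 << 128) - 1  # one 128-bit vector register


def xxeval_emulate(a, b, c, imm):
    """Bitwise emulation of a three-input truth-table instruction.

    Assumption: for each bit position, the result is immediate bit
    (4*A + 2*B + C), numbered from the most significant bit.
    """
    result = 0
    for index in range(8):
        if not (imm >> (7 - index)) & 1:
            continue
        # Select the bit positions where (A, B, C) equals this index.
        term = MASK128
        term &= a if index & 4 else ~a
        term &= b if index & 2 else ~b
        term &= c if index & 1 else ~c
        result |= term
    return result & MASK128


def vorc_then_vnor(op0, op1, op2):
    """Two-instruction form: vorc t,op1,op0 ; vnor d,t,op2."""
    t = (op1 | ~op0) & MASK128
    return ~(t | op2) & MASK128


rng = random.Random(117251)
for _ in range(1000):
    op0, op1, op2 = (rng.getrandbits(128) for _ in range(3))
    # Operand order follows the pattern: xxeval %x3,%x2,%x1,%x0,64
    assert xxeval_emulate(op2, op1, op0, 64) == vorc_then_vnor(op0, op1, op2)
```

Random testing is used here only for brevity; checking the eight input combinations bit by bit would be exhaustive.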
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:14010241a92479cbcf0bf6cc5a584065821c9173

commit 14010241a92479cbcf0bf6cc5a584065821c9173
Author: Michael Meissner
Date:   Tue Jun 10 16:17:16 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector/vector nand/and fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 810d97963fb9..f70422616ffd 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -1934,20 +1934,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnand -> vand
 (define_insn "*fuse_vnand_vand"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (and:VM (ior:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (not:VM (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (and:VM (ior:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (not:VM (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vnand %3,%1,%0\;vand %3,%3,%2
    vnand %3,%1,%0\;vand %3,%3,%2
    vnand %3,%1,%0\;vand %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,14
    vnand %4,%1,%0\;vand %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnor -> vand
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 929257d6c03e..5beabe530a67 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -223,6 +223,7 @@ sub gen_logical_addsubf
       "veqv_vand" => 9,
       "vorc_vand" => 11,
       "vandc_vandc" => 13,
+      "vnand_vand" => 14,
   );
 
   KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:5e246c3eea3303bbc3be494765009cc443eda276

commit 5e246c3eea3303bbc3be494765009cc443eda276
Author: Michael Meissner
Date:   Tue Jun 10 16:36:08 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector andc => or fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 6e5c88b81b44..ed70ac059dfc 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2642,20 +2642,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vor
 (define_insn "*fuse_vandc_vor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (ior:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (ior:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vandc %3,%1,%0\;vor %3,%3,%2
    vandc %3,%1,%0\;vor %3,%3,%2
    vandc %3,%1,%0\;vor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,47
    vandc %4,%1,%0\;vor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index d15208a4ad3e..69fa544f0317 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -228,6 +228,7 @@ sub gen_logical_addsubf
       "vand_vxor" => 30,
       "vand_vor" => 31,
       "vandc_vxor" => 45,
+      "vandc_vor" => 47,
   );
 
   KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:a59034f2a6084d924133668e58db1b2bba2e6a43

commit a59034f2a6084d924133668e58db1b2bba2e6a43
Author: Michael Meissner
Date:   Tue Jun 10 17:20:17 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector nor => or fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 1f1756dbe63e..66d98f4537e1 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2714,20 +2714,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnor -> vor
 (define_insn "*fuse_vnor_vor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (ior:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (not:VM (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (ior:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (not:VM (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vnor %3,%1,%0\;vor %3,%3,%2
    vnor %3,%1,%0\;vor %3,%3,%2
    vnor %3,%1,%0\;vor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,143
    vnor %4,%1,%0\;vor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vor -> vor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 0fea2d6d8482..98b56b788f03 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -240,6 +240,7 @@ sub gen_logical_addsubf
       "vor_vor" => 127,
       "vor_vnor" => 128,
       "vnor_vxor" => 135,
+      "vnor_vor" => 143,
   );
 
   KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:b3706b848edd2b31abe5ef2698392805303548d2

commit b3706b848edd2b31abe5ef2698392805303548d2
Author: Michael Meissner
Date:   Tue Jun 10 16:50:19 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector xor => xor fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index e5099178d63d..a848b21bc3e2 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -3059,20 +3059,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vxor -> vxor
 (define_insn "*fuse_vxor_vxor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (xor:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-                          (match_operand:VM 1 "altivec_register_operand" "%v,v,v,v"))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (xor:VM (xor:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
+                          (match_operand:VM 1 "vector_fusion_operand" "%v,v,v,wa,v"))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vxor %3,%1,%0\;vxor %3,%3,%2
    vxor %3,%1,%0\;vxor %3,%3,%2
    vxor %3,%1,%0\;vxor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,105
    vxor %4,%1,%0\;vxor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; add-add fusion pattern generated by gen_addadd
 (define_insn "*fuse_add_add"
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 79d9eaed7da6..b9ff6c99b95e 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -233,6 +233,7 @@ sub gen_logical_addsubf
       "vorc_veqv" => 75,
       "vorc_vorc" => 79,
       "veqv_vnor" => 96,
+      "vxor_vxor" => 105,
   );
 
   KIND: foreach $kind ('scalar','vector') {
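Note (illustrative, not part of the commit): the commit title ties this work to SHA3. One place a three-input XOR is useful in Keccak is the theta step, which XORs the five 64-bit lanes of each column. With a three-input XOR, the fold takes two operations instead of four. The sketch below only demonstrates that arithmetic; the names are hypothetical, and it makes no claim about the code GCC actually generates for any particular SHA3 implementation.

```python
from functools import reduce
from operator import xor

LANE_MASK = (1 << 64) - 1  # Keccak-f[1600] uses 64-bit lanes


def xor3(a, b, c):
    """Three-input XOR, the function of the vxor -> vxor fused pair."""
    return (a ^ b ^ c) & LANE_MASK


def column_parity(lanes):
    """XOR five lanes of one column using two three-input XORs."""
    l0, l1, l2, l3, l4 = lanes
    return xor3(xor3(l0, l1, l2), l3, l4)


column = [0x0123456789ABCDEF, 0xFEDCBA9876543210,
          0x0F0F0F0F0F0F0F0F, 0x00FF00FF00FF00FF,
          0xAAAAAAAA55555555]
# The two-step fold must agree with a plain left-to-right XOR.
assert column_parity(column) == reduce(xor, column)
```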
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:20db16cfc7f50b4f218466526dff7a6d01c85cba

commit 20db16cfc7f50b4f218466526dff7a6d01c85cba
Author: Michael Meissner
Date:   Tue Jun 10 17:28:09 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector eqv => or fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index bb62ae26445a..cb1ad8b4c0cc 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2681,20 +2681,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vor
 (define_insn "*fuse_veqv_vor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (ior:VM (not:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (ior:VM (not:VM (xor:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    veqv %3,%1,%0\;vor %3,%3,%2
    veqv %3,%1,%0\;vor %3,%3,%2
    veqv %3,%1,%0\;vor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,159
    veqv %4,%1,%0\;vor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnand -> vor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 726e29c798bc..9400aed267a6 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -243,6 +243,7 @@ sub gen_logical_addsubf
       "vnor_vor" => 143,
       "vxor_vnor" => 144,
       "veqv_vxor" => 150,
+      "veqv_vor" => 159,
   );
 
   KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:9a5a4fedf0d84ea698e1e31304cf6a4d8652051c

commit 9a5a4fedf0d84ea698e1e31304cf6a4d8652051c
Author: Michael Meissner
Date:   Tue Jun 10 17:00:25 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector or => or fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d1f6a38b618a..c2a2ebf4bfaf 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2729,20 +2729,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vor -> vor
 (define_insn "*fuse_vor_vor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (ior:VM (ior:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-                          (match_operand:VM 1 "altivec_register_operand" "%v,v,v,v"))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (ior:VM (ior:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
+                          (match_operand:VM 1 "vector_fusion_operand" "%v,v,v,wa,v"))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vor %3,%1,%0\;vor %3,%3,%2
    vor %3,%1,%0\;vor %3,%3,%2
    vor %3,%1,%0\;vor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,127
    vor %4,%1,%0\;vor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vorc -> vor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 97681f37d0fa..9df4c8d6527e 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -237,6 +237,7 @@ sub gen_logical_addsubf
       "vxor_vor" => 111,
       "vnor_vnor" => 112,
       "vor_vxor" => 120,
+      "vor_vor" => 127,
   );
 
   KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:0d518163697d4fc03874268389223ad567abf13a

commit 0d518163697d4fc03874268389223ad567abf13a
Author: Michael Meissner
Date:   Tue Jun 10 16:47:39 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector eqv => nor fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 486aa813575d..e5099178d63d 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2513,20 +2513,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector veqv -> vnor
 (define_insn "*fuse_veqv_vnor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (and:VM (not:VM (not:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))))
-                 (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (and:VM (not:VM (not:VM (xor:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))))
+                 (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    veqv %3,%1,%0\;vnor %3,%3,%2
    veqv %3,%1,%0\;vnor %3,%3,%2
    veqv %3,%1,%0\;vnor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,96
    veqv %4,%1,%0\;vnor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnand -> vnor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 8f60fe76c87b..79d9eaed7da6 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -232,6 +232,7 @@ sub gen_logical_addsubf
       "vorc_vnor" => 64,
       "vorc_veqv" => 75,
       "vorc_vorc" => 79,
+      "veqv_vnor" => 96,
   );
 
   KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:0dfbbfadd4b128ca440e3b38c0abbb0a6f1bf35e

commit 0dfbbfadd4b128ca440e3b38c0abbb0a6f1bf35e
Author: Michael Meissner
Date:   Tue Jun 10 16:58:06 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector or => xor fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 032c87ac5765..d1f6a38b618a 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -3029,20 +3029,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vor -> vxor
 (define_insn "*fuse_vor_vxor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (xor:VM (ior:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (xor:VM (ior:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vor %3,%1,%0\;vxor %3,%3,%2
    vor %3,%1,%0\;vxor %3,%3,%2
    vor %3,%1,%0\;vxor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,120
    vor %4,%1,%0\;vxor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vorc -> vxor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 6af4c5d7a182..97681f37d0fa 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -236,6 +236,7 @@ sub gen_logical_addsubf
       "vxor_vxor" => 105,
       "vxor_vor" => 111,
       "vnor_vnor" => 112,
+      "vor_vxor" => 120,
   );
 
   KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:29fb0c6566c9ef1f149d1084dc2581c872649b12

commit 29fb0c6566c9ef1f149d1084dc2581c872649b12
Author: Michael Meissner
Date:   Tue Jun 10 17:17:48 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector nor => xor fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index c55e9d4abd67..1f1756dbe63e 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -3017,20 +3017,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vnor -> vxor
 (define_insn "*fuse_vnor_vxor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (xor:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v"))
-                          (not:VM (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (xor:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v"))
+                          (not:VM (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vnor %3,%1,%0\;vxor %3,%3,%2
    vnor %3,%1,%0\;vxor %3,%3,%2
    vnor %3,%1,%0\;vxor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,135
    vnor %4,%1,%0\;vxor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vor -> vxor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 58f900640bef..0fea2d6d8482 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -239,6 +239,7 @@ sub gen_logical_addsubf
       "vor_vxor" => 120,
       "vor_vor" => 127,
       "vor_vnor" => 128,
+      "vnor_vxor" => 135,
   );
 
   KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:83297e5ec6c3b8713c10c628aa7acc2a7cfda2c3

commit 83297e5ec6c3b8713c10c628aa7acc2a7cfda2c3
Author: Michael Meissner
Date:   Tue Jun 10 16:26:51 2025 -0400

    PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations

    2025-06-10  Michael Meissner

    gcc/

            PR target/117251
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to
            generate vector and => or fusion if XXEVAL is supported.

Diff:
---
 gcc/config/rs6000/fusion.md    | 15 +++++++++------
 gcc/config/rs6000/genfusion.pl |  1 +
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index 789a4d592419..fccea39d0aae 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -2621,20 +2621,23 @@
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vand -> vor
 (define_insn "*fuse_vand_vor"
-  [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v")
-        (ior:VM (and:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")
-                          (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))
-                 (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
-   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
+  [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v")
+        (ior:VM (and:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")
+                          (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))
+                 (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v")))
+   (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))]
   "(TARGET_P10_FUSION)"
   "@
    vand %3,%1,%0\;vor %3,%3,%2
    vand %3,%1,%0\;vor %3,%3,%2
    vand %3,%1,%0\;vor %3,%3,%2
+   xxeval %x3,%x2,%x1,%x0,31
    vand %4,%1,%0\;vor %3,%4,%2"
   [(set_attr "type" "fused_vector")
    (set_attr "cost" "6")
-   (set_attr "length" "8")])
+   (set_attr "length" "8")
+   (set_attr "prefixed" "*,*,*,yes,*")
+   (set_attr "isa" "*,*,*,xxeval,*")])
 
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; vector vandc -> vor
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index e6d44d430b3a..ab714b10f622 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -226,6 +226,7 @@ sub gen_logical_addsubf
       "vnand_vand"  =>  14,
       "vnand_vnor"  =>  16,
       "vand_vxor"   =>  30,
+      "vand_vor"    =>  31,
   );
 
   KIND: foreach $kind ('scalar','vector') {
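The immediate in `xxeval %x3,%x2,%x1,%x0,31` can be read as an 8-entry truth table for a three-input boolean function. The sketch below is only an illustration of that reading, not code from the patch: it assumes the first source operand supplies the high bit of the table index, the last source operand the low bit, and that the most significant bit of the immediate corresponds to index 0 (all inputs zero). The helper name `xxeval_imm` is hypothetical. Under those assumptions it reproduces the value 31 used above for `vand -> vor`, and 111 for `vxor -> vor`.

```python
def xxeval_imm(f):
    """Return the 8-bit truth-table immediate for a 3-input boolean function.

    Assumption: index = 4*a + 2*b + c, and index 0 maps to the most
    significant bit of the immediate.
    """
    imm = 0
    for idx in range(8):
        a, b, c = (idx >> 2) & 1, (idx >> 1) & 1, idx & 1
        if f(a, b, c) & 1:
            imm |= 1 << (7 - idx)
    return imm


# In "xxeval %x3,%x2,%x1,%x0,IMM" the sources are operand 2, 1, 0,
# so here a = operand 2, b = operand 1, c = operand 0.
vand_vor = xxeval_imm(lambda a, b, c: (c & b) | a)   # (op0 & op1) | op2
vxor_vor = xxeval_imm(lambda a, b, c: (c ^ b) | a)   # (op0 ^ op1) | op2

print(vand_vor, vxor_vor)  # prints: 31 111
```

This is a convenient way to sanity-check a table entry by hand; it says nothing about how `genfusion.pl` itself obtains the numbers, which are simply listed in the script.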
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:c5ffa5bd175214d557b2bec0ee3ea6bd64745ac7 commit c5ffa5bd175214d557b2bec0ee3ea6bd64745ac7 Author: Michael Meissner Date: Tue Jun 10 16:53:16 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector xor => or fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index a848b21bc3e2..1d4b3c970c7f 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -2762,20 +2762,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vxor -> vor (define_insn "*fuse_vxor_vor" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(ior:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v") - (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")) - (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(ior:VM (xor:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v") + (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")) + (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))) + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vxor %3,%1,%0\;vor %3,%3,%2 vxor %3,%1,%0\;vor %3,%3,%2 vxor %3,%1,%0\;vor %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,111 vxor %4,%1,%0\;vor %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vand -> vorc diff --git 
a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index b9ff6c99b95e..4ec38beccb9c 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -234,6 +234,7 @@ sub gen_logical_addsubf "vorc_vorc" => 79, "veqv_vnor" => 96, "vxor_vxor" => 105, + "vxor_vor"=> 111, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:c29e8af614954a24935024b7662144e94a313b94 commit c29e8af614954a24935024b7662144e94a313b94 Author: Michael Meissner Date: Tue Jun 10 18:04:39 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector eqv => nand fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index 96d8951049c9..c1be0e5ff8f1 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -2372,20 +2372,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector veqv -> vnand (define_insn "*fuse_veqv_vnand" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(ior:VM (not:VM (not:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v") - (match_operand:VM 1 "altivec_register_operand" "v,v,v,v" - (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v" - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(ior:VM (not:VM (not:VM (xor:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v") + (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v" + (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v" + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ veqv %3,%1,%0\;vnand %3,%3,%2 veqv %3,%1,%0\;vnand %3,%3,%2 veqv %3,%1,%0\;vnand %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,246 veqv %4,%1,%0\;vnand %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by 
gen_logical_addsubf ;; vector vnand -> vnand diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 77d3e999eb93..4c70237d2d27 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -253,6 +253,7 @@ sub gen_logical_addsubf "vnand_vor" => 239, "vnand_vnand" => 241, "vorc_vnand" => 244, + "veqv_vnand" => 246, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:41f7160530b5590d445c0eba8edc13349fca0fbf commit 41f7160530b5590d445c0eba8edc13349fca0fbf Author: Michael Meissner Date: Tue Jun 10 17:52:13 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector nand => xor fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index 68b52d4f5893..e6d13b38415a 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -3023,20 +3023,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vnand -> vxor (define_insn "*fuse_vnand_vxor" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(xor:VM (ior:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")) - (not:VM (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))) - (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(xor:VM (ior:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")) + (not:VM (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))) + (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))) + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vnand %3,%1,%0\;vxor %3,%3,%2 vnand %3,%1,%0\;vxor %3,%3,%2 vnand %3,%1,%0\;vxor %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,225 vnand %4,%1,%0\;vxor %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by 
gen_logical_addsubf ;; vector vnor -> vxor diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 56e5d96ec5f3..94eae471c64b 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -249,6 +249,7 @@ sub gen_logical_addsubf "vandc_vnor" => 208, "vandc_veqv" => 210, "vand_vnor" => 224, + "vnand_vxor" => 225, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:f656cd986db4f0d02a93bde8ca178b566face611 commit f656cd986db4f0d02a93bde8ca178b566face611 Author: Michael Meissner Date: Tue Jun 10 16:44:55 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector orc => orc fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index f84d0aee5d79..486aa813575d 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -2885,20 +2885,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vorc -> vorc (define_insn "*fuse_vorc_vorc" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(ior:VM (ior:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")) - (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")) - (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v" - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(ior:VM (ior:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")) + (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")) + (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v" + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vorc %3,%1,%0\;vorc %3,%3,%2 vorc %3,%1,%0\;vorc %3,%3,%2 vorc %3,%1,%0\;vorc %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,79 vorc %4,%1,%0\;vorc %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by gen_logical_addsubf 
;; vector vxor -> vorc diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 8ba1aa081f75..8f60fe76c87b 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -231,6 +231,7 @@ sub gen_logical_addsubf "vandc_vor" => 47, "vorc_vnor" => 64, "vorc_veqv" => 75, + "vorc_vorc" => 79, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:2b9d7bfbad695f60f0dbc8afeead50fe309cc761 commit 2b9d7bfbad695f60f0dbc8afeead50fe309cc761 Author: Michael Meissner Date: Tue Jun 10 17:25:30 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector eqv => xor fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index e5ea37c567d6..bb62ae26445a 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -2987,20 +2987,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector veqv -> vxor (define_insn "*fuse_veqv_vxor" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(xor:VM (not:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v") - (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))) - (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(xor:VM (not:VM (xor:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v") + (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))) + (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))) + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ veqv %3,%1,%0\;vxor %3,%3,%2 veqv %3,%1,%0\;vxor %3,%3,%2 veqv %3,%1,%0\;vxor %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,150 veqv %4,%1,%0\;vxor %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; 
vector vnand -> vxor diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index d713d10a1dbc..726e29c798bc 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -242,6 +242,7 @@ sub gen_logical_addsubf "vnor_vxor" => 135, "vnor_vor"=> 143, "vxor_vnor" => 144, + "veqv_vxor" => 150, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:a55729c0c364eb53b14d70af25b13f20f88c6bbd commit a55729c0c364eb53b14d70af25b13f20f88c6bbd Author: Michael Meissner Date: Tue Jun 10 17:48:14 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector and => nor fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index e3d9f7376a8d..68b52d4f5893 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -2480,20 +2480,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vand -> vnor (define_insn "*fuse_vand_vnor" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(and:VM (not:VM (and:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v") - (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))) - (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v" - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(and:VM (not:VM (and:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v") + (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))) + (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v" + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vand %3,%1,%0\;vnor %3,%3,%2 vand %3,%1,%0\;vnor %3,%3,%2 vand %3,%1,%0\;vnor %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,224 vand %4,%1,%0\;vnor %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by 
gen_logical_addsubf ;; vector vandc -> vnor diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 3a603eb09675..56e5d96ec5f3 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -248,6 +248,7 @@ sub gen_logical_addsubf "vorc_vor"=> 191, "vandc_vnor" => 208, "vandc_veqv" => 210, + "vand_vnor" => 224, ); KIND: foreach $kind ('scalar','vector') {
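As a rough check that the single `xxeval` alternative computes the same result as the two-instruction `vand`/`vnor` sequence in the pattern above, the following sketch emulates both on 128-bit integers. It is an illustration under assumptions, not GCC or ISA reference code: the function names are hypothetical, and the bit ordering of the immediate (index 0 in the most significant bit, first source operand as the high index bit) is the same assumption described for the truth-table reading of the immediate.

```python
import random

MASK128 = (1 << 128) - 1  # one 128-bit vector register


def xxeval(a, b, c, imm):
    """Bitwise ternary logic: each result bit is selected from imm by (a, b, c)."""
    result = 0
    for idx in range(8):
        if (imm >> (7 - idx)) & 1:
            # Build the minterm for this index and OR it into the result.
            ta = a if idx & 4 else ~a
            tb = b if idx & 2 else ~b
            tc = c if idx & 1 else ~c
            result |= ta & tb & tc
    return result & MASK128


def vand_vnor(op0, op1, op2):
    """Emulate 'vand %3,%1,%0' followed by 'vnor %3,%3,%2'."""
    t = op1 & op0
    return ~(t | op2) & MASK128


rng = random.Random(0)
for _ in range(1000):
    op0, op1, op2 = (rng.getrandbits(128) for _ in range(3))
    # xxeval %x3,%x2,%x1,%x0,224
    assert xxeval(op2, op1, op0, 224) == vand_vnor(op0, op1, op2)
```

The RTL in the pattern spells the same function as `(and (not (and op0 op1)) (not op2))`, which is the De Morgan form of the `nor`.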
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:dbc3e4ec82fcfcf17f12065b7b7e71cca3af9c19 commit dbc3e4ec82fcfcf17f12065b7b7e71cca3af9c19 Author: Michael Meissner Date: Tue Jun 10 17:34:06 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector orc => xor fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index cb1ad8b4c0cc..3d7e6502b027 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -3071,20 +3071,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vorc -> vxor (define_insn "*fuse_vorc_vxor" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(xor:VM (ior:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")) - (match_operand:VM 1 "altivec_register_operand" "v,v,v,v")) - (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(xor:VM (ior:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")) + (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v")) + (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v"))) + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vorc %3,%1,%0\;vxor %3,%3,%2 vorc %3,%1,%0\;vxor %3,%3,%2 vorc %3,%1,%0\;vxor %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,180 vorc %4,%1,%0\;vxor %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; 
vector vxor -> vxor diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 9400aed267a6..15f931baad33 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -244,6 +244,7 @@ sub gen_logical_addsubf "vxor_vnor" => 144, "veqv_vxor" => 150, "veqv_vor"=> 159, + "vorc_vxor" => 180, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:1eee925a0672077c5d0518f58b1723c2beea16f0 commit 1eee925a0672077c5d0518f58b1723c2beea16f0 Author: Michael Meissner Date: Tue Jun 10 17:58:15 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector nand => nand fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index ba3a5a52b990..241b8a494fb1 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -2390,20 +2390,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vnand -> vnand (define_insn "*fuse_vnand_vnand" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(ior:VM (not:VM (ior:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")) - (not:VM (match_operand:VM 1 "altivec_register_operand" "v,v,v,v" - (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v" - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(ior:VM (not:VM (ior:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")) + (not:VM (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v" + (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v" + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vnand %3,%1,%0\;vnand %3,%3,%2 vnand %3,%1,%0\;vnand %3,%3,%2 vnand %3,%1,%0\;vnand %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,241 vnand %4,%1,%0\;vnand %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical 
fusion pattern generated by gen_logical_addsubf ;; vector vnor -> vnand diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 54699d199fc5..728a447c65a9 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -251,6 +251,7 @@ sub gen_logical_addsubf "vand_vnor" => 224, "vnand_vxor" => 225, "vnand_vor" => 239, + "vnand_vnand" => 241, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:539aa4e4b24ae01bdf157cd10feb7dc9ce63c5ca commit 539aa4e4b24ae01bdf157cd10feb7dc9ce63c5ca Author: Michael Meissner Date: Tue Jun 10 17:22:37 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector xor => nor fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index 66d98f4537e1..e5ea37c567d6 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -2618,20 +2618,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vxor -> vnor (define_insn "*fuse_vxor_vnor" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(and:VM (not:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v") - (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))) - (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v" - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(and:VM (not:VM (xor:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v") + (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))) + (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v" + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vxor %3,%1,%0\;vnor %3,%3,%2 vxor %3,%1,%0\;vnor %3,%3,%2 vxor %3,%1,%0\;vnor %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,144 vxor %4,%1,%0\;vnor %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by 
gen_logical_addsubf ;; vector vand -> vor diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 98b56b788f03..d713d10a1dbc 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -241,6 +241,7 @@ sub gen_logical_addsubf "vor_vnor"=> 128, "vnor_vxor" => 135, "vnor_vor"=> 143, + "vxor_vnor" => 144, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:d8dc55ea7482cad5d7e776d3b080c233672e commit d8dc55ea7482cad5d7e776d3b080c233672e Author: Michael Meissner Date: Tue Jun 10 17:40:41 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector andc => nor fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index f6dc26e9c1f2..dd8401d48228 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -2495,20 +2495,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vandc -> vnor (define_insn "*fuse_vandc_vnor" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(and:VM (not:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")) - (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))) - (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v" - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(and:VM (not:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")) + (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))) + (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v" + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vandc %3,%1,%0\;vnor %3,%3,%2 vandc %3,%1,%0\;vnor %3,%3,%2 vandc %3,%1,%0\;vnor %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,208 vandc %4,%1,%0\;vnor %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by 
gen_logical_addsubf ;; vector veqv -> vnor diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 62f2b9e36d89..d89e78d4da03 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -246,6 +246,7 @@ sub gen_logical_addsubf "veqv_vor"=> 159, "vorc_vxor" => 180, "vorc_vor"=> 191, + "vandc_vnor" => 208, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:71bee51da51e6cd1ce25dd48c69c2aaf8840791b commit 71bee51da51e6cd1ce25dd48c69c2aaf8840791b Author: Michael Meissner Date: Tue Jun 10 18:07:26 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector nor => nand fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index c1be0e5ff8f1..01b7fda17ecc 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -2414,20 +2414,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vnor -> vnand (define_insn "*fuse_vnor_vnand" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(ior:VM (not:VM (and:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")) - (not:VM (match_operand:VM 1 "altivec_register_operand" "v,v,v,v" - (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v" - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(ior:VM (not:VM (and:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")) + (not:VM (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v" + (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v" + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vnor %3,%1,%0\;vnand %3,%3,%2 vnor %3,%1,%0\;vnand %3,%3,%2 vnor %3,%1,%0\;vnand %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,247 vnor %4,%1,%0\;vnand %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion 
pattern generated by gen_logical_addsubf ;; vector vor -> vnand diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 4c70237d2d27..d4965b6df864 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -254,6 +254,7 @@ sub gen_logical_addsubf "vnand_vnand" => 241, "vorc_vnand" => 244, "veqv_vnand" => 246, + "vnor_vnand" => 247, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:426e6a7c25056fcd5303ce1de6f3beb36b41194a commit 426e6a7c25056fcd5303ce1de6f3beb36b41194a Author: Michael Meissner Date: Tue Jun 10 18:00:38 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector orc => nand fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index 241b8a494fb1..96d8951049c9 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -2447,20 +2447,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vorc -> vnand (define_insn "*fuse_vorc_vnand" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(ior:VM (not:VM (ior:VM (not:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v")) - (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))) - (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v" - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(ior:VM (not:VM (ior:VM (not:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v")) + (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))) + (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v" + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vorc %3,%1,%0\;vnand %3,%3,%2 vorc %3,%1,%0\;vnand %3,%3,%2 vorc %3,%1,%0\;vnand %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,244 vorc %4,%1,%0\;vnand %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern 
generated by gen_logical_addsubf ;; vector vxor -> vnand diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 728a447c65a9..77d3e999eb93 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -252,6 +252,7 @@ sub gen_logical_addsubf "vnand_vxor" => 225, "vnand_vor" => 239, "vnand_vnand" => 241, + "vorc_vnand" => 244, ); KIND: foreach $kind ('scalar','vector') {
[gcc(refs/users/meissner/heads/work210-sha)] PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations
https://gcc.gnu.org/g:05eece5a4cb62268780e5568a25dc6784f9b8c47 commit 05eece5a4cb62268780e5568a25dc6784f9b8c47 Author: Michael Meissner Date: Tue Jun 10 17:15:11 2025 -0400 PR target/117251: Add PowerPC XXEVAL support to speed up SHA3 calculations 2025-06-10 Michael Meissner gcc/ PR target/117251 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl (gen_logical_addsubf): Add support to generate vector or => nor fusion if XXEVAL is supported. Diff: --- gcc/config/rs6000/fusion.md| 15 +-- gcc/config/rs6000/genfusion.pl | 1 + 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md index c2a2ebf4bfaf..c55e9d4abd67 100644 --- a/gcc/config/rs6000/fusion.md +++ b/gcc/config/rs6000/fusion.md @@ -2576,20 +2576,23 @@ ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; vector vor -> vnor (define_insn "*fuse_vor_vnor" - [(set (match_operand:VM 3 "altivec_register_operand" "=&0,&1,&v,v") -(and:VM (not:VM (ior:VM (match_operand:VM 0 "altivec_register_operand" "v,v,v,v") - (match_operand:VM 1 "altivec_register_operand" "v,v,v,v"))) - (not:VM (match_operand:VM 2 "altivec_register_operand" "v,v,v,v" - (clobber (match_scratch:VM 4 "=X,X,X,&v"))] + [(set (match_operand:VM 3 "vector_fusion_operand" "=&0,&1,&v,wa,v") +(and:VM (not:VM (ior:VM (match_operand:VM 0 "vector_fusion_operand" "v,v,v,wa,v") + (match_operand:VM 1 "vector_fusion_operand" "v,v,v,wa,v"))) + (not:VM (match_operand:VM 2 "vector_fusion_operand" "v,v,v,wa,v" + (clobber (match_scratch:VM 4 "=X,X,X,X,&v"))] "(TARGET_P10_FUSION)" "@ vor %3,%1,%0\;vnor %3,%3,%2 vor %3,%1,%0\;vnor %3,%3,%2 vor %3,%1,%0\;vnor %3,%3,%2 + xxeval %x3,%x2,%x1,%x0,128 vor %4,%1,%0\;vnor %3,%4,%2" [(set_attr "type" "fused_vector") (set_attr "cost" "6") - (set_attr "length" "8")]) + (set_attr "length" "8") + (set_attr "prefixed" "*,*,*,yes,*") + (set_attr "isa" "*,*,*,xxeval,*")]) ;; logical-logical fusion pattern generated by gen_logical_addsubf ;; 
vector vorc -> vnor diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 9df4c8d6527e..58f900640bef 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -238,6 +238,7 @@ sub gen_logical_addsubf "vnor_vnor" => 112, "vor_vxor"=> 120, "vor_vor" => 127, + "vor_vnor"=> 128, ); KIND: foreach $kind ('scalar','vector') {
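The immediate values in the genfusion.pl table above (127 for vor_vor, 128 for vor_vnor, 112 for vnor_vnor) can be read as 8-entry truth tables. The following is a minimal sketch of that reading, not GCC or hardware code: `eval3` is a hypothetical software model that assumes the first source operand is the most significant index bit and that index 0 maps to the most significant bit of the immediate. The operand order in the test mirrors `xxeval %x3,%x2,%x1,%x0,128` from the diff.

```c
#include <stdint.h>

/* Hypothetical model of a three-input bitwise "eval" operation.
   For every bit position the inputs a, b and c form a 3-bit index
   (a is the most significant index bit).  The result bit is the
   entry of the 8-bit truth table IMM selected by that index, where
   index 0 selects the most significant bit of IMM.  */
uint64_t
eval3 (uint64_t a, uint64_t b, uint64_t c, unsigned imm)
{
  uint64_t result = 0;
  for (unsigned bit = 0; bit < 64; bit++)
    {
      unsigned idx = (unsigned) ((((a >> bit) & 1) << 2)
                                 | (((b >> bit) & 1) << 1)
                                 | ((c >> bit) & 1));
      uint64_t out = (imm >> (7 - idx)) & 1;
      result |= out << bit;
    }
  return result;
}

/* What the two-instruction fused sequence computes.  */
uint64_t
vor_vnor (uint64_t op0, uint64_t op1, uint64_t op2)
{
  uint64_t t = op1 | op0;   /* vor  %3,%1,%0 */
  return ~(t | op2);        /* vnor %3,%3,%2 */
}
```

Under these assumptions, immediate 128 sets only the entry for "all three inputs zero", which is a three-way NOR and matches the vor/vnor pair it replaces.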
[gcc r16-1408] internal-fn: Fix up .POPCOUNT expansion
https://gcc.gnu.org/g:9e9c8aaab10ffeeb58c4936b55e8126ad5e31307 commit r16-1408-g9e9c8aaab10ffeeb58c4936b55e8126ad5e31307 Author: Jakub Jelinek Date: Wed Jun 11 07:00:27 2025 +0200 internal-fn: Fix up .POPCOUNT expansion Apparently my ranger during expansion patch broke bootstrap on aarch64-linux, while building libsupc++, there is endless recursion on __builtin_popcountl (x) == 1 expansion. The hack to temporarily replace SSA_NAME_VAR of the lhs which replaced the earlier hack to temporarily change the gimple_call_lhs relies on the lhs being expanded with EXPAND_WRITE when expanding that ifn call. Unfortunately, in two spots I was using expand_normal (lhs) instead of expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE) which was used everywhere else in internal-fn.cc. This happened to work fine in the past, but doesn't anymore. git blame shows it was my patch using these incorrect calls. 2025-06-11 Jakub Jelinek * internal-fn.cc (expand_POPCOUNT): Use expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE) instead of expand_normal (lhs). 
Diff: --- gcc/internal-fn.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index a0a73fefb906..3f4ac937367d 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -5561,7 +5561,7 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt) expand_unary_optab_fn (fn, stmt, popcount_optab); rtx_insn *popcount_insns = end_sequence (); start_sequence (); - rtx plhs = expand_normal (lhs); + rtx plhs = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); rtx pcmp = emit_store_flag (NULL_RTX, EQ, plhs, const1_rtx, lhsmode, 0, 0); if (pcmp == NULL_RTX) { @@ -5603,7 +5603,7 @@ expand_POPCOUNT (internal_fn fn, gcall *stmt) { start_sequence (); emit_insn (cmp_insns); - plhs = expand_normal (lhs); + plhs = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); if (GET_MODE (cmp) != GET_MODE (plhs)) cmp = convert_to_mode (GET_MODE (plhs), cmp, 1); /* For `<= 1`, we need to produce `2 - cmp` or `cmp ? 1 : 2` as that
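For readers unfamiliar with the source-level idiom behind this expansion, here is a small sketch of the comparison named in the commit message, next to a well-known popcount-free equivalent. The function names are illustrative only, and the second function is not claimed to be the sequence GCC emits.

```c
#include <stdbool.h>

/* The idiom from the commit message: exactly one bit set.  */
bool
one_bit_set_popcount (unsigned long x)
{
  return __builtin_popcountl (x) == 1;
}

/* A popcount-free equivalent: x is non-zero and clearing its lowest
   set bit leaves nothing behind.  */
bool
one_bit_set_bitops (unsigned long x)
{
  return x != 0 && (x & (x - 1)) == 0;
}
```

Both functions should agree for every input, which is what makes the comparison a candidate for special-case expansion.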
[gcc r16-1410] testsuite: Add -mpopcnt and -mabm variants of PR90693 tests
https://gcc.gnu.org/g:e477e7cd104af96c55379f69125db3f1c350c9ed commit r16-1410-ge477e7cd104af96c55379f69125db3f1c350c9ed Author: Jakub Jelinek Date: Wed Jun 11 07:16:06 2025 +0200 testsuite: Add -mpopcnt and -mabm variants of PR90693 tests My r16-1398 patch broke bootstrap on aarch64-linux and powerpc64le-linux at least. Fixed with r16-1408. The following patch just adds testcases with which the bug can be reproduced also on x86_64-linux where it hasn't been caught by the testsuite (while there are 2 tests with it, both were compiled with -mno-abm -mno-popcnt and so didn't trigger the right path). This patch just includes those tests in 4 further ones, two with -mpopcnt and two with -mabm flags. 2025-06-11 Jakub Jelinek PR tree-optimization/90693 * gcc.target/i386/pr90693-3.c: New test. * gcc.target/i386/pr90693-4.c: New test. * gcc.target/i386/pr90693-5.c: New test. * gcc.target/i386/pr90693-6.c: New test. Diff: --- gcc/testsuite/gcc.target/i386/pr90693-3.c | 5 + gcc/testsuite/gcc.target/i386/pr90693-4.c | 5 + gcc/testsuite/gcc.target/i386/pr90693-5.c | 5 + gcc/testsuite/gcc.target/i386/pr90693-6.c | 5 + 4 files changed, 20 insertions(+) diff --git a/gcc/testsuite/gcc.target/i386/pr90693-3.c b/gcc/testsuite/gcc.target/i386/pr90693-3.c new file mode 100644 index ..601c83c1d586 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90693-3.c @@ -0,0 +1,5 @@ +/* PR tree-optimization/90693 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mpopcnt" } */ + +#include "pr90693.c" diff --git a/gcc/testsuite/gcc.target/i386/pr90693-4.c b/gcc/testsuite/gcc.target/i386/pr90693-4.c new file mode 100644 index ..b149159d3b97 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90693-4.c @@ -0,0 +1,5 @@ +/* PR tree-optimization/90693 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mpopcnt" } */ + +#include "pr90693-2.c" diff --git a/gcc/testsuite/gcc.target/i386/pr90693-5.c b/gcc/testsuite/gcc.target/i386/pr90693-5.c new file mode 100644 index ..0a6a637a44b6 --- /dev/null +++ 
b/gcc/testsuite/gcc.target/i386/pr90693-5.c @@ -0,0 +1,5 @@ +/* PR tree-optimization/90693 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mabm" } */ + +#include "pr90693.c" diff --git a/gcc/testsuite/gcc.target/i386/pr90693-6.c b/gcc/testsuite/gcc.target/i386/pr90693-6.c new file mode 100644 index ..4040b5226501 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90693-6.c @@ -0,0 +1,5 @@ +/* PR tree-optimization/90693 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mabm" } */ + +#include "pr90693-2.c"
[gcc r16-1409] ranger: Handle the theoretical case of GIMPLE_COND with one succ edge during expansion [PR120434]
https://gcc.gnu.org/g:f3dde39e597f48832208f423fb20f29674ce49ae commit r16-1409-gf3dde39e597f48832208f423fb20f29674ce49ae Author: Jakub Jelinek Date: Wed Jun 11 07:03:04 2025 +0200 ranger: Handle the theoretical case of GIMPLE_COND with one succ edge during expansion [PR120434]

On Tue, Jun 10, 2025 at 10:51:25AM -0400, Andrew MacLeod wrote:
> Edge range should be fine, and really that assert doesn't really need to be
> there.
>
> Where the issue could arise is in gimple-range-fold.cc in
> fold_using_range::range_of_range_op() where we see something like:
>
>   else if (is_a<gcond *> (s) && gimple_bb (s))
>     {
>       basic_block bb = gimple_bb (s);
>       edge e0 = EDGE_SUCC (bb, 0);
>       edge e1 = EDGE_SUCC (bb, 1);
>
>       if (!single_pred_p (e0->dest))
>         e0 = NULL;
>       if (!single_pred_p (e1->dest))
>         e1 = NULL;
>       src.register_outgoing_edges (as_a<gcond *> (s),
>                                    as_a <irange> (r), e0, e1);
>
> Although, now that I look at it, it doesn't need much adjustment, just the
> expectation that there are 2 edges. I suppose EDGE_SUCC (bb, 1) could
> potentially trap if there is only one edge. We'd just have to guard it and
> allow for that case.

This patch implements that.

2025-06-11 Jakub Jelinek PR middle-end/120434 * gimple-range-fold.cc: Include rtl.h. (fold_using_range::range_of_range_op): Handle bb ending with GIMPLE_COND during RTL expansion where there is only one succ edge instead of two. Diff: --- gcc/gimple-range-fold.cc | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc index aed5c7dc21eb..d18b37b33800 100644 --- a/gcc/gimple-range-fold.cc +++ b/gcc/gimple-range-fold.cc @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3. If not see #include "sreal.h" #include "ipa-cp.h" #include "ipa-prop.h" +#include "rtl.h" // Construct a fur_source, and set the m_query field. 
fur_source::fur_source (range_query *q) @@ -778,11 +779,14 @@ fold_using_range::range_of_range_op (vrange &r, { basic_block bb = gimple_bb (s); edge e0 = EDGE_SUCC (bb, 0); - edge e1 = EDGE_SUCC (bb, 1); + /* During RTL expansion one of the edges can be removed +if expansion proves the jump is unconditional. */ + edge e1 = single_succ_p (bb) ? NULL : EDGE_SUCC (bb, 1); + gcc_checking_assert (e1 || currently_expanding_to_rtl); if (!single_pred_p (e0->dest)) e0 = NULL; - if (!single_pred_p (e1->dest)) + if (e1 && !single_pred_p (e1->dest)) e1 = NULL; src.register_outgoing_edges (as_a<gcond *> (s), as_a <irange> (r), e0, e1);
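The guard added by this patch can be illustrated outside GCC with a toy model. This is only a sketch under stated assumptions: `toy_edge`, `toy_block` and `toy_outgoing_edges` are hypothetical stand-ins for GCC's edge, basic_block and the logic around EDGE_SUCC / single_pred_p, and none of them are GCC APIs.

```c
#include <stddef.h>

/* Hypothetical stand-ins for GCC's CFG types.  */
struct toy_edge
{
  int dest_pred_count;          /* predecessors of the edge's destination */
};

struct toy_block
{
  size_t n_succs;               /* 1 or 2 successor edges */
  struct toy_edge *succs[2];
};

/* Pick the outgoing edges worth registering.  The second edge is only
   looked at when the block really has two successors, and each edge is
   dropped when its destination has more than one predecessor.  */
void
toy_outgoing_edges (const struct toy_block *bb,
                    struct toy_edge **e0, struct toy_edge **e1)
{
  *e0 = bb->succs[0];
  *e1 = bb->n_succs == 1 ? NULL : bb->succs[1];

  if ((*e0)->dest_pred_count != 1)
    *e0 = NULL;
  if (*e1 && (*e1)->dest_pred_count != 1)
    *e1 = NULL;
}
```

The point of the design is that the single-successor case yields a NULL second edge instead of an out-of-range access.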
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Add test for vec_duplicate + vremu.vv combine case 0 with GR2VR cost 0, 2 and 15
https://gcc.gnu.org/g:62503999c8624c0878cd1a955ecb29680d524d12 commit 62503999c8624c0878cd1a955ecb29680d524d12 Author: Pan Li Date: Mon Jun 9 16:33:52 2025 +0800 RISC-V: Add test for vec_duplicate + vremu.vv combine case 0 with GR2VR cost 0, 2 and 15 Add asm dump check tests for the vec_duplicate + vremu.vv combine to vremu.vx, with GR2VR costs of 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vremu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test data for run test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vrem-run-1-u16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vrem-run-1-u32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vrem-run-1-u64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vrem-run-1-u8.c: New test. 
Signed-off-by: Pan Li (cherry picked from commit 0bdea31036e8268edd1b4ea3ed07478c07c96ad1) Diff: --- .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c | 2 + .../riscv/rvv/autovec/vx_vf/vx_binary_data.h | 196 + .../riscv/rvv/autovec/vx_vf/vx_vrem-run-1-u16.c| 15 ++ .../riscv/rvv/autovec/vx_vf/vx_vrem-run-1-u32.c| 15 ++ .../riscv/rvv/autovec/vx_vf/vx_vrem-run-1-u64.c| 15 ++ .../riscv/rvv/autovec/vx_vf/vx_vrem-run-1-u8.c | 15 ++ 17 files changed, 280 insertions(+) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c index 92fbf227d563..474fed2be15d 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c @@ -12,6 +12,7 @@ DEF_VX_BINARY_CASE_0_WRAP(T, &, and) DEF_VX_BINARY_CASE_0_WRAP(T, |, or) DEF_VX_BINARY_CASE_0_WRAP(T, ^, xor) DEF_VX_BINARY_CASE_0_WRAP(T, /, div) +DEF_VX_BINARY_CASE_0_WRAP(T, %, rem) /* { dg-final { scan-assembler-times {vadd.vx} 1 } } */ /* { dg-final { scan-assembler-times {vsub.vx} 1 } } */ @@ -20,3 +21,4 @@ DEF_VX_BINARY_CASE_0_WRAP(T, /, div) /* { dg-final { scan-assembler-times {vor.vx} 1 } } */ /* { dg-final { scan-assembler-times {vxor.vx} 1 } } */ /* { dg-final { scan-assembler-times {vdivu.vx} 1 } } */ +/* { dg-final { scan-assembler-times {vremu.vx} 1 } } */ diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c index f487b42820ee..28c0524c9934 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c @@ -12,6 +12,7 @@ DEF_VX_BINARY_CASE_0_WRAP(T, &, and) DEF_VX_BINARY_CASE_0_WRAP(T, |, or) DEF_VX_BINARY_CASE_0_WRAP(T, ^, xor) DEF_VX_BINARY_CASE_0_WRAP(T, /, div) +DEF_VX_BINARY_CASE_0_WRAP(T, %, rem) /* { dg-final { scan-assembler-times {vadd.vx} 1 } } */ /* { dg-final { scan-assembler-times {vsub.vx} 1 } } */ @@ -20,3 +21,4 @@ DEF_VX_BINARY_CASE_0_WRAP(T, /, div) /* { dg-final { scan-assembler-times {vor.vx} 1 } } */ /* { dg-final { scan-assembler-times {vxor.vx} 1 } } */ /* { dg-final { scan-assembler-times {vdivu.vx} 1 } } */ +/* { dg-final { scan-assembler-times {vremu.vx} 1 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c index 761d25c0833a..62c1ee996fd9 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c +++ b/gcc/testsuite/gcc.target
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [RISC-V] Fix ICE due to splitter emitting constant loads directly
https://gcc.gnu.org/g:4c63893d1c1a92052016cfdb18854f9712bdf949 commit 4c63893d1c1a92052016cfdb18854f9712bdf949 Author: Jeff Law Date: Tue Jun 10 06:38:52 2025 -0600 [RISC-V] Fix ICE due to splitter emitting constant loads directly This is a fix for a bug found internally in Ventana using the cf3 testsuite. cf3 looks to be dead as a project and likely subsumed by modern fuzzers. In fact internally we tripped another issue with cf3 that had already been reported by Edwin with the fuzzer he runs. Anyway, the splitter in question blindly emits the 2nd adjusted constant into a register, that's not valid if the constant requires any kind of synthesis -- and it well could since we're mostly focused on the first constant turning into something that can be loaded via LUI without increasing the cost of the second constant. Instead of using the split RTL template, this just emits the code we want directly, using riscv_move_insn to synthesize the constant into the provided temporary register. Tested in my system. Waiting on upstream CI's verdict before moving forward. gcc/ * config/riscv/riscv.md (lui-constraintand_to_or): Do not use the RTL template for split code. Emit it directly taking care to avoid emitting a constant load that needed synthesis. Fix formatting. gcc/testsuite/ * gcc.target/riscv/ventana-16122.c: New test. 
(cherry picked from commit b93d8873cda88f0892c7782b274904fa8d3751fb) Diff: --- gcc/config/riscv/riscv.md | 18 +- gcc/testsuite/gcc.target/riscv/ventana-16122.c | 19 +++ 2 files changed, 32 insertions(+), 5 deletions(-) diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 6d3c80a04c74..3aed25c25880 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -884,7 +884,7 @@ ;; Where C1 is not a LUI operand, but ~C1 is a LUI operand (define_insn_and_split "*lui_constraint_and_to_or" - [(set (match_operand:X 0 "register_operand" "=r") + [(set (match_operand:X 0 "register_operand" "=r") (plus:X (and:X (match_operand:X 1 "register_operand" "r") (match_operand 2 "const_int_operand")) (match_operand 3 "const_int_operand"))) @@ -898,13 +898,21 @@ <= riscv_const_insns (operands[3], false)))" "#" "&& reload_completed" - [(set (match_dup 4) (match_dup 5)) - (set (match_dup 0) (ior:X (match_dup 1) (match_dup 4))) - (set (match_dup 4) (match_dup 6)) - (set (match_dup 0) (minus:X (match_dup 0) (match_dup 4)))] + [(const_int 0)] { operands[5] = GEN_INT (~INTVAL (operands[2])); operands[6] = GEN_INT ((~INTVAL (operands[2])) | (-INTVAL (operands[3]))); + +/* This is always a LUI operand, so it's safe to just emit. */ +emit_move_insn (operands[4], operands[5]); + +rtx x = gen_rtx_IOR (word_mode, operands[1], operands[4]); +emit_move_insn (operands[0], x); + +/* This may require multiple steps to synthesize. 
*/ +riscv_emit_move (operands[4], operands[6]); +x = gen_rtx_MINUS (word_mode, operands[0], operands[4]); +emit_move_insn (operands[0], x); } [(set_attr "type" "arith")]) diff --git a/gcc/testsuite/gcc.target/riscv/ventana-16122.c b/gcc/testsuite/gcc.target/riscv/ventana-16122.c new file mode 100644 index ..59e6467b57c0 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/ventana-16122.c @@ -0,0 +1,19 @@ +/* { dg-do compile { target { rv64 } } } */ + +extern void NG (void); +typedef signed char int8_t; +typedef signed short int16_t; +typedef signed int int32_t; +void f74(void) { + int16_t x309 = 0x7fff; + volatile int32_t x310 = 0x7fff; + int8_t x311 = 59; + int16_t x312 = -0x8000; + static volatile int32_t t74 = 614992577; + +t74 = (x309==((x310^x311)%x312)); + +if (t74 != 0) { NG(); } else { ; } + +} +
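The arithmetic behind the split can be checked with a short sketch. It assumes that `~c1` and `-c3` share no set bits; whether the insn condition enforces exactly that is not visible in the excerpt above, so treat it as an assumption. The function names are illustrative, and unsigned arithmetic is used so wraparound is well defined.

```c
#include <stdint.h>

/* The original expression matched by the pattern: (x & c1) + c3.  */
uint64_t
and_then_add (uint64_t x, uint64_t c1, uint64_t c3)
{
  return (x & c1) + c3;
}

/* The four steps the splitter emits, written out in order.  */
uint64_t
or_then_sub (uint64_t x, uint64_t c1, uint64_t c3)
{
  uint64_t t = ~c1;             /* operands[5]: the LUI-loadable constant */
  uint64_t r = x | t;           /* IOR with the temporary */
  t = ~c1 | (0 - c3);           /* operands[6]: may need synthesis */
  return r - t;                 /* MINUS with the temporary */
}
```

With `c1 = ~0x1000` and `c3 = -0xFF`, both forms map `x = 0x1234` to `0x135`: the OR sets the one bit the AND would have cleared, and the subtraction removes it again together with the added constant.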
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [RISC-V] Enable more if-conversion on RISC-V
https://gcc.gnu.org/g:65b255f2b11679630d29e4d77557e97caf7c2653 commit 65b255f2b11679630d29e4d77557e97caf7c2653 Author: Jeff Law Date: Mon Jun 9 06:55:21 2025 -0600 [RISC-V] Enable more if-conversion on RISC-V Another czero related adjustment. This time in costing of conditional move sequences. Essentially a copy from a promoted subreg can and should be ignored from a costing standpoint. We had some code to do this, but its conditions were too strict. No real surprises evaluating spec. This should be a minor, but probably not measurable improvement in x264 and xz. It is if-converting more in some particular harm to hot routines, but not necessarily in the hot parts of those routines. It's been tested on riscv32-elf and riscv64-elf. Versions of this have bootstrapped and regression tested as well, though perhaps not this exact version. Waiting on pre-commit testing. gcc/ * config/riscv/riscv.cc (riscv_noce_conversion_profitable_p): Relax condition for adjustments due to copies from promoted SUBREGs. (cherry picked from commit af3de9e20968c8fb0f5b950e4b0753a28a1d1dc3) Diff: --- gcc/config/riscv/riscv.cc | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index f98072cca7ce..14ac2f3cdbc1 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -4609,16 +4609,14 @@ riscv_noce_conversion_profitable_p (rtx_insn *seq, rtx dest = SET_DEST (x); - /* Do something similar for the moves that are likely to + /* Do something similar for the moves that are likely to turn into NOP moves by the time the register allocator is -done. These are also side effects of how our sCC expanders -work. We'll want to check and update LAST_DEST here too. */ - if (last_dest - && REG_P (dest) +done. We don't require src to be something set in this +sequence, just a promoted SUBREG. 
*/ + if (REG_P (dest) && GET_MODE (dest) == SImode && SUBREG_P (src) - && SUBREG_PROMOTED_VAR_P (src) - && REGNO (SUBREG_REG (src)) == REGNO (last_dest)) + && SUBREG_PROMOTED_VAR_P (src)) { riscv_if_info.original_cost += COSTS_N_INSNS (1); riscv_if_info.max_seq_cost += COSTS_N_INSNS (1);
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Reconcile the existing test for vremu.vx combine
https://gcc.gnu.org/g:2e3b57db30a7b75eebc0e58470ada83c3e68c219 commit 2e3b57db30a7b75eebc0e58470ada83c3e68c219 Author: Pan Li Date: Mon Jun 9 16:28:50 2025 +0800 RISC-V: Reconcile the existing test for vremu.vx combine Some existing vrem-related tests need their asm checks adjusted because of the cost model. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Adjust the asm check for vremu. * gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Ditto. Signed-off-by: Pan Li (cherry picked from commit b59354cf309052de6a1c297f06411691c03bfd24) Diff: --- gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c index ad918a9b800a..10de7c268e5e 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c @@ -4,8 +4,8 @@ /* { dg-final { scan-assembler-times {\tvrem\.vv} 8 } } */ /* { dg-final { scan-assembler-not {\tvrem\.vx} } } */ -/* { dg-final { scan-assembler-times {\tvremu\.vv} 5 } } */ -/* { dg-final { scan-assembler-times {\tvremu\.vx} 3 } } */ +/* { dg-final { scan-assembler-times {\tvremu\.vv} 8 } } */ +/* { dg-final { scan-assembler-not {\tvremu\.vx} } } */ /* { dg-final { scan-tree-dump-times "\.COND_LEN_MOD" 16 "optimized" } } */ /* { dg-final { scan-assembler-not {\tvmv1r\.v} } } */ /* { dg-final { scan-assembler-not {\tvmv2r\.v} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c index 4e28f99e2886..cf187a2bde7c 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c @@ -5,8 +5,8 @@ /* { 
dg-final { scan-assembler-times {\tvrem\.vv} 8 } } */ /* { dg-final { scan-assembler-not {\tvrem\.vx} } } */ -/* { dg-final { scan-assembler-times {\tvremu\.vv} 4 } } */ -/* { dg-final { scan-assembler-times {\tvremu\.vx} 4 } } */ +/* { dg-final { scan-assembler-times {\tvremu\.vv} 8 } } */ +/* { dg-final { scan-assembler-not {\tvremu\.vx} } } */ /* { dg-final { scan-tree-dump-times "\.COND_LEN_MOD" 16 "optimized" } } */ /* { dg-final { scan-assembler-not {\tvmv1r\.v} } } */ /* { dg-final { scan-assembler-not {\tvmv2r\.v} } } */
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Combine vec_duplicate + vremu.vv to vremu.vx on GR2VR cost
https://gcc.gnu.org/g:1af3a0a0a9fbd5c74b1f623cdb50ace115ee3c97 commit 1af3a0a0a9fbd5c74b1f623cdb50ace115ee3c97 Author: Pan Li Date: Mon Jun 9 16:24:34 2025 +0800 RISC-V: Combine vec_duplicate + vremu.vv to vremu.vx on GR2VR cost

This patch combines vec_duplicate + vremu.vv into vremu.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR: late-combine performs the combination if the GR2VR cost is zero, and rejects it if the GR2VR cost is greater than zero.

Assume we have example code like below, GR2VR cost is 0.

  #define DEF_VX_BINARY(T, OP) \
  void \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  { \
    for (unsigned i = 0; i < n; i++) \
      out[i] = in[i] OP x; \
  }

  DEF_VX_BINARY(int32_t, /)

Before this patch:

  test_vx_binary_or_int32_t_case_0:
    beq a3,zero,.L8
    vsetvli a5,zero,e32,m1,ta,ma
    vmv.v.x v2,a2
    slli a3,a3,32
    srli a3,a3,32
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli a4,a5,2
    sub a3,a3,a5
    add a1,a1,a4
    vremu.vv v1,v1,v2
    vse32.v v1,0(a0)
    add a0,a0,a4
    bne a3,zero,.L3

After this patch:

  test_vx_binary_or_int32_t_case_0:
    beq a3,zero,.L8
    slli a3,a3,32
    srli a3,a3,32
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli a4,a5,2
    sub a3,a3,a5
    add a1,a1,a4
    vremu.vx v1,v1,a2
    vse32.v v1,0(a0)
    add a0,a0,a4
    bne a3,zero,.L3

gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vx_binary_vec_vec_dup): Add new case UMOD.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op umod.
Signed-off-by: Pan Li (cherry picked from commit 85de2b8b58e1644f6d5f0f182426122416b19e6f) Diff: --- gcc/config/riscv/riscv-v.cc | 1 + gcc/config/riscv/riscv.cc| 1 + gcc/config/riscv/vector-iterators.md | 2 +- 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index c31ec9e9b419..420baa587dc2 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -5570,6 +5570,7 @@ expand_vx_binary_vec_vec_dup (rtx op_0, rtx op_1, rtx op_2, case DIV: case UDIV: case MOD: +case UMOD: icode = code_for_pred_scalar (code, mode); break; default: diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 14ac2f3cdbc1..d5ab128f05ff 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -3950,6 +3950,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN case DIV: case UDIV: case MOD: + case UMOD: *total = get_vector_binary_rtx_cost (op, scalar2vr_cost); break; default: diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index b1fd607320ef..42fc04c0ad38 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -4042,7 +4042,7 @@ ]) (define_code_iterator any_int_binop_no_shift_v_vdup [ - plus minus and ior xor mult div udiv mod + plus minus and ior xor mult div udiv mod umod ]) (define_code_iterator any_int_binop_no_shift_vdup_v [
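The macro in the commit message is instantiated with int32_t and `/`, while the instruction being combined is the unsigned remainder. As a minimal sketch of a loop that corresponds to UMOD, here is the same shape written out for uint32_t and `%`; the function name `vx_rem_u32` is illustrative and not taken from the testsuite.

```c
#include <stdint.h>

/* Element-wise unsigned remainder by a scalar.  The scalar X is
   loop-invariant, which is what makes a vector-scalar form attractive
   compared with broadcasting X into a vector register first.
   The caller must pass a non-zero X.  */
void
vx_rem_u32 (uint32_t *restrict out, const uint32_t *restrict in,
            uint32_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = in[i] % x;
}
```

Whether a given compiler invocation actually emits vremu.vx for this loop depends on the target options and the GR2VR cost discussed above.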
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn
https://gcc.gnu.org/g:807bdaaea37a400347258a8be14f9c7e35378195 commit 807bdaaea37a400347258a8be14f9c7e35378195 Author: Vineet Gupta Date: Sun Jun 8 14:54:37 2025 -0700 RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn

This showed up when debugging the testcase for PR119164.

RISC-V FRM mode-switching state machine has special handling for transitions to and from a call_insn, as FRM needs to be saved/restored around calls despite it not being a callee-saved reg; rather it's a "global" reg which can be temporarily modified "locally" with a static RM. Thus a call needs to see the prior global state, hence the restore (from a prior backup) before the call. As a corollary, any call can potentially clobber the FRM, thus post-call it needs to be re-read/saved.

The following example demonstrates this:
- insns 2, 4, 6 correspond to actual user code,
- the rest, 1, 3, 5, 7, are frm save/restore insns generated by mode switch for the above described ABI semantics.

  test_float_point_frm_static:
  1: frrm a5 <--
  2: fsrmi 2
  3: fsrm a5 <--
  4: call normalize_vl
  5: frrm a5 <--
  6: fsrmi 3
  7: fsrm a5 <--

Current implementation of RISC-V TARGET_MODE_NEEDED has special handling if the call_insn is the last insn of a BB, to ensure FRM save/reads are emitted on all the edges. However it doesn't work as intended and is borderline bogus for the following reasons:
- It fails to detect call_insn as last of BB (PR119164 test) if the next BB starts with a code label (say due to call being conditional). Granted this is a deficiency of API next_nonnote_nondebug_insn_bb () which incorrectly returns next BB code_label as opposed to returning NULL (and this behavior is kind of relied upon by much of gcc). This causes missed/delayed state transition to DYN.
- If code is tightened to actually detect above such as:

    - rtx_insn *insn = next_nonnote_nondebug_insn_bb (cur_insn);
    - if (!insn)
    + if (BB_END (BLOCK_FOR_INSN (cur_insn)) == cur_insn)

  edge insertion happens but ends up splitting the BB, which generic mode-sw doesn't expect, and ends up hitting an ICE.
- TARGET_MODE_NEEDED hooks typically don't modify the CFG.
- For abnormal edges, insert_insn_end_basic_block () is called, which by design, on encountering a call_insn as the last insn in a BB, inserts the new insn BEFORE the call, not after.

So this is just all wrong and ripe for removal. Moreover there seems to be no testsuite coverage for this code path at all. Results don't change at all if this is removed.

The total number of FRM reads/writes emitted (static count) across all benchmarks of a SPEC2017 -Ofast -march=rv64gcv build decreases slightly, so it's a net win even if minimal, but the real gain is reduced complexity and maintenance.

                      Before                 Patch
                --------------------   --------------------
                frrm   fsrmi   fsrm    frrm   fsrmi   fsrm
  perlbench_r     42       0      4      42       0      4
  cpugcc_r       167       0     17     167       0     17
  bwaves_r        16       0      1      16       0      1
  mcf_r           11       0      0      11       0      0
  cactusBSSN_r    79       0     27      76       0     27
  namd_r         119       0     63     119       0     63
  parest_r       218       0    114     168       0    114  <--
  povray_r       123       1     17     123       1     17
  lbm_r            6       0      0       6       0      0
  omnetpp_r       17       0      1      17       0      1
  wrf_r         2287      13   1956    2287      13   1956
  cpuxalan_r      17       0      1      17       0      1
  ldecod_r        11       0      0      11       0      0
  x264_r          14       0      1      14       0      1
  blender_r      724      12    182     724      12    182
  cam4_r         324      13    169     324      13    169
  deepsjeng_r     11       0      0      11       0      0
  imagick_r      265      16     34     265      16     34
  leela_r         12       0      0      12       0      0
  nab_r           13       0      1      13       0      1
  exchange2_r     16       0      1      16       0      1
  fotonik3d_r     20       0     11      20       0     11
  roms_r          33       0     23      33       0     23
  xz_r             6       0      0       6       0      0
                --------------------   --------------------
                4551      55   2623    4498      55   2623

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_frm_emit_after_bb_end): Delete.
(riscv_frm_mode_needed): Remove call riscv_frm_emit_after_bb_end.

Signed-off-by: Vineet Gupta (cherry picked from commit 01b89455b09df72285a85e4fda1ff14fe4191d9e) Diff: --- gcc/config/riscv/riscv.cc | 44 ---
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Add test for vec_duplicate + vdivu.vv combine case 1 with GR2VR cost 0, 1 and 2
https://gcc.gnu.org/g:f47952a91134c4c78e0eeae2f7b455030eeb8ad8 commit f47952a91134c4c78e0eeae2f7b455030eeb8ad8 Author: Pan Li Date: Fri Jun 6 09:51:10 2025 +0800 RISC-V: Add test for vec_duplicate + vdivu.vv combine case 1 with GR2VR cost 0, 1 and 2 Add asm dump check tests for the vec_duplicate + vdivu.vv combine to vdivu.vx, with GR2VR costs of 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vdivu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto. 
Signed-off-by: Pan Li (cherry picked from commit c01830fa809fa18d1d54b29a89cb65f3bb8f5676) Diff: --- gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c | 2 ++ 12 files changed, 24 insertions(+) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c index 4bc0850f6737..58e4a1e96d6c 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c @@ -11,6 +11,7 @@ DEF_VX_BINARY_REVERSE_CASE_1_WRAP(T, -, rsub, VX_BINARY_REVERSE_BODY_X16) DEF_VX_BINARY_CASE_1_WRAP(T, &, and, VX_BINARY_BODY_X16) DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY_X16) DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY_X16) +DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY_X16) /* { dg-final { scan-assembler {vadd.vx} } } */ /* { dg-final { scan-assembler {vsub.vx} } } */ @@ -18,3 +19,4 @@ DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY_X16) /* { dg-final { scan-assembler {vand.vx} } } */ /* { dg-final { scan-assembler {vor.vx} } } */ /* { dg-final { scan-assembler {vxor.vx} } } */ +/* { dg-final { scan-assembler {vdivu.vx} } } */ diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c index 255273d767f0..3d5f53568dbe 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c @@ -11,6 +11,7 @@ DEF_VX_BINARY_REVERSE_CASE_1_WRAP(T, -, rsub, VX_BINARY_REVERSE_BODY_X4) DEF_VX_BINARY_CASE_1_WRAP(T, &, and, VX_BINARY_BODY_X4) DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY_X4) DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY_X4) +DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY_X4) /* { dg-final { scan-assembler {vadd.vx} } } */ /* { dg-final { scan-assembler {vsub.vx} } } */ @@ -18,3 +19,4 @@ DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY_X4) /* { dg-final { scan-assembler {vand.vx} } } */ /* { dg-final { scan-assembler {vor.vx} } } */ /* { dg-final { scan-assembler {vxor.vx} } } */ +/* { dg-final { scan-assembler {vdivu.vx} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c index d21f61b49e73..0edb9257a7a7 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c @@ -11,6 +11,7 @@ DEF_VX_BINARY_REVERSE_CASE_1_WRAP(T, -, rsub, VX_BINARY_REVERSE_BODY) DEF_VX_BINARY_CASE_1_WRAP(T, &, and, VX_BINARY_BODY) DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY) DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY) +DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY) /* { dg-final { scan-assembler {vadd.vx} } } */ /* { dg-final { scan-assembler {vsub.vx} } } */ @@ -18,3 +19,4 @@ DEF_VX_BINARY_CASE_1_WRAP
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [RISC-V] Improve signed division by 2^n
https://gcc.gnu.org/g:5dd9e41c33464836a07c7cd93111a439993e26d8 commit 5dd9e41c33464836a07c7cd93111a439993e26d8 Author: Jeff Law Date: Thu Jun 5 16:58:45 2025 -0600 [RISC-V] Improve signed division by 2^n So another class of cases where we can do better than a zicond sequence. Like the prior patch this came up evaluating some code from Shreya to detect more conditional move cases. This patch allows us to use the "splat the sign bit" idiom to efficiently select between 0 and 2^n-1. That's particularly important for signed division by a power of two. For signed division by a power of 2, you conditionally add 2^n-1 to the numerator, then right shift that result. Using zicond somewhat naively you get something like this (for n / 4096):
> li a5,4096
> addi a5,a5,-1
> slti a4,a0,0
> add a5,a0,a5
> czero.eqz a5,a5,a4
> czero.nez a0,a0,a4
> add a0,a0,a5
> srai a0,a0,12
After this patch you get this instead:
> srai a5,a0,63
> srli a5,a5,52
> add a0,a5,a0
> srai a0,a0,12
It's not *that* much faster, but it's certainly shorter. So the trick here is that after splatting the sign bit we have 0, -1. So a subsequent logical shift right would generate 0 or 2^n-1. Yes, there is a nice variety of other constant pairs we can select between. Some notes have been added to the PR I opened yesterday. The first thing we need to do is throttle back zicond generation. Unfortunately we don't see the constants from the division-by-2^n algorithm, so we have to disable it for all lt/ge 0 cases. This can have small negative impacts. I looked at this across spec and didn't see anything I was particularly worried about and numerous small improvements from that alone. With that in place we need to recognize the form seen by combine. Essentially it sees the splat of the sign bit feeding a logical AND. We split that into two right shifts. This has survived in my tester. Waiting on upstream pre-commit before moving forward. 
gcc/ * config/riscv/riscv.cc (riscv_expand_conditional_move): Avoid zicond in some cases involving sign bit tests. * config/riscv/riscv.md: Split a splat of the sign bit feeding a masking off high bits into a pair of right shifts. gcc/testsuite * gcc.target/riscv/nozicond-3.c: New test. (cherry picked from commit 409ea888f73b2d4ae17686b28d33ca4634dafcfb) Diff: --- gcc/config/riscv/riscv.cc | 34 + gcc/config/riscv/riscv.md | 18 +++ gcc/testsuite/gcc.target/riscv/nozicond-3.c | 11 ++ 3 files changed, 63 insertions(+) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 3254ec9f9e13..413eae05f4c9 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -5393,6 +5393,40 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx cons, rtx alt) rtx op0 = XEXP (op, 0); rtx op1 = XEXP (op, 1); + /* For some tests, we can easily construct a 0, -1 value + which can then be used to synthesize more efficient + sequences that don't use zicond. */ + if ((code == LT || code == GE) + && (REG_P (op0) || SUBREG_P (op0)) + && op1 == CONST0_RTX (GET_MODE (op0))) +{ + /* The code to expand signed division by a power of 2 uses a +conditional add by 2^n-1 idiom. It can be more efficiently +synthesized without zicond using srai+srli+add. + +But we don't see the constants here. Just a conditional move +with registers as the true/false values. So this is a little +over-aggressive and can result in a few missed if-conversions. */ + if ((REG_P (cons) || SUBREG_P (cons)) + && (REG_P (alt) || SUBREG_P (alt))) + return false; + + /* If one value is a nonzero constant and the other value is +not a constant, then avoid zicond as more efficient sequences +using the splatted sign bit are often possible. */ + if (CONST_INT_P (alt) + && alt != CONST0_RTX (mode) + && !CONST_INT_P (cons)) + return false; + + if (CONST_INT_P (cons) + && cons != CONST0_RTX (mode) + && !CONST_INT_P (alt)) + return false; + + /* If we need more special cases, add them here. 
*/ +} + if (((TARGET_ZICOND_LIKE || (arith_operand (cons, mode) && arith_operand (alt, mode))) && (GET_MODE_CLASS (mode) == MODE_INT)) diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 92fe7c7741a2..6d3c80a04c74 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -4834,6 +4834,24 @@ [(set_attr
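The "splat the sign bit" sequence described in the commit message above can be sketched in portable code. This is only an illustration for the n / 4096 case (shift count 12); the names `kShift` and `div_pow2_splat` are hypothetical and do not appear in the patch, and the sketch assumes that `>>` on a negative signed value is an arithmetic shift (true for GCC, guaranteed by the standard only from C++20).

```cpp
#include <cstdint>

// Hypothetical constant: dividing by 2^12 = 4096, as in the commit's example.
constexpr int kShift = 12;

// Signed division by 2^kShift without a conditional move.
// Assumes >> on a negative signed value is an arithmetic shift.
constexpr std::int64_t div_pow2_splat(std::int64_t x)
{
  // srai a5,a0,63: copy the sign bit into every bit, giving 0 or -1.
  std::int64_t splat = x >> 63;
  // srli a5,a5,52: a logical shift by 64 - kShift turns -1 into 2^kShift - 1
  // and leaves 0 as 0.
  std::uint64_t bias = static_cast<std::uint64_t>(splat) >> (64 - kShift);
  // add a0,a5,a0 then srai a0,a0,12: add the bias, then shift right.
  return (x + static_cast<std::int64_t>(bias)) >> kShift;
}

// The result matches C++ truncating division for both signs.
static_assert(div_pow2_splat(8192) == 8192 / 4096, "positive input");
static_assert(div_pow2_splat(-1) == -1 / 4096, "rounds toward zero");
static_assert(div_pow2_splat(-4097) == -4097 / 4096, "negative, inexact");
static_assert(div_pow2_splat(INT64_MIN) == INT64_MIN / 4096, "most negative");
```

The bias is needed because an arithmetic right shift alone rounds toward negative infinity, while signed division rounds toward zero.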
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: frm/mode-switch: Reduce FRM restores on DYN transition [PR119164]
https://gcc.gnu.org/g:e9f1bc45ff3500d57d3ad5d7372b5092f2c121fc commit e9f1bc45ff3500d57d3ad5d7372b5092f2c121fc Author: Vineet Gupta Date: Sun Jun 8 14:55:01 2025 -0700 RISC-V: frm/mode-switch: Reduce FRM restores on DYN transition [PR119164] FRM mode switching state machine has DYN as default state which it also falls back to after transitioning to other states such as DYN_CALL. Currently TARGET_MODE_EMIT generates a FRM restore on any transition to DYN leading to spurious/extraneous FRM restores. Only do this if an interim static Rounding Mode was observed in the state machine. Fixes the extraneous FRM read/write in PR119164 (and also PR119832 w/o need for TARGET_MODE_CONFLUENCE). Also reduces the number of FRM writes in SPEC2017 -Ofast -mrv64gcv build significantly.

                      Before                After
                ------------------    ------------------
                frrm  fsrmi  fsrm     frrm  fsrmi  fsrm
  perlbench_r     42      0     4       17      0     1
  cpugcc_r       167      0    17       11      0     0
  bwaves_r        16      0     1       16      0     1
  mcf_r           11      0     0       11      0     0
  cactusBSSN_r    76      0    27       19      0     1
  namd_r         119      0    63       14      0     1
  parest_r       168      0   114       24      0     1
  povray_r       123      1    17       26      1     6
  lbm_r            6      0     0        6      0     0
  omnetpp_r       17      0     1       17      0     1
  wrf_r         2287     13  1956     1268     13  1603
  cpuxalan_r      17      0     1       17      0     1
  ldecod_r        11      0     0       11      0     0
  x264_r          14      0     1       11      0     0
  blender_r      724     12   182       61     12    42
  cam4_r         324     13   169       45     13    20
  deepsjeng_r     11      0     0       11      0     0
  imagick_r      265     16    34      132     16    25
  leela_r         12      0     0       12      0     0
  nab_r           13      0     1       13      0     1
  exchange2_r     16      0     1       16      0     1
  fotonik3d_r     20      0    11       19      0     1
  roms_r          33      0    23       21      0     1
  xz_r             6      0     0        6      0     0
                ------------------    ------------------
                4498     55  2623     1804     55  1707
                ------------------    ------------------
  Total                7176                  3566

PR target/119164 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_emit_frm_mode_set): check STATIC_FRM_P for transition to DYN. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr119164.c: New test. 
Signed-off-by: Vineet Gupta (cherry picked from commit 3c0f3b74bf6011b12fe12821ba6e1079309d9445) Diff: --- gcc/config/riscv/riscv.cc | 2 +- gcc/testsuite/gcc.target/riscv/rvv/base/pr119164.c | 22 ++ 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index a1bb51af2be4..1e56ee5dcb63 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -12277,7 +12277,7 @@ riscv_emit_frm_mode_set (int mode, int prev_mode) && prev_mode != riscv_vector::FRM_DYN && prev_mode != riscv_vector::FRM_DYN_CALL) /* Restore frm value when switch to DYN mode. */ - || (mode == riscv_vector::FRM_DYN + || (STATIC_FRM_P (cfun) && mode == riscv_vector::FRM_DYN && prev_mode != riscv_vector::FRM_DYN_CALL); if (restore_p) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr119164.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr119164.c new file mode 100644 index ..a39a7f177f05 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr119164.c @@ -0,0 +1,22 @@ +/* Reduced from SPEC2017 blender: node_texture_util.c. + The conditional function call was tripping mode switching state machine */ + +/* { dg-do compile } */ +/* { dg-options " -Ofast -march=rv64gcv_zvl256b -ftree-vectorize -mrvv-vector-bits=zvl" } */ + +void *a; +float *b; +short c; +void d(); +void e() { + if (a) +d(); + if (c) { +b[0] = b[0] * 0.5f + 0.5f; +b[1] = b[1] * 0.5f + 0.5f; + } +} + +/* { dg-final { scan-assembler-not {frrm\s+[axs][0-9]+} } } */ +/* { dg-final { scan-assembler-not {fsrmi\s+[01234]} } } */ +/* { dg-final { scan-assembler-not {fsrm\s+[axs][0-9]+} } } */
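As a rough model of the one clause this patch changes, the sketch below compares the old and new conditions for requesting an FRM restore on a transition to DYN. It is an assumption-laden simplification: `FrmMode`, `restore_on_dyn_before` and `restore_on_dyn_after` are hypothetical names, the `static_frm_seen` flag stands in for `STATIC_FRM_P (cfun)`, and the other half of the real `restore_p` expression (not fully visible in the diff context) is deliberately left out.

```cpp
// Hypothetical stand-ins for the riscv_vector::FRM_* mode values.
enum class FrmMode { Dyn, DynCall, Static };

// Old clause: any transition to DYN (except from DYN_CALL) asks for a restore.
constexpr bool restore_on_dyn_before(FrmMode mode, FrmMode prev_mode)
{
  return mode == FrmMode::Dyn && prev_mode != FrmMode::DynCall;
}

// New clause: same test, but only when a static rounding mode was observed
// in the current function (modelled here by static_frm_seen).
constexpr bool restore_on_dyn_after(bool static_frm_seen,
                                    FrmMode mode, FrmMode prev_mode)
{
  return static_frm_seen
         && mode == FrmMode::Dyn
         && prev_mode != FrmMode::DynCall;
}

// A function that never set a static rounding mode no longer gets a restore.
static_assert(restore_on_dyn_before(FrmMode::Dyn, FrmMode::Static),
              "old clause always restores");
static_assert(!restore_on_dyn_after(false, FrmMode::Dyn, FrmMode::Static),
              "new clause skips the restore");
static_assert(restore_on_dyn_after(true, FrmMode::Dyn, FrmMode::Static),
              "new clause restores after a static mode");
```

This matches the intent stated in the commit message: functions that never touch a static rounding mode, such as the reduced blender test, should see no `frrm`/`fsrm` traffic.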
[gcc r16-1394] libstdc++: Implement LWG3528 make_from_tuple can perform (the equivalent of) a C-style cast
https://gcc.gnu.org/g:73edc003c0a8f0badc7027e6deefd3a573300b03 commit r16-1394-g73edc003c0a8f0badc7027e6deefd3a573300b03 Author: Yihan Wang Date: Mon Jun 9 11:07:51 2025 +0100 libstdc++: Implement LWG3528 make_from_tuple can perform (the equivalent of) a C-style cast Implement LWG3528 to make std::make_from_tuple SFINAE friendly. libstdc++-v3/ChangeLog: * include/std/tuple (__can_make_from_tuple): New variable template. (__make_from_tuple_impl): Add static_assert. (make_from_tuple): Constrain using __can_make_from_tuple. * testsuite/20_util/tuple/dr3528.cc: New test. Signed-off-by: Yihan Wang Co-authored-by: Jonathan Wakely Reviewed-by: Tomasz Kamiński Diff: --- libstdc++-v3/include/std/tuple | 24 -- libstdc++-v3/testsuite/20_util/tuple/dr3528.cc | 46 ++ 2 files changed, 68 insertions(+), 2 deletions(-) diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple index 2e69af13a98b..b39ce710984c 100644 --- a/libstdc++-v3/include/std/tuple +++ b/libstdc++-v3/include/std/tuple @@ -2939,19 +2939,39 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #endif #ifdef __cpp_lib_make_from_tuple // C++ >= 17 + template >>> +constexpr bool __can_make_from_tuple = false; + + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3528. 
make_from_tuple can perform (the equivalent of) a C-style cast + template +constexpr bool __can_make_from_tuple<_Tp, _Tuple, index_sequence<_Idx...>> + = is_constructible_v<_Tp, + decltype(std::get<_Idx>(std::declval<_Tuple>()))...>; + template constexpr _Tp __make_from_tuple_impl(_Tuple&& __t, index_sequence<_Idx...>) -{ return _Tp(std::get<_Idx>(std::forward<_Tuple>(__t))...); } +{ + static_assert(__can_make_from_tuple<_Tp, _Tuple, index_sequence<_Idx...>>); + return _Tp(std::get<_Idx>(std::forward<_Tuple>(__t))...); +} #if __cpp_lib_tuple_like // >= C++23 template #else template #endif -constexpr _Tp +constexpr auto make_from_tuple(_Tuple&& __t) noexcept(__unpack_std_tuple) +#ifdef __cpp_concepts // >= C++20 +-> _Tp +requires __can_make_from_tuple<_Tp, _Tuple> +#else +-> __enable_if_t<__can_make_from_tuple<_Tp, _Tuple>, _Tp> +#endif { constexpr size_t __n = tuple_size_v>; #if __has_builtin(__reference_constructs_from_temporary) diff --git a/libstdc++-v3/testsuite/20_util/tuple/dr3528.cc b/libstdc++-v3/testsuite/20_util/tuple/dr3528.cc new file mode 100644 index ..c20ff95e12da --- /dev/null +++ b/libstdc++-v3/testsuite/20_util/tuple/dr3528.cc @@ -0,0 +1,46 @@ +// { dg-do compile { target c++17 } } + +// LWG 3528. make_from_tuple can perform (the equivalent of) a C-style cast + +#include +#include +#include + +template +using make_t = decltype(std::make_from_tuple(std::declval())); + +template +constexpr bool can_make = false; +template +constexpr bool can_make>> = true; + +static_assert( can_make> ); +static_assert( can_make&> ); +static_assert( can_make&> ); +static_assert( can_make> ); +static_assert( can_make&&> ); +static_assert( can_make, std::pair> ); +static_assert( can_make, std::array> ); +static_assert( can_make> ); +static_assert( can_make> ); +static_assert( can_make> ); +static_assert( ! can_make> ); +static_assert( ! can_make> ); +static_assert( ! can_make&> ); +static_assert( ! can_make> ); +static_assert( ! can_make> ); +static_assert( ! 
can_make> ); +static_assert( ! can_make> ); +static_assert( ! can_make> ); +static_assert( ! can_make> ); + +struct Two +{ + Two(const char*, int); +}; + +static_assert( can_make> ); +static_assert( ! can_make> ); +static_assert( can_make> ); +static_assert( ! can_make> ); +static_assert( ! can_make, std::array> );
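The constraint added for LWG 3528 can be approximated with a small standalone trait. The following is a minimal sketch assuming C++17; `can_make_from_tuple_v` is a hypothetical user-level name modelled on the patch's `__can_make_from_tuple`, not the library's internal variable template, and `Two` mirrors the helper type in the new test.

```cpp
#include <array>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Primary template: the index sequence defaults to 0..tuple_size-1.
template <typename T, typename Tuple,
          typename Seq = std::make_index_sequence<
              std::tuple_size_v<std::remove_reference_t<Tuple>>>>
constexpr bool can_make_from_tuple_v = false;

// True only if T is constructible from the unpacked elements by
// direct-initialization, which rules out cast-like conversions.
template <typename T, typename Tuple, std::size_t... Idx>
constexpr bool can_make_from_tuple_v<T, Tuple, std::index_sequence<Idx...>> =
    std::is_constructible_v<
        T, decltype(std::get<Idx>(std::declval<Tuple>()))...>;

struct Two
{
  Two(const char*, int);
};

static_assert(can_make_from_tuple_v<int, std::tuple<long>>,
              "ordinary conversion is fine");
static_assert(!can_make_from_tuple_v<int*, std::tuple<void*>>,
              "void* to int* would need a cast");
static_assert(can_make_from_tuple_v<Two, std::pair<const char*, int>>,
              "pair is tuple-like");
static_assert(!can_make_from_tuple_v<Two, std::array<int, 2>>,
              "int does not convert to const char*");
```

The `int*` from `void*` case is the kind of conversion the issue title refers to: a single-argument `T(x)` expression is a function-style cast, so without the constraint it could compile even though `T` is not constructible from `x`.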
[gcc r16-1397] Check if constant is a member before returning it.
https://gcc.gnu.org/g:6a4da727020b24b02b062f4bff718c9a5699629c commit r16-1397-g6a4da727020b24b02b062f4bff718c9a5699629c Author: Andrew MacLeod Date: Tue Jun 10 12:11:18 2025 -0400 Check if constant is a member before returning it. set_range_from_bitmask checks the new bitmask, and if it is a constant, simply returns the constant. It never checks if that constant is actually within the range. If it is not, the result should be UNDEFINED. * value-range.cc (irange::set_range_from_bitmask): When the bitmask result is a singleton, check if it is contained in the range. Diff: --- gcc/value-range.cc | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/gcc/value-range.cc b/gcc/value-range.cc index ed3760fa6ff6..e2d75f59c2e0 100644 --- a/gcc/value-range.cc +++ b/gcc/value-range.cc @@ -2268,7 +2268,11 @@ irange::set_range_from_bitmask () // If all the bits are known, this is a singleton. if (m_bitmask.mask () == 0) { - set (m_type, m_bitmask.value (), m_bitmask.value ()); + // Make sure the singleton is within the range. + if (contains_p (m_bitmask.value ())) + set (m_type, m_bitmask.value (), m_bitmask.value ()); + else + set_undefined (); return true; }
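The fix can be pictured with a toy model. The sketch below is not the `irange` API: `Range`, `Bitmask` and `refine_with_bitmask` are hypothetical, a range is a single unsigned interval, and a mask bit of 1 is assumed to mean "unknown", so a mask of 0 means every bit is known.

```cpp
#include <cstdint>

// Hypothetical simplified range: one closed interval, or UNDEFINED (empty).
struct Range
{
  bool undefined;
  std::uint64_t lo;
  std::uint64_t hi;
};

// Hypothetical bitmask: bits set in mask are unknown, the rest equal value.
struct Bitmask
{
  std::uint64_t value;
  std::uint64_t mask;
};

// If every bit is known, the only candidate is bm.value. It becomes the
// new range only if the old range contains it; otherwise nothing is left.
constexpr Range refine_with_bitmask(Range r, Bitmask bm)
{
  if (r.undefined || bm.mask != 0)
    return r;
  if (bm.value >= r.lo && bm.value <= r.hi)
    return Range{false, bm.value, bm.value};
  return Range{true, 0, 0};
}

static_assert(refine_with_bitmask(Range{false, 10, 20}, Bitmask{15, 0}).lo == 15,
              "contained singleton is kept");
static_assert(refine_with_bitmask(Range{false, 10, 20}, Bitmask{42, 0}).undefined,
              "singleton outside the range gives UNDEFINED");
```

Before the patch, the second case would have produced the singleton [42, 42], a value the range had already excluded.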
[gcc r16-1402] RISC-V: testsuite: fix an obvious build error
https://gcc.gnu.org/g:0005b1e577135bf0345447529d138f4d15618ec0 commit r16-1402-g0005b1e577135bf0345447529d138f4d15618ec0 Author: Vineet Gupta Date: Tue May 20 14:15:53 2025 -0700 RISC-V: testsuite: fix an obvious build error For a non-multilib build, I see the following errors:
| FAIL: gcc.target/riscv/rvv/vtype-call-clobbered.c (test for excess errors)
| Excess errors:
| TC-INSTxyz/sysroot/usr/include/gnu/stubs.h:14:11: fatal error: gnu/stubs-lp64.h: No such file or directory compilation terminated.
The test selects the non-default ABI lp64 (vs. lp64d) for no real reason. Fix that. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vtype-call-clobbered.c: Fix -mabi. Signed-off-by: Vineet Gupta Diff: --- gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c b/gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c index be9f312aa508..78c8a4af8166 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/vtype-call-clobbered.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-march=rv64gcv -mabi=lp64 -O2" } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O2" } */ /* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */ #include "riscv_vector.h"
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [to-be-committed][RISC-V] Handle 32bit operands in condition for conditional moves
https://gcc.gnu.org/g:38f54b2523adf37f545c92b5dac751a421047731 commit 38f54b2523adf37f545c92b5dac751a421047731 Author: Jeff Law Date: Sat Jun 7 07:48:46 2025 -0600 [to-be-committed][RISC-V] Handle 32bit operands in condition for conditional moves So here's the next chunk of conditional move work from Shreya. It's been a long standing wart that the conditional move expander does not support sub-word operands in the comparison. Particularly since we have support routines to handle the necessary extensions for that case. This patch adjusts the expander to use riscv_extend_comparands rather than fail for that case. I've built spec2017 before/after this and we definitely get more conditional moves and they look sensible from a performance standpoint. None are likely hitting terribly hot code, so I wouldn't expect any performance jumps. Waiting on pre-commit testing to do its thing. * config/riscv/riscv.cc (riscv_expand_conditional_move): Use riscv_extend_comparands to extend sub-word comparison arguments. Co-authored-by: Jeff Law (cherry picked from commit 59a3da733a79f621700dd9ddc11a0efc07237c3a) Diff: --- gcc/config/riscv/riscv.cc | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 99eeba64b6f9..dd29059412b1 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -5436,13 +5436,18 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx cons, rtx alt) machine_mode mode0 = GET_MODE (op0); machine_mode mode1 = GET_MODE (op1); - /* An integer comparison must be comparing WORD_MODE objects. We -must enforce that so that we don't strip away a sign_extension -thinking it is unnecessary. We might consider using -riscv_extend_operands if they are not already properly extended. */ + /* An integer comparison must be comparing WORD_MODE objects. +Extend the comparison arguments as necessary. 
*/ if ((INTEGRAL_MODE_P (mode0) && mode0 != word_mode) || (INTEGRAL_MODE_P (mode1) && mode1 != word_mode)) - return false; + riscv_extend_comparands (code, &op0, &op1); + + /* We might have been handed back a SUBREG. Just to make things +easy, force it into a REG. */ + if (!REG_P (op0) && !CONST_INT_P (op0)) + op0 = force_reg (word_mode, op0); + if (!REG_P (op1) && !CONST_INT_P (op1)) + op1 = force_reg (word_mode, op1); /* In the fallback generic case use MODE rather than WORD_MODE for the output of the SCC instruction, to match the mode of the NEG
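Why a sub-word comparison has to be extended before a word-sized compare can be shown with plain integers. This sketch assumes a 64-bit word and sign extension of 32-bit operands; `sext32` and `slt64` are hypothetical helpers, and the exact extension policy of `riscv_extend_comparands` is not shown in this diff, so treat the choice of sign extension as an assumption.

```cpp
#include <cstdint>

// Sign-extend the low 32 bits of a 64-bit register image.
// Assumes the usual modular conversion to int32_t (as GCC implements it).
constexpr std::int64_t sext32(std::uint64_t reg)
{
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(reg));
}

// A word-sized signed "set less than", comparing all 64 bits.
constexpr bool slt64(std::int64_t a, std::int64_t b)
{
  return a < b;
}

// reg_a holds the 32-bit value -1, but its upper 32 bits are not extended.
constexpr std::uint64_t reg_a = 0x00000000FFFFFFFFull;
constexpr std::uint64_t reg_b = 0;

// Comparing the raw registers gives the wrong answer for (int32)-1 < 0 ...
static_assert(!slt64(static_cast<std::int64_t>(reg_a),
                     static_cast<std::int64_t>(reg_b)),
              "unextended operand compares as 4294967295");
// ... while extending both comparands first gives the right one.
static_assert(slt64(sext32(reg_a), sext32(reg_b)),
              "extended operands compare as -1 < 0");
```

The removed comment in the diff alludes to the same hazard: the comparison must see properly extended WORD_MODE values, so the expander now extends the operands instead of giving up.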
[gcc r16-1395] libstdc++: Make __max_size_type and __max_diff_type structural
https://gcc.gnu.org/g:1f402fe23b0d4cf024688a729f4c86c37144d54a commit r16-1395-g1f402fe23b0d4cf024688a729f4c86c37144d54a Author: Patrick Palka Date: Tue Jun 10 10:15:25 2025 -0400 libstdc++: Make __max_size_type and __max_diff_type structural This patch makes these integer-class types structural types by public-izing their data members so that they could be used as NTTP types. I don't think this is required by the standard, but it seems like a useful extension. libstdc++-v3/ChangeLog: * include/bits/max_size_type.h (__max_size_type::_M_val): Make public instead of private. (__max_size_type::_M_msb): Likewise. (__max_diff_type::_M_rep): Likewise. * testsuite/std/ranges/iota/max_size_type.cc: Verify __max_diff_type and __max_size_type are structural. Reviewed-by: Tomasz Kamiński Reviewed-by: Jonathan Wakely Diff: --- libstdc++-v3/include/bits/max_size_type.h | 4 ++-- libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc | 7 +++ 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/libstdc++-v3/include/bits/max_size_type.h b/libstdc++-v3/include/bits/max_size_type.h index 5bec0b5a519a..e602b1b4bee5 100644 --- a/libstdc++-v3/include/bits/max_size_type.h +++ b/libstdc++-v3/include/bits/max_size_type.h @@ -425,10 +425,11 @@ namespace ranges using __rep = unsigned long long; #endif static constexpr size_t _S_rep_bits = sizeof(__rep) * __CHAR_BIT__; -private: + __rep _M_val = 0; unsigned _M_msb:1 = 0; +private: constexpr explicit __max_size_type(__rep __val, int __msb) noexcept : _M_val(__val), _M_msb(__msb) @@ -752,7 +753,6 @@ namespace ranges { return !(__l < __r); } #endif -private: __max_size_type _M_rep = 0; friend class __max_size_type; diff --git a/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc b/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc index 3e6f954ceb0c..4739d9e2f790 100644 --- a/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc +++ b/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc @@ -400,6 +400,13 @@ 
static_assert(max_diff_t(max_size_t(1) << (numeric_limits::digits-1)) == numeric_limits::min()); +// Verify that the types are structural types and can therefore be used +// as NTTP types. +template struct Su { static_assert(V*V == V+132); }; +template struct Ss { static_assert(V*V == V+132); }; +template struct Su<12>; +template struct Ss<12>; + int main() {
[gcc r16-1368] ada: Fix Value_Decimal to raise Constraint_Error on boundary values
https://gcc.gnu.org/g:5db0a4c3c736a2164774344c3c1b4c3b34e59a75 commit r16-1368-g5db0a4c3c736a2164774344c3c1b4c3b34e59a75 Author: Eric Botcazou Date: Tue Mar 18 22:44:15 2025 +0100 ada: Fix Value_Decimal to raise Constraint_Error on boundary values Even though the issue is not user-visible, it's a (minor) departure from the specification of the procedure. gcc/ada/ChangeLog: * libgnat/s-valued.adb (Integer_to_Decimal): Add Extra parameter and use its value to call Bad_Value on boundary values. (Scan_Decimal): Adjust call to Integer_to_Decimal. (Value_Decimal): Likewise. Diff: --- gcc/ada/libgnat/s-valued.adb | 27 +++ 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/gcc/ada/libgnat/s-valued.adb b/gcc/ada/libgnat/s-valued.adb index dfef9a885e52..cc2cffc72a63 100644 --- a/gcc/ada/libgnat/s-valued.adb +++ b/gcc/ada/libgnat/s-valued.adb @@ -39,13 +39,15 @@ package body System.Value_D is -- We need an unsigned type large enough to represent the mantissa package Impl is new Value_R (Uns, 1, 2**(Int'Size - 1), Round => False); - -- We do not use the Extra digit for decimal fixed-point types + -- We do not use the Extra digit for decimal fixed-point types, except to + -- effectively ensure that overflow is detected near the boundaries. 
function Integer_to_Decimal (Str: String; Val: Uns; Base : Unsigned; ScaleB : Integer; + Extra : Unsigned; Minus : Boolean; Scale : Integer) return Int; -- Convert the real value from integer to decimal representation @@ -59,6 +61,7 @@ package body System.Value_D is Val: Uns; Base : Unsigned; ScaleB : Integer; + Extra : Unsigned; Minus : Boolean; Scale : Integer) return Int is @@ -126,6 +129,10 @@ package body System.Value_D is end if; end Unsigned_To_Signed; + -- Local variables + + E : Uns := Uns (Extra); + begin -- If the base of the value is 10 or its scaling factor is zero, then -- add the scales (they are defined in the opposite sense) and apply @@ -143,9 +150,10 @@ package body System.Value_D is end loop; while S > 0 loop - if V <= Uns'Last / 10 then - V := V * 10; + if V <= (Uns'Last - E) / 10 then + V := V * 10 + E; S := S - 1; + E := 0; else Bad_Value (Str); end if; @@ -193,8 +201,9 @@ package body System.Value_D is Z := 10 ** Integer'Max (0, -Scale); for J in 1 .. LS loop - if V <= Uns'Last / Uns (B) then -V := V * Uns (B); + if V <= (Uns'Last - E) / Uns (B) then +V := V * Uns (B) + E; +E := 0; else Bad_Value (Str); end if; @@ -207,7 +216,7 @@ package body System.Value_D is raise Program_Error; end if; --- Perform a scale divide operation with rounding to match 'Image +-- Perform a scaled divide operation with rounding to match 'Image Scaled_Divide (Unsigned_To_Signed (V), Y, Z, Q, R, Round => True); @@ -238,7 +247,8 @@ package body System.Value_D is begin Val := Impl.Scan_Raw_Real (Str, Ptr, Max, Base, Scl, Extra, Minus); - return Integer_to_Decimal (Str, Val (1), Base, Scl (1), Minus, Scale); + return +Integer_to_Decimal (Str, Val (1), Base, Scl (1), Extra, Minus, Scale); end Scan_Decimal; --- @@ -255,7 +265,8 @@ package body System.Value_D is begin Val := Impl.Value_Raw_Real (Str, Base, Scl, Extra, Minus); - return Integer_to_Decimal (Str, Val (1), Base, Scl (1), Minus, Scale); + return +Integer_to_Decimal (Str, Val (1), Base, Scl (1), Extra, Minus, 
Scale); end Value_Decimal; end System.Value_D;
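The guarded multiply-and-add in the loops above can be mirrored outside Ada. This is a minimal sketch assuming a 64-bit unsigned accumulator in place of the generic `Uns`; `ScaleResult` and `scale_up` are hypothetical names, and a failed result plays the role of the call to `Bad_Value (Str)`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical result type: ok is false where the Ada code calls Bad_Value.
struct ScaleResult
{
  bool ok;
  std::uint64_t value;
};

// Multiply v by base `steps` times, folding the extra digit in on the first
// step only, and fail instead of wrapping around. base must be nonzero.
constexpr ScaleResult scale_up(std::uint64_t v, std::uint64_t base,
                               std::uint64_t extra, int steps)
{
  constexpr std::uint64_t last = std::numeric_limits<std::uint64_t>::max();
  std::uint64_t e = extra;
  for (int i = 0; i < steps; ++i)
    {
      // v <= (last - e) / base guarantees v * base + e <= last.
      if (v > (last - e) / base)
        return ScaleResult{false, 0};
      v = v * base + e;
      e = 0;
    }
  return ScaleResult{true, v};
}

// Boundary case: 1844674407370955161 * 10 + 5 is exactly the maximum value,
// while an extra digit of 6 would overflow and must be rejected.
static_assert(scale_up(1844674407370955161ull, 10, 5, 1).ok, "fits exactly");
static_assert(!scale_up(1844674407370955161ull, 10, 6, 1).ok, "overflows by one");
```

With the old guard (`v <= last / base` and no extra digit), both boundary inputs would have been accepted, which is the departure from the specification that the commit message describes.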
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: frm/mode-switch: remove TARGET_MODE_CONFLUENCE
https://gcc.gnu.org/g:2b418bd916ce1d993c66a4658e4e7804d9256c45 commit 2b418bd916ce1d993c66a4658e4e7804d9256c45 Author: Vineet Gupta Date: Sun Jun 8 14:44:29 2025 -0700 RISC-V: frm/mode-switch: remove TARGET_MODE_CONFLUENCE This is effectively reverting e5d1f538bb7d "(RISC-V: Allow different dynamic floating point mode to be merged)" while retaining the testcase. The change itself is valid, however it obfuscates the deficiencies in current frm mode switching code. Also for a SPEC2017 -Ofast -march=rv64gcv build, it ends up generating net more FRM restores (writes) vs. the rest of this changeset. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_dynamic_frm_mode_p): Remove. (riscv_mode_confluence): Ditto. (TARGET_MODE_CONFLUENCE): Ditto. Signed-off-by: Vineet Gupta (cherry picked from commit ac0fea67b9591197a3f21dd4fb924d87cc559e7e) Diff: --- gcc/config/riscv/riscv.cc | 37 - 1 file changed, 37 deletions(-) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index aa8cd97b3102..d032578f19a4 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -12485,41 +12485,6 @@ riscv_mode_needed (int entity, rtx_insn *insn, HARD_REG_SET) } } -/* Return TRUE if the rouding mode is dynamic. */ - -static bool -riscv_dynamic_frm_mode_p (int mode) -{ - return mode == riscv_vector::FRM_DYN -|| mode == riscv_vector::FRM_DYN_CALL -|| mode == riscv_vector::FRM_DYN_EXIT; -} - -/* Implement TARGET_MODE_CONFLUENCE. */ - -static int -riscv_mode_confluence (int entity, int mode1, int mode2) -{ - switch (entity) -{ -case RISCV_VXRM: - return VXRM_MODE_NONE; -case RISCV_FRM: - { - /* FRM_DYN, FRM_DYN_CALL and FRM_DYN_EXIT are all compatible. - Although we already try to set the mode needed to FRM_DYN after a - function call, there are still some corner cases where both FRM_DYN - and FRM_DYN_CALL may appear on incoming edges. 
*/ - if (riscv_dynamic_frm_mode_p (mode1) - && riscv_dynamic_frm_mode_p (mode2)) - return riscv_vector::FRM_DYN; - return riscv_vector::FRM_NONE; - } -default: - gcc_unreachable (); -} -} - /* Return TRUE that an insn is asm. */ static bool @@ -15123,8 +15088,6 @@ synthesize_and (rtx operands[3]) #define TARGET_MODE_EMIT riscv_emit_mode_set #undef TARGET_MODE_NEEDED #define TARGET_MODE_NEEDED riscv_mode_needed -#undef TARGET_MODE_CONFLUENCE -#define TARGET_MODE_CONFLUENCE riscv_mode_confluence #undef TARGET_MODE_AFTER #define TARGET_MODE_AFTER riscv_mode_after #undef TARGET_MODE_ENTRY
[gcc r16-1396] cobol: Variety of small changes in answer to cppcheck diagnostics.
https://gcc.gnu.org/g:70c3dd9a81cdefcaf24a66ec0c1ceddf5d3984dd commit r16-1396-g70c3dd9a81cdefcaf24a66ec0c1ceddf5d3984dd Author: James K. Lowden Date: Tue Jun 10 10:34:28 2025 -0400 cobol: Variety of small changes in answer to cppcheck diagnostics. Remove non-ASCII input and blank lines from gcobol.1. Restrict cobol.clean target to compiler object files. gcc/cobol/ChangeLog: * Make-lang.in: cobol.clean does not remove libgcobol files. * cdf.y: Suppress 1 cppcheck false positive. * cdfval.h (scanner_parsing): Partial via cppcheck for PR119324. * gcobol.1: Fix groff errors. * gcobolspec.cc (append_arg): Const parameter. * parse_ante.h (intrinsic_call_2): Avoid NULL dereference. Diff: --- gcc/cobol/Make-lang.in | 3 +-- gcc/cobol/cdf.y | 1 + gcc/cobol/cdfval.h | 8 gcc/cobol/gcobol.1 | 12 +--- gcc/cobol/gcobolspec.cc | 2 +- gcc/cobol/parse_ante.h | 2 +- 6 files changed, 17 insertions(+), 11 deletions(-) diff --git a/gcc/cobol/Make-lang.in b/gcc/cobol/Make-lang.in index 993e4c6ffb02..5f293e1f9874 100644 --- a/gcc/cobol/Make-lang.in +++ b/gcc/cobol/Make-lang.in @@ -351,8 +351,7 @@ cobol.srcman: cobol.mostlyclean: cobol.clean: - rm -fr gcobol cobol1 cobol/*\ - ../*/libgcobol/* + rm -fr gcobol cobol1 cobol/* cobol.distclean: diff --git a/gcc/cobol/cdf.y b/gcc/cobol/cdf.y index 0440d0216af9..e4d2feaaf52e 100644 --- a/gcc/cobol/cdf.y +++ b/gcc/cobol/cdf.y @@ -891,6 +891,7 @@ verify_integer( const YDFLTYPE& loc, const cdfval_base_t& val ) { return true; } +// cppcheck-suppress returnTempReference const cdfval_base_t& cdfval_base_t::operator()( const YDFLTYPE& loc ) { static cdfval_t zero(0); diff --git a/gcc/cobol/cdfval.h b/gcc/cobol/cdfval.h index 09c21ab08e50..c4387b080661 100644 --- a/gcc/cobol/cdfval.h +++ b/gcc/cobol/cdfval.h @@ -38,6 +38,14 @@ bool scanner_parsing(); +/* cdfval_base_t has no constructor because otherwise: + * cobol/cdf.h:172:7: note: ‘YDFSTYPE::YDFSTYPE()’ is implicitly deleted + * because the default definition would be ill-formed: + * 172 | union 
YDFSTYPE + * + * We use the derived type cdfval_t, which can be properly constructed and + * operated on, but tell Bison only about its POD base class. + */ struct YDFLTYPE; struct cdfval_base_t { bool off; diff --git a/gcc/cobol/gcobol.1 b/gcc/cobol/gcobol.1 index 0ce890e97229..6db54009fcf7 100644 --- a/gcc/cobol/gcobol.1 +++ b/gcc/cobol/gcobol.1 @@ -39,7 +39,7 @@ compiles \*[lang] source code to object code, and optionally produces an executable binary or shared object. As a GCC component, it accepts all options that affect code-generation and linking. Options specific to \*[lang] are listed below. -.Bl -tag -width \0\0debug +.Bl -tag -width "\0\0debug" .It Fl main Ar filename .Nm will generate a @@ -197,14 +197,12 @@ Otherwise, columns 1-6 are examined. If those characters are all digits or blanks, the file is assumed to be in .Em "fixed-form reference format", also with the indicator in column 7. - If not auto-detected as .Em "fixed-form reference format" or .Em "extended source format", the file is assumed to be in .Em "free-form reference format". - .Pp . .It Fl fcobol-exceptions Ar exception Op Ns , Ns Ar exception Ns ... @@ -1088,7 +1086,7 @@ the directive must appear before .Pp To test a feature-set variable, use .Dl >>IF Ar feature Li DEFINED -.. +. .Ss Copybooks .Nm supports the CDF @@ -1294,7 +1292,7 @@ stores and converts numbers. Converting the floating-point value to the numeric display value 0055110 is done by multiplying 55.10...\& by 1,000 and then truncating the result to an integer. And it turns out that even -though 55.11 can’t be represented in floating-point as an exact value, +though 55.11 can't be represented in floating-point as an exact value, the product of the multiplication, 55110, is an exact value. .Pp In cases where it is important for conversions to have predictable @@ -1325,7 +1323,7 @@ specified for a calculation, then the intermediate result becomes a . 
.Ss A warning about binary floating point comparison The cardinal rule when doing comparisons involving floating-point -values: Never, ever, test for equality. It’s just not worth the hassle. +values: Never, ever, test for equality. It's just not worth the hassle. .Pp For example: .Bd -literal @@ -1361,7 +1359,7 @@ and you really test the code. And then avoid it anyway. .Pp Finally, it is observably the case that the .Nm -implementations of floating-point conversions and comparisons don’t +implementations of floating-point conversions and comparisons don't precisely match the behavior of other \*[lang] compilers. .Pp You have been warned. diff --git a/gcc/cobol/gcobolspec.cc b/gcc/cobol/gcobolspec.cc index d1ffc97f8ca5..70784d7e3570 100644 --- a/gcc/cobol/gcobolspec.cc +++ b/gcc/cobol/
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] [RISC-V] Handle 32bit operands in condition for conditional moves
https://gcc.gnu.org/g:e47d4027be18683eb090b3601e50c854a6c60c5b commit e47d4027be18683eb090b3601e50c854a6c60c5b Author: Shreya Munnangi Date: Sun Jun 8 08:42:53 2025 -0600 [RISC-V] Handle 32bit operands in condition for conditional moves So here's the next chunk of conditional move work from Shreya. It's been a long standing wart that the conditional move expander does not support sub-word operands in the comparison. Particularly since we have support routines to handle the necessary extensions for that case. This patch adjusts the expander to use riscv_extend_comparands rather than fail for that case. I've built spec2017 before/after this and we definitely get more conditional moves and they look sensible from a performance standpoint. None are likely hitting terribly hot code, so I wouldn't expect any performance jumps. Waiting on pre-commit testing to do its thing. gcc/ * config/riscv/riscv.cc (riscv_expand_conditional_move): Use riscv_extend_comparands to extend sub-word comparison arguments. Co-authored-by: Jeff Law (cherry picked from commit 2523c15430d980c380684c3df49f9ae016b8647d) Diff: --- gcc/config/riscv/riscv.cc | 141 ++ 1 file changed, 79 insertions(+), 62 deletions(-) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index dd29059412b1..aa8cd97b3102 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -5389,11 +5389,18 @@ riscv_expand_conditional_branch (rtx label, rtx_code code, rtx op0, rtx op1) bool riscv_expand_conditional_move (rtx dest, rtx op, rtx cons, rtx alt) { - machine_mode mode = GET_MODE (dest); + machine_mode dst_mode = GET_MODE (dest); + machine_mode cond_mode = GET_MODE (dest); rtx_code code = GET_CODE (op); rtx op0 = XEXP (op, 0); rtx op1 = XEXP (op, 1); + /* General note. This is called from the conditional move + expander. That simplifies the cases we need to worry about + as we know the destination will have the same mode as the + true/false arms. 
Furthermore we know that mode will be + DI/SI for rv64 or SI for rv32. */ + /* For some tests, we can easily construct a 0, -1 value which can then be used to synthesize more efficient sequences that don't use zicond. */ @@ -5416,12 +5423,12 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx cons, rtx alt) not a constant, then avoid zicond as more efficient sequences using the splatted sign bit are often possible. */ if (CONST_INT_P (alt) - && alt != CONST0_RTX (mode) + && alt != CONST0_RTX (dst_mode) && !CONST_INT_P (cons)) return false; if (CONST_INT_P (cons) - && cons != CONST0_RTX (mode) + && cons != CONST0_RTX (dst_mode) && !CONST_INT_P (alt)) return false; @@ -5429,8 +5436,9 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx cons, rtx alt) } if (((TARGET_ZICOND_LIKE - || (arith_operand (cons, mode) && arith_operand (alt, mode))) - && (GET_MODE_CLASS (mode) == MODE_INT)) + || (arith_operand (cons, dst_mode) && arith_operand (alt, dst_mode))) + && GET_MODE_CLASS (dst_mode) == MODE_INT + && GET_MODE_CLASS (cond_mode) == MODE_INT) || TARGET_SFB_ALU || TARGET_XTHEADCONDMOV) { machine_mode mode0 = GET_MODE (op0); @@ -5449,13 +5457,13 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx cons, rtx alt) if (!REG_P (op1) && !CONST_INT_P (op1)) op1 = force_reg (word_mode, op1); - /* In the fallback generic case use MODE rather than WORD_MODE for -the output of the SCC instruction, to match the mode of the NEG + /* In the fallback generic case use DST_MODE rather than WORD_MODE +for the output of the SCC instruction, to match the mode of the NEG operation below. The output of SCC is 0 or 1 boolean, so it is valid for input in any scalar integer mode. */ rtx tmp = gen_reg_rtx ((TARGET_ZICOND_LIKE || TARGET_SFB_ALU || TARGET_XTHEADCONDMOV) -? word_mode : mode); +? word_mode : dst_mode); bool invert = false; /* Canonicalize the comparison. 
It must be an equality comparison @@ -5484,7 +5492,7 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx cons, rtx alt) else return false; - op = gen_rtx_fmt_ee (invert ? EQ : NE, mode, tmp, const0_rtx); + op = gen_rtx_fmt_ee (invert ? EQ : NE, cond_mode, tmp, const0_rtx); /* We've generated a new comparison. Update the local variables. */ code = GET_CODE (op); @@ -5503,10 +5511,10 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx cons, rtx alt) arm of the conditional move. That allows us to sup
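As a rough illustration of the case this patch targets (a sketch, not taken from the patch or its testsuite): on rv64 the comparison below has 32-bit operands while the selected values are 64-bit, so a sub-word comparison feeds a word-sized conditional move. The function name select_on_int_cmp is hypothetical, and whether a conditional move is actually emitted depends on the target options (for example whether zicond is enabled) and the optimization level.

```c
#include <stdint.h>

/* Hypothetical example: 32-bit comparison operands, 64-bit result.
   On rv64 the comparands have to be extended to word size before the
   comparison can feed a conditional move.  */
int64_t
select_on_int_cmp (int32_t a, int32_t b, int64_t x, int64_t y)
{
  return a < b ? x : y;
}
```

Comparing the assembly for this function before and after the change would be one way to see whether the expander now accepts the sub-word comparison.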
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Reconcile the existing test for vdivu.vx combine
https://gcc.gnu.org/g:f633e6fe9a173e001890629d659e2ecd68be2fea commit f633e6fe9a173e001890629d659e2ecd68be2fea Author: Pan Li Date: Fri Jun 6 10:03:50 2025 +0800 RISC-V: Reconcile the existing test for vdivu.vx combine Some existing vdiv related test need some adjust for the asm check due to cost model. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Adjust the asm check for vdivu. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Ditto. Signed-off-by: Pan Li (cherry picked from commit 08a0b6dabd76c8ca4366a59c2fdcd1ef8f8b1cb9) Diff: --- gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv-nofm.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c | 4 ++-- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c index 4685ed22a784..a8be5edcc70c 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c @@ -5,8 +5,8 @@ /* { dg-final { scan-assembler-times {\tvdiv\.vv} 8 } } */ /* { dg-final { scan-assembler-not {\tvdiv\.vx} } } */ -/* { dg-final { scan-assembler-times {\tvdivu\.vv} 5 } } */ -/* { dg-final { scan-assembler-times {\tvdivu\.vx} 3 } } */ +/* { dg-final { scan-assembler-times {\tvdivu\.vv} 8 } } */ +/* { dg-final { scan-assembler-not {\tvdivu\.vx} } } */ /* { dg-final { scan-assembler-times {\tvfdiv\.vv} 6 } } */ /* { dg-final { scan-assembler-not {\tvfdiv\.vf} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c index 59c48d2d9bae..7feee0ec154a 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c @@ -5,8 +5,8 @@ /* { dg-final { scan-assembler-times {\tvdiv\.vv} 8 } } */ /* { dg-final { scan-assembler-not {\tvdiv\.vx} } } */ -/* { dg-final { scan-assembler-times {\tvdivu\.vv} 5 } } */ -/* { dg-final { scan-assembler-times {\tvdivu\.vx} 3 } } */ +/* { dg-final { scan-assembler-times {\tvdivu\.vv} 8 } } */ +/* { dg-final { scan-assembler-not {\tvdivu\.vx} } } */ /* Division by constant is done by calculating a reciprocal and then multiplying. Hence we do not expect 6 vfdivs. */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv-nofm.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv-nofm.c index b574dc42182c..766b17fc37da 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv-nofm.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv-nofm.c @@ -5,8 +5,8 @@ /* { dg-final { scan-assembler-times {\tvdiv\.vv} 8 } } */ /* { dg-final { scan-assembler-not {\tvdiv\.vx} } } */ -/* { dg-final { scan-assembler-times {\tvdivu\.vv} 4 } } */ -/* { dg-final { scan-assembler-times {\tvdivu\.vx} 4 } } */ +/* { dg-final { scan-assembler-times {\tvdivu\.vv} 8 } } */ +/* { dg-final { scan-assembler-not {\tvdivu\.vx} } } */ /* { dg-final { scan-assembler-times {\tvfdiv\.vv} 6 } } */ /* { dg-final { scan-assembler-not {\tvfdiv\.vf} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c index 9b46c6be0efb..c59c66439f89 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c @@ -5,8 +5,8 @@ /* { dg-final { scan-assembler-times {\tvdiv\.vv} 8 } } */ /* { dg-final { scan-assembler-not 
{\tvdiv\.vx} } } */ -/* { dg-final { scan-assembler-times {\tvdivu\.vv} 4 } } */ -/* { dg-final { scan-assembler-times {\tvdivu\.vx} 4 } } */ +/* { dg-final { scan-assembler-times {\tvdivu\.vv} 8 } } */ +/* { dg-final { scan-assembler-not {\tvdivu\.vx} } } */ /* Division by constant is done by calculating a reciprocal and then multiplying. Hence we do not expect 6 vfdivs. */
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Add test for vec_duplicate + vdivu.vv combine case 0 with GR2VR cost 0, 2 and 15
https://gcc.gnu.org/g:f34bc1ecce268061e8ad8ef7248203a50ecd0036 commit f34bc1ecce268061e8ad8ef7248203a50ecd0036 Author: Pan Li Date: Fri Jun 6 09:49:56 2025 +0800 RISC-V: Add test for vec_duplicate + vdivu.vv combine case 0 with GR2VR cost 0, 2 and 15 Add asm dump check test for vec_duplicate + vdivu.vv combine to vdivu.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vdivu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test data for run test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vdiv-run-1-u16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vdiv-run-1-u32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vdiv-run-1-u64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vdiv-run-1-u8.c: New test. 
Signed-off-by: Pan Li (cherry picked from commit 2ca7622fd7b32fd538edea8fd8bd8b97ba07ef16) Diff: --- .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c | 2 + .../riscv/rvv/autovec/vx_vf/vx_binary_data.h | 196 + .../riscv/rvv/autovec/vx_vf/vx_vdiv-run-1-u16.c| 15 ++ .../riscv/rvv/autovec/vx_vf/vx_vdiv-run-1-u32.c| 15 ++ .../riscv/rvv/autovec/vx_vf/vx_vdiv-run-1-u64.c| 15 ++ .../riscv/rvv/autovec/vx_vf/vx_vdiv-run-1-u8.c | 15 ++ 17 files changed, 280 insertions(+) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c index 7e107d30191d..92fbf227d563 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c @@ -11,6 +11,7 @@ DEF_VX_BINARY_REVERSE_CASE_0_WRAP(T, -, rsub); DEF_VX_BINARY_CASE_0_WRAP(T, &, and) DEF_VX_BINARY_CASE_0_WRAP(T, |, or) DEF_VX_BINARY_CASE_0_WRAP(T, ^, xor) +DEF_VX_BINARY_CASE_0_WRAP(T, /, div) /* { dg-final { scan-assembler-times {vadd.vx} 1 } } */ /* { dg-final { scan-assembler-times {vsub.vx} 1 } } */ @@ -18,3 +19,4 @@ DEF_VX_BINARY_CASE_0_WRAP(T, ^, xor) /* { dg-final { scan-assembler-times {vand.vx} 1 } } */ /* { dg-final { scan-assembler-times {vor.vx} 1 } } */ /* { dg-final { scan-assembler-times {vxor.vx} 1 } } */ +/* { dg-final { scan-assembler-times {vdivu.vx} 1 } } */ diff 
--git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c index f8ffab78067a..f487b42820ee 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c @@ -11,6 +11,7 @@ DEF_VX_BINARY_REVERSE_CASE_0_WRAP(T, -, rsub); DEF_VX_BINARY_CASE_0_WRAP(T, &, and) DEF_VX_BINARY_CASE_0_WRAP(T, |, or) DEF_VX_BINARY_CASE_0_WRAP(T, ^, xor) +DEF_VX_BINARY_CASE_0_WRAP(T, /, div) /* { dg-final { scan-assembler-times {vadd.vx} 1 } } */ /* { dg-final { scan-assembler-times {vsub.vx} 1 } } */ @@ -18,3 +19,4 @@ DEF_VX_BINARY_CASE_0_WRAP(T, ^, xor) /* { dg-final { scan-assembler-times {vand.vx} 1 } } */ /* { dg-final { scan-assembler-times {vor.vx} 1 } } */ /* { dg-final { scan-assembler-times {vxor.vx} 1 } } */ +/* { dg-final { scan-assembler-times {vdivu.vx} 1 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c index 31d294567e83..761d25c0833a 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c +++ b/gcc/
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Support -mcpu for XiangShan Kunminghu cpu.
https://gcc.gnu.org/g:523071dcf5e5abecc50f96b24ad3d35dadbbe7cd commit 523071dcf5e5abecc50f96b24ad3d35dadbbe7cd Author: Jiawei Date: Wed Jun 4 17:56:49 2025 +0800 RISC-V: Support -mcpu for XiangShan Kunminghu cpu. This patch adds support for the XiangShan Kunminghu CPU in GCC, allowing the use of the `-mcpu=xiangshan-kunminghu` option. XiangShan-KunMingHu is the third-generation open-source high-performance RISC-V processor.[1] You can find the corresponding ISA extension from the XiangShan Github repository.[2] The latest news of KunMingHu can be found in the XiangShan Biweekly.[3] [1] https://github.com/OpenXiangShan/XiangShan-User-Guide/releases. [2] https://github.com/OpenXiangShan/XiangShan/blob/master/src/main/scala/xiangshan/Parameters.scala [3] https://docs.xiangshan.cc/zh-cn/latest/blog A dedicated scheduling model for KunMingHu's hybrid pipeline will be proposed in a subsequent PR. gcc/ChangeLog: * config/riscv/riscv-cores.def (RISCV_TUNE): New cpu tune. (RISCV_CORE): New cpu. * doc/invoke.texi: Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/mcpu-xiangshan-kunminghu.c: New test. 
Co-Authored-By: Jiawei Chen Co-Authored-By: Yangyu Chen Co-Authored-By: Tang Haojin (cherry picked from commit f0cd40f71ba424bde94dcddbf1df67bb100b82ef) Diff: --- gcc/config/riscv/riscv-cores.def | 14 gcc/doc/invoke.texi| 4 +- .../gcc.target/riscv/mcpu-xiangshan-kunminghu.c| 95 ++ 3 files changed, 111 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def index 118fef23cad4..cff7c77a0bd7 100644 --- a/gcc/config/riscv/riscv-cores.def +++ b/gcc/config/riscv/riscv-cores.def @@ -48,6 +48,7 @@ RISCV_TUNE("xt-c910v2", generic, generic_ooo_tune_info) RISCV_TUNE("xt-c920", generic, generic_ooo_tune_info) RISCV_TUNE("xt-c920v2", generic, generic_ooo_tune_info) RISCV_TUNE("xiangshan-nanhu", xiangshan, xiangshan_nanhu_tune_info) +RISCV_TUNE("xiangshan-kunminghu", xiangshan, generic_ooo_tune_info) RISCV_TUNE("generic-ooo", generic_ooo, generic_ooo_tune_info) RISCV_TUNE("size", generic, optimize_size_tune_info) RISCV_TUNE("mips-p8700", mips_p8700, mips_p8700_tune_info) @@ -154,6 +155,19 @@ RISCV_CORE("xiangshan-nanhu", "rv64imafdc_zba_zbb_zbc_zbs_" "svinval_zicbom_zicboz", "xiangshan-nanhu") +RISCV_CORE("xiangshan-kunminghu", "rv64imafdcbvh_sdtrig_sha_shcounterenw_" + "shgatpa_shlcofideleg_shtvala_shvsatpa_shvstvala_shvstvecd_" + "smaia_smcsrind_smdbltrp_smmpm_smnpm_smrnmi_smstateen_" + "ssaia_ssccptr_sscofpmf_sscounterenw_sscsrind_ssdbltrp_" + "ssnpm_sspm_ssstateen_ssstrict_sstc_sstvala_sstvecd_" + "ssu64xl_supm_svade_svbare_svinval_svnapot_svpbmt_za64rs_" + "zacas_zawrs_zba_zbb_zbc_zbkb_zbkc_zbkx_zbs_zcb_zcmop_" + "zfa_zfh_zfhmin_zic64b_zicbom_zicbop_zicboz_ziccif_" + "zicclsm_ziccrse_zicntr_zicond_zicsr_zifencei_zihintpause_" + "zihpm_zimop_zkn_zknd_zkne_zknh_zksed_zksh_zkt_zvbb_zvfh_" + "zvfhmin_zvkt_zvl128b_zvl32b_zvl64b", + "xiangshan-kunminghu") + RISCV_CORE("mips-p8700", "rv64imafd_zicsr_zmmul_" "zaamo_zalrsc_zba_zbb", "mips-p8700") diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 
cfcb4b3cf978..ab49ba00c0a7 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -31123,8 +31123,8 @@ Permissible values for this option are: @samp{mips-p8700}, @samp{sifive-e20}, @samp{sifive-e76}, @samp{sifive-s21}, @samp{sifive-s51}, @samp{sifive-s54}, @samp{sifive-s76}, @samp{sifive-u54}, @samp{sifive-u74}, @samp{sifive-x280}, @samp{sifive-xp450}, @samp{sifive-x670}, @samp{thead-c906}, @samp{tt-ascalon-d8}, -@samp{xiangshan-nanhu}, @samp{xt-c908}, @samp{xt-c908v}, @samp{xt-c910}, @samp{xt-c910v2}, -@samp{xt-c920}, @samp{xt-c920v2}. +@samp{xiangshan-nanhu}, @samp{xiangshan-kunminghu}, @samp{xt-c908}, @samp{xt-c908v}, +@samp{xt-c910}, @samp{xt-c910v2}, @samp{xt-c920}, @samp{xt-c920v2}. Note that @option{-mcpu} does not override @option{-march} or @option{-mtune}. diff --git a/gcc/testsuite/gcc.target/riscv/mcpu-xiangshan-kunminghu.c b/gcc/testsuite/gcc.target/riscv/mcpu-xiangshan-kunminghu.c new file mode 100644 index ..e3ae65c46444 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/mcpu-xiangshan-kunminghu.c @@ -0,0 +1,95 @@ +/* { dg-do compile { target { rv64 } } } */ +/* { dg-skip-if "-march given" { *-*-* } { "-march=*" } } */ +/* { dg-
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Combine vec_duplicate + vidvu.vv to vdivu.vx on GR2VR cost
https://gcc.gnu.org/g:7e0e5701d71b649ed911a5c236af8f8fd714bbc4

commit 7e0e5701d71b649ed911a5c236af8f8fd714bbc4
Author: Pan Li
Date:   Fri Jun 6 09:33:21 2025 +0800

    RISC-V: Combine vec_duplicate + vidvu.vv to vdivu.vx on GR2VR cost

    This patch combines vec_duplicate + vdivu.vv into vdivu.vx, as in the
    example code below.  The related pattern depends on the cost of
    vec_duplicate from GR2VR: late-combine performs the combination if the
    GR2VR cost is zero, and rejects it if the GR2VR cost is greater than
    zero.

    Assume we have example code like below, with a GR2VR cost of 0.

      #define DEF_VX_BINARY(T, OP)                                        \
      void                                                                \
      test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
      {                                                                   \
        for (unsigned i = 0; i < n; i++)                                  \
          out[i] = in[i] OP x;                                            \
      }

      DEF_VX_BINARY(int32_t, /)

    Before this patch:

      test_vx_binary_or_int32_t_case_0:
        beq      a3,zero,.L8
        vsetvli  a5,zero,e32,m1,ta,ma
        vmv.v.x  v2,a2
        slli     a3,a3,32
        srli     a3,a3,32
      .L3:
        vsetvli  a5,a3,e32,m1,ta,ma
        vle32.v  v1,0(a1)
        slli     a4,a5,2
        sub      a3,a3,a5
        add      a1,a1,a4
        vdivu.vv v1,v1,v2
        vse32.v  v1,0(a0)
        add      a0,a0,a4
        bne      a3,zero,.L3

    After this patch:

      test_vx_binary_or_int32_t_case_0:
        beq      a3,zero,.L8
        slli     a3,a3,32
        srli     a3,a3,32
      .L3:
        vsetvli  a5,a3,e32,m1,ta,ma
        vle32.v  v1,0(a1)
        slli     a4,a5,2
        sub      a3,a3,a5
        add      a1,a1,a4
        vdivu.vx v1,v1,a2
        vse32.v  v1,0(a0)
        add      a0,a0,a4
        bne      a3,zero,.L3

    The following test suites passed for this patch:
    * The rv64gcv full regression test.

    gcc/ChangeLog:

            * config/riscv/riscv-v.cc (expand_vx_binary_vec_vec_dup): Add new
            case UDIV.
            * config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
            * config/riscv/vector-iterators.md: Add new op divu.
Signed-off-by: Pan Li (cherry picked from commit be205ec675ed79275e694dda90f0f97fc6ac0e7a) Diff: --- gcc/config/riscv/riscv-v.cc | 1 + gcc/config/riscv/riscv.cc| 1 + gcc/config/riscv/vector-iterators.md | 2 +- 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index a41317f322f7..6a7eb7161b37 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -5568,6 +5568,7 @@ expand_vx_binary_vec_vec_dup (rtx op_0, rtx op_1, rtx op_2, case XOR: case MULT: case DIV: +case UDIV: icode = code_for_pred_scalar (code, mode); break; default: diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 413eae05f4c9..99eeba64b6f9 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -3943,6 +3943,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN switch (GET_CODE (op)) { case DIV: + case UDIV: *total = get_vector_binary_rtx_cost (op, scalar2vr_cost); break; default: diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index 86f31f3afabb..36301b0be6e7 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -4042,7 +4042,7 @@ ]) (define_code_iterator any_int_binop_no_shift_v_vdup [ - plus minus and ior xor mult div + plus minus and ior xor mult div udiv ]) (define_code_iterator any_int_binop_no_shift_vdup_v [
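To make the transformation concrete, here is a minimal sketch of the kind of loop the combine applies to, written out without the macro. It uses uint32_t so that the division is unsigned (UDIV), which is the case that maps to vdivu. The function name vx_udiv is hypothetical and is not part of the patch or the testsuite. Whether the compiler ends up with vdivu.vx or with vmv.v.x followed by vdivu.vv depends on the GR2VR cost in effect, as the commit message describes.

```c
#include <stdint.h>

/* Hypothetical stand-alone version of the loop: every element of IN is
   divided by the same scalar X, so the vectorized form needs X either
   broadcast into a vector register or used directly as a scalar operand.  */
void
vx_udiv (uint32_t *restrict out, const uint32_t *restrict in, uint32_t x,
         unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = in[i] / x;
}
```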
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Regen riscv-ext.texi [NFC]
https://gcc.gnu.org/g:e3ada8eb1b917128350fc45c00092b1687506e41 commit e3ada8eb1b917128350fc45c00092b1687506e41 Author: Kito Cheng Date: Tue Jun 10 10:32:37 2025 +0800 RISC-V: Regen riscv-ext.texi [NFC] Regenerates the `riscv-ext.texi` file in the GCC documentation. gcc/ChangeLog: * doc/riscv-ext.texi: Regen. (cherry picked from commit fce42e15063e76d2535586a25d560e2340ca8ac9) Diff: --- gcc/doc/riscv-ext.texi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/doc/riscv-ext.texi b/gcc/doc/riscv-ext.texi index e69a2df768d4..c3ed1bfb5936 100644 --- a/gcc/doc/riscv-ext.texi +++ b/gcc/doc/riscv-ext.texi @@ -520,7 +520,7 @@ @item smrnmi @tab 1.0 -@tab Resumable Non-Maskable Interrupts +@tab Resumable non-maskable interrupts @item smstateen @tab 1.0
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: frm/mode-switch: robustify call_insn backtracking [PR120203]
https://gcc.gnu.org/g:3458870cad987cf3e3bf5add23399f1ca285

commit 3458870cad987cf3e3bf5add23399f1ca285
Author: Vineet Gupta
Date:   Sun Jun 8 14:55:11 2025 -0700

    RISC-V: frm/mode-switch: robustify call_insn backtracking [PR120203]

    As described in prior patches of this series, the FRM mode switching
    state machine has special handling around calls.  After a call_insn, if
    in DYN_CALL state, it needs to transition back to DYN, which requires
    checking whether the previous insn was indeed a call.

    Deferring this could lead to unnecessary final transitions, leading to
    extraneous FRM save/restores.

    However, the current back-checking of call_insn was too coarse-grained.
    It used prev_nonnote_nondebug_insn_bb (), which implies that the current
    insn is in the same BB as the call_insn, which need not always be true.

    The problem is not with the API, but the use thereof.  Fix this by
    tracking call_insn more explicitly in TARGET_MODE_NEEDED.
     - On seeing a call_insn, record a "call note".
     - On subsequent insns, if a "call note" is seen, do the needed state
       switch and clear the note.
     - Remove the old BB based search.

    The number of FRM read/writes across SPEC2017 -Ofast -mrv64gcv improves.

                         Before                After
                   -------------------   -------------------
                   frrm  fsrmi   fsrm    frrm  fsrmi   fsrm
    perlbench_r      17      0      1      17      0      1
    cpugcc_r         11      0      0      11      0      0
    bwaves_r         16      0      1      16      0      1
    mcf_r            11      0      0      11      0      0
    cactusBSSN_r     19      0      1      19      0      1
    namd_r           14      0      1      14      0      1
    parest_r         24      0      1      24      0      1
    povray_r         26      1      6      26      1      6
    lbm_r             6      0      0       6      0      0
    omnetpp_r        17      0      1      17      0      1
    wrf_r          1268     13   1603     613     13     82
    cpuxalan_r       17      0      1      17      0      1
    ldecod_r         11      0      0      11      0      0
    x264_r           11      0      0      11      0      0
    blender_r        61     12     42      39     12     16
    cam4_r           45     13     20      40     13     17
    deepsjeng_r      11      0      0      11      0      0
    imagick_r       132     16     25      33     16     18
    leela_r          12      0      0      12      0      0
    nab_r            13      0      1      13      0      1
    exchange2_r      16      0      1      16      0      1
    fotonik3d_r      19      0      1      19      0      1
    roms_r           21      0      1      21      0      1
    xz_r              6      0      0       6      0      0
                   -------------------   -------------------
                   1804     55   1707    1023     55    150
                   -------------------   -------------------
                          3566                  1228
                   -------------------   -------------------

    While this was a missed-optimization exercise, testing exposed a latent
    bug as an additional testsuite failure, captured as PR120203.
    The existing test float-point-dynamic-frm-74.c was missing FRM save
    after a call which this fixes (as a side-effect of robust call state
    tracking).

     |   frrm    a5
     |   fsrmi   1
     |
     |   vfadd.vv        v1,v8,v9
     |   fsrm    a5
     |   beq     a1,zero,.L2
     |
     |   call    normalize_vl_1
     |   frrm    a5
     |
     | .L3:
     |   fsrmi   3
     |   vfadd.vv        v8,v8,v9
     |   fsrm    a5
     |   jr      ra
     |
     | .L2:
     |   call    normalize_vl_2
     |   frrm    a5      <-- missing
     |   j       .L3

            PR target/120203

    gcc/ChangeLog:

            * config/riscv/riscv.cc (CFUN_IN_CALL): New macro.
            (struct mode_switching_info): Add new field.
            (riscv_frm_adjust_mode_after_call): Remove.
            (riscv_frm_mode_needed): Track call_insn.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/base/float-point-dynamic-frm-74.c: Expect
            an additional FRRM.

    Signed-off-by: Vineet Gupta
    (cherry picked from commit fd042192094c456e275c53dfe92383bec1e9fca3)

Diff:
---
 gcc/config/riscv/riscv.cc                          | 42 +-
 .../riscv/rvv/base/float-point-dynamic-frm-74.c    |  2 +-
 2 files changed, 17 insertions(+), 27 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1e56ee5dcb63..3fd18c1646dc 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -107,6 +107,8 @@ along with GCC; see the file COPYING3.  If not see
 /* True the mode switching has static frm, or false.  */
 #define STATIC_FRM_P(c) ((c)->machine->mode_sw_info.static_frm_p)
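The "call note" approach from the commit message can be sketched as a tiny state tracker. This is a simplified model under stated assumptions, not the code in riscv.cc: the type frm_tracker, the enum values and the function frm_visit_insn are all hypothetical names chosen for illustration, and the real hook deals with more states than the two shown here.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the tracking; names are illustrative
   and are not GCC's.  */
enum frm_state { FRM_DYN, FRM_DYN_CALL };

struct frm_tracker
{
  enum frm_state state;
  bool call_note;   /* Set when the previously visited insn was a call.  */
};

/* Visit one insn in order.  Returns true if this insn triggers the
   DYN_CALL -> DYN transition, i.e. FRM has to be re-read after a call.  */
bool
frm_visit_insn (struct frm_tracker *t, bool is_call)
{
  bool switched = false;

  /* A pending call note means the previous insn was a call, no matter
     which basic block it was in: switch back and clear the note.  */
  if (t->call_note)
    {
      if (t->state == FRM_DYN_CALL)
        {
          t->state = FRM_DYN;
          switched = true;
        }
      t->call_note = false;
    }

  /* On seeing a call, record the note for the next insn to consume.  */
  if (is_call)
    {
      t->state = FRM_DYN_CALL;
      t->call_note = true;
    }

  return switched;
}
```

Because the note lives in per-function state rather than being rediscovered by walking backwards from the current insn, the insn after the call is handled the same way whether or not it shares a basic block with the call, which is the property the old search lacked.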
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Reconcile the existing test for vrem.vx combine
https://gcc.gnu.org/g:b48ccf650b32ff2af612029da693b83c23c490bc commit b48ccf650b32ff2af612029da693b83c23c490bc Author: Pan Li Date: Sun Jun 8 16:50:52 2025 +0800 RISC-V: Reconcile the existing test for vrem.vx combine Some existing vrem related test need some adjust for the asm check due to cost model. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Adjust the asm check for vrem. * gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Ditto. Signed-off-by: Pan Li (cherry picked from commit 4df4acf002cc3672478edb43f374cef3ffbd1f54) Diff: --- gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c | 4 ++-- gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c index a87a6c70df1f..ad918a9b800a 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c @@ -2,8 +2,8 @@ #include "vrem-template.h" -/* { dg-final { scan-assembler-times {\tvrem\.vv} 5 } } */ -/* { dg-final { scan-assembler-times {\tvrem\.vx} 3 } } */ +/* { dg-final { scan-assembler-times {\tvrem\.vv} 8 } } */ +/* { dg-final { scan-assembler-not {\tvrem\.vx} } } */ /* { dg-final { scan-assembler-times {\tvremu\.vv} 5 } } */ /* { dg-final { scan-assembler-times {\tvremu\.vx} 3 } } */ /* { dg-final { scan-tree-dump-times "\.COND_LEN_MOD" 16 "optimized" } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c index 938169574aac..4e28f99e2886 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c @@ -3,8 +3,8 @@ #include "vrem-template.h" -/* { dg-final { scan-assembler-times {\tvrem\.vv} 4 } } */ -/* { 
dg-final { scan-assembler-times {\tvrem\.vx} 4 } } */ +/* { dg-final { scan-assembler-times {\tvrem\.vv} 8 } } */ +/* { dg-final { scan-assembler-not {\tvrem\.vx} } } */ /* { dg-final { scan-assembler-times {\tvremu\.vv} 4 } } */ /* { dg-final { scan-assembler-times {\tvremu\.vx} 4 } } */ /* { dg-final { scan-tree-dump-times "\.COND_LEN_MOD" 16 "optimized" } } */
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Add test for vec_duplicate + vrem.vv combine case 0 with GR2VR cost 0, 2 and 15
https://gcc.gnu.org/g:a4bcae5ad4962e1c6f6337ae93e75aeddecc922e commit a4bcae5ad4962e1c6f6337ae93e75aeddecc922e Author: Pan Li Date: Sun Jun 8 16:53:05 2025 +0800 RISC-V: Add test for vec_duplicate + vrem.vv combine case 0 with GR2VR cost 0, 2 and 15 Add asm dump check test for vec_duplicate + vrem.vv combine to vrem.vx, with the GR2VR cost is 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check for vrem.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test data for run test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vrem-run-1-i16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vrem-run-1-i32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vrem-run-1-i64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vrem-run-1-i8.c: New test. 
Signed-off-by: Pan Li (cherry picked from commit daee1935f4e366c09fc085905cb49bbf264c5663) Diff: --- .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c | 2 + .../gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c | 2 + .../riscv/rvv/autovec/vx_vf/vx_binary_data.h | 196 + .../riscv/rvv/autovec/vx_vf/vx_vrem-run-1-i16.c| 15 ++ .../riscv/rvv/autovec/vx_vf/vx_vrem-run-1-i32.c| 15 ++ .../riscv/rvv/autovec/vx_vf/vx_vrem-run-1-i64.c| 15 ++ .../riscv/rvv/autovec/vx_vf/vx_vrem-run-1-i8.c | 15 ++ 17 files changed, 280 insertions(+) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c index d88e76b5d99c..893d910538ca 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c @@ -13,6 +13,7 @@ DEF_VX_BINARY_CASE_0_WRAP(T, |, or) DEF_VX_BINARY_CASE_0_WRAP(T, ^, xor) DEF_VX_BINARY_CASE_0_WRAP(T, *, mul) DEF_VX_BINARY_CASE_0_WRAP(T, /, div) +DEF_VX_BINARY_CASE_0_WRAP(T, %, rem) /* { dg-final { scan-assembler-times {vadd.vx} 1 } } */ /* { dg-final { scan-assembler-times {vsub.vx} 1 } } */ @@ -22,3 +23,4 @@ DEF_VX_BINARY_CASE_0_WRAP(T, /, div) /* { dg-final { scan-assembler-times {vxor.vx} 1 } } */ /* { dg-final { scan-assembler-times {vmul.vx} 1 } } */ /* { dg-final { scan-assembler-times {vdiv.vx} 1 } } */ +/* { dg-final { scan-assembler-times {vrem.vx} 1 } } */ diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c index 53189c21d041..26170de40d0c 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c @@ -13,6 +13,7 @@ DEF_VX_BINARY_CASE_0_WRAP(T, |, or) DEF_VX_BINARY_CASE_0_WRAP(T, ^, xor) DEF_VX_BINARY_CASE_0_WRAP(T, *, mul) DEF_VX_BINARY_CASE_0_WRAP(T, /, div) +DEF_VX_BINARY_CASE_0_WRAP(T, %, rem) /* { dg-final { scan-assembler-times {vadd.vx} 1 } } */ /* { dg-final { scan-assembler-times {vsub.vx} 1 } } */ @@ -22,3 +23,4 @@ DEF_VX_BINARY_CASE_0_WRAP(T, /, div) /* { dg-final { scan-assembler-times {vxor.vx} 1 } } */ /* { dg-final { scan-assembler-times {vmul.vx} 1 } } */ /* { dg-final { scan-assembler-times {vdiv.vx} 1 } } */ +/* { dg-final { scan-assembler-times {vrem.vx} 1 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c index 5059beb4c6de..04d1fcb5f81f 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c +++ b/gcc/testsuite/gcc.target/ris
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Add test for vec_duplicate + vremu.vv combine case 1 with GR2VR cost 0, 1 and 2
https://gcc.gnu.org/g:64b8deea76e40d16bbcb66ed7204f72cfe4d9669 commit 64b8deea76e40d16bbcb66ed7204f72cfe4d9669 Author: Pan Li Date: Mon Jun 9 16:35:47 2025 +0800 RISC-V: Add test for vec_duplicate + vremu.vv combine case 1 with GR2VR cost 0, 1 and 2 Add asm dump check test for vec_duplicate + vremu.vv combine to vremu.vx, with the GR2VR cost is 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vremu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto. 
Signed-off-by: Pan Li (cherry picked from commit bcabb6b0c707271b86a59be755f295ab7c125df1) Diff: --- gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c | 2 ++ 12 files changed, 24 insertions(+) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c index 58e4a1e96d6c..16ccaea251b8 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c @@ -12,6 +12,7 @@ DEF_VX_BINARY_CASE_1_WRAP(T, &, and, VX_BINARY_BODY_X16) DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY_X16) DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY_X16) DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY_X16) +DEF_VX_BINARY_CASE_1_WRAP(T, %, rem, VX_BINARY_BODY_X16) /* { dg-final { scan-assembler {vadd.vx} } } */ /* { dg-final { scan-assembler {vsub.vx} } } */ @@ -20,3 +21,4 @@ DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY_X16) /* { dg-final { scan-assembler {vor.vx} } } */ /* { dg-final { scan-assembler {vxor.vx} } } */ /* { dg-final { scan-assembler {vdivu.vx} } } */ +/* { dg-final { scan-assembler {vremu.vx} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c index 3d5f53568dbe..0e2ab8d7838d 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c @@ -12,6 +12,7 @@ DEF_VX_BINARY_CASE_1_WRAP(T, &, and, VX_BINARY_BODY_X4) DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY_X4) DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY_X4) DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY_X4) +DEF_VX_BINARY_CASE_1_WRAP(T, %, rem, VX_BINARY_BODY_X4) /* { dg-final { scan-assembler {vadd.vx} } } */ /* { dg-final { scan-assembler {vsub.vx} } } */ @@ -20,3 +21,4 @@ DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY_X4) /* { dg-final { scan-assembler {vor.vx} } } */ /* { dg-final { scan-assembler {vxor.vx} } } */ /* { dg-final { scan-assembler {vdivu.vx} } } */ +/* { dg-final { scan-assembler {vremu.vx} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c index 0edb9257a7a7..80eb8a4752e3 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c @@ -12,6 +12,7 @@ DEF_VX_BINARY_CASE_1_WRAP(T, &, and, VX_BINARY_BODY) DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY) DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY) DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY) +DEF_VX_BINARY_CASE_1_WRAP(T, %, rem, VX_BINARY_BODY) /* { dg-final { scan-assembler {vadd.vx} } } */ /* { dg-final { scan-assembler {vsub.vx} } } */ @@ -20,3 +21,4 @@ DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY) /* { dg-final { scan-as
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Add test for vec_duplicate + vrem.vv combine case 1 with GR2VR cost 0, 1 and 2
https://gcc.gnu.org/g:725444b6f84ccd8bf5c6864dbf29582ccead900f commit 725444b6f84ccd8bf5c6864dbf29582ccead900f Author: Pan Li Date: Sun Jun 8 16:55:34 2025 +0800 RISC-V: Add test for vec_duplicate + vrem.vv combine case 1 with GR2VR cost 0, 1 and 2 Add asm dump check tests for the vec_duplicate + vrem.vv combine to vrem.vx, when the GR2VR cost is 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check for vrem.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto. * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto. 
Signed-off-by: Pan Li (cherry picked from commit 8d745f6d70172132a594dcc650a6d489e7246eda) Diff: --- gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c | 2 ++ gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c | 2 ++ 12 files changed, 24 insertions(+) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c index 1e409dea08b7..b35e4b712f08 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c @@ -13,6 +13,7 @@ DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY_X16) DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY_X16) DEF_VX_BINARY_CASE_1_WRAP(T, *, mul, VX_BINARY_BODY_X16) DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY_X16) +DEF_VX_BINARY_CASE_1_WRAP(T, %, rem, VX_BINARY_BODY_X16) /* { dg-final { scan-assembler {vadd.vx} } } */ /* { dg-final { scan-assembler {vsub.vx} } } */ @@ -22,3 +23,4 @@ DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY_X16) /* { dg-final { scan-assembler {vxor.vx} } } */ /* { dg-final { scan-assembler {vmul.vx} } } */ /* { dg-final { scan-assembler {vdiv.vx} } } */ +/* { dg-final { scan-assembler {vrem.vx} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c index 2f242c73717e..fb01a6ab92d9 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c @@ -13,6 +13,7 @@ DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY_X4) DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY_X4) DEF_VX_BINARY_CASE_1_WRAP(T, *, mul, VX_BINARY_BODY_X4) DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY_X4) +DEF_VX_BINARY_CASE_1_WRAP(T, %, rem, VX_BINARY_BODY_X4) /* { dg-final { scan-assembler {vadd.vx} } } */ /* { dg-final { scan-assembler {vsub.vx} } } */ @@ -22,3 +23,4 @@ DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY_X4) /* { dg-final { scan-assembler {vxor.vx} } } */ /* { dg-final { scan-assembler {vmul.vx} } } */ /* { dg-final { scan-assembler {vdiv.vx} } } */ +/* { dg-final { scan-assembler {vrem.vx} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c index f027bd8129e4..d9341d6b4d24 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c @@ -13,6 +13,7 @@ DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY) DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY) DEF_VX_BINARY_CASE_1_WRAP(T, *, mul, VX_BINARY_BODY) DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY) +DEF_VX_BINARY_CASE_1_WRAP(T, %, rem, VX_BINARY_BODY) /* { dg-final { scan-assembler {vadd.vx} } } */ /* { dg-final { scan-assembler {vsub.vx} } } */ @@ -22,3 +23,4 @@ DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY) /* { dg-final { scan-assemble
[gcc(refs/vendors/riscv/heads/gcc-15-with-riscv-opts)] RISC-V: Combine vec_duplicate + vrem.vv to vrem.vx on GR2VR cost
https://gcc.gnu.org/g:852bbabde714134af1940772450e059d26bc5b65 commit 852bbabde714134af1940772450e059d26bc5b65 Author: Pan Li Date: Sun Jun 8 16:48:33 2025 +0800

RISC-V: Combine vec_duplicate + vrem.vv to vrem.vx on GR2VR cost

This patch combines vec_duplicate + vrem.vv into vrem.vx, as in the example below. The related pattern depends on the cost of vec_duplicate from GR2VR: late-combine performs the combination if the GR2VR cost is zero, and rejects it if the GR2VR cost is greater than zero.

Assume we have the example code below, with a GR2VR cost of 0.

  #define DEF_VX_BINARY(T, OP)                                        \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = in[i] OP x;                                            \
  }

  DEF_VX_BINARY(int32_t, %)

Before this patch:

  test_vx_binary_or_int32_t_case_0:
    beq     a3,zero,.L8
    vsetvli a5,zero,e32,m1,ta,ma
    vmv.v.x v2,a2
    slli    a3,a3,32
    srli    a3,a3,32
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli    a4,a5,2
    sub     a3,a3,a5
    add     a1,a1,a4
    vrem.vv v1,v1,v2
    vse32.v v1,0(a0)
    add     a0,a0,a4
    bne     a3,zero,.L3

After this patch:

  test_vx_binary_or_int32_t_case_0:
    beq     a3,zero,.L8
    slli    a3,a3,32
    srli    a3,a3,32
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli    a4,a5,2
    sub     a3,a3,a5
    add     a1,a1,a4
    vrem.vx v1,v1,a2
    vse32.v v1,0(a0)
    add     a0,a0,a4
    bne     a3,zero,.L3

gcc/ChangeLog:

  * config/riscv/riscv-v.cc (expand_vx_binary_vec_vec_dup): Add new case MOD.
  * config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
  * config/riscv/vector-iterators.md: Add new op mod.
Signed-off-by: Pan Li (cherry picked from commit b96e319dbd19328a2243b2950155be57532c213b) Diff: --- gcc/config/riscv/riscv-v.cc | 1 + gcc/config/riscv/riscv.cc| 1 + gcc/config/riscv/vector-iterators.md | 2 +- 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 6a7eb7161b37..c31ec9e9b419 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -5569,6 +5569,7 @@ expand_vx_binary_vec_vec_dup (rtx op_0, rtx op_1, rtx op_2, case MULT: case DIV: case UDIV: +case MOD: icode = code_for_pred_scalar (code, mode); break; default: diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 3fd18c1646dc..f98072cca7ce 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -3949,6 +3949,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN { case DIV: case UDIV: + case MOD: *total = get_vector_binary_rtx_cost (op, scalar2vr_cost); break; default: diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index 36301b0be6e7..b1fd607320ef 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -4042,7 +4042,7 @@ ]) (define_code_iterator any_int_binop_no_shift_v_vdup [ - plus minus and ior xor mult div udiv + plus minus and ior xor mult div udiv mod ]) (define_code_iterator any_int_binop_no_shift_vdup_v [
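For readers who want to try the loop shape outside the testsuite, here is a minimal sketch of the macro from the commit message instantiated for a signed remainder. The extra NAME parameter and the function name test_vx_binary_rem_i32 are illustrative additions, not part of the patch or of the vx_vf tests. Whether vrem.vx is actually emitted depends on the target options and the GR2VR cost, which this sketch does not exercise; it only pins down the scalar semantics the vector code must preserve.

```c
#include <stdint.h>

/* Same loop shape as DEF_VX_BINARY in the commit message.  The NAME
   parameter is a hypothetical addition so that each instantiation gets
   its own function name.  */
#define DEF_VX_BINARY(T, OP, NAME)                                    \
  void                                                                \
  test_vx_binary_##NAME (T *restrict out, T *restrict in, T x,        \
                         unsigned n)                                  \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = in[i] OP x;                                            \
  }

/* Vector operand in[i], scalar operand x: the vec_duplicate + vrem.vv
   candidate described above.  */
DEF_VX_BINARY (int32_t, %, rem_i32)
```

Calling test_vx_binary_rem_i32 with x = 4 on the inputs 7, -7, 10, 3 should give 3, -3, 2, 3, since C's % truncates toward zero.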
[gcc r16-1392] [RISC-V] Fix ICE due to splitter emitting constant loads directly
https://gcc.gnu.org/g:b93d8873cda88f0892c7782b274904fa8d3751fb commit r16-1392-gb93d8873cda88f0892c7782b274904fa8d3751fb Author: Jeff Law Date: Tue Jun 10 06:38:52 2025 -0600 [RISC-V] Fix ICE due to splitter emitting constant loads directly This is a fix for a bug found internally in Ventana using the cf3 testsuite. cf3 looks to be dead as a project and likely subsumed by modern fuzzers. In fact internally we tripped another issue with cf3 that had already been reported by Edwin with the fuzzer he runs. Anyway, the splitter in question blindly emits the 2nd adjusted constant into a register. That's not valid if the constant requires any kind of synthesis -- and it well could, since we're mostly focused on the first constant turning into something that can be loaded via LUI without increasing the cost of the second constant. Instead of using the split RTL template, this just emits the code we want directly, using riscv_emit_move to synthesize the constant into the provided temporary register. Tested on my system. Waiting on upstream CI's verdict before moving forward. gcc/ * config/riscv/riscv.md (lui-constraintand_to_or): Do not use the RTL template for split code. Emit it directly taking care to avoid emitting a constant load that needed synthesis. Fix formatting. gcc/testsuite/ * gcc.target/riscv/ventana-16122.c: New test. 
Diff: --- gcc/config/riscv/riscv.md | 18 +- gcc/testsuite/gcc.target/riscv/ventana-16122.c | 19 +++ 2 files changed, 32 insertions(+), 5 deletions(-) diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 6d3c80a04c74..3aed25c25880 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -884,7 +884,7 @@ ;; Where C1 is not a LUI operand, but ~C1 is a LUI operand (define_insn_and_split "*lui_constraint_and_to_or" - [(set (match_operand:X 0 "register_operand" "=r") + [(set (match_operand:X 0 "register_operand" "=r") (plus:X (and:X (match_operand:X 1 "register_operand" "r") (match_operand 2 "const_int_operand")) (match_operand 3 "const_int_operand"))) @@ -898,13 +898,21 @@ <= riscv_const_insns (operands[3], false)))" "#" "&& reload_completed" - [(set (match_dup 4) (match_dup 5)) - (set (match_dup 0) (ior:X (match_dup 1) (match_dup 4))) - (set (match_dup 4) (match_dup 6)) - (set (match_dup 0) (minus:X (match_dup 0) (match_dup 4)))] + [(const_int 0)] { operands[5] = GEN_INT (~INTVAL (operands[2])); operands[6] = GEN_INT ((~INTVAL (operands[2])) | (-INTVAL (operands[3]))); + +/* This is always a LUI operand, so it's safe to just emit. */ +emit_move_insn (operands[4], operands[5]); + +rtx x = gen_rtx_IOR (word_mode, operands[1], operands[4]); +emit_move_insn (operands[0], x); + +/* This may require multiple steps to synthesize. 
*/ +riscv_emit_move (operands[4], operands[6]); +x = gen_rtx_MINUS (word_mode, operands[0], operands[4]); +emit_move_insn (operands[0], x); } [(set_attr "type" "arith")]) diff --git a/gcc/testsuite/gcc.target/riscv/ventana-16122.c b/gcc/testsuite/gcc.target/riscv/ventana-16122.c new file mode 100644 index ..59e6467b57c0 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/ventana-16122.c @@ -0,0 +1,19 @@ +/* { dg-do compile { target { rv64 } } } */ + +extern void NG (void); +typedef signed char int8_t; +typedef signed short int16_t; +typedef signed int int32_t; +void f74(void) { + int16_t x309 = 0x7fff; + volatile int32_t x310 = 0x7fff; + int8_t x311 = 59; + int16_t x312 = -0x8000; + static volatile int32_t t74 = 614992577; + +t74 = (x309==((x310^x311)%x312)); + +if (t74 != 0) { NG(); } else { ; } + +} +
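The arithmetic behind the split can be checked in isolation. The sketch below models the matched form (x & C1) + C3 and the emitted form (x | ~C1) - (~C1 | -C3), using operands[5] and operands[6] as computed in the splitter body. The function names are made up for illustration, and the precondition is derived here from the arithmetic (~C1 and -C3 must share no set bits); it is not quoted from the pattern's condition, which this excerpt shows only in part. Unsigned 64-bit arithmetic stands in for word_mode on rv64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form matched by the pattern: (x & c1) + c3.  */
uint64_t
and_then_plus (uint64_t x, uint64_t c1, uint64_t c3)
{
  return (x & c1) + c3;
}

/* Form emitted by the split: IOR with ~c1, then subtract the adjusted
   constant.  k5 and k6 mirror operands[5] and operands[6].  */
uint64_t
ior_then_minus (uint64_t x, uint64_t c1, uint64_t c3)
{
  uint64_t k5 = ~c1;              /* loadable with a single LUI */
  uint64_t k6 = ~c1 | (0 - c3);   /* may need synthesis */
  return (x | k5) - k6;
}

/* Assumed precondition: ~c1 and -c3 have no bits in common, so that
   ~c1 | -c3 equals ~c1 + -c3.  */
bool
split_precondition (uint64_t c1, uint64_t c3)
{
  return (~c1 & (0 - c3)) == 0;
}
```

Under that precondition, x | ~C1 equals (x & C1) + ~C1 because the two addends have disjoint bits, and subtracting ~C1 + -C3 leaves (x & C1) + C3. The second constant is the one that is not guaranteed to be a single-instruction load, which is why the fix routes it through riscv_emit_move.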
[gcc r12-11132] tree-sra: Do not create stores into const aggregates (PR111873)
https://gcc.gnu.org/g:16d6a270b11a00d30966d42d9bc086e5873b5632 commit r12-11132-g16d6a270b11a00d30966d42d9bc086e5873b5632 Author: Martin Jambor Date: Wed May 14 12:08:24 2025 +0200 tree-sra: Do not create stores into const aggregates (PR111873) This patch fixes (hopefully) the one remaining place where gimple SRA was still creating a store into const aggregates. It occurs when there is a replacement for a load but that replacement is not type-compatible, typically because it is a single-field structure. I have used testcases from duplicates because the original testcase no longer reproduces for me. gcc/ChangeLog: 2025-05-13 Martin Jambor PR tree-optimization/111873 * tree-sra.cc (sra_modify_expr): When processing a load which has a type-incompatible replacement, do not store the contents of the replacement into the original aggregate when that aggregate is const. gcc/testsuite/ChangeLog: 2025-05-13 Martin Jambor * gcc.dg/ipa/pr120044-1.c: New test. * gcc.dg/ipa/pr120044-2.c: Likewise. * gcc.dg/tree-ssa/pr114864.c: Likewise. 
(cherry picked from commit 9d039eff453f777c58642ff16178c1ce2a4be6ab) Diff: --- gcc/testsuite/gcc.dg/ipa/pr120044-1.c| 17 + gcc/testsuite/gcc.dg/ipa/pr120044-2.c| 17 + gcc/testsuite/gcc.dg/tree-ssa/pr114864.c | 15 +++ gcc/tree-sra.cc | 4 +++- 4 files changed, 52 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/ipa/pr120044-1.c b/gcc/testsuite/gcc.dg/ipa/pr120044-1.c new file mode 100644 index ..f9fee3e85afb --- /dev/null +++ b/gcc/testsuite/gcc.dg/ipa/pr120044-1.c @@ -0,0 +1,17 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -fno-early-inlining -fno-tree-fre -fno-tree-pre -fno-code-hoisting -fno-inline" } */ + +struct a { + int b; +} const c; +void d(char p, struct a e) { + while (e.b) +; +} +static unsigned short f(const struct a g) { + d(g.b, g); + return g.b; +} +int main() { + return f(c); +} diff --git a/gcc/testsuite/gcc.dg/ipa/pr120044-2.c b/gcc/testsuite/gcc.dg/ipa/pr120044-2.c new file mode 100644 index ..5130791f5444 --- /dev/null +++ b/gcc/testsuite/gcc.dg/ipa/pr120044-2.c @@ -0,0 +1,17 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -fno-early-inlining -fno-tree-fre -fno-tree-pre -fno-code-hoisting -fno-ipa-cp" } */ + +struct a { + int b; +} const c; +void d(char p, struct a e) { + while (e.b) +; +} +static unsigned short f(const struct a g) { + d(g.b, g); + return g.b; +} +int main() { + return f(c); +} diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr114864.c b/gcc/testsuite/gcc.dg/tree-ssa/pr114864.c new file mode 100644 index ..cd9b94c094fc --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr114864.c @@ -0,0 +1,15 @@ +/* { dg-do run } */ +/* { dg-options "-O1 -fno-tree-dce -fno-tree-fre" } */ + +struct a { + int b; +} const c; +void d(const struct a f) {} +void e(const struct a f) { + f.b == 0 ? 
1 : f.b; + d(f); +} +int main() { + e(c); + return 0; +} diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc index 91af2aef8b4c..4a7836bc257b 100644 --- a/gcc/tree-sra.cc +++ b/gcc/tree-sra.cc @@ -3871,8 +3871,10 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write) } else { - gassign *stmt; + if (TREE_READONLY (access->base)) + return false; + gassign *stmt; if (access->grp_partial_lhs) repl = force_gimple_operand_gsi (gsi, repl, true, NULL_TREE, true, GSI_SAME_STMT);
[gcc r16-1400] expand: Use less costly from sign and zero extensions for values where value range says they don't h
https://gcc.gnu.org/g:2e4688a7202d73baeb4de18ca4591e6b0985f4a4 commit r16-1400-g2e4688a7202d73baeb4de18ca4591e6b0985f4a4 Author: Jakub Jelinek Date: Tue Jun 10 20:06:14 2025 +0200 expand: Use less costly from sign and zero extensions for values where value range says they don't have MSB set [PR120434] On top of the just-posted patch, the following patch attempts to use the value range to see whether the MSB is known to be clear; for a scalar integral extension in that case it tries to expand both the sign and the zero extension and chooses the cheaper one based on RTX costs (if the costs are the same, it uses what it used before, based on TYPE_UNSIGNED (TREE_TYPE (treeop0))). The patch regresses the gcc.target/i386/pr78103-3.c test; I will post a separate patch for that momentarily (with the intent that if all 3 patches are approved, I'll commit the PR78103 related one before this one). 2025-06-10 Jakub Jelinek PR middle-end/120434 * expr.cc (expand_expr_real_2) : If get_range_pos_neg at -O2 for scalar integer extension suggests the most significant bit of op0 is not set, try both unsigned and signed conversion and choose the cheaper one. If both are the same cost, choose one based on TYPE_UNSIGNED (TREE_TYPE (treeop0)). * gcc.target/i386/pr120434-2.c: New test. Diff: --- gcc/expr.cc| 64 +++--- gcc/testsuite/gcc.target/i386/pr120434-2.c | 15 +++ 2 files changed, 74 insertions(+), 5 deletions(-) diff --git a/gcc/expr.cc b/gcc/expr.cc index 08a58fb5564e..ac4fdfaa2181 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -9879,14 +9879,68 @@ expand_expr_real_2 (const_sepops ops, rtx target, machine_mode tmode, op0 = gen_rtx_fmt_e (TYPE_UNSIGNED (TREE_TYPE (treeop0)) ? 
ZERO_EXTEND : SIGN_EXTEND, mode, op0); + else if (SCALAR_INT_MODE_P (GET_MODE (op0)) + && optimize >= 2 + && SCALAR_INT_MODE_P (mode) + && (GET_MODE_SIZE (as_a (mode)) + > GET_MODE_SIZE (as_a (GET_MODE (op0 + && get_range_pos_neg (treeop0, +currently_expanding_gimple_stmt) == 1) + { + /* If argument is known to be positive when interpreted +as signed, we can expand it as both sign and zero +extension. Choose the cheaper sequence in that case. */ + bool speed_p = optimize_insn_for_speed_p (); + rtx uns_ret = NULL_RTX, sgn_ret = NULL_RTX; + do_pending_stack_adjust (); + start_sequence (); + if (target == NULL_RTX) + uns_ret = convert_to_mode (mode, op0, 1); + else + convert_move (target, op0, 1); + rtx_insn *uns_insns = end_sequence (); + start_sequence (); + if (target == NULL_RTX) + sgn_ret = convert_to_mode (mode, op0, 0); + else + convert_move (target, op0, 0); + rtx_insn *sgn_insns = end_sequence (); + unsigned uns_cost = seq_cost (uns_insns, speed_p); + unsigned sgn_cost = seq_cost (sgn_insns, speed_p); + bool was_tie = false; + + /* If costs are the same then use as tie breaker the other other +factor. */ + if (uns_cost == sgn_cost) + { + uns_cost = seq_cost (uns_insns, !speed_p); + sgn_cost = seq_cost (sgn_insns, !speed_p); + was_tie = true; + } + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, ";; positive extension:%s unsigned cost: %u; " + "signed cost: %u\n", +was_tie ? 
" (needed tie breaker)" : "", +uns_cost, sgn_cost); + if (uns_cost < sgn_cost + || (uns_cost == sgn_cost && TYPE_UNSIGNED (TREE_TYPE (treeop0 + { + emit_insn (uns_insns); + sgn_ret = uns_ret; + } + else + emit_insn (sgn_insns); + if (target == NULL_RTX) + op0 = sgn_ret; + else + op0 = target; + } else if (target == 0) - op0 = convert_to_mode (mode, op0, - TYPE_UNSIGNED (TREE_TYPE - (treeop0))); + op0 = convert_to_mode (mode, op0, TYPE_UNSIGNED (TREE_TYPE (treeop0))); else { - convert_move (target, op0, - TYPE_UNSIGNED (TREE_TYPE (treeop0))); + convert_move (target, op0, TYPE_UNSIGNED (TREE_TYPE (treeop0))); op0 = target; } diff --git a/gcc/testsuite/gcc.target/i386/pr120434-2.c b/gcc/testsuite/gcc.target/i386/pr120434-2.c new file mode 100644 index ..4381e3b3bfdc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr120434-2.c @@ -0,0 +1,15 @@ +/* PR middle-end
[gcc r16-1398] expand, ranger: Use ranger during expansion [PR120434]
https://gcc.gnu.org/g:8154fc95f097a146f9c80edcaafb2baff73065b5 commit r16-1398-g8154fc95f097a146f9c80edcaafb2baff73065b5 Author: Jakub Jelinek Date: Tue Jun 10 20:04:52 2025 +0200 expand, ranger: Use ranger during expansion [PR120434] As the following testcase shows, during expansion we use value range info in lots of places, but sadly currently use only the global ranges. It is mostly through get_range_pos_neg function, which uses get_global_range_query ()->range_of_expr (arg1, arg2) but other spots use it directly. On the testcase at the end of the patch, in foo we don't know range of x, so emit the at least on x86_64 less efficient signed division in that case. In bar, the default def SSA_NAME has global range and we try to expand the division both as signed and unsigned because the range proves they will have the same result and choose the cheaper one. And finally in baz, we have VARYING in global range, but can do better if we ask for range at the statement we're expanding. The main problem of using the ranger during expansion is that things are in flux, the already expanded basic blocks switch their IL from gimple to RTL (bb->flags & BB_RTL) and the gimple stmts are gone, PHI nodes even earlier, etc. The patch attempts to make the ranger usable by keeping (bb->flags & BB_RTL) == 0 on basic blocks for longer, in particular until the last expand_gimple_basic_block call for the function. Instead of changing the IL right away, it uses a vector indexed by bb->index to hold the future BB_HEAD/BB_END. I had to do a few changes on the ranger side and maybe testing in the wild will show a few extra cases, but I think those are tolerable and can be guarded with currently_expanding_to_rtl so that we don't punt on consistency checks on normal GIMPLE. In particular, even with the patch there will still be some BB_RTL bbs in the IL, e.g. 
the initial block after ENTRY, ENTRY and EXIT blocks and from time to time others as well, but those should never contain anything interesting from the ranger POV. And switch expansion can drop the default edge if it is __builtin_unreachable. Also, I had to change the internal call TER expansion; the current way of temporarily changing gimple_call_lhs ICEd badly in the ranger, so I'm instead temporarily changing SSA_NAME_VAR of the SSA_NAME. 2025-06-10 Jakub Jelinek PR middle-end/120434 * cfgrtl.h (update_bb_for_insn_chain): Declare. * cfgrtl.cc (update_bb_for_insn_chain): No longer static. * cfgexpand.h (expand_remove_edge): Declare. * cfgexpand.cc: Include "gimple-range.h". (head_end_for_bb): New variable. (label_rtx_for_bb): Drop ATTRIBUTE_UNUSED from bb argument. Use head_end_for_bb if possible for non-BB_RTL bbs. (expand_remove_edge): New function. (maybe_cleanup_end_of_block): Use it instead of remove_edge. (expand_gimple_cond): Don't clear EDGE_TRUE_VALUE and EDGE_FALSE_VALUE just yet. Use head_end_for_bb elts instead of BB_END and update_bb_for_insn_chain instead of update_bb_for_insn. (expand_gimple_tailcall): Use expand_remove_edge instead of remove_edge. Use head_end_for_bb elts instead of BB_END and update_bb_for_insn_chain instead of update_bb_for_insn. (expand_gimple_basic_block): Don't change bb to BB_RTL here, instead use head_end_for_bb elts instead of BB_HEAD and BB_END. Use update_bb_for_insn_chain instead of update_bb_for_insn. (pass_expand::execute): Enable ranger before expand_gimple_basic_block calls and create head_end_for_bb vector. Disable ranger after those calls, turn still non-BB_RTL blocks into BB_RTL and set their BB_HEAD and BB_END from head_end_for_bb elts, and clear EDGE_TRUE_VALUE and EDGE_FALSE_VALUE flags on edges. Release head_end_for_bb vector. * tree-outof-ssa.cc (expand_phi_nodes): Don't clear phi nodes here. * tree.h (get_range_pos_neg): Add gimple * argument defaulted to NULL. 
* tree.cc (get_range_pos_neg): Add stmt argument. Use get_range_query (cfun) instead of get_global_range_query () and pass stmt as third argument to range_of_expr. * expr.cc (expand_expr_divmod): Pass currently_expanding_gimple_stmt to get_range_pos_neg. (expand_expr_real_1) : Change internal fn handling to avoid temporarily overwriting gimple_call_lhs of ifn, instead temporarily overwrite SSA_NAME_VAR of its lhs. (maybe_optimize_pow2p_mod_cmp): Pass currently_expanding_gimple_stmt to get_range_pos_neg. (maybe_optimize_mod_cmp): Li
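The three situations the commit message describes for foo, bar and baz can be illustrated with a small sketch. The bodies below are hypothetical stand-ins written for this note, not the functions from the patch's testcase (which is not shown in this excerpt). Using __builtin_unreachable to give a parameter a function-wide range is an assumption about one way such a global range can arise.

```c
/* Hypothetical stand-ins for the three cases described above; these are
   not the functions from the patch's testcase.  */

/* Nothing is known about x, so only a signed division is correct.  */
int
foo (int x)
{
  return x / 200;
}

/* x is declared non-negative on entry (an assumed way to give the
   parameter a function-wide range), so signed and unsigned division
   agree everywhere in the function.  */
int
bar (int x)
{
  if (x < 0)
    __builtin_unreachable ();
  return x / 200;
}

/* x is unconstrained over the whole function, but it is known to be
   non-negative at the division statement itself.  */
int
baz (int x)
{
  if (x < 0)
    return 24;
  return x / 200;
}
```

In the baz shape, a query that only consults global ranges sees x as VARYING, while a query made at the division statement can see that x is non-negative. That is the difference passing currently_expanding_gimple_stmt to get_range_pos_neg is meant to expose.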
[gcc r16-1399] i386: Handle ZERO_EXTEND like SIGN_EXTEND in bsr patterns [PR120434]
https://gcc.gnu.org/g:54da199f28da07166a44eae7d53acb9e3abe1306 commit r16-1399-g54da199f28da07166a44eae7d53acb9e3abe1306 Author: Jakub Jelinek Date: Tue Jun 10 20:07:06 2025 +0200 i386: Handle ZERO_EXTEND like SIGN_EXTEND in bsr patterns [PR120434] The just posted second PR120434 patch causes +FAIL: gcc.target/i386/pr78103-3.c scan-assembler m(leaq|addq|incq)M +FAIL: gcc.target/i386/pr78103-3.c scan-assembler-not mmovlM+ +FAIL: gcc.target/i386/pr78103-3.c scan-assembler-not msubqM +FAIL: gcc.target/i386/pr78103-3.c scan-assembler-not mxor[lq]M While the patch generally improves code generation by often using ZERO_EXTEND instead of SIGN_EXTEND, where the former is often for free on x86_64 while the latter requires an extra instruction or larger instruction than one with just zero extend, the PR78103 combine patterns and splitters were written only with SIGN_EXTEND in mind. As CLZ is UB on 0 and otherwise returns just [0,63] and is xored with 63, ZERO_EXTEND does the same thing there as SIGN_EXTEND. 2025-06-10 Jakub Jelinek PR middle-end/120434 * config/i386/i386.md (*bsr_rex64_2): Rename to ... (*bsr_rex64_2): ... this. Use any_extend instead of sign_extend. (*bsr_2): Rename to ... (*bsr_2): ... this. Use any_extend instead of sign_extend. (bsr splitters after those): Use any_extend instead of sign_extend. Diff: --- gcc/config/i386/i386.md | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 8eee44756eba..99f382497148 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -21512,11 +21512,12 @@ (set_attr "mode" "SI")]) ; As bsr is undefined behavior on zero and for other input -; values it is in range 0 to 63, we can optimize away sign-extends. -(define_insn_and_split "*bsr_rex64_2" +; values it is in range 0 to 63, we can optimize away sign-extends +; or zero-extends. 
+(define_insn_and_split "*bsr_rex64_2" [(set (match_operand:DI 0 "register_operand") (xor:DI - (sign_extend:DI + (any_extend:DI (minus:SI (const_int 63) (subreg:SI (clz:DI (match_operand:DI 1 "nonimmediate_operand")) @@ -21538,9 +21539,9 @@ operands[3] = lowpart_subreg (SImode, operands[2], DImode); }) -(define_insn_and_split "*bsr_2" +(define_insn_and_split "*bsr_2" [(set (match_operand:DI 0 "register_operand") - (sign_extend:DI + (any_extend:DI (xor:SI (minus:SI (const_int 31) @@ -21617,7 +21618,7 @@ (minus:DI (match_operand:DI 2 "const_int_operand") (xor:DI - (sign_extend:DI + (any_extend:DI (minus:SI (const_int 63) (subreg:SI (clz:DI (match_operand:DI 1 "nonimmediate_operand")) @@ -21647,7 +21648,7 @@ [(set (match_operand:DI 0 "register_operand") (minus:DI (match_operand:DI 2 "const_int_operand") - (sign_extend:DI + (any_extend:DI (xor:SI (minus:SI (const_int 31) (clz:SI (match_operand:SI 1 "nonimmediate_operand")))
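The claim that ZERO_EXTEND and SIGN_EXTEND are interchangeable in these patterns comes down to the value range of the inner expression. A minimal sketch, assuming GCC's __builtin_clzll and illustrative function names that do not appear in i386.md: for nonzero input, 63 ^ clz(x) lies in [0, 63], so the 32-bit intermediate never has its sign bit set and both widenings give the same 64-bit result.

```c
#include <stdint.h>

/* 63 ^ clz (x) for nonzero x: the index of the highest set bit, held in
   a 32-bit value.  The result is undefined for x == 0, as with CLZ in
   the commit message.  */
int32_t
bsr64 (uint64_t x)
{
  return 63 ^ __builtin_clzll (x);
}

/* Widen the 32-bit result the way SIGN_EXTEND would.  */
int64_t
bsr64_sign_extended (uint64_t x)
{
  return (int64_t) bsr64 (x);
}

/* Widen the 32-bit result the way ZERO_EXTEND would, by going through
   the unsigned 32-bit type first.  */
int64_t
bsr64_zero_extended (uint64_t x)
{
  return (int64_t) (uint64_t) (uint32_t) bsr64 (x);
}
```

The two widening helpers would only differ for a negative 32-bit intermediate, which the [0, 63] range rules out.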
[gcc r16-1393] gcn: Add experimental MI300 (gfx942) support
https://gcc.gnu.org/g:37b454b7e171bd8a792cbe4c57ea0f9702afa22d commit r16-1393-g37b454b7e171bd8a792cbe4c57ea0f9702afa22d Author: Tobias Burnus Date: Tue Jun 10 15:12:47 2025 +0200 gcn: Add experimental MI300 (gfx942) support As gfx942 and gfx950 belong to gfx9-4-generic, the latter two are also added. Note that there are no MI300-specific optimizations yet. No multilib is built by default for any of the mentioned devices; use '--with-multilib-list=' when configuring GCC to build them alongside. gfx942 was added in LLVM (and its mc assembler, used by GCC) in version 18, generic support in LLVM 19 and gfx950 in LLVM 20. gcc/ChangeLog: * config/gcn/gcn-devices.def: Add gfx942, gfx950 and gfx9-4-generic. * config/gcn/gcn-opts.h (TARGET_CDNA3, TARGET_CDNA3_PLUS, TARGET_GLC_NAME, TARGET_TARGET_SC_CACHE): Define. (TARGET_ARCHITECTED_FLAT_SCRATCH): Use also for CDNA3. * config/gcn/gcn.h (gcn_isa): Add ISA_CDNA3 to the enum. * config/gcn/gcn.cc (print_operand): Update 'g' to use TARGET_GLC_NAME; add 'G' to print TARGET_GLC_NAME unconditionally. * config/gcn/gcn-valu.md (scatter, gather): Use TARGET_GLC_NAME. * config/gcn/gcn.md: Use %G instead of glc; use 'buffer_inv sc1' for TARGET_TARGET_SC_CACHE. * doc/invoke.texi (march): Add gfx942, gfx950 and gfx9-4-generic. * doc/install.texi (amdgcn*-*-*): Add gfx942, gfx950 and gfx9-4-generic. * config/gcn/gcn-tables.opt: Regenerate. libgomp/ChangeLog: * testsuite/libgomp.c/declare-variant-4.h (gfx942): New variant function. * testsuite/libgomp.c/declare-variant-4-gfx942.c: New test. 
Diff: --- gcc/config/gcn/gcn-devices.def | 33 gcc/config/gcn/gcn-opts.h | 13 +- gcc/config/gcn/gcn-tables.opt | 9 ++ gcc/config/gcn/gcn-valu.md | 8 +- gcc/config/gcn/gcn.cc | 8 +- gcc/config/gcn/gcn.h | 2 + gcc/config/gcn/gcn.md | 168 + gcc/doc/install.texi | 17 ++- gcc/doc/invoke.texi| 10 ++ .../testsuite/libgomp.c/declare-variant-4-gfx942.c | 8 + libgomp/testsuite/libgomp.c/declare-variant-4.h| 8 + 11 files changed, 208 insertions(+), 76 deletions(-) diff --git a/gcc/config/gcn/gcn-devices.def b/gcc/config/gcn/gcn-devices.def index af1420382e2f..426acf0cb7a5 100644 --- a/gcc/config/gcn/gcn-devices.def +++ b/gcc/config/gcn/gcn-devices.def @@ -171,6 +171,28 @@ GCN_DEVICE(gfx90c, GFX90C, 0x32, ISA_GCN5, /* Generic Name */ GFX9_GENERIC ) +GCN_DEVICE(gfx942, GFX942, 0x4c, ISA_CDNA3, + /* XNACK default */ HSACO_ATTR_ANY, + /* SRAM_ECC default */ HSACO_ATTR_ANY, + /* WAVE64 mode */ HSACO_ATTR_UNSUPPORTED, + /* CU mode */ HSACO_ATTR_UNSUPPORTED, + /* Max ISA VGPRs */ 512, + /* Generic code obj version */ 0, /* non-generic */ + /* Architecture Family */ GFX9, + /* Generic Name */ NONE + ) + +GCN_DEVICE(gfx950, GFX950, 0x4f, ISA_CDNA3, + /* XNACK default */ HSACO_ATTR_ANY, + /* SRAM_ECC default */ HSACO_ATTR_ANY, + /* WAVE64 mode */ HSACO_ATTR_UNSUPPORTED, + /* CU mode */ HSACO_ATTR_UNSUPPORTED, + /* Max ISA VGPRs */ 512, + /* Generic code obj version */ 0, /* non-generic */ + /* Architecture Family */ GFX9, + /* Generic Name */ NONE + ) + GCN_DEVICE(gfx9-generic, GFX9_GENERIC, 0x051, ISA_GCN5, /* XNACK default */ HSACO_ATTR_ANY, /* SRAM_ECC default */ HSACO_ATTR_UNSUPPORTED, @@ -182,6 +204,17 @@ GCN_DEVICE(gfx9-generic, GFX9_GENERIC, 0x051, ISA_GCN5, /* Generic Name */ NONE ) +GCN_DEVICE(gfx9-4-generic, GFX9_4_GENERIC, 0x05f, ISA_CDNA3, + /* XNACK default */ HSACO_ATTR_ANY, + /* SRAM_ECC default */ HSACO_ATTR_UNSUPPORTED, + /* WAVE64 mode */ HSACO_ATTR_UNSUPPORTED, + /* CU mode */ HSACO_ATTR_UNSUPPORTED, + /* Max ISA VGPRs */ 256, + /* Generic code obj version */ 1, + /* 
Architecture Family */ GFX9, + /* Generic Name */ NONE + ) + /* GCN GFX10.3 (RDNA 2) */ GCN_DEVICE(gfx1030, GFX1030, 0x36, ISA_RDNA2, diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h index 88f562dfc1e1..bcea14f3fe7a 100644 --- a/gcc/config/gcn/gcn-opts.h +++ b/gcc/config/gcn/gcn-opts.h @@ -33,7 +33,8 @@ extern enum gcn_isa { ISA_RDNA2, ISA_RDNA3, ISA_CDNA1, - ISA_CDNA2 + ISA_CDNA2, + ISA_CDNA3 } gcn_isa; #define TARGET_GCN5 (gcn_isa == ISA_GCN5) @@ -41,6 +42,8 @@ extern enum gcn_isa { #define TARGET_CDNA1_PLUS (gcn_isa >= ISA_CDNA1)
[gcc r16-1389] ada: Remove redundant guard against attribute with no expressions
https://gcc.gnu.org/g:5dc946f78e6d0ba73fd33990b3a353f113ecdd64 commit r16-1389-g5dc946f78e6d0ba73fd33990b3a353f113ecdd64 Author: Piotr Trojanek Date: Wed Mar 26 18:42:10 2025 +0100 ada: Remove redundant guard against attribute with no expressions We intentionally allow First to work on No_List, so there is no need to guard against a No_List. Code cleanup; semantics is unaffected. gcc/ada/ChangeLog: * sem_attr.adb (Resolve_Attribute): Remove redundant guard. Diff: --- gcc/ada/sem_attr.adb | 1 - 1 file changed, 1 deletion(-) diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb index d4034d28da60..4f5047f7b974 100644 --- a/gcc/ada/sem_attr.adb +++ b/gcc/ada/sem_attr.adb @@ -13014,7 +13014,6 @@ package body Sem_Attr is -- their Entity attribute to reference their discriminal. if Expander_Active - and then Present (Expressions (N)) and then Attr_Id /= Attribute_Make then declare
[gcc r16-1376] ada: VAST: create treewalker
https://gcc.gnu.org/g:3f30c88d17742ec2dff131c470b4ee6483d455e7 commit r16-1376-g3f30c88d17742ec2dff131c470b4ee6483d455e7 Author: Bob Duff Date: Mon Mar 24 16:21:53 2025 -0400 ada: VAST: create treewalker Walks all trees (not just the main unit), deals with switches and flags. Doesn't check much of anything yet (asserts that "unused" nodes are not present). Move decisions (what tree(s) to check, what switches enable checking) from the caller to the body of VAST. gcc/ada/ChangeLog: * vast.adb: Initial implementation. * vast.ads: Rename procedure. Remove parameter; body should decide what to do. * lib.ads (ipu): Minor: Rewrite comment for brevity, and because of an inconvenient misspelling. (Num_Units): Not used; remove. (Remove_Unit): Minor: Remove "Currently" (which was current a decade ago) from comment. * lib.adb (Num_Units): Not used; remove. * debug_a.adb (Debug_A_Entry): Fix bug: Use Write_Name_For_Debug, so this won't crash on the Error node. * debug.adb: Document -gnatd_V and -gnatd_W compiler switches. * exp_ch6.adb (Validate_Subprogram_Calls): Remove redundant check for Serious_Errors_Detected. (We turn off code gen when errors are detected.) * frontend.adb: Move decisions into VAST body. * namet.ads (Present): Remove unnecessary overriding; these are inherited by the derived types. * namet.adb (Present): Likewise. Diff: --- gcc/ada/debug.adb| 11 +++-- gcc/ada/debug_a.adb | 7 +-- gcc/ada/exp_ch6.adb | 10 ++--- gcc/ada/frontend.adb | 4 +- gcc/ada/lib.adb | 9 gcc/ada/lib.ads | 13 ++ gcc/ada/namet.adb| 18 gcc/ada/namet.ads| 8 gcc/ada/vast.adb | 123 --- gcc/ada/vast.ads | 7 +-- 10 files changed, 139 insertions(+), 71 deletions(-) diff --git a/gcc/ada/debug.adb b/gcc/ada/debug.adb index 3a39ec89c40f..f250d7429a96 100644 --- a/gcc/ada/debug.adb +++ b/gcc/ada/debug.adb @@ -186,8 +186,8 @@ package body Debug is -- d_S -- d_T Output trace information on invocation path recording -- d_U Disable prepending messages with "error:". 
- -- d_V Enable verifications on the expanded tree - -- d_W + -- d_V Enable VAST (verifications on the expanded tree) + -- d_W Enable VAST in verbose mode -- d_X Disable assertions to check matching of extra formals -- d_Y -- d_Z @@ -1065,8 +1065,11 @@ package body Debug is -- d_U Disable prepending 'error:' to error messages. This used to be the -- default and can be seen as the opposite of -gnatU. - -- d_V Enable verification of the expanded code before calling the backend - -- and generate error messages on each inconsistency found. + -- d_V Enable VAST (Verifier for the Ada Semantic Tree). This does + -- verification of the expanded code before calling the backend. + + -- d_W Same as d_V, but also prints lots of tracing/debugging output + -- as it walks the tree. -- d_X Disable assertions to check matching of extra formals; switch added -- temporarily to disable these checks until this work is complete if diff --git a/gcc/ada/debug_a.adb b/gcc/ada/debug_a.adb index d36ae696af64..8d68fc8eff7d 100644 --- a/gcc/ada/debug_a.adb +++ b/gcc/ada/debug_a.adb @@ -83,11 +83,8 @@ package body Debug_A is case Nkind (N) is when N_Has_Chars => - Write_Str (" """); - if Present (Chars (N)) then - Write_Str (Get_Name_String (Chars (N))); - end if; - Write_Str (); + Write_Str (" "); + Write_Name_For_Debug (Chars (N)); when others => null; end case; diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb index 3a45b1c59340..2a246adbb8a3 100644 --- a/gcc/ada/exp_ch6.adb +++ b/gcc/ada/exp_ch6.adb @@ -9938,15 +9938,15 @@ package body Exp_Ch6 is -- Start of processing for Validate_Subprogram_Calls begin - -- No action required if we are not generating code or compiling sources - -- that have errors. + -- No action if we are not generating code (including if we have + -- errors). 
- if Serious_Errors_Detected > 0 -or else Operating_Mode /= Generate_Code - then + if Operating_Mode /= Generate_Code then return; end if; + pragma Assert (Serious_Errors_Detected = 0); + Check_Calls (N); end Validate_Subprogram_Calls; diff --git a/gcc/ada/frontend.adb b/gcc/ada/frontend.adb index 12cea9c794a3..d5376788ce42 100644 --- a/gcc/ada/frontend.adb +++ b/gcc/ada/frontend.adb @@ -506,9 +506,7 @@ begin -- Verify the validity of the tree - if
[gcc r15-9812] ada: Fix infinite loop with aggregate in generic unit
https://gcc.gnu.org/g:8a4b72a2d99918d6bc315f2664a22457b9848ce7

commit r15-9812-g8a4b72a2d99918d6bc315f2664a22457b9848ce7
Author: Eric Botcazou
Date:   Thu Mar 20 23:29:33 2025 +0100

    ada: Fix infinite loop with aggregate in generic unit

    Root_Type does not return the same type for the private and the full view
    of a derived private tagged type when both derive from an interface type.

    gcc/ada/ChangeLog:

            * sem_ch12.adb (Copy_Generic_Node): Do not call Root_Type to find
            the root type of an aggregate of a derived tagged type.

Diff:
---
 gcc/ada/sem_ch12.adb | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index d93788b779e9..02c7c3696e82 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -9340,9 +9340,6 @@ package body Sem_Ch12 is
               and then Nkind (Ancestor_Type (N)) in N_Entity
             then
                declare
-                  Root_Typ : constant Entity_Id :=
-                               Root_Type (Ancestor_Type (N));
-
                   Typ : Entity_Id := Ancestor_Type (N);

                begin
@@ -9351,7 +9348,7 @@ package body Sem_Ch12 is
                         Switch_View (Typ);
                      end if;

-                     exit when Typ = Root_Typ;
+                     exit when Etype (Typ) = Typ;

                      Typ := Etype (Typ);
                   end loop;
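
The change of exit condition in the sem_ch12.adb hunk above can be illustrated with a small model. This is only a sketch in Python, not GNAT code: it assumes a toy type graph in which every type has an etype link to its parent and a root type is its own etype. TypeEntity, walk_to_fixed_point, walk_to_expected_root and the max_steps budget are hypothetical names introduced for the illustration, and other_view_root merely stands in for a root obtained through a different view of the type.

```python
# Toy model (not GNAT's real data structures): each type points at its
# parent through an `etype` link, and a root type is its own etype.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class TypeEntity:
    name: str
    etype: Optional["TypeEntity"] = None  # None is replaced by self below

    def __post_init__(self) -> None:
        if self.etype is None:
            self.etype = self  # a root type is a fixed point of etype


def walk_to_fixed_point(typ: TypeEntity) -> List[str]:
    """Climb the etype chain and stop at the first fixed point."""
    visited = []
    while True:
        visited.append(typ.name)
        if typ.etype is typ:  # mirrors: exit when Etype (Typ) = Typ
            break
        typ = typ.etype
    return visited


def walk_to_expected_root(
    typ: TypeEntity, expected_root: TypeEntity, max_steps: int = 10
) -> Optional[int]:
    """Stop only on reaching a precomputed root, within a step budget."""
    for step in range(max_steps):
        if typ is expected_root:  # mirrors: exit when Typ = Root_Typ
            return step
        typ = typ.etype
    return None  # budget exhausted; an unbounded loop would never exit


root = TypeEntity("Root")
parent = TypeEntity("Parent", etype=root)
derived = TypeEntity("Derived", etype=parent)
# Stand-in for a root reported through the other view of the type.
other_view_root = TypeEntity("Interface_Root")

print(walk_to_fixed_point(derived))
print(walk_to_expected_root(derived, root))
print(walk_to_expected_root(derived, other_view_root))
```

In this model the fixed-point test needs no precomputed root, so it cannot be defeated by a root that differs between views. The comparison against a mismatched root only ends because of the artificial step budget.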