Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg

Richard Sandiford Mon, 12 Aug 2024 12:50:16 -0700

Jeff Law <j...@ventanamicro.com> writes:
> So this is another nasty latent bug exposed by ext-dce.
>
> Similar to the prior m68k failure it's another problem with how we 
> handle paradoxical subregs on big endian targets.
>
> In this instance when we remove the hard subregs we take something like:
>
> (subreg:DI (reg:SI 0) 0)
>
> And turn it into
>
> (reg:SI -1)
>
> Which is clearly wrong.  (reg:SI 0) is correct.
>
>
> The transformation happens in alter_subreg, but I really wanted to fix 
> this in subreg_regno since we could have similar problems in some of the 
> other callers of subreg_regno.
>
> Unfortunately reload depends on the current behavior of subreg_regno; in 
> the cases where the return value is an invalid register, the wrong half 
> of a register pair, etc the resulting bogus value is detected by reload 
> and triggers reloading of the inner object.  So that's the new comment 
> in subreg_regno.
>
> The second best place to fix is alter_subreg which is what this patch 
> does.  If presented with a paradoxical subreg, then the base register 
> number should always be REGNO (SUBREG_REG (object)).  It's just how 
> paradoxicals are designed to work.
>
> I haven't tried to fix the other places that call subreg_regno.  After 
> being burned by reload, I'm more than a bit worried about unintended 
> fallout.
>
> I must admit I'm surprised we haven't stumbled over this before and that 
> it didn't fix any failures on the big endian embedded targets.
>
> Boostrapped & regression tested on x86_64, also went through all the 
> embedded targets in my tester and bootstrapped on m68k & s390x to get 
> some additional big endian testing.
>
> Pushing to the trunk.
>
> jeff
>
>
> commit e9738e77674e23f600315ca1efed7d1c7944d0cc
> Author: Jeff Law <j...@ventanamicro.com>
> Date:   Mon Aug 12 07:29:25 2024 -0600
>
>     [rtl-optimization/116244] Don't create bogus regs in alter_subreg
>     
>     So this is another nasty latent bug exposed by ext-dce.
>     
>     Similar to the prior m68k failure it's another problem with how we handle
>     paradoxical subregs on big endian targets.
>     
>     In this instance when we remove the hard subregs we take something like:
>     
>     (subreg:DI (reg:SI 0) 0)
>     
>     And turn it into
>     
>     (reg:SI -1)
>     
>     Which is clearly wrong.  (reg:SI 0) is correct.
>     
>     The transformation happens in alter_subreg, but I really wanted to fix 
> this in
>     subreg_regno since we could have similar problems in some of the other 
> callers
>     of subreg_regno.
>     
>     Unfortunately reload depends on the current behavior of subreg_regno; in 
> the
>     cases where the return value is an invalid register, the wrong half of a
>     register pair, etc the resulting bogus value is detected by reload and 
> triggers
>     reloading of the inner object.  So that's the new comment in subreg_regno.
>     
>     The second best place to fix is alter_subreg which is what this patch 
> does.  If
>     presented with a paradoxical subreg, then the base register number should
>     always be REGNO (SUBREG_REG (object)).  It's just how paradoxicals are 
> designed
>     to work.
>     
>     I haven't tried to fix the other places that call subreg_regno.  After 
> being
>     burned by reload, I'm more than a bit worried about unintended fallout.
>     
>     I must admit I'm surprised we haven't stumbled over this before and that 
> it
>     didn't fix any failures on the big endian embedded targets.
>     
>     Boostrapped & regression tested on x86_64, also went through all the 
> embedded
>     targets in my tester and bootstrapped on m68k & s390x to get some 
> additional
>     big endian testing.
>     
>     Pushing to the trunk.
>     
>             rtl-optimization/116244
>     gcc/
>             * rtlanal.cc (subreg_regno): Update comment.
>             * final.cc (alter_subrg): Always use REGNO (SUBREG_REG ()) to get
>             the base regsiter for paradoxical subregs.
>     
>     gcc/testsuite/
>             * g++.target/m68k/m68k.exp: New test driver.
>             * g++.target/m68k/pr116244.C: New test.
>
> diff --git a/gcc/final.cc b/gcc/final.cc
> index eb9e065d9f0..0167b2f8602 100644
> --- a/gcc/final.cc
> +++ b/gcc/final.cc
> @@ -3123,7 +3123,17 @@ alter_subreg (rtx *xp, bool final_p)
>         unsigned int regno;
>         poly_int64 offset;
>  
> -       regno = subreg_regno (x);
> +       /* A paradoxical should always be REGNO (y) + 0.  Using subreg_regno
> +          for something like (subreg:DI (reg:SI N) 0) on a WORDS_BIG_ENDIAN
> +          target will return N-1 which is catastrophic for N == 0 and just
> +          wrong for other cases.
> +
> +          Fixing subreg_regno would be a better option, except that reload
> +          depends on its current behavior.  */
> +       if (paradoxical_subreg_p (x))
> +         regno = REGNO (y);
> +       else
> +         regno = subreg_regno (x);


Are you sure that's right?  For a 32-bit big-endian target,
(subreg:DI (reg:SI 1) 0) really should simplify to (reg:DI 0) rather
than (reg:DI 1).

Like you say, this leads to an invalid result for (reg:SI 0) in place
of (reg:SI 1).  But AIUI, it's reload's/LRA's job to ensure that that
never happens.  This is part of the reason why LRA/IRA need to track
the widest referenced mode.

So it sounds like something might have gone wrong earlier/elsewhere.

Thanks,
Richard

>         if (subreg_lowpart_p (x))
>           offset = byte_lowpart_offset (GET_MODE (x), GET_MODE (y));
>         else
> diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
> index 4158a531bdd..6f6e6544755 100644
> --- a/gcc/rtlanal.cc
> +++ b/gcc/rtlanal.cc
> @@ -4313,7 +4313,16 @@ lowpart_subreg_regno (unsigned int regno, machine_mode 
> xmode,
>    return simplify_subreg_regno (regno, xmode, offset, ymode);
>  }
>  
> -/* Return the final regno that a subreg expression refers to.  */
> +/* Return the final regno that a subreg expression refers to.
> +
> +   Callers such as reload_inner_reg_of_subreg rely on this returning
> +   the simplified hard reg, even if that result is not a valid regno for
> +   the given mode.  That triggers reloading the inner part of the
> +   subreg.
> +
> +   That inherently means other uses of this routine probably need
> +   to be audited for their behavior when requested subreg can't
> +   be expressed as a hard register after apply the offset.  */
>  unsigned int
>  subreg_regno (const_rtx x)
>  {
> diff --git a/gcc/testsuite/g++.target/m68k/m68k.exp 
> b/gcc/testsuite/g++.target/m68k/m68k.exp
> new file mode 100644
> index 00000000000..8f6416e9fdf
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/m68k/m68k.exp
> @@ -0,0 +1,34 @@
> +# Copyright (C) 2019-2024 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +# GCC testsuite that uses the `dg.exp' driver.
> +
> +# Exit immediately if this isn't a m68k target.
> +if ![istarget m68k*-*-*] then {
> +  return
> +}
> +
> +# Load support procs.
> +load_lib g++-dg.exp
> +
> +# Initialize `dg'.
> +dg-init
> +
> +# Main loop.
> +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.C]] "" ""
> +
> +# All done.
> +dg-finish
> diff --git a/gcc/testsuite/g++.target/m68k/pr116244.C 
> b/gcc/testsuite/g++.target/m68k/pr116244.C
> new file mode 100644
> index 00000000000..2e78832efbb
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/m68k/pr116244.C
> @@ -0,0 +1,226 @@
> +// { dg-do compile }
> +// { dg-additional-options "-std=gnu++20 -O2 -w -march=isaa" }
> +namespace std {
> +struct __conditional {
> +  template <typename _Tp, typename> using type = _Tp;
> +};
> +template <bool, typename _If, typename _Else>
> +using __conditional_t = __conditional::type<_If, _Else>;
> +template <typename> constexpr bool is_lvalue_reference_v = true;
> +template <typename> bool is_same_v;
> +template <typename _From, typename _To>
> +constexpr bool is_convertible_v = __is_convertible(_From, _To);
> +template <typename> bool is_invocable_v;
> +template <typename...> using common_reference_t = int;
> +template <typename _Iterator> struct iterator_traits : _Iterator {};
> +namespace __detail {
> +template <typename, typename _Up>
> +concept __same_as = is_same_v<_Up>;
> +}
> +template <typename _Tp, typename _Up>
> +concept same_as = __detail::__same_as<_Up, _Tp>;
> +template <typename _Derived, typename _Base>
> +concept derived_from = is_convertible_v<_Derived, _Base>;
> +template <typename, typename>
> +concept common_reference_with =
> +    same_as<common_reference_t<>, common_reference_t<>>;
> +namespace __detail {
> +template <typename _Tp>
> +concept __boolean_testable = requires(_Tp __t) { __t; };
> +template <typename _Tp, typename>
> +concept __weakly_eq_cmp_with = requires(_Tp __t) { __t; };
> +} // namespace __detail
> +template <typename... _Args>
> +concept invocable = is_invocable_v<_Args...>;
> +template <typename>
> +concept regular_invocable = invocable<>;
> +template <typename _Fn>
> +concept predicate = __detail::__boolean_testable<_Fn>;
> +template <typename _Rel, typename, typename>
> +concept relation = predicate<_Rel>;
> +template <typename _Rel, typename _Tp, typename _Up>
> +concept strict_weak_order = relation<_Rel, _Tp, _Up>;
> +namespace {
> +struct duration {
> +  duration() = default;
> +  template <typename _Rep2> duration(_Rep2 __rep) : __r(__rep) {}
> +  long long count() { return __r; }
> +  long long __r;
> +};
> +long long operator+(duration __lhs, duration __rhs) {
> +  long __trans_tmp_1 = __lhs.count();
> +  return __trans_tmp_1 + __rhs.count();
> +}
> +struct time_point {
> +  time_point();
> +  time_point(duration __dur) : __d(__dur) {}
> +  duration time_since_epoch() { return __d; }
> +  duration __d;
> +};
> +time_point operator+(time_point __lhs, duration __rhs) {
> +  duration __trans_tmp_2 = __lhs.time_since_epoch();
> +  return time_point(__trans_tmp_2 + __rhs);
> +}
> +template <typename> using sys_time = time_point;
> +} // namespace
> +template <typename> class allocator;
> +namespace {
> +struct less {};
> +} // namespace
> +struct forward_iterator_tag {};
> +namespace __detail {
> +template <typename _Iter> struct __iter_traits_impl {
> +  using type = iterator_traits<_Iter>;
> +};
> +template <typename _Iter> using __iter_traits = 
> __iter_traits_impl<_Iter>::type;
> +template <typename> struct __iter_concept_impl;
> +template <typename _Iter>
> +  requires requires { typename _Iter; }
> +struct __iter_concept_impl<_Iter> {
> +  using type = __iter_traits<_Iter>::iterator_concept;
> +};
> +template <typename _Iter>
> +using __iter_concept = __iter_concept_impl<_Iter>::type;
> +template <typename _In>
> +concept __indirectly_readable_impl = common_reference_with<_In, _In>;
> +} // namespace __detail
> +template <typename _In>
> +concept indirectly_readable = __detail::__indirectly_readable_impl<_In>;
> +template <typename _Iter>
> +concept input_or_output_iterator = requires(_Iter __i) { __i; };
> +template <typename _Sent, typename _Iter>
> +concept sentinel_for = __detail::__weakly_eq_cmp_with<_Sent, _Iter>;
> +template <typename _Iter>
> +concept forward_iterator =
> +    derived_from<__detail::__iter_concept<_Iter>, forward_iterator_tag>;
> +template <typename _Fn, typename>
> +concept indirectly_regular_unary_invocable = regular_invocable<_Fn>;
> +template <typename _Fn, typename _I1, typename _I2>
> +concept indirect_strict_weak_order = strict_weak_order<_Fn, _I1, _I2>;
> +namespace __detail {
> +template <typename, typename> struct __projected;
> +}
> +template <indirectly_readable _Iter,
> +          indirectly_regular_unary_invocable<_Iter> _Proj>
> +using projected = __detail::__projected<_Iter, _Proj>;
> +namespace ranges::__access {
> +template <typename _Tp> auto __begin(_Tp __t) { return __t.begin(); }
> +} // namespace ranges::__access
> +namespace __detail {
> +template <typename _Tp>
> +using __range_iter_t = decltype(ranges::__access::__begin(_Tp()));
> +}
> +template <typename _Tp> struct iterator_traits<_Tp *> {
> +  typedef forward_iterator_tag iterator_concept;
> +  using reference = _Tp;
> +};
> +} // namespace std
> +template <typename _Iterator, typename> struct __normal_iterator {
> +  using iterator_concept = std::__detail::__iter_concept<_Iterator>;
> +  std::iterator_traits<_Iterator>::reference operator*();
> +  void operator++();
> +};
> +template <typename _IteratorL, typename _IteratorR, typename _Container>
> +bool operator==(__normal_iterator<_IteratorL, _Container>,
> +                __normal_iterator<_IteratorR, _Container>);
> +int _M_get_sys_info_ri;
> +namespace std {
> +template <typename> struct allocator_traits;
> +template <typename _Tp> struct allocator_traits<allocator<_Tp>> {
> +  using pointer = _Tp *;
> +};
> +namespace {
> +namespace __detail {
> +template <typename _Tp>
> +concept __maybe_borrowed_range = is_lvalue_reference_v<_Tp>;
> +}
> +int end;
> +template <typename>
> +concept range = requires { end; };
> +template <typename _Tp>
> +concept borrowed_range = __detail::__maybe_borrowed_range<_Tp>;
> +template <typename _Tp> using iterator_t = 
> std::__detail::__range_iter_t<_Tp>;
> +template <typename _Tp>
> +concept forward_range = forward_iterator<iterator_t<_Tp>>;
> +struct dangling;
> +template <input_or_output_iterator _It, sentinel_for<_It> _Sent = _It>
> +struct subrange {
> +  _It begin();
> +  _Sent end();
> +};
> +template <range _Range>
> +using borrowed_subrange_t =
> +    __conditional_t<borrowed_range<_Range>, subrange<iterator_t<_Range>>,
> +                    dangling>;
> +} // namespace
> +template <typename _Alloc> struct _Vector_base {
> +  typedef allocator_traits<_Alloc>::pointer pointer;
> +};
> +template <typename _Tp, typename _Alloc = allocator<_Tp>> struct vector {
> +  __normal_iterator<typename _Vector_base<_Alloc>::pointer, int> begin();
> +};
> +template <typename _Tp> struct __shared_ptr_access {
> +  _Tp *operator->();
> +};
> +namespace chrono {
> +struct weekday {
> +  char _M_wd;
> +  weekday _S_from_days() {
> +    long __trans_tmp_3;
> +    return 7 > __trans_tmp_3;
> +  }
> +  weekday(unsigned __wd) : _M_wd(__wd == 7 ?: __wd) {}
> +  weekday() : weekday{_S_from_days()} {}
> +  unsigned c_encoding() { return _M_wd; }
> +  friend duration operator-(weekday __x, weekday __y) {
> +    auto __n = __x.c_encoding() - __y.c_encoding();
> +    return int(__n) >= 0 ? __n : duration{};
> +  }
> +};
> +struct year_month_day {
> +  year_month_day _S_from_days(duration __dp) {
> +    auto __r0 = int(__dp.count()), __u2 = 5 * __r0;
> +    year_month_day{__u2};
> +  }
> +  year_month_day();
> +  year_month_day(int);
> +  year_month_day(sys_time<long> __dp)
> +      : year_month_day(_S_from_days(__dp.time_since_epoch())) {}
> +};
> +struct time_zone {
> +  void _M_get_sys_info() const;
> +};
> +} // namespace chrono
> +namespace ranges {
> +struct {
> +  template <
> +      forward_range _Range, typename _Tp, typename _Proj,
> +      indirect_strict_weak_order<_Tp, projected<_Range, _Proj>> _Comp = less>
> +  borrowed_subrange_t<_Range> operator()(_Range, _Tp, _Comp, _Proj);
> +} equal_range;
> +} // namespace ranges
> +} // namespace std
> +namespace std::chrono {
> +struct Rule;
> +unsigned day_of_week;
> +struct _Node {
> +  vector<Rule> rules;
> +};
> +char name;
> +struct Rule {
> +  int from;
> +  void start_time(int, long) {
> +    year_month_day ymd;
> +    sys_time<long> date;
> +    duration diff = day_of_week - weekday{};
> +    ymd = date + diff;
> +  }
> +};
> +__shared_ptr_access<_Node> __trans_tmp_5;
> +void time_zone::_M_get_sys_info() const {
> +  auto node = __trans_tmp_5;
> +  auto rules = ranges::equal_range(node->rules, _M_get_sys_info_ri, {}, 
> name);
> +  for (auto rule : rules)
> +    rule.start_time(rule.from, {});
> +}
> +} // namespace std::chrono

Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg

Reply via email to