On Wed, May 07, 2014 at 09:52:49PM +0100, Richard Sandiford wrote:
> I noticed for_each_rtx showing up in profiles and thought I'd have a go
> at using worklist-based iterators instead.  So far I have three:
>
>   FOR_EACH_SUBRTX: iterates over const_rtx subrtxes of a const_rtx
>   FOR_EACH_SUBRTX_VAR: iterates over rtx subrtxes of an rtx
>   FOR_EACH_SUBRTX_PTR: iterates over subrtx pointers of an rtx *
>
> with FOR_EACH_SUBRTX_PTR being the direct for_each_rtx replacement.
>
> I made FOR_EACH_SUBRTX the "default" (unsuffixed) version because
> most walks really don't modify the structure.  I think we should
> encourage const_rtxes to be used wherever possible.  E.g. it might
> make it easier to have non-GC storage for temporary rtxes in future.
>
> I've locally replaced all for_each_rtx calls in the generic code with
> these iterators and they make things reproducibly faster.  The speed-up
> on full --enable-checking=release ./cc1 and ./cc1plus times is only
> about 1%, but maybe that's enough to justify the churn.
seems pretty nice, and it looks like it'll make the code a little more
readable too :)
> Implementation-wise, the main observation is that most subrtxes are part
> of a single contiguous sequence of "e" fields. E.g. when compiling an
> oldish combine.ii on x86_64-linux-gnu with -O2, we iterate over the
> subrtxes of 7,636,542 rtxes. Of those:
>
> (A) 4,459,135 (58.4%) are leaf rtxes with no "e" or "E" fields,
> (B) 3,133,875 (41.0%) are rtxes with a single block of "e" fields and
> no "E" fields, and
> (C) 43,532 (00.6%) are more complicated.
>
> (A) is really a special case of (B) in which the block has zero length.
> Those are the only two cases that really need to be handled inline.
> The implementation does this by having a mapping from an rtx code to the
> bounds of its "e" sequence, in the form of a start index and count.
>
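just to check I'm reading this right: the table entries end up looking
roughly like the following (sketch only -- the real values of course come
from the formats in rtl.def):

  rtx_all_subrtx_bounds[SET].start = 0;
  rtx_all_subrtx_bounds[SET].count = 2;        /* "ee": XEXP 0 and XEXP 1 */
  rtx_all_subrtx_bounds[MEM].count = 1;        /* "e" */
  rtx_all_subrtx_bounds[CONST_INT].count = 0;  /* no "e"s: case (A) */
  rtx_all_subrtx_bounds[PARALLEL].count = UCHAR_MAX;  /* case (C) */
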
> Out of (C), the vast majority (43,509) are PARALLELs. However, as you'd
> probably expect, bloating the inline code with that case made things
> slower rather than faster.
>
> The vast majority (in fact all in the combine.ii run above) of iterations
> can be done with a 16-element stack worklist. We obviously still need a
> heap fallback for the pathological cases though.
>
> I spent a bit of time trying different iterator implementations and
> seeing which produced the best code. Specific results from that were:
>
> - The storage used for the worklist is separate from the iterator,
> in order to avoid capturing iterator fields.
>
> - Although the natural type of the storage would be auto_vec <..., 16>,
> that produced some overhead compared with a separate stack array and heap
> vector pointer. With the heap vector pointer, the only overhead is an
> assignment in the constructor and an "if (x) release (x)"-style sequence
> in the destructor. I think the extra complication over auto_vec is worth
> it because in this case the heap version is so very rarely needed.
hm, where does the overhead come from exactly? it seems like if it's
faster to use vec<T, va_heap, vl_embed> *foo; we should fix something
about vectors, since this isn't the only place it could matter. does it
matter whether you use vec<T, va_heap, vl_embed> * or vec<T>? the second
is basically just a wrapper around the former, so I'd expect it to have
no effect. I'm not saying you're doing the wrong thing here, but if we
can make generic vectors faster we probably should ;) or is the issue
the __builtin_expect ()s you can add?
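(for reference, the three storage shapes I'm comparing in my head are
roughly

  /* what the patch does: bare stack array plus a pointer to an
     embedded-layout heap vec that is only touched on overflow */
  value_type stack[16];
  vec <value_type, va_heap, vl_embed> *heap;

  /* the "natural" version */
  auto_vec <value_type, 16> storage;

  /* and a plain heap vec, which AIUI is itself just a wrapper around a
     vec <value_type, va_heap, vl_embed> pointer */
  vec <value_type> storage;

so I'd naively have expected the first and the last to compile to much
the same thing.)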
> - Several existing for_each_rtx callbacks have something like:
>
> if (GET_CODE (x) == CONST)
> return -1;
>
> or:
>
> if (CONSTANT_P (x))
> return -1;
>
> to avoid walking subrtxes of constants. That can be done without
> extra code checks and branches by having a separate code->bound
> mapping in which all constants are treated as leaf rtxes. This usage
> should be common enough to outweigh the cache penalty of two arrays.
>
> The choice between iterating over constants or not is given in the
> final parameter of the FOR_EACH_* iterator.
less repetition \O/
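(and if I follow, with NONCONST something like

  (const:DI (plus:DI (symbol_ref:DI ("foo")) (const_int 8)))

just looks like a leaf to the walk, so we never even see the PLUS, which
is exactly what those callbacks wanted)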
> - The maximum number of fields in (B)-type rtxes is 3. We get better
> code by making that explicit rather than having a general loop.
>
> - (C) codes map to an "e" count of UCHAR_MAX, so we can use a single
> check to test for that and for cases where the stack worklist is
> too small.
can we use uint8_t?
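(fwiw the UCHAR_MAX trick took me a second: with count == UCHAR_MAX the
inline test

  m_end + count <= LOCAL_ELEMS + 1

can never be true, so the one branch covers both "complicated code" and
"stack worklist too small" -- neat)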
> To give an example:
>
> /* Callback for for_each_rtx, that returns 1 upon encountering a VALUE
> whose UID is greater than the int uid that D points to. */
>
> static int
> refs_newer_value_cb (rtx *x, void *d)
> {
> if (GET_CODE (*x) == VALUE && CSELIB_VAL_PTR (*x)->uid > *(int *)d)
> return 1;
>
> return 0;
> }
>
> /* Return TRUE if EXPR refers to a VALUE whose uid is greater than
> that of V. */
>
> static bool
> refs_newer_value_p (rtx expr, rtx v)
> {
> int minuid = CSELIB_VAL_PTR (v)->uid;
>
> return for_each_rtx (&expr, refs_newer_value_cb, &minuid);
> }
>
> becomes:
>
> /* Return TRUE if EXPR refers to a VALUE whose uid is greater than
> that of V. */
>
> static bool
> refs_newer_value_p (const_rtx expr, rtx v)
> {
> int minuid = CSELIB_VAL_PTR (v)->uid;
> subrtx_iterator::array_type array;
is there some reason not to hide it in the macro?
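e.g. something like this (untested, and only a sketch of what I mean --
the name and the helper loop variables are made up) would keep the array
out of callers, at the cost of two wrapper loops to scope the extra
declaration:

  #define FOR_EACH_SUBRTX_LOCAL(ITER, X, TYPE)                         \
    for (bool ITER##_once = true; ITER##_once;)                        \
      for (subrtx_iterator::array_type ITER##_array;                   \
           ITER##_once; ITER##_once = false)                           \
        for (subrtx_iterator ITER (ITER##_array, X, TYPE##_BOUNDS);    \
             !ITER.at_end (); ITER.next ())

though I guess taking ARRAY explicitly is what lets callers reuse the
same storage across many consecutive walks, as the comment above
FOR_EACH_SUBRTX says, so maybe that answers my own question.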
> FOR_EACH_SUBRTX (iter, array, expr, NONCONST)
> if (GET_CODE (*iter) == VALUE && CSELIB_VAL_PTR (*iter)->uid > minuid)
> return true;
> return false;
> }
>
> The iterator also allows subrtxes of a specific rtx to be skipped;
> this is the equivalent of returning -1 from a for_each_rtx callback.
> It also allows the current rtx to be replaced in the worklist by
> another. E.g.:
>
> static void
> mark_constants_in_pattern (rtx insn)
> {
> subrtx_iterator::array_type array;
> FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL)
> {
> const_rtx x = *iter;
> if (GET_CODE (x) == SYMBOL_REF)
> {
> if (CONSTANT_POOL_ADDRESS_P (x))
> {
> struct constant_descriptor_rtx *desc = SYMBOL_REF_CONSTANT (x);
> if (desc->mark == 0)
> {
> desc->mark = 1;
> iter.substitute (desc->constant);
> }
> }
> else if (TREE_CONSTANT_POOL_ADDRESS_P (x))
> {
> tree decl = SYMBOL_REF_DECL (x);
> if (!TREE_ASM_WRITTEN (DECL_INITIAL (decl)))
> {
> n_deferred_constants--;
> output_constant_def_contents (CONST_CAST_RTX (x));
> }
> }
> }
> }
> }
>
> where the iter.substitute avoids a recursive walk. (The CONST_CAST_RTX
> is "temporary".)
>
> Does this look like it's worth pursuing? If so, is it OK to start
> submitting now or should it wait until 4.9.1 is out? If it is OK,
> I plan to convert the ports too and get rid of for_each_rtx.
>
> Thanks,
> Richard
>
>
> diff --git a/gcc/rtl-iter.h b/gcc/rtl-iter.h
> new file mode 100644
> index 0000000..1bc171d
> --- /dev/null
> +++ b/gcc/rtl-iter.h
> @@ -0,0 +1,293 @@
> +/* RTL iterators
> + Copyright (C) 2014 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3. If not see
> +<http://www.gnu.org/licenses/>. */
> +
> +/* This structure describes the subrtxes of an rtx as follows:
> +
> + - if the rtx has no subrtxes, START and COUNT are both 0.
> +
> + - if all the subrtxes of an rtx are stored in a contiguous block
> + of XEXPs ("e"s), START is the index of the first XEXP and COUNT
> + is the number of them.
> +
> + - otherwise START is arbitrary and COUNT is UCHAR_MAX. */
> +struct rtx_subrtx_bound_info {
> + unsigned char start;
> + unsigned char count;
> +};
> +extern rtx_subrtx_bound_info rtx_all_subrtx_bounds[];
> +extern rtx_subrtx_bound_info rtx_nonconst_subrtx_bounds[];
> +
> +/* Return true if CODE has no subrtxes. */
> +
> +static inline bool
> +leaf_code_p (enum rtx_code code)
> +{
> + return rtx_all_subrtx_bounds[code].count == 0;
> +}
> +
> +/* Used to iterate over subrtxes of an rtx. T abstracts the type of
> + access. */
> +template <typename T>
> +class generic_subrtx_iterator
> +{
> + static const size_t LOCAL_ELEMS = 16;
> + typedef typename T::value_type value_type;
> + typedef typename T::rtx_type rtx_type;
> + typedef typename T::rtunion_type rtunion_type;
> +
> +public:
> + struct array_type
> + {
> + array_type ();
> + ~array_type ();
> + value_type stack[LOCAL_ELEMS];
> + vec <value_type, va_heap, vl_embed> *heap;
> + };
> + generic_subrtx_iterator (array_type &, value_type,
> + const rtx_subrtx_bound_info *);
> + ~generic_subrtx_iterator ();
> +
> + value_type operator * () const;
> + bool at_end () const;
> + void next ();
> + void skip_subrtxes ();
> + void substitute (value_type);
> +
> +private:
> + /* The bounds to use for iterating over subrtxes. */
> + const rtx_subrtx_bound_info *m_bounds;
> +
> + /* The storage used for the worklist. */
> + array_type &m_array;
> +
> + /* The current rtx. */
> + value_type m_current;
> +
> + /* The base of the current worklist. */
> + value_type *m_base;
> +
> + /* The number of subrtxes in M_BASE. */
> + size_t m_end;
> +
> + /* The following booleans shouldn't end up in registers or memory
> + but just direct control flow. */
> +
> + /* True if the iteration is over. */
> + bool m_done;
> +
> + /* True if we should skip the subrtxes of M_CURRENT. */
> + bool m_skip;
> +
> + /* True if M_CURRENT has been replaced with a different rtx. */
> + bool m_substitute;
> +
> + static void free_array (array_type &);
> + static size_t add_subrtxes_to_queue (array_type &, value_type *, size_t,
> + rtx_type);
> + static value_type *add_single_to_queue (array_type &, value_type *, size_t,
> + value_type);
> +};
> +
> +template <typename T>
> +inline generic_subrtx_iterator <T>::array_type::array_type () : heap (0) {}
> +
> +template <typename T>
> +inline generic_subrtx_iterator <T>::array_type::~array_type ()
> +{
> + if (__builtin_expect (heap != 0, false))
> + free_array (*this);
> +}
> +
> +/* Iterate over X and its subrtxes, in arbitrary order. Use ARRAY to
> + store the worklist. We use an external array in order to avoid
> + capturing the fields of this structure when taking the address of
> + the array. Use BOUNDS to find the bounds of simple "e"-string codes. */
> +
> +template <typename T>
> +inline generic_subrtx_iterator <T>::
> +generic_subrtx_iterator (array_type &array, value_type x,
> + const rtx_subrtx_bound_info *bounds)
> + : m_bounds (bounds),
> + m_array (array),
> + m_current (x),
> + m_base (m_array.stack),
> + m_end (0),
> + m_done (false),
> + m_skip (false),
> + m_substitute (false)
> +{
> +}
> +
> +template <typename T>
> +inline generic_subrtx_iterator <T>::~generic_subrtx_iterator ()
> +{
> +}
What's wrong with just letting the compiler generate that for you?
> +/* Return the current subrtx. */
> +
> +template <typename T>
> +inline typename T::value_type
> +generic_subrtx_iterator <T>::operator * () const
> +{
> + return m_current;
> +}
> +
> +/* Return true if the iteration has finished. */
> +
> +template <typename T>
> +inline bool
> +generic_subrtx_iterator <T>::at_end () const
> +{
> + return m_done;
> +}
> +
> +/* Move on to the next subrtx. */
> +
> +template <typename T>
> +inline void
> +generic_subrtx_iterator <T>::next ()
> +{
> + if (m_substitute)
> + {
> + m_substitute = false;
> + m_skip = false;
> + return;
> + }
> + if (!m_skip)
> + {
> + /* Add the subrtxes of M_CURRENT. */
> + rtx_type x = T::get_rtx (m_current);
> + if (__builtin_expect (x != 0, true))
> + {
> + enum rtx_code code = GET_CODE (x);
> + ssize_t count = m_bounds[code].count;
> + if (count > 0)
> + {
> + /* Handle the simple case of a single "e" block that is known
> + to fit into the current array. */
> + if (__builtin_expect (m_end + count <= LOCAL_ELEMS + 1, true))
> + {
> + /* Set M_CURRENT to the first subrtx and queue the rest. */
> + ssize_t start = m_bounds[code].start;
> + rtunion_type *src = &x->u.fld[start];
> + if (__builtin_expect (count > 2, false))
> + m_base[m_end++] = T::get_value (src[2].rt_rtx);
> + if (count > 1)
> + m_base[m_end++] = T::get_value (src[1].rt_rtx);
> + m_current = T::get_value (src[0].rt_rtx);
> + return;
> + }
> + /* Handle cases which aren't simple "e" sequences or where
> + the sequence might overrun M_BASE. */
> + count = add_subrtxes_to_queue (m_array, m_base, m_end, x);
> + if (count > 0)
> + {
> + m_end += count;
> + if (m_end > LOCAL_ELEMS)
> + m_base = m_array.heap->address ();
> + m_current = m_base[--m_end];
> + return;
> + }
> + }
> + }
> + }
> + else
> + m_skip = false;
> + if (m_end == 0)
> + m_done = true;
> + else
> + m_current = m_base[--m_end];
> +}
> +
> +/* Skip the subrtxes of the current rtx. */
> +
> +template <typename T>
> +inline void
> +generic_subrtx_iterator <T>::skip_subrtxes ()
> +{
> + m_skip = true;
> +}
> +
> +/* Ignore the subrtxes of the current rtx and look at X instead. */
> +
> +template <typename T>
> +inline void
> +generic_subrtx_iterator <T>::substitute (value_type x)
> +{
> + m_substitute = true;
> + m_current = x;
> +}
> +
> +/* Iterators for const_rtx. */
> +struct const_rtx_accessor
> +{
> + typedef const_rtx value_type;
> + typedef const_rtx rtx_type;
> + typedef const rtunion rtunion_type;
> + static rtx_type get_rtx (value_type x) { return x; }
> + static value_type get_value (rtx_type x) { return x; }
> +};
> +typedef generic_subrtx_iterator <const_rtx_accessor> subrtx_iterator;
> +
> +/* Iterators for non-constant rtx. */
> +struct rtx_var_accessor
> +{
> + typedef rtx value_type;
> + typedef rtx rtx_type;
> + typedef rtunion rtunion_type;
> + static rtx_type get_rtx (value_type x) { return x; }
> + static value_type get_value (rtx_type x) { return x; }
> +};
> +typedef generic_subrtx_iterator <rtx_var_accessor> subrtx_var_iterator;
> +
> +/* Iterators for rtx *. */
> +struct rtx_ptr_accessor
> +{
> + typedef rtx *value_type;
> + typedef rtx rtx_type;
> + typedef rtunion rtunion_type;
> + static rtx_type get_rtx (value_type ptr) { return *ptr; }
> + static value_type get_value (rtx_type &x) { return &x; }
> +};
> +typedef generic_subrtx_iterator <rtx_ptr_accessor> subrtx_ptr_iterator;
> +
> +#define ALL_BOUNDS rtx_all_subrtx_bounds
> +#define NONCONST_BOUNDS rtx_nonconst_subrtx_bounds
> +
> +/* Use ITER to iterate over const_rtx X and its recursive subrtxes,
> + using subrtx_iterator::array ARRAY as the storage for the worklist.
> + ARRAY can be reused for multiple consecutive iterations but shouldn't
> + be shared by two concurrent iterations. TYPE is ALL if all subrtxes
> + are of interest or NONCONST if it is safe to ignore subrtxes of
> + constants. */
> +#define FOR_EACH_SUBRTX(ITER, ARRAY, X, TYPE) \
> + for (subrtx_iterator ITER (ARRAY, X, TYPE##_BOUNDS); !ITER.at_end (); \
> + ITER.next ())
> +
> +/* Like FOR_EACH_SUBRTX, but iterate over subrtxes of an rtx X. */
> +#define FOR_EACH_SUBRTX_VAR(ITER, ARRAY, X, TYPE) \
> + for (subrtx_var_iterator ITER (ARRAY, X, TYPE##_BOUNDS); !ITER.at_end (); \
> + ITER.next ())
> +
> +/* Like FOR_EACH_SUBRTX, but iterate over subrtx pointers of rtx pointer X.
> + For example, if X is &PATTERN (insn) and the pattern is a SET, iterate
> + over &PATTERN (insn), &SET_DEST (PATTERN (insn)), etc. */
> +#define FOR_EACH_SUBRTX_PTR(ITER, ARRAY, X, TYPE) \
> + for (subrtx_ptr_iterator ITER (ARRAY, X, TYPE##_BOUNDS); !ITER.at_end (); \
> + ITER.next ())
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index f3471b1..df52788 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3. If not see
> #include "tree.h"
> #include "emit-rtl.h" /* FIXME: Can go away once crtl is moved to rtl.h. */
> #include "addresses.h"
> +#include "rtl-iter.h"
>
> /* Forward declarations */
> static void set_of_1 (rtx, const_rtx, void *);
> @@ -62,6 +63,9 @@ static unsigned int num_sign_bit_copies1 (const_rtx, enum machine_mode, const_rt
> -1 if a code has no such operand. */
> static int non_rtx_starting_operands[NUM_RTX_CODE];
>
> +rtx_subrtx_bound_info rtx_all_subrtx_bounds[NUM_RTX_CODE];
> +rtx_subrtx_bound_info rtx_nonconst_subrtx_bounds[NUM_RTX_CODE];
> +
> /* Truncation narrows the mode from SOURCE mode to DESTINATION mode.
> If TARGET_MODE_REP_EXTENDED (DESTINATION, DESTINATION_REP) is
> SIGN_EXTEND then while narrowing we also have to enforce the
> @@ -78,6 +82,93 @@ static int non_rtx_starting_operands[NUM_RTX_CODE];
> static unsigned int
> num_sign_bit_copies_in_rep[MAX_MODE_INT + 1][MAX_MODE_INT + 1];
>
> +/* Store X into index I of ARRAY. ARRAY is known to have at least I
> + elements. Return the new base of ARRAY. */
> +
> +template <typename T>
> +typename T::value_type *
> +generic_subrtx_iterator <T>::add_single_to_queue (array_type &array,
> + value_type *base,
> + size_t i, value_type x)
> +{
> + if (base == array.stack)
> + {
> + if (i < LOCAL_ELEMS)
> + {
> + base[i] = x;
> + return base;
> + }
it seems like this first little bit might be worth inlining, but I guess
you tested that and it wasn't worth it.
> + gcc_checking_assert (i == LOCAL_ELEMS);
> + vec_safe_grow (array.heap, i + 1);
> + base = array.heap->address ();
> + memcpy (base, array.stack, sizeof (array.stack));
that threw me for a sec until I realized it was just manually doing what
auto_vec does in this case.
> + base[LOCAL_ELEMS] = x;
> + return base;
> + }
> + unsigned int length = array.heap->length ();
> + if (length > i)
> + {
> + gcc_checking_assert (base == array.heap->address ());
> + base[i] = x;
> + return base;
> + }
> + else
> + {
> + gcc_checking_assert (i == length);
> + vec_safe_push (array.heap, x);
> + return array.heap->address ();
this chunk seems redundant with what vec_safe_push does. do you actually
need to pass in |i| at all?
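i.e. could the whole non-stack tail just be something like this
(untested, and assuming |i| always equals the heap length by the time we
get here)?

      gcc_checking_assert (i == array.heap->length ());
      vec_safe_push (array.heap, x);
      return array.heap->address ();

or is there a case where the heap vec is already longer than |i|?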
> + }
> +}
> +
> +/* Add the subrtxes of X to worklist ARRAY, starting at END. Return the
> + number of elements added to the worklist. */
> +
> +template <typename T>
> +size_t
> +generic_subrtx_iterator <T>::add_subrtxes_to_queue (array_type &array,
> + value_type *base,
> + size_t end, rtx_type x)
> +{
> + const char *format = GET_RTX_FORMAT (GET_CODE (x));
> + size_t orig_end = end;
> + for (int i = 0; format[i]; ++i)
> + if (format[i] == 'e')
> + {
> + value_type subx = T::get_value (x->u.fld[i].rt_rtx);
> + if (__builtin_expect (end < LOCAL_ELEMS, true))
> + base[end++] = subx;
> + else
> + base = add_single_to_queue (array, base, end++, subx);
> + }
> + else if (format[i] == 'E')
> + {
> + int length = GET_NUM_ELEM (x->u.fld[i].rt_rtvec);
> + rtx *vec = x->u.fld[i].rt_rtvec->elem;
> + if (__builtin_expect (end + length <= LOCAL_ELEMS, true))
> + for (int j = 0; j < length; j++)
> + base[end++] = T::get_value (vec[j]);
> + else
> + for (int j = 0; j < length; j++)
> + base = add_single_to_queue (array, base, end++,
> + T::get_value (vec[j]));
> + }
> + return end - orig_end;
> +}
> +
> +template <typename T>
> +void
> +generic_subrtx_iterator <T>::free_array (array_type &array)
> +{
> + vec_free (array.heap);
> +}
> +
> +template <typename T>
> +const size_t generic_subrtx_iterator <T>::LOCAL_ELEMS;
> +
> +template class generic_subrtx_iterator <const_rtx_accessor>;
> +template class generic_subrtx_iterator <rtx_var_accessor>;
> +template class generic_subrtx_iterator <rtx_ptr_accessor>;
> +
> /* Return 1 if the value of X is unstable
> (would be different at a different point in the program).
> The frame pointer, arg pointer, etc. are considered stable
> @@ -5324,8 +5415,42 @@ truncated_to_mode (enum machine_mode mode, const_rtx x)
> return false;
> }
>
> +/* Return true if RTX code CODE has a single sequence of zero or more
> + "e" operands and no rtvec operands. Initialize its rtx_all_subrtx_bounds
> + entry in that case. */
> +
> +static bool
> +setup_reg_subrtx_bounds (unsigned int code)
> +{
> + const char *format = GET_RTX_FORMAT ((enum rtx_code) code);
> + unsigned int i = 0;
> + for (; format[i] != 'e'; ++i)
> + {
> + if (!format[i])
> + /* No subrtxes. Leave start and count as 0. */
> + return true;
> + if (format[i] == 'E' || format[i] == 'V')
> + return false;
> + }
> +
> + /* Record the sequence of 'e's. */
> + rtx_all_subrtx_bounds[code].start = i;
> + do
> + ++i;
> + while (format[i] == 'e');
> + rtx_all_subrtx_bounds[code].count = i - rtx_all_subrtx_bounds[code].start;
> + /* rtl-iter.h relies on this. */
> + gcc_checking_assert (rtx_all_subrtx_bounds[code].count <= 3);
> +
> + for (; format[i]; ++i)
> + if (format[i] == 'E' || format[i] == 'V' || format[i] == 'e')
> + return false;
> +
> + return true;
> +}
> +
> /* Initialize non_rtx_starting_operands, which is used to speed up
> - for_each_rtx. */
> + for_each_rtx, and rtx_all_subrtx_bounds. */
> void
> init_rtlanal (void)
> {
> @@ -5335,6 +5460,10 @@ init_rtlanal (void)
> const char *format = GET_RTX_FORMAT (i);
> const char *first = strpbrk (format, "eEV");
> non_rtx_starting_operands[i] = first ? first - format : -1;
> + if (!setup_reg_subrtx_bounds (i))
> + rtx_all_subrtx_bounds[i].count = UCHAR_MAX;
> + if (GET_RTX_CLASS (i) != RTX_CONST_OBJ)
> + rtx_nonconst_subrtx_bounds[i] = rtx_all_subrtx_bounds[i];
> }
>
> init_num_sign_bit_copies_in_rep ();