Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-15 Thread Michael Ploujnikov
On 2018-07-04 04:52 AM, Richard Biener wrote:
> On Tue, Jul 3, 2018 at 9:09 PM Jeff Law  wrote:
>>
>> On 07/03/2018 11:55 AM, Michael Ploujnikov wrote:
>>> On 2018-07-03 12:46 PM, Richard Biener wrote:
 On July 3, 2018 4:56:57 PM GMT+02:00, Michael Ploujnikov 
  wrote:
> On 2018-06-20 04:23 AM, Richard Biener wrote:
>> On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
>>>
>>> On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
 Hi everyone,

 (I hope this is the right place to ask, if not my apologies; please
 point me in the right direction)

 I'm trying to get a better understanding of the following part in
 tree_swap_operands_p():

   /* It is preferable to swap two SSA_NAME to ensure a canonical
> form
  for commutative and comparison operators.  Ensuring a
> canonical
  form allows the optimizers to find additional redundancies
> without
  having to explicitly check for both orderings.  */
   if (TREE_CODE (arg0) == SSA_NAME
   && TREE_CODE (arg1) == SSA_NAME
   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
 return 1;

 My questions in no particular order: It looks like this was added
> in
 2004. I couldn't find any info other than what's in the
> corresponding
 commit (cc0bdf913) so I'm wondering if the canonical form/order
> still
 relevant/needed today? Does the ordering have to be done based on
> the
 name versions specifically? Or can it be based on something more
 intrinsic to the input source code rather than a GCC internal
> value, eg:
 would alphabetic sort order of the variable names be a reasonable
 replacement?
>>> Canonicalization is still important and useful.
>>
>> Indeed.
>>
>>> However, canonicalizing on SSA_NAMEs is problematical due to the way
> we
>>> recycle nodes and re-pack them.
>>
>> In the past we made sure to not disrupt order - hopefully that didn't
> change
>> so the re-packing shoudln't invaidate previous canonicalization:
>>
>> static void
>> release_free_names_and_compact_live_names (function *fun)
>> {
>> ...
>>   /* And compact the SSA number space.  We make sure to not change
> the
>>  relative order of SSA versions.  */
>>   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
>> {
>>
>>
>>> I think defining additional rules for canonicalization prior to
> using
>>> SSA_NAME_VERSION as the fallback would be looked upon favorably.
>>
>> I don't see a good reason to do that, it will be harder to spot
> canonicalization
>> issues and it will take extra compile-time.
>>
>>> Note however, that many of the _DECL nodes referenced by SSA_NAMEs
> are
>>> temporaries generated by the compiler and do not correspond to any
>>> declared/defined object in the original source.  So you'll still
> need
>>> the SSA_NAME_VERSION (or something as stable or better)
> canonicalization
>>> to handle those cases.
>>
>> And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE
> names).
>>
>> Richard.
>>
>>> Jeff
>
> After a bit more digging I found that insert_phi_nodes inserts PHIs in
> the order of UIDs, which indirectly affects the order of vars in
> old_ssa_names, which in turn affects the order in which
> make_ssa_name_fn
> is called to pick SSA versions from FREE_SSANAMES so in the end
> ordering by SSA_NAME_VERSION's is more or less equivalent to ordering
> by
> UIDs. I'm trying to figure out if there's a way to avoid depending on
> UIDs being ordered in a certain way. So if tree_swap_operands_p stays
> the same I'm wondering if there's some other info available at the
> point
> of insert_phi_nodes that would be a good replacement for UID. From my
> very limited experience with a very small source input, and if I
> understand things correctly, all of the var_infos have a var which is
> DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
> general case? I don't fully understand what are all the things that
> insert_phi_nodes iterates over.

 Why do you want to remove the dependence on UID ordering? It's pervasive 
 throughout the whole compilation...

 Richard.

> - Michael

>>>
>>>
>>> Well, I'm working on a reduction of the number of changes seen with
>>> binary diffing (a la https://wiki.debian.org/ReproducibleBuilds) and
>>> since current UID assignment is essentially tied to the order of things
>>> in the input source code one function's changes can cascade to others
>>> (even when they're unchanged). As you said UID dependence is quiet
>>> pervasive, and although find

gcc-9-20180715 is now available

2018-07-15 Thread gccadmin
Snapshot gcc-9-20180715 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/9-20180715/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 262671

You'll find:

 gcc-9-20180715.tar.xzComplete GCC

  SHA256=e5b0501c5bee6aac180e9f3cb3d64e901a9858fb3d30d35eb6f90303bb83d652
  SHA1=5f24e7536404812090d47c94ea02907ed31449d4

Diffs from 9-20180708 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: [RFC] Update coding conventions to restrict use of non-const references

2018-07-15 Thread Aldy Hernandez
On Thu, Jul 12, 2018 at 1:41 PM Jonathan Wakely  wrote:
>
> On Thu, 12 Jul 2018 at 11:41, Richard Sandiford wrote:
> > +Only use non-constant references in the following situations:
> > +
> > +
> > +
> > +when they are necessary to conform to a standard interface, such as
> > +the first argument to a non-member operator+=
>
> And the return value of such operators (which also applies to member
> operators, which is the more conventional way to write compound
> assignment operators).
>
> > +in a return value, when providing access to something that is known
> > +to be nonnull
> > +
> > +
> > +
> > +In other situations the convention is to use pointers instead.
> > +
> > +
> > +
> > +HOST_WIDE_INT do_arith (..., bool *overflow);   // OK
> > +HOST_WIDE_INT do_arith (..., bool &overflow);   // Please avoid
>
> I understand the objection to using references for out parameters (an
> alternative to pointers is to return a struct with the wide int result
> and the overflow flag), but ...

Putting everything in a new struct is just going through more hoops to
avoid a common language idiom that everyone in C++ is used to.

>
> > +int *elt = &array[i];  // OK
> > +int &elt = array[i];   // Please avoid
>
> ... this seems unnecessary. If the function is so long that the fact
> elt is a reference can easily get lost, the problem is the length of
> the function, not the use of a reference.

Agreed.

Richard (Sandiford), this really looks like going out of your way to
enforce a personal style issue across the entire code base.  It's sad
that we try to come up with new ways to make it even harder for new
developers to contribute to GCC.

Aldy


Re: [RFC] Update coding conventions to restrict use of non-const references

2018-07-15 Thread Aldy Hernandez
On Thu, Jul 12, 2018 at 8:02 PM Pedro Alves  wrote:
>
> On 07/12/2018 05:17 PM, Richard Sandiford wrote:
> > Pedro Alves  writes:
>
> >> (an
> >>> alternative to pointers is to return a struct with the wide int result
> >>> and the overflow flag),
> >>
> >> +1.  I've been pushing GDB in that direction whenever possible.
> >
> > I agree that can sometimes be better.  I guess it depends on the
> > context.  If a function returns a bool and some other data that has no
> > obvious "failure" value, it can be easier to write chained conditions if
> > the function returns the bool and provides the other data via an output
> > parameter.
>
> I agree it depends on context, though your example sounds like a
> case for std::optional.  (We have gdb::optional in gdb, since
> std::optional is C++17 and gdb requires C++11.  LLVM has a similar
> type, I believe.)
>
> >
> >>> but ...
> >>>
>  +int *elt = &array[i];  // OK
>  +int &elt = array[i];   // Please avoid
> >>>
> >>> ... this seems unnecessary. If the function is so long that the fact
> >>> elt is a reference can easily get lost, the problem is the length of
> >>> the function, not the use of a reference.
> >>>
> >>
> >> +1000.  This seems really unnecessary.  References have the advantage
> >> of implicit never-null semantics, for instance.
> >
> > The nonnullness is there either way in the above example though.
> >
> > I don't feel as strongly about the non-const-reference variables, but for:
> >
> >  int &elt = array[i];
> >  ...
> >  elt = 1;
> >
> > it can be easy to misread that "elt = 1" is changing more than just
> > a local variable.
>
> I think that might be just the case for people who haven't used
> references before, and once you get some exposure, the effect
> goes away.  See examples below.

Agreed.

>
> >
> > I take Jonathan's point that it shouldn't be much of a problem if
> > functions are a reasonable size, but we've not tended to be very
> > good at keeping function sizes down.  I guess part of the problem
> > is that even functions that start off small tend to grow over time.
> >
> > We have been better at enforcing more specific rules like the ones
> > in the patch.  And that's easier to do because a patch either adds
> > references or it doesn't.  It's harder to force (or remember to force)
> > someone to split up a function if they're adding 5 lines to one that is
> > already 20 lines long.  Then for the next person it's 25 lines long, etc.
> >
>
> > Also, the "int *elt" approach copes with cases in which "elt" might
> > have to be redirected later, so we're going to need to use it in some
> > cases.
>
> Sure, if you need to reseat, certainly use a pointer.  That actually
> highlights an advantage of references -- the fact that you are sure
> nothing could reseat it.  With a local pointer, you need to be mindful
> of that.  "Does this loop change the pointer?"  Etc.  If the semantics
> of the function call for having a variable that is not meant to be
> reseated, why not allow expressing it with a reference.
> "Write what you mean."  Of course, just like all code, there's going
> to be a judgment call.
>
> A place where references frequently appear is in C++11 range-for loops.
> Like, random example from gdb's codebase:
>
>  for (const symbol_search &p : symbols)
>
> GCC doesn't yet require C++11, but it should be a matter
> of time, one would hope.
>
> Other places references appear often is in coming up with
> an alias to avoid repeating long expressions, like
> for example (again from gdb):
>
>   auto &vec = objfile->per_bfd->demangled_hash_languages;
>   auto it = std::lower_bound (vec.begin (), vec.end (),
>   MSYMBOL_LANGUAGE (sym));
>   if (it == vec.end () || *it != MSYMBOL_LANGUAGE (sym))
> vec.insert (it, MSYMBOL_LANGUAGE (sym));
>
> I don't think the fact that "vec" is a reference here confuses
> anyone.
>
> That was literally a random sample (grepped for "&[a-z]* = ") so
> I ended up picking one with C++11 "auto", but here's another
> random example, spelling out the type name, similarly using
> a reference as a shorthand alias:
>
>   osdata_item &item = osdata->items.back ();
>
>   item.columns.emplace_back (std::move (data->property_name),
>  std::string (body_text));
>
> > It's then a question of whether "int &elt" is useful enough that
> > it's worth accepting it too, and having a mixture of styles in the codebase.
> I don't see it as a mixture of styles, as I don't see
> pointers and references the exact same thing, but rather
> see references as another tool in the box.

Agreed, agreed, agreed :).

Seriously folks, I really think we should concentrate on cleaning up
GCC, and making it more readable, not nitpicking over style issues.
Look at our wide int implementation for instance-- it adheres
perfectly to our coding guidelines, and yet it's a mess mess to read
(*).

(*) I'm not singling out wide int because Richard worked o