On Fri, 2018-06-29 at 09:09 +0200, Richard Biener wrote:
> On Thu, Jun 28, 2018 at 4:29 PM David Malcolm <[email protected]>
> wrote:
> >
> > On Thu, 2018-06-28 at 13:29 +0200, Richard Biener wrote:
> > > On Tue, Jun 26, 2018 at 3:54 PM David Malcolm <[email protected]
> > > m>
> > > wrote:
> > > >
> > > > On Mon, 2018-06-25 at 15:34 +0200, Richard Biener wrote:
> > > > > On Wed, Jun 20, 2018 at 6:34 PM David Malcolm <dmalcolm@redha
> > > > > t.co
> > > > > m>
> > > > > wrote:
> > > > > >
> > > > > > Here's v3 of the patch (one big patch this time, rather
> > > > > > than a
> > > > > > kit).
> > > > > >
> > > > > > Like the v2 patch kit, this patch reuses the existing dump
> > > > > > API,
> > > > > > rather than inventing its own.
> > > > > >
> > > > > > Specifically, it uses the dump_* functions in dumpfile.h
> > > > > > that
> > > > > > don't
> > > > > > take a FILE *, the ones that implicitly write to dump_file
> > > > > > and/or
> > > > > > alt_dump_file. I needed a name for them, so I've taken to
> > > > > > calling
> > > > > > them the "structured dump API" (better name ideas welcome).
> > > > > >
> > > > > > v3 eliminates v2's optinfo_guard class, instead using
> > > > > > "dump_*_loc"
> > > > > > calls as delimiters when consolidating "dump_*"
> > > > > > calls. There's
> > > > > > a
> > > > > > new dump_context class which has responsibility for
> > > > > > consolidating
> > > > > > them into optimization records.
> > > > > >
> > > > > > The dump_*_loc calls now capture more than just a
> > > > > > location_t:
> > > > > > they
> > > > > > capture the profile_count and the location in GCC's own
> > > > > > sources
> > > > > > where
> > > > > > the dump is being emitted from.
> > > > > >
> > > > > > This works by introducing a new "dump_location_t" class as
> > > > > > the
> > > > > > argument of those dump_*_loc calls. The dump_location_t
> > > > > > can
> > > > > > be constructed from a gimple * or from an rtx_insn *, so
> > > > > > that
> > > > > > rather than writing:
> > > > > >
> > > > > > dump_printf_loc (MSG_NOTE, gimple_location (stmt),
> > > > > > "some message: %i", 42);
> > > > > >
> > > > > > you can write:
> > > > > >
> > > > > > dump_printf_loc (MSG_NOTE, stmt,
> > > > > > "some message: %i", 42);
> > > > > >
> > > > > > and the dump_location_t constructor will grab the
> > > > > > location_t
> > > > > > and
> > > > > > profile_count of stmt, and the location of the
> > > > > > "dump_printf_loc"
> > > > > > callsite (and gracefully handle "stmt" being NULL).
> > > > > >
> > > > > > Earlier versions of the patch captured the location of the
> > > > > > dump_*_loc call via preprocessor hacks, or didn't work
> > > > > > properly;
> > > > > > this version of the patch works more cleanly: internally,
> > > > > > dump_location_t is split into two new classes:
> > > > > > * dump_user_location_t: the location_t and profile_count
> > > > > > within
> > > > > > the *user's code*, and
> > > > > > * dump_impl_location_t: the __builtin_FILE/LINE/FUNCTION
> > > > > > within
> > > > > > the *implementation* code (i.e. GCC or a plugin),
> > > > > > captured
> > > > > > "automagically" via default params
> > > > > >
> > > > > > These classes are sometimes used elsewhere in the
> > > > > > code. For
> > > > > > example, "vect_location" becomes a dump_user_location_t
> > > > > > (location_t and profile_count), so that in e.g:
> > > > > >
> > > > > > vect_location = find_loop_location (loop);
> > > > > >
> > > > > > it's capturing the location_t and profile_count, and then
> > > > > > when
> > > > > > it's used here:
> > > > > >
> > > > > > dump_printf_loc (MSG_NOTE, vect_location, "foo");
> > > > > >
> > > > > > the dump_location_t is constructed from the vect_location
> > > > > > plus the dump_impl_location_t at that callsite.
> > > > > >
> > > > > > In contrast, loop-unroll.c's report_unroll's "locus" param
> > > > > > becomes a dump_location_t: we're interested in where it was
> > > > > > called from, not in the locations of the various dump_*_loc
> > > > > > calls
> > > > > > within it.
> > > > > >
> > > > > > Previous versions of the patch captured a gimple *, and
> > > > > > needed
> > > > > > GTY markers; in this patch, the dump_user_location_t is now
> > > > > > just a
> > > > > > location_t and a profile_count.
> > > > > >
> > > > > > The v2 patch added an overload for dump_printf_loc so that
> > > > > > you
> > > > > > could pass in either a location_t, or the new type; this
> > > > > > version
> > > > > > of the patch eliminates that: they all now take
> > > > > > dump_location_t.
> > > > > >
> > > > > > Doing so required adding support for rtx_insn *, so that
> > > > > > one
> > > > > > can
> > > > > > write this kind of thing in RTL passes:
> > > > > >
> > > > > > dump_printf_loc (MSG_NOTE, insn, "foo");
> > > > > >
> > > > > > One knock-on effect is that get_loop_location now returns a
> > > > > > dump_user_location_t rather than a location_t, so that it
> > > > > > has
> > > > > > hotness information.
> > > > > >
> > > > > > Richi: would you like me to split out this location-
> > > > > > handling
> > > > > > code into a separate patch? (It's kind of redundant
> > > > > > without
> > > > > > adding the remarks and optimization records work, but if
> > > > > > that's
> > > > > > easier I can do it)
> > > > >
> > > > > I think that would be easier because it doesn't require the
> > > > > JSON
> > > > > stuff and so I'll happily approve it.
> > > > >
> > > > > Thus - trying to review that bits (and sorry for the delay).
> > > > >
> > > > > + location_t srcloc = loc.get_location_t ();
> > > > > +
> > > > > if (dump_file && (dump_kind & pflags))
> > > > > {
> > > > > - dump_loc (dump_kind, dump_file, loc);
> > > > > + dump_loc (dump_kind, dump_file, srcloc);
> > > > > print_gimple_stmt (dump_file, gs, spc, dump_flags |
> > > > > extra_dump_flags);
> > > > > }
> > > > >
> > > > > if (alt_dump_file && (dump_kind & alt_flags))
> > > > > {
> > > > > - dump_loc (dump_kind, alt_dump_file, loc);
> > > > > + dump_loc (dump_kind, alt_dump_file, srcloc);
> > > > > print_gimple_stmt (alt_dump_file, gs, spc, dump_flags
> > > > > |
> > > > > extra_dump_flags);
> > > > > }
> > > > > +
> > > > > + if (optinfo_enabled_p ())
> > > > > + {
> > > > > + optinfo &info = begin_next_optinfo (loc);
> > > > > + info.handle_dump_file_kind (dump_kind);
> > > > > + info.add_stmt (gs, extra_dump_flags);
> > > > > + }
> > > > >
> > > > > seeing this in multiple places. I seem to remember that
> > > > > dump_file / alt_dump_file was suposed to handle dumping
> > > > > into two locations - a dump file and optinfo (or
> > > > > stdout). This
> > > > > looks
> > > > > like the optinfo "stream" is even more separate. Could that
> > > > > obsolete the alt_dump_file stream? I'd need to review
> > > > > existing
> > > > > stuff
> > > > > in more detail to answer but maybe you already know from
> > > > > recently
> > > > > digging into this.
> > > >
> > > > Possibly. I attempted this in v1 of the patch, but it was
> > > > mixed in
> > > > with
> > > > a bunch of other stuff. I'll have another go at doing this.
> > > >
> > > > > Oh, and all the if (optinfo_enable_p ()) stuff is for the
> > > > > followup
> > > > > then, right?
> > > >
> > > > Yes.
> > > >
> > > > > I like the boiler-plate changes to dump_* using stuff a lot,
> > > > > so
> > > > > the
> > > > > infrastructure to do that (the location wrapping) and these
> > > > > boiler-
> > > > > plate
> > > > > changes are pre-approved if split out.
> > > >
> > > > Thanks. I split out the location wrapping, and have committed
> > > > it
> > > > to
> > > > trunk (r262149). It's not clear to me exactly what other parts
> > > > you
> > > > liked,
> > > > so I'm going to try to split out more of the non-JSON bits in
> > > > the
> > > > hope that some parts are good enough as-is, and I'll post them
> > > > for
> > > > review
> > > > as followups.
> > > >
> > > > For reference, here's what I've committed (I added some obvious
> > > > changes
> > > > to doc/optinfo.texi).
> > > >
> > > > > I think the *_REMARK stuff should get attention of the
> > > > > respective
> > > > > maintainers - not sure what the difference between NOTE and
> > > > > REMARK
> > > > > is ;)
> > > >
> > > > Me neither :) I think it might come down to "this is purely a
> > > > debugging
> > > > message for a pass maintainer" (MSG_NOTE) vs "this is a high-
> > > > level
> > > > thing
> > > > that an advanced user want to see" (MSG_REMARK???).
> > > >
> > > > One idea that occurred to me: are we overusing
> > > > dump_flags_t? It's
> > > > a
> > > > mixture of TDF_* bitfields (used by our internal dumps) plus
> > > > MSG_*
> > > > bitfields (used with -fopt-info). IIRC the only TDF_ bitfield
> > > > that's
> > > > ever used with the MSG_* bitfields is TDF_DETAILS. Might it
> > > > make
> > > > sense
> > > > to split out the MSG_* bitfields into a different type
> > > > (dump_level_t???)
> > > > to reinforce this split? (and thus the dump_* entrypoints that
> > > > don't take
> > > > a FILE * would take this new type). I'm not sure if this is a
> > > > good
> > > > idea
> > > > or not.
> > >
> > > Making it even more complicated doesn't make it easier to use it
> > > "correctly". So I'd rather try to simplify it. How passes use
> > > TDF_DETAILS vs. non-details is already highly inconsistent. This
> > > is why I liked the original optinfo work because it somehow made
> > > user-interesting vs. developer-interesting with the same API
> > > (and to the same dump-file). Just that MSG_NOTE is exposed to
> > > users while I think it should be developer-only...
> > >
> > > IMHO the TDF_ stuff should control things at the stmt/tree/BB
> > > level,
> > > thus be IL specific flags while the MSG_ stuff should control
> > > pass-specific dumping detail.
> >
> > You mention user-interesting vs developer-interesting, and whether
> > it
> > would be the same API and/or the same dump-file.
> >
> > I've posted a bunch of patches here, some small, some large, trying
> > various different appoaches, coming at the problem from at least
> > two directions, so maybe it's worth discussing the overall
> > direction
> > here (with some ASCII art!)
> >
> > * what is the status quo?
> > * what do we want to achieve?
> > * how do we get there?
> >
> > Status quo:
> > * We have the "dump_*(MSG_*)" API which hides the destination of
> > where we're dumping to (currently dump_file and/or alt_file_file,
> > conditionalized via flags).
> >
> > dump_* (MSG_*) ----> dumpfile.c --+--> dump_file
> > |
> > `--> alt_dump_file
> >
> > * As of r262149 (the dump_location_t commit) dumpfile.c receives
> > the
> > hotness and emission location of each dump_*_loc call, but
> > currently
> > it does nothing with that information.
> > * The messages that we emit through the "dump_*(MSG_*)" API and
> > their
> > "levels" are currently rather haphazard. Some are close to being
> > suitable for end-users, but most seem never intended to be
> > end-user-facing. For reference:
> > grep -nH -e MSG_MISSED_OPTIMIZATION *.c *.cc | wc -l
> > 452
> > grep -nH -e MSG_NOTE *.c *.cc | wc -l
> > 551
> > grep -nH -e MSG_OPTIMIZED_LOCATIONS *.c *.cc | wc -l
> > 39
> > (though some of these are support code e.g. in dumpfile.c, rather
> > than uses).
>
> Yep. I believe MSG_NOTE was the fallout of optinfo introduction and
> my
> request to share the same API with dumping to the dumpfile. MSG_NOTE
> just received "anything else" ...
>
> > * The API builds up messages programatically, which is hostile to
> > i18n. The API isn't marked as needing i18n for its strings
> > (which
> > IIRC is via using "gmsgid" as a param name being special-cased in
> > the gettext toolchain).
> >
> > What do we want to achieve?
> > * I want end-users to be able to enable high-level dump messages
> > about
> > optimizations, and for those messages to be decipherable without
> > reading GCC sources e.g. the opt_problem idea posted here:
> > * "[PATCH] [RFC] Higher-level reporting of vectorization
> > problems"
> > * https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01462.html
> > Presumably such messages would need to be
> > internationalized. Many
> > other messages are really internal only, and would be a burden to
> > impose on our translators.
> > * I want to support additional destinations for any/all dump
> > messages
> > beyond just the dump file and -fopt-info:
> > * putting them through the diagnostics subsystem (as "remarks")
> > * storing them to disk in a machine-readable format, so that
> > 3rd-party tools can present useful visualizations of how the
> > user's code is being optimized, e.g. prioritized by hotness
> > * I want to preserve the existing API and most existing uses of it
> > (to minimize churn and thus keep our lives easier)
> >
> > The patches I've been posting so far add additional dump
> > destinations,
> > like this:
> >
> > dump_* (MSG_*) >---> dumpfile.c --+--> dump_file
> > |
> > +--> alt_dump_file
> > |
> > +--> diagnostic "remarks"
> > |
> > +--> "optimization records"
> > (saved to disk)
> >
> > I believe you favor an approach of "MSG_NOTE is internal, whereas
> > MSG_MISSED_OPTIMIZATION and MSG_OPTIMIZED_LOCATIONS are for end
> > users".
>
> Yeah, kind of. You introduced another MSG_ variant
"MSG_REMARK" above, though I'm not sure I like that name.
> and in the end we might
> need to go that way. I guess renaming MSG_NOTE to MSG_DEBUG would
> be a step in the direction to reflect reality ;)
I think I want to look more at the "opt_problem" idea about high-level
vectorization messages, and attack the problem from that direction;
maybe that will suggest a good way forward for our existing dump messages.
> > I think the "should this dump message be internationalized?" issue
> > introduces a natural separation between dump messages aimed at
> > end-users vs purely "internal" dumps; this suggests to me that
> > we need a new API for such dump messages, designed to support i18n,
> > and marked as such for gettext, so I favor something like this:
> >
> > SOURCES DESTINATIONS
> > dump_* (MSG_*) >---> dumpfile.c --+--> dump_file
> > | |
> > new dump API -->-+ +--> alt_dump_file
> > |
> > +--> diagnostic "remarks"
> > |
> > +--> "optimization records"
> > (saved to disk)
> >
> > (I'm deliberately being vague about what this i18n-enabled dump API
> > might look like)
>
> I know I threw i18n into the discussion - but I'd say we should focus
> on other details first. I think there's no need to _require_ the
> "debug"
> dumpfile stuff not be translated, but to make it easier for
> translators
> I suppose being able to "amend" the API calls with a "do not
> translate"
> variant would be OK.
I'm thinking that only a small fraction of the dump messages will be
user-facing, so the default would be "do not translate" with the rare
special-case being "do translate".
> So in what way is the dump_* API not suitable for translation
> that the diagnostic machinery (warning/error) does not suffer from?
Consider this example, from "vect_create_data_ref_ptr" (I used this
example way back in the v1 patch kit):
if (dump_enabled_p ())
{
tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
dump_printf_loc (MSG_NOTE, vect_location,
"create %s-pointer variable to type: ",
get_tree_code_name (TREE_CODE (aggr_type)));
dump_generic_expr (MSG_NOTE, TDF_SLIM, aggr_type);
if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
dump_printf (MSG_NOTE, " vectorizing an array ref: ");
else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
dump_printf (MSG_NOTE, " vectorizing a vector ref: ");
else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
dump_printf (MSG_NOTE, " vectorizing a record based array ref: ");
else
dump_printf (MSG_NOTE, " vectorizing a pointer ref: ");
dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_BASE_OBJECT (dr));
dump_printf (MSG_NOTE, "\n");
}
where there are two problems with translation:
(a) the message is built up piecewise, with conditional logic.
(b) there's nothing yet marking the string fragments as needing
translation
> That is, if we internally make dump_* go through the pretty-printers
> we can handle stuff like quoting or even stmts/expressions. I think
> that would be useful for dumping to dumpfiles as well.
That could work. Currently dump_printf* use fprintf internally, but I
don't think we make much use of the codes there. So we could make it
use a pretty_printer internally. The core pretty-print.c code doesn't
handle any of the interesting middle-end types we'd want to talk about
(gimple, tree, symtab_node *, etc), so I think to have a custom
middle-end dump pp_format_decoder callback. The above example could
then look something like:
if (dump_enabled_p ())
{
tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr));
const char *msgid;
if (TREE_CODE (dr_base_type) == ARRAY_TYPE)
msgid = ("create %s-pointer variable to type: %T"
" vectorizing an array ref: %T");
else if (TREE_CODE (dr_base_type) == VECTOR_TYPE)
msgid = ("create %s-pointer variable to type: %T"
" vectorizing a vector ref: %T");
else if (TREE_CODE (dr_base_type) == RECORD_TYPE)
msgid = ("create %s-pointer variable to type: %T"
" vectorizing a record based array ref: %T");
else
msgid = ("create %s-pointer variable to type: %T"
" vectorizing a pointer ref: %T");
dump_printf_loc (MSG_NOTE, vect_location, msgid,
get_tree_code_name (TREE_CODE (aggr_type)));
aggr_type, DR_BASE_OBJECT (dr));
dump_printf (MSG_NOTE, "\n");
}
where I'm using "%T" as the format code for tree with TDF_SLIM. The
optinfo could could then "print" these more interesting types by
capturing the metadata (and thus expose it in optimization records).
Though, that said, I don't think that message as-is is something we'd
want end-users to see, and thus not something for translators.
Maybe something like:
extern void dump_userfacing_printf_loc (dump_flags_t,
const dump_location_t &,
const char *gmsgid, ...)
ATTRIBUTE_DUMP_PRINTF_3;
for the small subset of messages that we'd want to be translated?
(with ATTRIBUTE_DUMP_PRINTF_3 to signify that we have a new family
of format codes).
(I'm not in love with that function name)
> >
> > That said, I have a version of the patch kit which does just this:
> >
> > dump_* (MSG_*) ----> dumpfile.c --+--> dump_file
> > |
> > +--> alt_dump_file
> > |
> > +--> diagnostic "remarks"
> >
> > i.e. generalizing the dump destinations (by adding optinfo
> > internally),
> > but without requiring the JSON support or touching the dump API .
> > Would
> > that be suitable as a next step?
>
> Yes. Thanks for going step-by-step btw, this is really useful in
> simplifying
> the picture.
>
> Richard.
Thanks.
Here's a reduced version of the patch kit which builds on what's
already in trunk.
Patch 1 adds the basic optinfo machinery, consolidating the dump_*
calls into optinfo objects that can be sent to multiple dump
destinations - but it doesn't add any destinations. It captures
"rich" information about what is dumped, tagged with things like
code hotness information, enough to eventually implement optimization
records in a followup.
dump_* (MSG_*) --> dumpfile.c --+--> (a) dump_file
|
+--> (b) alt_dump_file
|
`--> (c) optinfo
Patch 2 adds in the first destination for them: diagnostic
"remarks" (updated to work on top of patch 1). It's not perfect
yet (we probably need to be able to filter them using
optinfo_group_t and probably other criteria), and I'm posting it now
to provide more motivation for patch 1, but it's stable enough for
trunk (though a couple of FIXMEs remain).
dump_* (MSG_*) --> dumpfile.c --+--> (a) dump_file
|
+--> (b) alt_dump_file
|
`--> (c) optinfo
`---> optinfo destinations
(c.1) diagnostic "remarks"
I have followups to add JSON optimizations records as another kind
of optinfo destination, so that the picture would become:
dump_* (MSG_*) --> dumpfile.c --+--> (a) dump_file
|
+--> (b) alt_dump_file
|
`--> (c) optinfo
`---> optinfo destinations
(c.1) diagnostic "remarks"
(c.2) optimization records
(the patches add this ASCII art to dumpfile.h, btw)
As mentioned before, the optinfo consolidation works by making the
dump_*_loc calls implicitly start a new optinfo, terminating and
emitting any pending one. This avoids having to touch all the dump
code to explicitly terminate messages, but means that optinfo
instances can be pending for a while, until the next dump_*_loc
happens (or the end of the pass, or a GC; see the patch). Sadly,
this delay seems to make them more suitable for user-facing messages
than for debugging GCC itself. Because of this, I didn't think it
was appropriate to port the alt_dump_file code to using it (better to
have that be more direct). But I can do that if you want.
(Alternatively, maybe a dump_printf* call that ends with a newline
should terminate the optinfo? Doing that would make most of them
terminate themselves)
Dave
David Malcolm (2):
Add "optinfo" framework
optinfo: add diagnostic remarks
gcc/Makefile.in | 2 +
gcc/common.opt | 17 +
gcc/coretypes.h | 7 +
gcc/diagnostic-color.c | 2 +
gcc/diagnostic-core.h | 2 +
gcc/diagnostic.c | 17 +
gcc/diagnostic.def | 1 +
gcc/doc/invoke.texi | 67 ++++
gcc/dump-context.h | 128 +++++++
gcc/dumpfile.c | 497 +++++++++++++++++++++++++--
gcc/dumpfile.h | 84 +++--
gcc/fortran/gfc-diagnostic.def | 1 +
gcc/ggc-page.c | 2 +
gcc/opt-functions.awk | 1 +
gcc/optinfo-emit-diagnostics.cc | 317 +++++++++++++++++
gcc/optinfo-emit-diagnostics.h | 26 ++
gcc/optinfo-internal.h | 148 ++++++++
gcc/optinfo.cc | 252 ++++++++++++++
gcc/optinfo.h | 173 ++++++++++
gcc/opts.c | 4 +
gcc/opts.h | 13 +-
gcc/profile-count.c | 28 ++
gcc/profile-count.h | 5 +
gcc/selftest-run-tests.c | 2 +
gcc/selftest.h | 2 +
gcc/testsuite/gcc.dg/plugin/plugin.exp | 2 +
gcc/testsuite/gcc.dg/plugin/remarks-1.c | 30 ++
gcc/testsuite/gcc.dg/plugin/remarks_plugin.c | 152 ++++++++
gcc/testsuite/lib/gcc-dg.exp | 9 +
29 files changed, 1924 insertions(+), 67 deletions(-)
create mode 100644 gcc/dump-context.h
create mode 100644 gcc/optinfo-emit-diagnostics.cc
create mode 100644 gcc/optinfo-emit-diagnostics.h
create mode 100644 gcc/optinfo-internal.h
create mode 100644 gcc/optinfo.cc
create mode 100644 gcc/optinfo.h
create mode 100644 gcc/testsuite/gcc.dg/plugin/remarks-1.c
create mode 100644 gcc/testsuite/gcc.dg/plugin/remarks_plugin.c
--
1.8.5.3