On Mon, 2017-05-15 at 15:36 +0200, Martin Liška wrote:
> Hi.
>
> I sent this email to David some time ago, but it should be probably
> answered
> on gcc mailing list.
> I have idea one to improve gcov tool and I'm interested in more
> precise locations for gimple
> statements. For gcov purpose, we dump location in ipa-profile pass,
> which is an early IPA
> pass and this data is used by gcov tool to map statements (blocks) to
> lines of code.
>
> I did a small experiment on the place we emit the location data:
> inform (gimple_location (stmt), "output_location");
>
> and it shows for:
> $ cat m2.c
> unsigned int
> UuT (void)
> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned int
> ret = 0; if (true_var) /* count(1) */ { if (false_var) /* count(1) */
> ret = 111; /* count(#####) */ } else ret = 999; /* count(#####) */
> return ret; }
>
> int
> main (int argc, char **argv)
> {
> UuT ();
> return 0;
> }
>
> $ gcc --coverage m2.c
> m2.c: In function ‘main’:
> m2.c:8:3: note: output_location
> UuT ();
> ^~~~~~
> # .MEM_2 = VDEF <.MEM_1(D)>
> UuT ();
> m2.c:9:10: note: output_location
> return 0;
> ^
> _3 = 0;
> m2.c: In function ‘UuT’:
> m2.c:3:16: note: output_location
> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> count(#####) */ return ret; }
> ^~~~~~~~
> true_var_3 = 1;
> m2.c:3:43: note: output_location
> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> count(#####) */ return ret; }
> ^~~~~~~~~
> false_var_4 = 0;
> m2.c:3:71: note: output_location
> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> count(#####) */ return ret; }
>
> ^~~
> ret_5 = 0;
> m2.c:3:83: note: output_location
> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> count(#####) */ return ret; }
>
> ^
> if (true_var_3 != 0)
> m2.c:3:114: note: output_location
> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> count(#####) */ return ret; }
>
> ^
> if (false_var_4 != 0)
> m2.c:3:145: note: output_location
> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> count(#####) */ return ret; }
>
>
> ~~~~^~~~~
> ret_7 = 111;
> m2.c:3:182: note: output_location
> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> count(#####) */ return ret; }
>
>
> ~~~~^~~~~
> ret_6 = 999;
> m2.c:3:215: note: output_location
> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> count(#####) */ return ret; }
>
>
>
> ^~~
> _8 = ret_2;
> m2.c:3:215: note: output_location
> # VUSE <.MEM_9(D)>
> return _8;
>
> Which is not optimal, for some assignments I see just LHS
> (false_var_4 = 0),
My first though was: are there assignments for which this isn't the
case? The only one I see is the:
ret = 999;
~~~~^~~~~
Are the locations for these assignments coming through from the
frontend?
I believe that in the C frontend these are assignment-expression, and
hence handled by c_parser_expr_no_commas; in particular the use of
op_location and the call:
set_c_expr_source_range (&ret, lhs.get_start (), rhs.get_finish ());
ought to be setting up the caret of the assignment to be on the
operator token, and for the start/finish to range from the start of the
lhs to the end of the rhs i.e. what we see for:
ret = 999;
~~~~^~~~~
> for return statements only a returned value is displayed.
Is this running on SSA form? If so, I wonder if you're running into
something like this:
retval_N = PHI <lots of values>;
return retval_N;
where it's using the location of that "return retval_N;" for all of the
return statements in the function, rather than the individual source
locations.
> For conditions, only condition beginning is showed.
> Is this known behavior or do I miss
> something?
c_parser_if_statement has:
loc = c_parser_peek_token (parser)->location;
which is that of the open-paren. Maybe we should build a location
covering the range of the "if ( expression )" part of the if-statement?
Note that the C++ frontend may do different things than the C frontend
for any and all of these. When I added range-based locations to them,
I attempted to preserve the caret locations to avoid affecting stepping
behavior in the debugger (since the caret location is used when
stripping away range information); we could revisit some of the
decisions.
BTW, in the subject line, you mention a "rich_location", but I think
you mean a source_location (aka location_t): this is a caret location
along with range information; a rich_location is one or more of them,
potentially with fix-it hints. See "Example B" and "Example C" in line
-map.h. Admittedly the terminology there is a little muddled (sorry).
Dave