On 05/16/2017 09:14 PM, David Malcolm wrote:
> On Mon, 2017-05-15 at 15:36 +0200, Martin Liška wrote:
>> Hi.
>>
>> I sent this email to David some time ago, but it should be probably
>> answered
>> on gcc mailing list.
>
>> I have idea one to improve gcov tool and I'm interested in more
>> precise locations for gimple
>> statements. For gcov purpose, we dump location in ipa-profile pass,
>> which is an early IPA
>> pass and this data is used by gcov tool to map statements (blocks) to
>> lines of code.
>>
>> I did a small experiment on the place we emit the location data:
>> inform (gimple_location (stmt), "output_location");
>>
>> and it shows for:
>> $ cat m2.c
>> unsigned int
>> UuT (void)
>> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned int
>> ret = 0; if (true_var) /* count(1) */ { if (false_var) /* count(1) */
>> ret = 111; /* count(#####) */ } else ret = 999; /* count(#####) */
>> return ret; }
>>
>> int
>> main (int argc, char **argv)
>> {
>> UuT ();
>> return 0;
>> }
>>
>> $ gcc --coverage m2.c
>> m2.c: In function ‘main’:
>> m2.c:8:3: note: output_location
>> UuT ();
>> ^~~~~~
>> # .MEM_2 = VDEF <.MEM_1(D)>
>> UuT ();
>> m2.c:9:10: note: output_location
>> return 0;
>> ^
>> _3 = 0;
>> m2.c: In function ‘UuT’:
>> m2.c:3:16: note: output_location
>> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
>> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
>> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
>> count(#####) */ return ret; }
>> ^~~~~~~~
>> true_var_3 = 1;
>> m2.c:3:43: note: output_location
>> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
>> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
>> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
>> count(#####) */ return ret; }
>> ^~~~~~~~~
>> false_var_4 = 0;
>> m2.c:3:71: note: output_location
>> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
>> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
>> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
>> count(#####) */ return ret; }
>>
>> ^~~
>> ret_5 = 0;
>> m2.c:3:83: note: output_location
>> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
>> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
>> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
>> count(#####) */ return ret; }
>>
>> ^
>> if (true_var_3 != 0)
>> m2.c:3:114: note: output_location
>> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
>> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
>> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
>> count(#####) */ return ret; }
>>
>> ^
>> if (false_var_4 != 0)
>> m2.c:3:145: note: output_location
>> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
>> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
>> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
>> count(#####) */ return ret; }
>>
>>
>> ~~~~^~~~~
>> ret_7 = 111;
>> m2.c:3:182: note: output_location
>> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
>> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
>> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
>> count(#####) */ return ret; }
>>
>>
>> ~~~~^~~~~
>> ret_6 = 999;
>> m2.c:3:215: note: output_location
>> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
>> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
>> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
>> count(#####) */ return ret; }
>>
>>
>>
>> ^~~
>> _8 = ret_2;
>> m2.c:3:215: note: output_location
>> # VUSE <.MEM_9(D)>
>> return _8;
>>
>> Which is not optimal, for some assignments I see just LHS
>> (false_var_4 = 0),
>
> My first though was: are there assignments for which this isn't the
> case? The only one I see is the:
> ret = 999;
> ~~~~^~~~~
>
> Are the locations for these assignments coming through from the
> frontend?
Hi.
Actually not all, the default assignments are created in gimplifier and
location is assigned from DECL_EXPR:
(gdb) p debug_tree(*expr_p)
<decl_expr 0x7ffff6988c80
type <void_type 0x7ffff6878f18 void VOID
align 8 symtab 0 alias set -1 canonical type 0x7ffff6878f18
pointer_to_this <pointer_type 0x7ffff68800a8>>
side-effects
arg 0 <var_decl 0x7ffff7f9ae10 true_var
type <integer_type 0x7ffff6878690 unsigned int public unsigned SI
size <integer_cst 0x7ffff6860f18 constant 32>
unit size <integer_cst 0x7ffff6860f30 constant 4>
align 32 symtab 0 alias set -1 canonical type 0x7ffff6878690
precision 32 min <integer_cst 0x7ffff6860f48 0> max <integer_cst 0x7ffff6860f00
4294967295>
pointer_to_this <pointer_type 0x7ffff6885dc8>>
used unsigned SI file /tmp/m2.c line 4 col 16 size <integer_cst
0x7ffff6860f18 32> unit size <integer_cst 0x7ffff6860f30 4>
align 32 context <function_decl 0x7ffff697ce00 UuT> initial
<integer_cst 0x7ffff698b258 1>
chain <var_decl 0x7ffff7f9aea0 false_var type <integer_type
0x7ffff6878690 unsigned int>
used unsigned SI file /tmp/m2.c line 4 col 43 size <integer_cst
0x7ffff6860f18 32> unit size <integer_cst 0x7ffff6860f30 4>
align 32 context <function_decl 0x7ffff697ce00 UuT> initial
<integer_cst 0x7ffff6860f48 0> chain <var_decl 0x7ffff7f9af30 ret>>>
/tmp/m2.c:4:16 start: /tmp/m2.c:4:16 finish: /tmp/m2.c:4:23>
That explains why only LHS of these assignments is selected.
>
> I believe that in the C frontend these are assignment-expression, and
> hence handled by c_parser_expr_no_commas; in particular the use of
> op_location and the call:
>
> set_c_expr_source_range (&ret, lhs.get_start (), rhs.get_finish ());
>
> ought to be setting up the caret of the assignment to be on the
> operator token, and for the start/finish to range from the start of the
> lhs to the end of the rhs i.e. what we see for:
>
> ret = 999;
> ~~~~^~~~~
Yep, MODIFY_EXPRs created in FE go this way and it's fine.
>
>
>> for return statements only a returned value is displayed.
>
> Is this running on SSA form? If so, I wonder if you're running into
> something like this:
>
> retval_N = PHI <lots of values>;
> return retval_N;
>
> where it's using the location of that "return retval_N;" for all of the
> return statements in the function, rather than the individual source
> locations.
Yep, but we properly assign each assignment to a SSA name that's going to
be merged in exit BB by PHI node:
_8 = ret_2;
/tmp/m2.c:7:8: note: output_location
return ret; }
^~~
Here the location comes from c_finish_return function where location
comes from a value that's returned.
>
>> For conditions, only condition beginning is showed.
>> Is this known behavior or do I miss
>> something?
>
> c_parser_if_statement has:
>
> loc = c_parser_peek_token (parser)->location;
>
> which is that of the open-paren. Maybe we should build a location
> covering the range of the "if ( expression )" part of the if-statement?
Adding Marek as C FE maintainer to reply the question.
>
> Note that the C++ frontend may do different things than the C frontend
> for any and all of these. When I added range-based locations to them,
> I attempted to preserve the caret locations to avoid affecting stepping
> behavior in the debugger (since the caret location is used when
> stripping away range information); we could revisit some of the
> decisions.
Looks both behave the same.
>
> BTW, in the subject line, you mention a "rich_location", but I think
> you mean a source_location (aka location_t): this is a caret location
> along with range information; a rich_location is one or more of them,
> potentially with fix-it hints. See "Example B" and "Example C" in line
> -map.h. Admittedly the terminology there is a little muddled (sorry).
Yep, I was confused by fact that I use inform function to display location_t
locations,
and it really builds rich_location:
void
inform (location_t location, const char *gmsgid, ...)
{
va_list ap;
va_start (ap, gmsgid);
rich_location richloc (line_table, location);
diagnostic_impl (&richloc, -1, gmsgid, &ap, DK_NOTE);
va_end (ap);
}
Thanks for help,
Martin
>
>
> Dave
>