gcc@gcc.gnu.org

2008-08-15 Thread Manuel López-Ibáñez
2008/8/14 Daniel Berlin <[EMAIL PROTECTED]>:
> 1. You can't assume VUSE's are must-aliases.  The fact that there is a
> vuse for something does not imply it is must-used, it implies it is
> may-used.
>
> We do not differentiate may-use from must-use in our alias system. You
> can do some trivial must-use analysis if you like (by computing
> cardinality of points-to set as either single or multiple and
> propagating/meeting it in the right place).
>
> Must-use is actually quite rare.

Then, is it impossible to distinguish the following testcase and the
one from my previous mail with the current infrastructure?

extern void foo (int *);
extern void bar (int);

void
baz (void)
{
  int i;
  if (i) /* { dg-warning "is used uninitialized" "uninit i warning" } */
bar (i);
  foo (&i);
}

> 2. " if (!gimple_references_memory_p (def))
> +   return;
> +"
> Is nonsensical the SSA_NAME_DEF_STMT of a vuse must contain a vdef,
> and thus must access memory.

Two things here.

1) The case I am trying to war about is:

 # BLOCK 2 freq:1
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  [/home/manuel/src/trunk/gcc/testsuite/gcc.dg/uninit-B.c : 12] # VUSE
 { iD.1951 }
  i.0D.1952_1 = iD.1951;
  [/home/manuel/src/trunk/gcc/testsuite/gcc.dg/uninit-B.c : 12] if
(i.0D.1952_1 != 0)

The def_stmt of i.0 is precisely that one. There is no vdef there.

2) I use that test to return early if the def_stmt of "t" does not
reference memory. t is just a SSA_NAME (like i.0 above), I do not know
whether its def_stmt has a VUSE like the above or not. I guess the
test is redundant since SINGLE_SSA_USE_OPERAND will return NULL
anyway. Is that what yo mean?

Cheers,

Manuel.


Better GCC diagnostics

2008-08-15 Thread Manuel López-Ibáñez
Let's try to focus on what needs to be done looking for specific
features (or fixes) and how we could do it:

A) Printing the input expression instead of re-constructing it. As
   Joseph explained, this will fix the problems that Aldy mentioned
   (PR3544[123] and PR35742) and this requires:

  1) For non-preprocessed expr we need at least two locations per expr
 (beg/end). This will require changes on the build_* functions to
 handle multiple locations.

  1b) For each preprocessed token, we would need to keep two locations:
  one for the preprocessed location and another for the original
  location. As Joseph pointed out, ideally we should be able to
  find a way to track this with a single location_t object so we do
  not need 4 locations per expr.

  2) Changes in the parser to pass down the correct locations to the
 build_* functions.

  3) A location(s) -> source strings interface and machinery. Ideally,
 this should be more or less independent of CPP, so CPP (through
 the diagnostics machinery) calls into this when needed and not
 the other way around. This can be implemented in several ways:

 a) Keeping the CPP buffers in memory and having in line-maps
pointers directly into the buffers contents. This is easy and
fast but potentially memory consuming. Care to handle
charsets, tabs, etc must be taken into account. Factoring out
anything useful from libcpp would help to implement this.

 b) Re-open the file and fseek. This is not trivial since we need
to do it fast but still do all character conversions that we
did when libcpp opened it the first time. This is
approximately what Clang (LLVM) does and it seems they can do
it very fast by keeping a cache of buffers ever reopened. I
think that thanks to our line-maps implementation, we can do
the seeking quite more efficiently in terms of computation
time.  However, opening files is quite embedded into CPP, so
that would need to be factored out so we can avoid any
unnecessary CPP stuff when reopening but still do it
*properly* and *efficiently*.

  4) Changes in the diagnostics machinery to extract locations from
 expr and print a string from a
 source file instead of re-constructing things.

  5) Handle locations during folding or avoid aggressive folding in
 the front-ends.

  6) Handle locations during optimisation or update middle-end
 diagnostics to not rely in perfect location information. This
 probably means not using %qE, not column info, and similar
 limitations. Some trade-off must be investigated.


B) Printing accurate column information. This requires:

   *) Preprocessed/original locations in a single location_t. Similar
  as (A.1b) above.

   *) Changes in the parser to pass down the correct
  locations to diagnostics machinery. Similar to (A.2) above.

   B.1) Changes in the testsuite to enable testing column numbers.


C) Consistent diagnostics. This requires:

   C.1) Make CPP use the diagnostics machinery. This will fix part of
PR7263 and other similar bugs where there is a mismatch
between the diagnostics machinery and CPP's own diagnostics
machinery.

   *) Preprocessed/original locations in a single location_t.  This
  will avoid different behaviour when a token comes from a macro
  expansion. Similar as (A.1b) above.


D) Printing Ranges. This requires:

   *) Printing accurate column information. Similar to (B) above.

   *) A location(s) -> source strings interface and machinery. Similar
  to (A.3) above.

   *) Changes in the parser to pass down ranges. Similar to (A.2) above.

   D.1) Changes in the testsuite to enable testing ranges.

   D.2) Changes in the diagnostics machinery to handle ranges.


E) Caret diagnostics. This requires:

   *) Printing accurate column information. Similar to (B) above.

   *) A location(s) -> source strings interface and machinery. Similar
  to (A.3) above.

   E.1) Changes in the diagnostics machinery to print the source line
and a caret.

I have copied this in the wiki so anyone can update it or add
comments: http://gcc.gnu.org/wiki/Better_Diagnostics

I have some patches to make the diagnostic functions take explicit
locations and I hope to send them soon. Apart from those, I personally
don't have any specific plans to address any of the points above in
the near future because of lack of free time and I still have a long
queue of some trivial patches that I would like to get rid of before
we enter in regression-only mode.

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-15 Thread Ian Lance Taylor
"Manuel López-Ibáñez" <[EMAIL PROTECTED]> writes:

> Let's try to focus on what needs to be done looking for specific
> features (or fixes) and how we could do it:

Thanks for writing this up.

> A) Printing the input expression instead of re-constructing it. As
>Joseph explained, this will fix the problems that Aldy mentioned
>(PR3544[123] and PR35742) and this requires:
>
>   1) For non-preprocessed expr we need at least two locations per expr
>  (beg/end). This will require changes on the build_* functions to
>  handle multiple locations.

This is probably obvious, but can you outline why we need two
locations for each expression?  The tools with which I am familiar
only print a single caret.  What would use the two locations for?


>  b) Re-open the file and fseek. This is not trivial since we need
> to do it fast but still do all character conversions that we
> did when libcpp opened it the first time. This is
> approximately what Clang (LLVM) does and it seems they can do
> it very fast by keeping a cache of buffers ever reopened. I
> think that thanks to our line-maps implementation, we can do
> the seeking quite more efficiently in terms of computation
> time.  However, opening files is quite embedded into CPP, so
> that would need to be factored out so we can avoid any
> unnecessary CPP stuff when reopening but still do it
> *properly* and *efficiently*.

If we are going to reopen the file, then why do we need to record the
locations in the preprocessed token stream?

If we keep, for each source line, the file offset in the file of the
start of that source line, then I think that printing the line from
the source file would be pretty fast.  That would not be free but it
would be much cheaper than keeping the entire input file.  Various
optimizations are possible--e.g., keep the file offset for every 16th
line. Conversely, perhaps we could record the line number and the
charset conversion state at each 4096th byte in the file; that would
let us quickly find the page in the file which contains the line.

Ian


gcc@gcc.gnu.org

2008-08-15 Thread Daniel Berlin
On Fri, Aug 15, 2008 at 8:06 AM, Manuel López-Ibáñez
<[EMAIL PROTECTED]> wrote:
> 2008/8/14 Daniel Berlin <[EMAIL PROTECTED]>:
>> 1. You can't assume VUSE's are must-aliases.  The fact that there is a
>> vuse for something does not imply it is must-used, it implies it is
>> may-used.
>>
>> We do not differentiate may-use from must-use in our alias system. You
>> can do some trivial must-use analysis if you like (by computing
>> cardinality of points-to set as either single or multiple and
>> propagating/meeting it in the right place).
>>
>> Must-use is actually quite rare.
>
> Then, is it impossible to distinguish the following testcase and the
> one from my previous mail with the current infrastructure?

If by current you mean "code that already exists", then yes :)
You could write code to do further analysis, but with the existing
code, it will not work.

>
>> 2. " if (!gimple_references_memory_p (def))
>> +   return;
>> +"
>> Is nonsensical the SSA_NAME_DEF_STMT of a vuse must contain a vdef,
>> and thus must access memory.
>
> Two things here.
>
> 1) The case I am trying to war about is:
>
>  # BLOCK 2 freq:1
>  # PRED: ENTRY [100.0%]  (fallthru,exec)
>  [/home/manuel/src/trunk/gcc/testsuite/gcc.dg/uninit-B.c : 12] # VUSE
>  { iD.1951 }
>  i.0D.1952_1 = iD.1951;
>  [/home/manuel/src/trunk/gcc/testsuite/gcc.dg/uninit-B.c : 12] if
> (i.0D.1952_1 != 0)
>
> The def_stmt of i.0 is precisely that one. There is no vdef there.

Sure but this is a default def, which are special, and do nothing anyway.


>
> 2) I use that test to return early if the def_stmt of "t" does not
> reference memory. t is just a SSA_NAME (like i.0 above), I do not know
> whether its def_stmt has a VUSE like the above or not. I guess the
> test is redundant since SINGLE_SSA_USE_OPERAND will return NULL
> anyway. Is that what yo mean?
>
No, i mean any SSA_NAME_DEF_STMT for a vuse that is not a default_def
will reference memory.


gcc@gcc.gnu.org

2008-08-15 Thread Daniel Berlin
On Fri, Aug 15, 2008 at 10:58 AM, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> On Fri, Aug 15, 2008 at 8:06 AM, Manuel López-Ibáñez
> <[EMAIL PROTECTED]> wrote:
>> 2008/8/14 Daniel Berlin <[EMAIL PROTECTED]>:
>>> 1. You can't assume VUSE's are must-aliases.  The fact that there is a
>>> vuse for something does not imply it is must-used, it implies it is
>>> may-used.
>>>
>>> We do not differentiate may-use from must-use in our alias system. You
>>> can do some trivial must-use analysis if you like (by computing
>>> cardinality of points-to set as either single or multiple and
>>> propagating/meeting it in the right place).
>>>
>>> Must-use is actually quite rare.
>>
>> Then, is it impossible to distinguish the following testcase and the
>> one from my previous mail with the current infrastructure?
>
> If by current you mean "code that already exists", then yes :)
> You could write code to do further analysis, but with the existing
> code, it will not work.

FWIW, it is actually worse than the cases you have posited so far.

Consider the following simple case (which are different from yours in
that the conditionals are not dependent on maybe uninitialized
variables), where you will miss an obvious warning.

extern int foo(int *);
extern int bar(int);
int main(int argc, char **argv)
{
  int a;

  if (argc)
 foo (&a)
/* VUSE of a will be a phi node, but it still may be used uninitialized.  */
  bar(a);
}


Realistically, to get good results, you would have to track may-use vs
must-use and also propagate where the default def is being used when
the default_def is not from a parameter.

(noticing that the a is a must-use there and comes from a phi node
whose arguments contain the default def would prove it is
uninitialized along some path)
--Dan


Re: Better GCC diagnostics

2008-08-15 Thread Manuel López-Ibáñez
2008/8/15 Ian Lance Taylor <[EMAIL PROTECTED]>:
> "Manuel López-Ibáñez" <[EMAIL PROTECTED]> writes:
>
>> A) Printing the input expression instead of re-constructing it. As
>>Joseph explained, this will fix the problems that Aldy mentioned
>>(PR3544[123] and PR35742) and this requires:
>>
>>   1) For non-preprocessed expr we need at least two locations per expr
>>  (beg/end). This will require changes on the build_* functions to
>>  handle multiple locations.
>
> This is probably obvious, but can you outline why we need two
> locations for each expression?  The tools with which I am familiar
> only print a single caret.  What would use the two locations for?

This has nothing to do with caret diagnostics. This is an orthogonal
issue that would share some infrastructure as Joseph explained. If you
do

warning("called object %qE is not a function", expr);

for

({break;})();

we currently try to re-construct expr and that fails in some cases
(see the PRs referenced).

#'goto_expr' not supported by pp_c_expression#'bug.c: In function 'foo':
bug.c:4: error: called object  is not a function

The alternative is to print whatever we parsed when building expr. To
do that we would need to have begin/end locations for expr, and then
do a location_t->const char * translation and print whatever is
between those two pointers:

bug.c:4: error: called object '({break;})' is not a function


Is it clear now? If so, I will update the wiki to put this example.

>>  b) Re-open the file and fseek. This is not trivial since we need
>> to do it fast but still do all character conversions that we
>> did when libcpp opened it the first time. This is
>> approximately what Clang (LLVM) does and it seems they can do
>> it very fast by keeping a cache of buffers ever reopened. I
>> think that thanks to our line-maps implementation, we can do
>> the seeking quite more efficiently in terms of computation
>> time.  However, opening files is quite embedded into CPP, so
>> that would need to be factored out so we can avoid any
>> unnecessary CPP stuff when reopening but still do it
>> *properly* and *efficiently*.
>
> If we are going to reopen the file, then why do we need to record the
> locations in the preprocessed token stream?

Because for some diagnostics we want to give the warnings in the
instantiation point not in the macro definition point. Moreover, this
is what we currently do, so if we don't want to change the current
behaviour, we need to track both locations.

Example

/*header.h*/
#pragma GCC system_header
#define BIG  0x1b27da572ef3cd86ULL

/* file.c */
#include "pr7263.h"
__extension__ unsigned long long
bar ()
{
  return BIG;
}

We print a diagnostic at file.c for the expansion of BIG. However,
since we do not have the original location we cannot check that the
token comes from a system header, and we do not suppress the warning.
There are more subtle bugs that arise from not having the original
location available. See PR36478.

BTW, Clang takes into account both locations when printing diagnostics.

> If we keep, for each source line, the file offset in the file of the
> start of that source line, then I think that printing the line from
> the source file would be pretty fast.  That would not be free but it
> would be much cheaper than keeping the entire input file.  Various

Cheaper in terms of memory. It cannot be cheaper in terms of
compilation time than a direct pointer to the already opened buffer
for each line-map.

> optimizations are possible--e.g., keep the file offset for every 16th
> line. Conversely, perhaps we could record the line number and the
> charset conversion state at each 4096th byte in the file; that would
> let us quickly find the page in the file which contains the line.

I am not sure how you plan such approach to interact with
mapped-locations. I think that having an offset for each line-map and
then seeking until you find the correct position would be fine for an
initial implementation. More complex setups could be tested against
this basic implementation. And any optimization done here could be
done with the buffer already opened, so yes, cheaper in terms of
memory maybe but not cheaper in terms of compilation time. If this is
abstracted enough both approaches could perhaps coexist and share the
optimizations: while the front-end is working (where most of the
diagnostics come from) keep the buffers around, when going into
middle-end free them and if we need to give a diagnostic from the
middle-end, reopen and seek. But all this relies on someone factoring
file opening and charset conversion out of CCP first. Once that is
done, we could do whatever strategy or both or something else.

Cheers,

Manuel.


gcc@gcc.gnu.org

2008-08-15 Thread Manuel López-Ibáñez
2008/8/15 Daniel Berlin <[EMAIL PROTECTED]>:
> On Fri, Aug 15, 2008 at 10:58 AM, Daniel Berlin <[EMAIL PROTECTED]> wrote:
>> On Fri, Aug 15, 2008 at 8:06 AM, Manuel López-Ibáñez
>> <[EMAIL PROTECTED]> wrote:
>>> 2008/8/14 Daniel Berlin <[EMAIL PROTECTED]>:
 1. You can't assume VUSE's are must-aliases.  The fact that there is a
 vuse for something does not imply it is must-used, it implies it is
 may-used.

 We do not differentiate may-use from must-use in our alias system. You
 can do some trivial must-use analysis if you like (by computing
 cardinality of points-to set as either single or multiple and
 propagating/meeting it in the right place).

 Must-use is actually quite rare.
>>>
>>> Then, is it impossible to distinguish the following testcase and the
>>> one from my previous mail with the current infrastructure?
>>
>> If by current you mean "code that already exists", then yes :)
>> You could write code to do further analysis, but with the existing
>> code, it will not work.
>
> FWIW, it is actually worse than the cases you have posited so far.
>
> Consider the following simple case (which are different from yours in
> that the conditionals are not dependent on maybe uninitialized
> variables), where you will miss an obvious warning.
>
> extern int foo(int *);
> extern int bar(int);
> int main(int argc, char **argv)
> {
>  int a;
>
>  if (argc)
> foo (&a)
> /* VUSE of a will be a phi node, but it still may be used uninitialized.  */
>  bar(a);
> }
>
>
> Realistically, to get good results, you would have to track may-use vs
> must-use and also propagate where the default def is being used when
> the default_def is not from a parameter.
>

The problem in the original testcase is that the default def of
variable 'c' is a VUSE in a statement that does not even use c.

 # BLOCK 2 freq:1
 # PRED: ENTRY [100.0%]  (fallthru,exec)
 [/home/manuel/src/trunk/gcc/builtins.c : 11095] # VUSE
 { cD.68618 }
 D.68627_3 = validate_argD.45737 (s1D.68612_2(D), 10);

Moreover, if you check fold_builtin_strchr in builtins.c, it is clear
that there is no path along which c is used uninitialized.

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-15 Thread Chris Lattner

D) Printing Ranges. This requires:

  *) Printing accurate column information. Similar to (B) above.

  *) A location(s) -> source strings interface and machinery. Similar
 to (A.3) above.


Ranges also require some way to get the end of a token (in addition to  
its beginning).  For example, a range for:


X + some_long\
_ident??/
ifier

The range should start at "X" and end at "r".  This is just a location  
like any other, but requires passing down like the begin loc.  You  
might instead decide to do some fuzzy matching or something, but clang  
at least gets this right.  This is important for other clients of the  
loc info, e.g. refactoring clients.


-Chris


Re: Better GCC diagnostics

2008-08-15 Thread Manuel López-Ibáñez
2008/8/15 Chris Lattner <[EMAIL PROTECTED]>:
>> D) Printing Ranges. This requires:
>>
>>  *) Printing accurate column information. Similar to (B) above.
>>
>>  *) A location(s) -> source strings interface and machinery. Similar
>> to (A.3) above.
>
> Ranges also require some way to get the end of a token (in addition to its
> beginning).  For example, a range for:
>
> X + some_long\
> _ident??/
> ifier
>
> The range should start at "X" and end at "r".  This is just a location like
> any other, but requires passing down like the begin loc.  You might instead
> decide to do some fuzzy matching or something, but clang at least gets this
> right.  This is important for other clients of the loc info, e.g.
> refactoring clients.

Oh yes. Well, there is a lot of fine-tunning to do but I think that
would be covered by A.1 and the binary_op expression would have at
least two locations begin/end pointing to X and r. If we are able to
print ({break;}), in the example I gave earlier, then we will be able
to print nice ranges most of the time.

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-15 Thread Joseph S. Myers
On Fri, 15 Aug 2008, Manuel López-Ibáñez wrote:

>   5) Handle locations during folding or avoid aggressive folding in
>  the front-ends.

I plan to delay folding for C (beyond some folding for expressions of 
constants) for 4.5, probably in October.  (I do not plan to do anything 
for C++, and the folding will still happen before gimplification, though I 
believe that in principle most of what fold-const does would be better 
done on language-independent IR after gimplification.)

Note that for correct ranges you need to handle locations for 0 and (0) 
and ((0)) and so on, so either need duplicate trees for constants and 
parenthesised expressions or need to keep the locations outside the trees.  
It's possible you can keep them in the c_expr structures or other local 
variables, depending on whether you ever need locations for both an 
expression and deeply nested subexpressions of it.

-- 
Joseph S. Myers
[EMAIL PROTECTED]

gcc@gcc.gnu.org

2008-08-15 Thread Daniel Berlin
On Fri, Aug 15, 2008 at 11:31 AM, Manuel López-Ibáñez
<[EMAIL PROTECTED]> wrote:
> 2008/8/15 Daniel Berlin <[EMAIL PROTECTED]>:
>> On Fri, Aug 15, 2008 at 10:58 AM, Daniel Berlin <[EMAIL PROTECTED]> wrote:
>>> On Fri, Aug 15, 2008 at 8:06 AM, Manuel López-Ibáñez
>>> <[EMAIL PROTECTED]> wrote:
 2008/8/14 Daniel Berlin <[EMAIL PROTECTED]>:
> 1. You can't assume VUSE's are must-aliases.  The fact that there is a
> vuse for something does not imply it is must-used, it implies it is
> may-used.
>
> We do not differentiate may-use from must-use in our alias system. You
> can do some trivial must-use analysis if you like (by computing
> cardinality of points-to set as either single or multiple and
> propagating/meeting it in the right place).
>
> Must-use is actually quite rare.

 Then, is it impossible to distinguish the following testcase and the
 one from my previous mail with the current infrastructure?
>>>
>>> If by current you mean "code that already exists", then yes :)
>>> You could write code to do further analysis, but with the existing
>>> code, it will not work.
>>
>> FWIW, it is actually worse than the cases you have posited so far.
>>
>> Consider the following simple case (which are different from yours in
>> that the conditionals are not dependent on maybe uninitialized
>> variables), where you will miss an obvious warning.
>>
>> extern int foo(int *);
>> extern int bar(int);
>> int main(int argc, char **argv)
>> {
>>  int a;
>>
>>  if (argc)
>> foo (&a)
>> /* VUSE of a will be a phi node, but it still may be used uninitialized.  */
>>  bar(a);
>> }
>>
>>
>> Realistically, to get good results, you would have to track may-use vs
>> must-use and also propagate where the default def is being used when
>> the default_def is not from a parameter.
>>
>
> The problem in the original testcase is that the default def of
> variable 'c' is a VUSE in a statement that does not even use c.
>
It may-uses c, as we've been through.

>  # BLOCK 2 freq:1
>  # PRED: ENTRY [100.0%]  (fallthru,exec)
>  [/home/manuel/src/trunk/gcc/builtins.c : 11095] # VUSE
>  { cD.68618 }
>  D.68627_3 = validate_argD.45737 (s1D.68612_2(D), 10);
>
> Moreover, if you check fold_builtin_strchr in builtins.c, it is clear
> that there is no path along which c is used uninitialized.

This is not a default def.

cD.68618_34(D) is the default def.
if you look at default_def (c) it will be a NOP_EXPR statement.


prevent optimisation of variables on SSA

2008-08-15 Thread Martin Schindewolf

Hi all,

I currently introduce a temporary variable on SSA form but it
does not survive the SSA optimisation passes. (Not even the
simple ones triggered with -O1). Since the temporary variable
is considered to be a gimple register it is not possible to
set a VUSE or VDEF for it. (I tried and it breaks the SSA
properties). Are there any mechanisms to work around this
restriction and set the VDEF/VUSE anyway or is there a way to
tell the SSA optimisers not to touch this variable (or
SSA_NAME)?
Thank you very much for sharing your thoughts!

Regards,
Martin


Re: prevent optimisation of variables on SSA

2008-08-15 Thread Richard Guenther
On Fri, Aug 15, 2008 at 7:00 PM, Martin Schindewolf <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I currently introduce a temporary variable on SSA form but it
> does not survive the SSA optimisation passes. (Not even the
> simple ones triggered with -O1). Since the temporary variable
> is considered to be a gimple register it is not possible to
> set a VUSE or VDEF for it. (I tried and it breaks the SSA
> properties). Are there any mechanisms to work around this
> restriction and set the VDEF/VUSE anyway or is there a way to
> tell the SSA optimisers not to touch this variable (or
> SSA_NAME)?
> Thank you very much for sharing your thoughts!

You need to better explain what you are trying to do.

Richard.


Re: Bootstrap broken on x86_64-linux

2008-08-15 Thread Thomas Koenig
On Thu, 2008-08-14 at 14:41 -0700, H.J. Lu wrote:

> It should be fixed now.

Thanks a lot for the quick fix.

My problem is that I don't have access to a machine with GFC_REAL_16 and
working autoconf2.59, so possible problems in cut&paste code tend to be
hidden from me.

I'll make special mention of this the next time I submit a patch
containing #ifdef HAVE_GFC_REAL_16, to make sure that somebody tests
this before commit.

Thomas




Re: Better GCC diagnostics

2008-08-15 Thread Ian Lance Taylor
"Manuel López-Ibáñez" <[EMAIL PROTECTED]> writes:

> 2008/8/15 Ian Lance Taylor <[EMAIL PROTECTED]>:
>> "Manuel López-Ibáñez" <[EMAIL PROTECTED]> writes:
>>
>>> A) Printing the input expression instead of re-constructing it. As
>>>Joseph explained, this will fix the problems that Aldy mentioned
>>>(PR3544[123] and PR35742) and this requires:
>>>
>>>   1) For non-preprocessed expr we need at least two locations per expr
>>>  (beg/end). This will require changes on the build_* functions to
>>>  handle multiple locations.
>>
>> This is probably obvious, but can you outline why we need two
>> locations for each expression?  The tools with which I am familiar
>> only print a single caret.  What would use the two locations for?
>
> This has nothing to do with caret diagnostics. This is an orthogonal
> issue that would share some infrastructure as Joseph explained. If you
> do
>
> warning("called object %qE is not a function", expr);
>
> for
>
> ({break;})();
>
> we currently try to re-construct expr and that fails in some cases
> (see the PRs referenced).
>
> #'goto_expr' not supported by pp_c_expression#'bug.c: In function 'foo':
> bug.c:4: error: called object  is not a function
>
> The alternative is to print whatever we parsed when building expr. To
> do that we would need to have begin/end locations for expr, and then
> do a location_t->const char * translation and print whatever is
> between those two pointers:
>
> bug.c:4: error: called object '({break;})' is not a function
> 
>
> Is it clear now? If so, I will update the wiki to put this example.

That is clear.  Thanks.  I personally would be perfectly happy if the
compiler said
bug.c:4.COLUMN: error: called object is not a function
That is, fixing the compiler to includes parts of the source code in
the error message itself is, for me, of considerably lower priority
than fixing the compiler to generate good column numbers.



>>>  b) Re-open the file and fseek. This is not trivial since we need
>>> to do it fast but still do all character conversions that we
>>> did when libcpp opened it the first time. This is
>>> approximately what Clang (LLVM) does and it seems they can do
>>> it very fast by keeping a cache of buffers ever reopened. I
>>> think that thanks to our line-maps implementation, we can do
>>> the seeking quite more efficiently in terms of computation
>>> time.  However, opening files is quite embedded into CPP, so
>>> that would need to be factored out so we can avoid any
>>> unnecessary CPP stuff when reopening but still do it
>>> *properly* and *efficiently*.
>>
>> If we are going to reopen the file, then why do we need to record the
>> locations in the preprocessed token stream?
>
> Because for some diagnostics we want to give the warnings in the
> instantiation point not in the macro definition point. Moreover, this
> is what we currently do, so if we don't want to change the current
> behaviour, we need to track both locations.
>
> Example
>
> /*header.h*/
> #pragma GCC system_header
> #define BIG  0x1b27da572ef3cd86ULL
>
> /* file.c */
> #include "pr7263.h"
> __extension__ unsigned long long
> bar ()
> {
>   return BIG;
> }
>
> We print a diagnostic at file.c for the expansion of BIG. However,
> since we do not have the original location we cannot check that the
> token comes from a system header, and we do not suppress the warning.
> There are more subtle bugs that arise from not having the original
> location available. See PR36478.
>
> BTW, Clang takes into account both locations when printing diagnostics.

Perhaps I misunderstand what you mean by recording the location in the
preprocessed token stream.  You evidently do not mean getting column
numbers for the preprocessed code.  You mean that when a preprocessor
macro is expanded, we should record both the location where the macro
is used, and also some sort of reference to the macro so that we know
the location where the macro was defined.  Is that right?


>> If we keep, for each source line, the file offset in the file of the
>> start of that source line, then I think that printing the line from
>> the source file would be pretty fast.  That would not be free but it
>> would be much cheaper than keeping the entire input file.  Various
>
> Cheaper in terms of memory. It cannot be cheaper in terms of
> compilation time than a direct pointer to the already opened buffer
> for each line-map.

Except that we know that increased memory use leads to increased
compile time.  Diagnostic printing can't be slow, but it's also not on
the critical path.  Most code does not generate diagnostics.  So there
is a balance to be struck.

Ian


Re: Better GCC diagnostics

2008-08-15 Thread Manuel López-Ibáñez
2008/8/15 Ian Lance Taylor <[EMAIL PROTECTED]>:
>>
>> BTW, Clang takes into account both locations when printing diagnostics.
>
> Perhaps I misunderstand what you mean by recording the location in the
> preprocessed token stream.  You evidently do not mean getting column
> numbers for the preprocessed code.  You mean that when a preprocessor
> macro is expanded, we should record both the location where the macro
> is used, and also some sort of reference to the macro so that we know
> the location where the macro was defined.  Is that right?
>

I don't see the difference. We do currently assign to the preprocessed
code the location of the macro call. So we get column numbers to the
preprocessed code. In addition, we should record the location were the
macro was defined. In the previous example, we should be able to
obtain somehow two locations for 0x1b27da572ef3cd86ULL, one for its
position in header.h and another for its final position in file.c.
Whether we record two locations explicitly or we keep some lookup
table of macro expansions/definitions would depend on what is found to
be the most efficient implementation.

I don't know which approach Clang follows but in our case it would be
better to use the mapped-location infrastructure to track this for us
in a single source_location number.

Cheers,

Manuel.


Re: Better GCC diagnostics

2008-08-15 Thread Joseph S. Myers
On Fri, 15 Aug 2008, Ian Lance Taylor wrote:

> Perhaps I misunderstand what you mean by recording the location in the
> preprocessed token stream.  You evidently do not mean getting column
> numbers for the preprocessed code.  You mean that when a preprocessor
> macro is expanded, we should record both the location where the macro
> is used, and also some sort of reference to the macro so that we know
> the location where the macro was defined.  Is that right?

My use case for locations in preprocessed code is tracking down 
diagnostics arising in the expansions of complicated macros.  I find 
that's the main case where GCC's existing diagnostics are unhelpful; in 
that case I would like an option to show the relevant fragment of 
preprocessed source that causes the diagnostic.  There are also easy 
heuristics to decide whether the preprocessed source is likely to be more 
useful than the non-preprocessed source.

Indexes into a table of tokens would be one possibility; C++ already 
creates such a table in the course of lexing the whole input up front.

-- 
Joseph S. Myers
[EMAIL PROTECTED]


gcc-4.4-20080815 is now available

2008-08-15 Thread gccadmin
Snapshot gcc-4.4-20080815 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20080815/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 139138

You'll find:

gcc-4.4-20080815.tar.bz2  Complete GCC (includes all of below)

gcc-core-4.4-20080815.tar.bz2 C front end and core compiler

gcc-ada-4.4-20080815.tar.bz2  Ada front end and runtime

gcc-fortran-4.4-20080815.tar.bz2  Fortran front end and runtime

gcc-g++-4.4-20080815.tar.bz2  C++ front end and runtime

gcc-java-4.4-20080815.tar.bz2 Java front end and runtime

gcc-objc-4.4-20080815.tar.bz2 Objective-C front end and runtime

gcc-testsuite-4.4-20080815.tar.bz2The GCC testsuite

Diffs from 4.4-20080808 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: broken FE diagnostics wrt complex expressions

2008-08-15 Thread Gabriel Dos Reis
On Wed, Aug 13, 2008 at 12:52 PM, Aldy Hernandez <[EMAIL PROTECTED]> wrote:
>
> It seems to me that the only approach here would be to provide caret
> diagnostics, because reconstructing the original sources from GENERIC
> seems like a loosing proposition.

Hi Aldy,

   I agree with your analysis.

>
> But, is this slew of work even worth it?

For the long term benefit, yes.

> I for one think that the
> aforementioned PRs should at least be marked down considerably.  3 of
> them are P2s-- and I think they should be some Pn+5, and/or mark them as
> an enhancement request.
>
> Are there any thoughts on this (the PRs, the caret diagnostics, plan of
> attack, etc?).

I'm kind of busy at the moment, and had put off work on caret diagnostics.
I've seen mail suggesting that some folks are working on it...

>
> Aldy
>


Re: broken FE diagnostics wrt complex expressions

2008-08-15 Thread Gabriel Dos Reis
On Wed, Aug 13, 2008 at 2:16 PM, Joseph S. Myers
<[EMAIL PROTECTED]> wrote:
> On Wed, 13 Aug 2008, Aldy Hernandez wrote:
>
> I think it would certainly be reasonable to print  for
> anything unsupported instead of broken diagnostics, and to reclassify all
> such bugs as wishlist requests for certain complex expressions to be
> better supported in diagnostics.

That makes sense to me.

-- Gaby


Re: [PATCH] caret diagnostics (was: broken FE diagnostics wrt complex expressions)

2008-08-15 Thread Gabriel Dos Reis
On Thu, Aug 14, 2008 at 7:39 AM, Joseph S. Myers
<[EMAIL PROTECTED]> wrote:
> On Thu, 14 Aug 2008, Manuel López-Ibáñez wrote:
>
> I don't think the option should necessarily just be boolean; once choice
> that may make sense would be caret diagnostics for the first diagnostic
> from an input file only, to avoid blowing up the output size when one
> mistake causes a cascade of diagnostics.  (This is a matter of designing
> the option as e.g. -fdiagnostics-show-caret={no,yes,first} rather than as
> -f/fno-, not a matter of needing such a feature implemented in the first
> version going on trunk.)

Yes, -diagnostics-show-caret= is more flexible.

>
> --
> Joseph S. Myers
> [EMAIL PROTECTED]


Re: [PATCH] caret diagnostics (was: broken FE diagnostics wrt complex expressions)

2008-08-15 Thread Gabriel Dos Reis
On Thu, Aug 14, 2008 at 12:14 PM, Aldy Hernandez <[EMAIL PROTECTED]> wrote:
>> * In the near future, make -fdiagnostics-show-caret the default at
>> least while in experimental mode or at least during stages1 and 2.
>> When making a release -fno-diagnostics-show-caret would be the
>> default. Do this through a configure option that sets the default.
>>
>> * In the far away future, review the defaults and get rid of the
>> configure option.
>
> To be honest, I don't mind either way.  Either your option or Joseph's
> is fine.  I am however interested in getting the caret diagnostics in,
> and then I can work on location information that will benefit everyone.

I'm in favor of getting -fdiagnostics-show-caret=no by default in this release,
and enable people like you to get useful stuff done.  That gives us time
to iron out outstanding bugs for the next release (and making it the default).

-- Gaby



>
> Aldy
>


Re: [PATCH] caret diagnostics

2008-08-15 Thread Gabriel Dos Reis
On Thu, Aug 14, 2008 at 11:52 AM, Tom Tromey <[EMAIL PROTECTED]> wrote:
>> "Joseph" == Joseph S Myers <[EMAIL PROTECTED]> writes:

> I'd like to see carets on by default as part of a major release -- say
> GCC 5.0.  (First mention!!)

100% agreed.

-- Gaby