Tags out of gcc

2014-10-04 Thread Adrian May
Hi All,

I have this brainstorm which I'd like to get some feedback on.

I reckon it's a bad idea to make source browsing info with a separate
program like cscope or etags. I reckon it's the compiler's job.

Why? (1) Because only the compiler can do it authoritatively; after
all, it decides what's in your executable. (2) Doing it properly
involves following #ifdefs and #includes, which the compiler already
does. (3) Because the types of tag you want are defined by the
language. (4) Because there's very little to it after parsing, which
the compiler is already doing.

I imagine doing it for c++ and outputting cscope format which is
reasonably expressive and popular.

I have no idea how hard it would be, but if I can bug people for help
I'd be willing to give it a shot.

Is there any interest in something like that?

Adrian.

PS: I'm not talking about local text completion here. I'm talking
about finding your way around a huge project where cscope etc give 50
different definitions of any important function because they can't
follow the #ifdefs.
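
To make the #ifdef problem concrete, here is a minimal sketch. The macro
USE_FAST_PATH and the function backend_name are hypothetical names chosen
purely for illustration. A text-based indexer that does not follow the
preprocessor would typically list both definitions, whereas the compiler
sees exactly one for any given set of flags.

```cpp
#include <cassert>
#include <string>

// USE_FAST_PATH is a hypothetical build flag (e.g. passed as -DUSE_FAST_PATH).
#ifdef USE_FAST_PATH
std::string backend_name() { return "fast"; }      // definition A
#else
std::string backend_name() { return "portable"; }  // definition B
#endif

// Whichever branch was compiled, there is exactly one definition to index.
inline void check_single_definition() {
    assert(!backend_name().empty());
}
```

Built without the flag, backend_name() returns "portable" and definition A
never reaches the compiler at all, so a tag for it would be misleading for
that build.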


Re: Tags out of gcc

2014-10-04 Thread Richard Kenner
> I reckon it's a bad idea to make source browsing info with a separate
> program like cscope or etags. I reckon it's the compiler's job.

One of the issues with source browsing is that you want to be able to do
it in the presence of syntax errors.  That can make it harder for the
compiler to do it since it's usually not doing a robust parse in the
presence of errors.


Re: Tags out of gcc

2014-10-04 Thread Adrian May
Well it seems to be able to report a lot of syntax errors even if
they're close together, so it must be getting back on its feet fairly
quickly. I don't know how that works. Maybe it just scoots along to
the next semicolon or maybe you explicitly have productions like "if
(syntax error) { ... }".

What I also don't know is what the parser outputs if there's an error.
Can it say "he tried to define bool foo() at line 123 but the body was
erroneous", or does it just stdout the error message and forget there
was ever an attempt to define foo?
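
The "scoot along to the next semicolon" idea is usually called panic-mode
recovery. The sketch below is a toy illustration of that general technique
and makes no claim about how GCC's parser actually recovers. The grammar
(only statements of the form "int NAME ;") and the names ParseResult and
parse_declarations are assumptions made up for the example. It shows that
a parser can keep a record of the good declarations while counting the bad
ones.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct ParseResult {
    std::vector<std::string> declared;  // names that parsed cleanly
    int errors = 0;                     // number of malformed statements
};

// True for tokens shaped like a C identifier (letters, digits, '_').
static bool is_identifier(const std::string& tok) {
    if (tok.empty()) return false;
    if (!std::isalpha(static_cast<unsigned char>(tok[0])) && tok[0] != '_')
        return false;
    for (char c : tok) {
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_')
            return false;
    }
    return true;
}

// Toy grammar: a sequence of "int NAME ;" statements.
ParseResult parse_declarations(const std::vector<std::string>& toks) {
    ParseResult result;
    std::size_t i = 0;
    while (i < toks.size()) {
        const bool ok = i + 2 < toks.size() && toks[i] == "int" &&
                        is_identifier(toks[i + 1]) && toks[i + 2] == ";";
        if (ok) {
            result.declared.push_back(toks[i + 1]);
            i += 3;
            continue;
        }
        // Panic mode: count the error, then skip to just past the next ';'.
        ++result.errors;
        while (i < toks.size() && toks[i] != ";") ++i;
        if (i < toks.size()) ++i;
        assert(i <= toks.size());
    }
    return result;
}
```

Given the tokens int a ; int + ; int b ; this records a and b and counts one
error, so the declarations on either side of the broken statement survive.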


On 4 October 2014 18:17, Richard Kenner wrote:
>> I reckon it's a bad idea to make source browsing info with a separate
>> program like cscope or etags. I reckon it's the compiler's job.
>
> One of the issues with source browsing is that you want to be able to do
> it in the presence of syntax errors.  That can make it harder for the
> compiler to do it since it's usually not doing a robust parse in the
> presence of errors.


Re: Tags out of gcc

2014-10-04 Thread Richard Kenner
> Well it seems to be able to report a lot of syntax errors even if
> they're close together, so it must be getting back on its feet fairly
> quickly. I don't know how that works. Maybe it just scoots along to
> the next semicolon or maybe you explicitly have productions like "if
> (syntax error) { ... }".
> 
> What I also don't know is what the parser outputs if there's an error.
> Can it say "he tried to define bool foo() at line 123 but the body was
> erroneous", or does it just stdout the error message and forget there
> was ever an attempt to define foo?

You're missing my point by getting too deep into details.  I'm making a
more general point, which is that a parser of a compiler and a source
browser have two different purposes.

The purpose of the former is primarily to produce a parse tree of a correct
program and secondarily to produce as many error messages as possible for
an incorrect program.  The purpose of the latter is to try to figure out as
much as it can about the semantic meaning of what may be program fragments
and be completely uncaring about the presence or absence of errors.

Although there is indeed significant commonality between these two
purposes, there are very significant differences as well.  For example, a
compiler usually won't look at things such as indentation and whitespace at
all (except maybe when deciding what message to give for errors, but I
think only the Ada front end does this), but a high-quality source file
browser would rely more on indentation than the exact parse because the
indentation of a program in the process of being written will usually be
more likely to be able to identify semantic constructs than a parse based
on the tokens in the file.


RE: Tags out of gcc

2014-10-04 Thread Manuel López-Ibáñez
> I imagine doing it for c++ and outputting cscope format which is
> reasonably expressive and popular.
>
> I have no idea how hard it would be, but if I can bug people for help
> I'd be willing to give it a shot.

There are two ways to do this with GCC. One is trivial and one is
hard, but the hard one will likely give better results than the
trivial one.

The trivial one is that you build a plugin
(https://gcc.gnu.org/onlinedocs/gccint/Plugins.html) and hook it at
PLUGIN_FINISH_DECL (and perhaps also at PLUGIN_FINISH_TYPE, not sure
about that). You can then run the plugin in the same command that
compiles your code.

However, this approach has some limitations. It will not handle
preprocessor macros. You'll need to add new plugin hooks to GCC (which
I think would be welcome). And it may not work well in the presence of
compilation errors. It will also be slower than it really needs to be
(although perhaps faster than etags? The GCC parser is very
optimized...).

The hard approach is that you contribute to the effort to make GCC
more modular so that you can call the functions in the C++ parser that
you really need, while ignoring the rest of the compiler. Then, you
will be able to build a stand-alone program that does what you want
without requiring a complete gcc. The way to do this is to join the
GCC project, create a branch and try to build a prototype that doesn't
break the compiler and allows you to achieve what you want. Then,
propose to merge your changes to the main development branch.

I would suggest starting with the trivial approach, getting used to GCC
development, and then thinking about what it would take to do the hard
approach.

Cheers,

Manuel.


Re: Tags out of gcc

2014-10-04 Thread Jonathan Wakely
On 4 October 2014 15:47, Manuel López-Ibáñez wrote:
> The trivial one is that you build a plugin
> (https://gcc.gnu.org/onlinedocs/gccint/Plugins.html) and hook it at
> PLUGIN_FINISH_DECL (and perhaps also at PLUGIN_FINISH_TYPE, not sure
> about that). You can then run the plugin in the same command that
> compiles your code.

Does that hook get called for uninstantiated templates?

A source browser should index C++ templates where they are defined,
not only if they are used.


Re: Tags out of gcc

2014-10-04 Thread Manuel López-Ibáñez
On 4 October 2014 21:07, Jonathan Wakely wrote:
> On 4 October 2014 15:47, Manuel López-Ibáñez wrote:
>> The trivial one is that you build a plugin
>> (https://gcc.gnu.org/onlinedocs/gccint/Plugins.html) and hook it at
>> PLUGIN_FINISH_DECL (and perhaps also at PLUGIN_FINISH_TYPE, not sure
>> about that). You can then run the plugin in the same command that
>> compiles your code.
>
> Does that hook get called for uninstantiated templates?
>
> A source browser should index C++ templates where they are defined,
> not only if they are used.

Maybe not, but it is a matter of adding more hooks, which seems easy enough.

In any case, the advice is the same: Join us and together we'll
conquer the galaxy, ah no, that's not it. It is: join GCC development
and propose changes that enable what you want to do.

Nonetheless, if I wanted to try this idea, I would start with the
hooks that are there already, thus I wouldn't even need to modify GCC.
Then, I would test whether the result is fast enough, and then think
about adding more hooks to catch anything that is missing.

Cheers,

Manuel.


Re: Tags out of gcc

2014-10-04 Thread Adrian May
At first sight, I prefer the hooks approach. Not just cos I'm a noob
(although that is a compelling reason in itself) but also because it
happens during the main compile. A separate invocation could have
different flags so it wouldn't be authoritative anymore.

But it absolutely has to follow the preprocessor, so how do I do that?
I'm a bit surprised about that being a problem cos when I look at
preprocessor output it looks very convenient - I get one big file but
it's full of clues as to where it all came from. Perhaps I have to
hook those clues.

Adrian
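
The "clues" in preprocessor output are linemarkers of the form
# linenum "filename" flags, as emitted by gcc -E. As a rough sketch of how a
tool might read them, the function below parses one such line. The names
LineMarker and parse_linemarker are hypothetical, and the sketch assumes
filenames contain no escaped quote characters. It works on the text output
only and does not show how a plugin would obtain the same information.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct LineMarker {
    int line = 0;            // line number in the original file
    std::string file;        // original file name
    std::vector<int> flags;  // e.g. 1 = entering a file, 2 = returning to one
};

// Returns true and fills 'out' if 'text' looks like: # 42 "foo.h" 1 3
bool parse_linemarker(const std::string& text, LineMarker& out) {
    const std::size_t n = text.size();
    if (n < 2 || text[0] != '#' || text[1] != ' ') return false;

    // Parse the decimal line number.
    std::size_t i = 2;
    const std::size_t digits_start = i;
    int line = 0;
    while (i < n && std::isdigit(static_cast<unsigned char>(text[i]))) {
        line = line * 10 + (text[i] - '0');
        ++i;
    }
    if (i == digits_start) return false;

    // Parse the quoted file name (assumption: no escaped quotes inside).
    if (i + 1 >= n || text[i] != ' ' || text[i + 1] != '"') return false;
    i += 2;
    const std::size_t close = text.find('"', i);
    if (close == std::string::npos) return false;
    assert(close >= i);

    LineMarker marker;
    marker.line = line;
    marker.file = text.substr(i, close - i);

    // Parse the optional single-digit flags separated by spaces.
    for (i = close + 1; i < n; ++i) {
        if (text[i] == ' ') continue;
        if (!std::isdigit(static_cast<unsigned char>(text[i]))) return false;
        marker.flags.push_back(text[i] - '0');
    }
    out = marker;
    return true;
}
```

Feeding each line of the preprocessed file through parse_linemarker and
remembering the most recent marker would let a tool map any later line back
to the file and line it came from.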