weak_global_object_name useless?

2011-06-29 Thread Gabriel Charette
Adding gcc@gcc.gnu.org

On Wed, Jun 29, 2011 at 6:08 PM, Gabriel Charette  wrote:
>
> What's the purpose of weak_global_object_name? Defined in gcc/varasm.c
> grepping from the base of the source recursively I only find this:
> ./gcc/ChangeLog-1998: * varasm.c (assemble_start_function): Add 
> weak_global_object_name.
> ./gcc/output.h:extern const char *weak_global_object_name;
> ./gcc/ChangeLog-2000: weak_global_object_name here, as const char *.
> ./gcc/ChangeLog-2000: first_global_object_name or weak_global_object_name.  
> Clean up string
> ./gcc/ChangeLog-2000: * varasm.c (first_global_object_name, 
> weak_global_object_name):
> ./gcc/tree.c:      const char *name = weak_global_object_name;
> ./gcc/ChangeLog-2005: (weak_global_object_name): Likewise.
> ./gcc/varasm.c:extern GTY(()) const char *weak_global_object_name;
> ./gcc/varasm.c:const char *weak_global_object_name;
> ./gcc/varasm.c:   Set first_global_object_name and weak_global_object_name as 
> appropriate.  */
> ./gcc/varasm.c:    type = &weak_global_object_name;
> It seems like it's never actually set... some references to it are set... but 
> that seems like a very weird usage? And there is never any code that checks 
> whether `something == weak_global_object_name`...
> I'm tempted to try to remove it... shall we ask Jason?
> Gab


Re: weak_global_object_name useless?

2011-06-29 Thread Gabriel Charette
Well looking at notice_global_symbol:

The comments says that weak_global_object_name is set in it, but there
is nothing actually doing so...

The only line of code containing weak_global_object_name in
notice_global_symbol is:
type = &weak_global_object_name;

Gab

On Wed, Jun 29, 2011 at 6:26 PM, Andrew Pinski  wrote:
> On Wed, Jun 29, 2011 at 6:13 PM, Gabriel Charette  wrote:
>> Adding gcc@gcc.gnu.org
>>
>> On Wed, Jun 29, 2011 at 6:08 PM, Gabriel Charette  wrote:
>>>
>>> What's the purpose of weak_global_object_name? Defined in gcc/varasm.c
>>> grepping from the base of the source recursively I only find this:
>>> ./gcc/ChangeLog-1998: * varasm.c (assemble_start_function): Add 
>>> weak_global_object_name.
>>> ./gcc/output.h:extern const char *weak_global_object_name;
>>> ./gcc/ChangeLog-2000: weak_global_object_name here, as const char *.
>>> ./gcc/ChangeLog-2000: first_global_object_name or weak_global_object_name.  
>>> Clean up string
>>> ./gcc/ChangeLog-2000: * varasm.c (first_global_object_name, 
>>> weak_global_object_name):
>>> ./gcc/tree.c:      const char *name = weak_global_object_name;
>>> ./gcc/ChangeLog-2005: (weak_global_object_name): Likewise.
>>> ./gcc/varasm.c:extern GTY(()) const char *weak_global_object_name;
>>> ./gcc/varasm.c:const char *weak_global_object_name;
>>> ./gcc/varasm.c:   Set first_global_object_name and weak_global_object_name 
>>> as appropriate.  */
>>> ./gcc/varasm.c:    type = &weak_global_object_name;
>>> It seems like it's never actually set... some references to it are set... 
>>> but that seems like a very weird usage? And there is never any code that 
>>> checks whether `something == weak_global_object_name`...
>>> I'm tempted to try to remove it... shall we ask Jason?
>
> It is set in notice_global_symbol if I read the code correctly.
> *type is either  weak_global_object_name or first_global_object_name.
> If  first_global_object_name was set by this function, then
> weak_global_object_name would never be set.  Otherwise
> weak_global_object_name is set to the first time notice_global_symbol
> is called with a weak decl, or an one only decl, or when flag_shlib is
> true.
> get_file_function_name uses  weak_global_object_name if
> first_global_object_name was not set.
> The purpose is the name of the first weak global symbol to create a
> name which is uniq at link time.
> get_file_function_name uses it to create a semi uniq number.
>
> Thanks,
> Andrew Pinski
>


Re: weak_global_object_name useless?

2011-06-29 Thread Gabriel Charette
Aaah my bad!

Thanks,
Gab

On Wed, Jun 29, 2011 at 6:45 PM, Andrew Pinski  wrote:
> On Wed, Jun 29, 2011 at 6:44 PM, Gabriel Charette  wrote:
>> Well looking at notice_global_symbol:
>>
>> The comments says that weak_global_object_name is set in it, but there
>> is nothing actually doing so...
>>
>> The only line of code containing weak_global_object_name in
>> notice_global_symbol is:
>> type = &weak_global_object_name;
>>
>
>      *type = name;
> does the setting.
>
> Thanks,
> Andrew Pinski
>


[pph] cache replace next insert by

2011-07-08 Thread Gabriel Charette
Hey Diego,

as we just talked over the phone, here is the diff I had sitting in my
stash to ask the pph cache to replace the next cache insert by the
given pointer (while still reading what's in the stream).

To use it simply call pph_cache_replace_next_by(stream, your_new_pointer)
and immediately after call pph_in_(whatever you want to read in and
immediately replace the cache pointer value for).

Cheers,
Gab


pph_cache_replace_next_by.diff
Description: Binary data


Flags and pph...

2011-07-18 Thread Gabriel Charette
Hey guys,

so the asm diff we are seeing in c1builtin-integral is not something
we are not streaming out, or any other logic being wrong in the pph.

The problem is: we define a dg-options entry in the header file which
tells deja-gnu to add flags to the compilation (so far so good...)

The problem is: deja-gnu only sees those options when doing the pph
compile (as deja-gnu takes its options by reading the file compiled it
seems, not the full translation unit).

In our case compiling the pph with -ffinite-math-only -fno-math-errno.

When compiling the C file, those flags are no longer there (deja-gnu
doesn't pass them in as they are not present in the C file)

Furthermore, when doing the non-pph compile, those flags aren't passed either.

It turns out, if we either:
1) Do the pph compile without the flags, and then the pph and non-pph
compile: there is no asm diff
2) Do the pph compile with the flags, but also pass the flags when
doing the C file compile (with and without pph): there is also no asm
diff!

So the question is: how do we want to treat flags passed to the pph
compile? I don't think it makes sense to use these flags in the rest
of the compilation, but not doing so also results in the flag only
being use for half of the compilation...
Maybe we should save the flags used as part of the pph compile and
invalidate the pph file if we don't get the same set of flags when
doing the real compilation (as we do, i think, with the
identifier(macro) context present when compiling a given pph file)??

Gab


Re: Flags and pph...

2011-07-18 Thread Gabriel Charette
To demonstrate my point:
add /* { dg-options "-ffinite-math-only -fno-math-errno" } */
to c1builin-integral.cc

and watch the test pass (since now we are using the flags across all
compilations).

Or similarly, remove the same dg-options entry from
c0builtin-integral.h and watch the test pass (since now we are using
no additional flags across all compilations)

Gab

On Mon, Jul 18, 2011 at 5:29 PM, Gabriel Charette  wrote:
> Hey guys,
>
> so the asm diff we are seeing in c1builtin-integral is not something
> we are not streaming out, or any other logic being wrong in the pph.
>
> The problem is: we define a dg-options entry in the header file which
> tells deja-gnu to add flags to the compilation (so far so good...)
>
> The problem is: deja-gnu only sees those options when doing the pph
> compile (as deja-gnu takes its options by reading the file compiled it
> seems, not the full translation unit).
>
> In our case compiling the pph with -ffinite-math-only -fno-math-errno.
>
> When compiling the C file, those flags are no longer there (deja-gnu
> doesn't pass them in as they are not present in the C file)
>
> Furthermore, when doing the non-pph compile, those flags aren't passed either.
>
> It turns out, if we either:
> 1) Do the pph compile without the flags, and then the pph and non-pph
> compile: there is no asm diff
> 2) Do the pph compile with the flags, but also pass the flags when
> doing the C file compile (with and without pph): there is also no asm
> diff!
>
> So the question is: how do we want to treat flags passed to the pph
> compile? I don't think it makes sense to use these flags in the rest
> of the compilation, but not doing so also results in the flag only
> being use for half of the compilation...
> Maybe we should save the flags used as part of the pph compile and
> invalidate the pph file if we don't get the same set of flags when
> doing the real compilation (as we do, i think, with the
> identifier(macro) context present when compiling a given pph file)??
>
> Gab
>


Re: Flags and pph...

2011-07-18 Thread Gabriel Charette
On Mon, Jul 18, 2011 at 5:52 PM, Diego Novillo  wrote:
> On Mon, Jul 18, 2011 at 20:45, Gabriel Charette  wrote:
>> To demonstrate my point:
>> add /* { dg-options "-ffinite-math-only -fno-math-errno" } */
>> to c1builin-integral.cc
>>
>> and watch the test pass (since now we are using the flags across all
>> compilations).
>
> Ah, OK.  That's your fix then.
>

Ya, that fixes the test.

But the sole fact that pph was not throwing an error is wrong in my opinion.

We should probably ensure that the same flags are enabled/disabled
that were at the pph compile time (otherwise we could do the parsing
with one set of flags and compile with a different set of flags...)

We probably only want to enforce this for a subset of the flags (i.e.
we don't care about flags like -Wall and stuff like that, but only
about the flags actually affecting the binary output).

We can discuss this in the Hydrazine meeting tomorrow.

Gab


Linemap and pph

2011-07-22 Thread Gabriel Charette
Hi,

@tromey: We have a question for you: the problem is detailed here and
our question to you is at the bottom. Thanks!

So the line problem we are having in pph is much bigger then we
originally thought.

The problem is:
As we've had issues with before: the chain of name bindings in any
cpp_bindings is inserted at the head, such that when we stream it back
in, the last one put on the chain by the parser is read first.

This is a problem in the linemap case because when we read the last
name, we also read it's input_location (line and column) and try to
replay that location by calling linemap_line_start (so far so
good). This issue is that linemap_line_start EXPECTS to be called
with lines in increasing orders (but since the bindings are backwards
in the chain we actually call it with lines in a decreasing order),
the logic in linemap_line_start is such that if the line# is less then
the previous line# it was called with, it assumes this must be a new
file and creates a new file entry in the line_table by calling
linemap_add (which in our case is WRONG!!).

>From a comment found in libcpp/line-map.c, linemaps clearly assume
this fact: "Since the set (line_table) is built chronologically, the
logical lines are monotonic increasing, and so the list is sorted and
we can use a binary search."

@crowl: Maybe this is the issue with resume_scope?? I'm just thinking
that since it seems lookups are fairly tied to the line_table, and the
fact that it should be built in a monotonically increasing order.

In the past we have solved similar issues (caused by the backwards
ordering), by replaying whatever it was in the correct order (before
doing anything else), and then simply loading references using the
pph_cache when the real ones were needed. BUT we can't do this with
source_locations as they are a simple typedef unsigned int, not actual
structs which are passed by pointer value...

Potential solution? :
I thought of a few ways to solve this:
1) Catch the calls to the linemap functions when generating the pph,
store them with the pph and replay them first before anything else
when reading the pph file.
The problem with that is that to when trying to obtain the source
location for a given binding streamed in: after calling
linemap_line_start when needed, we call linemap_position_for_column
which ASSUMES that we are currently on the last line which
linemap_line_start was called with (and thus doing the full replay
first would not really work here, is there another function we can
call that can take a line as an argument as well as a column??)

2) We could potentially output a reference to the source_location and
use that to do what we always do using the cache in this sort of
situation, however, we would need to figure out exactly which integer
we need to output a reference of (since they are passed around by
value...). This would be kinda hacky, and would also imply building
some hackery to go around the fact that the lto version of the input
location doesn't need to use references (i.e. we would need a streamer
hook, but two different functions signatures...). I don't think this
is a nice way of going about it, but could be an option...

@tromey: I hear you are the person in the know when it comes down to
linemaps, do you have any hint on how we could rebuild a valid
line_table given we are obtaining the bindings from the last one in
the file to the first one (that is, using pph (pre-parsed headers)
where we are trying to restore a cached version of the parser state
for a header)?

Thanks,
Gabriel


Re: Linemap and pph

2011-07-22 Thread Gabriel Charette
Thanks for the quick reply Tom,

On Fri, Jul 22, 2011 at 1:39 PM, Tom Tromey  wrote:
>>>>>> "Gabriel" == Gabriel Charette  writes:
>
> Gabriel> @tromey: We have a question for you: the problem is detailed
> Gabriel> here and our question to you is at the bottom. Thanks!
>
> Sure.  I have not followed PPH progress very closely and after reading
> your message I am not sure I have much to offer.  I'll give it a stab
> anyhow.
>
> Gabriel> This is a problem in the linemap case because when we read the last
> Gabriel> name, we also read it's input_location (line and column) and try to
> Gabriel> replay that location by calling linemap_line_start (so far so
> Gabriel> good). This issue is that linemap_line_start EXPECTS to be called
> Gabriel> with lines in increasing orders (but since the bindings are backwards
> Gabriel> in the chain we actually call it with lines in a decreasing order),
> Gabriel> the logic in linemap_line_start is such that if the line# is less 
> then
> Gabriel> the previous line# it was called with, it assumes this must be a new
> Gabriel> file and creates a new file entry in the line_table by calling
> Gabriel> linemap_add (which in our case is WRONG!!).
>
> My belief is that linemap was designed to support the typical case for a
> C and C++ compiler.  However, this is not sacrosanct; you could perhaps
> change the implementation to work better for your case.  I suspect,
> though, that this won't be easy -- and also be aware of Dodji's patch
> series, which complicates linemap.
>

We are tying to keep pph as "pluginable" as possible (Diego correct me
if I'm wrong), so changing the actual implementation of the linemap
would be our very last resort I think.

> Gabriel> In the past we have solved similar issues (caused by the backwards
> Gabriel> ordering), by replaying whatever it was in the correct order (before
> Gabriel> doing anything else), and then simply loading references using the
> Gabriel> pph_cache when the real ones were needed. BUT we can't do this with
> Gabriel> source_locations as they are a simple typedef unsigned int, not 
> actual
> Gabriel> structs which are passed by pointer value...
>
> A source_location is a reference to a particular line_map.  It is sort
> of like a compressed pointer.
>

Right, but our cache works in a way that, say you output a refrenece
to a struct, you first remember the pointer value, then output the
struct, if you try to ouput the same reference (same pointer value)
later we simply mark that as a shared reference in the output (and
have logic to link the two on the way in).

However since source_location aren't pointers per se, this wouldn't
work (at least with our current cache implementation, and changing
that is also last resort in my opinion)

> Gabriel> @tromey: I hear you are the person in the know when it comes down to
> Gabriel> linemaps, do you have any hint on how we could rebuild a valid
> Gabriel> line_table given we are obtaining the bindings from the last one in
> Gabriel> the file to the first one (that is, using pph (pre-parsed headers)
> Gabriel> where we are trying to restore a cached version of the parser state
> Gabriel> for a header)?
>
> Can you not just serialize the line map when creating the PPH?
>

We were wondering about that, the problem we thought is that a pph can
be loaded from anywhere in many different C files (which would
generate a different linemap entry in each case I think?). If there
was a way to serialize the linemap entries from the LC_ENTER entry for
the header file to the LC_LEAVE (i.e. ignoring builtins, command line,
etc.), and then directly insert those entries in a given C file
compilation's linemaps entries, that would be great!

> Then, when using the PPH, read the serialized line map, in order, into
> the current global line map.  This will preserve the include chains and
> all the rest.  Then rewrite source locations coming from the PPH to new
> locations from the new line map.  This rewriting could perhaps even be
> done efficiently, say just by adding an offset to all locations --
> though I am not sure, this would require some research.  (It seems
> doable to me, though; perhaps, but not definitely, requiring some
> additional linemap API.)
>

I will investigate in that direction.

> I have no idea if this is practical for you.  I guess it depends on how
> a PPH is read in.
>

A pph is read in by: reading the preprocessor tokens and replaying
them (i.e. redefining them as if they had been pasted in the C file at
the very location of the include triggering the pph). We then load the
bindings attached to the global_namespace defined in the header and
all bin

Re: Linemap and pph

2011-07-25 Thread Gabriel Charette
On Sat, Jul 23, 2011 at 8:51 AM, Dodji Seketeli  wrote:
> Gabriel Charette  a écrit:
>
>>> Gabriel> @tromey: I hear you are the person in the know when it comes down 
>>> to
>>> Gabriel> linemaps, do you have any hint on how we could rebuild a valid
>>> Gabriel> line_table given we are obtaining the bindings from the last one in
>>> Gabriel> the file to the first one (that is, using pph (pre-parsed headers)
>>> Gabriel> where we are trying to restore a cached version of the parser state
>>> Gabriel> for a header)?
>>>
>>> Can you not just serialize the line map when creating the PPH?
>>>
>>
>> We were wondering about that, the problem we thought is that a pph can
>> be loaded from anywhere in many different C files (which would
>> generate a different linemap entry in each case I think?).
>
> Just to make sure I understand, a given header included from two
> different main C files with two different sets of macro context would
> yield two macro maps with different contents anyway.  So I would believe
> that these would yield two different pph, one for each C file.  Is that
> correct?
>

Before using a pph, we make sure the current macro context is the same
as the one in which it was originally compiled (as far as what that
header is using from that context). So yes, sometimes we have
different pph for different C files, but it's very possible to use the
same pph for different includes in independent C files.

> For the cases where the same pph could be loaded from two different main
> CUs, I'd say that in the context of a given main CU being parsed, loading
> the PPH and its associated serialized line map would yield a new line
> map to be inserted into the struct linemaps of the current CU.
>
> Then, not only should the source_locations encoded by that new
> de-serialized line map be rewritten (probably by modifying things like
> struct linemap::to_line and struct linemap::included_from, if need be)
> to fit into the source_location space of the line map set carried by the
> struct linemaps of the current main CU being parsed, but the
> source_locations carried by the tokens present inside the pph must also
> be changed to look as if they were yielded by the newly rewritten line
> map.  This is what could look expensive at first sight.
>

Right this is one of the option, but as you say, potentially an
expensive change.

> Wouldn't that address the issue of a given pph loaded by multiple main C
> files?
>

If it works, yes, I think.

>> If there was a way to serialize the linemap entries from the LC_ENTER
>> entry for the header file to the LC_LEAVE (i.e. ignoring builtins,
>> command line, etc.)
>
> I believe all the builtins have the same source_location, which is the
> BUILTIN_LOCATION constant.  As for command line options, I believe they
> get the location UNKNOWN_LOCATION.  So I would think these should not be
> a problem when de-serializing the line map.
>

Right, but no point serializing them only to filter them on input.


Re: Linemap and pph

2011-07-25 Thread Gabriel Charette
Alright, so after looking even more at the linemap code, here is what
I'm thinking now:

So the main problem is:
with the linemap logic is that linemap_line_start adds a new table
entry if line_delta < 0 (I don't understand why this is needed, my
assumption is that it is because the rest of the logic depends on
set->highest_location, thus we want to make sure we are higher than
the highest so far? can someone clarify why we need this?)

My solution:
I will add a boolean flag to linemap_line_start's parameters,
allowEarlierStartLineStart, which when true, will not create a new
entry in the line_table even if the line_delta < 0.

Instead of relying on highest_location to find the current line's
source location, it will use the start_location and add to it the
delta between to_line and map->to_line).

I will also add a new linemap function,
linemap_position_for_line_and_column, which will be just like
linemap_position_for_column, except that it won't assume the
highest_line in the set is the one we are currently on (this function
will thus allow us to get a source_location without depending on the
current assumptions on the order these functions are called in).

This solution would not modify the actual final result of the entries,
at the end of parsing, as far as I understand (I would also of course
only use this new flag in the call from lto_input_location (which we
use in pph to get our source_location from the streamed
expanded_location))

Let me know what you think,
Gabriel


Re: Linemap and pph

2011-07-26 Thread Gabriel Charette
On Tue, Jul 26, 2011 at 4:25 AM, Dodji Seketeli  wrote:
> Gabriel Charette  a écrit:
>
>> Alright, so after looking even more at the linemap code, here is what
>> I'm thinking now:
>>
>> So the main problem is:
>> with the linemap logic is that linemap_line_start adds a new table
>> entry if line_delta < 0 (I don't understand why this is needed, my
>> assumption is that it is because the rest of the logic depends on
>> set->highest_location, thus we want to make sure we are higher than
>> the highest so far? can someone clarify why we need this?)
>
> Here is how I understand this.  Sorry if it's too obvious.
>
> The linemap_line_start model follows that of a parser that parses tokens
> from left to right, and from top to bottom.  Let's imagine a main CU
> called M which has this layout:
>
> < start, locus #0
>
> [... tokens ...]
>
> #include "A"  <--- locus #1
>
> [... tokens ...]
>
> There are going to be at least three line maps (instances of struct
> line_map) created and added to the line map set (struct line_maps) of M:
> one for loci from #0 to right before #1, at least one for loci coming
> from the included header A (there can be several line maps here, if A
> itself includes other headers) and another one from loci coming right
> after #1.
>
> A hard invariant here is that source_locations yielded by a given line
> map must be less than the struct line_map::start_location of the next
> line map.  This is (partly) so that line map lookup (when we want to get
> the line map that corresponds to a given source_location) can be done
> using a binary search, which is faster than just doing a linear search.
> A side effect of this is that source_locations handed out by
> linemap_line_start increase monotonically.
>
> With these assumptions in mind, when the parser code calls
> linemap_line_start to get the source_location corresponding to a new
> line at column 0, the only case where the new line is less than the
> preceding line (so that line_delta < 0) is if the parser is entering a
> new file.  E.g, it is entering the header A, coming from M.  Or, it is
> getting back to M, leaving A.  In that case, it seems required for
> linemap_line_start to create the new line map for A or for M.
>

Thanks for this explanation, that's pretty much how I had understood
the linemap code, but you clarified some of the things I wasn't clear
about.

>> My solution:
>> I will add a boolean flag to linemap_line_start's parameters,
>> allowEarlierStartLineStart, which when true, will not create a new
>> entry in the line_table even if the line_delta < 0.
>
> If I understand correctly, you are acting like if you were parsing from
> right to left and from bottom to top, for the content of A.
>

Right.

> In that context, you need to keep the hard invariant I talked about
> above, so that line map lookup keeps working, at least.
>
> In other words, when you generate a source_location for a token T0 that
> comes topologically /before/ the token T1 you generated source_location
> for at the previous step, you need to make sure that source_location of
> T0 is less than source_location of T1, and that the struct
> line_map::start_location of the current map is less than T0.  If T0
> comes from a different header than T1 (suppose both T1 and T0 comes from
> headers that are included by A) then a new line map (different from the
> line map for T1) needs to be created for T0.  Now I am not sure if in
> your design, there is going to be only one pph for A, or if there is
> going to be one pph for each header included by A as well.  I would have
> imagined the later case to be more appropriate.
>

Correct we are using the multiple pph approach (one for each included
header in the header).

This actually makes the problem harder than I had originally pictured
it yesterday. I'm now thinking using serialization is going to be a
better solution because of that.

>> Instead of relying on highest_location to find the current line's
>> source location, it will use the start_location and add to it the
>> delta between to_line and map->to_line).
>
> If the map->start_location was allocated correctly, I guess this could
> work assuming the constraints I raised above are respected.
>
> Note however that there are corner cases like having so long lines in
> the source code being parsed that we fill the column-space of the line
> map so quickly that we need to allocate more line maps for a given file.
> These are corner cases taken care of by linemap_line_start today that
> you could miss if you are trying to walk in reverse order like this.
> Those would have been ad

Re: Linemap and pph

2011-07-26 Thread Gabriel Charette
As mentionned in my previous email, I'm now thinking of serializing
the linemap from the header by pph'ed, here is how I think this could
work:

From what I understand, the source_locations allocated for everything
in a given set of headers (from the LC_ENTER for the header in the
line_table up to, and including everything in between, the
corresponding LC_LEAVE) is dependent on only one thing; the value of
line_table->highest_location when the header was inserted (i.e. if in
two different contexts we happen to have the same
line_table->highest_location when inserting header A.h, all of the
tokens for A.h in each context will have the same source_location).

If the previous statement is true, then we calculate an offset between
the line_table->highest_location as it was in the serialized version
of the header and as it is in the C file in which we are about to
de-serialize the line table.

We then need to update some things based on that offset:
  for each line_map entry in the line_table: start_location
  for each source_location field on tokens coming in from the header

Some other things need to be updated with simple logic:
  for each line_map entry in the line_table: included_from
  for the line_table itself: highest_location, highest_line, max_column_hint

Some things just stay as is when de-serialized:
  for each line_map entry in the line_table: reason, sysp, to_file,
to_line, column_bits

After doing this we should have the same line_map we would have had
when doing a regular compile (non-pph), and the tokens coming from the
header file should have the correct source_location, simply by adding
the calculated offset to the one they had previously (this also allows
us to only serialize the source_location for each token as opposed to
serializing it's expanded location, saving space in the pph).

Let me know what you think,
Gabriel

On Tue, Jul 26, 2011 at 11:10 AM, Gabriel Charette  wrote:
> On Tue, Jul 26, 2011 at 4:25 AM, Dodji Seketeli  wrote:
>> Gabriel Charette  a écrit:
>>
>>> Alright, so after looking even more at the linemap code, here is what
>>> I'm thinking now:
>>>
>>> So the main problem is:
>>> with the linemap logic is that linemap_line_start adds a new table
>>> entry if line_delta < 0 (I don't understand why this is needed, my
>>> assumption is that it is because the rest of the logic depends on
>>> set->highest_location, thus we want to make sure we are higher than
>>> the highest so far? can someone clarify why we need this?)
>>
>> Here is how I understand this.  Sorry if it's too obvious.
>>
>> The linemap_line_start model follows that of a parser that parses tokens
>> from left to right, and from top to bottom.  Let's imagine a main CU
>> called M which has this layout:
>>
>> < start, locus #0
>>
>> [... tokens ...]
>>
>> #include "A"  <--- locus #1
>>
>> [... tokens ...]
>>
>> There are going to be at least three line maps (instances of struct
>> line_map) created and added to the line map set (struct line_maps) of M:
>> one for loci from #0 to right before #1, at least one for loci coming
>> from the included header A (there can be several line maps here, if A
>> itself includes other headers) and another one from loci coming right
>> after #1.
>>
>> A hard invariant here is that source_locations yielded by a given line
>> map must be less than the struct line_map::start_location of the next
>> line map.  This is (partly) so that line map lookup (when we want to get
>> the line map that corresponds to a given source_location) can be done
>> using a binary search, which is faster than just doing a linear search.
>> A side effect of this is that source_locations handed out by
>> linemap_line_start increase monotonically.
>>
>> With these assumptions in mind, when the parser code calls
>> linemap_line_start to get the source_location corresponding to a new
>> line at column 0, the only case where the new line is less than the
>> preceding line (so that line_delta < 0) is if the parser is entering a
>> new file.  E.g, it is entering the header A, coming from M.  Or, it is
>> getting back to M, leaving A.  In that case, it seems required for
>> linemap_line_start to create the new line map for A or for M.
>>
>
> Thanks for this explanation, that's pretty much how I had understood
> the linemap code, but you clarified some of the things I wasn't clear
> about.
>
>>> My solution:
>>> I will add a boolean flag to linemap_line_start's parameters,
>>> allowEarlierStartLineStart, which when true, will not create a new
>>> entry 

Re: Linemap and pph

2011-07-27 Thread Gabriel Charette
I think I wasn't clear in the way I expressed my assumptions in my last email:

On Wed, Jul 27, 2011 at 1:11 AM, Dodji Seketeli  wrote:
>
> Gabriel Charette  a écrit:
>
> > From what I understand, the source_locations allocated for
> > everything in a given set of headers (from the LC_ENTER for the
> > header in the line_table up to, and including everything in between,
> > the corresponding LC_LEAVE) is dependent on only one thing; the
> > value of line_table->highest_location when the header was inserted
>
> The source_location handed out by a given line map is dependant on
> three things:
>
> 1/ The line_map::start_location.  For a given map, this one equals
> line_table->highest_location + 1, because at the time the line map is
> created, its line_map::start_location must be greater than the highest
> source_location handed out by any line map previously created.  At any
> point in time, line_table->highest_location equals the
> line_map::start_location of the lastly created line_map.  This is part
> of the monotonically increasing property of source_location we were
> talking about earlier.
>

Actually from my understanding, highest_location is not equal to the
last start_location, but to the last source_location returned by
either linemap_line_start or linemap_positition_for_column (which is
>= to the start_location of the current line_map).

> 2/ The line number of the location
>
> 3/ The column number of the location
>

Right, that's what I mean, I would not actually stream
highest_location. What I meant by they "all depend on only
highest_location" is that IF highest_location is the same in file B.c
and in a different compiled file C.c when they happen to include A.h,
then all of the source_locations for the tokens in A.h (in both
compilation) would be identical (i.e. token1's loc in B == token1's
loc in C, etc.) (as 2/3 always is the same since we're talking about
the same file A.h in both compilation, hence if 1 also holds, we get
the same result).

> > (i.e. if in two different contexts we happen to have the same
> > line_table->highest_location when inserting header A.h, all of the
> > tokens for A.h in each context will have the same source_location).
>
> Each token coming from a given A.h will have a different
> source_location, as they will presumably have a different {line,column}
> pair.
>

What I meant is that all of the source locations handed out in the
first compilation will be the same as all of the source locations
handed out in the second compilation, pairwise (not that ALL token's
source locations themselves will be the same within a single
compilation of course!).

> If in two different contexts we happen to have the same
> line_table->highest_location, the line_map::start_location of the two
> different line maps of the two different A.hs will be equal, though.
>
> > If the previous statement is true, then we calculate an offset
> > between the line_table->highest_location as it was in the serialized
> > version of the header and as it is in the C file in which we are
> > about to de-serialize the line table.
> >
> > We then need to update some things based on that offset:
>
> [...]
>

Hence, given that they only depend on start_location, I just have to
calculate an offset between the serialized start_location and the
start_location as it would be (highest_location + 1) in the C file
including the header, and offset all of the source_locations on each
token coming from the pph (without even needing to recalculate them!).

> It seems to me that you could just set the line_map::start_location of
> the de-serialized map for the current portion of A.h to the current
> line_table->highest_location of the main CU your are currently parsing
> (i.e, just forget about the serialized line_table->highest_location
> into account.  Actually I would think that it's unnecessary to
> serialize the line_table->highest_location) + 1.
>
> Then you should also set the de-serialized line_map::to_line to the
> current line in your context.  Then you should add that line map to
> the current line_table and set line_table->highest_location to the
> line_map::start_location you have just computed.
>
> Then, assign the source_location of each token that belongs to that
> de-serialized map to a source_location that is handed out by your new
> linemap_position_for_line_and_column, with de-serialized map passed in
> argument.  This is where you would progress "backward" in the token
> stream, for that given map.
>
> Then somehow, when you need to suck in a new pph (so this time you are
> going downward in the stream again), just start this dance of
> de-serializing the map for th

Re: Linemap and pph

2011-07-27 Thread Gabriel Charette
>>
>> Hence, given that they only depend on start_location, I just have to
>> calculate an offset between the serialized start_location and the
>> start_location as it would be (highest_location + 1) in the C file
>> including the header, and offset all of the source_locations on each
>> token coming from the pph (without even needing to recalculate them!).
>>
>
> That could work.  But then you'd need to do something for a map encoding
> the locations of tokens coming from the pph to appear in line_table,
> right?  Otherwise, at lookup time, (when you want to find the map that
> matches the source_location of a token coming for that pph), you'll be
> in trouble.  I am saying this b/c you are not calling linemap_line_start
> anymore.  And that function was the one that was including the said map
> to line_table.  And the map still must be inserted into line_table
> somehow.
>

The lookup, from what I understand, only depends on the line_map
entries for the header to be present, and the same as they would be
*had* the functions been called (not that each token actually called
the linemap getters to get its location), and also of course depends
on that the tokens have the correct source_locations *as if* obtained
from the linemap getters.

>> Doing it this way (with the offset) I would read in all the tokens and
>> linemap entries inherited from that header and it's underlying include
>> tree, thus no need to be tricky about inserting line tables for the
>> header's included file, as they are part of the header's serialized
>> line_table by recursion (a pph'ed header can include other pph'ed
>> header),
>
> This is what I am not sure to understand.  There is only one line table
> per CU.  The headers included by the CU generate instances of struct
> line map that are inserted into the line table of the CU.  So I don't
> understand what you mean by "header's serialized line_table", as I don't
> think there is such a thing as a header's line_table.

What I mean by "serialized header line_table" is the serialized
version of the line_table as it was when were done parsing the header
being pph'ed.

I would then de-serialize that and insert it's line_map entries in the
C file's line_table, doing the necessary offset adjustements in the
process (and updating all other line_table variables like
highest_location that would have changed if we had actually called the
linemap functions)


Best,
Gabriel


Re: Line 0 Hack??

2011-08-01 Thread Gabriel Charette
Re-sending as plain text for gcc@gcc.gnu.org ...



Hi,

I have a question about the line 0 hack on line 13232 of gcc/cp/decl.c
(or just text search for "Hack", it's the only place it's found in
that file...).

From my revision history, Steven introduced this in 2005, and Tom
modified it in 2007 (probably when modifying the linemap).

The problem is that this very call to get the source_location of line
0 creates a NEW linemap entry in the line table, AFTER all of the
LC_LEAVE have taken place (i.e. we were done parsing and now add a new
linemap to the line_table)...

And hence, we finish the parsing with line_table->depth == 1.

In particular, I am building linemap serialization for pre-parsed
headers, and added what I think to be a fair gcc_assert that when we
serialize the line_table, it's depth should be 0. However, if we
happen to have "main" in the header (this is when we get in this block
in decl.c from my understanding as DECL_MAIN_P is then true), a new
linemap is added at the end of the line table in the header after the
LC_LEAVE... when merging this in the middle of a C file upon
deserializing, this entry makes no sense, but I can't just ignore it
either as a source_location has been handed off for it...

My question is, what is this "special line 0" is this just a hack for
this particular situation or is "line 0" a much more important concept
I can't mess around with?

I could potentially hack around it, but hacking around a hack can only
make things worst in the long run..

I'm not familiar with the "middle end warning" referred to in the
comment in decl.c, could we potentially once and for all get rid of
this hack?

Best,
Gabriel


Re: Line 0 Hack??

2011-08-01 Thread Gabriel Charette
I have removed the hack and the test output is identical to the clean
build test output.

See issue4835047 for the patch.

Gabriel

On Mon, Aug 1, 2011 at 2:56 PM, Gabriel Charette  wrote:
> Re-sending as plain text for gcc@gcc.gnu.org ...
>
> 
>
> Hi,
>
> I have a question about the line 0 hack on line 13232 of gcc/cp/decl.c
> (or just text search for "Hack", it's the only place it's found in
> that file...).
>
> From my revision history, Steven introduced this in 2005, and Tom
> modified it in 2007 (probably when modifying the linemap).
>
> The problem is that this very call to get the source_location of line
> 0 creates a NEW linemap entry in the line table, AFTER all of the
> LC_LEAVE have taken place (i.e. we were done parsing and now add a new
> linemap to the line_table)...
>
> And hence, we finish the parsing with line_table->depth == 1.
>
> In particular, I am building linemap serialization for pre-parsed
> headers, and added what I think to be a fair gcc_assert that when we
> serialize the line_table, it's depth should be 0. However, if we
> happen to have "main" in the header (this is when we get in this block
> in decl.c from my understanding as DECL_MAIN_P is then true), a new
> linemap is added at the end of the line table in the header after the
> LC_LEAVE... when merging this in the middle of a C file upon
> deserializing, this entry makes no sense, but I can't just ignore it
> either as a source_location has been handed off for it...
>
> My question is, what is this "special line 0" is this just a hack for
> this particular situation or is "line 0" a much more important concept
> I can't mess around with?
>
> I could potentially hack around it, but hacking around a hack can only
> make things worst in the long run..
>
> I'm not familiar with the "middle end warning" referred to in the
> comment in decl.c, could we potentially once and for all get rid of
> this hack?
>
> Best,
> Gabriel
>