On Wed, Jul 18, 2012 at 4:35 PM, Dodji Seketeli <do...@redhat.com> wrote:
> Dehao Chen <de...@google.com> writes:
>
>> Hi, Dodji,
>>
>> Thanks for the comments.
>
> You are welcome.
>
> [...]
>
>> We difinitely need to modify the linemap to associate the pointer with
>> the location_t.
>
> And that is the hard part.  :-)

You are right, I played with line_map a little bit today, it's quite
complex and it's hard to encode anything else into it.

The reason why we wanna build an integrated block_location (of type
location_t) is that we don't want to change the fold() interface. As a
result, we need to invent a way to map an integer to {location_t,
block} pair. How about the this approach:

1. Keep the FE unchanged.
2. Add a hashmap to map {location_t, block} to block_location_idx,
which is the index to {location_t, block} array.
3. After FE, use a separate pass to build the hashmap, and convert all
EXPR_LOCATION and gimple_location from original location_t to
block_location_idx.
4. After that, all reference (read/wrrite) to EXPR_LOCATION and
gimple_location will all refer to block_location_idx instead of
location_t.

The pros of this approach:
* both FE and fold() family are kept unchanged.
The cons of this approach:
* extra storage for {location_t, block} array
* needs to be very careful during implementation, e.g. after phase 3,
nothing should reference "input_location" because it is location_t.

Any comments?

Thanks,
Dehao
>
> You can read the comments of the line_map_ordinary struct in
> libcpp/include/line-map.h to understand how a given pair of line/column
> is encoded there.
>
> The general idea is that there is no bijection between the instances
> of location_t and a given line map.  Just a surjection, at best.  So the idea
> of adding a block pointer to a line map won't work.
>
> For instance, if you have a translation unit without header file and
> without macro, there will be only one line map, that will encode all the
> instances of location_t needed for all the tokens of that translation
> unit.  You need an association location_t <-> block for each location_t,
> sure, but then the blocks can reasonably not be stored in the unique
> line map ...
>
>>>
>>> From our discussion, it looks like a good way to go would be to do
>>> something like what the RTL infrastructure does with insn locators (e.g
>>> in emit-rtl.c, starting from bloc_locators_locs and going to
>>> locator_eq).
>>>
>>> Basically create a new "block_locator" integer that would associated
>>> with a {location_t, block} pair.  There would be a vector of
>>> 'block_locator's.  The indexes of the 'block_locator's would index a
>>> vector of {location_t, block} pair.  That way, looking up the
>>> {location_t, block} that corresponds to a given block_location would be
>>> possible.  This is or a variation along these lines.
>>>
>>>
>> I got your point. But one more thing: we need to change location_t in all
>> data structure into this index, right? E.g., gimple_location would return
>> this index instead of original location_t?
>
> I am not sure.  My understanding is that this new index will just be a
> "block_locator"; for blocks.  It would be only built whenever you need
> it for a block.  So gimple_location should stay as is.
>
>> And we also want to change expand_location(location_t) into
>> expand_location(index)?
>
> I am not sure either.  You'd be able to get the location_t from the
> block_index, so you'd just have to do something like:
>
>     expand_location (block_loc_to_source_loc (block_location));
>
> where block_loc_to_source_loc would be a new function.
>
>> An alternative approach would be simply add "block" as part of the hash
>> function in the linemap.
>
> That's the crux of the issue.  There is no real hash function in the
> line map.  Just a monotonic (wrt line/columns) encoding that is tailored
> to avoid "holes" in the integer space used for instances of location_t,
> and minimize the memory usage of the line maps.
>
> I don't really see how you can reasonably add a block into the existing
> encoding of instances of location_t ...
>
>> In this way, whenever the block is changed, we can simply add a new
>> location_t number.
>
> Today, as the encoding is designed to avoid holes in the location_t
> space, you cannot create a new instance of location_t "in the middle" of
> the instances of location_t already created.
>
> --
>                 Dodji

Reply via email to