On Wed, Jul 18, 2012 at 4:35 PM, Dodji Seketeli <do...@redhat.com> wrote: > Dehao Chen <de...@google.com> writes: > >> Hi, Dodji, >> >> Thanks for the comments. > > You are welcome. > > [...] > >> We difinitely need to modify the linemap to associate the pointer with >> the location_t. > > And that is the hard part. :-)
You are right, I played with line_map a little bit today, it's quite complex and it's hard to encode anything else into it. The reason why we wanna build an integrated block_location (of type location_t) is that we don't want to change the fold() interface. As a result, we need to invent a way to map an integer to {location_t, block} pair. How about the this approach: 1. Keep the FE unchanged. 2. Add a hashmap to map {location_t, block} to block_location_idx, which is the index to {location_t, block} array. 3. After FE, use a separate pass to build the hashmap, and convert all EXPR_LOCATION and gimple_location from original location_t to block_location_idx. 4. After that, all reference (read/wrrite) to EXPR_LOCATION and gimple_location will all refer to block_location_idx instead of location_t. The pros of this approach: * both FE and fold() family are kept unchanged. The cons of this approach: * extra storage for {location_t, block} array * needs to be very careful during implementation, e.g. after phase 3, nothing should reference "input_location" because it is location_t. Any comments? Thanks, Dehao > > You can read the comments of the line_map_ordinary struct in > libcpp/include/line-map.h to understand how a given pair of line/column > is encoded there. > > The general idea is that there is no bijection between the instances > of location_t and a given line map. Just a surjection, at best. So the idea > of adding a block pointer to a line map won't work. > > For instance, if you have a translation unit without header file and > without macro, there will be only one line map, that will encode all the > instances of location_t needed for all the tokens of that translation > unit. You need an association location_t <-> block for each location_t, > sure, but then the blocks can reasonably not be stored in the unique > line map ... > >>> >>> From our discussion, it looks like a good way to go would be to do >>> something like what the RTL infrastructure does with insn locators (e.g >>> in emit-rtl.c, starting from bloc_locators_locs and going to >>> locator_eq). >>> >>> Basically create a new "block_locator" integer that would associated >>> with a {location_t, block} pair. There would be a vector of >>> 'block_locator's. The indexes of the 'block_locator's would index a >>> vector of {location_t, block} pair. That way, looking up the >>> {location_t, block} that corresponds to a given block_location would be >>> possible. This is or a variation along these lines. >>> >>> >> I got your point. But one more thing: we need to change location_t in all >> data structure into this index, right? E.g., gimple_location would return >> this index instead of original location_t? > > I am not sure. My understanding is that this new index will just be a > "block_locator"; for blocks. It would be only built whenever you need > it for a block. So gimple_location should stay as is. > >> And we also want to change expand_location(location_t) into >> expand_location(index)? > > I am not sure either. You'd be able to get the location_t from the > block_index, so you'd just have to do something like: > > expand_location (block_loc_to_source_loc (block_location)); > > where block_loc_to_source_loc would be a new function. > >> An alternative approach would be simply add "block" as part of the hash >> function in the linemap. > > That's the crux of the issue. There is no real hash function in the > line map. Just a monotonic (wrt line/columns) encoding that is tailored > to avoid "holes" in the integer space used for instances of location_t, > and minimize the memory usage of the line maps. > > I don't really see how you can reasonably add a block into the existing > encoding of instances of location_t ... > >> In this way, whenever the block is changed, we can simply add a new >> location_t number. > > Today, as the encoding is designed to avoid holes in the location_t > space, you cannot create a new instance of location_t "in the middle" of > the instances of location_t already created. > > -- > Dodji