> > Btw, how does this all scale with the worst testcase? That is, a very deep > inheritance hierarchy with each level virtually overriding all methods and > very many duplicate input files? Is that really the worst case? We should > make sure not to introduce quadratic or worse algorithms. (I didn't see you > limiting the number of possible call targets)
Very many duplicate input files are not an issue, since I identify the ODR types and handle them only by leaders. To get things working well in practice (not in worst case) I use the query cache. The purpose of query cache is to re-use the target lists when same virutal call appears in program many times. The cache tokens passed to users allows them to not walk the same list many times when the work depends only on callees (as it does for dead code removal). So to trigger the qudaratic case you need to introduce something like 1000 classes inheriting each other and each overwriting the method and then call with type of each of them (so you have 1000 different calls). Then the call target query cache will have 1+2+..1000 entries. Worse yet, make the first baseclass to have 1000 methods and you can trigger linear walks in the gimple_get_virt_method_for_binfo linear slowness. Of course here you will generate quadratically sized vtables into the binary. I made simple script constructing this and we died in FE (binfos are quadratic already and we have non-linear walks on them). So I declared it premature optimization and decided to get things correct first. The simple way around is to start returning empty list once call target cache is, say 100*number of virtual methods in unit. The mainline version with patch enabling it on LTO (I wait for approval of the ODR patch that makes use of VTABLE manglings) I get the overall overhead of call target analysis and walks in unreachable code removal on Firefox under 2%. Vast majority goes to constructor folding that is called from gimple_get_virt_method_for_binfo. Sensible speedup would be to reconstruct vectors of virtual methods from vtables instead of walking the vtables everywhere, but for that I would like to change all the binfo walks we have in middle-end, so I would like to cleanup and fix existing code a bit first. Finally the cache can be made smaller (linear to vtable size in case of single) inheritance by simply making it to return partial vectors and have list of outer type to contain the list of inner type. It means that I will need to invent my own type for return value of possible_polymorphic_call_targets and iterators on that + some smarter way to tell passes calling me when they do redundant work. If you preffer to have the limit on cache size now, I can push out a patch. Honza