sammccall added a comment. In D124422#3474110 <https://reviews.llvm.org/D124422#3474110>, @ilya-biryukov wrote:
> Yeah, that sounds promising. Various source locations inside the same nodes > are probably very close. Probably even across nodes. Yeah. The reason to avoid compressing across nodes is it breaks lazy-loading. (Also I'm not sure whether related decls end up proximate in the output). Maybe there's a way around it, but this is much lower-hanging I think. > If we are willing to take on a bit more complexity, we could probably get > even bigger wins by storing all source locations in a single array having > indices into that array in the actual positions where we need the source > locations. This is an interesting idea, I wonder whether it will be a win. The delta-encodings are efficient as you note, but we have to represent SourceLocations twice. (value + index). I think the #bits we save storing indices vs SourceLocations is similar to the #bits we need to encode the deltas. If these cancel out then the savings we get are from deduplicating SourceLocations in the array - I suspect there aren't actually that many duplicates, we'd be lucky to halve the array size. Rough example with made-up numbers: - today we store raw SourceLocations, N * ~28 bits - maybe simple delta-encoding maybe reduces this by 8 to N * 20 bits average - sorted SourceLocation array (no delta-encoding, no deduplication) is N * 28, indices maybe N * 23 (assuming distance of 2^5 between sourcelocations) - delta-encoding the array means it takes conservatively N * 7 bits (5 bits average delta + VBR overhead + variance) - delta-encoding the indices yields the same 8 bits as before => N * 15 - deduplcating the halves the entries and reduces the size of indices by 1 bit - this yields N/2*7 for the array + N * 14 for indices = N * 17.5 So we come out ahead (20->17.5) if deduplication halves #locations. If it reduces by a third, it's 20->19 instead. This seems likely to be an improvement, over simple delta, but much smaller than delta vs status quo. > I suspect there are requirements that will force us to store the array in > chunks and allow searching for a particular chunk. > > However, that's really adding quite a bit of complexity and I'm not sure > preamble sizes are such a big deal outside our environment. E.g. even 10% > savings are borderline for that kind of complexity. Yeah, I don't plan to pursue this for now because the complexity/reward scares me, but if we get really squeezed for size we could revisit. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D124422/new/ https://reviews.llvm.org/D124422 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits