[clang] [Clang] Improve EmitClangAttrSpellingListIndex (PR #114899)

Erich Keane via cfe-commits Thu, 07 Nov 2024 06:39:07 -0800

erichkeane wrote:

> Hi @erichkeane, thanks for your detailed review and comments.
> 
> I had some thoughts about your optimized approach which I would like to share.
> 
> I agree that falling back to full string comparisons is "safe" but 
> non-optimal. However, performing "first different character" comparison might 
> get complicated if you have more than 2 different spelling names with the 
> same size.
> 
> An example case is `AT_NoSanitizeSpecific` with - "no_sanitize_thread" and 
> "no_sanitize_memory" (size 18). In this case it is easy to check for the 
> differing character by using `std::mismatch`. However, a case where there are 
> 3 such strings (like "aaaba", "aaacb", "aaabc") would involve generation of a 
> much more complex conditional. Although rare, I'm not sure if we can ignore 
> this situation. We may sacrifice readability of the emitter code / 
> debuggability of the `.inc` file at the cost of minimal optimizations (full 
> string comparisons now happen in only 6 cases).
> 
> I'm happy to add the FIXME note, but would like to hear what you think.


So the conditional actually doesn't have to be all that difficult.  You need 
N-1 comparisons to differentiate (though you m ight end up with duplicates).  
For example, in your 3 strings example if differentiating the first one, my 
algorithm would generate Str.size() == 5 && Str[3] == 'b' && Str[3] == 'b' 
(since that is the difference character for each).  It is simple enough join 
with an 'and' and a loop, as long as we don't care about duplicates.

As far as debugability/readability of emitted code: it isn't really a concern 
here.  Once we get the tablegen 'right' (which traditionally doesn't take much 
effort/testing), no one ever gets into here again.  The spelling-identification 
one uses a trie, and it is a giant mess of undebugability :) 

> 
> To address a previous comment -
> 
> > This seems to have missed a bunch, see alloc_size and allocating and 
> > alloc_align, all of which are the same spellings, but you're checking the 
> > name anyway (in the .inc file you linked).
> 
> I'm confident that my earlier approach did not perform string comparisons for 
> "alloc_size", "allocating" and "alloc_align". The 2 tests that were failing 
> were because I assumed that spellings with the same name would have 
> consecutive indices. This failed for `AT_unused`. But I concede that the 
> earlier loop did not leverage the `.size()` for a more optimal comparison :) 
> This way is definitely better.
> 

Got it!  It wasn't clear based on the old implementation what was happening, 
just from the GIST that something wasn't working.

> [Gist](https://gist.github.com/chinmaydd/1733a9a84932a45678c47f31be4d64d7) 
> contains updated output.

That GIST looks about right.  I see as you mentioned, ~6 different comparisons, 
which isn't too bad.  Definitely an improvement.

https://github.com/llvm/llvm-project/pull/114899
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Improve EmitClangAttrSpellingListIndex (PR #114899)

Reply via email to