tra added a comment.

In D110089#3014388 <https://reviews.llvm.org/D110089#3014388>, @jlebar wrote:
>> One alternative would be to use run-time dispatch, but, given that texture 
>> lookup is a single instruction, the overhead would be 
>> substantial-to-prohibitive.
>
> I guess I'm confused...  Is the parameter value that we're "overloading" on 
> usually/always a constant?  In that case, there's no overhead with runtime 
> dispatch.  Or, is it not a constant?  In which case, how does nvcc generate a 
> single instruction for this idiom at all?

It's a string literal.  And you're actually right, clang does manage to 
optimize strcmp with a known value. https://godbolt.org/z/h351hfsMf

However, it's only part of the problem. Depending on which particular operation 
is used, the arguments vary, too. I still need to use templates that 
effectively need to be parameterized by that string literal argument and I 
can't easily do it until C++20.
I'd need to push strcmp-based runtime dispatch down to the implementation of 
the texture lookups with the same operand signature. That's harder to 
generalize, as I'd have to implement string-based dispatch for quite a few 
subsets of the operations -- basically for each variant of cartesian product of 
`{dimensionality, Lod, Level, Sparse}`.

Another downside is that the string comparison code will result in functions 
being much larger than necessary. Probably not a big thing overall, but why add 
overhead that would be paid for by all users and which does not buy us 
anything? Having one trivial compiler builtin that simplifies things a lot is a 
better trade-off, IMO.

> But then I see `switch` statements in the code, so now I'm extra confused.  :)

That switch is for a special case of texture lookup which may result in one of 
four texture instruction variants. All others map 1:1.

> Overall, I am unsure of why we need all of this magic.  We can rely on LLVM 
> to optimize away constant integer comparisons, and also even comparisons 
> between string literals.

It makes it possible to usa a string literal to parameterize templates, which 
allows to generate variants of `__nv_tex_surf_handler` in a relatively concise 
way.

> What specifically would be inefficient if this were a series of "real" 
> overloaded functions, with none of the macros, templates, or builtins?  
> (Assuming efficiency is the concern here?)

It's both efficiency and avoidance of typos in repetitive nearly identical 
code. 
There are ~2500 variants of high-level texture lookup variants. They end up 
calling about 600 different `__nv_tex_surf_handler` overloads that, in turn,  
end up generating ~70 unique inline assembly variants. 
The current code structure reflects that hierarchy. This is essentially the 
reason for the parameterization by the operation name happening early, instead 
of being used as a key for runtime dispatch at the end.



================
Comment at: clang/lib/AST/ExprConstant.cpp:11097
 
+static int EvaluateTextureOp(const CallExpr *E) {
+  // Sorted list of known operations stuuported by '__nv_tex_surf_handler'
----------------
jlebar wrote:
> Write a comment explaining what this function does?
> 
> (It seems to...translate a string into an integer?  If so, to me, it's 
> strange that it uses a sorted list for this because...what if I add another 
> function?  Won't that mess up all the numbers?  Anyway, to be clarified in 
> the comment.)
> 
> Now that I read more, I see that you don't care about this being a stable 
> mapping etc etc...
> 
> I don't really get why this has to be a builtin at all, though.  If it's 
> always a string literal, a simple strcmp will do the job, LLVM can optimize 
> this?  And I'm almost sure you can assert that the char* is always a string 
> literal, so you can guarantee that it's always optimized away.
Yes, it's just a 1:1 map.  We do not care about specific values as they only 
matter within one TU. I'll document that.

I can't easily use string literal to parameterize a template. 

Hmm. Perhaps I can implement a `constexpr perfect_hash(literal)` in a header. 
This would eliminate the need for the builtin.
E.g. https://godbolt.org/z/bzzMbaKhe

Let me give it a try.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110089/new/

https://reviews.llvm.org/D110089

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to