arsenm wrote:

> > I can understand handling the other primitive types, like short2 or a 
> > pointer, but I think it's unreasonable for this builtin to support all of 
> > these aggregates
> 
> Without builtin support, users have to manually decompose structs into 32-bit 
> words and reassemble them. This is tedious and error-prone. 

But that is exactly what they should do. Special aggregate handling is also an 
extra hazard in the compiler. General users should not be using builtins. 
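For illustration, here is a minimal host-side sketch of the manual decomposition, assuming a hypothetical `permute32` that stands in for a permute builtin accepting only 32-bit values (the PR's actual builtin is not named or modeled here) and a simulated 4-lane warp. Each lane's value is split into 32-bit words, each word position is permuted separately, and the words are reassembled. `permute_aggregate` and `Particle` are likewise illustrative names only:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Assumption: a 4-lane simulated warp so the sketch runs on a host CPU.
constexpr int kLanes = 4;

// Hypothetical stand-in for a permute builtin that only accepts 32-bit
// values: lane `lane` receives the word held by lane `src[lane]`.
void permute32(const std::uint32_t in[kLanes], const int src[kLanes],
               std::uint32_t out[kLanes]) {
  for (int lane = 0; lane < kLanes; ++lane) {
    assert(src[lane] >= 0 && src[lane] < kLanes);
    out[lane] = in[src[lane]];
  }
}

// User-side decomposition: split each lane's T into 32-bit words,
// permute one word position at a time, then reassemble the T values.
template <typename T>
void permute_aggregate(const T in[kLanes], const int src[kLanes],
                       T out[kLanes]) {
  static_assert(std::is_trivially_copyable<T>::value,
                "T must be trivially copyable");
  static_assert(sizeof(T) % sizeof(std::uint32_t) == 0,
                "pad T to a multiple of 4 bytes");
  constexpr std::size_t kWords = sizeof(T) / sizeof(std::uint32_t);
  for (std::size_t w = 0; w < kWords; ++w) {
    std::uint32_t word_in[kLanes];
    std::uint32_t word_out[kLanes];
    const std::size_t offset = w * sizeof(std::uint32_t);
    // Gather word `w` from every lane.
    for (int lane = 0; lane < kLanes; ++lane) {
      std::memcpy(&word_in[lane],
                  reinterpret_cast<const unsigned char*>(&in[lane]) + offset,
                  sizeof(std::uint32_t));
    }
    permute32(word_in, src, word_out);
    // Scatter the permuted word back into each lane's result.
    for (int lane = 0; lane < kLanes; ++lane) {
      std::memcpy(reinterpret_cast<unsigned char*>(&out[lane]) + offset,
                  &word_out[lane], sizeof(std::uint32_t));
    }
  }
}

// Example aggregate: 12 bytes, i.e. three 32-bit words.
struct Particle {
  float x;
  float y;
  std::int32_t id;
};
```

The sketch rejects types whose size is not a multiple of 4 bytes at compile time; such types would need explicit padding before being decomposed this way.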

> Worse, if they use memcpy or pointer casts, the compiler may introduce 
> scratch memory that it cannot optimize away. Our CodeGen avoids this: it
> stores the aggregate, loads as integer, splits into 32-bit words, permutes 
> each, and reassembles. All integer ops, no scratch, SROA-friendly. This is 
> similar to how C++ lets you assign structs by value instead of requiring 
> memcpy. Permute is a fundamental warp operation on GPU. Making it work 
> transparently for arbitrary trivially-copyable types is a big usability win, 
> and the implementation is modest — a single loop over 32-bit words in one 
> self-contained function.

None of this has anything to do with this builtin. 


This builtin should only accept trivially legal 32-bit types.

https://github.com/llvm/llvm-project/pull/153501
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
