pekka.jaaskelainen added inline comments.

================
Comment at: lib/CodeGen/CGBuiltin.cpp:2033
@@ +2032,3 @@
+          *Arg1 = EmitScalarExpr(E->getArg(1));
+    llvm::Type *ReservedIDTy = ConvertType(getContext().OCLReserveIDTy);
+
----------------
Anastasia wrote:
> pekka.jaaskelainen wrote:
> > Anastasia wrote:
> > > Why do we need to modify user-defined functions? Only builtins will have
> > > this extra parameter.
> > > 
> > > I think the packet size would be useful for the builtin implementation to
> > > know the number of bytes to be read/written. I don't see how to implement
> > > it correctly otherwise. As mentioned earlier, relying on metadata is not a
> > > conventional compilation approach and should be avoided if possible.
> > The pipe struct can have the packet size in its header before the actual
> > ring buffer (or whatever the storage is), which can be used as a fallback
> > unless the compiler can optimize it to a wider access. A correct
> > implementation thus doesn't require a "hidden parameter". Adding it as a
> > compile-time hidden constant argument should help the optimizers, that's
> > of course true, but I don't think it's strictly required.
> > 
> > If you think having special behavior for the built-in calls isn't
> > problematic, then fine; I don't have a strong opinion on this.
> So where would this size be initialized/set? Note that host code might have 
> different type sizes.
> 
> I am thinking that having a size parameter makes the generated function more
> generic, since we lose the type information here and recovering it might
> involve extra steps. I am currently not clear on how to do that.
https://www.khronos.org/registry/cl/sdk/2.0/docs/man/xhtml/clCreatePipe.html 
sets the packet size at creation time. We have a working prototype 
implementation that uses this. However, that generic software version uses a 
byte loop, which is painfully slow and which should of course be optimized to 
wide loads/stores whenever possible.
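
To make the fallback concrete, below is a rough sketch in plain C++ of what 
such a generic byte-loop version could look like. The struct layout and names 
are made up for illustration; this is not our actual prototype.

  #include <cstdint>

  // Hypothetical device-side pipe control block: the packet size set by
  // clCreatePipe lives in the header, ahead of the ring buffer storage.
  struct pipe_control {
    uint32_t packet_size;   // bytes per packet, set at clCreatePipe time
    uint32_t max_packets;
    uint32_t read_idx;
    uint32_t write_idx;
    uint8_t *buffer;        // ring buffer of max_packets * packet_size bytes
  };

  // Generic read of one packet. Correct for any packet size, but the byte
  // loop is slow; with a compile-time-known packet size it could be turned
  // into wide loads/stores.
  static int generic_read_pipe(pipe_control *p, void *dst) {
    if (p->read_idx == p->write_idx)
      return -1;                              // pipe empty
    const uint8_t *src =
        p->buffer + (uint64_t)(p->read_idx % p->max_packets) * p->packet_size;
    uint8_t *out = static_cast<uint8_t *>(dst);
    for (uint32_t i = 0; i < p->packet_size; ++i)
      out[i] = src[i];
    ++p->read_idx;
    return 0;
  }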

If we want to optimize it, that means inlining the builtin call and then 
deriving the size for the loop at compile time, after which it can be 
vectorized (assuming the packet FIFO is aligned). Where this size constant 
comes from is the question under discussion -- you propose passing it as an 
additional hidden parameter to the calls, and I propose metadata. Yours 
optimizes automatically but requires special-casing (when calling a builtin, 
but not a user function, add this magic parameter); mine requires a new 
optimization that takes the MD into account and converts it to a constant 
after inlining the built-in.
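
Roughly, the hidden-parameter variant could look something like this on the 
codegen side. This is only an illustration written against the CGBuiltin.cpp 
context above, not the actual patch, and __read_pipe_internal is a made-up 
runtime function name:

  static llvm::Value *
  EmitReadPipeWithSize(clang::CodeGen::CodeGenFunction &CGF,
                       const clang::CallExpr *E) {
    using namespace clang;
    llvm::Value *Pipe = CGF.EmitScalarExpr(E->getArg(0));
    llvm::Value *Ptr  = CGF.EmitScalarExpr(E->getArg(1));

    // Packet size in bytes, derived from the pipe's element type.
    const PipeType *PT = E->getArg(0)->getType()->getAs<PipeType>();
    CharUnits Size = CGF.getContext().getTypeSizeInChars(PT->getElementType());
    llvm::Value *PacketSize =
        llvm::ConstantInt::get(CGF.Int32Ty, Size.getQuantity());

    // Append the size as an extra constant argument to the builtin call.
    llvm::FunctionType *FTy = llvm::FunctionType::get(
        CGF.Int32Ty, {Pipe->getType(), Ptr->getType(), CGF.Int32Ty},
        /*isVarArg=*/false);
    return CGF.Builder.CreateCall(
        CGF.CGM.CreateRuntimeFunction(FTy, "__read_pipe_internal"),
        {Pipe, Ptr, PacketSize});
  }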

I'm open to both, as I do not really know how much trouble the special-casing 
will bring, and I appreciate that the optimizations might just work. In my 
understanding, though, the size info is not strictly required, just very 
useful for optimization.
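
For reference, the metadata route would instead just tag the call site. 
Reusing the names from the sketch above (the metadata kind name is made up, 
and the new optimization that folds the header's packet_size load into this 
constant after inlining is not shown):

  // Attach the packet size as call-site metadata instead of an argument; a
  // later pass, run after the built-in is inlined, would use it to replace
  // the runtime load of the header's packet_size field with this constant.
  llvm::CallInst *CI = CGF.Builder.CreateCall(Fn, {Pipe, Ptr});
  llvm::Metadata *SizeMD = llvm::ConstantAsMetadata::get(
      llvm::ConstantInt::get(CGF.Int32Ty, Size.getQuantity()));
  CI->setMetadata("pipe.packet.size",
                  llvm::MDNode::get(CGF.getLLVMContext(), SizeMD));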


http://reviews.llvm.org/D15914


