One correction: I just checked the C99 standard, and the restrict pointer qualifier is officially supported. So it should be safe for you to use it in your OpenCL kernel.
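To make the effect concrete, here is a minimal sketch in plain C99 rather than OpenCL C, so it can be compiled and checked on the host. The function name compute_pair and the out parameter are hypothetical and not taken from the kernel discussed in this thread. In a kernel the same qualifier would presumably go on the __global pointer, but I have not verified that on every OpenCL implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the snippet from this thread in plain C99.
 * 'restrict' is a promise to the compiler: within this function, the byte
 * read through p is not modified through any other pointer (here: out). */
void compute_pair(const unsigned char *restrict p, int *restrict out,
                  int a, int b, int c, int d)
{
    assert(p != NULL && out != NULL);

    int common = a * b + c * d;   /* common subexpression, computed once */

    out[0] = *p * common;         /* res1: first load from p */

    /* The store above goes through 'out'. Without restrict the compiler
     * has to assume it may have changed *p and load it again. With
     * restrict it is allowed to reuse the value it already loaded. */
    out[1] = *p * (common + 1);   /* res2 */
}
```

Calling compute_pair(&v, out, 1, 2, 3, 4) with v = 3 stores 42 and 45 in out. Note that restrict is only a promise: if p and out actually overlap, the behavior is undefined, so use it only when you can guarantee the buffers are distinct.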
> -----Original Message-----
> From: Zhigang Gong [mailto:[email protected]]
> Sent: Thursday, February 12, 2015 6:21 PM
> To: '彭席汉'
> Cc: '彭席汉 peng_xihan@dahuatec'; '[email protected]'; 'Song, Ruiling'
> Subject: RE: [Beignet] a question about default optimize option when building
>
> I just found that CLANG supports a non-standard performance hint keyword,
> __restrict__. You could try it this way:
>
>     __global unsigned char * __restrict__ p;
>     int a, b, c, d;
>
>     res1 = *p * (a*b + c*d);
>     <some code here>
>     res2 = *p * (a*b + c*d + 1);
>
> The compiler will then treat p as pointing to a unique chunk of memory
> that can only be accessed through p. If there is no store through p
> between the res1 and res2 assignments, res2 will not generate an extra
> load.
>
> Please try it in your kernel and check the difference by enabling the
> LLVM output as below:
>
>     export OCL_OUTPUT_LLVM_AFTER_GEN=1
>
> Please be warned: this is not portable. Even if it works with Beignet, it
> may not work with other OpenCL libraries.
>
> > -----Original Message-----
> > From: Beignet [mailto:[email protected]] On Behalf Of Zhigang Gong
> > Sent: Thursday, February 12, 2015 4:57 PM
> > To: 彭席汉
> > Cc: 彭席汉 peng_xihan@dahuatec; [email protected]; Song, Ruiling
> > Subject: Re: [Beignet] a question about default optimize option when building
> >
> > Theoretically this is doable for OpenCL: if the user could guarantee
> > that none of the buffers overlap, the compiler could optimize more
> > aggressively and merge duplicate loads.
> >
> > But if you check the OpenCL C spec or the C99 spec, no such attribute
> > or qualifier is defined, so there is currently no safe, portable way
> > to request this kind of optimization from the compiler.
> >
> > On Thu, Feb 12, 2015 at 05:32:08PM +0800, 彭席汉 wrote:
> > > Is there any way to tell the compiler that there are no side effects
> > > on the memory p points to, and that I don't want it loaded again?
> > >
> > > 彭席汉
> > > 2015-02-12
> > >
> > > From: Zhigang Gong
> > > Sent: 2015-02-12 17:14:36
> > > To: Song, Ruiling
> > > Cc: [email protected]; 彭席汉 <[email protected]>
> > > Subject: Re: [Beignet] a question about default optimize option when building
> > >
> > > Some additional analysis based on Ruiling's comment.
> > >
> > > The second load from p (to calculate res2) may or may not be issued.
> > > It depends on whether there are any instructions with side effects
> > > between the assignments to res1 and res2. For example, if there is a
> > > store instruction or a barrier, the second load will be issued and
> > > you will see two loads from the same pointer in the final
> > > instruction stream.
> > >
> > > As for a*b + c*d, it will always be optimized and reused when
> > > calculating res2, which means the res2 assignment only generates one
> > > add instruction that adds 1 to the previously calculated value.
> > >
> > > On Thu, Feb 12, 2015 at 08:56:57AM +0000, Song, Ruiling wrote:
> > > > It should not read global memory again. We already enable this
> > > > kind of optimization pass in LLVM.
> > > > And (a*b + c*d) should not be calculated again. It is a common
> > > > subexpression, and Clang should handle it easily. But I am not
> > > > sure whether Clang is affected by -O2 or -O0. Does anyone know
> > > > the details?
> > > >
> > > > To check a specific kernel, you may need to
> > > > 'export OCL_OUTPUT_LLVM_AFTER_GEN=1' and build your program again
> > > > to get the LLVM IR.
> > > >
> > > > From: Beignet [mailto:[email protected]] On Behalf Of 彭席汉
> > > > Sent: Thursday, February 12, 2015 4:40 PM
> > > > To: [email protected]
> > > > Subject: [Beignet] a question about default optimize option when building
> > > >
> > > > Hi:
> > > >
> > > > My CL kernel program looks as follows:
> > > >
> > > >     __global unsigned char *p;
> > > >     int a, b, c, d;
> > > >
> > > >     res1 = *p * (a*b + c*d);
> > > >     <some code here>
> > > >     res2 = *p * (a*b + c*d + 1);
> > > >
> > > > If I use the default build options, what will the EU do for res2?
> > > > Will it read global memory through pointer p again and compute
> > > > "a*b + c*d" again?
> > > >
> > > > _______________________________________________
> > > > Beignet mailing list
> > > > [email protected]
> > > > http://lists.freedesktop.org/mailman/listinfo/beignet
