But if branch prediction doesn't factor in, what explains this:
julia> a = rand(5000);

julia> b = rand(5000);

julia> c = rand(5000) + 0.5;

julia> d = rand(5000) + 1;

julia> @time essai(200,a,b);
 14.607105 seconds (5 allocations: 1.922 KB)

julia> @time essai(200,a,c);
  8.357925 seconds (5 allocations: 1.922 KB)

julia> @time essai(200,a,d);
  3.159876 seconds (5 allocations: 1.922 KB)

On Friday, September 9, 2016 at 12:53:46 AM UTC+2, Yichao Yu wrote:

> Shape is irrelevant since it doesn't affect the order in the loop at all.
>
> Branch prediction is not the issue here.
>
> The issue is optimizing memory access and SIMD.
>
> It is illegal to optimize the original code into `a[k] += ss1 > ss2`. It
> is legal to optimize the `if ss1 > ss2 ak += 1 end` version to
> `ak += ss1 > ss2`, and this is the optimization LLVM should do but
> doesn't in this case.
>
> Also, the thing to look for to check if there's vectorization in the LLVM
> IR is a vector type in the loop body, like:
>
> ```
> %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
> %offset.idx = or i64 %index, 1
> %20 = add i64 %offset.idx, -1
> %21 = getelementptr i64, i64* %19, i64 %20
> %22 = bitcast i64* %21 to <4 x i64>*
> store <4 x i64> zeroinitializer, <4 x i64>* %22, align 8
> %23 = getelementptr i64, i64* %21, i64 4
> %24 = bitcast i64* %23 to <4 x i64>*
> store <4 x i64> zeroinitializer, <4 x i64>* %24, align 8
> %25 = getelementptr i64, i64* %21, i64 8
> %26 = bitcast i64* %25 to <4 x i64>*
> store <4 x i64> zeroinitializer, <4 x i64>* %26, align 8
> %27 = getelementptr i64, i64* %21, i64 12
> %28 = bitcast i64* %27 to <4 x i64>*
> store <4 x i64> zeroinitializer, <4 x i64>* %28, align 8
> %index.next = add i64 %index, 16
> %29 = icmp eq i64 %index.next, %n.vec
> ```
>
> Having a BB named `vector.body` doesn't mean the loop is vectorized.
>
> On Thu, Sep 8, 2016 at 6:40 PM, 'Greg Plowman' via julia-users
> <[email protected]> wrote:
>
>> The difference is probably simd.
>> The branched code will not use simd.
>>
>> Either of these should eliminate the branch and allow simd:
>>
>> ak += ss1 > ss2
>> ak += ifelse(ss1 > ss2, 1, 0)
>>
>> Check with @code_llvm, look for the section vector.body.
>>
>> at 5:45:30 AM UTC+10, Dupont wrote:
>>
>>> What is strange to me is that this is much slower:
>>>
>>> function essai(n, s1, s2)
>>>     a = Vector{Int64}(n)
>>>
>>>     @inbounds for k = 1:n
>>>         ak = 0
>>>         for ss1 in s1, ss2 in s2
>>>             if ss1 > ss2
>>>                 ak += 1
>>>             end
>>>         end
>>>         a[k] = ak
>>>     end
>>> end
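For reference, here is a runnable sketch (my code, not from the thread) putting the branchy loop next to the branchless `ak += ss1 > ss2` variant Greg suggested. It assumes Julia 1.x, where the old `Vector{Int64}(n)` constructor from the quoted code has become `Vector{Int64}(undef, n)`, and it returns `a` so the results can be compared:

```julia
# Branchy version: the `if` in the inner loop can block SIMD vectorization.
function essai_branch(n, s1, s2)
    a = Vector{Int64}(undef, n)   # Julia 1.x spelling of Vector{Int64}(n)
    @inbounds for k = 1:n
        ak = 0
        for ss1 in s1, ss2 in s2
            if ss1 > ss2
                ak += 1
            end
        end
        a[k] = ak
    end
    return a
end

# Branchless version: `ss1 > ss2` is a Bool, which promotes to Int when
# added, so the inner loop body contains no conditional jump.
function essai_branchless(n, s1, s2)
    a = Vector{Int64}(undef, n)
    @inbounds for k = 1:n
        ak = 0
        for ss1 in s1, ss2 in s2
            ak += ss1 > ss2
        end
        a[k] = ak
    end
    return a
end

# Both variants count the same pairs (ss1, ss2) with ss1 > ss2.
s1 = rand(1000); s2 = rand(1000)
@assert essai_branch(3, s1, s2) == essai_branchless(3, s1, s2)
```

To check whether the branchless version actually vectorized, inspect `@code_llvm essai_branchless(3, s1, s2)` for vector types like `<4 x i64>` in the loop body, as described in the quoted message above; timing can be done with `@time` as in the original post.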
