[julia-users] Re: Performance of Kernel Inlining

Kristoffer Carlsson Sat, 29 Oct 2016 01:14:40 -0700

Could it be some alias checking going on?

In any case, this code seems to be horribly slow on 0.6, even with #19097.


to_indexes(::Int64, ::Int64, ::Vararg{Int64,N}) at operators.jl:868 (repeats 3 times)
kills performance.


On Saturday, October 29, 2016 at 5:56:12 AM UTC+2, Jared Crean wrote:
>
> I'm working on a high-dimensional finite difference code, and I got a 
> strange performance result. I have a kernel function that
> computes the stencil at a given point, and an outer function, outer_func, 
> that loops over the dimensions and calls the kernel function at every grid 
> point.
> I created a second function, outer_func2, with the same loops as 
> outer_func, but rather than call the kernel function it has the contents of
> the kernel function copied into it.  The source code is here: 
> https://github.com/JaredCrean2/wave6d/blob/master/src/test_inline.jl
>
> The performance results (with bounds checking disabled and 
> --math-mode=fast) are:
>
> testing outer_func
>   0.398586 seconds
>   0.398821 seconds
> testing outer_func2
>   2.522230 seconds
>   2.522479 seconds
>
>
>
> I ran this on an Intel Ivy Bridge (i7-3820) processor, using Julia 0.4.4.
>
> I looked at the llvm code (attached), and noticed outer_func2 has a bunch 
> of extra statements that look like
>
>   %lsr.iv570 = phi i8* [ %scevgep571, %L21 ], [ %scevgep569, %L.preheader ]
>
>
>
> that are not present for outer_func.  I don't know llvm code very well 
> (hardly at all), so I'm not sure what these mean.  Any help
> understanding either the llvm code or the performance difference would be 
> appreciated.
>
>
>
>   Thanks,
>      Jared Crean
>
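
For readers without the linked file at hand, the comparison described in the
quoted message can be sketched roughly as below. This is not the code from
test_inline.jl: the names kernel, outer_func and outer_func2 follow the post,
but the 1-D three-point stencil, the array size and the Julia 1.x syntax are
all assumptions made to keep the example small.

```julia
# Hypothetical 1-D stand-in for the high-dimensional stencil in the post.
# Function names follow the post; the bodies are illustrative assumptions.

# Three-point centered second difference at grid point i.
@inline kernel(u::Vector{Float64}, i::Int) = u[i-1] - 2.0 * u[i] + u[i+1]

# Variant 1: the loop calls the kernel function at every interior point.
function outer_func(u_out::Vector{Float64}, u::Vector{Float64})
    @inbounds for i in 2:length(u)-1
        u_out[i] = kernel(u, i)
    end
    return u_out
end

# Variant 2: same loop, with the kernel body pasted in by hand.
function outer_func2(u_out::Vector{Float64}, u::Vector{Float64})
    @inbounds for i in 2:length(u)-1
        u_out[i] = u[i-1] - 2.0 * u[i] + u[i+1]
    end
    return u_out
end

# Assumed test data: x^2 sampled on [0, 1].
u = collect(range(0.0, stop=1.0, length=1_000_000)) .^ 2
a = zeros(length(u))
b = zeros(length(u))

# Warm-up calls so that compilation time is not included in the timings.
outer_func(a, u)
outer_func2(b, u)

println("testing outer_func")
@time outer_func(a, u)
println("testing outer_func2")
@time outer_func2(b, u)
```

On a toy case like this the two variants would normally be expected to time
about the same, so it will not necessarily reproduce the gap reported above.
It is only a skeleton to reduce the real code against. To compare the
generated IR of the two variants in a script on Julia 1.x, load
InteractiveUtils and call `@code_llvm outer_func(a, u)` and
`@code_llvm outer_func2(b, u)`.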
