Could some alias checking be going on?
Anyway, this code seems to be horribly slow on 0.6, even with #19097.
The call
    to_indexes(::Int64, ::Int64, ::Vararg{Int64,N}) at operators.jl:868
(repeats 3 times) kills performance.
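
For anyone who wants to reproduce this kind of comparison on a small case, here is a minimal sketch. It is not the code from test_inline.jl: the 1-D stencil and the names kernel, outer_call! and outer_inline! are made up for illustration, and it assumes a recent Julia version rather than 0.4 or 0.6.

```julia
# Hypothetical 1-D second-difference stencil (illustration only,
# not the kernel from test_inline.jl).
@inline function kernel(u::Vector{Float64}, i::Int)
    return u[i-1] - 2.0 * u[i] + u[i+1]
end

# Variant 1: the loop calls the kernel function at every interior point.
function outer_call!(out::Vector{Float64}, u::Vector{Float64})
    @inbounds for i in 2:length(u)-1
        out[i] = kernel(u, i)
    end
    return out
end

# Variant 2: the kernel body is pasted into the loop by hand.
function outer_inline!(out::Vector{Float64}, u::Vector{Float64})
    @inbounds for i in 2:length(u)-1
        out[i] = u[i-1] - 2.0 * u[i] + u[i+1]
    end
    return out
end

u = rand(10^6)
a = zeros(length(u))
b = zeros(length(u))

# Run each variant once so compilation is not included in the timing.
outer_call!(a, u)
outer_inline!(b, u)

t_call = @elapsed outer_call!(a, u)
t_inline = @elapsed outer_inline!(b, u)
println("call:   ", t_call, " s")
println("inline: ", t_inline, " s")

# Both variants should produce the same values.
@assert isapprox(a, b)
```

In the REPL, `@code_llvm outer_call!(a, u)` and `@code_llvm outer_inline!(b, u)` print the generated IR for each variant, so the two can be compared side by side. A 1-D example like this will probably not show the slowdown, since the to_indexes frames above come from indexing with many indices, but the same pattern can be scaled up to more dimensions.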
On Saturday, October 29, 2016 at 5:56:12 AM UTC+2, Jared Crean wrote:
>
> I'm working on a high-dimensional finite difference code, and I got a
> strange performance result. I have a kernel function that
> computes the stencil at a given point, and an outer function, outer_func,
> that loops over the dimensions and calls the kernel function at every grid
> point.
> I created a second function, outer_func2, with the same loops as
> outer_func, but rather than call the kernel function it has the contents of
> the kernel function copied into it. The source code is here:
> https://github.com/JaredCrean2/wave6d/blob/master/src/test_inline.jl
>
> The performance results (with bounds checking disabled and
> --math-mode=fast) are:
>
> testing outer_func
> 0.398586 seconds
> 0.398821 seconds
> testing outer_func2
> 2.522230 seconds
> 2.522479 seconds
>
>
>
> I ran this on an Intel Ivy Bridge (i7-3820) processor, using Julia 0.4.4.
>
> I looked at the LLVM code (attached), and noticed outer_func2 has a bunch
> of extra statements that look like
>
>     %lsr.iv570 = phi i8* [ %scevgep571, %L21 ], [ %scevgep569, %L.preheader ]
>
>
>
> that are not present for outer_func. I don't know LLVM code very well
> (hardly at all), so I'm not sure what these mean. Any help
> understanding either the LLVM code or the performance difference would be
> appreciated.
>
>
>
> Thanks,
> Jared Crean
>