[ 
https://issues.apache.org/jira/browse/LUCENE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391245#comment-17391245
 ] 

Gautam Worah commented on LUCENE-9918:
--------------------------------------

Hey @gmiller, I noticed that in the micro benchmark code in your 
lucene-pfor-benchmark 
[repo|https://github.com/gsmiller/lucene-pfor-benchmark/blob/main/src/main/java/gsmiller/DecodeBenchmark.java#L15],
 the main loop runs 10 times, I think?

Some 
[sources|http://daniel-strecker.com/blog/2020-01-14_auto_vectorization_in_java/#Output%20Interpretation]
 suggest that the JIT compiler usually compiles and optimizes code 
incrementally, as it sees that a particular operation is repeated many times. 
So it first optimizes the code a little, and then some more if it sees it run 
again. So maybe we just need to repeat the experiment with, say, 100k 
iterations?
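
To make the suggestion concrete, here is a rough sketch of what a warmed-up measurement could look like. It is not the code from the benchmark repo: the class name WarmupSketch, the run helper, the 128-element input, and the simplified prefixSum loop are all hypothetical stand-ins, and 100k is just the iteration count floated above. The idea is to run the loop many times before timing it, so the JIT has a chance to reach its more aggressive optimization level first.

```java
import java.util.Arrays;

// Hypothetical sketch; NOT the code from the lucene-pfor-benchmark repo.
class WarmupSketch {

  // Stand-in for the loop under test: a simple in-place prefix sum.
  static void prefixSum(long[] arr, long base) {
    arr[0] += base;
    for (int i = 1; i < arr.length; i++) {
      arr[i] += arr[i - 1];
    }
  }

  // Runs the kernel `iterations` times on a fresh copy of `input` and
  // returns a checksum so the JIT cannot discard the work as dead code.
  static long run(long[] input, int iterations) {
    long[] scratch = new long[input.length];
    long checksum = 0;
    for (int i = 0; i < iterations; i++) {
      System.arraycopy(input, 0, scratch, 0, input.length);
      prefixSum(scratch, 0);
      checksum += scratch[scratch.length - 1];
    }
    return checksum;
  }

  public static void main(String[] args) {
    long[] input = new long[128]; // assumed input size, for illustration only
    Arrays.fill(input, 1L);

    // Warm-up phase: results are not timed (assumed count of 100k).
    long warm = run(input, 100_000);

    // Measured phase: same work, timed after the warm-up.
    long start = System.nanoTime();
    long measured = run(input, 100_000);
    long elapsedNs = System.nanoTime() - start;

    System.out.println("checksum=" + (warm + measured)
        + " avg ns/op=" + (elapsedNs / 100_000.0));
  }
}
```

A hand-rolled timing loop like this is only a rough check. A harness such as JMH handles warm-up, forking, and dead-code elimination more carefully, so if the benchmark already uses one, raising its warm-up settings is probably the better route.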

> Can PForUtil be further auto-vectorized?
> ----------------------------------------
>
>                 Key: LUCENE-9918
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9918
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/codecs
>    Affects Versions: main (9.0)
>            Reporter: Greg Miller
>            Priority: Minor
>
> While working on LUCENE-9850, we discovered the loop in PForUtil::prefixSumOf 
> is not getting auto-vectorized by the HotSpot compiler. We tried a few 
> different tweaks to see if we could change this, but came up empty. There are 
> some additional suggestions in the related 
> [PR|https://github.com/apache/lucene/pull/69#discussion_r608412309] that 
> could still be experimented with, and it may be worth doing so to see if 
> further improvements could be squeezed out.



