[jira] [Updated] (LUCENE-10299) investigate prefix/wildcard perf drop in nightly benchmarks

Robert Muir (Jira) Wed, 08 Dec 2021 13:06:50 -0800


     [ https://issues.apache.org/jira/browse/LUCENE-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Robert Muir updated LUCENE-10299:
---------------------------------
    Description: 
Recently the prefix/wildcard performance dropped in the nightly benchmarks. 
As these queries are super simple and not impacted by the cleanups being done 
around RegExp, I think the perf difference is instead in the guts of 
MultiTermQuery, where it uses DocIdSetBuilder?

*note that I haven't confirmed this and it is just a suspicion*

So I think it may be the LUCENE-10289 changes? E.g. doing loops with {{long}} 
instead of {{int}} like before; we know these are slower in Java.

I will admit, I'm a bit confused why we made this change, since Lucene docids 
can only be {{int}}.

Maybe we get the performance back for free with JDK 18/19, which are 
optimizing loops on {{long}} better? So I'm not arguing that we burn a bunch 
of time to fix this, but just opening the issue.
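
To make the suspicion concrete, here is a minimal sketch of the two loop shapes being compared. It is not the DocIdSetBuilder code and not the LUCENE-10289 change; the class name {{LoopCounterSketch}}, both method names, and the buffer contents are hypothetical and exist only for illustration. Both methods do the same work and differ only in the type of the loop counter.

```java
public class LoopCounterSketch {

    // Loop driven by an int counter, the style used "like before".
    static long sumWithIntCounter(int[] docs) {
        long sum = 0;
        for (int i = 0; i < docs.length; i++) {
            sum += docs[i];
        }
        return sum;
    }

    // Same work, but driven by a long counter. Java arrays are indexed
    // by int, so the counter has to be narrowed on every access.
    static long sumWithLongCounter(int[] docs) {
        long sum = 0;
        long limit = docs.length;
        for (long i = 0; i < limit; i++) {
            sum += docs[(int) i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical buffer of doc IDs, filled with 0..n-1.
        int[] docs = new int[1 << 20];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }

        // Both variants must agree on the result.
        long a = sumWithIntCounter(docs);
        long b = sumWithLongCounter(docs);
        if (a != b) {
            throw new AssertionError("results differ: " + a + " vs " + b);
        }

        // Crude timing only; a harness such as JMH is needed for numbers
        // that can be trusted.
        long sink = 0;
        long t0 = System.nanoTime();
        for (int r = 0; r < 200; r++) {
            sink += sumWithIntCounter(docs);
        }
        long t1 = System.nanoTime();
        for (int r = 0; r < 200; r++) {
            sink += sumWithLongCounter(docs);
        }
        long t2 = System.nanoTime();

        System.out.println("int counter:  " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("long counter: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("checksum: " + sink);
    }
}
```

Whether the two variants actually differ in speed depends on the JDK version and on what the JIT does with each loop, so this only shows the shape of the comparison, not a result.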

cc [~ivera]
    Environment:     (was: the same text that now appears in the Description above)

> investigate prefix/wildcard perf drop in nightly benchmarks
> -----------------------------------------------------------
>
>                 Key: LUCENE-10299
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10299
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
