[ 
https://issues.apache.org/jira/browse/LUCENE-10509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531981#comment-17531981
 ] 

Rishabh Kumar Maurya commented on LUCENE-10509:
-----------------------------------------------

{quote}I suspect that this is due to LUCENE-9663, which improved compression of 
the terms dictionary. This affected OpenSearch because the cardinality 
aggregation performs value lookups on each document. You should open an issue 
against OpenSearch to change the way cardinality aggregations run to collect 
matching ordinals into a bitset first, and only look up values once the entire 
segment has been collected. This should address the performance problem and 
will likely make the cardinality aggregation faster than it was before Lucene 
8.9.
{quote}
[~jpountz] It looks like 
[OrdinalsCollector|https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/aggregations/metrics/CardinalityAggregator.java#L258]
 works precisely as you described: it collects ordinals and then performs 
value lookups once per segment. Perhaps there is room for an optimization for 
single-valued SortedSetDocValues? Otherwise it looks perfect!
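The approach described above (collect matching ordinals into a bitset, then look up values once per segment) can be contrasted with per-document lookups in a minimal, self-contained simulation. It assumes a single-valued field whose ordinals and terms are held in plain arrays; `SegmentSketch`, `CardinalitySketch` and the counting `lookupOrd` are hypothetical stand-ins, not Lucene's `SortedSetDocValues` API, and an exact `HashSet` stands in for the approximate HyperLogLog++ sketch that the cardinality aggregation actually uses.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical simulation of one segment (not Lucene's API): docOrds[doc] is the
 * document's ordinal, or -1 if it has no value; termsDict[ord] is the term for that ordinal.
 */
class SegmentSketch {
    final int[] docOrds;
    final String[] termsDict;
    int lookups = 0; // number of ordinal-to-term lookups performed so far

    SegmentSketch(int[] docOrds, String[] termsDict) {
        this.docOrds = docOrds;
        this.termsDict = termsDict;
    }

    /** Stand-in for a potentially expensive lookup against a compressed terms dictionary. */
    String lookupOrd(int ord) {
        lookups++;
        return termsDict[ord];
    }
}

class CardinalitySketch {
    /** Per-document strategy: resolve the term of every matching document. */
    static Set<String> direct(SegmentSketch seg, int[] matchingDocs) {
        Set<String> values = new HashSet<>();
        for (int doc : matchingDocs) {
            int ord = seg.docOrds[doc];
            if (ord >= 0) {
                values.add(seg.lookupOrd(ord));
            }
        }
        return values;
    }

    /** Ordinals strategy: mark ordinals in a bitset, then resolve each distinct ordinal once. */
    static Set<String> viaOrdinals(SegmentSketch seg, int[] matchingDocs) {
        BitSet seenOrds = new BitSet(seg.termsDict.length);
        for (int doc : matchingDocs) {
            int ord = seg.docOrds[doc];
            if (ord >= 0) {
                seenOrds.set(ord);
            }
        }
        // The whole segment has been collected: look up each set ordinal, in ascending order.
        Set<String> values = new HashSet<>();
        for (int ord = seenOrds.nextSetBit(0); ord >= 0; ord = seenOrds.nextSetBit(ord + 1)) {
            values.add(seg.lookupOrd(ord));
        }
        return values;
    }
}
```

With seven documents sharing three distinct terms (one document without a value), both strategies find the same three terms, but the per-document strategy performs six lookups and the ordinals strategy only three; the gap grows with the number of matching documents per distinct term.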

DirectCollector seems to take precedence in this case, which makes things 
slow. OrdinalsCollector does come at the cost of higher memory consumption, 
and DirectCollector seems to be quite useful for low-cardinality use cases. 
This might not be the right forum, but since you are the author of this code, 
would you suggest changing the logic for choosing one collector over the 
other in use cases like these?
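One way to frame the collector-selection question is as a per-segment memory budget check. The sketch below is hypothetical: `CollectorChoiceSketch`, `choose` and `memoryBudget` are illustrative names, not the actual selection logic in OpenSearch's `CardinalityAggregator`, and it assumes the ordinals strategy needs one bit per ordinal for each bucket.

```java
/**
 * Hypothetical collector-selection heuristic (not OpenSearch's actual logic):
 * use the ordinals strategy for a segment only if its bitsets fit in a memory budget.
 */
class CollectorChoiceSketch {
    enum Strategy { ORDINALS, DIRECT }

    /** Bytes needed for one bit per ordinal, rounded up to whole 64-bit words. */
    static long bitsetBytes(long maxOrd) {
        return ((maxOrd + 63) / 64) * Long.BYTES;
    }

    /**
     * @param maxOrd       number of distinct terms (ordinals) in the segment
     * @param buckets      number of buckets that would each need their own bitset
     * @param memoryBudget hypothetical byte budget for all ordinal bitsets
     */
    static Strategy choose(long maxOrd, long buckets, long memoryBudget) {
        long needed = bitsetBytes(maxOrd) * buckets;
        // Fall back to per-document lookups when the bitsets would exceed the budget.
        return needed <= memoryBudget ? Strategy.ORDINALS : Strategy.DIRECT;
    }
}
```

For example, with a 1 KiB budget, a segment with 1,000 distinct terms and one bucket needs 128 bytes and would use ordinals, while 10,000,000 distinct terms would fall back to direct collection. A real heuristic might also weigh the number of matching documents against `maxOrd`, since direct collection pays one value lookup per matching document while the ordinals strategy pays at most one per distinct term.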

> Performance degraded after upgrade from 8.8.2 to 8.9.0
> ------------------------------------------------------
>
>                 Key: LUCENE-10509
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10509
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 8.8.2
>            Reporter: Rajesh T
>            Priority: Minor
>
> We are planning to upgrade from Elasticsearch 7.7.1 to OpenSearch 1.1.0 (with 
> Lucene 8.9.0). We have noticed that OpenSearch 1.1.0 performs worse than 
> Elasticsearch 7.7.1 because of Lucene 8.9.0, whereas OpenSearch 1.0.1 (with 
> Lucene 8.8.2) performs almost the same as Elasticsearch 7.7.1.
> We tested the following scenarios and observed that the slowness is caused by 
> Lucene 8.9.0. Performance of cardinality aggregations degraded by 50%.
> ||Version||lucene-core||Performance||
> |Elasticsearch 7.7.1|8.5.1|Fast|
> |OpenSearch 1.0.1|8.8.2|Fast|
> |OpenSearch 1.1.0|8.9.0|Slow|
> |OpenSearch 1.1.0|8.8.2|Fast|
> This is the snippet of OpenSearch code that runs slowly with Lucene 8.9.0:
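> {code:java}
> import org.opensearch.action.search.SearchRequestBuilder;
> import org.opensearch.client.Client;
> import org.opensearch.index.query.QueryBuilder;
> import org.opensearch.index.query.QueryBuilders;
> import org.opensearch.search.aggregations.AggregationBuilders;
> import org.opensearch.search.aggregations.metrics.CardinalityAggregationBuilder;
> // Hypothetical wrapper method: the original snippet showed only the method body.
> SearchRequestBuilder buildSearch(Client client, String index, String randomValue) {
>     // Match every document except those whose __id.keyword equals randomValue.
>     QueryBuilder qb = QueryBuilders.boolQuery()
>             .mustNot(QueryBuilders.termQuery("__id.keyword", randomValue));
>     CardinalityAggregationBuilder agg = AggregationBuilders
>             .cardinality("somename")
>             .field("__id.keyword");
>     return client.prepareSearch(index).setQuery(qb).addAggregation(agg);
> }
> {code}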
> Please let us know if this is something that can be fixed.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
