[ https://issues.apache.org/jira/browse/LUCENE-10509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531981#comment-17531981 ]
Rishabh Kumar Maurya commented on LUCENE-10509:
-----------------------------------------------

{quote}I suspect that this is due to LUCENE-9663, which improved compression of the terms dictionary. This affected OpenSearch because the cardinality aggregation performs value lookups on each document. You should open an issue against OpenSearch to change the way cardinality aggregations run to collect matching ordinals into a bitset first, and only look up values once the entire segment has been collected. This should address the performance problem and will likely make the cardinality aggregation faster than it was before Lucene 8.9.
{quote}

[~jpountz] It looks like [OrdinalsCollector|https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/aggregations/metrics/CardinalityAggregator.java#L258] works precisely as you described: it collects ordinals and then performs value lookups once per segment. There may be room for an optimization for single-valued SortedSetDocValues, but otherwise it looks perfect.

DirectCollector seems to be taking precedence in this case, which makes things slow. OrdinalsCollector does come at the cost of higher memory consumption, and DirectCollector seems to be quite useful for low-cardinality use cases. This might not be the right forum, but since you are the author of this code, would you suggest changing the logic that chooses one collector over the other for use cases like these?

> Performance degraded after upgrade from 8.8.2 to 8.9.0
> ------------------------------------------------------
>
>                 Key: LUCENE-10509
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10509
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 8.8.2
>            Reporter: Rajesh T
>            Priority: Minor
>
> We are planning to upgrade from Elasticsearch 7.7.1 to OpenSearch 1.1.0 (with
> Lucene version 8.9.0). We have noticed that OpenSearch 1.1.0 performs worse
> than Elasticsearch 7.7.1 because of Lucene 8.9.0, whereas the performance of
> OpenSearch 1.0.1 (with Lucene version 8.8.2) is almost the same as that of
> Elasticsearch 7.7.1.
> We tested the following scenarios and observed that the slowness is caused by
> Lucene 8.9.0. Performance is degraded by 50% for cardinality aggregations.
> Elasticsearch 7.7.1 version (with lucene-core 8.5.1) : Fast
> OpenSearch 1.0.1 version (with lucene-core 8.8.2) : Fast
> OpenSearch 1.1.0 version (with lucene-core 8.9.0) : Slow
> OpenSearch 1.1.0 version (with lucene-core 8.8.2) : Fast
> This is the snippet of OpenSearch code that runs slowly with Lucene 8.9.0:
> {code:java}
> QueryBuilder qb =
>     QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery("__id.keyword", randomValue));
> CardinalityAggregationBuilder agg = AggregationBuilders
>     .cardinality("somename")
>     .field("__id.keyword");
> return client.prepareSearch(index).setQuery(qb).addAggregation(agg);
> {code}
> Please let us know if this is something that can be fixed.
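
To make the trade-off between the two collection strategies discussed in the comment concrete, here is a minimal, self-contained sketch in plain Java. It is not the OpenSearch or Lucene code: `TermsDictionary`, `countPerDocument`, and `countPerSegment` are hypothetical names invented for illustration, each document is assumed to carry exactly one ordinal, and a `HashSet` stands in for whatever structure the real aggregation uses to count distinct values. The only point is to show how many dictionary lookups each strategy performs.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Illustration only: contrasts per-document value lookups with per-segment ordinal collection. */
public class OrdinalCollectionSketch {

    /** Hypothetical stand-in for one segment's terms dictionary; counts how often it is consulted. */
    public static final class TermsDictionary {
        private final String[] sortedTerms;
        public int lookups = 0;

        public TermsDictionary(String... sortedTerms) {
            this.sortedTerms = sortedTerms;
        }

        /** Resolves an ordinal to its term. With a compressed dictionary, each call may be costly. */
        public String lookupOrd(int ord) {
            lookups++;
            return sortedTerms[ord];
        }

        public int size() {
            return sortedTerms.length;
        }
    }

    /** Direct style (assumed): resolve the value of every matching document as it is collected. */
    public static int countPerDocument(TermsDictionary dict, int[] docOrds) {
        Set<String> distinct = new HashSet<>();
        for (int ord : docOrds) {
            distinct.add(dict.lookupOrd(ord)); // one lookup per document
        }
        return distinct.size();
    }

    /** Ordinals style (assumed): record ordinals in a bitset, resolve each distinct ordinal once. */
    public static int countPerSegment(TermsDictionary dict, int[] docOrds) {
        // The bitset needs one bit per term in the segment; this is the extra memory cost.
        BitSet seen = new BitSet(dict.size());
        for (int ord : docOrds) {
            seen.set(ord); // no dictionary access while collecting
        }
        Set<String> distinct = new HashSet<>();
        for (int ord = seen.nextSetBit(0); ord >= 0; ord = seen.nextSetBit(ord + 1)) {
            distinct.add(dict.lookupOrd(ord)); // one lookup per distinct ordinal
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        int[] docOrds = {0, 2, 2, 0, 1, 2}; // ordinal of each matching document

        TermsDictionary direct = new TermsDictionary("a", "b", "c");
        int directCount = countPerDocument(direct, docOrds);

        TermsDictionary ordinals = new TermsDictionary("a", "b", "c");
        int ordinalsCount = countPerSegment(ordinals, docOrds);

        System.out.println("per-document: cardinality=" + directCount + ", lookups=" + direct.lookups);
        System.out.println("per-segment:  cardinality=" + ordinalsCount + ", lookups=" + ordinals.lookups);
    }
}
```

Both methods return the same cardinality. The per-document variant consults the dictionary once per matching document, while the per-segment variant consults it once per distinct ordinal, at the price of a bitset sized to the number of terms in the segment. That mirrors the memory-versus-lookup trade-off raised above for choosing between the two collectors.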