bharath-techie commented on issue #13188: URL: https://github.com/apache/lucene/issues/13188#issuecomment-2084365307
Thanks for the inputs @msokolov. I do see the similarities, but the linked issue seems to be tied to rollups done as part of merge, aided by index sorting on the dimensions, and index sorting is quite expensive. The difference here is that all the computation is deferred to the format and its custom logic. Query-time gains could also be higher, since we are using efficient cubing structures.

For the star tree implementation, the algorithm sorts the dimensions and then aggregates during flush; successive merges just need to sort and aggregate the already compacted, sorted data cube structures. With respect to performance, if the dimensions are of relatively low cardinality, there is minimal impact on index-append throughput (< 2%), and the difference is mainly due to write threads helping during flush. (Reference in the OpenSearch [RFC](https://github.com/opensearch-project/OpenSearch/issues/12498#issuecomment-1971671707).)

There are some cons as well:

- If we need to account for deletes, we will probably need a solution similar to the deleted docs iterator proposed [here](https://lists.apache.org/thread/jxczhgn5loqwn10xrb30k2hg0jrbovcs) and do decrements during merge/query (the cost is still proportional to the number of deleted docs). Maybe we can add this incrementally?
- We will need to add guardrails against high-cardinality dimensions: they slow down indexing because sorting and aggregation become expensive, the data cubes become less effective overall, and the advantages in storage size and so on shrink.

Let me know your thoughts.
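To make the flush/merge flow above concrete, here is a minimal sketch under a simplified model. Each document is assumed to contribute one tuple of dimension values (as `long` ordinals) plus a single summed metric. Only the base level of the cube is shown, with no star nodes that pre-aggregate across a dimension. The names `CubeSketch`, `Row`, `flush`, `aggregateSorted` and `merge` are hypothetical and are not Lucene or OpenSearch APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the sort-then-aggregate step behind a star-tree style
// data cube. Not a Lucene or OpenSearch API; star nodes are omitted.
public final class CubeSketch {

    /** One cube row: a dimension tuple plus the summed metric and document count. */
    public record Row(long[] dims, long sum, long count) {}

    /** Lexicographic comparison of dimension tuples (Arrays.compare, Java 9+). */
    static int compareDims(long[] a, long[] b) {
        return Arrays.compare(a, b);
    }

    /** Flush: sort the buffered per-document rows by dimensions, then collapse duplicates. */
    public static List<Row> flush(List<Row> docs) {
        List<Row> sorted = new ArrayList<>(docs);
        sorted.sort((x, y) -> compareDims(x.dims(), y.dims()));
        return aggregateSorted(sorted);
    }

    /** Collapses adjacent rows with identical dimension tuples; input must already be sorted. */
    public static List<Row> aggregateSorted(List<Row> sorted) {
        List<Row> out = new ArrayList<>();
        for (Row r : sorted) {
            int last = out.size() - 1;
            if (last >= 0 && compareDims(out.get(last).dims(), r.dims()) == 0) {
                Row prev = out.get(last);
                out.set(last, new Row(prev.dims(), prev.sum() + r.sum(), prev.count() + r.count()));
            } else {
                out.add(r);
            }
        }
        return out;
    }

    /** Merge: both inputs are sorted, compacted cubes, so a linear merge plus one aggregation pass suffices. */
    public static List<Row> merge(List<Row> a, List<Row> b) {
        List<Row> merged = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            if (compareDims(a.get(i).dims(), b.get(j).dims()) <= 0) {
                merged.add(a.get(i++));
            } else {
                merged.add(b.get(j++));
            }
        }
        while (i < a.size()) merged.add(a.get(i++));
        while (j < b.size()) merged.add(b.get(j++));
        return aggregateSorted(merged);
    }

    public static void main(String[] args) {
        // Two hypothetical segments; dims are (status, region), metric is a latency sum.
        List<Row> seg1 = flush(List.of(
                new Row(new long[] {200, 1}, 10, 1),
                new Row(new long[] {404, 2}, 5, 1),
                new Row(new long[] {200, 1}, 30, 1)));
        List<Row> seg2 = flush(List.of(
                new Row(new long[] {200, 1}, 7, 1),
                new Row(new long[] {500, 1}, 3, 1)));
        // Prints:
        // [200, 1] sum=47 count=3
        // [404, 2] sum=5 count=1
        // [500, 1] sum=3 count=1
        for (Row r : merge(seg1, seg2)) {
            System.out.println(Arrays.toString(r.dims()) + " sum=" + r.sum() + " count=" + r.count());
        }
    }
}
```

In this sketch, merge work scales with the number of distinct dimension tuples per segment rather than the number of documents. That is consistent with the point above that low-cardinality dimensions keep the overhead small, while high-cardinality dimensions erode the benefit.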