bharath-techie commented on issue #13188:
URL: https://github.com/apache/lucene/issues/13188#issuecomment-2084365307

   Thanks for the inputs, @msokolov. I do see the similarities, but the linked 
issue seems to be tied to rollups done as part of merge, aided by index sorting 
on the dimensions. Index sorting is quite expensive.  
   
   The difference here is that all the computation is deferred to the format 
and its custom logic. Query-time gains could also be higher, as we are using 
efficient cubing structures.
   
   For the star tree implementation, the algorithm sorts the dimensions and 
then aggregates during flush; successive merges just need to sort and aggregate 
the compacted, sorted data cube structures. 
   With respect to performance, if the dimensions are of relatively low 
cardinality, there is minimal impact on index-append throughput (< 2%), as the 
difference is mainly due to write threads helping during flush (reference in 
the OpenSearch 
[RFC](https://github.com/opensearch-project/OpenSearch/issues/12498#issuecomment-1971671707)).
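
   To make the flush/merge flow concrete, here is a minimal Python sketch. It 
is only an illustration of "sort, then aggregate", not the actual star tree or 
Lucene code: the function names `flush_segment` and `merge_segments`, the 
dict-based documents, and the choice of sum/count as the aggregated metrics are 
all assumptions made for the example.

```python
from heapq import merge
from itertools import groupby


def flush_segment(docs, dims, metric):
    """Hypothetical flush step: sort docs by their dimension tuple, then
    collapse each run of equal tuples into one (key, sum, count) row."""
    rows = sorted((tuple(doc[d] for d in dims), doc[metric]) for doc in docs)
    out = []
    for key, group in groupby(rows, key=lambda r: r[0]):
        values = [v for _, v in group]
        out.append((key, sum(values), len(values)))
    return out


def merge_segments(*segments):
    """Hypothetical merge step: the inputs are already sorted and aggregated,
    so a k-way merge followed by one aggregation pass is enough."""
    merged = merge(*segments, key=lambda r: r[0])
    out = []
    for key, group in groupby(merged, key=lambda r: r[0]):
        group = list(group)
        out.append((key, sum(r[1] for r in group), sum(r[2] for r in group)))
    return out


dims = ["status", "port"]
seg_a = flush_segment(
    [
        {"status": 200, "port": 80, "latency": 10},
        {"status": 200, "port": 80, "latency": 30},
        {"status": 404, "port": 80, "latency": 5},
    ],
    dims,
    "latency",
)
seg_b = flush_segment(
    [
        {"status": 200, "port": 80, "latency": 20},
        {"status": 500, "port": 443, "latency": 7},
    ],
    dims,
    "latency",
)
merged_cube = merge_segments(seg_a, seg_b)
```

   The merge never re-reads raw documents: it only touches one row per 
distinct dimension tuple per segment, which is why low-cardinality dimensions 
keep the cost small.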
   
   There are some cons here as well:
    - If we need to account for deletes, we will probably need a solution 
similar to the deleted docs iterator proposed 
[here](https://lists.apache.org/thread/jxczhgn5loqwn10xrb30k2hg0jrbovcs) and do 
decrements during merge/query (the cost is still proportional to the number of 
deleted docs). Maybe we can add this incrementally?
    - We will need to add guardrails against high-cardinality dimensions, as 
they slow down indexing because sorting and aggregation become expensive; data 
cubes also become less effective overall, and advantages such as storage size 
shrink.
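
   Both points can be sketched in a few lines of Python. This is a rough 
illustration under assumptions, not a proposed API: `apply_deletes`, 
`dims_over_limit`, the dict-based cube layout, and the `max_distinct` threshold 
are hypothetical names chosen for the example. It also only works for 
decrementable metrics such as sum and count; min/max cannot be undone this way.

```python
def apply_deletes(cube, deleted_docs, dims, metric):
    """Decrement aggregates once per deleted doc, so the cost grows with the
    number of deleted docs. `cube` maps dimension tuple -> [sum, count]."""
    for doc in deleted_docs:
        key = tuple(doc[d] for d in dims)
        entry = cube[key]
        entry[0] -= doc[metric]
        entry[1] -= 1
        if entry[1] == 0:
            # No live docs left for this dimension tuple.
            del cube[key]
    return cube


def dims_over_limit(docs, dims, max_distinct):
    """Guardrail check: list the dimensions whose number of distinct values
    exceeds `max_distinct` (a hypothetical, user-chosen threshold)."""
    return [d for d in dims if len({doc[d] for doc in docs}) > max_distinct]


cube = {(200, 80): [60, 3], (404, 80): [5, 1]}
deleted = [
    {"status": 404, "port": 80, "latency": 5},
    {"status": 200, "port": 80, "latency": 10},
]
live_cube = apply_deletes(cube, deleted, ["status", "port"], "latency")

sample = [
    {"status": 200, "user": "a"},
    {"status": 200, "user": "b"},
    {"status": 404, "user": "c"},
]
rejected = dims_over_limit(sample, ["status", "user"], max_distinct=2)
```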
   
   Let me know your thoughts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

