RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2129984751

   Thanks for the suggestion. Clustering within the segment, as suggested 
above, does improve skipping of documents (especially when combined with the [BKD 
optimisation](https://github.com/apache/lucene-solr/pull/1351) to skip 
non-competitive documents). But it still prevents us from building several 
optimisations that would be possible with separate DWPT pools for different 
groups:
   
   - Having a separate pool of DWPTs (and thus separate segments) for 
different groups will also reduce the cardinality of a field's values within a 
segment. Optimisations like [precomputing aggregations with a star-tree 
index](https://github.com/opensearch-project/OpenSearch/issues/12498) tend to 
perform better when the cardinality of the field is not too high. 
   - With the suggested clustering approach, segments can still be large. If we 
store more relevant logs (like 5xx and 4xx) in different segments from less 
relevant ones (like 2xx), the segments containing error and fault logs will 
be smaller (since error logs are generally fewer). This will help us do 
storage optimisations like keeping more relevant logs (like 5xx logs) on hot 
storage (like the node's disk) while less relevant logs are stored directly 
in cheaper remote storage (e.g. AWS S3, Google Cloud Storage, MinIO).
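
   To make the two points above concrete, here is a minimal sketch that only 
simulates the idea in plain Python. It does not use Lucene or any DWPT API; the 
grouping function, the hot/remote policy, and the sample data are all 
hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical policy: groups considered "more relevant" stay on hot storage.
HOT_GROUPS = {"4xx", "5xx"}


def status_group(status: int) -> str:
    """Hypothetical grouping function: bucket an HTTP status by its class."""
    return f"{status // 100}xx"


def route_to_groups(docs):
    """Simulate one writer pool per group, so each group lands in its own segments."""
    groups = defaultdict(list)
    for doc in docs:
        groups[status_group(doc["status"])].append(doc)
    return dict(groups)


def cardinality(docs, field):
    """Number of distinct values of `field` across the given docs."""
    return len({doc[field] for doc in docs})


def storage_tier(group: str) -> str:
    """Pick a storage tier for a group's segments under the assumed policy."""
    return "hot" if group in HOT_GROUPS else "remote"


# Made-up sample: mostly successful requests, a few errors.
docs = (
    [{"status": 200}] * 8
    + [{"status": 204}]
    + [{"status": 404}] * 2
    + [{"status": 500}, {"status": 503}]
)

groups = route_to_groups(docs)
print("all docs:", len(docs), "distinct statuses:", cardinality(docs, "status"))
for name in sorted(groups):
    group_docs = groups[name]
    print(name, len(group_docs), cardinality(group_docs, "status"), storage_tier(name))
```

   In this toy data, each group holds fewer documents and fewer distinct 
status values than the combined set, and only the error groups are assigned to 
hot storage.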
   
   In short, if we store all groups together in the same segments, we won't be 
able to build further optimisations on top of the segment topology. Let me know 
if this makes sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

