RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2129984751
Thanks for the suggestion. Clustering within the segment does improve document skipping (especially when combined with the [BKD optimisation](https://github.com/apache/lucene-solr/pull/1351) to skip non-competitive documents). But it still prevents us from building several optimisations that become possible with separate DWPT pools for different groups:

- A separate pool of DWPTs per group (and thus separate segments per group) also reduces the cardinality of a field's values within a segment. Optimisations like [precomputing aggregations with a star-tree index](https://github.com/opensearch-project/OpenSearch/issues/12498) tend to perform better when the cardinality of the field is not too high.
- With clustering within the segment, segments can still be large. If we store more relevant logs (like 5xx and 4xx) in different segments from less relevant ones (like 2xx), the segments containing error and fault logs will be smaller (since error logs are generally fewer). This enables storage optimisations such as keeping more relevant logs (like 5xx logs) on hot storage (like the node's disk), while less relevant logs are stored directly in cheaper remote storage (e.g. AWS S3, Google Cloud Storage, MinIO).

If we store the groups together, we won't be able to build these further optimisations on top of the segment topology.

Let me know if this makes sense.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org