jainankitk commented on PR #14439:
URL: https://github.com/apache/lucene/pull/14439#issuecomment-2794969733

   > I didn't mean to imply that the two solutions are the same, apologies if 
that's how it came across.
   
   Not at all. I was initially confused by the skipper logic myself, and only 
after spending some time on it did I realize that this approach is slightly 
different. So, thanks for reiterating the question.
   
   > I think you could start in HistogramCollector.getLeafCollector 
([code](https://github.com/apache/lucene/blob/4957766fcee52c534d786e3948fadf6d36c9779f/lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java#L50)).
 Right now we throw an exception if the field we're using isn't doc values 
([code](https://github.com/apache/lucene/blob/4957766fcee52c534d786e3948fadf6d36c9779f/lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java#L59)).
   
   Currently, a `Collector` doesn't need to be aware of the `Query` itself. 
Collectors are designed to collect individual doc IDs, or a `DocIdStream`, 
from the scorer. This `CustomCollector`, however, does not need the scorer to 
provide documents: it can `BulkCollect` documents itself, assuming the query 
is `MATCH_ALL` or a `PointRangeQuery` (where 
`PointRangeQuery.field == histogram.field`). Otherwise, it should fall back to 
the traditional methods for collecting matching documents.
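
   To make the distinction concrete, here is a rough, Lucene-free sketch of the two paths. Everything in it is hypothetical and for illustration only: `HistogramSketch`, `QueryKind`, `canBulkCollect`, `collectPerDoc` and `collectBulk` are made-up names, a sorted `long[]` stands in for the points index, and values are assumed to be non-negative and to fit within `numBuckets` buckets. It is not the code in this PR.

```java
import java.util.Arrays;

// Hypothetical model only: none of these names are Lucene APIs.
final class HistogramSketch {

    enum QueryKind { MATCH_ALL, POINT_RANGE, OTHER }

    // Bulk collection is only safe when the query is match-all, or a range
    // query on the same field the histogram is computed over.
    static boolean canBulkCollect(QueryKind kind, String queryField, String histogramField) {
        if (kind == QueryKind.MATCH_ALL) {
            return true;
        }
        return kind == QueryKind.POINT_RANGE && histogramField.equals(queryField);
    }

    // Traditional path: the scorer hands over matching docs one by one and
    // each doc's value is looked up and assigned to a bucket.
    static long[] collectPerDoc(long[] valueByDoc, int[] matchingDocs, long bucketWidth, int numBuckets) {
        long[] counts = new long[numBuckets];
        for (int doc : matchingDocs) {
            counts[(int) (valueByDoc[doc] / bucketWidth)]++;
        }
        return counts;
    }

    // Bulk path: no per-doc iteration. For each bucket, count the values in
    // [max(lower, bucketStart), min(upper + 1, bucketEnd)) with two binary
    // searches over the sorted values. Assumes upper < Long.MAX_VALUE.
    static long[] collectBulk(long[] sortedValues, long lower, long upper, long bucketWidth, int numBuckets) {
        long[] counts = new long[numBuckets];
        for (int b = 0; b < numBuckets; b++) {
            long from = Math.max(lower, b * bucketWidth);
            long toExclusive = Math.min(upper + 1, (b + 1) * bucketWidth);
            if (from < toExclusive) {
                counts[b] = lowerBound(sortedValues, toExclusive) - lowerBound(sortedValues, from);
            }
        }
        return counts;
    }

    // Index of the first element >= key (a.length if there is none).
    static int lowerBound(long[] a, long key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        long[] valueByDoc = {31, 5, 47, 12, 25, 18};   // value of each doc, by doc ID
        long[] sorted = valueByDoc.clone();
        Arrays.sort(sorted);                           // stand-in for the points index

        long[] counts;
        if (canBulkCollect(QueryKind.POINT_RANGE, "price", "price")) {
            counts = collectBulk(sorted, 10, 35, 10, 5);
        } else {
            int[] matchingDocs = {0, 3, 4, 5};         // docs whose value is in [10, 35]
            counts = collectPerDoc(valueByDoc, matchingDocs, 10, 5);
        }
        System.out.println(Arrays.toString(counts));
    }
}
```

   Both paths produce the same bucket counts for the same query; the bulk path just does work proportional to the number of buckets rather than the number of matching documents, which is where the savings would come from under these assumptions.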
   
   
   > At a higher level, I'm curious if you had a use-case in mind.
   
   This optimization can be applied to the following use cases:
   * Number of sales by price range (0-50, 50-100, 100-250, ...)
   * Number of website visits for each day in a month
   
   Just as a data point, this change helped us improve date histogram latency 
from 5168 ms to 160 ms (~32x!!) for the big5 workload in OpenSearch.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org
