Re: [PR] Compute facets while collecting [lucene]

via GitHub Mon, 15 Jul 2024 05:40:41 -0700


epotyom commented on PR #13568:
URL: https://github.com/apache/lucene/pull/13568#issuecomment-2228412566


   @stefanvodita thank you for reviewing! 
   
   > Do you have any benchmark results to share? Hooking this into luceneutil 
might be tricky. I think even a vague sense of where this is better/worse than 
the existing facets would help.
   
   Our local test results show that throughput and CPU utilization are the 
same, but sandbox module has better latency. Although that doesn't mean that 
throughput and CPU util are the same in all cases - moving counting to a 
separate entity does add some overhead, as well as merging results in `reduce` 
call. We could also overlook some other efficiency improvements from the facets 
module.
   
   I've started looking into adding performance test tasks to luceneutil, but I 
think it requires some changes in current tasks as well, as we only seem to 
measure counting itself, but not query execution? And in the sandbox module we 
can't measure it separately. I'll continue looking into it.
   
   > Could this go into 9x? Have you tried?
   
   Yes, it is backwards compatible with 9x, there are some conflicts to solve 
when back porting though. I can share the commits if/when we want to backport.
   
   > This to make sure - do we need to mark things @lucene.experimental if 
they're in the sandbox?
   
   I checked a few other classes in sandbox module, they don't have the tag, so 
I suppose it goes without saying? But I can add it just in case, WDYT?
   
   > As I was saying, I haven't had a good look at the tests yet, but there're 
only a few for a very big PR. Are we sure we're covering everything? I like 
per-class unit tests when we can have them, although I understand that is a lot 
of work.
   
   Ah no, we only have some basic unit tests right now, we can definitely 
improve them, but I'd suggest we do it as a follow up step when/if the change 
is merged. It should be safe enough to do it that way as this is a sandbox 
module. As for non-sandbox changes, I believe existing unit tests cover 
existing APIs, and for newly added public methods we also have some coverage, 
and we can improve it later.
   
   > This replaces taxonomy and SSDV facets both, right? Together with point 2 
then - what would become obsolete if this goes ahead, i.e. will this be 
strictly better than the other facets or will we end up with 3 implementations 
instead of 2?
   
   "replaces" is a strong word :-) Firstly, the change doesn't touch indexing 
changes, so users still have to choose between side car taxonomy index and 
SSDV. As for searching, both taxonomy and SSDV based facets can be computed 
with the new module, although we haven't implemented SSDV support yet. There 
are quite a few other TODO items as well, both for performance and features, so 
I think it is too early to talk about replacing existing facets module.
   
   Even when the TODOs are implemented, we might never want to actually 
deprecate existing facets module. Except for potential performance impact for 
other users, there are other limitations. For example, in current facets module 
we collect document IDs first, which allows deciding which facets to compute 
based on collected docs. Not sure if anyone does it, but someone might? We can 
extend the new module to support that case too by first collecting doc IDs, and 
then iterating through all docs and computing facets. It can still be faster to 
do it with the new module, as it requires single doc ID loop for all facets 
rather than a loop per each Facets. But I guess what I'm saying is that it 
might be too early to discuss replacing existing facets module at this stage.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Compute facets while collecting [lucene]

Reply via email to