epotyom commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2228412566
@stefanvodita thank you for reviewing! > Do you have any benchmark results to share? Hooking this into luceneutil might be tricky. I think even a vague sense of where this is better/worse than the existing facets would help. Our local test results show that throughput and CPU utilization are the same, but sandbox module has better latency. Although that doesn't mean that throughput and CPU util are the same in all cases - moving counting to a separate entity does add some overhead, as well as merging results in `reduce` call. We could also overlook some other efficiency improvements from the facets module. I've started looking into adding performance test tasks to luceneutil, but I think it requires some changes in current tasks as well, as we only seem to measure counting itself, but not query execution? And in the sandbox module we can't measure it separately. I'll continue looking into it. > Could this go into 9x? Have you tried? Yes, it is backwards compatible with 9x, there are some conflicts to solve when back porting though. I can share the commits if/when we want to backport. > This to make sure - do we need to mark things @lucene.experimental if they're in the sandbox? I checked a few other classes in sandbox module, they don't have the tag, so I suppose it goes without saying? But I can add it just in case, WDYT? > As I was saying, I haven't had a good look at the tests yet, but there're only a few for a very big PR. Are we sure we're covering everything? I like per-class unit tests when we can have them, although I understand that is a lot of work. Ah no, we only have some basic unit tests right now, we can definitely improve them, but I'd suggest we do it as a follow up step when/if the change is merged. It should be safe enough to do it that way as this is a sandbox module. As for non-sandbox changes, I believe existing unit tests cover existing APIs, and for newly added public methods we also have some coverage, and we can improve it later. > This replaces taxonomy and SSDV facets both, right? Together with point 2 then - what would become obsolete if this goes ahead, i.e. will this be strictly better than the other facets or will we end up with 3 implementations instead of 2? "replaces" is a strong word :-) Firstly, the change doesn't touch indexing changes, so users still have to choose between side car taxonomy index and SSDV. As for searching, both taxonomy and SSDV based facets can be computed with the new module, although we haven't implemented SSDV support yet. There are quite a few other TODO items as well, both for performance and features, so I think it is too early to talk about replacing existing facets module. Even when the TODOs are implemented, we might never want to actually deprecate existing facets module. Except for potential performance impact for other users, there are other limitations. For example, in current facets module we collect document IDs first, which allows deciding which facets to compute based on collected docs. Not sure if anyone does it, but someone might? We can extend the new module to support that case too by first collecting doc IDs, and then iterating through all docs and computing facets. It can still be faster to do it with the new module, as it requires single doc ID loop for all facets rather than a loop per each Facets. But I guess what I'm saying is that it might be too early to discuss replacing existing facets module at this stage. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org