Re: [PR] Compute facets while collecting [lucene]

via GitHub Wed, 24 Jul 2024 15:35:47 -0700


gsmiller commented on PR #13568:
URL: https://github.com/apache/lucene/pull/13568#issuecomment-2249005915


   I've spent some time wrapping my head around the proposed change but haven't 
looked at everything in detail yet. I wanted to provide some of my early 
questions and feedback though to see if any of these points have already been 
discussed or considered, etc. These are more high-level and structural, which 
is what I'm focusing on to start before providing more detailed feedback. In no 
particular order...
   
   1. In general, it seems like we expose a lot of abstractions to the end-user 
to deal with. It would be nice if this could be simplified by either 
simplifying the need for some of the abstractions, or by adding some 
helper/sugar implementation to make doing common tasks easier. Just starting 
from the demo code (which is where I started), it feels like we push users 
right in the deep end trying to interact with all these concepts (recorders, 
cutters, bi-maps, chained iterators).
   2. The java package organization "smells" a little funny to be at first. I 
wonder if it's organized well to limit API surface area? I will look at this in 
more detail in another pass, but I wonder if all the java sub-packages might be 
forcing some class definitions to have broader visibility than necessary 
(where-as a flatter package structure might allow a tighter API surface for 
users).
   3. I'm struggling with the "ordinal abstraction" that's being forced on all 
the non-taxonomy faceting. It doesn't feel right. Maybe it's the best thing to 
do here (I don't have any other suggestions at the moment), but it certainly 
feels like we're forcing these faceting implementations that naturally interact 
with distinct long values or user-defined ranges into a paradigm where they 
have to "speak ordinal" so they can work with these protocol abstractions 
everywhere. My gut feel on this is that we're trying to take the protocol 
abstractions too far. I'll keep thinking about this but I'd like to hear more 
from you on how you arrived at this design since I'm sure you've thought a lot 
more about it.
   4. The iterator logic (the "read" side after you do the accumulations) feels 
a little clunky to me at the moment. A couple thoughts: (a) Some of this seems 
very similar to the interaction between DocIdSetIterator and Scorer and I 
wonder if we share some idea from that. Specifically, what if the "recorders" 
expose ordinal iterators that act as shared state? And instead of having to 
call back to something like "getCount" with a specific ordinal, you just call 
"getCount" with no argument and get the count of the currently-positioned 
shared ordinal? The iteration logic likely has to iterate keys and values at 
the same time, so then you wouldn't be doing another map lookup on the 
"getValue" callbacks. (b) The comparable factory stuff feels a little awkward 
(the `OrdToComparable` stuff). Is there a way to have a more functional 
interface here than the need to "create" new comparable objects constantly 
(yes, I saw the reuse thing, but I wonder if we can simplify further)? More 
specifical
 ly, instead of having `T #getComparable(ord, ...)` could you not just have a 
`int #compareTo(ord)` method?
   5. This is a little lower-level feedback, but everything that implements 
`GetOrd` also extends `SkeletalGetOrd`. Can we collapse these to simplify? 
Could we just have an abstract class like `OrdinalProvider` or something?
   
   I think the big one for me to resolve is this whole concept of trying to 
make non-taxonomy faceting "look like" taxonomy faceting by making it fit this 
"ordinal iteration" paradigm. I want to make sure we really feel good about 
that. If I was a first-time user to this module (which is how I essentially 
approached looking at this PR, starting with the demo code) I think I would be 
fairly confused and find it to be a bit clunky to work with. I'm sure you 
exhausted all other options you could think of already, but I'd like to get 
up-to-speed on that thinking. Thanks again for all the hard work! It's exciting 
to see all this progress!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Compute facets while collecting [lucene]

Reply via email to