gsmiller commented on PR #13568: URL: https://github.com/apache/lucene/pull/13568#issuecomment-2249005915
I've spent some time wrapping my head around the proposed change but haven't looked at everything in detail yet. I wanted to provide some of my early questions and feedback though to see if any of these points have already been discussed or considered, etc. These are more high-level and structural, which is what I'm focusing on to start before providing more detailed feedback. In no particular order... 1. In general, it seems like we expose a lot of abstractions to the end-user to deal with. It would be nice if this could be simplified by either simplifying the need for some of the abstractions, or by adding some helper/sugar implementation to make doing common tasks easier. Just starting from the demo code (which is where I started), it feels like we push users right in the deep end trying to interact with all these concepts (recorders, cutters, bi-maps, chained iterators). 2. The java package organization "smells" a little funny to be at first. I wonder if it's organized well to limit API surface area? I will look at this in more detail in another pass, but I wonder if all the java sub-packages might be forcing some class definitions to have broader visibility than necessary (where-as a flatter package structure might allow a tighter API surface for users). 3. I'm struggling with the "ordinal abstraction" that's being forced on all the non-taxonomy faceting. It doesn't feel right. Maybe it's the best thing to do here (I don't have any other suggestions at the moment), but it certainly feels like we're forcing these faceting implementations that naturally interact with distinct long values or user-defined ranges into a paradigm where they have to "speak ordinal" so they can work with these protocol abstractions everywhere. My gut feel on this is that we're trying to take the protocol abstractions too far. I'll keep thinking about this but I'd like to hear more from you on how you arrived at this design since I'm sure you've thought a lot more about it. 4. The iterator logic (the "read" side after you do the accumulations) feels a little clunky to me at the moment. A couple thoughts: (a) Some of this seems very similar to the interaction between DocIdSetIterator and Scorer and I wonder if we share some idea from that. Specifically, what if the "recorders" expose ordinal iterators that act as shared state? And instead of having to call back to something like "getCount" with a specific ordinal, you just call "getCount" with no argument and get the count of the currently-positioned shared ordinal? The iteration logic likely has to iterate keys and values at the same time, so then you wouldn't be doing another map lookup on the "getValue" callbacks. (b) The comparable factory stuff feels a little awkward (the `OrdToComparable` stuff). Is there a way to have a more functional interface here than the need to "create" new comparable objects constantly (yes, I saw the reuse thing, but I wonder if we can simplify further)? More specifical ly, instead of having `T #getComparable(ord, ...)` could you not just have a `int #compareTo(ord)` method? 5. This is a little lower-level feedback, but everything that implements `GetOrd` also extends `SkeletalGetOrd`. Can we collapse these to simplify? Could we just have an abstract class like `OrdinalProvider` or something? I think the big one for me to resolve is this whole concept of trying to make non-taxonomy faceting "look like" taxonomy faceting by making it fit this "ordinal iteration" paradigm. I want to make sure we really feel good about that. If I was a first-time user to this module (which is how I essentially approached looking at this PR, starting with the demo code) I think I would be fairly confused and find it to be a bit clunky to work with. I'm sure you exhausted all other options you could think of already, but I'd like to get up-to-speed on that thinking. Thanks again for all the hard work! It's exciting to see all this progress! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org