shaie commented on PR #841: URL: https://github.com/apache/lucene/pull/841#issuecomment-1148199584
> I think the biggest difference between our implementations is the naming scheme. That's right. We could get both APIs to the same state of extensibility, but the main difference is the naming. Under the hood it's about encoding a series of `long[]` and implementing some aggregation function over them. > Maybe something like `RangedAndExactFacetSetMatcher` that lets user specify which which dimensions they want as exact matches, the details don't matter too much right now. We could totally offer that out-of-the-box, but I prefer that we do so in a different PR. Not sure about the API details yet, and whether it can truly be a generic impl or not, but it's totally something we should think about. > but I still feel like overriding the `matches` function is still a bit of an expert use case and no super clear maybe? Indeed, implementing your own `FacetSetMatcher` is expert? I mean, OOTB we offer few generic implementations which you can use to pretty much implement your facet set aggregation. If however you require a different _matching_ impl, or more specialized one, you can implement your own `FacetSetMatcher`. Also, given the OOTB examples, one should easily be able to understand how to implement their own. > So instead of letting the user override `match()`, Just so I understand and clarify that we're talking about the same thing: there are two classes here - `MatchingFacetSetCounts` (MFSC) and `FacetSetMatcher` (FSM). I think that MFSC can be `final` since I don't expect users to extend it. All they need is to provide a list of FSMs. Extending FSM is what you refer to here? This is indeed the more _expert_ API, which I hope users won't have to extend, given the OOTB classes. > we can create an `abstract readToLong()` that would read the field that the users created and stored So I wanted to allow FSM impls to be as optimal as they need, hence they are given a `byte[]`. We could consider passing `long[]` in the API, but that would mean that MFSC needs to deserialize the bytes to longs, which is what we wanted to avoid in the HyperRectangle impl. For this reason I don't think FSM should make you read the bytes into longs, let the impl do that if it's more convenient for it. Maybe we'll even find out that deserializing them to `long[]` (or reading them one `long` at a time) performs better, especially if the same dimension is evaluated over and over in a single `match()` call? I would also like to add that this sort of API is always something we can add in the future, in the form of `LongFSM` which overrides `match()` and adds its own `abstract long readLong(...)` version. That is, I'm not sure we need to finalize the details of this API just yet? > What if we made that `abstract FacetSet<T>` and provide an `abstract long[] writeToLong(T... values)`. Do you see more than two impls here? I.e. I put `FacetSet` but I wrote that we should also have `DoubleFacetSet` which encodes doubles. Any other impls here? Also, I'm not sure we should offer this extension to users, they can always implement that themselves? If you think about it, we eventually utilize BinaryDV to encode some data and aggregate it. The indexing + reading part of the code is not that complex so users can just write their own, and reuse existing classes where they fit? If users can decide how to write their `long[]`, they will also need to implement the reader, which in this PR is MFSC? If we'll change the aggregation API to be `long`-based then maybe? I think though that we may not need the generic here anyway. `FacetSet` can store `long[]` with a default impl for `toBytes[]` and you can override it if you wish. But this is super-expert API IMO, that I think we should expose only later if the need arises. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org