[GitHub] [lucene] shaie commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

GitBox Mon, 06 Jun 2022 22:02:48 -0700


shaie commented on PR #841:
URL: https://github.com/apache/lucene/pull/841#issuecomment-1148199584


   > I think the biggest difference between our implementations is the naming 
scheme.
   
   That's right. We could get both APIs to the same state of extensibility, but 
the main difference is the naming. Under the hood it's about encoding a series 
of `long[]` and implementing some aggregation function over them.
   
   > Maybe something like `RangedAndExactFacetSetMatcher` that lets user 
specify which which dimensions they want as exact matches, the details don't 
matter too much right now.
   
   We could totally offer that out-of-the-box, but I prefer that we do so in a 
different PR. Not sure about the API details yet, and whether it can truly be a 
generic impl or not, but it's totally something we should think about.
   
   > but I still feel like overriding the `matches` function is still a bit of 
an expert use case and no super clear maybe?
   
   Indeed, implementing your own `FacetSetMatcher` is expert? I mean, OOTB we 
offer few generic implementations which you can use to pretty much implement 
your facet set aggregation. If however you require a different _matching_ impl, 
or more specialized one, you can implement your own `FacetSetMatcher`. Also, 
given the OOTB examples, one should easily be able to understand how to 
implement their own.
   
   > So instead of letting the user override `match()`,
   
   Just so I understand and clarify that we're talking about the same thing: 
there are two classes here - `MatchingFacetSetCounts` (MFSC) and 
`FacetSetMatcher` (FSM). I think that MFSC can be `final` since I don't expect 
users to extend it. All they need is to provide a list of FSMs. Extending FSM 
is what you refer to here? This is indeed the more _expert_ API, which I hope 
users won't have to extend, given the OOTB classes.
   
   > we can create an `abstract readToLong()` that would read the field that 
the users created and stored
   
   So I wanted to allow FSM impls to be as optimal as they need, hence they are 
given a `byte[]`. We could consider passing `long[]` in the API, but that would 
mean that MFSC needs to deserialize the bytes to longs, which is what we wanted 
to avoid in the HyperRectangle impl. For this reason I don't think FSM should 
make you read the bytes into longs, let the impl do that if it's more 
convenient for it. Maybe we'll even find out that deserializing them to 
`long[]` (or reading them one `long` at a time) performs better, especially if 
the same dimension is evaluated over and over in a single `match()` call?
   
   I would also like to add that this sort of API is always something we can 
add in the future, in the form of `LongFSM` which overrides `match()` and adds 
its own `abstract long readLong(...)` version. That is, I'm not sure we need to 
finalize the details of this API just yet?
   
   > What if we made that `abstract FacetSet<T>` and provide an `abstract 
long[] writeToLong(T... values)`.
   
   Do you see more than two impls here? I.e. I put `FacetSet` but I wrote that 
we should also have `DoubleFacetSet` which encodes doubles. Any other impls 
here? Also, I'm not sure we should offer this extension to users, they can 
always implement that themselves? If you think about it, we eventually utilize 
BinaryDV to encode some data and aggregate it. The indexing + reading part of 
the code is not that complex so users can just write their own, and reuse 
existing classes where they fit?
   
   If users can decide how to write their `long[]`, they will also need to 
implement the reader, which in this PR is MFSC? If we'll change the aggregation 
API to be `long`-based then maybe? I think though that we may not need the 
generic here anyway. `FacetSet` can store `long[]` with a default impl for 
`toBytes[]` and you can override it if you wish. But this is super-expert API 
IMO, that I think we should expose only later if the need arises.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] shaie commented on pull request #841: LUCENE-10274: Add hyperrectangle faceting capabilities

Reply via email to