gsmiller commented on PR #841: URL: https://github.com/apache/lucene/pull/841#issuecomment-1151477547
Thanks @shaie! I'll have a look at the code you pushed. Before doing so, I'm going to take the opportunity to respond to your and Marc's questions/comments above :)

@mdmarshmallow:

> When we decide to optimize this (right after this PR is merged ideally), we would let MatchingFacetSetsCount be able to take a look at the FSM's passed to it and then determine if it should put the FSM into an R tree, KD tree, or just linearly scan based on the min and max of each FSM. I think this makes sense, but we also shouldn't discuss it too much here as I think this is for another PR. I think the point is we can optimize the facetsets package in its current state. With that being said, I do plan on writing the KD and R tree optimizations as soon as this is merged so I am still for this remaining a long[] API.

Right, that's how I'm thinking of this. It's actually not too different from what the existing range faceting implementation does. That implementation looks at the requested ranges and determines how to optimize: if none of the ranges overlap it takes one approach, and if any ranges do overlap it takes a different one. The caller isn't aware of any of this (nor should they be). I see this as similar (although a different problem): the caller describes the "matches" they want to facet on and `MatchingFacetSetCount` figures out the most efficient way to do that.

@shaie:

> [...] why we discussed having a fastMatchQuery optimization too [...]

+1, makes sense

> I think it's fine if we'll leave these optimizations for later, and even if that will change the API between MFSC and FSM, it's not a big deal yet.

+1

> We certainly can add such API. For "exact" matches it will return min=max right? Only for range ones they will be different. Are you proposing to return min[] and max[] arrays, one entry per dimension? Just to make sure I understood your proposal (it doesn't have to be two arrays, but you understand the question).

Right, that's how I'm thinking of it.

> Not intending to start a discussion on how to implement that, but just wanted to point out that fastMatchQuery is something we'll need anyway (I guess) for drilldown, therefore it might be worth to start with it first? And, I'd rather we have some baseline benchmark before we implement any optimization, preferably for several use cases, so that in the end we can propose which impl to use. E.g. maybe if you pass 1-2 FSMs, we won't need to create R/KD trees (they might even perform worse)? Anyway let's leave that for later.

+1, totally agree. After this initial implementation, I would propose we create some benchmark tasks that utilize it so we can have a reasonably principled approach to optimizing.
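To make the min/max idea above a bit more concrete, here's a rough, hypothetical sketch (the class and method names `SketchFacetSetMatcher`, `SketchMatchingFacetSetCounts`, `min()`, `max()` are invented for illustration and are not the PR's actual API): each matcher exposes per-dimension bounds (min == max for exact matches), and the counting side inspects those bounds to pick a strategy without the caller being involved.

```java
// Hypothetical sketch only -- not the classes added in this PR.
// Each matcher exposes per-dimension min/max bounds; the counting class
// can inspect them to decide how to match (linear scan now, KD/R-tree later).

abstract class SketchFacetSetMatcher {
  final int dims;

  SketchFacetSetMatcher(int dims) {
    this.dims = dims;
  }

  /** Lower bound per dimension; for an "exact" matcher this equals {@link #max()}. */
  abstract long[] min();

  /** Upper bound per dimension. */
  abstract long[] max();

  /** True if the given per-dimension values fall inside this matcher's bounds. */
  boolean matches(long[] dimValues) {
    long[] lo = min();
    long[] hi = max();
    for (int d = 0; d < dims; d++) {
      if (dimValues[d] < lo[d] || dimValues[d] > hi[d]) {
        return false;
      }
    }
    return true;
  }
}

class SketchMatchingFacetSetCounts {
  private final SketchFacetSetMatcher[] matchers;

  SketchMatchingFacetSetCounts(SketchFacetSetMatcher... matchers) {
    this.matchers = matchers;
  }

  /**
   * Count how many matchers accept the given facet set. With only 1-2 matchers a
   * linear scan is likely fine; with many, a KD/R-tree built from the min()/max()
   * bounds could be swapped in here without any caller-visible change -- which is
   * the optimization discussed above, and something benchmarks should drive.
   */
  int countMatches(long[] dimValues) {
    int count = 0;
    for (SketchFacetSetMatcher m : matchers) {
      if (m.matches(dimValues)) {
        count++;
      }
    }
    return count;
  }
}
```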