gsmiller commented on PR #841: URL: https://github.com/apache/lucene/pull/841#issuecomment-1151477547
Thanks @shaie! I'll have a look at the code you pushed. Before doing so, I'm going to take the opportunity to respond to your and Marc's questions/comments above :)

@mdmarshmallow:

> When we decide to optimize this (right after this PR is merged ideally), we would let MatchingFacetSetsCount be able to take a look at the FSM's passed to it and then determine if it should put the FSM into an R tree, KD tree, or just linearly scan based on the min and max of each FSM. I think this makes sense, but we also shouldn't discuss it too much here as I think this is for another PR. I think the point is we can optimize the facetsets package in its current state. With that being said, I do plan on writing the KD and R tree optimizations as soon as this is merged so I am still for this remaining a long[] API.

Right, that's how I'm thinking of this. It's actually not too different from what the existing range faceting implementation does. That implementation looks at the requested ranges and determines how to optimize: if none of the ranges overlap it takes one approach, and if any ranges do overlap it takes a different one. The caller isn't aware of any of this (nor should they be). I see this as similar (although a different problem): the caller describes the "matches" they want to facet on and `MatchingFacetSetCount` figures out the most efficient way to do that.

@shaie:

> [...] why we discussed having a fastMatchQuery optimization too [...]

+1, makes sense

> I think it's fine if we'll leave these optimizations for later, and even if that will change the API between MFSC and FSM, it's not a big deal yet.

+1

> We certainly can add such API. For "exact" matches it will return min=max right? Only for range ones they will be different. Are you proposing to return min[] and max[] arrays, one entry per dimension? Just to make sure I understood your proposal (it doesn't have to be two arrays, but you understand the question).

Right, that's how I'm thinking of it.

> Not intending to start a discussion on how to implement that, but just wanted to point out that fastMatchQuery is something we'll need anyway (I guess) for drilldown, therefore it might be worth to start with it first? And, I'd rather we have some baseline benchmark before we implement any optimization, preferably for several use cases, so that in the end we can propose which impl to use. E.g. maybe if you pass 1-2 FSMs, we won't need to create R/KD trees (they might even perform worse)? Anyway let's leave that for later.

+1, totally agree. After this initial implementation, I would propose we create some benchmark tasks that utilize it so we can have a reasonably principled approach to optimizing.
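To make the min/max idea above a bit more concrete, here's a rough, hypothetical sketch (the class and method names `SketchFacetSetMatcher`, `SketchMatchingFacetSetCounts`, `min()`, `max()` are invented for illustration and are not the PR's actual API): each matcher exposes per-dimension bounds (min == max for exact matches), and the counting side inspects those bounds to pick a strategy without the caller being involved.

```java
// Hypothetical sketch only -- not the classes added in this PR.
// Each matcher exposes per-dimension min/max bounds; the counting class
// can inspect them to decide how to match (linear scan now, KD/R-tree later).

abstract class SketchFacetSetMatcher {
  final int dims;

  SketchFacetSetMatcher(int dims) {
    this.dims = dims;
  }

  /** Lower bound per dimension; for an "exact" matcher this equals {@link #max()}. */
  abstract long[] min();

  /** Upper bound per dimension. */
  abstract long[] max();

  /** True if the given per-dimension values fall inside this matcher's bounds. */
  boolean matches(long[] dimValues) {
    long[] lo = min();
    long[] hi = max();
    for (int d = 0; d < dims; d++) {
      if (dimValues[d] < lo[d] || dimValues[d] > hi[d]) {
        return false;
      }
    }
    return true;
  }
}

class SketchMatchingFacetSetCounts {
  private final SketchFacetSetMatcher[] matchers;

  SketchMatchingFacetSetCounts(SketchFacetSetMatcher... matchers) {
    this.matchers = matchers;
  }

  /**
   * Count how many matchers accept the given facet set. With only 1-2 matchers a
   * linear scan is likely fine; with many, a KD/R-tree built from the min()/max()
   * bounds could be swapped in here without any caller-visible change -- which is
   * the optimization discussed above, and something benchmarks should drive.
   */
  int countMatches(long[] dimValues) {
    int count = 0;
    for (SketchFacetSetMatcher m : matchers) {
      if (m.matches(dimValues)) {
        count++;
      }
    }
    return count;
  }
}
```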