dsmiley commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2730909459
BTW I don't have plans to explore this further. Anyone should feel free to
take over. Or abandon it if nobody cares -- I admit it's very unusual to even
have a top level disjunction, let
jpountz commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2729988278
The current approach is probably not the fastest indeed. We should add a
task to nightly benchmarks if we want to optimize this. Something like a
disjunction of phrase queries (possibly
dsmiley commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2729965328
An aside: `org.apache.lucene.search.DisjunctionScorer.TwoPhase#matches`
looks kind of sad, in that each `matches()` call is going to build a priority
queue of "unverified matches" (DisiWr
dsmiley commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2728182572
I could imagine improving BooleanScorer so that the TPI clauses are
separated and converted to a filter around the collector to try to match docs
*not* collected (i.e. test for docs inbe
jpountz commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2727625240
> If one or more DISI has a high cost (irrespective of TPIs), thus matching
many docs, I could see avoiding BS1 as well.
I imagine that your idea is that if most of the cost comes
jpountz commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2727629320
In case you missed it, `BooleanScorer` recently got optimizations that make
it hard for `DisjunctionScorer` to beat when clauses are `PostingsEnum`s:
- `DocIdSetIterator#intoBitSet` he
dsmiley commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2727528269
If one or more DISI has a high cost (irrespective of TPIs), thus matching
many docs, I could see avoiding BS1 as well.
An aside, if we are going to refer to these as BS1 vs BS2, th
jpountz commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2727502419
BS2 uses a heap to merge multiple `DocIdSetIterator`s. Unfortunately,
reordering this heap on every call to `nextDoc()` or `advance(int)` is not
completely free and BS1's approach of loa
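To make the trade-off concrete, here is a minimal, self-contained Java sketch over plain sorted `int[]` doc ID lists rather than Lucene's real `DocIdSetIterator`s. `heapUnion` merges doc-at-a-time through a min-heap, paying a heap removal and re-insertion per clause per matching doc; `windowUnion` loads each clause's docs for a fixed window into a bitset, one clause at a time, and then scans the set bits, which is roughly the windowed idea attributed to BS1 above. The class and method names and the 2048-doc window are illustrative assumptions, not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class DisjunctionSketch {
    // Assumption: window size chosen for illustration only.
    static final int WINDOW = 2048;

    // Doc-at-a-time union via a min-heap keyed on each clause's current doc.
    // Each heap entry is {currentDoc, clauseIndex, positionInClause}.
    static int[] heapUnion(int[][] postings) {
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < postings.length; i++) {
            if (postings[i].length > 0) heap.add(new int[] {postings[i][0], i, 0});
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek()[0];
            out.add(doc);
            // Every clause positioned on `doc` is popped, advanced and re-inserted:
            // this per-doc heap maintenance is the cost mentioned above.
            while (!heap.isEmpty() && heap.peek()[0] == doc) {
                int[] top = heap.poll();
                int next = top[2] + 1;
                int[] clause = postings[top[1]];
                if (next < clause.length) heap.add(new int[] {clause[next], top[1], next});
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    // Window-at-a-time union: each clause's docs in [base, base + WINDOW) are
    // OR-ed into a bitset, then set bits are emitted in doc ID order.
    // Assumes every doc ID is < maxDoc and each clause is sorted ascending.
    static int[] windowUnion(int[][] postings, int maxDoc) {
        List<Integer> out = new ArrayList<>();
        int[] pos = new int[postings.length];
        long[] bits = new long[WINDOW / 64];
        for (int base = 0; base < maxDoc; base += WINDOW) {
            Arrays.fill(bits, 0L);
            int end = Math.min(base + WINDOW, maxDoc);
            for (int c = 0; c < postings.length; c++) {
                int[] clause = postings[c];
                while (pos[c] < clause.length && clause[pos[c]] < end) {
                    int d = clause[pos[c]++] - base;
                    bits[d >>> 6] |= 1L << d; // Java masks the shift to d & 63
                }
            }
            for (int w = 0; w < bits.length; w++) {
                long word = bits[w];
                while (word != 0) {
                    out.add(base + (w << 6) + Long.numberOfTrailingZeros(word));
                    word &= word - 1; // clear the lowest set bit
                }
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[][] postings = {{1, 4, 9, 3000}, {2, 4, 7}, {4, 9, 2500}};
        System.out.println(Arrays.toString(heapUnion(postings)));       // [1, 2, 4, 7, 9, 2500, 3000]
        System.out.println(Arrays.toString(windowUnion(postings, 4096))); // [1, 2, 4, 7, 9, 2500, 3000]
    }
}
```

Both produce the same union; the difference is where the work goes. The heap pays per doc per clause, while the window approach pays a cheap bit-set per posting plus a linear scan of the window, which tends to win when clauses are dense.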
dsmiley commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2727499162
Thanks for confirming the problem. The collect-per-clause approach is
surprising to me; what would benefit from that algorithm? Wouldn't that
_only_ be in fact _needed_ if scores
jpountz commented on PR #14357:
URL: https://github.com/apache/lucene/pull/14357#issuecomment-2727461724
+1, let's use `DisjunctionSumScorer` (which already supports two-phase
iteration) when one of the clauses exposes a non-null two-phase iterator?
dsmiley opened a new pull request, #14357:
URL: https://github.com/apache/lucene/pull/14357
Showing a performance problem here in BooleanScorer (used for disjunctions
-- "OR"). BS will score all its clauses independently, overlapping the same
documents, some of which might be expensive wit
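A toy model of the concern, assuming scores are not needed: each hypothetical `Clause` has a sorted list of approximation candidates plus an expensive per-doc check standing in for a two-phase iterator's `matches()`. Collecting clause-at-a-time runs that check for every candidate of every clause, even on docs another clause already confirmed, while a doc-at-a-time disjunction can stop at the first clause that confirms a doc. The names are illustrative, not Lucene API, and the sketch ignores scoring, which would require evaluating every clause anyway.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

public class VerificationCostSketch {
    // Hypothetical clause: cheap approximation (sorted candidate doc IDs)
    // plus an expensive confirmation step.
    static final class Clause {
        final int[] candidates;
        final IntPredicate expensiveMatch;

        Clause(int[] candidates, IntPredicate expensiveMatch) {
            this.candidates = candidates;
            this.expensiveMatch = expensiveMatch;
        }
    }

    // Clause-at-a-time: every clause verifies all of its own candidates.
    // Returns the number of expensive checks performed.
    static int perClauseCalls(List<Clause> clauses, boolean[] matched) {
        int calls = 0;
        for (Clause c : clauses) {
            for (int doc : c.candidates) {
                calls++;
                if (c.expensiveMatch.test(doc)) matched[doc] = true;
            }
        }
        return calls;
    }

    // Doc-at-a-time without scores: verify clauses on a doc only until one matches.
    // Scans every doc ID for simplicity; a real implementation would iterate the
    // union of the approximations instead.
    static int earlyExitCalls(List<Clause> clauses, boolean[] matched) {
        int calls = 0;
        for (int doc = 0; doc < matched.length; doc++) {
            for (Clause c : clauses) {
                if (Arrays.binarySearch(c.candidates, doc) < 0) continue; // not a candidate
                calls++;
                if (c.expensiveMatch.test(doc)) {
                    matched[doc] = true;
                    break; // no need to verify the remaining clauses
                }
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        IntPredicate alwaysTrue = doc -> true;
        List<Clause> clauses = List.of(
            new Clause(new int[] {1, 2, 3}, alwaysTrue),
            new Clause(new int[] {1, 2, 3}, alwaysTrue));
        boolean[] a = new boolean[4];
        boolean[] b = new boolean[4];
        System.out.println(perClauseCalls(clauses, a)); // 6
        System.out.println(earlyExitCalls(clauses, b)); // 3
        System.out.println(Arrays.equals(a, b));        // true
    }
}
```

Both strategies match the same docs; with two fully overlapping clauses the clause-at-a-time version performs twice as many expensive checks.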