gsmiller commented on code in PR #12089:
URL: https://github.com/apache/lucene/pull/12089#discussion_r1072871141
##
lucene/core/src/java/org/apache/lucene/search/DisiWrapper.java:
##
@@ -57,4 +57,14 @@ public DisiWrapper(Scorer scorer) {
matchCost = 0f;
}
}
+
+ p
gsmiller commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1416767759
@rmuir I modified the PR to update the existing `TermInSetQuery` in-place,
instead of introducing a new sandbox concept. I've added some test coverage and
re-ran all the benchmarks. The
rmuir commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1416779603
Hmm, @gsmiller I really personally don't like all this complexity and
optimizations mixed into `TermInSetQuery`. It becomes almost unmanageable.
This is why I like the IndexOrDocValu
rmuir commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1416780926
and when i look at the benchmarks, there are cases where IndexOrDV does
better, and there are cases where this one does worse. I'm not seeing a "clear
winner", so i'd personally lean towa
almogtavor commented on issue #11870:
URL: https://github.com/apache/lucene/issues/11870#issuecomment-1416785510
@rmuir @dweiss @uschindler Do you think that a PR in this subject that
creates the initial Docusaurus website with some basic Getting Started sections
will get accepted? I genera
rmuir commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1416786686
another way to say it: I see a lot more optimizations and conditionals in
the query than I see wins in the benchmark?
which of these truly matter? Can we remove some to simplify the
rmuir commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1416790181
i'll work up a draft PR trying to work it from the opposite direction (keep
queries split apart) so my suggestions will be more concrete. You did the hard
part already and made a benchmark
rmuir opened a new pull request, #12127:
URL: https://github.com/apache/lucene/pull/12127
There's no need to make things abstract: DocValues does the right thing
Optimizing for where no docs for the field in the segment exist is easy, simple
null check (replacing the existing one!)
--
rmuir commented on PR #12127:
URL: https://github.com/apache/lucene/pull/12127#issuecomment-1416834391
one nice benefit is, not all of these N-subclasses were checking for the
case where field doesn't exist in segment and optimizing it by returning `null`
Scorer, now the case is always opti
rmuir commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1416839513
I think some of the benchmarks needs to be revisited here anyway.
`DocValuesTermsQuery` is not in great shape, which may lead to bad results from
IndexOrDocValuesQuery and cause confusion.
rmuir opened a new pull request, #12128:
URL: https://github.com/apache/lucene/pull/12128
LongHashSet is used for the set of numbers, but it has some issues:
* tries too hard to extend AbstractSet, mostly for testing
* causes traps with boxing if you aren't careful
* complex hashcode
rmuir opened a new pull request, #12129:
URL: https://github.com/apache/lucene/pull/12129
* Optimize the common case that docs only have single values for the field
* In the multivalued case, terminate reading docvalues if they are > maximum
set ordinal
* Implement ScorerSupplier, so t
rmuir commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1416888408
I fired up 3 related PRs to at least make it more of a fair fight :)
#12127
#12128
#12129
--
This is an automated message from the Apache Git Service.
To respond to t
rmuir commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1416891188
> Anyway, back to the point about complexity vs. benefit, I 100% agree that
relying on `IndexOrDocValues` would be preferable if we can solve for cost
over-estimating. I'd pursued this pat
gsmiller commented on PR #12089:
URL: https://github.com/apache/lucene/pull/12089#issuecomment-1416895938
Thanks @rmuir! I'll have a look at the PRs tomorrow. I've got a bad case of
jet lag kicking in at the moment and my brain is turning to mush. In the
meantime, I took a pass at simplifyi
rmuir commented on PR #12129:
URL: https://github.com/apache/lucene/pull/12129#issuecomment-1416909246
moved this thing to `Sorted(Set)DocValuesField.newSlowSetQuery` as it is a
much better API to document its slowness safe;y and to encapsulate the guts.
it does not yet claim to help
rmuir commented on PR #12129:
URL: https://github.com/apache/lucene/pull/12129#issuecomment-1416915982
Another use-case this works with is for the IP address types, which is
really just some sugar over `BinaryPoint`:
`IndexOrDocValuesQuery(InetAddressPoint.newSetQuery(..),
SortedSetDocValu
17 matches
Mail list logo