Re: [PR] Add test for parsing brackets in range queries [lucene]

2024-05-14 Thread via GitHub
github-actions[bot] commented on PR #13323: URL: https://github.com/apache/lucene/pull/13323#issuecomment-2111363928 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Fix numDeletesToMerge for unchanged segments [lucene]

2024-05-14 Thread via GitHub
github-actions[bot] commented on PR #13324: URL: https://github.com/apache/lucene/pull/13324#issuecomment-2111363892 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Fix weird NRT bug #13353 [lucene]

2024-05-14 Thread via GitHub
benwtrent commented on PR #13369: URL: https://github.com/apache/lucene/pull/13369#issuecomment-2110978960 Pinging some folks on review. This is a weird place in the codebase I haven't messed with before. This deserves careful review. Checks are green, but I may have missed some corners.

[PR] Fix weird NRT bug #13353 [lucene]

2024-05-14 Thread via GitHub
benwtrent opened a new pull request, #13369: URL: https://github.com/apache/lucene/pull/13369 The issue outlines the problem. When we have point value dimensions, segment core readers assume that there will be point files. However, when allowing soft deletes and a document fails inde

Re: [PR] Fix TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults [lucene]

2024-05-14 Thread via GitHub
benwtrent commented on PR #13361: URL: https://github.com/apache/lucene/pull/13361#issuecomment-2110964898 > Do we still want to keep the increased k? I would rather not, we keep bumping it up, eventually we are going to stop searching in the graph altogether and just brute force, whi

Re: [I] Significant drop in recall for int8 scalar quantization using maximum_inner_product [lucene]

2024-05-14 Thread via GitHub
benwtrent commented on issue #13350: URL: https://github.com/apache/lucene/issues/13350#issuecomment-2110737920 @naveentatikonda I used the dataset you linked. I simply downloaded the file. Ground truth is just the brute force nearest neighbors. I used the "test" set as the queries (1

Re: [I] Significant drop in recall for int8 scalar quantization using maximum_inner_product [lucene]

2024-05-14 Thread via GitHub
naveentatikonda commented on issue #13350: URL: https://github.com/apache/lucene/issues/13350#issuecomment-2110683420 > OK, I ran it again, on my index where the flush was set at 28MB & force merged. This time I ran it over all 10k queries (previously it was just 1k, as calculating the true

Re: [PR] Make `IndexInput#prefetch` take an offset. [lucene]

2024-05-14 Thread via GitHub
jpountz merged PR #13363: URL: https://github.com/apache/lucene/pull/13363 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Make Weight#scorerSupplier abstract, Weight#scorer final [lucene]

2024-05-14 Thread via GitHub
jpountz merged PR #13319: URL: https://github.com/apache/lucene/pull/13319

Re: [PR] Add per-field knn vector format info in SegmentInfo [lucene]

2024-05-14 Thread via GitHub
tteofili closed pull request #13367: Add per-field knn vector format info in SegmentInfo URL: https://github.com/apache/lucene/pull/13367

Re: [PR] Add per-field knn vector format info in SegmentInfo [lucene]

2024-05-14 Thread via GitHub
tteofili commented on PR #13367: URL: https://github.com/apache/lucene/pull/13367#issuecomment-2110564180 you're right @jpountz , we can probably get away with `fieldInfo.getAttribute(PerFieldKnnVectorFormat.PER_FIELD_FORMAT_KEY)`, I didn't notice that, thanks!

Re: [I] Remove Accountable interface where it's not needed [lucene]

2024-05-14 Thread via GitHub
jpountz commented on issue #13280: URL: https://github.com/apache/lucene/issues/13280#issuecomment-2110539035 Implemented via #13330.

Re: [I] Remove Accountable interface where it's not needed [lucene]

2024-05-14 Thread via GitHub
jpountz closed issue #13280: Remove Accountable interface where it's not needed URL: https://github.com/apache/lucene/issues/13280

Re: [I] Suggestion about LRUQueryCache Optimization [lucene]

2024-05-14 Thread via GitHub
jpountz closed issue #13318: Suggestion about LRUQueryCache Optimization URL: https://github.com/apache/lucene/issues/13318

Re: [I] Suggestion about LRUQueryCache Optimization [lucene]

2024-05-14 Thread via GitHub
jpountz commented on issue #13318: URL: https://github.com/apache/lucene/issues/13318#issuecomment-2110537906 Fixed by #13306.

Re: [I] Enhance DisjunctionMaxQuery explanation to include details in case there was no match [lucene]

2024-05-14 Thread via GitHub
jpountz closed issue #13357: Enhance DisjunctionMaxQuery explanation to include details in case there was no match URL: https://github.com/apache/lucene/issues/13357

Re: [PR] Add sub query explanations in DisjunctionMaxQuery#explain on no-match [lucene]

2024-05-14 Thread via GitHub
jpountz merged PR #13362: URL: https://github.com/apache/lucene/pull/13362

Re: [PR] Fix TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults [lucene]

2024-05-14 Thread via GitHub
timgrein commented on PR #13361: URL: https://github.com/apache/lucene/pull/13361#issuecomment-2110503604 @benwtrent The beam width for the failing test case was the smallest value possible `5`. Increased the minimum to `10` according to your suggestion. Do we still want to keep the increased

Re: [PR] Fix TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults [lucene]

2024-05-14 Thread via GitHub
benwtrent commented on PR #13361: URL: https://github.com/apache/lucene/pull/13361#issuecomment-2110476639 @timgrein what is the beamwidth set to in the failing case? We may want to increase the beamWidth size to just make the test more consistent. ``` int beamWidth = rando

Re: [PR] Disjunction as CompetitiveIterator for numeric dynamic pruning [lucene]

2024-05-14 Thread via GitHub
jpountz commented on code in PR #13221: URL: https://github.com/apache/lucene/pull/13221#discussion_r1600034795 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -207,102 +208,91 @@ private void updateCompetitiveIterator() throws IOExcep

Re: [PR] Add sub query explanations in DisjunctionMaxQuery#explain on no-match [lucene]

2024-05-14 Thread via GitHub
timgrein commented on PR #13362: URL: https://github.com/apache/lucene/pull/13362#issuecomment-2110435055 Thanks for the reviews! Added it to the improvement section ✅ @jpountz

Re: [PR] Replace Map by primitive IntObjectHashMap. [lucene]

2024-05-14 Thread via GitHub
bruno-roustant commented on code in PR #13368: URL: https://github.com/apache/lucene/pull/13368#discussion_r1600171257 ## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/SuggestibleEntryCache.java: ## @@ -48,22 +45,33 @@ private SuggestibleEntryCache(Map bui

[PR] Replace Map by primitive IntObjectHashMap. [lucene]

2024-05-14 Thread via GitHub
bruno-roustant opened a new pull request, #13368: URL: https://github.com/apache/lucene/pull/13368 Also replace some Map by IntIntHashMap, if they don't rely on null value. The goal is to gain globally some memory, maybe some perf on some spots that call the map intensively, with a r

Re: [PR] Add per-field knn vector format info in SegmentInfo [lucene]

2024-05-14 Thread via GitHub
jpountz commented on PR #13367: URL: https://github.com/apache/lucene/pull/13367#issuecomment-2110416918 I'm a bit confused: what is the benefit of having it on segment infos in addition to field infos?

Re: [PR] Align toString methods in geo module [lucene]

2024-05-14 Thread via GitHub
tteofili merged PR #13302: URL: https://github.com/apache/lucene/pull/13302

Re: [PR] Fix TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults [lucene]

2024-05-14 Thread via GitHub
timgrein commented on PR #13361: URL: https://github.com/apache/lucene/pull/13361#issuecomment-2110381080 @benwtrent Without increasing `k` we'll get the following for the failing test instance: ``` TOP 1 docs: Document> 9.601536E-5 Document> 7.3713694E-5 Document>

Re: [PR] Add sub query explanations in DisjunctionMaxQuery#explain on no-match [lucene]

2024-05-14 Thread via GitHub
timgrein commented on PR #13362: URL: https://github.com/apache/lucene/pull/13362#issuecomment-2110260612 > This could make explanations harder to read for large queries, e.g. queries produced through rewriting. I wonder about doing something in-between such as only including the non-matchi

[PR] Add per-field knn vector format info in SegmentInfo [lucene]

2024-05-14 Thread via GitHub
tteofili opened a new pull request, #13367: URL: https://github.com/apache/lucene/pull/13367 When indexing vectors, it is possible to use different vector formats depending on the field; in addition to that it's also possible (although not currently implemented) to have `Codecs` that can pr

Re: [PR] Remove unnecessary bit conversion for IndexSorter [lucene]

2024-05-14 Thread via GitHub
jpountz merged PR #13320: URL: https://github.com/apache/lucene/pull/13320

Re: [PR] Add sub query explanations in DisjunctionMaxQuery#explain on no-match [lucene]

2024-05-14 Thread via GitHub
jpountz commented on PR #13362: URL: https://github.com/apache/lucene/pull/13362#issuecomment-2110112272 This could make explanations harder to read for large queries, e.g. queries produced through rewriting. I wonder about doing something in-between such as only including the non-matching

Re: [PR] Fix TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults [lucene]

2024-05-14 Thread via GitHub
benwtrent commented on PR #13361: URL: https://github.com/apache/lucene/pull/13361#issuecomment-2109984343 @timgrein could you determine if the scores are the same or not? I wonder if we are getting tripped up by doc IDs being the tie breaker for equal scores.

Re: [PR] Protect against nan & inf values in quantizer and test with tiny vectors [lucene]

2024-05-14 Thread via GitHub
benwtrent merged PR #13366: URL: https://github.com/apache/lucene/pull/13366

Re: [PR] Prefetch postings data. [lucene]

2024-05-14 Thread via GitHub
jpountz commented on PR #13364: URL: https://github.com/apache/lucene/pull/13364#issuecomment-2109954553 > This is cool! In the hot case, do we expect prefetch to be a no-op? So we are hoping for "first do no harm" in that case? Yes, mostly. The benchmark I ran at https://github.com/

Re: [PR] Prefetch postings data. [lucene]

2024-05-14 Thread via GitHub
jpountz commented on code in PR #13364: URL: https://github.com/apache/lucene/pull/13364#discussion_r1599846740 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsReader.java: ## @@ -2049,6 +2074,44 @@ public long cost() { } } + private void see

Re: [PR] Prefetch postings data. [lucene]

2024-05-14 Thread via GitHub
rmuir commented on code in PR #13364: URL: https://github.com/apache/lucene/pull/13364#discussion_r1599789208 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsReader.java: ## @@ -2049,6 +2074,44 @@ public long cost() { } } + private void seekA

Re: [PR] Backport to 9x: Reduce duplication in taxonomy facets; always do counts #12966 [lucene]

2024-05-14 Thread via GitHub
stefanvodita commented on PR #13358: URL: https://github.com/apache/lucene/pull/13358#issuecomment-2109802030 Thank you for the review, Mike! I'd already put the CHANGES entries in 9.11 tentatively, now they're correct 😄

Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

2024-05-14 Thread via GitHub
stefanvodita commented on PR #12966: URL: https://github.com/apache/lucene/pull/12966#issuecomment-2109801427 I was skeptical this would work out at first, but I think we have a successful backport in the end, so the changes will go out with 9.11.

Re: [PR] Backport to 9x: Reduce duplication in taxonomy facets; always do counts #12966 [lucene]

2024-05-14 Thread via GitHub
stefanvodita merged PR #13358: URL: https://github.com/apache/lucene/pull/13358

Re: [PR] Call ArrayUtil.copyArray instead of ArrayUtil.copySubArray for full array copy. [lucene]

2024-05-14 Thread via GitHub
bruno-roustant merged PR #13360: URL: https://github.com/apache/lucene/pull/13360

Re: [PR] Prefetch postings data. [lucene]

2024-05-14 Thread via GitHub
jpountz commented on code in PR #13364: URL: https://github.com/apache/lucene/pull/13364#discussion_r1599496612 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsReader.java: ## @@ -1097,7 +1118,9 @@ public BlockImpactsDocsEnum(FieldInfo fieldInfo, IntBl

Re: [PR] Prefetch postings data. [lucene]

2024-05-14 Thread via GitHub
jpountz commented on code in PR #13364: URL: https://github.com/apache/lucene/pull/13364#discussion_r1599493535 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsReader.java: ## @@ -902,6 +917,12 @@ public int advance(int target) throws IOException {