Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-16 Thread via GitHub
navneet1v commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1762496963 ## lucene/core/src/java/org/apache/lucene/index/KnnVectorValues.java: ## @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-09-16 Thread via GitHub
vigyasharma commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2354135495 I wonder if we can leverage IndexWriter's `addIndexes(Directory... dirs)` [API](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWrite

Re: [PR] Remove CollectorManager#forSequentialExecution [lucene]

2024-09-16 Thread via GitHub
gsmiller commented on PR #13790: URL: https://github.com/apache/lucene/pull/13790#issuecomment-2354128424 tl;dr: I agree with removing this. I was [initially hesitant](https://github.com/apache/lucene/pull/13735#issuecomment-2338340094) to add this for a lot of the same reasons, but

Re: [PR] Fix Flaky Test In TestBlockJoinBulkScorer [lucene]

2024-09-16 Thread via GitHub
javanna commented on PR #13785: URL: https://github.com/apache/lucene/pull/13785#issuecomment-2353671468 Thanks for fixing this! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] Speed up advancing within a block. [lucene]

2024-09-16 Thread via GitHub
mikemccand commented on PR #13692: URL: https://github.com/apache/lucene/pull/13692#issuecomment-2353416849 > I plotted the number of docs that queries need to skip within a block when advancing This is a really cool chart! Maybe we could somehow dynamically optimize, picking t

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-16 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2353397543 I also just started trying to replace `copy()` with the approach of adding `vectorValue(int ord, float[] outValue)` although this does add a copy operation in some cases where previousl

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-16 Thread via GitHub
jpountz commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2353373827 FWIW I started playing with removing copy() by replacing it with a factory method for a dictionary: https://github.com/msokolov/lucene/commit/ae7aca32a690a4b21a3da793258ce17560b551e7. N

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-16 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2353215032 Regarding the rename of `fromOrdToDoc` to `all` I think it was not helpful and plan to revert or maybe come up with some other name. The problem is we also have `createDenseIterator` wh

[I] SpanOrQuery uses IDFs of failed subqueries in score calculation. [lucene]

2024-09-16 Thread via GitHub
tkarampAlpha opened a new issue, #13796: URL: https://github.com/apache/lucene/issues/13796 ### Description It seems that for SpanOrQuery, the IDF of terms belonging to subqueries that do not match a given document still affects that document's score. I have observed this through o

Re: [I] Extended spell checker with phrase support and adaptive user session analysis. [LUCENE-626] [lucene]

2024-09-16 Thread via GitHub
Menahali commented on issue #1701: URL: https://github.com/apache/lucene/issues/1701#issuecomment-2353104696 > Karl Wettin ([migrated from JIRA](https://issues.apache.org/jira/browse/LUCENE-626?focusedCommentId=12477688&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#co

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-16 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2353098829 OK there seem to be some test failures ... I did a complete run, but randomized testing always seems to ferret out something interesting!

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-16 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2353036120 I pushed a new revision here addressing some of the major comments: 1. `KnnVectorValues.iterator()` now generally provides a new iterator; no caching is done. I removed `createIte

Re: [I] "cz" (vs ISO langauge code "cs") for Czech analysis package? [LUCENE-6366] [lucene]

2024-09-16 Thread via GitHub
WEBCON-BPS-DEV commented on issue #7426: URL: https://github.com/apache/lucene/issues/7426#issuecomment-2352942882 Is there any update on the issue?

Re: [PR] Make Operations#optional create simpler automata. [lucene]

2024-09-16 Thread via GitHub
jpountz merged PR #13793: URL: https://github.com/apache/lucene/pull/13793

Re: [PR] Fix Flaky Test In TestBlockJoinBulkScorer [lucene]

2024-09-16 Thread via GitHub
jpountz merged PR #13785: URL: https://github.com/apache/lucene/pull/13785

Re: [PR] Fix Flaky Test In TestBlockJoinBulkScorer [lucene]

2024-09-16 Thread via GitHub
jpountz commented on code in PR #13785: URL: https://github.com/apache/lucene/pull/13785#discussion_r1761068214 ## lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java: ## @@ -283,7 +292,33 @@ public void collect(int doc) throws IOException {

Re: [PR] Fix Flaky Test In TestBlockJoinBulkScorer [lucene]

2024-09-16 Thread via GitHub
Mikep86 commented on code in PR #13785: URL: https://github.com/apache/lucene/pull/13785#discussion_r1761059326 ## lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java: ## @@ -283,7 +292,33 @@ public void collect(int doc) throws IOException {

Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-09-16 Thread via GitHub
javanna commented on code in PR #13542: URL: https://github.com/apache/lucene/pull/13542#discussion_r1760911695 ## lucene/facet/src/java/org/apache/lucene/facet/FacetsCollector.java: ## @@ -97,12 +97,12 @@ public List getMatchingDocs() { public void collect(int doc) throws IO

Re: [PR] Remove CollectorManager#forSequentialExecution [lucene]

2024-09-16 Thread via GitHub
javanna commented on PR #13790: URL: https://github.com/apache/lucene/pull/13790#issuecomment-2352511247 @gsmiller how do you feel about this one?

Re: [PR] Deprecate BulkScorer#score(LeafReaderContext, Bits) [lucene]

2024-09-16 Thread via GitHub
javanna commented on PR #13794: URL: https://github.com/apache/lucene/pull/13794#issuecomment-2352510100 Thanks @jpountz !

Re: [PR] Deprecate BulkScorer#score(LeafReaderContext, Bits) [lucene]

2024-09-16 Thread via GitHub
javanna merged PR #13794: URL: https://github.com/apache/lucene/pull/13794

Re: [I] Should we auto-adjust top score doc and top field collector manager based on slices? [lucene]

2024-09-16 Thread via GitHub
javanna commented on issue #13791: URL: https://github.com/apache/lucene/issues/13791#issuecomment-2352478375 ++ to that @jpountz that would also be my preferred approach. I somehow made the assumption that we are not willing to go that route, but I think we should revisit that decision, as

Re: [PR] Deprecate BulkScorer#score(LeafReaderContext, Bits) [lucene]

2024-09-16 Thread via GitHub
javanna commented on code in PR #13794: URL: https://github.com/apache/lucene/pull/13794#discussion_r1760852834 ## lucene/core/src/java/org/apache/lucene/search/BulkScorer.java: ## @@ -33,7 +33,11 @@ public abstract class BulkScorer { * @param collector The collector to whic

Re: [PR] Deprecate BulkScorer#score(LeafReaderContext, Bits) [lucene]

2024-09-16 Thread via GitHub
jpountz commented on code in PR #13794: URL: https://github.com/apache/lucene/pull/13794#discussion_r1760821555 ## lucene/core/src/java/org/apache/lucene/search/BulkScorer.java: ## @@ -33,7 +33,11 @@ public abstract class BulkScorer { * @param collector The collector to whic

Re: [PR] Add support for intra-segment search concurrency [lucene]

2024-09-16 Thread via GitHub
javanna commented on code in PR #13542: URL: https://github.com/apache/lucene/pull/13542#discussion_r1760803091 ## lucene/core/src/java/org/apache/lucene/search/BulkScorer.java: ## @@ -27,18 +27,6 @@ */ public abstract class BulkScorer { - /** - * Scores and collects all

[PR] Deprecate BulkScorer#score(LeafReaderContext, Bits) [lucene]

2024-09-16 Thread via GitHub
javanna opened a new pull request, #13794: URL: https://github.com/apache/lucene/pull/13794 We have removed BulkScorer#score(LeafReaderContext, Bits) in main in favour of BulkScorer#score(LeafCollector collector, Bits acceptDocs, int min, int max) as part of #13542. This commit deprecates t

Re: [PR] Speed up advancing within a block. [lucene]

2024-09-16 Thread via GitHub
jpountz commented on PR #13692: URL: https://github.com/apache/lucene/pull/13692#issuecomment-2352330831 I reverted this change. While it was a good win on average on my machine, it was almost a net loss on nightly benchmarks. So I'd rather keep the current linear scan approach, which has a

Re: [PR] Speed up advancing within a block. [lucene]

2024-09-16 Thread via GitHub
jpountz commented on PR #13692: URL: https://github.com/apache/lucene/pull/13692#issuecomment-2352315844 Woops, sorry I'm only seeing your reply now. The above analysis you referred to uses branchless binary search over the full buffer of 128 doc IDs.
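
For readers unfamiliar with the term, here is a minimal sketch of what a branchless binary search over a sorted buffer of 128 doc IDs might look like. It is not the code from PR #13692. The class name `BranchlessAdvance`, the method `firstIndexAtLeast`, and the constant `BLOCK_SIZE` are hypothetical names chosen for illustration, and the sketch assumes that doc IDs and the target are non-negative ints so the subtraction cannot overflow.

```java
/** Hypothetical helper for illustration only; not taken from PR #13692. */
final class BranchlessAdvance {
  /** Assumed buffer size, matching the 128 doc IDs mentioned in the comment. */
  static final int BLOCK_SIZE = 128;

  private BranchlessAdvance() {}

  /**
   * Returns the index of the first entry >= target in a sorted buffer of exactly
   * BLOCK_SIZE non-negative ints, or BLOCK_SIZE if every entry is smaller.
   */
  static int firstIndexAtLeast(int[] docs, int target) {
    if (docs.length != BLOCK_SIZE) {
      throw new IllegalArgumentException("expected " + BLOCK_SIZE + " entries");
    }
    int idx = 0; // invariant: every entry before idx is < target
    for (int step = BLOCK_SIZE >> 1; step > 0; step >>= 1) {
      int probe = docs[idx + step - 1];
      // mask is -1 (all bits set) when probe < target, otherwise 0.
      // Assumes both values are non-negative, so the subtraction cannot overflow.
      int mask = (probe - target) >> 31;
      idx += step & mask; // advance by step without a conditional jump
    }
    // The steps 64 + 32 + ... + 1 cover only 127 entries, so probe once more.
    // idx is at most 127 here, so the access is in bounds.
    idx += 1 & ((docs[idx] - target) >> 31);
    return idx;
  }
}
```

The loop always runs a fixed number of iterations and replaces the usual `if` with an arithmetic mask, which is the sense in which the search is "branchless". Whether this beats a linear scan depends on the hardware and the JIT, as the revert of this change later in the thread suggests.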

Re: [PR] Fix Flaky Test In TestBlockJoinBulkScorer [lucene]

2024-09-16 Thread via GitHub
jpountz commented on code in PR #13785: URL: https://github.com/apache/lucene/pull/13785#discussion_r1760649790 ## lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoinBulkScorer.java: ## @@ -283,7 +292,33 @@ public void collect(int doc) throws IOException {

Re: [PR] Add Bulk Scorer For ToParentBlockJoinQuery [lucene]

2024-09-16 Thread via GitHub
jpountz commented on PR #13697: URL: https://github.com/apache/lucene/pull/13697#issuecomment-2352178493 FYI @Mikep86 opened a PR at https://github.com/apache/lucene/pull/13785.

Re: [PR] Add Bulk Scorer For ToParentBlockJoinQuery [lucene]

2024-09-16 Thread via GitHub
javanna commented on PR #13697: URL: https://github.com/apache/lucene/pull/13697#issuecomment-2352173213 There's a couple of recent test failures, in main as well as 9x, that may have to do with this change, judging from the area that it touches: ``` FAILED: org.apache.lucene.sea