[GitHub] [lucene] mikemccand commented on a diff in pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-29 Thread via GitHub
mikemccand commented on code in PR #12590: URL: https://github.com/apache/lucene/pull/12590#discussion_r1341253190 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -216,6 +216,10 @@ protected TopDocs exactSearch(LeafReaderContext context, DocI

[GitHub] [lucene] javanna commented on a diff in pull request #12606: Create a task executor when executor is not provided

2023-09-29 Thread via GitHub
javanna commented on code in PR #12606: URL: https://github.com/apache/lucene/pull/12606#discussion_r1341307014 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -79,10 +79,11 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOExcept

[GitHub] [lucene] javanna commented on a diff in pull request #12606: Create a task executor when executor is not provided

2023-09-29 Thread via GitHub
javanna commented on code in PR #12606: URL: https://github.com/apache/lucene/pull/12606#discussion_r1341307528 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -420,13 +418,12 @@ public int count(Query query) throws IOException { } /** - * Re

[GitHub] [lucene] javanna commented on a diff in pull request #12606: Create a task executor when executor is not provided

2023-09-29 Thread via GitHub
javanna commented on code in PR #12606: URL: https://github.com/apache/lucene/pull/12606#discussion_r1341305002 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## Review Comment: Yep. -- This is an automated message from the Apache Git Service.

[GitHub] [lucene] gf2121 commented on a diff in pull request #12591: Sort update terms with stable radix sorter

2023-09-29 Thread via GitHub
gf2121 commented on code in PR #12591: URL: https://github.com/apache/lucene/pull/12591#discussion_r1341354760 ## lucene/core/src/java/org/apache/lucene/util/StableStringSorter.java: ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[GitHub] [lucene] javanna commented on pull request #12606: Create a task executor when executor is not provided

2023-09-29 Thread via GitHub
javanna commented on PR #12606: URL: https://github.com/apache/lucene/pull/12606#issuecomment-1740890009 Thanks for looking @shubhamvishu!

[GitHub] [lucene] jpountz commented on a diff in pull request #12591: Sort update terms with stable radix sorter

2023-09-29 Thread via GitHub
jpountz commented on code in PR #12591: URL: https://github.com/apache/lucene/pull/12591#discussion_r1341368129 ## lucene/core/src/java/org/apache/lucene/util/StableStringSorter.java: ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [lucene] kaivalnp commented on a diff in pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-29 Thread via GitHub
kaivalnp commented on code in PR #12590: URL: https://github.com/apache/lucene/pull/12590#discussion_r1341368597 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -216,6 +216,10 @@ protected TopDocs exactSearch(LeafReaderContext context, DocIdS

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12606: Create a task executor when executor is not provided

2023-09-29 Thread via GitHub
shubhamvishu commented on code in PR #12606: URL: https://github.com/apache/lucene/pull/12606#discussion_r1341393303 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -420,13 +418,12 @@ public int count(Query query) throws IOException { } /** -

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12606: Create a task executor when executor is not provided

2023-09-29 Thread via GitHub
shubhamvishu commented on code in PR #12606: URL: https://github.com/apache/lucene/pull/12606#discussion_r1341413183 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -420,13 +418,12 @@ public int count(Query query) throws IOException { } /** -

[GitHub] [lucene] Shradha26 commented on issue #12553: [DISCUSS] Identifying Gaps in Lucene’s Faceting

2023-09-29 Thread via GitHub
Shradha26 commented on issue #12553: URL: https://github.com/apache/lucene/issues/12553#issuecomment-1741079621 Thanks, Mike! > What do you mean by an aggregation group? Is this like counting documents that are either red or blue? Yes, exactly. > Do we need to do the low-lev

[GitHub] [lucene] gf2121 commented on pull request #12604: Reduce FST block size for BlockTreeTermsWriter

2023-09-29 Thread via GitHub
gf2121 commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1741082917 Here are the young GC statistics and allocation profile after indexing `wikimedium10m` (without facets and dvs): https://bytedance.feishu.cn/sheets/G5dwsdvZ7hOxXftyfDkcvUkYnqB

[GitHub] [lucene] gf2121 opened a new pull request, #12610: Improve fallback sorter for BKD

2023-09-29 Thread via GitHub
gf2121 opened a new pull request, #12610: URL: https://github.com/apache/lucene/pull/12610 ### Description This PR proposes to use a more efficient way to compare bytes when RadixSorter falls back to MergeSorter.

[GitHub] [lucene] gsmiller commented on issue #12585: Is it correct for facets to assume positive aggregation values?

2023-09-29 Thread via GitHub
gsmiller commented on issue #12585: URL: https://github.com/apache/lucene/issues/12585#issuecomment-1741276611 Yeah, this is a good callout. I ran into this when adding more flexibility to association faceting a while back (making note that supporting, e.g., "min" would require rethinking t

[GitHub] [lucene] stefanvodita commented on issue #12585: Is it correct for facets to assume positive aggregation values?

2023-09-29 Thread via GitHub
stefanvodita commented on issue #12585: URL: https://github.com/apache/lucene/issues/12585#issuecomment-1741471016 > not try to modify the faceting module as-is, but rather spin up a new "aggregations" module I'm definitely leaning that way too right now. @Shradha26 and I were consid

[GitHub] [lucene] gf2121 merged pull request #12591: Sort update terms with stable radix sorter

2023-09-29 Thread via GitHub
gf2121 merged PR #12591: URL: https://github.com/apache/lucene/pull/12591

[GitHub] [lucene] kaivalnp commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-29 Thread via GitHub
kaivalnp commented on PR #12590: URL: https://github.com/apache/lucene/pull/12590#issuecomment-1741698784 The previous build failed because a comment wasn't formatted correctly: ``` * What went wrong: Execution failed for task ':lucene:core:spotlessJavaCheck'. > The following

[GitHub] [lucene] stefanvodita commented on pull request #12506: Clean up ByteBlockPool

2023-09-30 Thread via GitHub
stefanvodita commented on PR #12506: URL: https://github.com/apache/lucene/pull/12506#issuecomment-1741739207 I noticed the failing checks on this PR, but I haven't been able to reproduce them. They appear related to the nested javadoc tags I had introduced. I've removed them now. Hopefully

[GitHub] [lucene] pzygielo opened a new pull request, #12611: Avoid NPEx if the end of the stream has been reached without reading any characters

2023-09-30 Thread via GitHub
pzygielo opened a new pull request, #12611: URL: https://github.com/apache/lucene/pull/12611 e.g. by user responding with ^D ``` Press (n)ext page, (q)uit or enter number to jump to a page. Exception in thread "main" java.lang.NullPointerException: Cannot invoke "String.length()" be

[GitHub] [lucene] stefanvodita commented on a diff in pull request #12548: Ability to compute vector similarity scores with DoubleValuesSource

2023-09-30 Thread via GitHub
stefanvodita commented on code in PR #12548: URL: https://github.com/apache/lucene/pull/12548#discussion_r1341952008 ## lucene/core/src/test/org/apache/lucene/search/TestVectorSimilarityValuesSource.java: ## @@ -0,0 +1,385 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] [lucene] stefanvodita commented on issue #12601: Reproducible TestDrillSideways failure

2023-09-30 Thread via GitHub
stefanvodita commented on issue #12601: URL: https://github.com/apache/lucene/issues/12601#issuecomment-1741757747 Reverting #921 fixes the test, so I think this is the same issue that @Yuti-G investigated in #12418. I ran the test in verbose mode (`./gradlew test --tests TestDrillSidewa

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12548: Ability to compute vector similarity scores with DoubleValuesSource

2023-10-01 Thread via GitHub
shubhamvishu commented on code in PR #12548: URL: https://github.com/apache/lucene/pull/12548#discussion_r1342112409 ## lucene/core/src/java/org/apache/lucene/search/ByteVectorSimilarityValuesSource.java: ## @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12548: Ability to compute vector similarity scores with DoubleValuesSource

2023-10-01 Thread via GitHub
shubhamvishu commented on code in PR #12548: URL: https://github.com/apache/lucene/pull/12548#discussion_r1342112773 ## lucene/core/src/java/org/apache/lucene/search/DoubleValuesSource.java: ## @@ -172,6 +173,40 @@ public LongValuesSource rewrite(IndexSearcher searcher) throws

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12548: Ability to compute vector similarity scores with DoubleValuesSource

2023-10-01 Thread via GitHub
shubhamvishu commented on code in PR #12548: URL: https://github.com/apache/lucene/pull/12548#discussion_r1342112910 ## lucene/core/src/java/org/apache/lucene/search/FloatVectorSimilarityValuesSource.java: ## @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [lucene] shubhamvishu commented on a diff in pull request #12548: Ability to compute vector similarity scores with DoubleValuesSource

2023-10-01 Thread via GitHub
shubhamvishu commented on code in PR #12548: URL: https://github.com/apache/lucene/pull/12548#discussion_r1342113181 ## lucene/core/src/test/org/apache/lucene/search/TestVectorSimilarityValuesSource.java: ## @@ -0,0 +1,385 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] [lucene] uschindler opened a new pull request, #12612: Upgrade forbiddenapis to 3.6 and ASM for APIJAR extraction to 9.6

2023-10-01 Thread via GitHub
uschindler opened a new pull request, #12612: URL: https://github.com/apache/lucene/pull/12612 the usual maintenance. Allows us to process Java 22 APIs soon.

[GitHub] [lucene] uschindler commented on pull request #12612: Upgrade forbiddenapis to 3.6 and ASM for APIJAR extraction to 9.6

2023-10-01 Thread via GitHub
uschindler commented on PR #12612: URL: https://github.com/apache/lucene/pull/12612#issuecomment-1742097851 I regenerated the APIJAR files; it worked without problems out of the box.

[GitHub] [lucene] uschindler merged pull request #12612: Upgrade forbiddenapis to 3.6 and ASM for APIJAR extraction to 9.6

2023-10-01 Thread via GitHub
uschindler merged PR #12612: URL: https://github.com/apache/lucene/pull/12612

[GitHub] [lucene] dweiss commented on pull request #11724: LUCENE-10520 / #11556 HTMLStripCharFilter bugfix

2023-10-01 Thread via GitHub
dweiss commented on PR #11724: URL: https://github.com/apache/lucene/pull/11724#issuecomment-1742195597 I will take a look tomorrow - was out of office last week.

[GitHub] [lucene] dweiss merged pull request #11724: LUCENE-10520 / #11556 HTMLStripCharFilter bugfix

2023-10-01 Thread via GitHub
dweiss merged PR #11724: URL: https://github.com/apache/lucene/pull/11724

[GitHub] [lucene] dweiss closed issue #11556: HTMLStripCharFilter fails on '>' or '<' characters in attribute values [LUCENE-10520]

2023-10-01 Thread via GitHub
dweiss closed issue #11556: HTMLStripCharFilter fails on '>' or '<' characters in attribute values [LUCENE-10520] URL: https://github.com/apache/lucene/issues/11556

Re: [PR] Simplify TaskExecutor API [lucene]

2023-10-02 Thread via GitHub
javanna merged PR #12603: URL: https://github.com/apache/lucene/pull/12603

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-02 Thread via GitHub
gf2121 commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1742668578 Hi @jpountz! Would you please take a look at this PR when you have time? Looking forward to getting your suggestions on this topic ~

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-02 Thread via GitHub
jpountz commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1742678478 Oh, interesting find, it makes sense to me but I'm not the most familiar one with this piece of code. @mikemccand or @s1monw what do you think?

Re: [PR] Improve fallback sorter for BKD [lucene]

2023-10-02 Thread via GitHub
jpountz commented on code in PR #12610: URL: https://github.com/apache/lucene/pull/12610#discussion_r1342470366 ## lucene/core/src/java/org/apache/lucene/util/bkd/MutablePointTreeReaderUtils.java: ## @@ -81,6 +86,40 @@ protected int byteAt(int i, int k) { return (read

[PR] Override `normalize` method in the `PatternReplaceFilterFactory` [lucene]

2023-10-02 Thread via GitHub
dainiusjocas opened a new pull request, #12613: URL: https://github.com/apache/lucene/pull/12613 ### Description I've been debugging a problem with a classic query parser producing a `TermRangeQuery` with wrong tokens. I'm using a `CustomAnalyzer` that uses the `PatternReplaceFilter

Re: [I] Reproducible TestDrillSideways failure [lucene]

2023-10-02 Thread via GitHub
jpountz commented on issue #12418: URL: https://github.com/apache/lucene/issues/12418#issuecomment-1742740792 Sorry @Yuti-G I had missed your reply! I think that reverting the commit that you linked only made the test pass because this commit adds a call to `random().nextInt()` to `L

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-02 Thread via GitHub
s1monw commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1742998087 This change makes sense to me. @mikemccand WDYT

Re: [PR] Concurrent hnsw graph and builder, take two [lucene]

2023-10-02 Thread via GitHub
jbellis closed pull request #12421: Concurrent hnsw graph and builder, take two URL: https://github.com/apache/lucene/pull/12421

Re: [PR] Concurrent hnsw graph and builder, take two [lucene]

2023-10-02 Thread via GitHub
jbellis commented on PR #12421: URL: https://github.com/apache/lucene/pull/12421#issuecomment-1743159095 Thanks for the feedback. I've switched my efforts to a DiskANN implementation in JVector, so closing this out.

Re: [PR] Add missing create github release step to release wizard [lucene]

2023-10-02 Thread via GitHub
javanna merged PR #12607: URL: https://github.com/apache/lucene/pull/12607

Re: [PR] Create a task executor when executor is not provided [lucene]

2023-10-02 Thread via GitHub
reta commented on code in PR #12606: URL: https://github.com/apache/lucene/pull/12606#discussion_r1342999505 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -420,13 +418,12 @@ public int count(Query query) throws IOException { } /** - * Retur

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343009944 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +266,133 @@ protected LeafSlice[] slices(List leaves) { return slic

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343038417 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,11 +266,130 @@ protected LeafSlice[] slices(List leaves) { return sli

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343042425 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,11 +266,130 @@ protected LeafSlice[] slices(List leaves) { return sli

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343048682 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +266,133 @@ protected LeafSlice[] slices(List leaves) { return slic

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-02 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1343092520 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java: ## @@ -139,7 +139,7 @@ public int nextDoc() throws IOException { } /** View over mult

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343142475 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,11 +266,130 @@ protected LeafSlice[] slices(List leaves) { return slic

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343180733 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +266,133 @@ protected LeafSlice[] slices(List leaves) { return slice

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343180948 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +266,133 @@ protected LeafSlice[] slices(List leaves) { return slice

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343185358 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +266,133 @@ protected LeafSlice[] slices(List leaves) { return slice

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343266881 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,11 +266,130 @@ protected LeafSlice[] slices(List leaves) { return slic

[PR] Make QueryCache respect Accountable queries on eviction and consisten… [lucene]

2023-10-02 Thread via GitHub
gtroitskiy opened a new pull request, #12614: URL: https://github.com/apache/lucene/pull/12614 …cy check **Root cause** `onQueryCache` increases `ramBytesUsed` by a specified amount, which is calculated depending on whether the query is `Accountable` or not. Unfortunately, `onQuer

Re: [PR] Run top-level conjunctions of term queries with a specialized BulkScorer. [lucene]

2023-10-02 Thread via GitHub
hossman commented on PR #12382: URL: https://github.com/apache/lucene/pull/12382#issuecomment-1744028622 git bisect has identified `f2bd0bbcdd38cd3c681a9d302bdb856f1a62208d` as the cause of a recent jenkins failure in `TestBlockMaxConjunction.testRandom` that reproduces reliably for me loca

Re: [PR] Run top-level conjunctions of term queries with a specialized BulkScorer. [lucene]

2023-10-02 Thread via GitHub
jpountz commented on PR #12382: URL: https://github.com/apache/lucene/pull/12382#issuecomment-1744317702 Thanks Hoss! I had missed this failure, it looks like a real one. I'm looking.

Re: [PR] Create a task executor when executor is not provided [lucene]

2023-10-03 Thread via GitHub
javanna merged PR #12606: URL: https://github.com/apache/lucene/pull/12606

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-03 Thread via GitHub
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343749320 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +266,133 @@ protected LeafSlice[] slices(List leaves) { return slic

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-03 Thread via GitHub
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343776898 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +266,133 @@ protected LeafSlice[] slices(List leaves) { return slic

Re: [PR] Make QueryCache respect Accountable queries on eviction and consisten… [lucene]

2023-10-03 Thread via GitHub
romseygeek commented on code in PR #12614: URL: https://github.com/apache/lucene/pull/12614#discussion_r1343791876 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -385,7 +385,9 @@ public void clearQuery(Query query) { private void onEviction(Query

Re: [PR] Make QueryCache respect Accountable queries on eviction and consisten… [lucene]

2023-10-03 Thread via GitHub
romseygeek commented on PR #12614: URL: https://github.com/apache/lucene/pull/12614#issuecomment-1744588907 Can you run `./gradlew tidy` at the root of the project to make sure the formatting is all correct?

Re: [PR] Run top-level conjunctions of term queries with a specialized BulkScorer. [lucene]

2023-10-03 Thread via GitHub
jpountz commented on PR #12382: URL: https://github.com/apache/lucene/pull/12382#issuecomment-174462 I just pushed a fix: https://github.com/apache/lucene/commit/3f81f2f315745f86de3b516d53bf02fde61015a3.

Re: [I] FST#Compiler allocates too much memory [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on issue #12598: URL: https://github.com/apache/lucene/issues/12598#issuecomment-1744644275 Thanks @gf2121 -- this is a great discovery (and thank you https://blunders.io for the [awesome integrated profiling in Lucene's nightly benchmarks](https://blunders.io/posts/luc

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on code in PR #12604: URL: https://github.com/apache/lucene/pull/12604#discussion_r1343852045 ## lucene/CHANGES.txt: ## @@ -163,6 +163,8 @@ Optimizations * GITHUB#12382: Faster top-level conjunctions on term queries when sorting by descending score. (Adr

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-03 Thread via GitHub
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343870579 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +265,132 @@ protected LeafSlice[] slices(List leaves) { return slic

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-03 Thread via GitHub
s1monw commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1744672419 @mikemccand maybe we can trade off here between segments we write the first time, i.e. through IW, and segments we write because of a merge? It might mitigate your concerns.

[I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-03 Thread via GitHub
mikemccand opened a new issue, #12615: URL: https://github.com/apache/lucene/issues/12615 ### Description I came across this compelling sounding [JVector project](https://foojay.io/today/jvector-1-0/) which looks to have awesome QPS performance. It uses [DiskANN](https://www.

Re: [PR] Make QueryCache respect Accountable queries on eviction and consisten… [lucene]

2023-10-03 Thread via GitHub
gtroitskiy commented on PR #12614: URL: https://github.com/apache/lucene/pull/12614#issuecomment-1744720246 Thanks for reviewing! I ran tidy and did some refactoring.

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1744763221 I do think Lucene's read-only segment based architecture lends itself to supporting quantization (required for DiskANN). It would be an interesting experiment to see how index

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1343896112 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [I] Improve FST performance created by Builder [LUCENE-9481] [lucene]

2023-10-03 Thread via GitHub
dungba88 commented on issue #10520: URL: https://github.com/apache/lucene/issues/10520#issuecomment-1744786263 I'm planning to refactor the BytesStore into an interface that can be chosen from the FST builder. And one can decide whether on-heap or off-heap or on-heap without blocks is best

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1343965288 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1343976622 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344010395 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344026176 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344048515 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344049288 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [I] Improve FST performance created by Builder [LUCENE-9481] [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on issue #10520: URL: https://github.com/apache/lucene/issues/10520#issuecomment-1744944312 > I'm planning to refactor the BytesStore into an interface that can be chosen from the FST builder. And one can decide whether on-heap or off-heap or on-heap without blocks is b

Re: [I] `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on issue #12543: URL: https://github.com/apache/lucene/issues/12543#issuecomment-1744945237 Copying the comment from #10520 that's really about this issue: > I'm planning to refactor the BytesStore into an interface that can be chosen from the FST builder. And one

[PR] Minor refactor for HNSW graph merging logic [lucene]

2023-10-03 Thread via GitHub
benwtrent opened a new pull request, #12616: URL: https://github.com/apache/lucene/pull/12616 This is a minor refactor of HNSW graph merging logic. Instead of directly checking the KnnVectorReader version, this commit adjusts the logic to see if a specific interface is satisfied for r

Re: [PR] Add readBytes method to RandomAccessInput [lucene]

2023-10-03 Thread via GitHub
uschindler commented on code in PR #12600: URL: https://github.com/apache/lucene/pull/12600#discussion_r1344093638 ## lucene/core/src/java19/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -168,6 +168,28 @@ private void readBytesBoundary(byte[] b, int offset, int le

Re: [PR] Add readBytes method to RandomAccessInput [lucene]

2023-10-03 Thread via GitHub
uschindler commented on PR #12600: URL: https://github.com/apache/lucene/pull/12600#issuecomment-1744979771 I changed the Linux MMAP job (https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/) to use your branch. Please wait a bit until the checker is happy, as it tries all different java ver

Re: [PR] Add readBytes method to RandomAccessInput [lucene]

2023-10-03 Thread via GitHub
uschindler commented on PR #12600: URL: https://github.com/apache/lucene/pull/12600#issuecomment-1744981658 In general, to me it is still questionable whether we really need a bulk random-access byte[] reader. I partly agree with this, but if somebody asks for float[] or long[] bulk reads with

Re: [I] Improve FST performance created by Builder [LUCENE-9481] [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on issue #10520: URL: https://github.com/apache/lucene/issues/10520#issuecomment-1745017914 > Maybe we could first allow FSTCompiler to specify its own DataOutput even when building the tree on-the-fly, instead of always relying on BytesStore? And we are free to choose

Re: [PR] Make LRUQueryCache respect Accountable queries on eviction and consisten… [lucene]

2023-10-03 Thread via GitHub
romseygeek merged PR #12614: URL: https://github.com/apache/lucene/pull/12614

Re: [I] `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on issue #12543: URL: https://github.com/apache/lucene/issues/12543#issuecomment-1745049853 Copying another comment from #10520: > Maybe we could first allow FSTCompiler to specify its own DataOutput even when building the tree on-the-fly, instead of always relyin

Re: [PR] Add readBytes method to RandomAccessInput [lucene]

2023-10-03 Thread via GitHub
iverase commented on PR #12600: URL: https://github.com/apache/lucene/pull/12600#issuecomment-1745118236 I am just trying to have an abstraction that can replace the BytesRef output for binary doc values with something that does not impose the internal representation of the bytes like Bytes

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-03 Thread via GitHub
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1344264569 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +266,133 @@ protected LeafSlice[] slices(List leaves) { return slice

[I] Easy access TermStates's needStats flag [lucene]

2023-10-03 Thread via GitHub
yugushihuang opened a new issue, #12617: URL: https://github.com/apache/lucene/issues/12617 ### Description When we build TermStates, we pass a flag needStats to determine whether we want to front-load all term statistics; however, we do not have easy access to know if this flag has be

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-03 Thread via GitHub
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1344356526 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +265,132 @@ protected LeafSlice[] slices(List leaves) { return slice

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-03 Thread via GitHub
gf2121 commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1745354257 Thanks for all the reviews and suggestions here! > @mikemccand maybe we can trade off here between segments we write the first time, i.e. through IW, and segments we write caused by a merge?

Re: [I] `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on issue #12543: URL: https://github.com/apache/lucene/issues/12543#issuecomment-1745389156 @dungba88 raised a good point -- FST construction also needs to read prior bytes it wrote even as it is appending new bytes to the end of the file. Lucene's IndexInput/Outp

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344513453 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344514851 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-03 Thread via GitHub
javanna commented on PR #12523: URL: https://github.com/apache/lucene/pull/12523#issuecomment-1745484064 This looks great, thanks @quux00! Could you add an entry to the lucene/CHANGES.txt file under Lucene 9.9 please?

Re: [PR] Minor refactor for HNSW graph merging logic [lucene]

2023-10-03 Thread via GitHub
benwtrent merged PR #12616: URL: https://github.com/apache/lucene/pull/12616

[PR] avoid-circular-jar-checks [lucene]

2023-10-03 Thread via GitHub
risdenk opened a new pull request, #12618: URL: https://github.com/apache/lucene/pull/12618 ### Description jar-checks.gradle can go into an infinite loop if there are dependencies that could be circular. In Solr, grpc-utils has a compile dependency on grpc-core and grpc-core has a r

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344560786 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on code in PR #12604: URL: https://github.com/apache/lucene/pull/12604#discussion_r1344628522 ## lucene/CHANGES.txt: ## @@ -163,6 +163,9 @@ Optimizations * GITHUB#12382: Faster top-level conjunctions on term queries when sorting by descending score. (Adr
