Re: [PR] Close all files when hitting an I/O exception with vectors. [lucene]

2023-11-14 Thread via GitHub
jpountz merged PR #12807: URL: https://github.com/apache/lucene/pull/12807 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-11-14 Thread via GitHub
cavorite commented on code in PR #12715: URL: https://github.com/apache/lucene/pull/12715#discussion_r1393698430 ## lucene/CHANGES.txt: ## @@ -7,6 +7,8 @@ http://s.apache.org/luceneversions API Changes - +* GITHUB-12695: Deprecated public constructor of F

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393461923 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -247,16 +306,14 @@ public Builder directAddressingMaxOversizingFactor(float factor) {

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393547261 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [I] Unroll or vectorize Math.max in CompetitiveImpactAccumulator.addAll? [lucene]

2023-11-14 Thread via GitHub
vsop-479 closed issue #12788: Unroll or vectorize Math.max in CompetitiveImpactAccumulator.addAll? URL: https://github.com/apache/lucene/issues/12788

Re: [I] Make FST BytesStore grow smoothly [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on issue #12619: URL: https://github.com/apache/lucene/issues/12619#issuecomment-1811613353 In https://github.com/apache/lucene/pull/12624, I moved the main FST body out of `BytesStore` into `ByteBuffersDataOutput`, and BytesStore becomes only a single `byte[]` for the cu

Re: [I] Can FST read bytes forward? [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on issue #12355: URL: https://github.com/apache/lucene/issues/12355#issuecomment-1811598793 > reverse byte[] after writing them all Interestingly, we specifically reverse the byte[] after the write to make it backward. To make it forward we can simply *not* do th
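
The reversal step mentioned in the comment above can be pictured with a small sketch. This is only an illustration under assumptions: the class `ByteReversal` and its method `reverse` are hypothetical names, not part of Lucene, and the real FST code is not shown in this thread excerpt.

```java
/** Hypothetical helper for illustration only; not Lucene code. */
final class ByteReversal {

  private ByteReversal() {}

  /**
   * Reverses bytes[from, to) in place by swapping from both ends.
   * Bytes that were written forward end up in backward order.
   */
  static void reverse(byte[] bytes, int from, int to) {
    if (from < 0 || to > bytes.length || from > to) {
      throw new IndexOutOfBoundsException("bad range: " + from + ".." + to);
    }
    for (int i = from, j = to - 1; i < j; i++, j--) {
      byte tmp = bytes[i];
      bytes[i] = bytes[j];
      bytes[j] = tmp;
    }
  }
}
```

Skipping this call is what "simply not reversing" would amount to in the sketch: the range would stay in the order it was written.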

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393462969 ## lucene/core/src/java/org/apache/lucene/util/fst/OnHeapFSTStore.java: ## @@ -64,22 +66,13 @@ public FSTStore init(DataInput in, long numBytes) throws IOException {

Re: [PR] Fix errorprone with alternative runtime [lucene]

2023-11-14 Thread via GitHub
uschindler commented on PR #12808: URL: https://github.com/apache/lucene/pull/12808#issuecomment-1811586309 See also https://github.com/apache/beam/pull/24930

Re: [PR] Fix errorprone with alternative runtime [lucene]

2023-11-14 Thread via GitHub
uschindler commented on PR #12808: URL: https://github.com/apache/lucene/pull/12808#issuecomment-1811585147 Hi @dweiss, The issue with errorprone is exactly the same as the one we have seen for turbocharger of Java options: https://github.com/gradle/gradle/issues/22746 The new version of

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-14 Thread via GitHub
uschindler commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1811511529 Have you merged in the latest main branch, so this PR is up to date? This could be an issue which already existed when the PR was created.

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-14 Thread via GitHub
Shibi-bala commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1811494543 Kind of confused why this check is failing. This was never changed and I've tried merging. 1``` . ERROR in /home/runner/work/lucene/lucene/lucene/test-framework/src/java/org

Re: [PR] Fix errorprone with alternative runtime [lucene]

2023-11-14 Thread via GitHub
uschindler commented on PR #12808: URL: https://github.com/apache/lucene/pull/12808#issuecomment-1811439713 Merged to lucene/branch_9x, solr/main + solr/branch_9x

Re: [PR] Fix errorprone with alternative runtime [lucene]

2023-11-14 Thread via GitHub
uschindler merged PR #12808: URL: https://github.com/apache/lucene/pull/12808

Re: [PR] Close all files when hitting an I/O exception with vectors. [lucene]

2023-11-14 Thread via GitHub
jpountz commented on code in PR #12807: URL: https://github.com/apache/lucene/pull/12807#discussion_r1393332881 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -92,18 +92,8 @@ public final class Lucene99HnswVectorsReader extends K

Re: [PR] Close all files when hitting an I/O exception with vectors. [lucene]

2023-11-14 Thread via GitHub
benwtrent commented on code in PR #12807: URL: https://github.com/apache/lucene/pull/12807#discussion_r1393328067 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -92,18 +92,8 @@ public final class Lucene99HnswVectorsReader extends

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-14 Thread via GitHub
benwtrent commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1393228359 ## lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java: ## @@ -0,0 +1,246 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] Utilize exact kNN search when gathering k > numVectors in a segment [lucene]

2023-11-14 Thread via GitHub
benwtrent commented on code in PR #12806: URL: https://github.com/apache/lucene/pull/12806#discussion_r1393164542 ## lucene/core/src/test/org/apache/lucene/search/BaseKnnVectorQueryTestCase.java: ## @@ -779,6 +781,16 @@ Directory getIndexStore( doc.add(getKnnVectorField(f

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-14 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1811057058 Keeping the `visitLimit` = 0 (immediately fall back to the lazy iterator), we expect an exact search to be performed (and `recall` = 1) as soon as the first node is visited (`numVisited` = 1)

Re: [PR] CheckIndex - Making -fast the default behaviour [lucene]

2023-11-14 Thread via GitHub
slow-J commented on code in PR #12797: URL: https://github.com/apache/lucene/pull/12797#discussion_r1393115663 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -4051,15 +4057,30 @@ public static Options parseOptions(String[] args) { int i = 0; whi

Re: [PR] CheckIndex - Making -fast the default behaviour [lucene]

2023-11-14 Thread via GitHub
slow-J commented on code in PR #12797: URL: https://github.com/apache/lucene/pull/12797#discussion_r1393046003 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -3974,7 +3974,7 @@ public static class Options { boolean doExorcise = false; boolean do

Re: [PR] CheckIndex - Making -fast the default behaviour [lucene]

2023-11-14 Thread via GitHub
slow-J commented on PR #12797: URL: https://github.com/apache/lucene/pull/12797#issuecomment-1810872291 > ```java > msg(infoStream, "Skipping logical integrity checks: pass -ea -slow to check logical integrity") > ``` I tried and this is not possible to do, because while runni

Re: [PR] Remove angry errant lurking semicolons [lucene]

2023-11-14 Thread via GitHub
mikemccand merged PR #12805: URL: https://github.com/apache/lucene/pull/12805

Re: [PR] Remove angry errant lurking semicolons [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12805: URL: https://github.com/apache/lucene/pull/12805#discussion_r1392935143 ## lucene/queryparser/src/java/org/apache/lucene/queryparser/surround/parser/QueryParser.java: ## @@ -174,7 +174,6 @@ protected SrndQuery getTruncQuery(String trunc

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392710843 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Remove angry errant lurking semicolons [lucene]

2023-11-14 Thread via GitHub
uschindler commented on code in PR #12805: URL: https://github.com/apache/lucene/pull/12805#discussion_r1392697633 ## lucene/queryparser/src/java/org/apache/lucene/queryparser/surround/parser/QueryParser.java: ## @@ -174,7 +174,6 @@ protected SrndQuery getTruncQuery(String trunc

Re: [PR] Remove angry errant lurking semicolons [lucene]

2023-11-14 Thread via GitHub
uschindler commented on code in PR #12805: URL: https://github.com/apache/lucene/pull/12805#discussion_r1392688486 ## lucene/queryparser/src/java/org/apache/lucene/queryparser/surround/parser/QueryParser.java: ## @@ -174,7 +174,6 @@ protected SrndQuery getTruncQuery(String trunc

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392647780 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; impor

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392645763 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar t

[PR] Remove angry errant lurking semicolons [lucene]

2023-11-14 Thread via GitHub
mikemccand opened a new pull request, #12805: URL: https://github.com/apache/lucene/pull/12805 I noticed yet another errant `;` and then grep'd and found tons of them and removed them. Note that it was a bit tricky because some lines that have only whitespace and a semicolon are actu

Re: [PR] Fix large 99 percentile match latency in Monitor during concurrent commits and purges [lucene]

2023-11-14 Thread via GitHub
romseygeek merged PR #12801: URL: https://github.com/apache/lucene/pull/12801

Re: [I] Make FST BytesStore grow smoothly [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on issue #12619: URL: https://github.com/apache/lucene/issues/12619#issuecomment-1810237337 Note that `oal.store.ByteBuffersDataOutput` takes a different and neat approach to gracefully growing: it picks an initial block size, and appends new blocks as you write bytes,
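
The block-based growth described in the comment above (pick a block size, append new blocks as bytes are written) might look roughly like the following. This is a minimal sketch under assumptions: `BlockAppender` and all of its members are hypothetical names and this is not the code of `ByteBuffersDataOutput`. It covers only the part visible in the truncated comment and uses a fixed block size throughout.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of block-wise growth; not Lucene code. */
final class BlockAppender {
  private final int blockSize;
  private final List<byte[]> blocks = new ArrayList<>();
  private int posInBlock; // next write offset inside the last block

  BlockAppender(int blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    this.blockSize = blockSize;
    this.posInBlock = blockSize; // forces a block allocation on first write
  }

  /** Appends one byte, allocating a fresh block when the last one is full. */
  void writeByte(byte b) {
    if (posInBlock == blockSize) {
      blocks.add(new byte[blockSize]); // existing blocks are never copied
      posInBlock = 0;
    }
    blocks.get(blocks.size() - 1)[posInBlock++] = b;
  }

  /** Total number of bytes written so far. */
  long size() {
    if (blocks.isEmpty()) {
      return 0;
    }
    return (long) (blocks.size() - 1) * blockSize + posInBlock;
  }

  int blockCount() {
    return blocks.size();
  }

  /** Random-access read of a previously written byte. */
  byte readByte(long pos) {
    if (pos < 0 || pos >= size()) {
      throw new IndexOutOfBoundsException("pos=" + pos);
    }
    return blocks.get((int) (pos / blockSize))[(int) (pos % blockSize)];
  }
}
```

The point of the design is that growth never copies bytes already written: a write either lands in the current block or triggers one new fixed-size allocation, in contrast to a single array that is reallocated and copied as it grows.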

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392603399 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; impor

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392598634 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBuffersFSTReader.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392595332 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar t

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on PR #12715: URL: https://github.com/apache/lucene/pull/12715#issuecomment-1810209410 Thank you @cavorite! Much cleaner to use a consistent API for building FSTs...

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392585398 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -120,31 +122,54 @@ public class FSTCompiler { final float directAddressingMaxOversizin

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-11-14 Thread via GitHub
mikemccand commented on code in PR #12715: URL: https://github.com/apache/lucene/pull/12715#discussion_r1392585037 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -125,8 +125,11 @@ public class FSTCompiler { /** * Instantiates an FST/FSA builder

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-11-14 Thread via GitHub
mikemccand merged PR #12715: URL: https://github.com/apache/lucene/pull/12715

Re: [PR] Generalize LSBRadixSorter and use it in SortingPostingsEnum [lucene]

2023-11-14 Thread via GitHub
gf2121 commented on code in PR #12800: URL: https://github.com/apache/lucene/pull/12800#discussion_r1392571640 ## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/DocSorterBenchmark.java: ## @@ -0,0 +1,241 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Use group-varint encoding for the tail of postings [lucene]

2023-11-14 Thread via GitHub
easyice commented on PR #12782: URL: https://github.com/apache/lucene/pull/12782#issuecomment-1810036730 Thank you @jpountz, I pushed the benchmark code, and added a new comparison between `ByteArrayDataInput` vs `ByteBufferIndexInput`. For `readVInt`, the `ByteBufferIndexInput` is a bit

Re: [PR] Add downloading binutils instructions for the macos [lucene]

2023-11-14 Thread via GitHub
rmuir commented on PR #12804: URL: https://github.com/apache/lucene/pull/12804#issuecomment-1809976433 thank you @vsop-479 !

Re: [PR] Add downloading binutils instructions for the macos [lucene]

2023-11-14 Thread via GitHub
rmuir merged PR #12804: URL: https://github.com/apache/lucene/pull/12804

Re: [I] Unroll or vectorize Math.max in CompetitiveImpactAccumulator.addAll? [lucene]

2023-11-14 Thread via GitHub
vsop-479 commented on issue #12788: URL: https://github.com/apache/lucene/issues/12788#issuecomment-1809865124 > To benchmark then use the benchmark-jmh Gradle module I measured max with scalar, unrolled, and vector implementations using benchmark-jmh: Benchmark
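
For readers unfamiliar with the variants being compared, the scalar and unrolled forms of an element-wise max could be sketched as below. This is an assumption-laden illustration: `MaxMerge`, `mergeScalar`, and `mergeUnrolled` are hypothetical names, the arrays stand in for whatever the accumulator actually stores, and the benchmark code and numbers from the comment are not reproduced here.

```java
/** Hypothetical illustration of scalar vs. unrolled element-wise max; not Lucene code. */
final class MaxMerge {

  private MaxMerge() {}

  /** Plain loop: dst[i] = max(dst[i], src[i]). */
  static void mergeScalar(int[] dst, int[] src) {
    checkLengths(dst, src);
    for (int i = 0; i < dst.length; i++) {
      dst[i] = Math.max(dst[i], src[i]);
    }
  }

  /** Same result, with the loop body manually unrolled four times. */
  static void mergeUnrolled(int[] dst, int[] src) {
    checkLengths(dst, src);
    int i = 0;
    int bound = dst.length & ~3; // largest multiple of 4 not above the length
    for (; i < bound; i += 4) {
      dst[i] = Math.max(dst[i], src[i]);
      dst[i + 1] = Math.max(dst[i + 1], src[i + 1]);
      dst[i + 2] = Math.max(dst[i + 2], src[i + 2]);
      dst[i + 3] = Math.max(dst[i + 3], src[i + 3]);
    }
    for (; i < dst.length; i++) { // remaining 0..3 elements
      dst[i] = Math.max(dst[i], src[i]);
    }
  }

  private static void checkLengths(int[] dst, int[] src) {
    if (src.length < dst.length) {
      throw new IllegalArgumentException("src is shorter than dst");
    }
  }
}
```

Whether the unrolled form is any faster depends on the JIT and the hardware, which is why measuring with the benchmark-jmh module, as the comment does, is the sensible way to decide.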

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1391973476 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -26,7 +26,8 @@ // TODO: merge with PagedBytes, except PagedBytes doesn't // let you read w

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392201110 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-14 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392197630 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBuffersFSTReader.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-14 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1809743395 ### Benchmark Setup Sharing my benchmark setup for reproducibility in [this branch](https://github.com/kaivalnp/lucene/tree/similarity-benchmark) (see [this commit](https://gith