[GitHub] [lucene] searchivarius commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-08-11 Thread via GitHub
searchivarius commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1675721505 Looking great, many thanks! Could you remind me what is ordered and reversed? This is something related to insertion order? -- This is an automated message from the Apache G

[GitHub] [lucene-solr] squirmy closed pull request #1681: SOLR-10804: Allow same version updates in DocBasedVersionConstraintsProcessor

2023-08-11 Thread via GitHub
squirmy closed pull request #1681: SOLR-10804: Allow same version updates in DocBasedVersionConstraintsProcessor URL: https://github.com/apache/lucene-solr/pull/1681

[GitHub] [lucene] reta commented on issue #12498: Simplify task executor for concurrent operations

2023-08-11 Thread via GitHub
reta commented on issue #12498: URL: https://github.com/apache/lucene/issues/12498#issuecomment-1675403043 > It makes sense to me to push the responsibility of figuring out how to execute tasks to the executor. Also pinging @reta. Thanks @jpountz, I second that > Additionally,

[GitHub] [lucene] jpountz merged pull request #12415: Optimize disjunction counts.

2023-08-11 Thread via GitHub
jpountz merged PR #12415: URL: https://github.com/apache/lucene/pull/12415

[GitHub] [lucene] ashvardanian commented on issue #12502: USearch integration and potential Vector Search performance improvements

2023-08-11 Thread via GitHub
ashvardanian commented on issue #12502: URL: https://github.com/apache/lucene/issues/12502#issuecomment-1675201748 Thank you, @benwtrent, @jbellis, and @uschindler! It's very insightful! [Nmslib.java](https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index

[GitHub] [lucene] sabi0 commented on issue #12501: Default PostingsFormat lost the SPI extension point in the Codec class

2023-08-11 Thread via GitHub
sabi0 commented on issue #12501: URL: https://github.com/apache/lucene/issues/12501#issuecomment-1675193472 > > Besides having two implementations ... with the same "Lucene84" name will likely result in a lookup error? > Exactly and because of that it's final. I just do not understa

[GitHub] [lucene] jbellis commented on a diff in pull request #12421: Concurrent hnsw graph and builder, take two

2023-08-11 Thread via GitHub
jbellis commented on code in PR #12421: URL: https://github.com/apache/lucene/pull/12421#discussion_r1291603910 ## lucene/core/src/java/org/apache/lucene/util/hnsw/ConcurrentNeighborSet.java: ## @@ -0,0 +1,292 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [lucene] benwtrent opened a new issue, #12505: Re-explore the logic around when Vector search should be Exact

2023-08-11 Thread via GitHub
benwtrent opened a new issue, #12505: URL: https://github.com/apache/lucene/issues/12505 ### Description Lucene always does an approximate nearest neighbors search when no filter is provided. This seems like unnecessary work. Some benchmarks would have to be done, but some id

[GitHub] [lucene] benwtrent opened a new pull request, #12504: ToParentBlockJoin[Byte|Float]KnnVectorQuery needs to handle the case when parents are missing

2023-08-11 Thread via GitHub
benwtrent opened a new pull request, #12504: URL: https://github.com/apache/lucene/pull/12504 This is a follow up to: https://github.com/apache/lucene/pull/12434 Adds a test for when parents are missing in the index and verifies we return no hits. Previously this would have thrown an

[GitHub] [lucene] uschindler commented on issue #12502: USearch integration and potential Vector Search performance improvements

2023-08-11 Thread via GitHub
uschindler commented on issue #12502: URL: https://github.com/apache/lucene/issues/12502#issuecomment-1675084211 Yes: - no external libraries for Lucene Core - no native code

[GitHub] [lucene] jbellis commented on issue #12502: USearch integration and potential Vector Search performance improvements

2023-08-11 Thread via GitHub
jbellis commented on issue #12502: URL: https://github.com/apache/lucene/issues/12502#issuecomment-1675081860 Hi Ash, (1) Have you compared usearch directly with Lucene? This could be a useful starting point: https://github.com/jbellis/hnswrecall (2) My understanding is that i

[GitHub] [lucene] benwtrent commented on issue #12502: USearch integration and potential Vector Search performance improvements

2023-08-11 Thread via GitHub
benwtrent commented on issue #12502: URL: https://github.com/apache/lucene/issues/12502#issuecomment-1675079917 I don't think we need a native implementation. JNI stuff can be dangerous. I honestly don't know the history around Lucene and if there have ever been considerations in the area b

[GitHub] [lucene] henryrneh opened a new issue, #12503: OutOfMemoryError found by OSS-Fuzz (issue 60248)

2023-08-11 Thread via GitHub
henryrneh opened a new issue, #12503: URL: https://github.com/apache/lucene/issues/12503 ### Description Dear Apache Lucene maintainers, The OutOfMemory is triggered in this [line](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/ArrayUtil.

[GitHub] [lucene] ashvardanian opened a new issue, #12502: USearch integration and potential Vector Search performance improvements

2023-08-11 Thread via GitHub
ashvardanian opened a new issue, #12502: URL: https://github.com/apache/lucene/issues/12502 ### Description I was recently approached by Lucene and Elastic users, facing low performance and high memory consumption issues, running Vector Search tasks on JVM. Some have also been using

[GitHub] [lucene] uschindler commented on issue #12501: Default PostingsFormat lost the SPI extension point in the Codec class

2023-08-11 Thread via GitHub
uschindler commented on issue #12501: URL: https://github.com/apache/lucene/issues/12501#issuecomment-1674996502 > The postings format classes are `final`. Besides having two implementations (`Lucene84PostingsFormat` in lucene-core and `MyLucene84PostingsFormat`) with the same "Lucene84" na

[GitHub] [lucene] sabi0 commented on issue #12501: Default PostingsFormat lost the SPI extension point in the Codec class

2023-08-11 Thread via GitHub
sabi0 commented on issue #12501: URL: https://github.com/apache/lucene/issues/12501#issuecomment-1674983089 The codec classes are `final`. Besides having two implementations (`Lucene84PostingsFormat` in lucene-core and `MyLucene84PostingsFormat`) with the same "Lucene84" name will likely

[GitHub] [lucene] uschindler commented on issue #12501: Default PostingsFormat lost the SPI extension point in the Codec class

2023-08-11 Thread via GitHub
uschindler commented on issue #12501: URL: https://github.com/apache/lucene/issues/12501#issuecomment-1674970444 The general rule is: If you want to change the index postings format (but nothing else like codec itself) when writing a new index, you need to subclass default codec. By that i

[GitHub] [lucene] uschindler closed issue #12501: Default PostingsFormat lost the SPI extension point in the Codec class

2023-08-11 Thread via GitHub
uschindler closed issue #12501: Default PostingsFormat lost the SPI extension point in the Codec class URL: https://github.com/apache/lucene/issues/12501

[GitHub] [lucene] uschindler commented on issue #12501: Default PostingsFormat lost the SPI extension point in the Codec class

2023-08-11 Thread via GitHub
uschindler commented on issue #12501: URL: https://github.com/apache/lucene/issues/12501#issuecomment-1674957426 For the FST loading mode mentioned above, the codec does not need to be changed, you can tell DirectoryReader to use FST load modes using attributes.

[GitHub] [lucene] uschindler commented on issue #12501: Default PostingsFormat lost the SPI extension point in the Codec class

2023-08-11 Thread via GitHub
uschindler commented on issue #12501: URL: https://github.com/apache/lucene/issues/12501#issuecomment-1674955262 > So it looked like this SPI extension point loss was an unwitting side-effect of a sequence of refactorings. No, the SPI fromName does not allow you to change the implemen

[GitHub] [lucene] sabi0 commented on issue #12501: Default PostingsFormat lost the SPI extension point in the Codec class

2023-08-11 Thread via GitHub
sabi0 commented on issue #12501: URL: https://github.com/apache/lucene/issues/12501#issuecomment-1674950075 I see. Thank you for the explanation. The commits that change this behavior did not say anything about this. So it looked like this SPI extension point loss was an unwitting side-

[GitHub] [lucene] uschindler commented on issue #12501: Default PostingsFormat lost the SPI extension point in the Codec class

2023-08-11 Thread via GitHub
uschindler commented on issue #12501: URL: https://github.com/apache/lucene/issues/12501#issuecomment-1674943197 Hi, the SPI should only be used when READING indexes. When you create a codec for IndexWriter the codec version hardcodes its postings formats and other subtypes. As you see i

[GitHub] [lucene] sabi0 opened a new issue, #12501: Default PostingsFormat lost the SPI extension point in the Codec class

2023-08-11 Thread via GitHub
sabi0 opened a new issue, #12501: URL: https://github.com/apache/lucene/issues/12501 ### Description `Lucene70Codec` had: ``` private final PostingsFormat defaultFormat = PostingsFormat.forName("Lucene50"); ``` In the `Lucene80Codec` PostingsFormat instantiation was mo

[GitHub] [lucene] benwtrent merged pull request #12500: Fix flaky testToString method for Knn Vector queries

2023-08-11 Thread via GitHub
benwtrent merged PR #12500: URL: https://github.com/apache/lucene/pull/12500

[GitHub] [lucene] uschindler commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

2023-08-11 Thread via GitHub
uschindler commented on issue #12165: URL: https://github.com/apache/lucene/issues/12165#issuecomment-1674532419 Just open public issues. Actually not all of those errors would be fixed, because Apache Lucene does not always do all possible checks, as performance is more important th

[GitHub] [lucene] henryrneh commented on issue #12165: Integrating Apache Lucene into OSS-Fuzz

2023-08-11 Thread via GitHub
henryrneh commented on issue #12165: URL: https://github.com/apache/lucene/issues/12165#issuecomment-1674365549 We have now started triaging bugs from OSS-Fuzz. The fuzzer has discovered multiple issues, for example OutOfMemory or StackOverflow, that we can disc