date:20240703

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

2024-07-03 Thread via GitHub

ChrisHegarty commented on PR #13535: URL: https://github.com/apache/lucene/pull/13535#issuecomment-2205713035 >Merging uses an IOContext with a MergeInfo attached. This uses same ReadAdvice SEQUENTIAL, but that is different to the Singleton READONCE IOContext. Or do I miss something?

Re: [PR] Only search soft deleted in SoftDeletesRetentionMergePolicy.applyRetentionQuery [lucene]

2024-07-03 Thread via GitHub

vsop-479 commented on PR #13536: URL: https://github.com/apache/lucene/pull/13536#issuecomment-2205724663 > Can we extract softLiveDocs from liveDocs and hardLiveDocs, and use it directly, even instead of FieldExistsQuery? Maybe it won't get any speed up. -- This is an automated me

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

2024-07-03 Thread via GitHub

uschindler commented on PR #13535: URL: https://github.com/apache/lucene/pull/13535#issuecomment-2205763601 P.S.: Because of this I proposed that READONCE IndexInputs should throw exceptions on clone/slice. -- This is an automated message from the Apache Git Service. To respond to the mes

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

2024-07-03 Thread via GitHub

cpoerschke commented on code in PR #13525: URL: https://github.com/apache/lucene/pull/13525#discussion_r1664079407 ## lucene/core/src/java/org/apache/lucene/codecs/KnnTensorsFormat.java: ## @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

2024-07-03 Thread via GitHub

cpoerschke commented on code in PR #13525: URL: https://github.com/apache/lucene/pull/13525#discussion_r1664091301 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java: ## @@ -143,14 +143,22 @@ public int nextDoc() throws IOException { public static final cl

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

2024-07-03 Thread via GitHub

cpoerschke commented on code in PR #13525: URL: https://github.com/apache/lucene/pull/13525#discussion_r1664102266 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -209,17 +210,17 @@ public static VectorEncoding readVectorEncoding(D

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-03 Thread via GitHub

mikemccand commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2206113117 > My concern for 8 bit quantization is the algebraic expansion of dot-product and the corrective terms. > > For scalar quantization, the score corrections for dotProduct ar

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-03 Thread via GitHub

mikemccand commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2206145047 Could someone summarize what is actually wrong with Lucene here? From my `luceneutil` tests it looks like `int8` with `DOT_PRODUCT` is buggy / does not work (.006 recall on `coh

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-07-03 Thread via GitHub

javanna commented on code in PR #13472: URL: https://github.com/apache/lucene/pull/13472#discussion_r1664308783 ## lucene/core/src/test/org/apache/lucene/search/TestTaskExecutor.java: ## @@ -308,7 +306,7 @@ public void testInvokeAllCatchesMultipleExceptions() { } public

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-07-03 Thread via GitHub

original-brownbear commented on code in PR #13472: URL: https://github.com/apache/lucene/pull/13472#discussion_r1664333847 ## lucene/CHANGES.txt: ## @@ -277,6 +277,15 @@ Optimizations * GITHUB#12941: Don't preserve auxiliary buffer contents in LSBRadixSorter if it grows. (St

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-07-03 Thread via GitHub

original-brownbear commented on code in PR #13472: URL: https://github.com/apache/lucene/pull/13472#discussion_r1664337927 ## lucene/core/src/test/org/apache/lucene/search/TestTaskExecutor.java: ## @@ -251,14 +271,15 @@ public void testInvokeAllDoesNotLeaveTasksBehind() { f

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-07-03 Thread via GitHub

original-brownbear commented on code in PR #13472: URL: https://github.com/apache/lucene/pull/13472#discussion_r1664338521 ## lucene/core/src/test/org/apache/lucene/search/TestTaskExecutor.java: ## @@ -251,14 +271,15 @@ public void testInvokeAllDoesNotLeaveTasksBehind() { f

[PR] Reduce heap usage for knn index writers [lucene]

2024-07-03 Thread via GitHub

benwtrent opened a new pull request, #13538: URL: https://github.com/apache/lucene/pull/13538 This slightly reduces heap utilization for KnnIndex writers by: - Only constructing one `DocsWithFieldSet` instead of 3 for Scalar quantized HNSW formats - Only having one `List` instead

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-03 Thread via GitHub

benwtrent commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2206500270 @mikemccand > Does the Lucene implementation have per-dimension calculation of quantiles/min/max? Or is it really a global min/max computed across all dimensions and all v

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-07-03 Thread via GitHub

original-brownbear commented on code in PR #13472: URL: https://github.com/apache/lucene/pull/13472#discussion_r1664369069 ## lucene/CHANGES.txt: ## @@ -277,6 +277,15 @@ Optimizations * GITHUB#12941: Don't preserve auxiliary buffer contents in LSBRadixSorter if it grows. (St

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-03 Thread via GitHub

mikemccand commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2206798896 > > I know we are discussing how to make the conflated quantization and distance metric combinations math work out (i.e. how to fix the issue), but I'm just trying to get the big

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

2024-07-03 Thread via GitHub

cpoerschke commented on code in PR #13525: URL: https://github.com/apache/lucene/pull/13525#discussion_r1664549231 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java: ## @@ -143,14 +143,22 @@ public int nextDoc() throws IOException { public static final cl

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

2024-07-03 Thread via GitHub

cpoerschke commented on code in PR #13525: URL: https://github.com/apache/lucene/pull/13525#discussion_r1664549231 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java: ## @@ -143,14 +143,22 @@ public int nextDoc() throws IOException { public static final cl

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-03 Thread via GitHub

benwtrent commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2206891266 @MilindShyani I am not 100% your equation is correct. (NOTE: `scale` is just the inverse of alpha). It seems to me that: ``` byte = (float - minQuantile) * alph

Re: [PR] in KnnVectorsWriter reduce code duplication w.r.t. MergedVectorValues.merge(Float|Byte)VectorValues [lucene]

2024-07-03 Thread via GitHub

cpoerschke commented on code in PR #13539: URL: https://github.com/apache/lucene/pull/13539#discussion_r1664572651 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java: ## @@ -143,61 +145,81 @@ public int nextDoc() throws IOException { public static final cl

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-07-03 Thread via GitHub

javanna commented on code in PR #13472: URL: https://github.com/apache/lucene/pull/13472#discussion_r1664608482 ## lucene/CHANGES.txt: ## @@ -277,6 +277,15 @@ Optimizations * GITHUB#12941: Don't preserve auxiliary buffer contents in LSBRadixSorter if it grows. (Stefan Vodita

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-03 Thread via GitHub

benwtrent commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2206982541 @naveentatikonda @mikemccand https://github.com/apache/lucene/compare/main...benwtrent:fix-8-bit?expand=1 That is my current WIP -- This is an automated message from the

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-07-03 Thread via GitHub

original-brownbear commented on code in PR #13472: URL: https://github.com/apache/lucene/pull/13472#discussion_r1664618492 ## lucene/CHANGES.txt: ## @@ -277,6 +277,15 @@ Optimizations * GITHUB#12941: Don't preserve auxiliary buffer contents in LSBRadixSorter if it grows. (St

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-03 Thread via GitHub

benwtrent commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2207015770 With this change, I get the following recall@100 over 500_000 coherev2 (using max-inner-product as designed for the model): ``` recall maxConn beamWidth 0.872 1

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-07-03 Thread via GitHub

javanna commented on code in PR #13472: URL: https://github.com/apache/lucene/pull/13472#discussion_r1664646220 ## lucene/core/src/test/org/apache/lucene/search/TestTaskExecutor.java: ## @@ -251,14 +271,15 @@ public void testInvokeAllDoesNotLeaveTasksBehind() { for (int i =

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-07-03 Thread via GitHub

javanna commented on code in PR #13472: URL: https://github.com/apache/lucene/pull/13472#discussion_r1664646838 ## lucene/CHANGES.txt: ## @@ -277,6 +277,15 @@ Optimizations * GITHUB#12941: Don't preserve auxiliary buffer contents in LSBRadixSorter if it grows. (Stefan Vodita

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

2024-07-03 Thread via GitHub

jpountz commented on PR #13535: URL: https://github.com/apache/lucene/pull/13535#issuecomment-2207164356 Thanks for checking, I misremembered. @uschindler +1 to disallowing clones and slices. I' m contemplating making `MockDirectoryWrapper` also fail if `IndexInput`s get used in a dif

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-07-03 Thread via GitHub

original-brownbear commented on code in PR #13472: URL: https://github.com/apache/lucene/pull/13472#discussion_r1664758046 ## lucene/core/src/test/org/apache/lucene/search/TestTaskExecutor.java: ## @@ -251,14 +271,15 @@ public void testInvokeAllDoesNotLeaveTasksBehind() { f

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

2024-07-03 Thread via GitHub

uschindler commented on PR #13535: URL: https://github.com/apache/lucene/pull/13535#issuecomment-2207421958 bq. I'm contemplating making MockDirectoryWrapper also fail if IndexInputs get used in a different thread from the one where they were created. You mean this for READONCE ones,

Re: [PR] Reduce heap usage for knn index writers [lucene]

2024-07-03 Thread via GitHub

gautamworah96 commented on PR #13538: URL: https://github.com/apache/lucene/pull/13538#issuecomment-2207757150 Thanks for the quick turnaround on this Ben. I am not too familiar with this code and am taking some time off this week. I'll try to review the code by Friday. In the meant

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-03 Thread via GitHub

naveentatikonda commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2207802401 @benwtrent The new numbers looks good. But, I have few questions based on your [code changes ](https://github.com/apache/lucene/compare/main...benwtrent:fix-8-bit?expand=1)

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

2024-07-03 Thread via GitHub

vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2207897576 Did some refactor to reuse the same format. Also renamed "tensor" to "multi-vector".. -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] Multi-Vector support for HNSW search [lucene]

2024-07-03 Thread via GitHub

vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2207906861 Diff is down to 45 files now (from 75, yikes!), and most touch points are still due to FieldInfo changes. These are the files with key changes: ```ruby lucene/core/src/java/org

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

Re: [PR] Only search soft deleted in SoftDeletesRetentionMergePolicy.applyRetentionQuery [lucene]

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

[PR] Reduce heap usage for knn index writers [lucene]

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

Re: [PR] in KnnVectorsWriter reduce code duplication w.r.t. MergedVectorValues.merge(Float|Byte)VectorValues [lucene]

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

Re: [PR] Reduce heap usage for knn index writers [lucene]

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

Re: [PR] Tensor (multi-valued vector) support for HNSW search [lucene]

Re: [PR] Multi-Vector support for HNSW search [lucene]

33 matches

Site Navigation

Mail list logo

Footer information