Re: [PR] Record if block API has been used in SegmentInfo [lucene]

2023-10-19 Thread via GitHub
jpountz commented on code in PR #12685: URL: https://github.com/apache/lucene/pull/12685#discussion_r1365023966 ## lucene/CHANGES.txt: ## @@ -147,9 +147,13 @@ API Changes New Features - + * GITHUB#12548: Added similarityToQueryVector API to compute vecto

Re: [PR] Remove direct dependency of NodeHash to FST [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on PR #12690: URL: https://github.com/apache/lucene/pull/12690#issuecomment-1770366875 Actually, I'm nervous about my git merging skills! Let's try to review/push the [RAM limited FST building PR](https://github.com/apache/lucene/pull/12633) first? -- This is an aut

Re: [PR] Remove direct dependency of NodeHash to FST [lucene]

2023-10-19 Thread via GitHub
dungba88 commented on PR #12690: URL: https://github.com/apache/lucene/pull/12690#issuecomment-1770371305 That's ok for me too! I'll rebase after the other PR is merged

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-19 Thread via GitHub
shubhamvishu commented on code in PR #12653: URL: https://github.com/apache/lucene/pull/12653#discussion_r1365175738 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java: ## @@ -63,6 +63,9 @@ public abstract class MultiLevelSkipListWriter { /** for e

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-19 Thread via GitHub
shubhamvishu commented on code in PR #12653: URL: https://github.com/apache/lucene/pull/12653#discussion_r1365181118 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java: ## @@ -63,24 +63,23 @@ public abstract class MultiLevelSkipListWriter { /** for

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-19 Thread via GitHub
shubhamvishu commented on code in PR #12653: URL: https://github.com/apache/lucene/pull/12653#discussion_r1365184008 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java: ## @@ -130,12 +129,14 @@ public void bufferSkip(int df) throws IOException {

Re: [PR] Reduce collection operations when minShouldMatch == 0. [lucene]

2023-10-19 Thread via GitHub
zouxiang1993 closed pull request #12602: Reduce collection operations when minShouldMatch == 0. URL: https://github.com/apache/lucene/pull/12602

Re: [PR] Random access term dictionary [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1770448647 I'll try to review this soon -- it sounds compelling @Tony-X! I like how it is inspired by Tantivy's term dictionary format (which holds all terms + their metadata in RAM). Al

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1770452771 That's a neat idea (separate codec that trades off index size for faster search performance). Maybe it could also fold in the [fully in RAM FST term dictionary](https://github.c

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1770457453 > Posting results below The results are impressive! Conjunctive (-like) queries see sizable gains. Did you turn off patching for all encoded `int[]` blocks (docs, fr

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1770461719 Another exciting optimization such a "patch-less" encoding could implement is within-block skipping (I believe Tantivy does this). Today, our skipper is forced to align to

Re: [I] Using a searcher with an executor service does not work from within a Callable called by that same executor service [LUCENE-3803] [lucene]

2023-10-19 Thread via GitHub
javanna commented on issue #4876: URL: https://github.com/apache/lucene/issues/4876#issuecomment-1770467481 I stumbled upon this issue by coincidence, and I believe it has been addressed by https://github.com/apache/lucene/pull/12569 . There are real situations where this can happen now as

Re: [I] Using a searcher with an executor service does not work from within a Callable called by that same executor service [LUCENE-3803] [lucene]

2023-10-19 Thread via GitHub
javanna closed issue #4876: Using a searcher with an executor service does not work from within a Callable called by that same executor service [LUCENE-3803] URL: https://github.com/apache/lucene/issues/4876

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-19 Thread via GitHub
shubhamvishu commented on PR #12653: URL: https://github.com/apache/lucene/pull/12653#issuecomment-1770506196 Thanks for the review @mikemccand ! I have addressed the comments in the new revision.

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-19 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1770713206 Thank you @gf2121 , it is confirmed. I include just the part of the table that is relevant. It is really great that you caught this. | ID | Description

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
rmuir commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770722125 I don't think we should 'add' unsigned vectors format, if it is better we should change to it and remove the signed format. We have to maintain all this stuff.

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
rmuir commented on code in PR #12694: URL: https://github.com/apache/lucene/pull/12694#discussion_r1365375516 ## lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -352,6 +382,11 @@ private int dotProductBody512(byte[] a, byte[] b

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
rmuir commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770730840 seems like this should be implemented as e.g. ZERO_EXTEND_B2I and ZERO_EXTEND_B2S instead of adding branches to the code and AND instructions.

Re: [PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-19 Thread via GitHub
jpountz commented on code in PR #12689: URL: https://github.com/apache/lucene/pull/12689#discussion_r1365380977 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,64 +67,124 @@ public final class TaskExecutor { * @param the return type of the task

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
rmuir commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770813179 > Quantizing within `[0-255]` can reduce error. This doesn't make any sense to me, it is 8 bits either way. But supporting _both_ signed and unsigned is a nonstarter for me, it

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
benwtrent commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770816807 > ZERO_EXTEND_B2I and ZERO_EXTEND_B2S instead of adding branches to the code and AND instructions. Thank you! > I don't think we should 'add' unsigned vectors format, if i

Re: [I] Enable recursive graph bisection out of the box? [lucene]

2023-10-19 Thread via GitHub
jpountz commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1770827026 I did a first indexing run on wikibigall with the following merge policy, which I tried to make as lightweight as possible: ``` BPIndexReorderer reorderer = new BPIndex

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
rmuir commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770836409 The number of formats (float, binary) multiplies by the number of functions (dot product, cosine, square), so you aren't just adding one function here, it is 3. And in the future perhaps i

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
rmuir commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770825224 > This is tricky as folks who give Lucene `byte[]` vectors now expected signed operations. While this isn't an issue with euclidean, it is an issue with dot_product, etc. Wouldn't it be a

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
rmuir commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770887986 also i'd recommend writing some tests, at least enough to know if the code is viable. It is not clear to me that the vector methods are correct, if they do 16-bit multiplication on two uns

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
rmuir commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770900939 This means the only way you can do this correctly, is to remove all 16-bit multiplications and all use of `short` completely and go straight from 8-bit to 32-bit with ZERO_EXTEND_B2I.
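
A minimal sketch of the overflow concern raised in the comment above, written in plain scalar Java rather than the Panama Vector API. The class and method names (`UnsignedByteDot`, `dotProductWide`, `productVia16Bit`) are hypothetical and are not part of Lucene or of the PR. It illustrates that the product of two unsigned 8-bit values can reach 255 * 255 = 65025, which does not fit in a signed 16-bit `short`, whereas widening each byte straight to a 32-bit `int` keeps the product exact.

```java
public class UnsignedByteDot {

  /** Dot product that reads each byte as unsigned (0..255) and widens straight to 32-bit int. */
  public static int dotProductWide(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      // Byte.toUnsignedInt zero-extends the 8-bit value to 32 bits.
      sum += Byte.toUnsignedInt(a[i]) * Byte.toUnsignedInt(b[i]);
    }
    return sum;
  }

  /** Product of one unsigned pair, narrowed to a signed 16-bit short to expose the overflow. */
  public static short productVia16Bit(byte x, byte y) {
    return (short) (Byte.toUnsignedInt(x) * Byte.toUnsignedInt(y));
  }

  public static void main(String[] args) {
    byte max = (byte) 0xFF; // 255 when read as unsigned
    // Exact result when accumulating in int: 65025
    System.out.println(dotProductWide(new byte[] {max}, new byte[] {max}));
    // Wrapped result when the product is forced into 16 bits: -511
    System.out.println(productVia16Bit(max, max));
  }
}
```

The scalar loop stands in for the vectorized code only to show the arithmetic; it says nothing about which widening strategy performs best with SIMD lanes.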

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-19 Thread via GitHub
gf2121 commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1365326455 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -827,18 +826,21 @@ int readNextArcLabel(Arc arc, BytesReader in) throws IOException { if (arc

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
rmuir commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770907953 fwiw, i think you can keep the performance and solve the last problem by zero-extending twice: 8-16bit, then 16-32bit

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
uschindler commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770911584 > also i'd recommend writing some tests, at least enough to know if the code is viable. It is not clear to me that the vector methods are correct, if they do 16-bit multiplication on

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
rmuir commented on code in PR #12694: URL: https://github.com/apache/lucene/pull/12694#discussion_r1365469935 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultVectorUtilSupport.java: ## @@ -164,6 +173,23 @@ public float cosine(byte[] a, byte[] b) { re

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
uschindler commented on code in PR #12694: URL: https://github.com/apache/lucene/pull/12694#discussion_r1365474987 ## lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java: ## @@ -352,6 +382,11 @@ private int dotProductBody512(byte[] a, byt

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
rmuir commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770931189 > There is a test missing in TestVectorUtilSupport that compares the results of vectorized and standard impl. Also some basic tests using extreme vectors should be added due to overflows.

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
uschindler commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770934367 P.S.: It is too bad that we have no C preprocessor so we could expand and inline the methods automatically. We could maybe write a python script that generates the PanamaVectorUtilSup

Re: [PR] Consolidate the FSTStore and BytesStore in FST (#12543) [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12691: URL: https://github.com/apache/lucene/pull/12691#discussion_r1365488019 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -552,14 +527,11 @@ public void save(DataOutput metaOut, DataOutput out) throws IOException {

Re: [PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-19 Thread via GitHub
rmuir commented on PR #12694: URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770956279 I also question whether this is the correct design with respect to the hardware. Look at the instruction support for doing this stuff, which uses signed bytes: https://www.felixcloutier.com/x86/vpdpbus

Re: [PR] Consolidate the FSTStore and BytesStore in FST (#12543) [lucene]

2023-10-19 Thread via GitHub
dungba88 commented on code in PR #12691: URL: https://github.com/apache/lucene/pull/12691#discussion_r1365567273 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -552,14 +527,11 @@ public void save(DataOutput metaOut, DataOutput out) throws IOException {

Re: [PR] Consolidate the FSTStore and BytesStore in FST (#12543) [lucene]

2023-10-19 Thread via GitHub
dungba88 commented on code in PR #12691: URL: https://github.com/apache/lucene/pull/12691#discussion_r1365569612 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -552,14 +527,11 @@ public void save(DataOutput metaOut, DataOutput out) throws IOException {

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1365585941 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -135,32 +123,27 @@ public class FSTCompiler { * Instantiates an FST/FSA builder with

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1365589420 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -827,18 +826,21 @@ int readNextArcLabel(Arc arc, BytesReader in) throws IOException { if

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1365607237 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -20,76 +20,159 @@ import org.apache.lucene.util.packed.PackedInts; import org.apache.lucen

Re: [PR] LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on PR #12345: URL: https://github.com/apache/lucene/pull/12345#issuecomment-1771091088 I don't think you need to wrap `ReaderContext` classes -- you can create your new `TimeoutLeafReader` class, subclassing `FilterLeafReader`, and overriding the methods (likely with ad

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-19 Thread via GitHub
gf2121 commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1365627269 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -20,76 +20,159 @@ import org.apache.lucene.util.packed.PackedInts; import org.apache.lucene.ut

Re: [PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12698: URL: https://github.com/apache/lucene/pull/12698#discussion_r1365645503 ## lucene/CHANGES.txt: ## @@ -325,6 +325,8 @@ Bug Fixes * GITHUB#12571: Fix HNSW graph read bug when built with excessive connections. (Ben Trent). +* GITHUB#

Re: [PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-19 Thread via GitHub
javanna commented on code in PR #12689: URL: https://github.com/apache/lucene/pull/12689#discussion_r1365648915 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,64 +67,124 @@ public final class TaskExecutor { * @param the return type of the task

Re: [PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12698: URL: https://github.com/apache/lucene/pull/12698#discussion_r1365650001 ## lucene/CHANGES.txt: ## @@ -325,6 +325,8 @@ Bug Fixes * GITHUB#12571: Fix HNSW graph read bug when built with excessive connections. (Ben Trent). +* GITHUB#

Re: [PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-19 Thread via GitHub
dungba88 commented on code in PR #12698: URL: https://github.com/apache/lucene/pull/12698#discussion_r1365657190 ## lucene/CHANGES.txt: ## @@ -325,6 +325,8 @@ Bug Fixes * GITHUB#12571: Fix HNSW graph read bug when built with excessive connections. (Ben Trent). +* GITHUB#12

Re: [I] Enable recursive graph bisection out of the box? [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1771129756 This would be awesome to enable by default. It would somehow disable itself if the application sets its own static index sort? It's odd/curious that `PKLookup` got slower

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on PR #12633: URL: https://github.com/apache/lucene/pull/12633#issuecomment-1771137767 Thanks @gf2121 -- I agree! So much more intuitive to tell the FST compiler how much RAM it can use to make as minimal an FST as it can. This means we can build bigger FSTs with less

Re: [PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-19 Thread via GitHub
dungba88 commented on PR #12698: URL: https://github.com/apache/lucene/pull/12698#issuecomment-1771145600 > This can go back to 9.x right? I think that too. I rebased the change, re-added the assertion and updated the CHANGES log :)

Re: [PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on PR #12698: URL: https://github.com/apache/lucene/pull/12698#issuecomment-1771152844 Excellent, thanks @dungba88 -- I'll merge & backport soon.

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12653: URL: https://github.com/apache/lucene/pull/12653#discussion_r1365692736 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java: ## @@ -63,24 +63,23 @@ public abstract class MultiLevelSkipListWriter { /** for e

Re: [I] Enable recursive graph bisection out of the box? [lucene]

2023-10-19 Thread via GitHub
jpountz commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1771171862 > It would somehow disable itself if the application sets its own static index sort? This is correct. This bit already works on the PR, IndexWriter doesn't check the new met

Re: [PR] Random access term dictionary [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1365702707 ## lucene/core/src/java/module-info.java: ## @@ -35,6 +35,7 @@ exports org.apache.lucene.codecs.lucene95; exports org.apache.lucene.codecs.lucene90.blocktree;

Re: [PR] Avoid object construct when linear search [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12692: URL: https://github.com/apache/lucene/pull/12692#discussion_r1365740888 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc follow, Arc arc, BytesRe

Re: [PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-19 Thread via GitHub
jpountz commented on code in PR #12689: URL: https://github.com/apache/lucene/pull/12689#discussion_r1365756941 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,64 +67,124 @@ public final class TaskExecutor { * @param the return type of the task

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-19 Thread via GitHub
shubhamvishu commented on code in PR #12653: URL: https://github.com/apache/lucene/pull/12653#discussion_r1365782486 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java: ## @@ -63,24 +63,23 @@ public abstract class MultiLevelSkipListWriter { /** for

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-19 Thread via GitHub
shubhamvishu commented on PR #12653: URL: https://github.com/apache/lucene/pull/12653#issuecomment-1771277660 @mikemccand I have added a `CHANGES` entry to 9.9. Thanks!

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-19 Thread via GitHub
slow-J commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1771275256 >Did you turn off patching for all encoded int[] blocks (docs, freqs, positions)? Yes, I think so. All uses of `pforUtil` in the postingsReader and writer were replaced with t

Re: [PR] Avoid object construct when linear search [lucene]

2023-10-19 Thread via GitHub
gf2121 commented on code in PR #12692: URL: https://github.com/apache/lucene/pull/12692#discussion_r1365835239 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc follow, Arc arc, BytesRe }

Re: [PR] Avoid object construction when linear searching arcs [lucene]

2023-10-19 Thread via GitHub
gf2121 merged PR #12692: URL: https://github.com/apache/lucene/pull/12692

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-19 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1365869011 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java: ## @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Foundati

Re: [I] Enable recursive graph bisection out of the box? [lucene]

2023-10-19 Thread via GitHub
jpountz commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1771393466 I was just checking out a profile, and with this lightweight BP configuration, we end up spending more time on building the forward index (essentially calling `OfflineSorter` on all

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-19 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1365880082 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -0,0 +1,628 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-19 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1365887225 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -0,0 +1,628 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-19 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1365895340 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -0,0 +1,628 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-19 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1365917081 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,824 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-19 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1365922840 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/OffHeapQuantizedByteVectorValues.java: ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-10-19 Thread via GitHub
Shibi-bala commented on code in PR #12626: URL: https://github.com/apache/lucene/pull/12626#discussion_r1365943821 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java: ## @@ -1996,6 +1996,41 @@ public void testGetCommitData() throws Exception { dir.close();

[I] Should we handle negative scores due to floating point arithmetic errors? [lucene]

2023-10-19 Thread via GitHub
benwtrent opened a new issue, #12700: URL: https://github.com/apache/lucene/issues/12700 ### Description VectorSimilarityFunction might return negative scores in extreme circumstances. This could happen if `VectorUtil#cosine` returns something like `-1.001` instead of just
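
A minimal sketch of the failure mode this issue describes, in plain Java. It assumes the cosine value is mapped to a score as `(1 + cosine) / 2`; that mapping and the names `CosineScore`, `rawScore`, and `clampedScore` are assumptions made for illustration and are not taken from the issue text. Clamping at zero is shown as one possible mitigation, not necessarily the fix the project chose.

```java
public class CosineScore {

  /** Assumed score mapping: shifts a cosine in [-1, 1] into [0, 1]. */
  public static float rawScore(float cosine) {
    return (1 + cosine) / 2;
  }

  /** Same mapping, but never below zero even if rounding pushed the cosine under -1. */
  public static float clampedScore(float cosine) {
    return Math.max(rawScore(cosine), 0f);
  }

  public static void main(String[] args) {
    float slightlyOutOfRange = -1.001f; // the kind of value the issue mentions
    System.out.println("raw score negative?     " + (rawScore(slightlyOutOfRange) < 0));
    System.out.println("clamped score negative? " + (clampedScore(slightlyOutOfRange) < 0));
  }
}
```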

Re: [I] MultiSimilarity.MultiSimScorer should sum up scores into a double [lucene]

2023-10-19 Thread via GitHub
KunalSanghvi commented on issue #12675: URL: https://github.com/apache/lucene/issues/12675#issuecomment-1771980750 @jpountz Hi, I was thinking of solving this error, just wanted to know if it is solved?

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-19 Thread via GitHub
dsmiley commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1772084185 What say you @jbellis :-) I recommended a module of Lucene when we spoke at Community-over-Code. A dependency outside is okay for non-core.

Re: [PR] [DRAFT] Concurrent HNSW Merge [lucene]

2023-10-19 Thread via GitHub
zhaih commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1772185807 @msokolov I incorporated your change about passing in the executor and addVector from range. I also added the wiring to pass parameters from VectorFormat all the way in. For the N