Re: [PR] Consolidate the FSTStore and BytesStore in FST (#12543) [lucene]

2023-10-19 Thread via GitHub
dungba88 commented on code in PR #12691: URL: https://github.com/apache/lucene/pull/12691#discussion_r1365567273 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -552,14 +527,11 @@ public void save(DataOutput metaOut, DataOutput out) throws IOException {

Re: [PR] Consolidate the FSTStore and BytesStore in FST (#12543) [lucene]

2023-10-19 Thread via GitHub
dungba88 commented on code in PR #12691: URL: https://github.com/apache/lucene/pull/12691#discussion_r1365569612 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -552,14 +527,11 @@ public void save(DataOutput metaOut, DataOutput out) throws IOException {

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1365585941 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -135,32 +123,27 @@ public class FSTCompiler { * Instantiates an FST/FSA builder with

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1365589420 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -827,18 +826,21 @@ int readNextArcLabel(Arc arc, BytesReader in) throws IOException { if

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1365607237 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -20,76 +20,159 @@ import org.apache.lucene.util.packed.PackedInts; import org.apache.lucen

Re: [PR] LUCENE-10641: IndexSearcher#setTimeout should also abort query rewrites, point ranges and vector searches [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on PR #12345: URL: https://github.com/apache/lucene/pull/12345#issuecomment-1771091088 I don't think you need to wrap `ReaderContext` classes -- you can create your new `TimeoutLeafReader` class, subclassing `FilterLeafReader`, and overriding the methods (likely with ad

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-19 Thread via GitHub
gf2121 commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1365627269 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -20,76 +20,159 @@ import org.apache.lucene.util.packed.PackedInts; import org.apache.lucene.ut

Re: [PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12698: URL: https://github.com/apache/lucene/pull/12698#discussion_r1365645503 ## lucene/CHANGES.txt: ## @@ -325,6 +325,8 @@ Bug Fixes * GITHUB#12571: Fix HNSW graph read bug when built with excessive connections. (Ben Trent). +* GITHUB#

Re: [PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-19 Thread via GitHub
javanna commented on code in PR #12689: URL: https://github.com/apache/lucene/pull/12689#discussion_r1365648915 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,64 +67,124 @@ public final class TaskExecutor { * @param the return type of the task
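The PR title describes the intended behavior: once one task fails, the remaining tasks are cancelled. Below is a minimal sketch of that pattern using plain `java.util.concurrent`. It assumes an arbitrary `Executor`. `CancelOnFailure` and `invokeAll` are hypothetical names, and this is not Lucene's actual `TaskExecutor` code. An `ExecutorCompletionService` hands back futures in completion order, so the first failure is observed right away instead of after slower tasks that were submitted earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.Future;

/** Hypothetical helper: runs all tasks, cancelling the rest as soon as one fails. */
final class CancelOnFailure {
  private CancelOnFailure() {}

  static <T> List<T> invokeAll(Executor executor, List<Callable<T>> tasks)
      throws InterruptedException, ExecutionException {
    CompletionService<T> completion = new ExecutorCompletionService<>(executor);
    List<Future<T>> futures = new ArrayList<>(tasks.size());
    for (Callable<T> task : tasks) {
      futures.add(completion.submit(task));
    }
    try {
      // take() returns futures in completion order; get() rethrows a task failure
      // as ExecutionException the moment that task is dequeued.
      for (int i = 0; i < tasks.size(); i++) {
        completion.take().get();
      }
    } catch (ExecutionException | InterruptedException e) {
      // Cancel everything still pending; cancel(true) also interrupts running tasks.
      for (Future<T> future : futures) {
        future.cancel(true);
      }
      throw e;
    }
    // All tasks completed successfully: collect results in submission order.
    List<T> results = new ArrayList<>(futures.size());
    for (Future<T> future : futures) {
      results.add(future.get());
    }
    return results;
  }
}
```

Running tasks only stop early if they respond to interruption (for example by blocking in `Thread.sleep` or checking `Thread.interrupted()`).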

Re: [PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12698: URL: https://github.com/apache/lucene/pull/12698#discussion_r1365650001 ## lucene/CHANGES.txt: ## @@ -325,6 +325,8 @@ Bug Fixes * GITHUB#12571: Fix HNSW graph read bug when built with excessive connections. (Ben Trent). +* GITHUB#

Re: [PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-19 Thread via GitHub
dungba88 commented on code in PR #12698: URL: https://github.com/apache/lucene/pull/12698#discussion_r1365657190 ## lucene/CHANGES.txt: ## @@ -325,6 +325,8 @@ Bug Fixes * GITHUB#12571: Fix HNSW graph read bug when built with excessive connections. (Ben Trent). +* GITHUB#12

Re: [I] Enable recursive graph bisection out of the box? [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1771129756 This would be awesome to enable by default. It would somehow disable itself if the application sets its own static index sort? It's odd/curious that `PKLookup` got slower

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on PR #12633: URL: https://github.com/apache/lucene/pull/12633#issuecomment-1771137767 Thanks @gf2121 -- I agree! So much more intuitive to tell the FST compiler how much RAM it can use to make as minimal an FST as it can. This means we can build bigger FSTs with less

Re: [PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-19 Thread via GitHub
dungba88 commented on PR #12698: URL: https://github.com/apache/lucene/pull/12698#issuecomment-1771145600 > This can go back to 9.x right? I think that too. I rebased the change, re-added the assertion and updated the CHANGES log :) -- This is an automated message from the Ap

Re: [PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on PR #12698: URL: https://github.com/apache/lucene/pull/12698#issuecomment-1771152844 Excellent, thanks @dungba88 -- I'll merge & backport soon.

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12653: URL: https://github.com/apache/lucene/pull/12653#discussion_r1365692736 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java: ## @@ -63,24 +63,23 @@ public abstract class MultiLevelSkipListWriter { /** for e
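For context, here is a sketch of how the number of skip levels for a buffered entry can be computed. The loop form is assumed to mirror the classic `bufferSkip` shape: divide `df` by the skip interval, then count how many times the result divides evenly by the multiplier, capped at the maximum number of levels. The trailing-zeros variant is only an illustrative alternative that holds when `skipMultiplier` is a power of two. It is not necessarily what this PR does. `SkipLevels` and its methods are hypothetical names.

```java
/** Hypothetical illustration of computing how many skip levels an entry touches. */
final class SkipLevels {
  private SkipLevels() {}

  /** Loop form: one level, plus one per successive even division by skipMultiplier. */
  static int levelsByLoop(int df, int skipInterval, int skipMultiplier, int maxLevels) {
    int numLevels = 1;
    df /= skipInterval;
    while (df % skipMultiplier == 0 && numLevels < maxLevels) {
      numLevels++;
      df /= skipMultiplier;
    }
    return numLevels;
  }

  /**
   * Closed form, valid only when skipMultiplier is a power of two: k is divisible by
   * skipMultiplier^j exactly when k has at least j * log2(skipMultiplier) trailing zeros.
   */
  static int levelsByTrailingZeros(int df, int skipInterval, int skipMultiplier, int maxLevels) {
    int k = df / skipInterval;
    int bitsPerLevel = Integer.numberOfTrailingZeros(skipMultiplier); // log2(skipMultiplier)
    int extraLevels = Integer.numberOfTrailingZeros(k) / bitsPerLevel;
    return Math.min(1 + extraLevels, maxLevels);
  }
}
```

The interval 128 and multiplier 8 used in a quick check are example values only. With them, `df = 128 * 64` touches 3 levels, because 64 = 8².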

Re: [I] Enable recursive graph bisection out of the box? [lucene]

2023-10-19 Thread via GitHub
jpountz commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1771171862 > It would somehow disable itself if the application sets its own static index sort? This is correct. This bit already works on the PR, IndexWriter doesn't check the new met

Re: [PR] Random access term dictionary [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12688: URL: https://github.com/apache/lucene/pull/12688#discussion_r1365702707 ## lucene/core/src/java/module-info.java: ## @@ -35,6 +35,7 @@ exports org.apache.lucene.codecs.lucene95; exports org.apache.lucene.codecs.lucene90.blocktree;

Re: [PR] Avoid object construct when linear search [lucene]

2023-10-19 Thread via GitHub
mikemccand commented on code in PR #12692: URL: https://github.com/apache/lucene/pull/12692#discussion_r1365740888 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc follow, Arc arc, BytesRe

Re: [PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-19 Thread via GitHub
jpountz commented on code in PR #12689: URL: https://github.com/apache/lucene/pull/12689#discussion_r1365756941 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,64 +67,124 @@ public final class TaskExecutor { * @param the return type of the task

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-19 Thread via GitHub
shubhamvishu commented on code in PR #12653: URL: https://github.com/apache/lucene/pull/12653#discussion_r1365782486 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java: ## @@ -63,24 +63,23 @@ public abstract class MultiLevelSkipListWriter { /** for

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-19 Thread via GitHub
shubhamvishu commented on PR #12653: URL: https://github.com/apache/lucene/pull/12653#issuecomment-1771277660 @mikemccand I have added a `CHANGES` entry to 9.9. Thanks!

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-19 Thread via GitHub
slow-J commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1771275256 >Did you turn off patching for all encoded int[] blocks (docs, freqs, positions)? Yes, I think so. All uses of `pforUtil` in the postingsReader and writer were replaced with t

Re: [PR] Avoid object construct when linear search [lucene]

2023-10-19 Thread via GitHub
gf2121 commented on code in PR #12692: URL: https://github.com/apache/lucene/pull/12692#discussion_r1365835239 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc follow, Arc arc, BytesRe }

Re: [PR] Avoid object construction when linear searching arcs [lucene]

2023-10-19 Thread via GitHub
gf2121 merged PR #12692: URL: https://github.com/apache/lucene/pull/12692

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-19 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1365869011 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java: ## @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Foundati

Re: [I] Enable recursive graph bisection out of the box? [lucene]

2023-10-19 Thread via GitHub
jpountz commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1771393466 I was just checking out a profile, and with this lightweight BP configuration, we end up spending more time on building the forward index (essentially calling `OfflineSorter` on all

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-19 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1365880082 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -0,0 +1,628 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-19 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1365887225 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -0,0 +1,628 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-19 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1365895340 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -0,0 +1,628 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-19 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1365917081 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,824 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-19 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1365922840 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/OffHeapQuantizedByteVectorValues.java: ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software F

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-10-19 Thread via GitHub
Shibi-bala commented on code in PR #12626: URL: https://github.com/apache/lucene/pull/12626#discussion_r1365943821 ## lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java: ## @@ -1996,6 +1996,41 @@ public void testGetCommitData() throws Exception { dir.close();

[I] Should we handle negative scores due to floating point arithmetic errors? [lucene]

2023-10-19 Thread via GitHub
benwtrent opened a new issue, #12700: URL: https://github.com/apache/lucene/issues/12700 ### Description VectorSimilarityFunction might return negative scores in extreme circumstances. This could happen if `VectorUtil#cosine` returns something like `-1.001` instead of just
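Below is a minimal sketch of one possible answer. It assumes the cosine score is mapped to `[0, 1]` as `(1 + cos) / 2`. Under that mapping, a cosine of `-1.001f` yields a slightly negative score, and flooring at zero removes it. `CosineScore` is a hypothetical helper, not Lucene's `VectorSimilarityFunction`.

```java
/**
 * Hypothetical helper: maps a cosine similarity to a non-negative score.
 * Assumes the (1 + cos) / 2 mapping; float rounding can push cos slightly
 * below -1 (e.g. -1.001f), which would otherwise produce a negative score.
 */
final class CosineScore {
  private CosineScore() {}

  /** Direct mapping: negative whenever cos < -1. */
  static float unclampedScore(float cos) {
    return (1 + cos) / 2;
  }

  /** Same mapping, floored at zero so rounding error cannot yield a negative score. */
  static float clampedScore(float cos) {
    return Math.max((1 + cos) / 2, 0f);
  }
}
```

Clamping hides the rounding error rather than fixing it. The alternative is to clamp the cosine itself to `[-1, 1]` before mapping it.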

Re: [I] MultiSimilarity.MultiSimScorer should sum up scores into a double [lucene]

2023-10-19 Thread via GitHub
KunalSanghvi commented on issue #12675: URL: https://github.com/apache/lucene/issues/12675#issuecomment-1771980750 @jpountz Hi, I was thinking of working on this; I just wanted to know whether it is already solved.
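The sketch below shows why summing into a `double` matters. Once a running `float` sum is large, each small addend can fall below half an ulp and be rounded away entirely. A `double` accumulator keeps those addends and rounds only once, at the end. `ScoreSum` and its methods are hypothetical names, not Lucene code.

```java
/** Hypothetical sketch: float vs. double accumulation of per-clause scores. */
final class ScoreSum {
  private ScoreSum() {}

  /** Rounds to float precision after every addition. */
  static float sumAsFloat(float[] scores) {
    float sum = 0f;
    for (float s : scores) {
      sum += s;
    }
    return sum;
  }

  /** float -> double widening is exact; round to float only once at the end. */
  static float sumAsDouble(float[] scores) {
    double sum = 0d;
    for (float s : scores) {
      sum += s;
    }
    return (float) sum;
  }
}
```

Consider `{1e8f, 1f, 1f, 1f, 1f, 1f, 1f, 1f, 1f}`. Floats near 1e8 are spaced 8 apart, so every `+ 1f` rounds back to `1e8f`. The double-accumulated version returns `100000008f`.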

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-19 Thread via GitHub
dsmiley commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1772084185 What say you @jbellis :-) I recommended a module of Lucene when we spoke at Community-over-Code. A dependency outside is okay for non-core.

Re: [PR] [DRAFT] Concurrent HNSW Merge [lucene]

2023-10-19 Thread via GitHub
zhaih commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1772185807 @msokolov I incorporated your change about passing in the executor and addVector from range. I also added the wiring to pass parameters from VectorFormat all the way in. For the N

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-10-20 Thread via GitHub
nitirajrathore commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1772314503 Thanks @msokolov : These are really good suggestions. I will try to incorporate these ideas in solutions. I think in the end there can be multiple ways to allow more connecti

Re: [I] Optimize FST suffix sharing for block tree index [lucene]

2023-10-20 Thread via GitHub
mikemccand commented on issue #12702: URL: https://github.com/apache/lucene/issues/12702#issuecomment-1772457667 > The floor data is guaranteed to be stored within single arc (never be prefix shared) in FST because fp is encoded before it. But won't the leading bytes of `fp` be shared

[PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-20 Thread via GitHub
ChrisHegarty opened a new pull request, #12703: URL: https://github.com/apache/lucene/pull/12703 [ This PR is draft - not ready to be merged. It is intended to help facilitate a discussion ] This PR enhances the vector similarity functions so that they can access the underlying memor

Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-20 Thread via GitHub
ChrisHegarty commented on PR #12703: URL: https://github.com/apache/lucene/pull/12703#issuecomment-1772530717 Some benchmark results. Mac M2, 128 bit ``` INFO: Java vector incubator API enabled; uses preferredBitSize=128 ... Benchmark (si

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
ChrisHegarty commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772535368 Thanks @rmuir @gf2121 I need to spend a bit more time evaluating this. But it looks like no action is needed here?

Re: [PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-20 Thread via GitHub
mikemccand merged PR #12698: URL: https://github.com/apache/lucene/pull/12698

Re: [I] ArrayIndexOutOfBoundsException when writing the FSTStore-backed FST with different DataOutput for meta [lucene]

2023-10-20 Thread via GitHub
mikemccand closed issue #12697: ArrayIndexOutOfBoundsException when writing the FSTStore-backed FST with different DataOutput for meta URL: https://github.com/apache/lucene/issues/12697

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-20 Thread via GitHub
bruno-roustant commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1366758770 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -20,76 +20,161 @@ import org.apache.lucene.util.packed.PackedInts; import org.apache.l

Re: [PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-20 Thread via GitHub
mikemccand commented on PR #12698: URL: https://github.com/apache/lucene/pull/12698#issuecomment-1772566523 Thanks @dungba88 -- I merged to `main` and `branch_9x`!

Re: [PR] Random access term dictionary [lucene]

2023-10-20 Thread via GitHub
bruno-roustant commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1772589968 I'll also try to review! On the bit packing subject, I have some handy generic code (not in Lucene yet) to write and read variable size bits. Tell me if you are interested.

Re: [PR] [DRAFT] Concurrent HNSW Merge [lucene]

2023-10-20 Thread via GitHub
benwtrent commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1772603178 This is awesome. I am so happy it's a clean change without tons of complexity and we still get 4x speed up with additional threads. I will give it a review this weekend or early

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-20 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1366900931 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -135,32 +123,28 @@ public class FSTCompiler { * Instantiates an FST/FSA builder with

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-20 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1366903022 ## lucene/core/src/java/org/apache/lucene/util/packed/AbstractPagedMutable.java: ## @@ -110,8 +110,10 @@ protected long baseRamBytesUsed() { public long ramBytes

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-20 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1366910269 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -99,20 +184,18 @@ private long hash(FSTCompiler.UnCompiledNode node) { h += 17;

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772656211 @ChrisHegarty there are plenty of actions we could take... but I implemented this same optimization safely in #12681 See https://en.wikipedia.org/wiki/Advanced_

[I] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-10-20 Thread via GitHub
mikemccand opened a new issue, #12704: URL: https://github.com/apache/lucene/issues/12704 ### Description Spinoff from [this cool comment](https://github.com/apache/lucene/pull/12633#discussion_r1366847986), thanks to hashing guru @bruno-roustant: ``` Instead, we should mul
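Below is a simplified sketch of a double-barrel LRU cache. It assumes the design keeps a primary table and a fallback table, and promotes fallback hits into the primary. When the primary fills up, it becomes the fallback and the old fallback is discarded wholesale. Memory therefore stays bounded by roughly two tables' worth of entries, and recently used entries survive. The sketch uses `HashMap` for clarity. `DoubleBarrelCache` is a hypothetical class, not Lucene's `NodeHash`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical double-barrel LRU cache: two generations, oldest dropped wholesale. */
final class DoubleBarrelCache<K, V> {
  private final int maxPrimarySize;
  private Map<K, V> primary = new HashMap<>();
  private Map<K, V> fallback = new HashMap<>();

  DoubleBarrelCache(int maxPrimarySize) {
    this.maxPrimarySize = maxPrimarySize;
  }

  V get(K key) {
    V value = primary.get(key);
    if (value == null) {
      value = fallback.get(key);
      if (value != null) {
        put(key, value); // promote a recently used entry into the primary
      }
    }
    return value;
  }

  void put(K key, V value) {
    primary.put(key, value);
    if (primary.size() >= maxPrimarySize) {
      fallback = primary; // the previous fallback generation is discarded here
      primary = new HashMap<>();
    }
  }
}
```

With `maxPrimarySize = 2`, the sequence `put(a)`, `put(b)`, `put(c)`, `get(a)` keeps `a` and `c` but evicts `b`, because `b` was never touched after its generation was demoted.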

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-20 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1366913164 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -99,20 +184,18 @@ private long hash(FSTCompiler.UnCompiledNode node) { h += 17;

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772661255 to have any decent performance, we really need information on the CPU in question and its vector capabilities. And the idea you can write "one loop" that "runs anywhere" is an obvious pipe

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772669239 also i think JMH is bad news when it comes to downclocking. It does not show the true performance impact of this. It slows down other things on the machine as well: the user might have oth

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-20 Thread via GitHub
mikemccand commented on PR #12633: URL: https://github.com/apache/lucene/pull/12633#issuecomment-1772672994 I'll also confirm `Test2BFST` still passes ... soon this test will no longer require a 35 GB heap to run!

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
uschindler commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772673981 > to have any decent performance, we really need information on the CPU in question and its vector capabilities. And the idea you can write "one loop" that "runs anywhere" is an obvio

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
uschindler commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772679957 > Unfortunately this approach is slightly suboptimal for your Rocket Lake which doesn't suffer from downclocking, but it is a disaster elsewhere, so we have to play it safe. We

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772695285 Vector API should also fix its bugs. It is totally senseless to have `IntVector.SPECIES_PREFERRED` and `FloatVector.SPECIES_PREFERRED` and then always set them to '512' on every avx-512 ma

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772702122 I would really just fix the api: instead of `IntVector.SPECIES_PREFERRED` constant which is meaningless, it should be a method taking `VectorOperation...` about how you plan to use it. it

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-20 Thread via GitHub
jbellis commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1772704049 Responding top to bottom, > I wonder how much the speed difference is due to (1) Vectors being out of memory (and if they used PQ for diskann, if they did, we should test PQ w

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772706786 such a method would solve 95% of my problems, if it would throw UnsupportedOperationException or return `null` if the hardware/hotspot doesn't support all the requested VectorOperators.

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-20 Thread via GitHub
jbellis commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1772711571 > DiskANN is known to be slower at indexing than HNSW I don't remember the numbers here, maybe 10% slower? It wasn't material enough to make me worry about it. (This is wit

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-20 Thread via GitHub
jbellis commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1772722737 > It is possible that the candidate postings (gathered via HNSW) don't contain ANY filtered docs. This would require gathering more candidate postings. This was a big problem

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-20 Thread via GitHub
jbellis commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1772724758 > Or perhaps we "just" make a Lucene Codec component (KnnVectorsFormat) that wraps jvector? (https://github.com/jbellis/jvector) I'm happy to support anyone who wants to try t

Re: [PR] Avoid use docsSeen in BKDWriter [lucene]

2023-10-20 Thread via GitHub
easyice commented on PR #12658: URL: https://github.com/apache/lucene/pull/12658#issuecomment-1772756782 I think we can only use this optimization for merges without deleted docs, because we can't use the cardinality of `liveDocs` as docCount: the `liveDocs` is set to 1 when initialized.

Re: [PR] Avoid use docsSeen in BKDWriter [lucene]

2023-10-20 Thread via GitHub
easyice commented on code in PR #12658: URL: https://github.com/apache/lucene/pull/12658#discussion_r1366998517 ## lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java: ## @@ -519,9 +526,8 @@ private Runnable writeFieldNDims( // compute the min/max for this slice

Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-20 Thread via GitHub
rmuir commented on PR #12703: URL: https://github.com/apache/lucene/pull/12703#issuecomment-1772787526 Thanks for investigating this! Can we just fix vector code to take MemorySegment and wrap array code? I don't think we should add yet another factor to multiply the number of vector

Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-20 Thread via GitHub
rmuir commented on PR #12703: URL: https://github.com/apache/lucene/pull/12703#issuecomment-1772792213 as far as performance in practice, what kind of alignment is necessary such that it is reasonable for mmap'd files? Please, let it not be 64 bytes alignment for avx-512, that's too wastefu

Re: [I] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-10-20 Thread via GitHub
bruno-roustant commented on issue #12704: URL: https://github.com/apache/lucene/issues/12704#issuecomment-1772810122 @dweiss will probably say more than me about the awesome BitMixer#PHI_C64 constant!
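Below is a sketch of multiplicative hash mixing with the 64-bit golden-ratio constant `0x9e3779b97f4a7c15L`, the value HPPC's `BitMixer` names `PHI_C64`. The multiply-then-xor-shift form is modeled on HPPC's `mixPhi`; treat the exact shift and the `PhiMix` class as assumptions. Multiplying by this odd constant spreads every input bit into the high bits. Folding the high bits back down helps keys that differ in only a few bits land in different slots of a power-of-two table.

```java
/** Hypothetical sketch of golden-ratio (Fibonacci) hash mixing for slot selection. */
final class PhiMix {
  private PhiMix() {}

  /** Approximately 2^64 / phi; odd, so multiplying by it is a bijection on longs. */
  static final long PHI_C64 = 0x9e3779b97f4a7c15L;

  /** Multiply, then fold the well-mixed high half into the low bits a mask keeps. */
  static long mixPhi(long key) {
    long h = key * PHI_C64;
    return h ^ (h >>> 32);
  }

  /** Slot in a power-of-two table, where mask = tableSize - 1. */
  static int slot(long key, int mask) {
    return (int) (mixPhi(key) & mask);
  }
}
```

Both steps are invertible, so distinct keys always produce distinct mixed values. Collisions come only from the final masking.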

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-20 Thread via GitHub
mikemccand commented on PR #12653: URL: https://github.com/apache/lucene/pull/12653#issuecomment-1772824555 Thanks @shubhamvishu -- looks great! I plan to merge later today.

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-20 Thread via GitHub
mikemccand commented on code in PR #12653: URL: https://github.com/apache/lucene/pull/12653#discussion_r1367036009 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java: ## @@ -63,24 +63,23 @@ public abstract class MultiLevelSkipListWriter { /** for e

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-20 Thread via GitHub
shubhamvishu commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1367106152 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Scorer should sum up scores into a double [lucene]

2023-10-20 Thread via GitHub
shubhamvishu commented on PR #12682: URL: https://github.com/apache/lucene/pull/12682#issuecomment-1772913419 Thanks for the approval @jpountz!

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
rmuir commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772915097 > > Unfortunately this approach is slightly suboptimal for your Rocket Lake which doesn't suffer from downclocking, but it is a disaster elsewhere, so we have to play it safe. > > W

Re: [PR] SOLR-15055 Re-implement 'withCollection' and 'maxShardsPerNode' [lucene-solr]

2023-10-20 Thread via GitHub
ljak commented on PR #2179: URL: https://github.com/apache/lucene-solr/pull/2179#issuecomment-1772924915 Hi, I know it's an old thread but I have a question. As far as I can tell (after searching), the `maxShardsPerNode` function wasn't re-implemented right (in the new autoscal

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
uschindler commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772952661 Hi, > The jvm already has these. For example a user can set max vector width and avx instruction level already. I assume that avx 512 users who are running on downclock-suscept

Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-20 Thread via GitHub
ChrisHegarty commented on PR #12703: URL: https://github.com/apache/lucene/pull/12703#issuecomment-1772963134 @rmuir If I understand your comment correctly, I unaligned the vector data in the mmap file, in the benchmark. The results are similar enough to the aligned results, maybe a little less wh

Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-20 Thread via GitHub
ChrisHegarty commented on PR #12703: URL: https://github.com/apache/lucene/pull/12703#issuecomment-1772974346 > Thanks for investigating this! Can we just fix vector code to take MemorySegment and wrap array code? Yes, that is a good idea. I'll do it and see how poorly it performs. I

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
ChrisHegarty commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1772982177 Just dropping an ACK here, for now. I do get the issues, and I agree that there could be better ways to frame things at the vector API level.

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-20 Thread via GitHub
mikemccand commented on PR #12633: URL: https://github.com/apache/lucene/pull/12633#issuecomment-1772988443 `Test2BFST` passed! ``` The slowest tests (exceeding 500 ms) during this run:

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-20 Thread via GitHub
mikemccand merged PR #12633: URL: https://github.com/apache/lucene/pull/12633

Re: [I] Lucene's FST Builder should have a simpler "knob" to trade off memory/CPU required against minimality [lucene]

2023-10-20 Thread via GitHub
mikemccand commented on issue #12542: URL: https://github.com/apache/lucene/issues/12542#issuecomment-1772993863 I've merged the change into `main`! I'll let it bake for some time (week or two?) and if all looks good, backport to 9.x.

Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-20 Thread via GitHub
uschindler commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1773038045 > Just dropping an ACK here, for now. I do get the issues, and I agree that there could be better ways to frame things at the vector API level. Let's write a proposal together i

Re: [PR] Random access term dictionary [lucene]

2023-10-20 Thread via GitHub
Tony-X commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1773113866 Thanks @bruno-roustant! If you're okay with sharing it, feel free to post it here. I'm in the process of baking my own specific implementation (which internally uses a single long as

Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-20 Thread via GitHub
ChrisHegarty commented on PR #12703: URL: https://github.com/apache/lucene/pull/12703#issuecomment-1773365569 Well... a simple wrapping of float[] into MemorySegment is not going to work out; the Vector API does not like it due to alignment constraints (which seems overly pedantic since it

Re: [PR] Capture build scans on ge.apache.org to benefit from deep build insights [lucene]

2023-10-20 Thread via GitHub
dsmiley commented on PR #12293: URL: https://github.com/apache/lucene/pull/12293#issuecomment-1773459815 I'm eager to see the kind of build insights Gradle Enterprise offers us. If there are no further concerns, I'll merge Tuesday.

Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-20 Thread via GitHub
rmuir commented on PR #12703: URL: https://github.com/apache/lucene/pull/12703#issuecomment-1773612369 > Well... a simple wrapping of float[] into MemorySegment is not going to work out, the Vector API does not like it due to alignment constraints (which seems overly pedantic since it can

Re: [PR] Remove direct dependency of NodeHash to FST [lucene]

2023-10-20 Thread via GitHub
dungba88 commented on PR #12690: URL: https://github.com/apache/lucene/pull/12690#issuecomment-1773630051 As the other PR has been merged, I have rebased and resolved the conflict.

[PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check that the "closed" condition) [lucene]

2023-10-21 Thread via GitHub
uschindler opened a new pull request, #12705: URL: https://github.com/apache/lucene/pull/12705 See the dev thread by @msokolov @ https://lists.apache.org/thread/qts8wvrjs54gkgz04pk4p93fg0wjbq3o The handling of NPE is very special in ByteBufferIndexInput and also MemorySegmentIndexIn

Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-21 Thread via GitHub
uschindler commented on PR #12703: URL: https://github.com/apache/lucene/pull/12703#issuecomment-1773761875 Hi, I was also thinking about this but came up with a slightly different setup. My problem here is that it directly links the classes in the Java 20+ code to each other and adds instance

Re: [PR] Remove direct dependency of NodeHash to FST [lucene]

2023-10-21 Thread via GitHub
mikemccand commented on PR #12690: URL: https://github.com/apache/lucene/pull/12690#issuecomment-1773767253 Thanks @dungba88 -- looks great, I'll merge soon!

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-21 Thread via GitHub
mikemccand merged PR #12653: URL: https://github.com/apache/lucene/pull/12653

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-21 Thread via GitHub
mikemccand commented on PR #12653: URL: https://github.com/apache/lucene/pull/12653#issuecomment-1773768806 I merged to `main` and `9.x` (9.9)! Thanks @shubhamvishu.

Re: [PR] Remove direct dependency of NodeHash to FST [lucene]

2023-10-21 Thread via GitHub
mikemccand merged PR #12690: URL: https://github.com/apache/lucene/pull/12690

Re: [PR] Remove direct dependency of NodeHash to FST [lucene]

2023-10-21 Thread via GitHub
mikemccand commented on PR #12690: URL: https://github.com/apache/lucene/pull/12690#issuecomment-1773775196 Thanks @dungba88 -- I'll wait to backport this until after backporting https://github.com/apache/lucene/pull/12633
