jpountz commented on code in PR #12685:
URL: https://github.com/apache/lucene/pull/12685#discussion_r1365023966
##
lucene/CHANGES.txt:
##
@@ -147,9 +147,13 @@ API Changes
New Features
-
+
* GITHUB#12548: Added similarityToQueryVector API to compute vecto
mikemccand commented on PR #12690:
URL: https://github.com/apache/lucene/pull/12690#issuecomment-1770366875
Actually, I'm nervous about my git merging skills! Let's try to review/push
the [RAM limited FST building PR](https://github.com/apache/lucene/pull/12633)
first?
--
This is an aut
dungba88 commented on PR #12690:
URL: https://github.com/apache/lucene/pull/12690#issuecomment-1770371305
That's ok for me too! I'll rebase after the other PR is merged
shubhamvishu commented on code in PR #12653:
URL: https://github.com/apache/lucene/pull/12653#discussion_r1365175738
##
lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java:
##
@@ -63,6 +63,9 @@ public abstract class MultiLevelSkipListWriter {
/** for e
shubhamvishu commented on code in PR #12653:
URL: https://github.com/apache/lucene/pull/12653#discussion_r1365181118
##
lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java:
##
@@ -63,24 +63,23 @@ public abstract class MultiLevelSkipListWriter {
/** for
shubhamvishu commented on code in PR #12653:
URL: https://github.com/apache/lucene/pull/12653#discussion_r1365184008
##
lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java:
##
@@ -130,12 +129,14 @@ public void bufferSkip(int df) throws IOException {
zouxiang1993 closed pull request #12602: Reduce collection operations when
minShouldMatch == 0.
URL: https://github.com/apache/lucene/pull/12602
mikemccand commented on PR #12688:
URL: https://github.com/apache/lucene/pull/12688#issuecomment-1770448647
I'll try to review this soon -- it sounds compelling @Tony-X! I like how it
is inspired by Tantivy's term dictionary format (which holds all terms + their
metadata in RAM).
Al
mikemccand commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1770452771
That's a neat idea (separate codec that trades off index size for faster
search performance). Maybe it could also fold in the [fully in RAM FST term
dictionary](https://github.c
mikemccand commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1770457453
> Posting results below
The results are impressive! Conjunctive (-like) queries see sizable gains.
Did you turn off patching for all encoded `int[]` blocks (docs, fr
mikemccand commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1770461719
Another exciting optimization such a "patch-less" encoding could implement
is within-block skipping (I believe Tantivy does this).
Today, our skipper is forced to align to
javanna commented on issue #4876:
URL: https://github.com/apache/lucene/issues/4876#issuecomment-1770467481
I stumbled upon this issue by coincidence, and I believe it has been
addressed by https://github.com/apache/lucene/pull/12569 . There are real
situations where this can happen now as
javanna closed issue #4876: Using a searcher with an executor service does not
work from within a Callable called by that same executor service [LUCENE-3803]
URL: https://github.com/apache/lucene/issues/4876
shubhamvishu commented on PR #12653:
URL: https://github.com/apache/lucene/pull/12653#issuecomment-1770506196
Thanks for the review @mikemccand ! I have addressed the comments in the new
revision.
rmuir commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1770713206
Thank you @gf2121 , it is confirmed. I include just the part of the table
that is relevant. It is really great that you caught this.
| ID | Description
rmuir commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770722125
I don't think we should 'add' unsigned vectors format, if it is better we
should change to it and remove the signed format. We have to maintain all this
stuff.
rmuir commented on code in PR #12694:
URL: https://github.com/apache/lucene/pull/12694#discussion_r1365375516
##
lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##
@@ -352,6 +382,11 @@ private int dotProductBody512(byte[] a, byte[] b
rmuir commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770730840
seems like this should be implemented as e.g. ZERO_EXTEND_B2I and
ZERO_EXTEND_B2S instead of adding branches to the code and AND instructions.
jpountz commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1365380977
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,64 +67,124 @@ public final class TaskExecutor {
* @param <T> the return type of the task
rmuir commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770813179
> Quantizing within `[0-255]` can reduce error.
This doesn't make any sense to me, it is 8 bits either way.
But supporting _both_ signed and unsigned is a nonstarter for me, it
benwtrent commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770816807
> ZERO_EXTEND_B2I and ZERO_EXTEND_B2S instead of adding branches to the code
and AND instructions.
Thank you!
> I don't think we should 'add' unsigned vectors format, if i
jpountz commented on issue #12665:
URL: https://github.com/apache/lucene/issues/12665#issuecomment-1770827026
I did a first indexing run on wikibigall with the following merge policy,
which I tried to make as lightweight as possible:
```
BPIndexReorderer reorderer = new BPIndex
rmuir commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770836409
The number of formats (float, binary) multiplies by the number of functions
(dot product, cosine, square), so you aren't just adding one function here, it
is 3. And in the future perhaps i
rmuir commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770825224
> This is tricky as folks who give Lucene `byte[]` vectors now expected
signed operations. While this isn't an issue with euclidean, it is an issue
with dot_product, etc. Wouldn't it be a
rmuir commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770887986
also i'd recommend writing some tests, at least enough to know if the code
is viable. It is not clear to me that the vector methods are correct, if they
do 16-bit multiplication on two uns
rmuir commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770900939
This means the only way you can do this correctly, is to remove all 16-bit
multiplications and all use of `short` completely and go straight from 8-bit to
32-bit with ZERO_EXTEND_B2I.
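To make the overflow concern concrete, here is a minimal scalar sketch, not the Panama code from the PR. The class and method names are hypothetical. `Byte.toUnsignedInt` stands in for what `ZERO_EXTEND_B2I` would do per lane, and the `short` cast models a 16-bit lane whose product is later sign-extended.

```java
public final class UnsignedByteDot {

  /** Widens each byte straight to a 32-bit int before multiplying (8-bit to 32-bit). */
  public static int dotProductUnsigned(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += Byte.toUnsignedInt(a[i]) * Byte.toUnsignedInt(b[i]);
    }
    return sum;
  }

  /** Faulty variant: the product is squeezed into a signed 16-bit value first. */
  public static int dotProductVia16BitLanes(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      // 255 * 255 = 65025 does not fit in a signed short (max 32767), so it wraps to -511.
      short product = (short) (Byte.toUnsignedInt(a[i]) * Byte.toUnsignedInt(b[i]));
      sum += product; // implicit sign extension from 16 to 32 bits
    }
    return sum;
  }

  public static void main(String[] args) {
    byte[] extreme = {(byte) 255, (byte) 255};
    System.out.println(dotProductUnsigned(extreme, extreme));      // 130050
    System.out.println(dotProductVia16BitLanes(extreme, extreme)); // -1022
  }
}
```

Vectors filled with `0xFF` bytes are the kind of extreme input that would expose the difference in a test.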
gf2121 commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1365326455
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -827,18 +826,21 @@ int readNextArcLabel(Arc arc, BytesReader in) throws
IOException {
if (arc
rmuir commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770907953
fwiw, i think you can keep the performance and solve the last problem by
zero-extending twice: 8-16bit, then 16-32bit
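One possible reading of the two-step idea, as a scalar sketch with hypothetical names (the PR itself would use vector lanes): multiply in 16 bits, then treat the 16-bit product as unsigned when widening to 32 bits. This is lossless because the largest product, 255 * 255 = 65025, still fits in 16 unsigned bits.

```java
public final class TwoStepZeroExtend {

  /** Dot product of unsigned bytes using 8-to-16-bit, then 16-to-32-bit zero extension. */
  public static int dotProductTwoStep(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      // Step 1 (8 to 16 bit): zero-extend each byte into a 16-bit value.
      short x = (short) Byte.toUnsignedInt(a[i]);
      short y = (short) Byte.toUnsignedInt(b[i]);
      // 16-bit multiply: only the low 16 bits of the product are kept.
      short product = (short) (x * y);
      // Step 2 (16 to 32 bit): zero-extend, so the bit pattern is read as unsigned.
      sum += Short.toUnsignedInt(product);
    }
    return sum;
  }

  public static void main(String[] args) {
    byte[] extreme = {(byte) 255, (byte) 255};
    System.out.println(dotProductTwoStep(extreme, extreme)); // 130050
  }
}
```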
uschindler commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770911584
> also i'd recommend writing some tests, at least enough to know if the code
is viable. It is not clear to me that the vector methods are correct, if they
do 16-bit multiplication on
rmuir commented on code in PR #12694:
URL: https://github.com/apache/lucene/pull/12694#discussion_r1365469935
##
lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultVectorUtilSupport.java:
##
@@ -164,6 +173,23 @@ public float cosine(byte[] a, byte[] b) {
re
uschindler commented on code in PR #12694:
URL: https://github.com/apache/lucene/pull/12694#discussion_r1365474987
##
lucene/core/src/java20/org/apache/lucene/internal/vectorization/PanamaVectorUtilSupport.java:
##
@@ -352,6 +382,11 @@ private int dotProductBody512(byte[] a, byt
rmuir commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770931189
> There is a test missing in TestVectorUtilSupport that compares the results
of vectorized and standard impl. Also some basic tests using extreme vectors
should be added due to overflows.
uschindler commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770934367
P.S.: It is too bad that we have no C preprocessor so we could expand and
inline the methods automatically. We could maybe write a python script that
generates the PanamaVectorUtilSup
mikemccand commented on code in PR #12691:
URL: https://github.com/apache/lucene/pull/12691#discussion_r1365488019
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -552,14 +527,11 @@ public void save(DataOutput metaOut, DataOutput out)
throws IOException {
rmuir commented on PR #12694:
URL: https://github.com/apache/lucene/pull/12694#issuecomment-1770956279
I also question whether this is the correct design with respect to the hardware.
Look at instruction support for doing this stuff which uses signed bytes:
https://www.felixcloutier.com/x86/vpdpbus
dungba88 commented on code in PR #12691:
URL: https://github.com/apache/lucene/pull/12691#discussion_r1365567273
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -552,14 +527,11 @@ public void save(DataOutput metaOut, DataOutput out)
throws IOException {
dungba88 commented on code in PR #12691:
URL: https://github.com/apache/lucene/pull/12691#discussion_r1365569612
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -552,14 +527,11 @@ public void save(DataOutput metaOut, DataOutput out)
throws IOException {
mikemccand commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1365585941
##
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##
@@ -135,32 +123,27 @@ public class FSTCompiler {
* Instantiates an FST/FSA builder with
mikemccand commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1365589420
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -827,18 +826,21 @@ int readNextArcLabel(Arc arc, BytesReader in) throws
IOException {
if
mikemccand commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1365607237
##
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##
@@ -20,76 +20,159 @@
import org.apache.lucene.util.packed.PackedInts;
import org.apache.lucen
mikemccand commented on PR #12345:
URL: https://github.com/apache/lucene/pull/12345#issuecomment-1771091088
I don't think you need to wrap `ReaderContext` classes -- you can create
your new `TimeoutLeafReader` class, subclassing `FilterLeafReader`, and
overriding the methods (likely with ad
gf2121 commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1365627269
##
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##
@@ -20,76 +20,159 @@
import org.apache.lucene.util.packed.PackedInts;
import org.apache.lucene.ut
mikemccand commented on code in PR #12698:
URL: https://github.com/apache/lucene/pull/12698#discussion_r1365645503
##
lucene/CHANGES.txt:
##
@@ -325,6 +325,8 @@ Bug Fixes
* GITHUB#12571: Fix HNSW graph read bug when built with excessive connections.
(Ben Trent).
+* GITHUB#
javanna commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1365648915
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,64 +67,124 @@ public final class TaskExecutor {
* @param <T> the return type of the task
mikemccand commented on code in PR #12698:
URL: https://github.com/apache/lucene/pull/12698#discussion_r1365650001
##
lucene/CHANGES.txt:
##
@@ -325,6 +325,8 @@ Bug Fixes
* GITHUB#12571: Fix HNSW graph read bug when built with excessive connections.
(Ben Trent).
+* GITHUB#
dungba88 commented on code in PR #12698:
URL: https://github.com/apache/lucene/pull/12698#discussion_r1365657190
##
lucene/CHANGES.txt:
##
@@ -325,6 +325,8 @@ Bug Fixes
* GITHUB#12571: Fix HNSW graph read bug when built with excessive connections.
(Ben Trent).
+* GITHUB#12
mikemccand commented on issue #12665:
URL: https://github.com/apache/lucene/issues/12665#issuecomment-1771129756
This would be awesome to enable by default. It would somehow disable itself
if the application sets its own static index sort?
It's odd/curious that `PKLookup` got slower
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1771137767
Thanks @gf2121 -- I agree! So much more intuitive to tell the FST compiler
how much RAM it can use to make as minimal an FST as it can. This means we can
build bigger FSTs with less
dungba88 commented on PR #12698:
URL: https://github.com/apache/lucene/pull/12698#issuecomment-1771145600
> This can go back to 9.x right?
I think that too.
I rebased the change, re-added the assertion and updated the CHANGES log :)
mikemccand commented on PR #12698:
URL: https://github.com/apache/lucene/pull/12698#issuecomment-1771152844
Excellent, thanks @dungba88 -- I'll merge & backport soon.
mikemccand commented on code in PR #12653:
URL: https://github.com/apache/lucene/pull/12653#discussion_r1365692736
##
lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java:
##
@@ -63,24 +63,23 @@ public abstract class MultiLevelSkipListWriter {
/** for e
jpountz commented on issue #12665:
URL: https://github.com/apache/lucene/issues/12665#issuecomment-1771171862
> It would somehow disable itself if the application sets its own static
index sort?
This is correct. This bit already works on the PR, IndexWriter doesn't check
the new met
mikemccand commented on code in PR #12688:
URL: https://github.com/apache/lucene/pull/12688#discussion_r1365702707
##
lucene/core/src/java/module-info.java:
##
@@ -35,6 +35,7 @@
exports org.apache.lucene.codecs.lucene95;
exports org.apache.lucene.codecs.lucene90.blocktree;
mikemccand commented on code in PR #12692:
URL: https://github.com/apache/lucene/pull/12692#discussion_r1365740888
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc
follow, Arc arc, BytesRe
jpountz commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1365756941
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,64 +67,124 @@ public final class TaskExecutor {
* @param <T> the return type of the task
shubhamvishu commented on code in PR #12653:
URL: https://github.com/apache/lucene/pull/12653#discussion_r1365782486
##
lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java:
##
@@ -63,24 +63,23 @@ public abstract class MultiLevelSkipListWriter {
/** for
shubhamvishu commented on PR #12653:
URL: https://github.com/apache/lucene/pull/12653#issuecomment-1771277660
@mikemccand I have added a `CHANGES` entry to 9.9. Thanks!
slow-J commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1771275256
> Did you turn off patching for all encoded int[] blocks (docs, freqs,
positions)?
Yes, I think so. All uses of `pforUtil` in the postingsReader and writer
were replaced with t
gf2121 commented on code in PR #12692:
URL: https://github.com/apache/lucene/pull/12692#discussion_r1365835239
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc
follow, Arc arc, BytesRe
}
gf2121 merged PR #12692:
URL: https://github.com/apache/lucene/pull/12692
mayya-sharipova commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1365869011
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java:
##
@@ -0,0 +1,231 @@
+/*
+ * Licensed to the Apache Software Foundati
jpountz commented on issue #12665:
URL: https://github.com/apache/lucene/issues/12665#issuecomment-1771393466
I was just checking out a profile, and with this lightweight BP
configuration, we end up spending more time on building the forward index
(essentially calling `OfflineSorter` on all
mayya-sharipova commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1365880082
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -0,0 +1,628 @@
+/*
+ * Licensed to the Apache Software Foundati
mayya-sharipova commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1365887225
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -0,0 +1,628 @@
+/*
+ * Licensed to the Apache Software Foundati
mayya-sharipova commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1365895340
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -0,0 +1,628 @@
+/*
+ * Licensed to the Apache Software Foundati
mayya-sharipova commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1365917081
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java:
##
@@ -0,0 +1,824 @@
+/*
+ * Licensed to the Apache Softwa
mayya-sharipova commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1365922840
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/OffHeapQuantizedByteVectorValues.java:
##
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software F
Shibi-bala commented on code in PR #12626:
URL: https://github.com/apache/lucene/pull/12626#discussion_r1365943821
##
lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java:
##
@@ -1996,6 +1996,41 @@ public void testGetCommitData() throws Exception {
dir.close();
benwtrent opened a new issue, #12700:
URL: https://github.com/apache/lucene/issues/12700
### Description
VectorSimilarityFunction might return negative scores in extreme
circumstances.
This could happen if `VectorUtil#cosine` returns something like `-1.001`
instead of just
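A small sketch of how a cosine slightly below -1 could turn into a negative score, and one way to guard against it. It assumes the score is derived as `(1 + cosine) / 2`; that formula and the names below are illustrative assumptions, not taken from the issue text.

```java
public final class ClampedCosineScore {

  /** Maps a cosine in [-1, 1] to [0, 1]; goes negative if cosine drops below -1. */
  public static float rawScore(float cosine) {
    return (1 + cosine) / 2;
  }

  /** Same mapping, but clamped so floating-point error can never yield a negative score. */
  public static float clampedScore(float cosine) {
    return Math.max(rawScore(cosine), 0f);
  }

  public static void main(String[] args) {
    float slightlyOff = -1.001f; // the kind of value the issue describes
    System.out.println(rawScore(slightlyOff) < 0);  // true
    System.out.println(clampedScore(slightlyOff));  // 0.0
  }
}
```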
KunalSanghvi commented on issue #12675:
URL: https://github.com/apache/lucene/issues/12675#issuecomment-1771980750
@jpountz Hi, I was thinking of solving this error, just wanted to know if it
is solved?
dsmiley commented on issue #12615:
URL: https://github.com/apache/lucene/issues/12615#issuecomment-1772084185
What say you @jbellis :-)
I recommended a module of Lucene when we spoke at Community-over-Code. A
dependency outside is okay for non-core.
zhaih commented on PR #12660:
URL: https://github.com/apache/lucene/pull/12660#issuecomment-1772185807
@msokolov I incorporated your changes for passing in the executor and
addVector from range. I also wired up passing the parameters from
VectorFormat all the way in.
For the N