Pulkitg64 commented on PR #13401:
URL: https://github.com/apache/lucene/pull/13401#issuecomment-2123880947
@benwtrent @uschindler @ChrisHegarty
Could you please take a look, if you get a chance?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
gautamworah96 commented on issue #13403:
URL: https://github.com/apache/lucene/issues/13403#issuecomment-2123614671
> I would expect the first stab at dimension reduction would be PQ not PCA.
Hmm. I would've expected the opposite? If the number of dimensions is
reduced, you don't eve
benwtrent commented on issue #13403:
URL: https://github.com/apache/lucene/issues/13403#issuecomment-2123458172
Is PCA ever preferred for vector information retrieval over Product
Quantization? I would expect the first stab at dimension reduction would be PQ
not PCA.
Maybe a first st
ChrisHegarty merged PR #13402:
URL: https://github.com/apache/lucene/pull/13402
uschindler commented on code in PR #13402:
URL: https://github.com/apache/lucene/pull/13402#discussion_r1608878571
##
lucene/core/src/java21/org/apache/lucene/util/VectorUtilPanamaProvider.txt:
##
@@ -1,2 +0,0 @@
-The version of VectorUtilPanamaProvider for Java 21 is identical
gautamworah96 opened a new issue, #13403:
URL: https://github.com/apache/lucene/issues/13403
### Description
I opened this issue as a discussion topic. With the advancement in int8,
int4 type vector storage, I believe Lucene takes the unquantized vectors as
inputs, intelligently calc
jmazanec15 commented on issue #13350:
URL: https://github.com/apache/lucene/issues/13350#issuecomment-2123110105
I am trying to understand one thing: Does the corrective offset for dot
product rectify issues with sign shift that is caused by going from signed
domain: [-x, +y] to unsigned do
uschindler commented on PR #13402:
URL: https://github.com/apache/lucene/pull/13402#issuecomment-2123092672
But I tend to not put this into Lucene 9.x. IMHO, it's too risky.
uschindler commented on PR #13402:
URL: https://github.com/apache/lucene/pull/13402#issuecomment-2123086712
This is not so easy to do. I think we have to clone the whole vector code to
Java 21, but without the memorysegment shortcuts.
I'd suggest:
- keep the Java 20 code as is
-
uschindler commented on PR #13339:
URL: https://github.com/apache/lucene/pull/13339#issuecomment-2123079691
Looks like first Java 22 build also worked fine, so no API incompatibilities
in JDK (foreign preview vs final):
https://jenkins.thetaphi.de/job/Lucene-main-Linux/48322/consoleText
Pulkitg64 opened a new pull request, #13401:
URL: https://github.com/apache/lucene/pull/13401
### Description
This PR is to get feedback on the idea and any major changes required in the
commit.
In this commit we are using Java SPI instead of ENUM to define
VectorSimilarityFun
ChrisHegarty merged PR #13339:
URL: https://github.com/apache/lucene/pull/13339
bruno-roustant opened a new pull request, #13400:
URL: https://github.com/apache/lucene/pull/13400
Add IntHashSet and LongHashSet to the HPPC fork. Use them to replace usages
of Set<Integer> and Set<Long>.
Refactor the forked HPPC classes a bit and add tests. On the way I discovered a
small bug in HPP
bruno-roustant merged PR #13392:
URL: https://github.com/apache/lucene/pull/13392
benwtrent commented on issue #13350:
URL: https://github.com/apache/lucene/issues/13350#issuecomment-2122846701
I used int7 for my experiments. While losing one bit of precision isn't the
best, it works well.
I explored adding an unsigned byte dot product, but that got rejected as too
uschindler commented on code in PR #13339:
URL: https://github.com/apache/lucene/pull/13339#discussion_r1608442177
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentFlatVectorsScorer.java:
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software
jpountz commented on PR #13359:
URL: https://github.com/apache/lucene/pull/13359#issuecomment-2122746505
It creates a 50GB terms dictionary while my machine only has ~28GB of RAM
for the page cache, so many terms dictionary lookups result in page faults.
mikemccand commented on PR #13359:
URL: https://github.com/apache/lucene/pull/13359#issuecomment-2122733760
> But I created a benchmark that starts looking like running a Lucene query
that is encouraging
Was this with a forced-cold index?
ChrisHegarty commented on code in PR #13339:
URL: https://github.com/apache/lucene/pull/13339#discussion_r1608349500
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentFlatVectorsScorer.java:
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Softwar
mikemccand commented on issue #13350:
URL: https://github.com/apache/lucene/issues/13350#issuecomment-2122686859
Thank you @naveentatikonda for the deep dive here and a nice unit test ... I
couldn't follow all of the logic you described, but if we are indeed first
normalizing a dimension's
romseygeek commented on code in PR #13315:
URL: https://github.com/apache/lucene/pull/13315#discussion_r1608372668
##
lucene/core/src/java/org/apache/lucene/search/DisjunctionMatchesIterator.java:
##
@@ -194,6 +194,15 @@ private DisjunctionMatchesIterator(List
matches) throws I
mikemccand commented on PR #13315:
URL: https://github.com/apache/lucene/pull/13315#issuecomment-2122672757
What a fun and tricky corner case -- thank you @scampi for uncovering this,
showing the bug with the added unit tests, and the tentative fix.
I think it is actually technically
mikemccand commented on code in PR #13315:
URL: https://github.com/apache/lucene/pull/13315#discussion_r1608345538
##
lucene/core/src/java/org/apache/lucene/search/DisjunctionMatchesIterator.java:
##
@@ -194,6 +194,15 @@ private DisjunctionMatchesIterator(List
matches) throws I
mikemccand commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2122642094
I like this idea! I hope we can find a simple enough API exposed through
IWC to enable the optional grouping.
This also has nice mechanical sympathy / symmetry with the di
ChrisHegarty commented on code in PR #13339:
URL: https://github.com/apache/lucene/pull/13339#discussion_r1608169944
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentFlatVectorsScorer.java:
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Softwar
uschindler commented on code in PR #13339:
URL: https://github.com/apache/lucene/pull/13339#discussion_r1608136276
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentFlatVectorsScorer.java:
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software
alessandrobenedetti commented on PR #13399:
URL: https://github.com/apache/lucene/pull/13399#issuecomment-2122366365
Obviously no hard opinion in naming sub-packages or how to group classes,
but my feeling is that the general audience would benefit
alessandrobenedetti commented on PR #13399:
URL: https://github.com/apache/lucene/pull/13399#issuecomment-2122364895
> Does it actually improve readability? I know some Java projects like to be
very granular in how they organize packages, but I've come to like Lucene's
relatively flat packa
uschindler commented on PR #13339:
URL: https://github.com/apache/lucene/pull/13339#issuecomment-2122358609
> > We may add a method like getByteBufferSlice().
>
> I experimented locally with similar before, and the performance impact
when converting to/from MemorySegment was horri
ChrisHegarty commented on code in PR #13339:
URL: https://github.com/apache/lucene/pull/13339#discussion_r1608103780
##
lucene/core/src/java/org/apache/lucene/codecs/hnsw/FlatVectorScorerUtil.java:
##
@@ -35,6 +35,6 @@ private FlatVectorScorerUtil() {}
* on certain platforms
jpountz commented on PR #13399:
URL: https://github.com/apache/lucene/pull/13399#issuecomment-2122338445
Does it actually improve readability? I know some Java projects like to be
very granular in how they organize packages, but I've come to like Lucene's
relatively flat package structure,
uschindler commented on code in PR #13339:
URL: https://github.com/apache/lucene/pull/13339#discussion_r1608084373
##
lucene/core/src/java/org/apache/lucene/codecs/hnsw/FlatVectorScorerUtil.java:
##
@@ -35,6 +35,6 @@ private FlatVectorScorerUtil() {}
* on certain platforms.
ChrisHegarty commented on code in PR #13339:
URL: https://github.com/apache/lucene/pull/13339#discussion_r1608072908
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java:
##
@@ -73,4 +75,9 @@ private static T doPrivileged(Privilege
ChrisHegarty commented on code in PR #13339:
URL: https://github.com/apache/lucene/pull/13339#discussion_r1608070916
##
lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java:
##
@@ -91,6 +92,8 @@ public static VectorizationProvider getInstance(
alessandrobenedetti opened a new pull request, #13399:
URL: https://github.com/apache/lucene/pull/13399
### Description
The code for vector formats in the core codec package has grown steadily,
hurting readability and maintainability.
My main concerns are around duplicate
jpountz closed issue #13396: Test failure in TestBlockMaxConjunction.testRandom.
URL: https://github.com/apache/lucene/issues/13396
jpountz merged PR #13397:
URL: https://github.com/apache/lucene/pull/13397
jpountz closed issue #13371: Reproducible failure
org.apache.lucene.search.TestBlockMaxConjunction
URL: https://github.com/apache/lucene/issues/13371
jpountz commented on PR #13221:
URL: https://github.com/apache/lucene/pull/13221#issuecomment-219842
> I am so excited to see if this (nightly benchmarks auto-regolding) finally
works
Looks like it worked!
https://people.apache.org/~mikemccand/lucenebench/TermDayOfYearSort.html
uschindler commented on code in PR #13339:
URL: https://github.com/apache/lucene/pull/13339#discussion_r1607992706
##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/PanamaVectorizationProvider.java:
##
@@ -73,4 +75,9 @@ private static T doPrivileged(PrivilegedA
vsop-479 commented on PR #13395:
URL: https://github.com/apache/lucene/pull/13395#issuecomment-2122131720
@mikemccand
Please take a look when you get a chance.
vsop-479 commented on PR #13398:
URL: https://github.com/apache/lucene/pull/13398#issuecomment-2122130138
@mikemccand
Please take a look when you get a chance.
vsop-479 opened a new pull request, #13398:
URL: https://github.com/apache/lucene/pull/13398
### Description
When a segment is already fully deleted by prior `delTerms` or `delQueries`,
in `FrozenBufferedUpdates.applyQueryDeletes` and
`FrozenBufferedUpdates.applyTermDeletes`.
We c
uschindler commented on PR #13389:
URL: https://github.com/apache/lucene/pull/13389#issuecomment-2122073560
Thanks!
romseygeek closed issue #13388: Add method to `Intervals#noIntervals(String
reason)` to `Intervals` class
URL: https://github.com/apache/lucene/issues/13388
romseygeek merged PR #13389:
URL: https://github.com/apache/lucene/pull/13389
jpountz opened a new pull request, #13397:
URL: https://github.com/apache/lucene/pull/13397
It sums up max scores in a float when it should sum them up in a double like
we do for `Scorer#score()`. Otherwise, max scores may be returned that are less
than actual scores.
This bug was in
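For readers following along, here is a minimal sketch of the kind of rounding problem described in this PR. It is not the Lucene code: the class `MaxScoreSum` and both helper methods are hypothetical names chosen for illustration, and the input values are picked only to make the effect visible. It shows that accumulating per-clause scores in a `float` can yield a smaller total than accumulating in a `double` and casting at the end, which is how an upper bound could end up below an actual score.

```java
public class MaxScoreSum {

  // Hypothetical helper: accumulates in float, so low-order bits can be
  // lost on every addition once the running sum gets large.
  public static float sumInFloat(float[] scores) {
    float sum = 0f;
    for (float s : scores) {
      sum += s;
    }
    return sum;
  }

  // Hypothetical helper: accumulates in double and rounds to float once,
  // at the very end.
  public static float sumInDouble(float[] scores) {
    double sum = 0d;
    for (float s : scores) {
      sum += s;
    }
    return (float) sum;
  }

  public static void main(String[] args) {
    // 2^24 is the point where float can no longer represent every integer,
    // so adding 1f to it is rounded away.
    float[] scores = {16777216f, 1f, 1f};
    float asFloat = sumInFloat(scores);
    float asDouble = sumInDouble(scores);
    // The float-accumulated "max score" is below the double-accumulated score.
    System.out.println(asFloat < asDouble); // prints "true"
  }
}
```

The inputs are deliberately extreme. With realistic scores the gap would be far smaller, but any gap in that direction is enough to make an upper bound invalid.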
jpountz commented on issue #13396:
URL: https://github.com/apache/lucene/issues/13396#issuecomment-2121985101
Thanks, I had started looking into #13371 but this one was easier to debug
and I could figure out the problem. I'll open a PR shortly.
jpountz merged PR #13381:
URL: https://github.com/apache/lucene/pull/13381