gf2121 commented on PR #12632:
URL: https://github.com/apache/lucene/pull/12632#issuecomment-1770155713
> @gf2121 i think we could diagnose it further with
> https://github.com/travisdowns/avx-turbo
Thanks @rmuir for the profiling guide!
Sorry for the delay. It took me some time to app
dungba88 commented on code in PR #12691:
URL: https://github.com/apache/lucene/pull/12691#discussion_r1364843168
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -552,14 +527,11 @@ public void save(DataOutput metaOut, DataOutput out)
throws IOException {
dweiss commented on issue #12697:
URL: https://github.com/apache/lucene/issues/12697#issuecomment-1770065868
Thank you for cleaning up these classes - it's a big improvement!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and
gf2121 commented on code in PR #12692:
URL: https://github.com/apache/lucene/pull/12692#discussion_r1364900678
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc
follow, Arc arc, BytesRe
}
gf2121 closed pull request #12661: Optimize outputs accumulating for
SegmentTermsEnum and IntersectTermsEnum
URL: https://github.com/apache/lucene/pull/12661
gf2121 commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1769976958
The direction has changed and the conversation is too long to track, so I
raised https://github.com/apache/lucene/pull/12699 to summarize it.
gf2121 opened a new pull request, #12699:
URL: https://github.com/apache/lucene/pull/12699
## Description
The previous discussion is too long to track, so I opened a new PR to summarize
it here. More details are available in https://github.com/apache/lucene/pull/12661.
After merging of http
dungba88 opened a new pull request, #12698:
URL: https://github.com/apache/lucene/pull/12698
### Description
Fix for #12697. This will move the writing of numBytes from
`OnHeapFSTStore.writeTo()` to `FST.save()`.
`OnHeapFSTStore.size()` will also be modified to return only the
dungba88 opened a new issue, #12697:
URL: https://github.com/apache/lucene/issues/12697
### Description
After writing the FSTStore-backed FST to DataOutput, and specifying a
different DataOutput for meta, if we try to read from these (using the FST
public ctor) we will get the follow
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769299867
Here is the gist of my benchmark:
https://gist.github.com/kaivalnp/79808017ed7666214540213d1e2a21cf
I'm calculating the baseline / individual results as "count of vectors above
t
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769235109
Thanks for running this @benwtrent!
I just had a couple of questions:
1. What was your baseline in the test? If the baseline / goal is to "get the
K-Nearest Neighbors", then th
benwtrent commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769218762
@kaivalnp I see the issue with my test, you are specifically testing
"post-filtering" on the top values, not just getting the top10 k. I understand
my issue.
Could you post your
benwtrent commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769123216
OK, I tried testing with KnnGraphTester.
I indexed 100_000 normalized Cohere vectors (768 dims).
With regular knn, recall@10:
```
recall latency nDocfa
gsmiller commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1769095158
These results are really interesting! As another option, I wonder if it's
worth thinking about this problem as a new codec (sandbox module to start?)
that biases towards query spee
slow-J opened a new issue, #12696:
URL: https://github.com/apache/lucene/issues/12696
### Description
Background: In https://github.com/Tony-X/search-benchmark-game we were
comparing performance of Tantivy and Lucene. "One difference between Lucene and
Tantivy is Lucene uses the "pat
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768958493
Sorry for the confusion, I tried renaming the branch from
`radius-based-vector-search` to `similarity-based-vector-search` and the PR
closed automatically. I guess I'm stuck with this b
kaivalnp closed pull request #12679: Add support for similarity-based vector
searches
URL: https://github.com/apache/lucene/pull/12679
mikemccand commented on code in PR #12653:
URL: https://github.com/apache/lucene/pull/12653#discussion_r1364204027
##
lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java:
##
@@ -63,24 +63,23 @@ public abstract class MultiLevelSkipListWriter {
/** for e
mikemccand commented on code in PR #12653:
URL: https://github.com/apache/lucene/pull/12653#discussion_r1364203049
##
lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java:
##
@@ -63,6 +63,9 @@ public abstract class MultiLevelSkipListWriter {
/** for eve
benwtrent commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1768925721
@uschindler I switched it around. Lucene95Codec is back, moved the previous
HNSW format into backwards_codec and switched Lucene95Codec to use the
Lucene99Hnsw format.
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1364177301
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
jmazanec15 commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362956122
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768862106
> the Collector is full by flagging "incomplete" (I think this is possible)
> once a threshold is reached
Do you mean that we return incomplete results?
Instead, maybe we can
mikemccand commented on code in PR #12692:
URL: https://github.com/apache/lucene/pull/12692#discussion_r1364130745
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc
follow, Arc arc, BytesRe
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1768776163
OK I think this is ready -- I removed/downgraded all `nocommit`s, added
CHANGES entry, rebased to latest `main`. Tests and precommit passed for me (at
least once).
I set the d
s1monw commented on PR #12685:
URL: https://github.com/apache/lucene/pull/12685#issuecomment-1768768579
@jpountz I pushed new commits; want to take another look?
benwtrent commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768737424
The results:
https://github.com/apache/lucene/pull/12679#issuecomment-1766995337
Are astounding! I will try and replicate with Lucene Util.
The numbers seem almost too goo
benwtrent commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768734871
> Something like a lazy-loading iterator, where we perform vector
> comparisons and determine whether a doc matches on #advance?
I think @kaivalnp the thing to do would be to say
mikemccand commented on PR #12625:
URL: https://github.com/apache/lucene/pull/12625#issuecomment-1768710355
> What do you think of making a class like `ByteSlicePool` to separate
> concerns from other `TermsHashPerField` functionality?
This sounds compelling to me. The byte slicing/inte
shubhamvishu closed pull request #12671: Move private static classes or
functions out of DoubleValuesSource
URL: https://github.com/apache/lucene/pull/12671
shubhamvishu commented on PR #12671:
URL: https://github.com/apache/lucene/pull/12671#issuecomment-1768664812
Thanks @gsmiller and @msokolov for sharing your views. I agree with your
points and at this point this refactoring doesn't seem to provide much value,
so I think we should just leav
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1364013877
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
msokolov commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1364007944
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
jpountz commented on PR #12670:
URL: https://github.com/apache/lucene/pull/12670#issuecomment-1768596280
This yielded a noticeable speedup on
[`OrHighHigh`](http://people.apache.org/~mikemccand/lucenebench/OrHighHigh.html)
and
[`OrHighMed`](http://people.apache.org/~mikemccand/lucenebench/
msokolov merged PR #12693:
URL: https://github.com/apache/lucene/pull/12693
msokolov closed issue #12666: normalize() override provided in Simple example
in Analyzer class doc is missing String fieldName parameter
URL: https://github.com/apache/lucene/issues/12666
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1363939444
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
benwtrent opened a new pull request, #12694:
URL: https://github.com/apache/lucene/pull/12694
{DRAFT}
After finalizing work and merging:
https://github.com/apache/lucene/pull/12582
Investigating whether unsigned vector operations should be added.
Quantizing within `[0-255]`
gf2121 commented on code in PR #12692:
URL: https://github.com/apache/lucene/pull/12692#discussion_r1363924060
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc
follow, Arc arc, BytesRe
}
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1363921374
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java:
##
@@ -0,0 +1,782 @@
+/*
+ * Licensed to the Apache Software Fou
mikemccand commented on code in PR #12692:
URL: https://github.com/apache/lucene/pull/12692#discussion_r1363842558
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc
follow, Arc arc, BytesRe
javanna commented on PR #12260:
URL: https://github.com/apache/lucene/pull/12260#issuecomment-1768403778
> I'm able to cherrypick this fix into branch_9_4, but I'm not sure if
> there'll be release 9.4.2 ever.
Indeed, there won't be.
mikemccand commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1363823134
##
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##
@@ -17,79 +17,177 @@
package org.apache.lucene.util.fst;
import java.io.IOException;
-impor
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1768373972
OK, I switched the accounting to approximate RAM usage of the `NodeHash`,
which is more intuitive for users. It behaves monotonically / smoothly: as you
give more RAM for the suffixes
scampi opened a new pull request, #12693:
URL: https://github.com/apache/lucene/pull/12693
Close #12666
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1363802913
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
msokolov commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1363796956
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
msokolov commented on issue #12627:
URL: https://github.com/apache/lucene/issues/12627#issuecomment-1768325379
I'm thinking of something like this: first build a fully-connected graph (no
link removal, no enforcement of maxconns). Then apply a global pruning
algorithm of some sort that atte
msokolov commented on issue #12627:
URL: https://github.com/apache/lucene/issues/12627#issuecomment-1768315395
Do you have any results on how connectivity varies with maxconn? I found [this
article](https://qdrant.tech/articles/filtrable-hnsw/) that talks about
connectivity when filtering; anyway
gf2121 commented on PR #12692:
URL: https://github.com/apache/lucene/pull/12692#issuecomment-1768310698
Result on wikimediumall (no obvious change):
```
TaskQPS baseline StdDevQPS
my_modified_version StdDevPct diff p-value
msokolov commented on PR #12671:
URL: https://github.com/apache/lucene/pull/12671#issuecomment-1768298204
I think I share Greg's reluctance to make this big-looking change. Sorry, I
think I was the instigator with the comment you referenced above. I confess I
have always been confused that
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1363725046
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
gf2121 commented on PR #12557:
URL: https://github.com/apache/lucene/pull/12557#issuecomment-1768169254
Thanks @easyice !
Since we expose APIs like `softUpdateDocument(Term term, Iterable<? extends IndexableField> doc, Field... softDeletes)`, we cannot guarantee that
all soft delete fields have the same valu
javanna commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1363627307
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,64 +67,124 @@ public final class TaskExecutor {
* @param <T> the return type of the task
javanna commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1363626228
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,64 +67,124 @@ public final class TaskExecutor {
* @param <T> the return type of the task
easyice commented on PR #12557:
URL: https://github.com/apache/lucene/pull/12557#issuecomment-1768092984
I got a great suggestion from @gf2121: we can tell whether the doc values have a
single value from `ReadersAndUpdates#onDiskDocValues` and
`ReadersAndUpdates#updatesToApply`; this can be avoi
gf2121 opened a new pull request, #12692:
URL: https://github.com/apache/lucene/pull/12692
### Description
This PR resolves a TODO left in `FST`: we construct some useless objects
during linear search. It is also an effort to avoid `Outputs#read`,
as we have more outp
jpountz commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1363535420
##
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoin.java:
##
@@ -113,6 +113,7 @@ public void testEmptyChildFilter() throws Exception {
final Direc
jpountz commented on PR #12589:
URL: https://github.com/apache/lucene/pull/12589#issuecomment-1768020392
Updated luceneutil results using wikibigall:
```
TaskQPS baseline StdDevQPS
my_modified_version StdDevPct diff p-value
jpountz commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1363513095
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,64 +67,124 @@ public final class TaskExecutor {
* @param <T> the return type of the task
dungba88 commented on code in PR #12691:
URL: https://github.com/apache/lucene/pull/12691#discussion_r1363511529
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -552,14 +527,11 @@ public void save(DataOutput metaOut, DataOutput out)
throws IOException {
dungba88 opened a new pull request, #12691:
URL: https://github.com/apache/lucene/pull/12691
### Description
This change will make the transition to off-heap FST writing easier. The
idea is to only build the FST after the end of the process (when calling
`compile()`).
- Conso
jpountz commented on PR #12589:
URL: https://github.com/apache/lucene/pull/12589#issuecomment-1767952830
I moved the optimization into the partitioning logic so that it's
easier to test. It's ready for review.
gf2121 commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1363431399
##
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoin.java:
##
@@ -113,6 +113,7 @@ public void testEmptyChildFilter() throws Exception {
final Direct