gf2121 commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1765808689
Hi @jpountz, thanks a lot for the suggestion!
> another option could be to encode the number of supplementary bytes using
unary coding (like UTF8).
This is a great idea that
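For context, "unary coding (like UTF8)" refers to marking the first byte with a run of leading 1-bits whose length tells the reader how many continuation bytes follow (0xxxxxxx means none, 110xxxxx means one, 1110xxxx means two, and so on). A minimal, hypothetical sketch of decoding that prefix — `numContinuationBytes` is an illustrative helper, not code from the PR:

```java
// Hypothetical sketch of a UTF-8-style unary length prefix: the number of
// leading 1-bits in the first byte encodes how many continuation bytes follow.
// Valid for well-formed prefixes (at most seven leading ones).
public class UnaryPrefix {
  // 0xxxxxxx -> 0 continuation bytes, 110xxxxx -> 1, 1110xxxx -> 2, ...
  public static int numContinuationBytes(byte first) {
    // Count the leading 1-bits of the byte by counting leading zeros of its
    // complement, shifted into the top 8 bits of an int.
    int leadingOnes = Integer.numberOfLeadingZeros((~first & 0xFF) << 24);
    return leadingOnes == 0 ? 0 : leadingOnes - 1;
  }

  public static void main(String[] args) {
    System.out.println(numContinuationBytes((byte) 0b01000001)); // single-byte form: 0
    System.out.println(numContinuationBytes((byte) 0b11000010)); // two-byte form: 1
    System.out.println(numContinuationBytes((byte) 0b11100010)); // three-byte form: 2
  }
}
```

The appeal of this scheme is that the length is recovered from the first byte alone, with one bit-count instruction and no extra length field.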
jpountz commented on code in PR #12682:
URL: https://github.com/apache/lucene/pull/12682#discussion_r1361646368
##
lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java:
##
@@ -266,7 +265,7 @@ public float score() throws IOException {
score += optScorer.score
jpountz commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1765890646
Oh, your explanation makes sense, and I agree with you that a more
efficient encoding would be unlikely to counterbalance the fact that more arcs
need to be read per output. So this loo
shubhamvishu commented on code in PR #12682:
URL: https://github.com/apache/lucene/pull/12682#discussion_r1361736510
##
lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java:
##
@@ -266,7 +265,7 @@ public float score() throws IOException {
score += optScorer.
shubhamvishu commented on code in PR #12682:
URL: https://github.com/apache/lucene/pull/12682#discussion_r1361737240
##
lucene/core/src/java/org/apache/lucene/search/similarities/TFIDFSimilarity.java:
##
@@ -504,9 +504,9 @@ public TFIDFScorer(float boost, Explanation idf, float[
jpountz commented on code in PR #12682:
URL: https://github.com/apache/lucene/pull/12682#discussion_r1361739665
##
lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java:
##
@@ -266,7 +265,7 @@ public float score() throws IOException {
score += optScorer.score
gf2121 commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1765964640
> I wonder if extending the Outputs class directly would help, instead of
storing data in an opaque byte[]?
Yes, the reuse is exactly what `Outputs` wants to do! (see this
[todo](
jpountz commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1765975335
If I read correctly, this query ends up calling
`LeafReader#searchNearestNeighbors` with k=Integer.MAX_VALUE, which will not
only run in O(maxDoc) time but also use O(maxDoc) memory. I d
shubhamvishu commented on PR #12682:
URL: https://github.com/apache/lucene/pull/12682#issuecomment-1765985160
Thanks @jpountz for the review! I have addressed the comments in the new
revision.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apac
shubhamvishu commented on code in PR #12682:
URL: https://github.com/apache/lucene/pull/12682#discussion_r1361773569
##
lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java:
##
@@ -266,7 +265,7 @@ public float score() throws IOException {
score += optScorer.
jpountz commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1361783707
##
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##
@@ -5144,20 +5145,71 @@ public int length() {
}
mergeReaders.add(wrappedReader);
jpountz commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1361793823
##
lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java:
##
@@ -468,7 +468,11 @@ public void checkIntegrity() throws IOException {
@Override
jpountz commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1361798124
##
lucene/core/src/java/org/apache/lucene/index/SlowCompositeCodecReaderWrapper.java:
##
@@ -0,0 +1,998 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
jpountz commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1361799042
##
lucene/core/src/java/org/apache/lucene/index/SlowCompositeCodecReaderWrapper.java:
##
@@ -0,0 +1,998 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
jpountz commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1361802385
##
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##
@@ -5144,20 +5145,71 @@ public int length() {
}
mergeReaders.add(wrappedReader);
mikemccand commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1361812236
##
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##
@@ -17,50 +17,80 @@
package org.apache.lucene.util.fst;
import java.io.IOException;
-import
gf2121 merged PR #12587:
URL: https://github.com/apache/lucene/pull/12587
mikemccand commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1361814551
##
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##
@@ -99,31 +87,23 @@ public class FSTCompiler {
* tuning and tweaking, see {@link Builder
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1766041651
> > With the PR, you unfortunately cannot easily say "give me a minimal FST
at all costs", like you can with main today. You'd have to keep trying larger
and larger NodeHash sizes unt
gf2121 commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1766048357
Hi @mikemccand, it would be great if you could take a look too :)
gf2121 merged PR #12652:
URL: https://github.com/apache/lucene/pull/12652
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1766082689
Thanks for the suggestions @dungba88! I took the approach you suggested,
with a few more pushed commits just now. Despite the increase in `nocommit`s I
think this is actually close!
gf2121 merged PR #12586:
URL: https://github.com/apache/lucene/pull/12586
jpountz merged PR #12672:
URL: https://github.com/apache/lucene/pull/12672
jpountz merged PR #12670:
URL: https://github.com/apache/lucene/pull/12670
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362016333
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
jpountz commented on PR #12668:
URL: https://github.com/apache/lucene/pull/12668#issuecomment-1766342540
Even though the speedup is less pronounced than in the above luceneutil run,
there seems to be an actual speedup in nightly benchmarks for boolean queries.
E.g. the last 3 data points of
tomsquest commented on issue #11326:
URL: https://github.com/apache/lucene/issues/11326#issuecomment-1766389365
We hit this issue too, and not only for numbers. Actually, any token
ending in `1` will be stemmed!
```
GET _analyze
{
"tokenizer": "standard",
"filt
jpountz commented on PR #12602:
URL: https://github.com/apache/lucene/pull/12602#issuecomment-1766428044
I would be surprised if this change yielded a noticeable speedup. Does
it?
mayya-sharipova commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362208743
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java:
##
@@ -0,0 +1,1149 @@
+/*
+ * Licensed to the Apache Software Foundat
msokolov commented on code in PR #12683:
URL: https://github.com/apache/lucene/pull/12683#discussion_r1362245604
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java:
##
@@ -59,11 +60,26 @@ protected HnswGraph() {}
*
* @param level level of the graph
*
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766741617
> If I read correctly, this query ends up calling
LeafReader#searchNearestNeighbors with k=Integer.MAX_VALUE
No, we're calling the [new
API](https://github.com/apache/lucene/blob
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362432970
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362437535
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
benwtrent commented on code in PR #12657:
URL: https://github.com/apache/lucene/pull/12657#discussion_r1362456237
##
lucene/core/src/java/org/apache/lucene/util/hnsw/InitializedHnswGraphBuilder.java:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) un
benwtrent commented on code in PR #12657:
URL: https://github.com/apache/lucene/pull/12657#discussion_r1362457390
##
lucene/core/src/java/org/apache/lucene/util/hnsw/IncrementalHnswGraphMerger.java:
##
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) un
jpountz commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766795903
Thanks for explaining; I had indeed overlooked how the `Integer.MAX_VALUE` was used.
I'm still interested in figuring out whether we can have stronger guarantees
on the worst-case memory usag
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1362472568
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1362475149
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1362474206
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1362476143
##
lucene/core/src/java/org/apache/lucene/search/RnnCollector.java:
##
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ *
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766834111
Thanks for the review @shubhamvishu! Addressed some of the comments above
> Is it right to call it a radius-based search here?
I think of it as finding all results within a
mingshl commented on PR #12260:
URL: https://github.com/apache/lucene/pull/12260#issuecomment-1766881156
Thank you! @mkhludnev
benwtrent merged PR #12657:
URL: https://github.com/apache/lucene/pull/12657
gf2121 commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1766958252
An idea occurred to me: maybe we do not really need to combine all these
`BytesRef`s into a single `BytesRef`; we can just build a `DataInput` over these
`BytesRef`s to read. Luckily, o
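The idea of reading sequentially across several byte chunks without first copying them into one array can be sketched in plain Java. This is a hypothetical `MultiChunkInput`, not Lucene's actual `DataInput` or `BytesRef` classes; it just illustrates the access pattern:

```java
import java.util.List;

// Hypothetical sketch: sequential byte reads over a list of chunks, so the
// chunks never need to be concatenated into a single byte[].
public class MultiChunkInput {
  private final List<byte[]> chunks;
  private int chunkIndex = 0; // which chunk we are currently reading
  private int pos = 0;        // position within the current chunk

  public MultiChunkInput(List<byte[]> chunks) {
    this.chunks = chunks;
  }

  public byte readByte() {
    // Advance past exhausted (or empty) chunks before reading.
    while (pos == chunks.get(chunkIndex).length) {
      chunkIndex++;
      pos = 0;
    }
    return chunks.get(chunkIndex)[pos++];
  }

  public static void main(String[] args) {
    MultiChunkInput in = new MultiChunkInput(List.of(new byte[] {1, 2}, new byte[] {3}));
    System.out.println(in.readByte() + "," + in.readByte() + "," + in.readByte()); // 1,2,3
  }
}
```

The trade-off is an extra bounds check per read in exchange for avoiding the allocation and copy of a combined buffer.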
javanna opened a new pull request, #12689:
URL: https://github.com/apache/lucene/pull/12689
When operations are parallelized, like query rewrite, or search, or
createWeight, one of the tasks may throw an exception. In that case we
wait for all tasks to be completed before re-throwing th
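The "wait for every task before rethrowing" pattern described above can be sketched as follows. This is a simplified, hypothetical stand-in for Lucene's `TaskExecutor` (tasks run sequentially here, and `invokeAll` is an illustrative name): the first failure is remembered, later failures are attached as suppressed exceptions, and nothing is thrown until all tasks have finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch (not Lucene's actual TaskExecutor): run every task,
// remember the first failure, and only rethrow once all tasks have completed.
public class WaitAllThenThrow {
  public static <T> List<T> invokeAll(List<Supplier<T>> tasks) {
    List<T> results = new ArrayList<>();
    RuntimeException firstFailure = null;
    for (Supplier<T> task : tasks) {
      try {
        results.add(task.get());
      } catch (RuntimeException e) {
        if (firstFailure == null) {
          firstFailure = e; // later tasks still run before we rethrow
        } else {
          firstFailure.addSuppressed(e);
        }
      }
    }
    if (firstFailure != null) {
      throw firstFailure;
    }
    return results;
  }

  public static void main(String[] args) {
    List<Supplier<Integer>> tasks = List.of(() -> 1, () -> 2);
    System.out.println(invokeAll(tasks)); // [1, 2]
  }
}
```

Deferring the throw matters when tasks share resources: rethrowing immediately could leave still-running siblings touching state that the caller is about to clean up.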
benwtrent commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766983182
> I think of it as finding all results within a high-dimensional circle /
sphere / equivalent,
dot-product, cosine, etc. don't really follow that same idea as you point
out. I w
javanna commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1362620375
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,64 +67,124 @@ public final class TaskExecutor {
* @param the return type of the task
javanna commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1362621063
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,64 +67,124 @@ public final class TaskExecutor {
* @param the return type of the task
javanna commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1362621950
##
lucene/core/src/test/org/apache/lucene/search/TestTaskExecutor.java:
##
@@ -43,7 +47,8 @@ public class TestTaskExecutor extends LuceneTestCase {
public static vo
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766995337
### Benchmarks
Using the vector file from
https://home.apache.org/~sokolov/enwiki-20120502-lines-1k-100d.vec (enwiki
dataset, unit vectors, 100 dimensions)
The setup was 1
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1767022898
> stronger guarantees on the worst-case memory usage
Totally agreed @jpountz! It is very easy to go wrong in the new API,
especially if the user passes a low threshold (high radius
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362661760
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java:
##
@@ -0,0 +1,1149 @@
+/*
+ * Licensed to the Apache Software Foundation (A
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362664506
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java:
##
@@ -0,0 +1,782 @@
+/*
+ * Licensed to the Apache Software Fou
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362665321
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java:
##
@@ -0,0 +1,782 @@
+/*
+ * Licensed to the Apache Software Fou
kaivalnp commented on issue #12579:
URL: https://github.com/apache/lucene/issues/12579#issuecomment-1767112899
> one other thing to think about is
https://weaviate.io/blog/weaviate-1-20-release#autocut
Interesting! They [seem
to](https://github.com/weaviate/weaviate/blob/c382dcbe6ff0
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362725464
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
gsmiller commented on PR #12671:
URL: https://github.com/apache/lucene/pull/12671#issuecomment-1767291109
Thanks for your further thoughts @shubhamvishu. Getting more opinions is
always good, and like I said, I don't feel strongly enough about this change to
block moving forward with it or
dungba88 opened a new pull request, #12690:
URL: https://github.com/apache/lucene/pull/12690
### Description
Follow-up of https://github.com/apache/lucene/pull/12646. NodeHash still
depends on both FSTCompiler and FST. With the current method signature, one can
create the NodeHash wi
dungba88 commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1363098628
##
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##
@@ -17,79 +17,177 @@
package org.apache.lucene.util.fst;
import java.io.IOException;
-import
nitirajrathore commented on issue #12627:
URL: https://github.com/apache/lucene/issues/12627#issuecomment-1767662289
I was able to run tests on the wiki dataset using the luceneutil package. The
[results show](https://github.com/mikemccand/luceneutil/pull/236) that even
with a single segment
iverase merged PR #12625:
URL: https://github.com/apache/lucene/pull/12625
gf2121 commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1767756956
> So this looks like a hard search/space trade-off: we either get fast reads
or good compression but we can't get both?
IMO theoretically yes. We ignored some potential optimization
gf2121 commented on code in PR #12661:
URL: https://github.com/apache/lucene/pull/12661#discussion_r1363317643
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java:
##
@@ -118,13 +118,11 @@ long readVLongOutput(DataInput in) throws IOException {