s1monw commented on code in PR #12685:
URL: https://github.com/apache/lucene/pull/12685#discussion_r1360837043
##
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##
@@ -3368,9 +3368,15 @@ public void addIndexesReaderMerge(MergePolicy.OneMerge
merge) throws IOExce
dungba88 commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1360866889
##
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##
@@ -99,31 +87,23 @@ public class FSTCompiler {
* tuning and tweaking, see {@link Builder}.
dungba88 commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1360875178
##
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##
@@ -17,50 +17,80 @@
package org.apache.lucene.util.fst;
import java.io.IOException;
-import o
msfroh commented on code in PR #12626:
URL: https://github.com/apache/lucene/pull/12626#discussion_r1360878505
##
lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java:
##
@@ -1996,6 +1996,41 @@ public void testGetCommitData() throws Exception {
dir.close();
msfroh commented on code in PR #12626:
URL: https://github.com/apache/lucene/pull/12626#discussion_r1360880697
##
lucene/core/src/test/org/apache/lucene/index/TestIndexWriter.java:
##
@@ -1996,6 +1996,41 @@ public void testGetCommitData() throws Exception {
dir.close();
gf2121 commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1764814636
I made some effort to speed up the `add` operation for `BytesRef`, getting a
tiny improvement:
> Baseline: after https://github.com/apache/lucene/pull/12631; Candidate:
this patch;
harshavamsi opened a new issue, #12686:
URL: https://github.com/apache/lucene/issues/12686
### Description
While working with the `IndexOrDocValuesQuery`, I noticed that highlighting
was broken. This is potentially caused by the extract function that does not
check if the query is in
dweiss opened a new pull request, #12687:
URL: https://github.com/apache/lucene/pull/12687
Estimates taken from empirical run times (actions history), with a generous
buffer added.
benwtrent commented on code in PR #12657:
URL: https://github.com/apache/lucene/pull/12657#discussion_r136233
##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java:
##
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (A
benwtrent commented on code in PR #12657:
URL: https://github.com/apache/lucene/pull/12657#discussion_r1361112199
##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/IncrementalHnswGraphMerger.java:
##
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (A
mingshl commented on PR #12260:
URL: https://github.com/apache/lucene/pull/12260#issuecomment-1765091198
@romseygeek @mkhludnev, this bug was introduced in version 9.4; can this
PR be back-ported to 9.4.2 to fix the issue?
jmazanec15 commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1765145453
Hey @benwtrent, sorry for delay, still looking through change. But 4x space
improvement with minimal recall loss is awesome.
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1361186978
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
jmazanec15 commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1357440025
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
jpountz commented on code in PR #12685:
URL: https://github.com/apache/lucene/pull/12685#discussion_r1361188738
##
lucene/core/src/java/org/apache/lucene/index/SegmentInfo.java:
##
@@ -153,6 +157,16 @@ public boolean getUseCompoundFile() {
return isCompoundFile;
}
+ /
jpountz commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1765186395
If we're specializing the format anyway, I wonder if we could try different
layouts. E.g. another option could be to encode the number of supplementary
bytes using unary coding (like UTF
zhaih merged PR #12651:
URL: https://github.com/apache/lucene/pull/12651
uschindler commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1765287960
Hi, why do we need a new Codec? The Lucene main file format does not change,
only the HNSW format was exchanged. Because like postings formats and
docvalues formats, the SPI can detect
benwtrent commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1765316000
@uschindler so I should just add a new format?
It would be a new Lucene99 HNSW format, but keep the default Lucene95 HNSW
format?
Or can we change the default vector form
jmazanec15 commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1361261168
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java:
##
@@ -0,0 +1,782 @@
+/*
+ * Licensed to the Apache Software Fo
jimczi commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1765330759
> why do we need a new top-level Codec? The Lucene main file format does
not change, only the HNSW format was exchanged. Because like postings formats
and docvalues formats, the SPI can d
Tony-X opened a new pull request, #12688:
URL: https://github.com/apache/lucene/pull/12688
### Description
Related issue https://github.com/apache/lucene/issues/12513
Opening this PR early to avoid massive diffs in one-shot
- [x] Encode (term type, local ord) in FST
T
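(As a rough illustration of what packing (term type, local ord) into a single
FST long output could look like; the bit width and names below are
assumptions, not the PR's actual layout.)
```java
// Hypothetical packing sketch: a small term-type tag in the low bits, the
// local ordinal in the remaining bits. FST long outputs must stay non-negative.
final class TermTypeOrd {
  static final int TYPE_BITS = 3; // assumes at most 8 term types

  static long encode(int termType, long localOrd) {
    assert termType >= 0 && termType < (1 << TYPE_BITS);
    assert localOrd >= 0 && localOrd <= (Long.MAX_VALUE >>> TYPE_BITS);
    return (localOrd << TYPE_BITS) | termType;
  }

  static int termType(long packed) {
    return (int) (packed & ((1 << TYPE_BITS) - 1));
  }

  static long localOrd(long packed) {
    return packed >>> TYPE_BITS;
  }
}
```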
uschindler commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1765355363
> > why do we need a new top-level Codec? The Lucene main file format does
not change, only the HNSW format was exchanged. Because like postings formats
and docvalues formats, the SPI
uschindler commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1765362790
> @uschindler so I should just add a new format?
>
> It would be a new Lucene99 HNSW format, but keep the default Lucene95 HNSW
format?
>
> Or can we change the default v
uschindler commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1765382016
I just checked the code; the 9.5 top-level codec addition was useless, just
code duplication. We can't revert it anymore, but we should not repeat that.
The only required top-level Fo
sohami commented on code in PR #12606:
URL: https://github.com/apache/lucene/pull/12606#discussion_r1361333154
##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -420,13 +418,12 @@ public int count(Query query) throws IOException {
}
/**
- * Ret
zhaih commented on code in PR #12683:
URL: https://github.com/apache/lucene/pull/12683#discussion_r1361334258
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java:
##
@@ -59,11 +60,26 @@ protected HnswGraph() {}
*
* @param level level of the graph
* @pa
uschindler commented on PR #12582:
URL: https://github.com/apache/lucene/pull/12582#issuecomment-1765386547
The simplest change is:
- Remove Lucene99Codec
- In Lucene95Codec just change this: `this.defaultKnnVectorsFormat = new
Lucene95HnswVectorsFormat();` to the new format.
Do
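(A minimal sketch of the pattern being described, using the standard per-field
override hook; the SPI name "Lucene99HnswScalarQuantizedVectors" is a
placeholder, since the new format's final name was still under discussion.)
```java
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene95.Lucene95Codec;

// Sketch: no new top-level codec is needed; override the per-field hook on
// the existing codec (or change its default) to pick up a new SPI-loaded
// vectors format.
public class QuantizedVectorsCodec extends Lucene95Codec {
  private final KnnVectorsFormat format =
      KnnVectorsFormat.forName("Lucene99HnswScalarQuantizedVectors"); // hypothetical SPI name

  @Override
  public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
    return format;
  }
}
```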
zhaih commented on code in PR #12657:
URL: https://github.com/apache/lucene/pull/12657#discussion_r1361341835
##
lucene/core/src/java/org/apache/lucene/util/hnsw/IncrementalHnswGraphMerger.java:
##
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
msfroh commented on issue #12032:
URL: https://github.com/apache/lucene/issues/12032#issuecomment-1765587096
I started to work on making DrillSidewaysScorer work on windows of doc IDs,
when I noticed the following comment added in TestDrillSideways as part of
https://github.com/apache/lucen
dweiss merged PR #12687:
URL: https://github.com/apache/lucene/pull/12687
mkhludnev commented on PR #12260:
URL: https://github.com/apache/lucene/pull/12260#issuecomment-1765741713
Hi, @mingshl
I'm able to cherry-pick this fix into branch_9_4, but I'm not sure there
will ever be a 9.4.2 release.
gf2121 commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1765808689
Hi @jpountz, thanks a lot for the suggestion!
> another option could be to encode the number of supplementary bytes using
unary coding (like UTF8).
This is a great idea that
jpountz commented on code in PR #12682:
URL: https://github.com/apache/lucene/pull/12682#discussion_r1361646368
##
lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java:
##
@@ -266,7 +265,7 @@ public float score() throws IOException {
score += optScorer.score
jpountz commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1765890646
Oh your explanation makes sense, and I agree with you that a more
efficient encoding would be unlikely to counterbalance the fact that more arcs
need to be read per output. So this loo
shubhamvishu commented on code in PR #12682:
URL: https://github.com/apache/lucene/pull/12682#discussion_r1361736510
##
lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java:
##
@@ -266,7 +265,7 @@ public float score() throws IOException {
score += optScorer.
shubhamvishu commented on code in PR #12682:
URL: https://github.com/apache/lucene/pull/12682#discussion_r1361737240
##
lucene/core/src/java/org/apache/lucene/search/similarities/TFIDFSimilarity.java:
##
@@ -504,9 +504,9 @@ public TFIDFScorer(float boost, Explanation idf, float[
jpountz commented on code in PR #12682:
URL: https://github.com/apache/lucene/pull/12682#discussion_r1361739665
##
lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java:
##
@@ -266,7 +265,7 @@ public float score() throws IOException {
score += optScorer.score
gf2121 commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1765964640
> I wonder if extending the Outputs class directly would help, instead of
storing data in an opaque byte[]?
Yes, the reuse is exactly what `Outputs` wants to do! (see this
[todo](
jpountz commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1765975335
If I read correctly, this query ends up calling
`LeafReader#searchNearestNeighbors` with k=Integer.MAX_VALUE, which will not
only run in O(maxDoc) time but also use O(maxDoc) memory. I d
shubhamvishu commented on PR #12682:
URL: https://github.com/apache/lucene/pull/12682#issuecomment-1765985160
Thanks @jpountz for the review! I have addressed the comments in the new
revision.
shubhamvishu commented on code in PR #12682:
URL: https://github.com/apache/lucene/pull/12682#discussion_r1361773569
##
lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java:
##
@@ -266,7 +265,7 @@ public float score() throws IOException {
score += optScorer.
jpountz commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1361783707
##
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##
@@ -5144,20 +5145,71 @@ public int length() {
}
mergeReaders.add(wrappedReader);
jpountz commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1361793823
##
lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java:
##
@@ -468,7 +468,11 @@ public void checkIntegrity() throws IOException {
@Override
jpountz commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1361798124
##
lucene/core/src/java/org/apache/lucene/index/SlowCompositeCodecReaderWrapper.java:
##
@@ -0,0 +1,998 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
jpountz commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1361799042
##
lucene/core/src/java/org/apache/lucene/index/SlowCompositeCodecReaderWrapper.java:
##
@@ -0,0 +1,998 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
jpountz commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1361802385
##
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##
@@ -5144,20 +5145,71 @@ public int length() {
}
mergeReaders.add(wrappedReader);
mikemccand commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1361812236
##
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##
@@ -17,50 +17,80 @@
package org.apache.lucene.util.fst;
import java.io.IOException;
-import
gf2121 merged PR #12587:
URL: https://github.com/apache/lucene/pull/12587
mikemccand commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1361814551
##
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##
@@ -99,31 +87,23 @@ public class FSTCompiler {
* tuning and tweaking, see {@link Builder
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1766041651
> > With the PR, you unfortunately cannot easily say "give me a minimal FST
at all costs", like you can with main today. You'd have to keep trying larger
and larger NodeHash sizes unt
gf2121 commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1766048357
Hi @mikemccand, it would be great if you could take a look too :)
gf2121 merged PR #12652:
URL: https://github.com/apache/lucene/pull/12652
mikemccand commented on PR #12633:
URL: https://github.com/apache/lucene/pull/12633#issuecomment-1766082689
Thanks for the suggestions @dungba88! I took the approach you suggested,
with a few more pushed commits just now. Despite the increase in `nocommit`s I
think this is actually close!
gf2121 merged PR #12586:
URL: https://github.com/apache/lucene/pull/12586
jpountz merged PR #12672:
URL: https://github.com/apache/lucene/pull/12672
jpountz merged PR #12670:
URL: https://github.com/apache/lucene/pull/12670
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362016333
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
jpountz commented on PR #12668:
URL: https://github.com/apache/lucene/pull/12668#issuecomment-1766342540
Even though the speedup is less pronounced than in the above luceneutil run,
there seems to be an actual speedup in nightly benchmarks for boolean queries.
E.g. the last 3 data points of
tomsquest commented on issue #11326:
URL: https://github.com/apache/lucene/issues/11326#issuecomment-1766389365
We ran into this issue as well, and not only with numbers. Actually, any
token ending in `1` will be stemmed!
```
GET _analyze
{
"tokenizer": "standard",
"filt
jpountz commented on PR #12602:
URL: https://github.com/apache/lucene/pull/12602#issuecomment-1766428044
I would be surprised if this change yielded a noticeable speedup. Does
it?
mayya-sharipova commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362208743
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java:
##
@@ -0,0 +1,1149 @@
+/*
+ * Licensed to the Apache Software Foundat
msokolov commented on code in PR #12683:
URL: https://github.com/apache/lucene/pull/12683#discussion_r1362245604
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java:
##
@@ -59,11 +60,26 @@ protected HnswGraph() {}
*
* @param level level of the graph
*
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766741617
> If I read correctly, this query ends up calling
LeafReader#searchNearestNeighbors with k=Integer.MAX_VALUE
No, we're calling the [new
API](https://github.com/apache/lucene/blob
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362432970
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362437535
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
benwtrent commented on code in PR #12657:
URL: https://github.com/apache/lucene/pull/12657#discussion_r1362456237
##
lucene/core/src/java/org/apache/lucene/util/hnsw/InitializedHnswGraphBuilder.java:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) un
benwtrent commented on code in PR #12657:
URL: https://github.com/apache/lucene/pull/12657#discussion_r1362457390
##
lucene/core/src/java/org/apache/lucene/util/hnsw/IncrementalHnswGraphMerger.java:
##
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) un
jpountz commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766795903
Thanks for explaining, I had indeed overlooked how the `Integer.MAX_VALUE`
was used. I'm still interested in figuring out whether we can have stronger
guarantees on the worst-case memory usag
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1362472568
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1362475149
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1362474206
##
lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java:
##
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
kaivalnp commented on code in PR #12679:
URL: https://github.com/apache/lucene/pull/12679#discussion_r1362476143
##
lucene/core/src/java/org/apache/lucene/search/RnnCollector.java:
##
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ *
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766834111
Thanks for the review @shubhamvishu! Addressed some of the comments above
> Is it right to call it a radius-based search here?
I think of it as finding all results within a
mingshl commented on PR #12260:
URL: https://github.com/apache/lucene/pull/12260#issuecomment-1766881156
Thank you! @mkhludnev
benwtrent merged PR #12657:
URL: https://github.com/apache/lucene/pull/12657
gf2121 commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1766958252
It occurs to me that maybe we do not really need to combine all these
`BytesRef`s into a single `BytesRef`; we can just build a `DataInput` over
these `BytesRef`s to read from. Luckily, o
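(A minimal sketch of such a `DataInput`, assuming sequential-only reads and no
seeking; this illustrates the idea, not the PR's code.)
```java
import java.io.EOFException;
import java.io.IOException;
import org.apache.lucene.store.DataInput;
import org.apache.lucene.util.BytesRef;

// Sketch: expose several BytesRefs as one logical stream instead of copying
// them into a single combined BytesRef.
final class MultiBytesRefDataInput extends DataInput {
  private final BytesRef[] refs;
  private int refIndex; // which BytesRef is currently being read
  private int offset;   // position within the current BytesRef

  MultiBytesRefDataInput(BytesRef... refs) {
    this.refs = refs;
  }

  @Override
  public byte readByte() throws IOException {
    while (refIndex < refs.length && offset >= refs[refIndex].length) {
      refIndex++; // skip past exhausted (or empty) BytesRefs
      offset = 0;
    }
    if (refIndex == refs.length) {
      throw new EOFException();
    }
    BytesRef cur = refs[refIndex];
    return cur.bytes[cur.offset + offset++];
  }

  @Override
  public void readBytes(byte[] b, int off, int len) throws IOException {
    for (int i = 0; i < len; i++) { // simple per-byte copy, not bulk-optimized
      b[off + i] = readByte();
    }
  }

  @Override
  public void skipBytes(long numBytes) throws IOException {
    for (long i = 0; i < numBytes; i++) {
      readByte();
    }
  }
}
```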
javanna opened a new pull request, #12689:
URL: https://github.com/apache/lucene/pull/12689
When operations are parallelized, like query rewrite, search, or
createWeight, one of the tasks may throw an exception. In that case we
wait for all tasks to complete before re-throwing th
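(A simplified sketch of that wait-then-rethrow behavior, not `TaskExecutor`'s
actual code; `WaitForAll` and `invokeAll` are illustrative names.)
```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Sketch: every task runs to completion before the first failure is
// re-thrown, so no task is still running while the caller unwinds.
final class WaitForAll {
  static <T> List<T> invokeAll(ExecutorService executor, List<Callable<T>> tasks)
      throws IOException {
    List<Future<T>> futures = new ArrayList<>();
    for (Callable<T> task : tasks) {
      futures.add(executor.submit(task));
    }
    List<T> results = new ArrayList<>();
    Throwable failure = null;
    for (Future<T> future : futures) {
      try {
        results.add(future.get()); // blocks until this task finishes
      } catch (Exception e) {
        // remember the first failure but keep draining the remaining tasks
        if (failure == null) {
          failure = e.getCause() != null ? e.getCause() : e;
        }
      }
    }
    if (failure != null) {
      throw new IOException(failure);
    }
    return results;
  }
}
```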
benwtrent commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766983182
> I think of it as finding all results within a high-dimensional circle /
sphere / equivalent,
dot-product, cosine, etc. don't really follow that same idea, as you point
out. I w
javanna commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1362620375
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,64 +67,124 @@ public final class TaskExecutor {
* @param the return type of the task
javanna commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1362621063
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,64 +67,124 @@ public final class TaskExecutor {
* @param the return type of the task
javanna commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1362621950
##
lucene/core/src/test/org/apache/lucene/search/TestTaskExecutor.java:
##
@@ -43,7 +47,8 @@ public class TestTaskExecutor extends LuceneTestCase {
public static vo
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766995337
### Benchmarks
Using the vector file from
https://home.apache.org/~sokolov/enwiki-20120502-lines-1k-100d.vec (enwiki
dataset, unit vectors, 100 dimensions)
The setup was 1
kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1767022898
> stronger guarantees on the worst-case memory usage
Totally agreed @jpountz! It is very easy to go wrong in the new API,
especially if the user passes a low threshold (high radius
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362661760
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java:
##
@@ -0,0 +1,1149 @@
+/*
+ * Licensed to the Apache Software Foundation (A
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362664506
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java:
##
@@ -0,0 +1,782 @@
+/*
+ * Licensed to the Apache Software Fou
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362665321
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java:
##
@@ -0,0 +1,782 @@
+/*
+ * Licensed to the Apache Software Fou
kaivalnp commented on issue #12579:
URL: https://github.com/apache/lucene/issues/12579#issuecomment-1767112899
> one other thing to think about is
https://weaviate.io/blog/weaviate-1-20-release#autocut
Interesting! They [seem
to](https://github.com/weaviate/weaviate/blob/c382dcbe6ff0
benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1362725464
##
lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java:
##
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
gsmiller commented on PR #12671:
URL: https://github.com/apache/lucene/pull/12671#issuecomment-1767291109
Thanks for your further thoughts @shubhamvishu. Getting more opinions is
always good, and like I said, I don't feel strongly enough about this change to
block moving forward with it or
dungba88 opened a new pull request, #12690:
URL: https://github.com/apache/lucene/pull/12690
### Description
Follow-up of https://github.com/apache/lucene/pull/12646. NodeHash still
depends on both FSTCompiler and FST. With the current method signature, one can
create the NodeHash wi
dungba88 commented on code in PR #12633:
URL: https://github.com/apache/lucene/pull/12633#discussion_r1363098628
##
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##
@@ -17,79 +17,177 @@
package org.apache.lucene.util.fst;
import java.io.IOException;
-import
nitirajrathore commented on issue #12627:
URL: https://github.com/apache/lucene/issues/12627#issuecomment-1767662289
I was able to run tests on the wiki dataset using the luceneutil package. The
[results show](https://github.com/mikemccand/luceneutil/pull/236) that even
with a single segment
iverase merged PR #12625:
URL: https://github.com/apache/lucene/pull/12625
gf2121 commented on PR #12661:
URL: https://github.com/apache/lucene/pull/12661#issuecomment-1767756956
> So this looks like a hard search/space trade-off: we either get fast reads
or good compression but we can't get both?
IMO theoretically yes. We ignored some potential optimization
gf2121 commented on code in PR #12661:
URL: https://github.com/apache/lucene/pull/12661#discussion_r1363317643
##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java:
##
@@ -118,13 +118,11 @@ long readVLongOutput(DataInput in) throws IOException {
gf2121 commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1363431399
##
lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoin.java:
##
@@ -113,6 +113,7 @@ public void testEmptyChildFilter() throws Exception {
final Direct
jpountz commented on PR #12589:
URL: https://github.com/apache/lucene/pull/12589#issuecomment-1767952830
I moved the optimization into the partitioning logic so that it's
easier to test. It's ready for review.