stefanvodita commented on PR #12625:
URL: https://github.com/apache/lucene/pull/12625#issuecomment-1773790152
I've rebased #12506. I like having a separate class for slice allocation,
but if there's disagreement over that, I can put the code back in
`TermsHashPerField`.
--
This is an aut
stefanvodita commented on PR #12506:
URL: https://github.com/apache/lucene/pull/12506#issuecomment-1773789994
The last commit is a large rebase + conflict resolution after #12625 got
merged. What this PR does hasn't really changed.
mikemccand commented on PR #12506:
URL: https://github.com/apache/lucene/pull/12506#issuecomment-1773797296
Thanks @stefanvodita -- I'll try to have a look soon! And thank you for
gracefully handling the "two people made very similar changes" situation :)
This happens often in open s
mikemccand commented on PR #12625:
URL: https://github.com/apache/lucene/pull/12625#issuecomment-1773797762
Thanks @stefanvodita -- I'll try to have a look soon at your rebased PR
#12506.
And thank you for gracefully handling the "two people made very similar
changes" situation :)
ChrisHegarty commented on PR #12703:
URL: https://github.com/apache/lucene/pull/12703#issuecomment-1773837768
> I am out of office the next week, I'd like to participate in the
discussion; we should not rush anything.
Take your time. Your input and ideas are very much welcome. We will
bruno-roustant commented on PR #12688:
URL: https://github.com/apache/lucene/pull/12688#issuecomment-1773923204
This is some code I wrote a long time ago. It has been tested and used, so
I'm confident in the functional aspect, and it might benefit from a
benchmark for perf.
Le ve
rmuir commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1773935712
Should we just do more tests and start writing indexes without patching?
Only a 4 percent disk savings? It is a lot of complexity, especially to
vectorize. A runtime option is more ex
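For readers new to the term, "patching" means encoding a block with a narrower bit width and storing the few values that do not fit as exceptions. The sketch below is hypothetical (the class `PatchedForSketch`, the `maxExceptions` parameter and the unpacked `int[]` layout are assumptions) and is not Lucene's `PForUtil`; it only shows where the disk savings and the extra decode step come from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of patched frame-of-reference encoding. NOT Lucene's PForUtil: values are
 * kept in plain int arrays instead of being bit-packed, so only the idea is shown.
 */
final class PatchedForSketch {

  /** Encoded block: low bits of every value, plus (index, high bits) pairs for the exceptions. */
  static final class Encoded {
    final int bitsPerValue;
    final int[] lowBits;
    final int[] patchIndexes;
    final int[] patchHighBits;

    Encoded(int bitsPerValue, int[] lowBits, int[] patchIndexes, int[] patchHighBits) {
      this.bitsPerValue = bitsPerValue;
      this.lowBits = lowBits;
      this.patchIndexes = patchIndexes;
      this.patchHighBits = patchHighBits;
    }
  }

  /** Number of bits needed for a non-negative int (0 for the value 0). */
  static int bitsRequired(int v) {
    return 32 - Integer.numberOfLeadingZeros(v);
  }

  /** Encodes non-negative values; at most maxExceptions of them end up needing a patch. */
  static Encoded encode(int[] values, int maxExceptions) {
    int[] widths = Arrays.stream(values).map(PatchedForSketch::bitsRequired).sorted().toArray();
    // All values except (at most) the maxExceptions widest ones fit in this many bits.
    int bpv = widths.length == 0 ? 0 : widths[Math.max(0, widths.length - 1 - maxExceptions)];
    int mask = (1 << bpv) - 1; // bpv <= 31 for non-negative ints
    int[] low = new int[values.length];
    List<Integer> indexes = new ArrayList<>();
    List<Integer> highs = new ArrayList<>();
    for (int i = 0; i < values.length; i++) {
      low[i] = values[i] & mask;
      int high = values[i] >>> bpv;
      if (high != 0) { // value overflows bpv bits: store its high bits as a patch
        indexes.add(i);
        highs.add(high);
      }
    }
    return new Encoded(
        bpv,
        low,
        indexes.stream().mapToInt(Integer::intValue).toArray(),
        highs.stream().mapToInt(Integer::intValue).toArray());
  }

  /** Decoding unpacks the low bits, then needs a second pass to apply the patches. */
  static int[] decode(Encoded e) {
    int[] out = e.lowBits.clone();
    for (int p = 0; p < e.patchIndexes.length; p++) {
      out[e.patchIndexes[p]] |= e.patchHighBits[p] << e.bitsPerValue;
    }
    return out;
  }
}
```

For example, encoding {1, 2, 3, 1000, 4, 5} with `maxExceptions = 1` stores 3 bits per value plus one patch for 1000, instead of 10 bits for every value. The second loop in `decode` is the data-dependent step that makes vectorized decoding harder.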
gf2121 commented on PR #12692:
URL: https://github.com/apache/lucene/pull/12692#issuecomment-1773995253
Nightly benchmarks show fuzzy queries benefit a bit from this change:
https://home.apache.org/~mikemccand/lucenebench/2023.10.19.18.03.18.html.
uschindler commented on PR #12706:
URL: https://github.com/apache/lucene/pull/12706#issuecomment-1774030650
I updated the Jenkins jobs running mmap tests to use this branch:
https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/,
https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Windows/
uschindler merged PR #12705:
URL: https://github.com/apache/lucene/pull/12705
gf2121 commented on issue #12665:
URL: https://github.com/apache/lucene/issues/12665#issuecomment-1774050262
> essentially calling OfflineSorter on all postings
FYI, I came up with some ideas for optimizing this sort earlier; I hope they
help :)
1. If we use a stable sorter, we
gf2121 closed issue #12701: Specialize arc store for continuous label in FST
URL: https://github.com/apache/lucene/issues/12701
uschindler commented on PR #12705:
URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774091447
IllegalStateException cannot happen in that code, only when accessing memory
segments closed by other threads.
NPE was a special case, as it can happen more easily. IllegalStateExceptio
uschindler commented on PR #12705:
URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774092282
> If that's the case, it seems fine, although a bit fragile to maintain?
I argued during the long journey of Panama Foreign to have a specific
subclass of IllegalStateException
msokolov commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367900707
##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java:
##
@@ -635,17 +667,31 @@ private static DocsWithFieldSet writeVectorData(
uschindler commented on PR #12705:
URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774105554
> > If that's the case, it seems fine, although a bit fragile to maintain?
>
> I argued during the long journey of Panama Foreign to have a specific
subclass of IllegalStateExce
uschindler commented on PR #12705:
URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774106700
You can call `segment.scope().isAlive()` to figure out if the scope is still
alive. This works for Java 20+. The Java 19 version can't use this.
I will possibly create a new PR
uschindler opened a new pull request, #12707:
URL: https://github.com/apache/lucene/pull/12707
Followup on #12705: With memory segments we get an IllegalStateException.
Instead of always rewriting it to AlreadyClosedException, we first confirm whether
the segment scope (session in Java 19) is no
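The idea of confirming liveness before rewriting the exception can be sketched in plain Java. This is a hypothetical illustration, not the code from #12707: `ClosedCheckSketch`, `translate` and the nested `AlreadyClosedException` stand-in are assumptions, and the `BooleanSupplier` takes the place of a check such as `segment.scope().isAlive()` (Java 20+) so the sketch compiles without the Panama APIs.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch: only report "already closed" after confirming the memory segment's scope
 * is really no longer alive; otherwise keep the original exception.
 */
final class ClosedCheckSketch {

  /** Stand-in for Lucene's AlreadyClosedException, so the sketch compiles on its own. */
  static final class AlreadyClosedException extends IllegalStateException {
    AlreadyClosedException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /**
   * Translates an IllegalStateException thrown while reading. isAlive stands in for a check like
   * segment.scope().isAlive() on Java 20+ (or the session check on Java 19).
   */
  static RuntimeException translate(IllegalStateException e, BooleanSupplier isAlive, String name) {
    if (!isAlive.getAsBoolean()) {
      // Scope is closed: this really is an "already closed" situation.
      return new AlreadyClosedException("Already closed: " + name, e);
    }
    return e; // scope still alive: a genuine IllegalStateException, rethrow it unchanged
  }
}
```

A caller would wrap its reads in `try`/`catch (IllegalStateException e)` and `throw translate(e, ...)`; keeping the original exception when the scope is alive avoids hiding real bugs behind a misleading "already closed" message.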
uschindler commented on PR #12705:
URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774116253
I improved the IllegalStateException handling in #12707 in the same way, by confirming
the state of the segment's scope (Java20+) / session (Java19).
@msokolov: Please have a quick look be
uschindler merged PR #12707:
URL: https://github.com/apache/lucene/pull/12707
dweiss commented on issue #12704:
URL: https://github.com/apache/lucene/issues/12704#issuecomment-1774156970
I borrowed that constant in BitMixer from Sebastiano Vigna, I believe. Here
is a nice overview of its origin/ rationale:
https://softwareengineering.stackexchange.com/question
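The comment does not name the constant, so the following is only an assumption: a common constant of this kind (used in Vigna's fastutil-style hashing) is the 64-bit golden-ratio value `0x9E3779B97F4A7C15`. This hypothetical `PhiMixSketch` shows golden-ratio ("Fibonacci") multiplicative mixing; it is not necessarily what `BitMixer` does.

```java
/**
 * Hypothetical sketch of golden-ratio multiplicative mixing. We assume the constant under
 * discussion is the 64-bit golden-ratio constant; the comment itself does not name it.
 */
final class PhiMixSketch {

  /** Approximately 2^64 / phi. It is odd, so multiplying by it is a bijection on 64-bit values. */
  static final long PHI_C64 = 0x9E3779B97F4A7C15L;

  /** Multiplication spreads low bits upward; the xor-shift folds the high half back down. */
  static long mix(long key) {
    long h = key * PHI_C64;
    return h ^ (h >>> 32);
  }
}
```

Because both steps (multiplying by an odd constant, and `h ^ (h >>> 32)`) are invertible, distinct keys never collide before the hash table reduces them to a bucket index.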
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367952944
##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java:
##
@@ -50,13 +72,24 @@ public final class Lucene95HnswVectorsWriter extends Kn
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367953194
##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java:
##
@@ -635,17 +667,31 @@ private static DocsWithFieldSet writeVectorData(
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367953519
##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java:
##
@@ -26,17 +26,39 @@
import java.util.ArrayList;
import java.util.Arrays;
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367953845
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java:
##
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367953931
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java:
##
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367955218
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java:
##
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367955272
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java:
##
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367955515
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -33,7 +33,7 @@
* Builder for HNSW graph. See {@link HnswGraph} for a gloss on the algo
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367955644
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -151,61 +159,124 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367956157
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -151,61 +159,124 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367956372
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -151,61 +159,124 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367956726
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -151,61 +159,124 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367956886
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -221,34 +292,39 @@ private long printGraphBuildStatus(int node, long start, long t) {
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367957707
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -221,34 +292,39 @@ private long printGraphBuildStatus(int node, long start, long t) {
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367958357
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphMerger.java:
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367958754
##
lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java:
##
@@ -709,6 +710,7 @@ public void testHnswGraphBuilderInvalid() throws IOException {
msokolov commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367971257
##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java:
##
@@ -635,17 +667,31 @@ private static DocsWithFieldSet writeVectorData(
msokolov commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367971976
##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java:
##
@@ -50,13 +72,24 @@ public final class Lucene95HnswVectorsWriter extends
msokolov commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367972354
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java:
##
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) und
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367981481
##
lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java:
##
@@ -709,6 +710,7 @@ public void testHnswGraphBuilderInvalid() throws IOException {
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367982181
##
lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java:
##
@@ -709,6 +710,7 @@ public void testHnswGraphBuilderInvalid() throws IOException {
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367984246
##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java:
##
@@ -50,13 +72,24 @@ public final class Lucene95HnswVectorsWriter extends Kn
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367984559
##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java:
##
@@ -635,17 +667,31 @@ private static DocsWithFieldSet writeVectorData(
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367986199
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java:
##
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367986490
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java:
##
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1367986729
##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -33,7 +33,7 @@
* Builder for HNSW graph. See {@link HnswGraph} for a gloss on the algo
cavorite commented on issue #12695:
URL: https://github.com/apache/lucene/issues/12695#issuecomment-1774229048
I'd be willing to work on this issue (as a way to get more familiar with
Lucene's internal code base). First, I'd like to see if I'm understanding the
work needed.
So far,
mikemccand commented on issue #12695:
URL: https://github.com/apache/lucene/issues/12695#issuecomment-1774232380
Yes that's exactly the idea! Thank you @cavorite for tackling this.
mikemccand commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1774236604
> Should we just do more tests and start writing indexes without patching?
Only a 4 percent disk savings? It is a lot of complexity, especially to
vectorize. A runtime option is
zhaih commented on PR #12660:
URL: https://github.com/apache/lucene/pull/12660#issuecomment-1774469691
> I would be curious to see the contention times and also understand how
this changes CPU usage vs. single-threaded.
@msokolov as for CPU usage, I just tested with 1M docs, and on my
jpountz commented on issue #12665:
URL: https://github.com/apache/lucene/issues/12665#issuecomment-1774523421
These sound like great ideas!
uschindler opened a new issue, #12708:
URL: https://github.com/apache/lucene/issues/12708
### Description
The test
`org.apache.lucene.queryparser.xml.TestCoreParser#testSpanNearQueryWithoutSlopXML`
fails in Java 22 EA builds:
```
org.junit.ComparisonFailure: expected:<...be
```
s1monw merged PR #12685:
URL: https://github.com/apache/lucene/pull/12685
uschindler commented on issue #12708:
URL: https://github.com/apache/lucene/issues/12708#issuecomment-1774619628
It only affects the empty String.
uschindler commented on issue #12708:
URL: https://github.com/apache/lucene/issues/12708#issuecomment-1774641113
See the issue in openjdk: https://bugs.openjdk.org/browse/JDK-8318646
dungba88 opened a new pull request, #12709:
URL: https://github.com/apache/lucene/pull/12709
### Description
Consolidate the FSTStore and BytesStore in FST. The two are similar, except
that FSTStore has an `init()` method, which is not needed for BytesStore. Thus
I extracted the comm
gf2121 opened a new pull request, #12710:
URL: https://github.com/apache/lucene/pull/12710
Make `Outputs#common` take advantage of `Arrays#mismatch`.
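`Outputs#common` returns the shared prefix of two outputs, so for int sequences the core of the change is a prefix-length computation. A minimal sketch, assuming raw `int[]` slices in place of `IntsRef` (whose `ints`/`offset`/`length` fields map onto the parameters); `CommonPrefixSketch` and `commonPrefixLength` are hypothetical names, not the PR's code.

```java
import java.util.Arrays;

/** Hypothetical sketch: length of the common prefix of two int slices using Arrays.mismatch. */
final class CommonPrefixSketch {

  /**
   * Returns how many leading ints the slices a[aOff, aOff+aLen) and b[bOff, bOff+bLen) share.
   * Arrays.mismatch returns the relative index of the first difference, the shorter length if
   * one slice is a prefix of the other, or -1 if the slices are equal.
   */
  static int commonPrefixLength(int[] a, int aOff, int aLen, int[] b, int bOff, int bLen) {
    int mismatch = Arrays.mismatch(a, aOff, aOff + aLen, b, bOff, bOff + bLen);
    return mismatch == -1 ? aLen : mismatch;
  }
}
```

Compared with a hand-written loop, `Arrays.mismatch` (Java 9+) lets the JDK compare several elements at a time where it can, which is presumably the benefit the PR is after.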
dungba88 commented on code in PR #12710:
URL: https://github.com/apache/lucene/pull/12710#discussion_r1368339258
##
lucene/core/src/java/org/apache/lucene/util/fst/IntSequenceOutputs.java:
##
@@ -43,28 +44,29 @@ public IntsRef common(IntsRef output1, IntsRef output2) {
asse
dungba88 commented on code in PR #12710:
URL: https://github.com/apache/lucene/pull/12710#discussion_r1368341788
##
lucene/core/src/java/org/apache/lucene/util/fst/IntSequenceOutputs.java:
##
@@ -43,28 +44,29 @@ public IntsRef common(IntsRef output1, IntsRef output2) {
asse
gf2121 commented on code in PR #12710:
URL: https://github.com/apache/lucene/pull/12710#discussion_r1368344352
##
lucene/core/src/java/org/apache/lucene/util/fst/IntSequenceOutputs.java:
##
@@ -43,28 +44,29 @@ public IntsRef common(IntsRef output1, IntsRef output2) {
assert
dungba88 commented on code in PR #12710:
URL: https://github.com/apache/lucene/pull/12710#discussion_r1368349251
##
lucene/core/src/java/org/apache/lucene/util/fst/IntSequenceOutputs.java:
##
@@ -43,28 +44,29 @@ public IntsRef common(IntsRef output1, IntsRef output2) {
asse
s1monw opened a new pull request, #12711:
URL: https://github.com/apache/lucene/pull/12711
Today you can use the `add/UpdateDocuments` API even if an index sort is
configured. This leads to broken indices if users rely on this API's guarantee
that document IDs are consecutive. This cha
msokolov commented on PR #12711:
URL: https://github.com/apache/lucene/pull/12711#issuecomment-1775052675
This is what we do today: we're careful to add blocks of docs that sort
together. What is the alternative going to be? Should one instead call
addDocument() sequentially?
I have
jpountz commented on PR #12589:
URL: https://github.com/apache/lucene/pull/12589#issuecomment-1775098957
I plan on merging in the next couple days if there are no objections.
dungba88 commented on code in PR #12710:
URL: https://github.com/apache/lucene/pull/12710#discussion_r1368616130
##
lucene/core/src/java/org/apache/lucene/util/fst/IntSequenceOutputs.java:
##
@@ -43,28 +44,29 @@ public IntsRef common(IntsRef output1, IntsRef output2) {
asse
gf2121 opened a new pull request, #12712:
URL: https://github.com/apache/lucene/pull/12712
Based on the idea mentioned
[here](https://github.com/apache/lucene/issues/12665#issuecomment-1774050262):
> 1. If we use a stable sorter, we can only compare docIds because termIds
are already in
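The stable-sorter idea can be illustrated as follows, assuming (the quote is truncated) that postings are produced in termId order. Then a stable sort that compares docIds only already yields (docId, termId) order, because postings of the same doc keep their original termId order. `StableSortSketch` and `Posting` are hypothetical names, not the PR's classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: a stable sort on docId alone yields (docId, termId) order. */
final class StableSortSketch {

  /** One (termId, docId) posting; a stand-in for however the real code encodes pairs. */
  static final class Posting {
    final int termId;
    final int docId;

    Posting(int termId, int docId) {
      this.termId = termId;
      this.docId = docId;
    }
  }

  /**
   * Input must already be ordered by termId (as when iterating postings term by term).
   * List.sort is stable, so postings with equal docIds keep their termId order.
   */
  static List<Posting> sortByDoc(List<Posting> termOrdered) {
    List<Posting> copy = new ArrayList<>(termOrdered);
    copy.sort(Comparator.comparingInt((Posting p) -> p.docId)); // compares docIds only
    return copy;
  }
}
```

Skipping the secondary termId comparison makes each comparison cheaper, which is where the saving comes from.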
risdenk commented on PR #12293:
URL: https://github.com/apache/lucene/pull/12293#issuecomment-1775322129
@dsmiley I mentioned this on the Solr PR for the same change -
https://github.com/apache/solr/pull/1626#issuecomment-1553288366
https://ci-builds.apache.org/job/Lucene/job/Lucene-C
gf2121 commented on issue #12665:
URL: https://github.com/apache/lucene/issues/12665#issuecomment-1775324737
I initialized a PR on these ideas https://github.com/apache/lucene/pull/12712
benwtrent commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1368782730
##
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##
@@ -35,6 +38,9 @@ public class NeighborArray {
float[] score;
int[] node;
private
slow-J commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775353027
If we want to remove the patching entirely, which Lucene version (and which
Codec) should we implement this in? Would this be a potential change for Lucene
9.9 or perhaps 10.0?
jpountz opened a new pull request, #12713:
URL: https://github.com/apache/lucene/pull/12713
This adds a bit more specialization to how we handle the 2nd clause in
conjunctions, which seems to help the JVM quite significantly.
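For context, in a conjunction a lead clause (typically the cheapest one) proposes candidate docs and the second clause is advanced to each candidate; whenever they disagree, the lead leaps forward. A minimal leapfrog sketch over sorted doc ID arrays; `TwoClauseConjunctionSketch` is hypothetical, it is not Lucene's `ConjunctionDISI`, and it does not reproduce the PR's specialization.

```java
import java.util.Arrays;

/** Hypothetical sketch of a two-clause conjunction ("leapfrog") over sorted doc ID arrays. */
final class TwoClauseConjunctionSketch {

  /** Returns the docs present in both sorted, duplicate-free arrays. */
  static int[] intersect(int[] lead, int[] second) {
    int[] out = new int[Math.min(lead.length, second.length)];
    int n = 0;
    int i = 0;
    int j = 0;
    while (i < lead.length && j < second.length) {
      int doc = lead[i];
      j = advance(second, j, doc); // second clause: first doc >= the lead's candidate
      if (j == second.length) {
        break; // second clause exhausted: no more matches
      }
      if (second[j] == doc) { // both clauses match
        out[n++] = doc;
        i++;
        j++;
      } else {
        i = advance(lead, i, second[j]); // lead leaps to the second clause's doc
      }
    }
    return Arrays.copyOf(out, n);
  }

  /** Index of the first element >= target at or after from (linear scan; real iterators can skip). */
  static int advance(int[] docs, int from, int target) {
    while (from < docs.length && docs[from] < target) {
      from++;
    }
    return from;
  }
}
```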
jpountz commented on PR #12713:
URL: https://github.com/apache/lucene/pull/12713#issuecomment-1775568713
Wikibigall:
```
Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-value
IntNRQ
```
gf2121 commented on PR #12712:
URL: https://github.com/apache/lucene/pull/12712#issuecomment-1775618552
To get a quick insight, I ran a naive benchmark on the sorter; it was
generally 5x faster than the baseline.
* JVM 8G (resulting in a RAM budget of 800MB for `OfflineSorter`)
* No Fo
benwtrent merged PR #12682:
URL: https://github.com/apache/lucene/pull/12682
mikemccand commented on code in PR #12506:
URL: https://github.com/apache/lucene/pull/12506#discussion_r1368976156
##
lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java:
##
@@ -46,6 +65,7 @@ protected Allocator(int blockSize) {
public abstract void recycleByte
mikemccand commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775638986
> Are there any additional corpora that we should also test this with?
Maybe the NYC taxis? It is sparser, with tiny docs (vs. the dense,
medium/large docs in `enwiki
mikemccand commented on code in PR #12709:
URL: https://github.com/apache/lucene/pull/12709#discussion_r1369002933
##
lucene/core/src/java/org/apache/lucene/util/fst/FST.java:
##
@@ -487,19 +473,18 @@ public String toString() {
}
void finish(long newStartNode) throws IOE
mikemccand commented on PR #12711:
URL: https://github.com/apache/lucene/pull/12711#issuecomment-1775676869
I don't think we should make a hard block here. As @msokolov points out, if
you are careful to make your static sort congruent with your blocks, the blocks
will be preserved.
I
mikemccand commented on issue #12701:
URL: https://github.com/apache/lucene/issues/12701#issuecomment-1775685100
This is a neat idea @gf2121 -- did you close it because it's similar / same
as the direct addressing case?
mikemccand opened a new issue, #12714:
URL: https://github.com/apache/lucene/issues/12714
### Description
To share suffixes and create as minimal an FST as we can, `FSTCompiler`
uses `NodeHash` to record the most recently used/shared suffixes. But it
stores the values (the nodes
Tony-X commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775698779
> It is a lot of complexity, especially to vectorize.
+1. I recalled that @gsmiller was playing with some SIMD algos for decoding
blocks of delta-encoded ints. Even if that is
mikemccand commented on issue #12714:
URL: https://github.com/apache/lucene/issues/12714#issuecomment-1775711575
I think we can use `ByteBlockPool` to store the `byte[]` slices, just
appending a new `byte[]` slice when we store a new suffix. We never delete
individual suffixes, but rather
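The suggestion to append each suffix's `byte[]` into pooled blocks can be sketched as an append-only pool addressed by a `long`. `AppendOnlyBytePool` is a hypothetical class, not Lucene's `ByteBlockPool`; the rule that slices never straddle blocks is a simplification, and the whole-pool `reset()` is an assumption about what replaces per-suffix deletion (the comment is truncated).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical append-only pool: byte[] slices (e.g. serialized suffix nodes) are copied into
 * fixed-size blocks and addressed by a long. Not Lucene's ByteBlockPool.
 */
final class AppendOnlyBytePool {
  private final int blockSize;
  private final List<byte[]> blocks = new ArrayList<>();
  private int upto; // next free position in the last block

  AppendOnlyBytePool(int blockSize) {
    this.blockSize = blockSize;
  }

  /** Copies bytes into the pool and returns their address; slices never straddle two blocks. */
  long append(byte[] bytes) {
    if (bytes.length > blockSize) {
      throw new IllegalArgumentException("slice larger than a block: " + bytes.length);
    }
    if (blocks.isEmpty() || upto + bytes.length > blockSize) {
      blocks.add(new byte[blockSize]); // no room left in the current block: start a new one
      upto = 0;
    }
    long address = (long) (blocks.size() - 1) * blockSize + upto;
    System.arraycopy(bytes, 0, blocks.get(blocks.size() - 1), upto, bytes.length);
    upto += bytes.length;
    return address;
  }

  /** Copies length bytes back out, starting at address. */
  byte[] read(long address, int length) {
    byte[] block = blocks.get((int) (address / blockSize));
    int offset = (int) (address % blockSize);
    return Arrays.copyOfRange(block, offset, offset + length);
  }

  /** There is no per-slice delete; everything is dropped together. */
  void reset() {
    blocks.clear();
    upto = 0;
  }
}
```

A hash table of suffixes would then store only the `long` address (plus length) per node instead of a separate `byte[]` object each.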
gsmiller commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775717147
I like the idea of removing the complexity associated with patching if we're
convinced it's the right trade-off (and +1 to the pain of vectorizing with
patching going away).
msokolov commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775716306
> Hmm, can you elaborate how it can be fully backwards-compatible on with
the indexes that have patching?
I think the idea is that because we always maintain readers that can
gsmiller commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775725064
> +1. I recalled that @gsmiller was playing with some SIMD algos for
decoding blocks of delta-encoded ints. Even if that is fruitful it'd be tricky
to apply it because of the patch
Tony-X commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775807115
> In 11.0, remove all patching logic which will, a) simplify the code a bit,
and b) remove the (likely minor) overhead on read of looking up the number of
patches in a block, which i
gsmiller commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775871779
> Maybe write something in the index header to indicate if patching is there
(default to yes - in 9.x ). Then new indexes will write additional header to
indicate there is not patc
javanna commented on code in PR #12689:
URL: https://github.com/apache/lucene/pull/12689#discussion_r1369190050
##
lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java:
##
@@ -64,64 +67,124 @@ public final class TaskExecutor {
* @param the return type of the task
s1monw commented on PR #12711:
URL: https://github.com/apache/lucene/pull/12711#issuecomment-1775926459
Would an expert API on the IndexSort work for you folks? For example, a getter that
indicates whether it's a stable sort that preserves blocks?
On 23. Oct 2023, at 19:28, Michael McCandless ***@***.***
Tony-X commented on issue #12696:
URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775993940
> would the goal here be to eliminate overhead of having to read the number
of patches when decoding each block?
Yes. This means we could know upfront at segment opening time w
dungba88 commented on code in PR #12709:
URL: https://github.com/apache/lucene/pull/12709#discussion_r1369373040
##
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##
@@ -317,8 +319,6 @@ private CompiledNode compileNode(UnCompiledNode nodeIn, int tailLength) t
gf2121 commented on PR #12712:
URL: https://github.com/apache/lucene/pull/12712#issuecomment-1776364792
I forked `LSBRadixSorter` to sort longs and use it when the RAM budget is
large enough. It is generally 5x faster than the candidate and 25x faster than the baseline.
https://bytedance.feishu.cn/sheets/HS
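Sorting longs with an LSB radix sort pairs naturally with packing each (docId, termId) posting into one long, so that numeric order equals (docId, termId) order. A minimal sketch; `LongLsbRadixSortSketch` and `pack` are hypothetical names, and this is not the forked `LSBRadixSorter` from the PR.

```java
/**
 * Hypothetical sketch of an LSB radix sort over non-negative longs, such as (docId, termId) pairs
 * packed into one long. Not Lucene's LSBRadixSorter; it always runs all 8 byte-wide passes.
 */
final class LongLsbRadixSortSketch {

  /** Packs a non-negative docId into the high 32 bits and termId into the low 32 bits. */
  static long pack(int docId, int termId) {
    return ((long) docId << 32) | (termId & 0xFFFFFFFFL);
  }

  /** Sorts non-negative longs in place, one 8-bit digit per pass, least significant digit first. */
  static void sort(long[] values) {
    long[] src = values;
    long[] dst = new long[values.length]; // extra buffer doubles memory use during the sort
    for (int shift = 0; shift < 64; shift += 8) {
      int[] starts = new int[257];
      for (long v : src) {
        starts[(int) ((v >>> shift) & 0xFF) + 1]++; // histogram of this digit
      }
      for (int d = 0; d < 256; d++) {
        starts[d + 1] += starts[d]; // prefix sums: where each digit's bucket begins
      }
      for (long v : src) {
        dst[starts[(int) ((v >>> shift) & 0xFF)]++] = v; // stable scatter keeps earlier passes' order
      }
      long[] tmp = src;
      src = dst;
      dst = tmp;
    }
    // 8 passes (an even number) leave the sorted data back in the caller's array.
  }
}
```

The extra buffer is one reason it makes sense to use such a sorter only when the RAM budget allows, falling back to an offline sort otherwise.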
dungba88 commented on PR #12709:
URL: https://github.com/apache/lucene/pull/12709#issuecomment-1776396388
> I think we should land this only on main for now, and then backport it
eventually to 9.x along with the other FST changes?
I think this makes sense. Let's hold off on the backporting
dungba88 commented on code in PR #12715:
URL: https://github.com/apache/lucene/pull/12715#discussion_r1369634837
##
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##
@@ -122,8 +122,11 @@ public class FSTCompiler {
/**
* Instantiates an FST/FSA builder w
dungba88 commented on PR #12715:
URL: https://github.com/apache/lucene/pull/12715#issuecomment-1776546474
Thank you for the change!
zhaih commented on code in PR #12660:
URL: https://github.com/apache/lucene/pull/12660#discussion_r1369642741
##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsFormat.java:
##
@@ -146,18 +148,24 @@ public final class Lucene95HnswVectorsFormat extends