Re: [PR] Refactor ByteBlockPool so it is just a "shift/mask big array" [lucene]

2023-10-21 Thread via GitHub
stefanvodita commented on PR #12625: URL: https://github.com/apache/lucene/pull/12625#issuecomment-1773790152 I've rebased #12506. I like having a separate class for slice allocation, but if there's disagreement over that, I can put the code back in `TermsHashPerField`. -- This is an aut

Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-21 Thread via GitHub
stefanvodita commented on PR #12506: URL: https://github.com/apache/lucene/pull/12506#issuecomment-1773789994 The last commit is a large rebase + conflict resolution after #12625 got merged. What this PR does hasn't really changed.

Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-21 Thread via GitHub
mikemccand commented on PR #12506: URL: https://github.com/apache/lucene/pull/12506#issuecomment-1773797296 Thanks @stefanvodita -- I'll try to have a look soon! And thank you for gracefully handling the "two people made very similar changes" situation :) This happens often in open s

Re: [PR] Refactor ByteBlockPool so it is just a "shift/mask big array" [lucene]

2023-10-21 Thread via GitHub
mikemccand commented on PR #12625: URL: https://github.com/apache/lucene/pull/12625#issuecomment-1773797762 Thanks @stefanvodita -- I'll try to have a look soon at your rebased PR #12506. And thank you for gracefully handling the "two people made very similar changes" situation :)

Re: [PR] [DRAFT] Load vector data directly from the memory segment [lucene]

2023-10-21 Thread via GitHub
ChrisHegarty commented on PR #12703: URL: https://github.com/apache/lucene/pull/12703#issuecomment-1773837768 > I am out of office the next week, I'd like to participate in the discussion; we should not rush anything. Take your time. Your input and ideas are very much welcome. We will

Re: [PR] Random access term dictionary [lucene]

2023-10-21 Thread via GitHub
bruno-roustant commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1773923204 This is some code I wrote a long time ago. It has been tested and used, so I'm confident about the functional aspect, and it might benefit from a benchmark for perf. Le ve

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-21 Thread via GitHub
rmuir commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1773935712 Should we just do more tests and start writing indexes without patching? Only a 4 percent disk savings? It is a lot of complexity, especially to vectorize. A runtime option is more ex

Re: [PR] Avoid object construction when linear searching arcs [lucene]

2023-10-21 Thread via GitHub
gf2121 commented on PR #12692: URL: https://github.com/apache/lucene/pull/12692#issuecomment-1773995253 The nightly benchmark shows fuzzy queries are a bit happier with this change: https://home.apache.org/~mikemccand/lucenebench/2023.10.19.18.03.18.html.

Re: [PR] Initial impl of MMapDirectory for Java 22 [lucene]

2023-10-22 Thread via GitHub
uschindler commented on PR #12706: URL: https://github.com/apache/lucene/pull/12706#issuecomment-1774030650 I updated the Jenkins jobs running mmap tests to use this branch: https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/, https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Windows/

Re: [PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check the "closed" condition) [lucene]

2023-10-22 Thread via GitHub
uschindler merged PR #12705: URL: https://github.com/apache/lucene/pull/12705 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [I] Enable recursive graph bisection out of the box? [lucene]

2023-10-22 Thread via GitHub
gf2121 commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1774050262 > essentially calling OfflineSorter on all postings FYI, I came up with some ideas to optimize this sort before, hoping to be helpful :) 1. If we use a stable sorter, we

Re: [I] Specialize arc store for continuous label in FST [lucene]

2023-10-22 Thread via GitHub
gf2121 closed issue #12701: Specialize arc store for continuous label in FST URL: https://github.com/apache/lucene/issues/12701

Re: [PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check the "closed" condition) [lucene]

2023-10-22 Thread via GitHub
uschindler commented on PR #12705: URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774091447 IllegalStateException cannot happen in that code, only in access to memory segments closed by other threads. NPE was a special case as it may happen easier. IllegalStateExceptio

Re: [PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check the "closed" condition) [lucene]

2023-10-22 Thread via GitHub
uschindler commented on PR #12705: URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774092282 > If that's the case, it seems fine, although a bit fragile to maintain? I argued during the long journey of Panama Foreign to have a specific subclass of IllegalStateException

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
msokolov commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367900707 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -635,17 +667,31 @@ private static DocsWithFieldSet writeVectorData(

Re: [PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check the "closed" condition) [lucene]

2023-10-22 Thread via GitHub
uschindler commented on PR #12705: URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774105554 > > If that's the case, it seems fine, although a bit fragile to maintain? > > I argued during the long journey of Panama Foreign to have a specific subclass of IllegalStateExce

Re: [PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check the "closed" condition) [lucene]

2023-10-22 Thread via GitHub
uschindler commented on PR #12705: URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774106700 You can call `segment.scope().isAlive()` to figure out if the scope is still alive. This works for Java 20+. The Java 19 version can't use this. I will possibly create a new PR

[PR] MMapDirectory with MemorySegment: Confirm that scope/session is no longer alive before throwing AlreadyClosedException [lucene]

2023-10-22 Thread via GitHub
uschindler opened a new pull request, #12707: URL: https://github.com/apache/lucene/pull/12707 Followup on #12705: With memory segments we get an IllegalStateException. Instead of always rewriting it to AlreadyClosedException we confirm before if the segment scope (session in Java 19) is no

Re: [PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check the "closed" condition) [lucene]

2023-10-22 Thread via GitHub
uschindler commented on PR #12705: URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774116253 I improved the IllegalStateException handling in #12707 in the same way by confirming the state of the segment's scope (Java 20+) / session (Java 19). @msokolov: Please have a quick look be

Re: [PR] MMapDirectory with MemorySegment: Confirm that scope/session is no longer alive before throwing AlreadyClosedException [lucene]

2023-10-22 Thread via GitHub
uschindler merged PR #12707: URL: https://github.com/apache/lucene/pull/12707

Re: [I] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-10-22 Thread via GitHub
dweiss commented on issue #12704: URL: https://github.com/apache/lucene/issues/12704#issuecomment-1774156970 I borrowed that constant in BitMixer from Sebastiano Vigna, I believe. Here is a nice overview of its origin/ rationale: https://softwareengineering.stackexchange.com/question

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367952944 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -50,13 +72,24 @@ public final class Lucene95HnswVectorsWriter extends Kn

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367953194 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -635,17 +667,31 @@ private static DocsWithFieldSet writeVectorData(

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367953519 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -26,17 +26,39 @@ import java.util.ArrayList; import java.util.Arrays;

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367953845 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367953931 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367955218 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367955272 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367955515 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -33,7 +33,7 @@ * Builder for HNSW graph. See {@link HnswGraph} for a gloss on the algo

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367955644 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -151,61 +159,124 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367956157 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -151,61 +159,124 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367956372 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -151,61 +159,124 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367956726 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -151,61 +159,124 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367956886 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -221,34 +292,39 @@ private long printGraphBuildStatus(int node, long start, long t) {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367957707 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -221,34 +292,39 @@ private long printGraphBuildStatus(int node, long start, long t) {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367958357 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphMerger.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367958754 ## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ## @@ -709,6 +710,7 @@ public void testHnswGraphBuilderInvalid() throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
msokolov commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367971257 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -635,17 +667,31 @@ private static DocsWithFieldSet writeVectorData(

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
msokolov commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367971976 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -50,13 +72,24 @@ public final class Lucene95HnswVectorsWriter extends

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
msokolov commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367972354 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367981481 ## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ## @@ -709,6 +710,7 @@ public void testHnswGraphBuilderInvalid() throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367982181 ## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ## @@ -709,6 +710,7 @@ public void testHnswGraphBuilderInvalid() throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367984246 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -50,13 +72,24 @@ public final class Lucene95HnswVectorsWriter extends Kn

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367984559 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -635,17 +667,31 @@ private static DocsWithFieldSet writeVectorData(

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367986199 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367986490 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367986729 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -33,7 +33,7 @@ * Builder for HNSW graph. See {@link HnswGraph} for a gloss on the algo

Re: [I] Don't provide two ways to build an FST [lucene]

2023-10-22 Thread via GitHub
cavorite commented on issue #12695: URL: https://github.com/apache/lucene/issues/12695#issuecomment-1774229048 I'd be willing to work on this issue (as a way to get more familiar with Lucene's internal code base). First, I'd like to see if I'm understanding the work needed. So far,

Re: [I] Don't provide two ways to build an FST [lucene]

2023-10-22 Thread via GitHub
mikemccand commented on issue #12695: URL: https://github.com/apache/lucene/issues/12695#issuecomment-1774232380 Yes that's exactly the idea! Thank you @cavorite for tackling this.

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-22 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1774236604 > Should we just do more tests and start writing indexes without patching? Only a 4 percent disk savings? It is a lot of complexity, especially to vectorize. A runtime option is

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1774469691 > I would be curious to see the contention times and also understand how this changes CPU usage vs. single-threaded. @msokolov as for CPU usage, I just tested with 1M docs, and on my

Re: [I] Enable recursive graph bisection out of the box? [lucene]

2023-10-22 Thread via GitHub
jpountz commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1774523421 These sound like great ideas!

[I] xml.TestCoreParser#testSpanNearQueryWithoutSlopXML fails because of changed exception message [lucene]

2023-10-23 Thread via GitHub
uschindler opened a new issue, #12708: URL: https://github.com/apache/lucene/issues/12708 ### Description The test `org.apache.lucene.queryparser.xml.TestCoreParser#testSpanNearQueryWithoutSlopXML` fails in Java 22 EA builds: ``` org.junit.ComparisonFailure: expected:<...be

Re: [PR] Record if block API has been used in SegmentInfo [lucene]

2023-10-23 Thread via GitHub
s1monw merged PR #12685: URL: https://github.com/apache/lucene/pull/12685

Re: [I] xml.TestCoreParser#testSpanNearQueryWithoutSlopXML fails because of changed exception message [lucene]

2023-10-23 Thread via GitHub
uschindler commented on issue #12708: URL: https://github.com/apache/lucene/issues/12708#issuecomment-1774619628 It only affects the empty String.

Re: [I] xml.TestCoreParser#testSpanNearQueryWithoutSlopXML fails because of changed exception message [lucene]

2023-10-23 Thread via GitHub
uschindler commented on issue #12708: URL: https://github.com/apache/lucene/issues/12708#issuecomment-1774641113 See the issue in openjdk: https://bugs.openjdk.org/browse/JDK-8318646

[PR] Consolidate FSTStore and BytesStore in FST [lucene]

2023-10-23 Thread via GitHub
dungba88 opened a new pull request, #12709: URL: https://github.com/apache/lucene/pull/12709 ### Description Consolidate the FSTStore and BytesStore in FST. The two are similar, except that FSTStore has an `init()` method, which is not needed for BytesStore. Thus I extracted the comm

[PR] Use Arrays#mismatch for Outputs#common operations [lucene]

2023-10-23 Thread via GitHub
gf2121 opened a new pull request, #12710: URL: https://github.com/apache/lucene/pull/12710 Make `Outputs#common` take advantage of `Arrays#mismatch`.
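
The idea named in this PR description can be illustrated with a minimal sketch. This is not the PR's code: `CommonPrefixSketch` and `commonPrefixLength` are hypothetical names chosen for illustration, and the only real API used is the JDK's `java.util.Arrays.mismatch` (available since Java 9). It shows how the shared prefix length of two `int[]` ranges could be found without a hand-written loop.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not Lucene's Outputs API. */
final class CommonPrefixSketch {

  /**
   * Returns the length of the longest shared prefix of
   * a[aOff, aOff + aLen) and b[bOff, bOff + bLen).
   */
  static int commonPrefixLength(int[] a, int aOff, int aLen, int[] b, int bOff, int bLen) {
    // Only compare as many elements as the shorter range holds.
    int limit = Math.min(aLen, bLen);
    // Arrays.mismatch returns the relative index of the first difference,
    // or -1 when the two (equal-length) ranges are identical.
    int mismatch = Arrays.mismatch(a, aOff, aOff + limit, b, bOff, bOff + limit);
    return mismatch == -1 ? limit : mismatch;
  }

  public static void main(String[] args) {
    int[] x = {1, 2, 3, 4};
    int[] y = {1, 2, 9};
    System.out.println(commonPrefixLength(x, 0, x.length, y, 0, y.length)); // prints 2
  }
}
```

Clamping both ranges to the shorter length keeps the `-1` case unambiguous: it can only mean that the shorter range is entirely a prefix of the longer one.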

Re: [PR] Use Arrays#mismatch for Outputs#common operations [lucene]

2023-10-23 Thread via GitHub
dungba88 commented on code in PR #12710: URL: https://github.com/apache/lucene/pull/12710#discussion_r1368339258 ## lucene/core/src/java/org/apache/lucene/util/fst/IntSequenceOutputs.java: ## @@ -43,28 +44,29 @@ public IntsRef common(IntsRef output1, IntsRef output2) { asse

Re: [PR] Use Arrays#mismatch for Outputs#common operations [lucene]

2023-10-23 Thread via GitHub
dungba88 commented on code in PR #12710: URL: https://github.com/apache/lucene/pull/12710#discussion_r1368341788 ## lucene/core/src/java/org/apache/lucene/util/fst/IntSequenceOutputs.java: ## @@ -43,28 +44,29 @@ public IntsRef common(IntsRef output1, IntsRef output2) { asse

Re: [PR] Use Arrays#mismatch for Outputs#common operations [lucene]

2023-10-23 Thread via GitHub
gf2121 commented on code in PR #12710: URL: https://github.com/apache/lucene/pull/12710#discussion_r1368344352 ## lucene/core/src/java/org/apache/lucene/util/fst/IntSequenceOutputs.java: ## @@ -43,28 +44,29 @@ public IntsRef common(IntsRef output1, IntsRef output2) { assert

Re: [PR] Use Arrays#mismatch for Outputs#common operations [lucene]

2023-10-23 Thread via GitHub
dungba88 commented on code in PR #12710: URL: https://github.com/apache/lucene/pull/12710#discussion_r1368349251 ## lucene/core/src/java/org/apache/lucene/util/fst/IntSequenceOutputs.java: ## @@ -43,28 +44,29 @@ public IntsRef common(IntsRef output1, IntsRef output2) { asse

[PR] Prevent users from using document block APIs when sort is configured [lucene]

2023-10-23 Thread via GitHub
s1monw opened a new pull request, #12711: URL: https://github.com/apache/lucene/pull/12711 Today you can use the `add/UpdateDocuments` API even if an index sort is configured. This leads to broken indices if users rely on the guarantees of this API that document IDs are consecutive. This cha

Re: [PR] Prevent users from using document block APIs when sort is configured [lucene]

2023-10-23 Thread via GitHub
msokolov commented on PR #12711: URL: https://github.com/apache/lucene/pull/12711#issuecomment-1775052675 This is what we do today: we're careful to add blocks of docs that sort together. What is the alternative going to be? Instead one should sequentially call addDocument()? I have

Re: [PR] Sometimes intersect the essential clause and the best non-essential clause. [lucene]

2023-10-23 Thread via GitHub
jpountz commented on PR #12589: URL: https://github.com/apache/lucene/pull/12589#issuecomment-1775098957 I plan on merging in the next couple days if there are no objections.

Re: [PR] Use Arrays#mismatch for Outputs#common operations [lucene]

2023-10-23 Thread via GitHub
dungba88 commented on code in PR #12710: URL: https://github.com/apache/lucene/pull/12710#discussion_r1368616130 ## lucene/core/src/java/org/apache/lucene/util/fst/IntSequenceOutputs.java: ## @@ -43,28 +44,29 @@ public IntsRef common(IntsRef output1, IntsRef output2) { asse

[PR] Speed up the sort when building forward index [lucene]

2023-10-23 Thread via GitHub
gf2121 opened a new pull request, #12712: URL: https://github.com/apache/lucene/pull/12712 Based on the idea mentioned [here](https://github.com/apache/lucene/issues/12665#issuecomment-1774050262): > 1. If we use a stable sorter, we can only compare docIds because termIds are already in
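
The quoted idea is cut off above, so the following is only a sketch under an assumption: that the `(docId, termId)` entries reach the sorter already ordered by `termId`. In that case a stable sort keyed on `docId` alone also leaves each document's terms in `termId` order. `StableSortSketch` and its `{docId, termId}` pair layout are hypothetical and are not taken from the PR; `Arrays.sort(T[], Comparator)` is documented by the JDK as stable.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration; not the code from the PR. */
final class StableSortSketch {

  /**
   * Sorts {docId, termId} pairs by docId only. Because the input is assumed
   * to arrive in termId order and the sort is stable, pairs with the same
   * docId keep their termId order.
   */
  static int[][] sortByDocId(int[][] pairsInTermOrder) {
    int[][] sorted = pairsInTermOrder.clone(); // shallow copy; rows are not modified
    Arrays.sort(sorted, Comparator.comparingInt((int[] p) -> p[0]));
    return sorted;
  }

  public static void main(String[] args) {
    int[][] input = {{2, 0}, {1, 0}, {1, 1}, {2, 1}, {0, 2}};
    // prints [[0, 2], [1, 0], [1, 1], [2, 0], [2, 1]]
    System.out.println(Arrays.deepToString(sortByDocId(input)));
  }
}
```

The comparator never looks at `termId`, so each comparison touches one field instead of two; whether that accounts for the speedup reported in the PR is not something this sketch can show.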

Re: [PR] Capture build scans on ge.apache.org to benefit from deep build insights [lucene]

2023-10-23 Thread via GitHub
risdenk commented on PR #12293: URL: https://github.com/apache/lucene/pull/12293#issuecomment-1775322129 @dsmiley I mentioned this on the Solr PR for the same change - https://github.com/apache/solr/pull/1626#issuecomment-1553288366 https://ci-builds.apache.org/job/Lucene/job/Lucene-C

Re: [I] Enable recursive graph bisection out of the box? [lucene]

2023-10-23 Thread via GitHub
gf2121 commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1775324737 I opened a PR based on these ideas: https://github.com/apache/lucene/pull/12712

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-23 Thread via GitHub
benwtrent commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1368782730 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -35,6 +38,9 @@ public class NeighborArray { float[] score; int[] node; private

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
slow-J commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775353027 If we want to remove the patching entirely, which Lucene version (and which Codec) should we implement this in? Would this be a potential change for Lucene 9.9 or perhaps 10.0?

[PR] Specialize the 2nd clause of conjunctions. [lucene]

2023-10-23 Thread via GitHub
jpountz opened a new pull request, #12713: URL: https://github.com/apache/lucene/pull/12713 This adds a bit more specialization to how we handle the 2nd clause in conjunctions, which seems to help the JVM quite significantly.

Re: [PR] Specialize the 2nd clause of conjunctions. [lucene]

2023-10-23 Thread via GitHub
jpountz commented on PR #12713: URL: https://github.com/apache/lucene/pull/12713#issuecomment-1775568713 Wikibigall: ``` Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value IntNRQ

Re: [PR] Speed up the sort when building forward index [lucene]

2023-10-23 Thread via GitHub
gf2121 commented on PR #12712: URL: https://github.com/apache/lucene/pull/12712#issuecomment-1775618552 To get a quick insight, I ran a naive benchmark on the sorter, which shows it is generally 5x faster than baseline. * JVM 8G (resulting in a RAM budget of 800MB for `OfflineSorter`) * No Fo

Re: [PR] Scorer should sum up scores into a double [lucene]

2023-10-23 Thread via GitHub
benwtrent merged PR #12682: URL: https://github.com/apache/lucene/pull/12682

Re: [PR] Clean up ByteBlockPool [lucene]

2023-10-23 Thread via GitHub
mikemccand commented on code in PR #12506: URL: https://github.com/apache/lucene/pull/12506#discussion_r1368976156 ## lucene/core/src/java/org/apache/lucene/util/ByteBlockPool.java: ## @@ -46,6 +65,7 @@ protected Allocator(int blockSize) { public abstract void recycleByte

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775638986 > Are there any additional corpora that we should also test this with? Maybe the NYC taxis? This is a more sparse, and tiny docs (vs dense and medium/large docs in `enwiki

Re: [PR] Consolidate FSTStore and BytesStore in FST [lucene]

2023-10-23 Thread via GitHub
mikemccand commented on code in PR #12709: URL: https://github.com/apache/lucene/pull/12709#discussion_r1369002933 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -487,19 +473,18 @@ public String toString() { } void finish(long newStartNode) throws IOE

Re: [PR] Prevent users from using document block APIs when sort is configured [lucene]

2023-10-23 Thread via GitHub
mikemccand commented on PR #12711: URL: https://github.com/apache/lucene/pull/12711#issuecomment-1775676869 I don't think we should make a hard block here. As @msokolov points out, if you are careful, so your static sort is congruent with your blocks, the blocks will be preserved. I

Re: [I] Specialize arc store for continuous label in FST [lucene]

2023-10-23 Thread via GitHub
mikemccand commented on issue #12701: URL: https://github.com/apache/lucene/issues/12701#issuecomment-1775685100 This is a neat idea @gf2121 -- did you close it because it's similar / same as the direct addressing case?

[I] FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST [lucene]

2023-10-23 Thread via GitHub
mikemccand opened a new issue, #12714: URL: https://github.com/apache/lucene/issues/12714 ### Description To share suffixes, for creating as minimal an FST as we can, `FSTCompiler` uses `NodeHash` to record the most recently used/shared suffixes. But it stores the values (the nodes

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
Tony-X commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775698779 > It is a lot of complexity, especially to vectorize. +1. I recall that @gsmiller was playing with some SIMD algos for decoding blocks of delta-encoded ints. Even if that is

Re: [I] FSTCompiler's NodeHash should fully duplicate `byte[]` slices from the growing FST [lucene]

2023-10-23 Thread via GitHub
mikemccand commented on issue #12714: URL: https://github.com/apache/lucene/issues/12714#issuecomment-1775711575 I think we can use `ByteBlockPool` to store the `byte[]` slices, just appending a new `byte[]` slice when we store a new suffix. We never delete individual suffixes, but rather
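
The append-only storage described in this comment can be sketched with a small block pool that addresses bytes by shift and mask. This is an illustration only, not Lucene's `ByteBlockPool`: the class `BlockPoolSketch`, its methods, and the 32 KiB block size are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only pool for illustration; not Lucene's ByteBlockPool API. */
final class BlockPoolSketch {
  private static final int BLOCK_SHIFT = 15;              // assumed block size: 32 KiB
  private static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;
  private static final int BLOCK_MASK = BLOCK_SIZE - 1;

  private final List<byte[]> blocks = new ArrayList<>();
  private long size; // total number of bytes appended so far

  /** Appends a slice and returns the global offset at which it starts. */
  long append(byte[] src) {
    long start = size;
    for (byte b : src) {
      int blockIndex = (int) (size >> BLOCK_SHIFT);       // which block
      if (blockIndex == blocks.size()) {
        blocks.add(new byte[BLOCK_SIZE]);                 // grow by one block on demand
      }
      blocks.get(blockIndex)[(int) (size & BLOCK_MASK)] = b; // position inside the block
      size++;
    }
    return start;
  }

  /** Reads one byte at a global offset using the same shift/mask split. */
  byte get(long offset) {
    return blocks.get((int) (offset >> BLOCK_SHIFT))[(int) (offset & BLOCK_MASK)];
  }

  /** Copies a slice back out; it may span a block boundary. */
  byte[] read(long offset, int length) {
    byte[] out = new byte[length];
    for (int i = 0; i < length; i++) {
      out[i] = get(offset + i);
    }
    return out;
  }

  public static void main(String[] args) {
    BlockPoolSketch pool = new BlockPoolSketch();
    long first = pool.append(new byte[] {1, 2, 3});
    long second = pool.append(new byte[] {4, 5});
    System.out.println(first + " " + second); // prints 0 3
  }
}
```

Because slices are only ever appended, a stored suffix can be identified by the `long` offset that `append` returns, and releasing the whole pool is just a matter of dropping the block list.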

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
gsmiller commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775717147 I like the idea of removing the complexity associated with patching if we're convinced it's the right trade-off (and +1 to the pain of vectorizing with patching going away).

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
msokolov commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775716306 > Hmm, can you elaborate how it can be fully backwards-compatible with the indexes that have patching? I think the idea is that because we always maintain readers that can

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
gsmiller commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775725064 > +1. I recalled that @gsmiller was playing with some SIMD algos for decoding blocks of delta-encoded ints. Even if that is fruitful it'd be tricky to apply it because of the patch

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
Tony-X commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775807115 > In 11.0, remove all patching logic which will, a) simplify the code a bit, and b) remove the (likely minor) overhead on read of looking up the number of patches in a block, which i

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
gsmiller commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775871779 > Maybe write something in the index header to indicate if patching is there (default to yes - in 9.x ). Then new indexes will write additional header to indicate there is not patc

Re: [PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-23 Thread via GitHub
javanna commented on code in PR #12689: URL: https://github.com/apache/lucene/pull/12689#discussion_r1369190050 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,64 +67,124 @@ public final class TaskExecutor { * @param <T> the return type of the task

Re: [PR] Prevent users from using document block APIs when sort is configured [lucene]

2023-10-23 Thread via GitHub
s1monw commented on PR #12711: URL: https://github.com/apache/lucene/pull/12711#issuecomment-1775926459 Would an expert API on the IndexSort work for you folks? Like a getter that indicates if it’s a stable sort and preserves blocks? On 23. Oct 2023, at 19:28, Michael McCandless ***@***.***

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-23 Thread via GitHub
Tony-X commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1775993940 > would the goal here be to eliminate overhead of having to read the number of patches when decoding each block? Yes. This means we could know upfront at segment opening time w

Re: [PR] Consolidate FSTStore and BytesStore in FST [lucene]

2023-10-23 Thread via GitHub
dungba88 commented on code in PR #12709: URL: https://github.com/apache/lucene/pull/12709#discussion_r1369373040 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -317,8 +319,6 @@ private CompiledNode compileNode(UnCompiledNode nodeIn, int tailLength) t

Re: [PR] Speed up the sort when building forward index [lucene]

2023-10-23 Thread via GitHub
gf2121 commented on PR #12712: URL: https://github.com/apache/lucene/pull/12712#issuecomment-1776364792 I forked the `LSBRadixSorter` to sort longs and use it when ram budget is enough. Generally 5x faster than candidate, 25x faster than baseline. https://bytedance.feishu.cn/sheets/HS

Re: [PR] Consolidate FSTStore and BytesStore in FST [lucene]

2023-10-23 Thread via GitHub
dungba88 commented on PR #12709: URL: https://github.com/apache/lucene/pull/12709#issuecomment-1776396388 > I think we should land this only on main for now, and then backport it eventually to 9.x along with the other FST changes? I think this makes sense. Let's hold off the backporting

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-10-23 Thread via GitHub
dungba88 commented on code in PR #12715: URL: https://github.com/apache/lucene/pull/12715#discussion_r1369634837 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -122,8 +122,11 @@ public class FSTCompiler { /** * Instantiates an FST/FSA builder w

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-10-23 Thread via GitHub
dungba88 commented on PR #12715: URL: https://github.com/apache/lucene/pull/12715#issuecomment-1776546474 Thank you for the change!

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-23 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1369642741 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsFormat.java: ## @@ -146,18 +148,24 @@ public final class Lucene95HnswVectorsFormat extends
