Re: [PR] Initial impl of MMapDirectory for Java 22 [lucene]

2023-10-22 Thread via GitHub
uschindler commented on PR #12706: URL: https://github.com/apache/lucene/pull/12706#issuecomment-1774030650 I updated the Jenkins jobs running mmap tests to use this branch: https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/, https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Windows/ -- Th

Re: [PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check the "closed" condition) [lucene]

2023-10-22 Thread via GitHub
uschindler merged PR #12705: URL: https://github.com/apache/lucene/pull/12705 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [I] Enable recursive graph bisection out of the box? [lucene]

2023-10-22 Thread via GitHub
gf2121 commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1774050262 > essentially calling OfflineSorter on all postings FYI, I came up with some ideas for optimizing this sort earlier; I hope they help :) 1. If we use a stable sorter, we

Re: [I] Specialize arc store for continuous label in FST [lucene]

2023-10-22 Thread via GitHub
gf2121 closed issue #12701: Specialize arc store for continuous label in FST URL: https://github.com/apache/lucene/issues/12701

Re: [PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check the "closed" condition) [lucene]

2023-10-22 Thread via GitHub
uschindler commented on PR #12705: URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774091447 IllegalStateException cannot happen in that code; it only occurs when accessing memory segments that were closed by other threads. NPE was a special case because it can happen more easily. IllegalStateExceptio
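
A minimal sketch of the pattern named in the PR title: check the "closed" condition before turning a NullPointerException into an AlreadyClosedException. Everything here is hypothetical: `SketchIndexInput` is a simplified stand-in, not Lucene's MMapDirectory input, and `AlreadyClosedException` is a local class mimicking `org.apache.lucene.store.AlreadyClosedException`. The point is that an NPE unrelated to closing (here, a null destination array) propagates unchanged instead of being misreported as "already closed".

```java
import java.nio.ByteBuffer;

// Local stand-in for org.apache.lucene.store.AlreadyClosedException (hypothetical).
class AlreadyClosedException extends IllegalStateException {
    AlreadyClosedException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical, simplified input; single-threaded sketch (no visibility guarantees).
class SketchIndexInput {
    private ByteBuffer curBuf; // set to null on close, so later reads hit an NPE
    private boolean closed;

    SketchIndexInput(ByteBuffer buf) {
        this.curBuf = buf;
    }

    byte readByte() {
        try {
            return curBuf.get();
        } catch (NullPointerException npe) {
            throw translate(npe);
        }
    }

    void readBytes(byte[] dst, int offset, int len) {
        try {
            curBuf.get(dst, offset, len); // NPE here may also come from a null dst
        } catch (NullPointerException npe) {
            throw translate(npe);
        }
    }

    // Only report "already closed" if the input really was closed;
    // any other NPE is a genuine bug and is rethrown as-is.
    private RuntimeException translate(NullPointerException npe) {
        if (closed) {
            return new AlreadyClosedException("Already closed", npe);
        }
        return npe;
    }

    void close() {
        closed = true;
        curBuf = null;
    }
}
```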

Re: [PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check the "closed" condition) [lucene]

2023-10-22 Thread via GitHub
uschindler commented on PR #12705: URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774092282 > If that's the case, it seems fine, although a bit fragile to maintain? I argued during the long journey of Panama Foreign to have a specific subclass of IllegalStateException

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
msokolov commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367900707 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -635,17 +667,31 @@ private static DocsWithFieldSet writeVectorData(

Re: [PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check the "closed" condition) [lucene]

2023-10-22 Thread via GitHub
uschindler commented on PR #12705: URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774105554 > > If that's the case, it seems fine, although a bit fragile to maintain? > > I argued during the long journey of Panama Foreign to have a specific subclass of IllegalStateExce

Re: [PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check the "closed" condition) [lucene]

2023-10-22 Thread via GitHub
uschindler commented on PR #12705: URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774106700 You can call `segment.scope().isAlive()` to figure out if the scope is still alive. This works for Java 20+. The Java 19 version can't use this. I will possibly create a new PR

[PR] MMapDirectory with MemorySegment: Confirm that scope/session is no longer alive before throwing AlreadyClosedException [lucene]

2023-10-22 Thread via GitHub
uschindler opened a new pull request, #12707: URL: https://github.com/apache/lucene/pull/12707 Followup on #12705: With memory segments we get an IllegalStateException. Instead of always rewriting it to AlreadyClosedException, we first confirm whether the segment scope (session in Java 19) is no
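
A hedged sketch of the follow-up's idea: before rewriting an IllegalStateException into AlreadyClosedException, confirm that the scope really is no longer alive. The `BooleanSupplier` stands in for `segment.scope().isAlive()` (Java 20+, per the earlier comment); `AlreadyClosedException` is a local stand-in, and `translate`/`read` are hypothetical helper names, not Lucene's code.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

// Local stand-in for org.apache.lucene.store.AlreadyClosedException (hypothetical).
class AlreadyClosedException extends IllegalStateException {
    AlreadyClosedException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class ScopeCheckSketch {
    // Rewrite the exception only if the scope is confirmed closed.
    // `scopeIsAlive` plays the role of segment.scope().isAlive().
    static RuntimeException translate(IllegalStateException e, BooleanSupplier scopeIsAlive) {
        if (!scopeIsAlive.getAsBoolean()) {
            return new AlreadyClosedException("Already closed", e);
        }
        // Scope still alive: the ISE has some other cause, so surface it unchanged.
        return e;
    }

    // Hypothetical read path: `access` simulates a memory-segment read that may throw ISE.
    static int read(IntSupplier access, BooleanSupplier scopeIsAlive) {
        try {
            return access.getAsInt();
        } catch (IllegalStateException e) {
            throw translate(e, scopeIsAlive);
        }
    }
}
```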

Re: [PR] Improve handling of NullPointerException in MMapDirectory's IndexInputs (check the "closed" condition) [lucene]

2023-10-22 Thread via GitHub
uschindler commented on PR #12705: URL: https://github.com/apache/lucene/pull/12705#issuecomment-1774116253 I improved the IllegalStateException handling in #12707 in the same way, by confirming the state of the segment's scope (Java 20+) / session (Java 19). @msokolov: Please have a quick look be

Re: [PR] MMapDirectory with MemorySegment: Confirm that scope/session is no longer alive before throwing AlreadyClosedException [lucene]

2023-10-22 Thread via GitHub
uschindler merged PR #12707: URL: https://github.com/apache/lucene/pull/12707

Re: [I] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-10-22 Thread via GitHub
dweiss commented on issue #12704: URL: https://github.com/apache/lucene/issues/12704#issuecomment-1774156970 I borrowed that constant in BitMixer from Sebastiano Vigna, I believe. Here is a nice overview of its origin and rationale: https://softwareengineering.stackexchange.com/question
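
For context on what such a constant does, here is a small sketch of multiplicative ("Fibonacci") hash mixing. It assumes the constant in question is the 64-bit golden-ratio constant `0x9E3779B97F4A7C15L` (approximately 2^64 / φ, and odd); the class and method names are hypothetical and this is not copied from Lucene's `BitMixer`. Multiplying by a large odd constant pushes low-bit differences into higher bits, and the xor-shift folds the high half back down so that masking with `tableSize - 1` uses well-mixed bits.

```java
final class PhiMixSketch {
    // Assumed constant: approximately 2^64 / golden ratio, odd.
    static final long PHI_C64 = 0x9E3779B97F4A7C15L;

    static long mixPhi(long k) {
        long h = k * PHI_C64;  // spreads low-bit differences into higher bits
        return h ^ (h >>> 32); // folds the high half back into the low half
    }

    // tableSize must be a power of two so that (tableSize - 1) is a bit mask.
    static int bucket(long key, int tableSize) {
        return (int) (mixPhi(key) & (tableSize - 1));
    }
}
```

With an identity hash, keys that are multiples of 1024 would all land in bucket 0 of a 1024-slot table; with the mix applied they spread across several buckets.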

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367952944 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -50,13 +72,24 @@ public final class Lucene95HnswVectorsWriter extends Kn

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367953194 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -635,17 +667,31 @@ private static DocsWithFieldSet writeVectorData(

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367953519 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -26,17 +26,39 @@ import java.util.ArrayList; import java.util.Arrays;

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367953845 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367953931 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367955218 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367955272 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367955515 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -33,7 +33,7 @@ * Builder for HNSW graph. See {@link HnswGraph} for a gloss on the algo

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367955644 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -151,61 +159,124 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367956157 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -151,61 +159,124 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367956372 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -151,61 +159,124 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367956726 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -151,61 +159,124 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367956886 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -221,34 +292,39 @@ private long printGraphBuildStatus(int node, long start, long t) {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367957707 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -221,34 +292,39 @@ private long printGraphBuildStatus(int node, long start, long t) {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367958357 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphMerger.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367958754 ## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ## @@ -709,6 +710,7 @@ public void testHnswGraphBuilderInvalid() throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
msokolov commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367971257 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -635,17 +667,31 @@ private static DocsWithFieldSet writeVectorData(

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
msokolov commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367971976 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -50,13 +72,24 @@ public final class Lucene95HnswVectorsWriter extends

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
msokolov commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367972354 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367981481 ## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ## @@ -709,6 +710,7 @@ public void testHnswGraphBuilderInvalid() throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367982181 ## lucene/core/src/test/org/apache/lucene/util/hnsw/HnswGraphTestCase.java: ## @@ -709,6 +710,7 @@ public void testHnswGraphBuilderInvalid() throws IOException {

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367984246 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -50,13 +72,24 @@ public final class Lucene95HnswVectorsWriter extends Kn

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367984559 ## lucene/core/src/java/org/apache/lucene/codecs/lucene95/Lucene95HnswVectorsWriter.java: ## @@ -635,17 +667,31 @@ private static DocsWithFieldSet writeVectorData(

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367986199 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367986490 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on code in PR #12660: URL: https://github.com/apache/lucene/pull/12660#discussion_r1367986729 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -33,7 +33,7 @@ * Builder for HNSW graph. See {@link HnswGraph} for a gloss on the algo

Re: [I] Don't provide two ways to build an FST [lucene]

2023-10-22 Thread via GitHub
cavorite commented on issue #12695: URL: https://github.com/apache/lucene/issues/12695#issuecomment-1774229048 I'd be willing to work on this issue (as a way to get more familiar with Lucene's internal code base). First, I'd like to check that I understand the work needed. So far,

Re: [I] Don't provide two ways to build an FST [lucene]

2023-10-22 Thread via GitHub
mikemccand commented on issue #12695: URL: https://github.com/apache/lucene/issues/12695#issuecomment-1774232380 Yes that's exactly the idea! Thank you @cavorite for tackling this.

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-22 Thread via GitHub
mikemccand commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1774236604 > Should we just do more tests and start writing indexes without patching? Only a 4 percent disk savings? It is a lot of complexity, especially to vectorize. A runtime option is
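
To make the trade-off concrete, here is a simplified model of what "patching" buys in PFOR (patched frame of reference): pick a bit width that covers all but the largest few values, then store those exceptions separately as (position, high bits) patches. This is a hypothetical cost model, not Lucene's `PForUtil` layout; it assumes non-negative values and charges each exception enough bits to address a position in the block plus its missing high bits.

```java
import java.util.Arrays;

final class PforPatchingSketch {
    // Number of bits needed to represent v (v >= 0); 0 needs 0 bits.
    static int bitsRequired(long v) {
        return 64 - Long.numberOfLeadingZeros(v);
    }

    // Without patching, the width must cover the block's maximum.
    static int plainBitWidth(long[] block) {
        long max = 0;
        for (long v : block) max = Math.max(max, v);
        return bitsRequired(max);
    }

    // With patching, up to maxExceptions of the largest values may exceed the width.
    static int patchedBitWidth(long[] block, int maxExceptions) {
        long[] sorted = block.clone();
        Arrays.sort(sorted);
        int idx = Math.max(0, sorted.length - 1 - maxExceptions);
        return bitsRequired(sorted[idx]);
    }

    static long plainBits(long[] block) {
        return (long) block.length * plainBitWidth(block);
    }

    // Packed values at the patched width, plus (position + high bits) per exception.
    static long patchedBits(long[] block, int maxExceptions) {
        int w = patchedBitWidth(block, maxExceptions);
        int highBits = plainBitWidth(block) - w;
        int posBits = bitsRequired(block.length - 1);
        long limit = (1L << w) - 1; // largest value that fits in w bits
        long exceptions = Arrays.stream(block).filter(v -> v > limit).count();
        return (long) block.length * w + exceptions * (posBits + highBits);
    }
}
```

For a 128-value block where 127 values are below 16 and one is 1000, the plain width is 10 bits (1280 bits total), while a single patch allows 4 bits (525 bits total under this model). Real postings are far less extreme; the comment above mentions only about 4 percent disk savings overall.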

Re: [PR] Concurrent HNSW Merge [lucene]

2023-10-22 Thread via GitHub
zhaih commented on PR #12660: URL: https://github.com/apache/lucene/pull/12660#issuecomment-1774469691 > I would be curious to see the contention times and also understand how this changes CPU usage vs. single-threaded. @msokolov as for CPU usage, I just tested with 1M docs, and on my
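
The CPU-usage question is partly about how work is split among threads. One common scheme, sketched here under assumptions (hypothetical names; this is not the PR's `HnswConcurrentMergeBuilder`), is for each worker to claim the next batch of node ordinals from a shared `AtomicInteger`, so every ordinal is processed exactly once without a central queue. The sketch covers only work distribution; contention on the graph itself (for example, locking around neighbor lists) is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

final class BatchedWorkersSketch {
    // Runs addNode for every ordinal in [0, maxOrd) using numWorkers threads.
    // Assumes maxOrd + numWorkers * batchSize fits in an int.
    static void run(int maxOrd, int numWorkers, int batchSize, IntConsumer addNode)
            throws InterruptedException, ExecutionException {
        AtomicInteger next = new AtomicInteger(0);
        ExecutorService pool = Executors.newFixedThreadPool(numWorkers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int w = 0; w < numWorkers; w++) {
                futures.add(pool.submit(() -> {
                    int start;
                    // Each worker atomically claims [start, start + batchSize).
                    while ((start = next.getAndAdd(batchSize)) < maxOrd) {
                        int end = Math.min(start + batchSize, maxOrd);
                        for (int ord = start; ord < end; ord++) {
                            addNode.accept(ord);
                        }
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // propagate any worker failure
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Larger batches reduce traffic on the shared counter; smaller batches balance load better when some nodes are more expensive to insert than others.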

Re: [I] Enable recursive graph bisection out of the box? [lucene]

2023-10-22 Thread via GitHub
jpountz commented on issue #12665: URL: https://github.com/apache/lucene/issues/12665#issuecomment-1774523421 These sound like great ideas!