[GitHub] [lucene] jbellis commented on pull request #12254: add ConcurrentOnHeapHnswGraph and Builder

2023-05-01 Thread via GitHub


jbellis commented on PR #12254:
URL: https://github.com/apache/lucene/pull/12254#issuecomment-1529796265

   I've made a couple optimizations and merged to main.
   
   With the optimizations it's still about 38% slower than the non-concurrent 
code (including the HashMap optimization for the non-concurrent that went in).  
Nothing stands out in the profiler as a problem anymore; I guess that's what 
you get replacing HashMap with ConcurrentHashMap and more especially replacing 
ArrayList with ConcurrentHashMap in L0.
   
   I still do not know how to fix the ram usage estimates; any suggestions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jbellis commented on pull request #12254: add ConcurrentOnHeapHnswGraph and Builder

2023-05-01 Thread via GitHub


jbellis commented on PR #12254:
URL: https://github.com/apache/lucene/pull/12254#issuecomment-1529909165

   Replaced parallel stream with ExecutorService.  Only ThreadPoolExecutor is 
supported, since it knows how many thread it has available.  After a quick look 
that is the only type I see Lucene creating.
   
   I was pleasantly surprised that the TPE is no slower than parallel streams 
in my testing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on pull request #12254: add ConcurrentOnHeapHnswGraph and Builder

2023-05-01 Thread via GitHub


tang-hi commented on PR #12254:
URL: https://github.com/apache/lucene/pull/12254#issuecomment-1529914311

   > I've made a couple optimizations and merged to main.
   > 
   > With the optimizations it's still about 38% slower than the non-concurrent 
code (including the HashMap optimization for the non-concurrent that went in). 
Nothing stands out in the profiler as a problem anymore; I guess that's what 
you get replacing HashMap with ConcurrentHashMap and more especially replacing 
ArrayList with ConcurrentHashMap in L0.
   > 
   > I still do not know how to fix the ram usage estimates; any suggestions?
   
   I believe the issue is related to the Java Platform Module System, which was 
introduced in Java 9. You can modify the following section in the [gradlew 
script](https://github.com/apache/lucene/blob/3c163745bb07aed51b52878de0666f1405696147/gradlew#L221)
 to avoid the exception:
   
https://github.com/apache/lucene/blob/3c163745bb07aed51b52878de0666f1405696147/gradlew#L221
   
   **modified version**
   ```shell
   JVM_ARGS=" -Ptests.jvmargs=\"--add-opens java.base/java.lang=ALL-UNNAMED 
--add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED\""
   
   # Collect all arguments for the java command, following the shell quoting 
and substitution rules
   
   eval set -- $JAVA_OPTS $GRADLE_OPTS 
"\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" 
org.gradle.wrapper.GradleWrapperMain "$APP_ARGS" "$JVM_ARGS"
   ```
   After making this change, the exception should no longer be thrown. However, 
the test may still fail due to an AssertionError, as shown below:
   
   ```
 java.lang.AssertionError: expected:<1185336.0> but was:<486483.0>
  > at 
__randomizedtesting.SeedInfo.seed([A41F62241729FCF:CF0592627AF94878]:0)
  > at org.junit.Assert.fail(Assert.java:89)
  > at org.junit.Assert.failNotEquals(Assert.java:835)
  > at org.junit.Assert.assertEquals(Assert.java:555)
  > at org.junit.Assert.assertEquals(Assert.java:685)
  > at 
org.apache.lucene.util.hnsw.HnswGraphTestCase.testRamUsageEstimate(HnswGraphTestCase.java:789)
  > at 
org.apache.lucene.util.hnsw.TestHnswFloatVectorGraph.testRamUsageEstimate(TestHnswFloatVectorGraph.java:37)
  > at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  > at java.base/jdk.internal.reflect.NativeMethodAccesso
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jbellis commented on pull request #12254: add ConcurrentOnHeapHnswGraph and Builder

2023-05-01 Thread via GitHub


jbellis commented on PR #12254:
URL: https://github.com/apache/lucene/pull/12254#issuecomment-1529956854

   Thanks! For me it is working:
   
   ```
   $ ./gradlew -p lucene/core test --tests "*Hnsw*"
   ...
   :lucene:core:test (SUCCESS): 75 test(s)
   ```
   
   Note that adding these args do not seem to be picked up by intellij when 
running tests, is there a better place to add them?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on pull request #12254: add ConcurrentOnHeapHnswGraph and Builder

2023-05-01 Thread via GitHub


tang-hi commented on PR #12254:
URL: https://github.com/apache/lucene/pull/12254#issuecomment-1530005788

   > Thanks! For me it is working:
   > 
   > ```
   > $ ./gradlew -p lucene/core test --tests "*Hnsw*"
   > ...
   > :lucene:core:test (SUCCESS): 75 test(s)
   > ```
   > 
   > Note that adding these args do not seem to be picked up by intellij when 
running tests, is there a better place to add them?
   
   May be you could add
   ```gradle
   test {
 jvmArgs(
 '--add-opens', 'java.base/java.lang=ALL-UNNAMED',
 '--add-opens', 'java.base/java.util.concurrent.atomic=ALL-UNNAMED'
 )
   }
   ```
   to file lucene/core/build.gradle


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jbellis commented on pull request #12254: add ConcurrentOnHeapHnswGraph and Builder

2023-05-01 Thread via GitHub


jbellis commented on PR #12254:
URL: https://github.com/apache/lucene/pull/12254#issuecomment-1530167402

   thanks again.
   
   and you're right, the ram usage needed a rewrite, it is passing now with the 
seed you found.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] MarcusSorealheis commented on a diff in pull request #12254: add ConcurrentOnHeapHnswGraph and Builder

2023-05-01 Thread via GitHub


MarcusSorealheis commented on code in PR #12254:
URL: https://github.com/apache/lucene/pull/12254#discussion_r1182114316


##
lucene/core/src/java/org/apache/lucene/util/hnsw/ConcurrentOnHeapHnswGraph.java:
##
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.hnsw;
+
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.atomic.AtomicReference;
+import org.apache.lucene.util.Accountable;
+import org.apache.lucene.util.RamUsageEstimator;
+
+/**
+ * An {@link HnswGraph} that offers concurrent access; for typical graphs you 
will get significant
+ * speedups in construction and searching as you add threads.
+ *
+ * To search this graph, you should use a View obtained from {@link 
#getView()} to perform `seek`
+ * and `nextNeighbor` operations. For convenience, you can use these methods 
directly on the graph
+ * instance, which will give you a ThreadLocal View, but you can call 
`getView` directly if you need
+ * more control, e.g. for performing a second search in the same thread while 
the first is still in
+ * progress.
+ */
+public final class ConcurrentOnHeapHnswGraph extends HnswGraph implements 
Accountable {
+  private final AtomicReference
+  entryPoint; // the current graph entry node on the top level. -1 if not 
set
+
+  // views for compatibility with HnswGraph interface; prefer creating views 
explicitly
+  private final ThreadLocal views =
+  ThreadLocal.withInitial(ConcurrentHnswGraphView::new);
+
+  // Unlike OnHeapHnswGraph (OHHG), we use the same data structure for Level 0 
and higher node
+  // lists,
+  // a ConcurrentHashMap.  While the ArrayList used for L0 in OHHG is faster 
for single-threaded
+  // workloads, it imposes an unacceptable contention burden for concurrent 
workloads.
+  private final ConcurrentMap> graphLevels;
+
+  // Neighbours' size on upper levels (nsize) and level 0 (nsize0)
+  private final int nsize;
+  private final int nsize0;
+
+  ConcurrentOnHeapHnswGraph(int M) {
+this.entryPoint =
+new AtomicReference<>(
+new NodeAtLevel(0, -1)); // Entry node should be negative until a 
node is added
+this.nsize = M;
+this.nsize0 = 2 * M;
+
+this.graphLevels = new ConcurrentHashMap<>();
+  }
+
+  /**
+   * Returns the neighbors connected to the given node.
+   *
+   * @param level level of the graph
+   * @param node the node whose neighbors are returned, represented as an 
ordinal on the level 0.
+   */
+  public ConcurrentNeighborSet getNeighbors(int level, int node) {
+return graphLevels.get(level).get(node);
+  }
+
+  @Override
+  public synchronized int size() {
+return graphLevels.get(0).size(); // all nodes are located on the 0th level
+  }
+
+  @Override
+  public void addNode(int level, int node) {
+if (level >= graphLevels.size()) {
+  for (int i = graphLevels.size(); i <= level; i++) {
+graphLevels.putIfAbsent(i, new ConcurrentHashMap<>());
+  }
+}
+
+graphLevels.get(level).put(node, new 
ConcurrentNeighborSet(connectionsOnLevel(level)));
+  }
+
+  /**
+   * must be called after addNode to a level > 0
+   *
+   * we don't do this as part of addNode itself, since it may not yet have 
been added to all the
+   * levels
+   */
+  void maybeUpdateEntryNode(int level, int node) {
+while (true) {
+  NodeAtLevel oldEntry = entryPoint.get();
+  if (oldEntry.node >= 0 && oldEntry.level >= level) {
+break;
+  }
+  entryPoint.compareAndSet(oldEntry, new NodeAtLevel(level, node));
+}
+  }
+
+  private int connectionsOnLevel(int level) {
+return level == 0 ? nsize0 : nsize;
+  }
+
+  @Override
+  public void seek(int level, int target) throws IOException {
+views.get().seek(level, target);
+  }
+
+  @Override
+  public int nextNeighbor() throws IOException {
+return views.get().nextNeighbor();
+  }
+
+  /**
+   * @return the current number of levels in the graph where nodes have been 
added and we have a
+   * valid entry point.
+   */
+

[GitHub] [lucene] MarcusSorealheis commented on a diff in pull request #12254: add ConcurrentOnHeapHnswGraph and Builder

2023-05-01 Thread via GitHub


MarcusSorealheis commented on code in PR #12254:
URL: https://github.com/apache/lucene/pull/12254#discussion_r1182114316


##
lucene/core/src/java/org/apache/lucene/util/hnsw/ConcurrentOnHeapHnswGraph.java:
##
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.hnsw;
+
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.atomic.AtomicReference;
+import org.apache.lucene.util.Accountable;
+import org.apache.lucene.util.RamUsageEstimator;
+
+/**
+ * An {@link HnswGraph} that offers concurrent access; for typical graphs you 
will get significant
+ * speedups in construction and searching as you add threads.
+ *
+ * To search this graph, you should use a View obtained from {@link 
#getView()} to perform `seek`
+ * and `nextNeighbor` operations. For convenience, you can use these methods 
directly on the graph
+ * instance, which will give you a ThreadLocal View, but you can call 
`getView` directly if you need
+ * more control, e.g. for performing a second search in the same thread while 
the first is still in
+ * progress.
+ */
+public final class ConcurrentOnHeapHnswGraph extends HnswGraph implements 
Accountable {
+  private final AtomicReference
+  entryPoint; // the current graph entry node on the top level. -1 if not 
set
+
+  // views for compatibility with HnswGraph interface; prefer creating views 
explicitly
+  private final ThreadLocal views =
+  ThreadLocal.withInitial(ConcurrentHnswGraphView::new);
+
+  // Unlike OnHeapHnswGraph (OHHG), we use the same data structure for Level 0 
and higher node
+  // lists,
+  // a ConcurrentHashMap.  While the ArrayList used for L0 in OHHG is faster 
for single-threaded
+  // workloads, it imposes an unacceptable contention burden for concurrent 
workloads.
+  private final ConcurrentMap> graphLevels;
+
+  // Neighbours' size on upper levels (nsize) and level 0 (nsize0)
+  private final int nsize;
+  private final int nsize0;
+
+  ConcurrentOnHeapHnswGraph(int M) {
+this.entryPoint =
+new AtomicReference<>(
+new NodeAtLevel(0, -1)); // Entry node should be negative until a 
node is added
+this.nsize = M;
+this.nsize0 = 2 * M;
+
+this.graphLevels = new ConcurrentHashMap<>();
+  }
+
+  /**
+   * Returns the neighbors connected to the given node.
+   *
+   * @param level level of the graph
+   * @param node the node whose neighbors are returned, represented as an 
ordinal on the level 0.
+   */
+  public ConcurrentNeighborSet getNeighbors(int level, int node) {
+return graphLevels.get(level).get(node);
+  }
+
+  @Override
+  public synchronized int size() {
+return graphLevels.get(0).size(); // all nodes are located on the 0th level
+  }
+
+  @Override
+  public void addNode(int level, int node) {
+if (level >= graphLevels.size()) {
+  for (int i = graphLevels.size(); i <= level; i++) {
+graphLevels.putIfAbsent(i, new ConcurrentHashMap<>());
+  }
+}
+
+graphLevels.get(level).put(node, new 
ConcurrentNeighborSet(connectionsOnLevel(level)));
+  }
+
+  /**
+   * must be called after addNode to a level > 0
+   *
+   * we don't do this as part of addNode itself, since it may not yet have 
been added to all the
+   * levels
+   */
+  void maybeUpdateEntryNode(int level, int node) {
+while (true) {
+  NodeAtLevel oldEntry = entryPoint.get();
+  if (oldEntry.node >= 0 && oldEntry.level >= level) {
+break;
+  }
+  entryPoint.compareAndSet(oldEntry, new NodeAtLevel(level, node));
+}
+  }
+
+  private int connectionsOnLevel(int level) {
+return level == 0 ? nsize0 : nsize;
+  }
+
+  @Override
+  public void seek(int level, int target) throws IOException {
+views.get().seek(level, target);
+  }
+
+  @Override
+  public int nextNeighbor() throws IOException {
+return views.get().nextNeighbor();
+  }
+
+  /**
+   * @return the current number of levels in the graph where nodes have been 
added and we have a
+   * valid entry point.
+   */
+

[GitHub] [lucene] MarcusSorealheis commented on a diff in pull request #12254: add ConcurrentOnHeapHnswGraph and Builder

2023-05-01 Thread via GitHub


MarcusSorealheis commented on code in PR #12254:
URL: https://github.com/apache/lucene/pull/12254#discussion_r1182114316


##
lucene/core/src/java/org/apache/lucene/util/hnsw/ConcurrentOnHeapHnswGraph.java:
##
@@ -0,0 +1,280 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.hnsw;
+
+import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
+
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.atomic.AtomicReference;
+import org.apache.lucene.util.Accountable;
+import org.apache.lucene.util.RamUsageEstimator;
+
+/**
+ * An {@link HnswGraph} that offers concurrent access; for typical graphs you 
will get significant
+ * speedups in construction and searching as you add threads.
+ *
+ * To search this graph, you should use a View obtained from {@link 
#getView()} to perform `seek`
+ * and `nextNeighbor` operations. For convenience, you can use these methods 
directly on the graph
+ * instance, which will give you a ThreadLocal View, but you can call 
`getView` directly if you need
+ * more control, e.g. for performing a second search in the same thread while 
the first is still in
+ * progress.
+ */
+public final class ConcurrentOnHeapHnswGraph extends HnswGraph implements 
Accountable {
+  private final AtomicReference
+  entryPoint; // the current graph entry node on the top level. -1 if not 
set
+
+  // views for compatibility with HnswGraph interface; prefer creating views 
explicitly
+  private final ThreadLocal views =
+  ThreadLocal.withInitial(ConcurrentHnswGraphView::new);
+
+  // Unlike OnHeapHnswGraph (OHHG), we use the same data structure for Level 0 
and higher node
+  // lists,
+  // a ConcurrentHashMap.  While the ArrayList used for L0 in OHHG is faster 
for single-threaded
+  // workloads, it imposes an unacceptable contention burden for concurrent 
workloads.
+  private final ConcurrentMap> graphLevels;
+
+  // Neighbours' size on upper levels (nsize) and level 0 (nsize0)
+  private final int nsize;
+  private final int nsize0;
+
+  ConcurrentOnHeapHnswGraph(int M) {
+this.entryPoint =
+new AtomicReference<>(
+new NodeAtLevel(0, -1)); // Entry node should be negative until a 
node is added
+this.nsize = M;
+this.nsize0 = 2 * M;
+
+this.graphLevels = new ConcurrentHashMap<>();
+  }
+
+  /**
+   * Returns the neighbors connected to the given node.
+   *
+   * @param level level of the graph
+   * @param node the node whose neighbors are returned, represented as an 
ordinal on the level 0.
+   */
+  public ConcurrentNeighborSet getNeighbors(int level, int node) {
+return graphLevels.get(level).get(node);
+  }
+
+  @Override
+  public synchronized int size() {
+return graphLevels.get(0).size(); // all nodes are located on the 0th level
+  }
+
+  @Override
+  public void addNode(int level, int node) {
+if (level >= graphLevels.size()) {
+  for (int i = graphLevels.size(); i <= level; i++) {
+graphLevels.putIfAbsent(i, new ConcurrentHashMap<>());
+  }
+}
+
+graphLevels.get(level).put(node, new 
ConcurrentNeighborSet(connectionsOnLevel(level)));
+  }
+
+  /**
+   * must be called after addNode to a level > 0
+   *
+   * we don't do this as part of addNode itself, since it may not yet have 
been added to all the
+   * levels
+   */
+  void maybeUpdateEntryNode(int level, int node) {
+while (true) {
+  NodeAtLevel oldEntry = entryPoint.get();
+  if (oldEntry.node >= 0 && oldEntry.level >= level) {
+break;
+  }
+  entryPoint.compareAndSet(oldEntry, new NodeAtLevel(level, node));
+}
+  }
+
+  private int connectionsOnLevel(int level) {
+return level == 0 ? nsize0 : nsize;
+  }
+
+  @Override
+  public void seek(int level, int target) throws IOException {
+views.get().seek(level, target);
+  }
+
+  @Override
+  public int nextNeighbor() throws IOException {
+return views.get().nextNeighbor();
+  }
+
+  /**
+   * @return the current number of levels in the graph where nodes have been 
added and we have a
+   * valid entry point.
+   */
+