[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #12246: Set word2vec getSynonyms method synchronized

2023-05-16 Thread via GitHub


alessandrobenedetti commented on code in PR #12246:
URL: https://github.com/apache/lucene/pull/12246#discussion_r1194809968


##
lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java:
##
@@ -85,7 +86,7 @@ public List getSynonyms(
   SIMILARITY_FUNCTION,
   hnswGraph,
   null,
-  word2VecModel.size());
+  NO_LIMIT_ON_VISITED_NODES);

Review Comment:
   Thanks @zhaih ! I would suggest you close this PR as soon as the other two 
are merged.
   
   The one about concurrent problems, to be honest I would not wait further and 
proceed merging, for https://github.com/apache/lucene/pull/12235 I am not sure 
if you need some additional review, let me know!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors

2023-05-16 Thread via GitHub


alessandrobenedetti commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1194958670


##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/FloatVectorFieldSource.java:
##
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.queries.function.valuesource;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Map;
+import org.apache.lucene.index.FloatVectorValues;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.queries.function.FunctionValues;
+import org.apache.lucene.queries.function.ValueSource;
+import org.apache.lucene.search.DocIdSetIterator;
+
+public class FloatVectorFieldSource extends ValueSource {
+  private final String fieldName;
+
+  public FloatVectorFieldSource(String fieldName) {
+this.fieldName = fieldName;
+  }
+
+  @Override
+  public FunctionValues getValues(Map context, 
LeafReaderContext readerContext)
+  throws IOException {
+
+final FloatVectorValues vectorValues = 
readerContext.reader().getFloatVectorValues(fieldName);
+return new VectorFieldFunction(this) {
+  float[] defaultVector = null;

Review Comment:
   to discuss



##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ByteVectorFieldSource.java:
##
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.queries.function.valuesource;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Map;
+import org.apache.lucene.index.ByteVectorValues;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.queries.function.FunctionValues;
+import org.apache.lucene.queries.function.ValueSource;
+import org.apache.lucene.search.DocIdSetIterator;
+
+public class ByteVectorFieldSource extends ValueSource {
+  private final String fieldName;
+
+  public ByteVectorFieldSource(String fieldName) {
+this.fieldName = fieldName;
+  }
+
+  @Override
+  public FunctionValues getValues(Map context, 
LeafReaderContext readerContext)
+  throws IOException {
+
+final ByteVectorValues vectorValues = 
readerContext.reader().getByteVectorValues(fieldName);
+return new VectorFieldFunction(this) {
+  byte[] defaultVector = null;

Review Comment:
   to discuss



##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/VectorFieldFunction.java:
##
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.queries.function.valuesource;
+
+import java.io.IOException;
+import org.apache.lucene.queries.function.Funct

[GitHub] [lucene] rmuir merged pull request #12298: move max recursion from Operations.java to AutomatonTestUtil.java

2023-05-16 Thread via GitHub


rmuir merged PR #12298:
URL: https://github.com/apache/lucene/pull/12298


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on pull request #12298: move max recursion from Operations.java to AutomatonTestUtil.java

2023-05-16 Thread via GitHub


rmuir commented on PR #12298:
URL: https://github.com/apache/lucene/pull/12298#issuecomment-1549465353

   The limit can only be removed in `main` branch as there are still some 
recursive algorithms in `branch_9x`. More invasive/heavier automaton changes 
have happened in 10.x, I think it is ok, we are still moving forwards.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on pull request #12298: move max recursion from Operations.java to AutomatonTestUtil.java

2023-05-16 Thread via GitHub


tang-hi commented on PR #12298:
URL: https://github.com/apache/lucene/pull/12298#issuecomment-1549551695

   thanks @rmuir 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] tang-hi commented on issue #12284: input automaton is too large: 1001 in Operations.topoSortStatesRecurse(Operations.java:1357)

2023-05-16 Thread via GitHub


tang-hi commented on issue #12284:
URL: https://github.com/apache/lucene/issues/12284#issuecomment-1549552433

   it should be solved in PR #12286 😆 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12296: Seal `IndexReaderContext`

2023-05-16 Thread via GitHub


uschindler commented on PR #12296:
URL: https://github.com/apache/lucene/pull/12296#issuecomment-1549737196

   Hi this is a good idea for Lucene 10.0 (minimum is Java 17).
   
   This can't be backported to Java 11, so won't be on Lucene 9.x.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12296: Seal `IndexReaderContext`

2023-05-16 Thread via GitHub


uschindler commented on code in PR #12296:
URL: https://github.com/apache/lucene/pull/12296#discussion_r1195218209


##
lucene/core/src/java/org/apache/lucene/index/IndexReaderContext.java:
##
@@ -22,27 +22,25 @@
  * A struct like class that represents a hierarchical relationship between 
{@link IndexReader}
  * instances.
  */
-public abstract class IndexReaderContext {
+public abstract sealed class IndexReaderContext permits 
CompositeReaderContext, LeafReaderContext {
   /** The reader context for this reader's immediate parent, or null if none */
   public final CompositeReaderContext parent;
   /**
-   * true if this context struct represents the top level reader 
within the
+   * {@code true} if this context struct represents the top level reader 
within the
* hierarchical context
*/
   public final boolean isTopLevel;
-  /** the doc base for this reader in the parent, 0 if parent is 
null */
+  /** the doc base for this reader in the parent, {@code 0} if parent is null 
*/
   public final int docBaseInParent;
-  /** the ord for this reader in the parent, 0 if parent is null 
*/
+  /** the ord for this reader in the parent, {@code 0} if parent is null */
   public final int ordInParent;
 
   // An object that uniquely identifies this context without referencing
   // segments. The goal is to make it fine to have references to this
   // identity object, even after the index reader has been closed
-  final Object identity = new Object();
+  protected final Object identity = new Object();

Review Comment:
   I think this is an unrelated change. Do we need to make it protected?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12296: Seal `IndexReaderContext`

2023-05-16 Thread via GitHub


uschindler commented on code in PR #12296:
URL: https://github.com/apache/lucene/pull/12296#discussion_r1195220727


##
lucene/core/src/java/org/apache/lucene/index/IndexReaderContext.java:
##
@@ -22,27 +22,25 @@
  * A struct like class that represents a hierarchical relationship between 
{@link IndexReader}
  * instances.
  */
-public abstract class IndexReaderContext {
+public abstract sealed class IndexReaderContext permits 
CompositeReaderContext, LeafReaderContext {
   /** The reader context for this reader's immediate parent, or null if none */
   public final CompositeReaderContext parent;
   /**
-   * true if this context struct represents the top level reader 
within the
+   * {@code true} if this context struct represents the top level reader 
within the
* hierarchical context
*/
   public final boolean isTopLevel;
-  /** the doc base for this reader in the parent, 0 if parent is 
null */
+  /** the doc base for this reader in the parent, {@code 0} if parent is null 
*/
   public final int docBaseInParent;
-  /** the ord for this reader in the parent, 0 if parent is null 
*/
+  /** the ord for this reader in the parent, {@code 0} if parent is null */
   public final int ordInParent;
 
   // An object that uniquely identifies this context without referencing
   // segments. The goal is to make it fine to have references to this
   // identity object, even after the index reader has been closed
-  final Object identity = new Object();
+  protected final Object identity = new Object();

Review Comment:
   Ah you mentioned it. I don't remember why this `identity` is there. Is it 
really needed to be public. I'd keep this as is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12296: Seal `IndexReaderContext`

2023-05-16 Thread via GitHub


uschindler commented on PR #12296:
URL: https://github.com/apache/lucene/pull/12296#issuecomment-1549747674

   Please run `gradlew tidy`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12295: Use `instanceof` pattern-matching where possible

2023-05-16 Thread via GitHub


uschindler commented on PR #12295:
URL: https://github.com/apache/lucene/pull/12295#issuecomment-1549756344

   I am fine with this, unless it makes backports harder. So basically I use 
this for *new* code in new 10.x features only, but I'd like to prevent this 
from happing in code that is possibly backported.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12295: Use `instanceof` pattern-matching where possible

2023-05-16 Thread via GitHub


uschindler commented on PR #12295:
URL: https://github.com/apache/lucene/pull/12295#issuecomment-1549769045

   The precommit checks already answer most of your questions: Only use that 
feature for now if it is easy to apply. :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12290: Make memory fence in `ByteBufferGuard` explicit

2023-05-16 Thread via GitHub


uschindler commented on code in PR #12290:
URL: https://github.com/apache/lucene/pull/12290#discussion_r1195248137


##
lucene/core/src/java/org/apache/lucene/store/ByteBufferGuard.java:
##
@@ -39,7 +40,7 @@ final class ByteBufferGuard {
* this to allow unmapping of bytebuffers with private Java APIs.
*/
   @FunctionalInterface
-  static interface BufferCleaner {
+  interface BufferCleaner {

Review Comment:
   be explicit here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors

2023-05-16 Thread via GitHub


alessandrobenedetti commented on code in PR #12253:
URL: https://github.com/apache/lucene/pull/12253#discussion_r1195251817


##
lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ByteVectorFieldSource.java:
##
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.queries.function.valuesource;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Map;
+import org.apache.lucene.index.ByteVectorValues;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.queries.function.FunctionValues;
+import org.apache.lucene.queries.function.ValueSource;
+import org.apache.lucene.search.DocIdSetIterator;
+
+public class ByteVectorFieldSource extends ValueSource {
+  private final String fieldName;
+
+  public ByteVectorFieldSource(String fieldName) {
+this.fieldName = fieldName;
+  }
+
+  @Override
+  public FunctionValues getValues(Map context, 
LeafReaderContext readerContext)
+  throws IOException {
+
+final ByteVectorValues vectorValues = 
readerContext.reader().getByteVectorValues(fieldName);
+return new VectorFieldFunction(this) {
+  byte[] defaultVector = null;

Review Comment:
   after the discussion, possibly NaN is a better option



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12290: Make memory fence in `ByteBufferGuard` explicit

2023-05-16 Thread via GitHub


uschindler commented on PR #12290:
URL: https://github.com/apache/lucene/pull/12290#issuecomment-1549785157

   I like it, but actually this class should go away at some point!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on pull request #12298: move max recursion from Operations.java to AutomatonTestUtil.java

2023-05-16 Thread via GitHub


mikemccand commented on PR #12298:
URL: https://github.com/apache/lucene/pull/12298#issuecomment-1549796778

   Thank you @tang-hi!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JarvisCraft commented on a diff in pull request #12290: Make memory fence in `ByteBufferGuard` explicit

2023-05-16 Thread via GitHub


JarvisCraft commented on code in PR #12290:
URL: https://github.com/apache/lucene/pull/12290#discussion_r1195269692


##
lucene/core/src/java/org/apache/lucene/store/ByteBufferGuard.java:
##
@@ -39,7 +40,7 @@ final class ByteBufferGuard {
* this to allow unmapping of bytebuffers with private Java APIs.
*/
   @FunctionalInterface
-  static interface BufferCleaner {
+  interface BufferCleaner {

Review Comment:
   Rolled back, although I feel like `static` does not add to expliicitness 
here considering that interfaces never can be non-static; also, most code seems 
to omit `static` on inner `interface`s: 23 without it VS 4 with it (including 
this one).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JarvisCraft commented on pull request #12290: Make memory fence in `ByteBufferGuard` explicit

2023-05-16 Thread via GitHub


JarvisCraft commented on PR #12290:
URL: https://github.com/apache/lucene/pull/12290#issuecomment-1549806649

   > but actually this class should go away at some point!
   
   Yup, but as for now it seems more safe to rely on well-defined behaviour 
(until this class is gone).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JarvisCraft commented on a diff in pull request #12296: Seal `IndexReaderContext`

2023-05-16 Thread via GitHub


JarvisCraft commented on code in PR #12296:
URL: https://github.com/apache/lucene/pull/12296#discussion_r1195281538


##
lucene/core/src/java/org/apache/lucene/index/IndexReaderContext.java:
##
@@ -22,27 +22,25 @@
  * A struct like class that represents a hierarchical relationship between 
{@link IndexReader}
  * instances.
  */
-public abstract class IndexReaderContext {
+public abstract sealed class IndexReaderContext permits 
CompositeReaderContext, LeafReaderContext {
   /** The reader context for this reader's immediate parent, or null if none */
   public final CompositeReaderContext parent;
   /**
-   * true if this context struct represents the top level reader 
within the
+   * {@code true} if this context struct represents the top level reader 
within the
* hierarchical context
*/
   public final boolean isTopLevel;
-  /** the doc base for this reader in the parent, 0 if parent is 
null */
+  /** the doc base for this reader in the parent, {@code 0} if parent is null 
*/
   public final int docBaseInParent;
-  /** the ord for this reader in the parent, 0 if parent is null 
*/
+  /** the ord for this reader in the parent, {@code 0} if parent is null */
   public final int ordInParent;
 
   // An object that uniquely identifies this context without referencing
   // segments. The goal is to make it fine to have references to this
   // identity object, even after the index reader has been closed
-  final Object identity = new Object();
+  protected final Object identity = new Object();

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JarvisCraft commented on pull request #12296: Seal `IndexReaderContext`

2023-05-16 Thread via GitHub


JarvisCraft commented on PR #12296:
URL: https://github.com/apache/lucene/pull/12296#issuecomment-1549819407

   > Please run gradlew tidy.
   
   All done :+1: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JarvisCraft commented on pull request #12295: Use `instanceof` pattern-matching where possible

2023-05-16 Thread via GitHub


JarvisCraft commented on PR #12295:
URL: https://github.com/apache/lucene/pull/12295#issuecomment-1549821494

   > The precommit checks already answer most of your questions: Only use that 
feature for now if it is easy to apply. :-)
   
   Fair point! As for now, I will run them locally (my bad here) and update the 
code accordingly


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JarvisCraft commented on pull request #12295: Use `instanceof` pattern-matching where possible

2023-05-16 Thread via GitHub


JarvisCraft commented on PR #12295:
URL: https://github.com/apache/lucene/pull/12295#issuecomment-1549837524

   I've fixed the issues and applied `./gradlew tidy`.
   
   As for the backporting burden, I, unfortunately, lack the expertise in 
providing Lucene backports, so I guess that I'd better leave it up to you to 
decide which of these are okay to be applied and which ones are in "hot code" 
(although I possitively expect this specific lines to be rarely visited).
   
   Considering that this changes also require Java version higher than 11, this 
seems to be a Lucene 10 change, so I guess there is plenty of room for 
discussion here whether specific files should be changed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12290: Make memory fence in `ByteBufferGuard` explicit

2023-05-16 Thread via GitHub


uschindler commented on PR #12290:
URL: https://github.com/apache/lucene/pull/12290#issuecomment-1549844375

   The whole `ByteBufferGuard` does not help too much to prevent code from 
failing and crushing the VM! But yes, it's a good idea to enforce more.
   
   I played with that in the past: `ByteBufferGuard` never helped much, a `full 
fence` neither. This class was bullshit from the beginning as it only delayed 
the crushing of JVM a bit more on misuse.
   
   So it should never have existed...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12290: Make memory fence in `ByteBufferGuard` explicit

2023-05-16 Thread via GitHub


uschindler commented on PR #12290:
URL: https://github.com/apache/lucene/pull/12290#issuecomment-1549857185

   Sorry,
   I'd suggest to not touch `ByteBufferGuard` at all, it did its job to throw 
AlreadyClosedException instead of segmentation fault consistently until memory 
access is compiled by Hotspot. Don't touch it.
   
   W have to live with it only until Project Panama is in mainline JDKs. 
Unfortunatley as of today, not even 21 will ship with non-preview Panama, so 
tricky compilation to prevent preview warnings in #12294 is needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JarvisCraft commented on pull request #12290: Make memory fence in `ByteBufferGuard` explicit

2023-05-16 Thread via GitHub


JarvisCraft commented on PR #12290:
URL: https://github.com/apache/lucene/pull/12290#issuecomment-1549864252

   > Sorry,
   > I'd suggest to not touch ByteBufferGuard at all,
   
   No problems! Was glad to hear your review on this change)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12295: Use `instanceof` pattern-matching where possible

2023-05-16 Thread via GitHub


uschindler commented on PR #12295:
URL: https://github.com/apache/lucene/pull/12295#issuecomment-1549864245

   I already assigned the Lucene 10 milestone, not lucene 9.x


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12296: Seal `IndexReaderContext`

2023-05-16 Thread via GitHub


uschindler commented on PR #12296:
URL: https://github.com/apache/lucene/pull/12296#issuecomment-1549871892

   Could you apply the same change to `IndexReader`; it should be sealed, too? 
https://github.com/apache/lucene/blob/f53eb28af053d7612f7e4d1b2de05d33dc410645/lucene/core/src/java/org/apache/lucene/index/IndexReader.java#L72-L76


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on pull request #12290: Make memory fence in `ByteBufferGuard` explicit

2023-05-16 Thread via GitHub


dweiss commented on PR #12290:
URL: https://github.com/apache/lucene/pull/12290#issuecomment-1549878245

   Had to go back and re-read what all this was about in the beginning. I like 
the patch (nice API) but sort of agree with Uwe - the current solution is 
flawed (but working 99% of the time). The patch wouldn't change it to work 
correctly but it has the potential of having unknown side effects... Let's just 
keep the existing solution until a truly sound one is available.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12296: Seal `IndexReaderContext`

2023-05-16 Thread via GitHub


uschindler commented on PR #12296:
URL: https://github.com/apache/lucene/pull/12296#issuecomment-1549878949

   Please remove the protected from consructor. We don't want to show it in 
javadocs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12296: Seal `IndexReaderContext`

2023-05-16 Thread via GitHub


uschindler commented on code in PR #12296:
URL: https://github.com/apache/lucene/pull/12296#discussion_r1195330910


##
lucene/core/src/java/org/apache/lucene/index/IndexReaderContext.java:
##
@@ -22,27 +22,26 @@
  * A struct like class that represents a hierarchical relationship between 
{@link IndexReader}
  * instances.
  */
-public abstract class IndexReaderContext {
+public abstract sealed class IndexReaderContext permits 
CompositeReaderContext, LeafReaderContext {
   /** The reader context for this reader's immediate parent, or null if none */
   public final CompositeReaderContext parent;
   /**
-   * true if this context struct represents the top level reader 
within the
-   * hierarchical context
+   * {@code true} if this context struct represents the top level reader 
within the hierarchical
+   * context
*/
   public final boolean isTopLevel;
-  /** the doc base for this reader in the parent, 0 if parent is 
null */
+  /** the doc base for this reader in the parent, {@code 0} if parent is null 
*/
   public final int docBaseInParent;
-  /** the ord for this reader in the parent, 0 if parent is null 
*/
+  /** the ord for this reader in the parent, {@code 0} if parent is null */
   public final int ordInParent;
 
   // An object that uniquely identifies this context without referencing
   // segments. The goal is to make it fine to have references to this
   // identity object, even after the index reader has been closed
   final Object identity = new Object();
 
-  IndexReaderContext(CompositeReaderContext parent, int ordInParent, int 
docBaseInParent) {
-if (!(this instanceof CompositeReaderContext || this instanceof 
LeafReaderContext))
-  throw new Error("This class should never be extended by custom code!");
+  protected IndexReaderContext(

Review Comment:
   Please make it package protected again, as we don't want it to appear in 
Javadocs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] sherman commented on pull request #12290: Make memory fence in `ByteBufferGuard` explicit

2023-05-16 Thread via GitHub


sherman commented on PR #12290:
URL: https://github.com/apache/lucene/pull/12290#issuecomment-1549883140

   I think it's a good idea to replace this tricky way to flush any CPU caches, 
such as using lazy set, with an explicit full fence. The purpose of a full 
fence is to provide a reliable memory barrier that works consistently across 
different architectures.
   
   But if you're quite certain that the days of this class are numbered, then 
it's fine.
   
   (By the way, we are still planning to test it in our implementation of the 
directory interface).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12296: Seal `IndexReaderContext`

2023-05-16 Thread via GitHub


uschindler commented on PR #12296:
URL: https://github.com/apache/lucene/pull/12296#issuecomment-1549883736

   Please don't change public javadocs, so just make the classes sealed and 
keep documentation the same. Only remove runtime check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JarvisCraft commented on pull request #12296: Seal `IndexReaderContext`

2023-05-16 Thread via GitHub


JarvisCraft commented on PR #12296:
URL: https://github.com/apache/lucene/pull/12296#issuecomment-1549897649

   Ready:
   * brought back package-private visibility;
   * applied the same change to `IndexReader.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JarvisCraft commented on pull request #12296: Seal `IndexReader` and `IndexReaderContext`

2023-05-16 Thread via GitHub


JarvisCraft commented on PR #12296:
URL: https://github.com/apache/lucene/pull/12296#issuecomment-1549969315

   > Do you want to add a CHANGES.txt entry
   
   Done


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] JerryChin opened a new pull request, #12299: GITHUB-12291: Skip blank lines from stopwords list.

2023-05-16 Thread via GitHub


JerryChin opened a new pull request, #12299:
URL: https://github.com/apache/lucene/pull/12299

   ### Description
   
   Hi team,
   
   This PR fixes #12291, it will skip any blank lines when loading stopwords 
with `WordlistLoader#getWordSet`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gus-asf opened a new issue, #12300: Clearer error message if Codec loading fails under a Security Manager

2023-05-16 Thread via GitHub


gus-asf opened a new issue, #12300:
URL: https://github.com/apache/lucene/issues/12300

   ### Description
   
   I just spent quite a while tracking down a problem wherein a SecurityManager 
was silently ignoring the META-INF/services/org.apache.lucene.codecs.Codec 
file. We should enhance the message in the event that a security manager is 
detected to alert the user to this possibility and the potentially surprising 
detail that no SecurityException is thrown by java when it fails to load a file 
via the classpath. Even more confusingly, java can load classes and then 
silently fail to load files from the same jar file that the class came from, so 
alerting the user to this complex situation seems worthwhile. (PR soon)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] gus-asf opened a new pull request, #12301: Improve error message if codec not found Fixes #12300

2023-05-16 Thread via GitHub


gus-asf opened a new pull request, #12301:
URL: https://github.com/apache/lucene/pull/12301

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir opened a new issue, #12302: vector API integration, plan B

2023-05-16 Thread via GitHub


rmuir opened a new issue, #12302:
URL: https://github.com/apache/lucene/issues/12302

   ### Description
   
   For years we have explored using the vector api to actually take advantage 
of SIMD units on the hardware. A couple of approaches have been attempted so 
far:
   1. try to coerce the hotspot superword autovectorization into giving better 
results: I think @jpountz may know details of this and it currently is limited 
to postings list decode, attempting to use 64-bit long. It is better than 
nothing, but not good enough. especially for stuff like vectors encoding and 
other bottlenecks.
   2. vectorize code and hope that the openjdk feature will "graduate" soon. 
This is not happening. Java is dying and becoming the next COBOL, in my 
opinion, as a result of ignoring the problem and delaying until "perfection". 
For evidence, simply look at vector api being in "6th incubation" with really 
no api changes happening, just waiting on other features (some wet dream 
project valhalla or whatever) which will prolly never land before we retire.
   3. hackedy-häcks: this is stuff to bypass the problem, such as prototype i 
wrote in frustration *two years ago* here: 
https://github.com/apache/lucene/pull/18 . It is dirty and insecure but it 
demonstrates we can potentially make stuff easier on the user and take 
advantage of the hardware.
   
   Currently, unless the user is extremely technical, they can't make use of 
the vector support their hardware has, which is terribly sad. If they have 
immense resources/funding/etc, they can fork lucene and patch the source code, 
and maintain a fork, hooking in incubator openjdk stuff, but that's too hard on 
users.
   
   I think we have to draw a line in the sand, basically we can not rely upon 
openjdk to be managed as a performant project, their decisions make no sense, 
we have to play a little less nicer and apply some hacks! otherwise give up and 
switch to a different programming language with better perf!
   
   So I'd suggest to look at the work @uschindler has done with mmap and the 
preview apis, and let's carve out a path where we use the vector api *IFF* the 
user opts in via the command-line.
   
   Proposal (depends entirely upon user's jdk version and supported flags): 
   1. user does nothing and runs lucene without special flags: they get a 
warning message logged to the console (once!) telling them they need to add 
some stuff to the commandline for best performance. something such as "vector 
falling back to scalar implementation: please add "--add-modules .x.y.z"
   2. user supplies that command-line argument and lucene is faster and uses 
correct incubating vector api associated with their jdk version.
   
   Actually the system @uschindler developed I think is the correct design for 
this, the only trick is that the incubating api is more difficult than the 
preview api. So we need more build system support, it could require more stuff 
to be downloaded or build to be slower. But I think its the right decision? 
   
   We don't want to have base64-encoded hackiness that is hard to maintain, at 
the same time, we need to give the users option to opt-in to actually making 
use their hardware. I think we should suffer the complexity to make this easy 
on them. It fucking sucks that openjdk makes this almost impossible, but we 
need to do it for our users. That's what being a library is all about.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on issue #12302: vector API integration, plan B

2023-05-16 Thread via GitHub


rmuir commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1550523577

   it goes without saying, but i think there are a couple obvious hotspots 
where we'd want to integrate. I realize containing the complexity may be 
difficult:
   
   1. the `VectorUtil` methods (dotProduct etc). These are kinda simplest and 
contained and the obvious way to prove the approach.
   2. the postings list FOR/PFOR decode.
   3. possibly the docvalues decode.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-16 Thread via GitHub


benwtrent commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1550531087

   @jbellis would this change effect query per second? There is a 20% drop 
since this commit in QPS in Lucene bench nightlies:
   
   https://home.apache.org/~mikemccand/lucenebench/2023.05.08.18.03.38.html
   
   Looking at the other searches, none slow down this significantly, so it 
doesn't seem to be the lucene util change.
   
   Did your testing for recall changes show any performance differences on 
single threaded QPS?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-16 Thread via GitHub


jbellis commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1550610764

   The testing I performed pre-merge showed an improvement, but I'm happy to 
look closer.  Is there a way to get a graph of performance over time out of 
that tool or is it manually pasting things into a spreadsheet?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jbellis commented on pull request #12255: allocate one NeighborQueue per search for results

2023-05-16 Thread via GitHub


jbellis commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1550623733

   Also, is it actually *this* commit, or is it the HashMap commit 
3c163745bb07aed51b52878de0666f1405696147 ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on issue #12302: vector API integration, plan B

2023-05-16 Thread via GitHub


uschindler commented on issue #12302:
URL: https://github.com/apache/lucene/issues/12302#issuecomment-1550830310

   Hi @rmuir,
   I agree with your proposals. #12219 looks like a first start: Making the 
implementations "pluggable" it allows users to have a custom implementation as 
separate jar file / module. This could also be a solution to add an 
implementation jar for each java version as a separate Lucene module. For 
compiling them, we can use exactly the same aproach like for MMapDirectory. Of 
course we can also include the implementation classes into the main JAR file.
   
   About the compilation:
   - yes, we can extract APIJAR stubs for the verctor API. I would use separate 
APIJAR stubs for this, especially as it is different java modules. The reason 
is how the compilation with module pathcing works in my code.
   - actually compiling might be simpler with the APIJAR stubs for incubator 
modules: They live in a private module hidden by default in the jdk. So the 
signatures are not visible unless you explicitely add the module to compiler. 
So here we can use a very simple approach: Just compile the vectorclasses 
against incubator classes without any module-patching, just add the APIJAR as a 
normal dependency. As the package names are unique there's no problem! This is 
the main reason why the APIJAR files need to be separate from the foreign ones.
   - We can use the same approach like for MMAP at runtime: Use the MR-JAR 
mechanism to separate the classes and hide them from stupid IDEs (actually the 
current approach would not need a MR-JAR at all, it is just used to "hide" the 
Java 19, 20, 21 classes from stupid IDEs). Of course user has to pass the 
`--add-module` explicitely. I'd not add the module to module-info as this would 
cause warning for ALL endusers.
   - at a later stage when the vector API goes into preview phase, we can merge 
all this with the current MMAP code.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12301: Improve error message if codec not found Fixes #12300

2023-05-16 Thread via GitHub


uschindler commented on code in PR #12301:
URL: https://github.com/apache/lucene/pull/12301#discussion_r1196017521


##
lucene/core/src/java/org/apache/lucene/util/NamedSPILoader.java:
##
@@ -110,7 +112,14 @@ public S lookup(String name) {
 + "' does not exist."
 + "  You need to add the corresponding JAR file supporting this 
SPI to your classpath."
 + "  The current classpath supports the following names: "
-+ availableServices());
++ availableServices()
++ ((System.getSecurityManager() == null)
+? ""
+: " We have detected that a security manager is installed so 
it is also possible "
++ "that the services file in the jar containing the codec 
is inaccessible under the current "
++ "security policy. A FilePermission implying 'read' 
access to the "
++ "jar containing the 
META-INF/services/org.apache.lucene.codec.Codec file is necessary. "

Review Comment:
   Please don't mention "Codec" here, as the NamedSPILoader is used for many 
more than codecs. Please be generic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on a diff in pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.

2023-05-16 Thread via GitHub


uschindler commented on code in PR #12299:
URL: https://github.com/apache/lucene/pull/12299#discussion_r1196018225


##
lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java:
##
@@ -117,7 +120,10 @@ public static CharArraySet getWordSet(Reader reader, 
String comment, CharArraySe
   String word = null;
   while ((word = br.readLine()) != null) {
 if (word.startsWith(comment) == false) {
-  result.add(word.trim());
+  word = word.trim();
+  // skip blank lines
+  if (word.length() == 0) continue;

Review Comment:
   Use `word.isEmpty()`.



##
lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java:
##
@@ -53,7 +53,10 @@ public static CharArraySet getWordSet(Reader reader, 
CharArraySet result) throws
 try (BufferedReader br = getBufferedReader(reader)) {
   String word = null;
   while ((word = br.readLine()) != null) {
-result.add(word.trim());
+word = word.trim();
+// skip blank lines
+if (word.length() == 0) continue;

Review Comment:
   Use `word.isEmpty()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler merged pull request #12296: Seal `IndexReader` and `IndexReaderContext`

2023-05-16 Thread via GitHub


uschindler merged PR #12296:
URL: https://github.com/apache/lucene/pull/12296


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12296: Seal `IndexReader` and `IndexReaderContext`

2023-05-16 Thread via GitHub


uschindler commented on PR #12296:
URL: https://github.com/apache/lucene/pull/12296#issuecomment-1550842694

   Thanks @JarvisCraft, I merged it to main branch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] uschindler commented on pull request #12295: Use `instanceof` pattern-matching where possible

2023-05-16 Thread via GitHub


uschindler commented on PR #12295:
URL: https://github.com/apache/lucene/pull/12295#issuecomment-1550852181

   I tend to leave out the `equals()` methods. Many people in Lucene prefer to 
have a "step-by-step equals", as it is easier to read.
   
   The general problem is that we add "modern" equals and hashCode methods that 
do not use instanceof but compare classes directly. The examples you changed 
are old-style instanceof versions, which are buggy in class hiererachies 
(unless it is a final class).
   
   So I tend to change the `equals` methods with `instanceof` to follow the 
more modern approach consistently. This would be a separate PR. The problem of 
all those methods is that they were created by some IDE without human 
intervention (and all IDEs use different patterns, although modern ones use 
class comparisons instead of instanceof).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org