Re: [I] Potential resource leakage in WordDictionary#loadMainDataFromFile [lucene]

2025-05-27 Thread via GitHub


xcx1r3 commented on issue #14719:
URL: https://github.com/apache/lucene/issues/14719#issuecomment-2911474550

   If an exception occurs, the close() statement will not be executed, leading 
to a potential resource leak.
   ```
   private int loadMainDataFromFile(String dctFilePath) throws IOException {
     int i, cnt, length, total = 0;
     // The file only counted 6763 Chinese characters plus 5 reserved slots 3756~3760.
     // The 3756th is used (as a header) to store information.
     int[] buffer = new int[3];
     byte[] intBuffer = new byte[4];
     String tmpword;
     DataInputStream dctFile = new DataInputStream(Files.newInputStream(Paths.get(dctFilePath)));
     // ... (reads from dctFile elided; an exception thrown here skips the close() below)
     dctFile.close();
     // ... (rest of the method elided)
   }
   ```
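
A minimal, self-contained sketch of the problem described above and of one common remedy. It does not use the real `WordDictionary` code: `CloseTrackingStream`, `LeakSketch`, `readManually` and `readWithResources` are hypothetical names introduced only for illustration. The sketch assumes the read fails with an `EOFException` on truncated input.

```java
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper that records whether close() was called. */
final class CloseTrackingStream extends FilterInputStream {
  boolean closed;

  CloseTrackingStream(InputStream in) {
    super(in);
  }

  @Override
  public void close() throws IOException {
    closed = true;
    super.close();
  }
}

/** Hypothetical helper contrasting a manual close() with try-with-resources. */
final class LeakSketch {
  /** Manual close: skipped whenever readInt() throws. */
  static int readManually(InputStream source) throws IOException {
    DataInputStream in = new DataInputStream(source);
    int value = in.readInt(); // throws EOFException if fewer than 4 bytes are available
    in.close(); // never reached when the line above throws
    return value;
  }

  /** try-with-resources: close() runs on both the normal and the exceptional path. */
  static int readWithResources(InputStream source) throws IOException {
    try (DataInputStream in = new DataInputStream(source)) {
      return in.readInt();
    }
  }
}
```

Passing a two-byte `CloseTrackingStream` to `readManually` leaves `closed` as `false` after the exception, while `readWithResources` sets it to `true` on the same input.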


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Use read advice consistently in the knn vector formats [lucene]

2025-05-27 Thread via GitHub


jimczi closed pull request #14076: Use read advice consistently in the knn 
vector formats
URL: https://github.com/apache/lucene/pull/14076





Re: [PR] Update ruff rule PATH103 to enforce modern os.makedirs usage [lucene]

2025-05-27 Thread via GitHub


rmuir merged PR #14710:
URL: https://github.com/apache/lucene/pull/14710





Re: [PR] Cache high-order bits of hashcode to speed up BytesRefHash [lucene]

2025-05-27 Thread via GitHub


github-actions[bot] commented on PR #14720:
URL: https://github.com/apache/lucene/pull/14720#issuecomment-2912485390

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





[PR] Cache high-order bits of hashcode to speed up BytesRefHash [lucene]

2025-05-27 Thread via GitHub


bugmakerr opened a new pull request, #14720:
URL: https://github.com/apache/lucene/pull/14720

   ### Description
   
   
   
   This PR tries to utilize the unused part of the id to cache the high-order 
bits of the hashcode to speed up `BytesRefHash`. I used 1 million 16-byte UUIDs 
to [benchmark this change](https://github.com/bugmakerr/lucene/commit/43d2945be75acb2464c36ca1eac6067445687fe2), 
and the results are as follows.
   
![image](https://github.com/user-attachments/assets/da57ae65-366c-4751-af80-586448278258)
   
   The `baselineXXX` version is the current implementation, the `cachedXXX` 
version uses a separate array of ints to cache hash codes, and the candidate 
version is the implementation of this PR.
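
To make the idea concrete, here is a rough sketch of packing an id and the high-order bits of a hash code into one `int` slot of an open-addressing table. It is not the code from this PR and not Lucene's `BytesRefHash`: the class `PackedSlotHash`, the 16-bit id width, the use of `Arrays.hashCode` and the "id + 1, with 0 meaning empty" encoding are all assumptions chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical open-addressing set of byte[] keys; not Lucene's BytesRefHash. */
final class PackedSlotHash {
  private static final int ID_BITS = 16; // assumed width of the id field
  private static final int ID_MASK = (1 << ID_BITS) - 1; // low bits hold id + 1 (0 = empty)
  private static final int HASH_MASK = ~ID_MASK; // high bits hold cached hash bits
  private static final int TABLE_SIZE = 1 << ID_BITS; // power of two
  private static final int MAX_KEYS = TABLE_SIZE / 2; // keep the table at most half full

  private final int[] slots = new int[TABLE_SIZE];
  private final List<byte[]> keys = new ArrayList<>();
  int fullComparisons; // how often the byte-by-byte comparison ran

  /** Returns the id of the key, assigning the next free id if the key is new. */
  int add(byte[] key) {
    int hash = Arrays.hashCode(key);
    int cached = hash & HASH_MASK; // bits that are not used to pick the slot
    int pos = hash & (TABLE_SIZE - 1); // low bits pick the starting slot
    while (true) {
      int slot = slots[pos];
      int idPlusOne = slot & ID_MASK;
      if (idPlusOne == 0) { // empty slot: insert here
        if (keys.size() >= MAX_KEYS) {
          throw new IllegalStateException("sketch does not implement rehashing");
        }
        keys.add(key.clone());
        slots[pos] = cached | keys.size(); // keys.size() is now id + 1
        return keys.size() - 1;
      }
      if ((slot & HASH_MASK) == cached) { // cheap filter before touching the key bytes
        fullComparisons++;
        if (Arrays.equals(keys.get(idPlusOne - 1), key)) {
          return idPlusOne - 1;
        }
      }
      pos = (pos + 1) & (TABLE_SIZE - 1); // linear probing
    }
  }

  int size() {
    return keys.size();
  }
}
```

The point of the packed layout in this sketch is that a probe which lands on an occupied slot can often reject it by comparing the cached bits alone, without a separate hash-code array and without reading the stored key.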





Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-27 Thread via GitHub


kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r2109429583


##
gradle/testing/defaults-tests.gradle:
##
@@ -145,6 +145,7 @@ allprojects {
   ':lucene:core',
   ':lucene:codecs',
   ":lucene:distribution.tests",
+  ':lucene:sandbox',

Review Comment:
   This line allows the sandbox module to call native libraries from tests 
(i.e. `--enable-native-access`), but the tests were already being run before 
this change.






Re: [PR] Reduce NeighborArray heap memory [lucene]

2025-05-27 Thread via GitHub


benwtrent merged PR #14527:
URL: https://github.com/apache/lucene/pull/14527





Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]

2025-05-27 Thread via GitHub


jpountz commented on issue #14630:
URL: https://github.com/apache/lucene/issues/14630#issuecomment-2913813621

   It looks like nightly benchmarks have only run every 2 days since May 13th, 
vs. every day before that. Is this because it now takes longer to run the benchmark?





Re: [I] Potential resource leakage in WordDictionary#loadMainDataFromFile [lucene]

2025-05-27 Thread via GitHub


jpountz commented on issue #14719:
URL: https://github.com/apache/lucene/issues/14719#issuecomment-2913817299

   Good catch, would you like to submit a PR?





Re: [PR] Only run the labeller on the main branch of the lucene repository [lucene]

2025-05-27 Thread via GitHub


github-actions[bot] commented on PR #14721:
URL: https://github.com/apache/lucene/pull/14721#issuecomment-2913824556

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] Fix comment above OnHeapHnswGraph#getNeighbors. [lucene]

2025-05-27 Thread via GitHub


msokolov commented on PR #14713:
URL: https://github.com/apache/lucene/pull/14713#issuecomment-2913825952

   Thanks @vsop-479 !





Re: [PR] Fix comment above OnHeapHnswGraph#getNeighbors. [lucene]

2025-05-27 Thread via GitHub


msokolov merged PR #14713:
URL: https://github.com/apache/lucene/pull/14713





[PR] Only run the labeller on the main branch of the lucene repository [lucene]

2025-05-27 Thread via GitHub


dweiss opened a new pull request, #14721:
URL: https://github.com/apache/lucene/pull/14721

   This prevents this action from running on PRs against forks, which I couldn't 
get to work (missing permissions for some reason).





Re: [PR] Reduce NeighborArray heap memory [lucene]

2025-05-27 Thread via GitHub


benwtrent commented on code in PR #14527:
URL: https://github.com/apache/lucene/pull/14527#discussion_r2109471013


##
.gitignore:
##
@@ -32,3 +32,10 @@ __pycache__
 
 # SDKMAN
 .sdkmanrc
+
+# Java class files
+*.class
+
+# Ignore bin directories
+bin/
+**/bin/

Review Comment:
   If you think these need to be updated, could you do it in a separate PR? I 
would like to keep this change restricted to HNSW.






Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-27 Thread via GitHub


kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r2109488907


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsReader.java:
##
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.codecs.faiss;
+
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.VERSION_CURRENT;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.VERSION_START;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.indexRead;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.indexSearch;
+
+import java.io.IOException;
+import java.lang.foreign.Arena;
+import java.lang.foreign.MemorySegment;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.KnnVectorsReader;
+import org.apache.lucene.codecs.hnsw.FlatVectorsReader;
+import org.apache.lucene.index.ByteVectorValues;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.FloatVectorValues;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.index.SegmentReadState;
+import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.search.KnnCollector;
+import org.apache.lucene.store.DataAccessHint;
+import org.apache.lucene.store.FileTypeHint;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.util.Bits;
+import org.apache.lucene.util.IOUtils;
+
+/**
+ * Read per-segment Faiss indexes and associated metadata.
+ *
+ * @lucene.experimental
+ */
+final class FaissKnnVectorsReader extends KnnVectorsReader {
+  private final FlatVectorsReader rawVectorsReader;
+  private final IndexInput meta, data;
+  private final Map indexMap;
+  private final Arena arena;
+  private boolean closed;
+
+  public FaissKnnVectorsReader(SegmentReadState state, FlatVectorsReader rawVectorsReader)
+  throws IOException {
+this.rawVectorsReader = rawVectorsReader;
+this.indexMap = new HashMap<>();
+this.arena = Arena.ofShared();
+this.closed = false;
+
+boolean failure = true;
+try {
+  meta =
+  openInput(
+  state,
+  META_EXTENSION,
+  META_CODEC_NAME,
+  VERSION_START,
+  VERSION_CURRENT,
+  state.context);
+  data =
+  openInput(
+  state,
+  DATA_EXTENSION,
+  DATA_CODEC_NAME,
+  VERSION_START,
+  VERSION_CURRENT,
+  state.context.withHints(FileTypeHint.DATA, DataAccessHint.RANDOM));
+
+  Map.Entry entry;
+  while ((entry = parseNextField(state)) != null) {
+this.indexMap.put(entry.getKey(), entry.getValue());
+  }
+
+  failure = false;
+} finally {
+  if (failure) {
+IOUtils.closeWhileHandlingException(this);
+  }
+}
+  }
+
+  @SuppressWarnings("SameParameterValue")
+  private IndexInput openInput(
+  SegmentReadState state,
+  String extension,
+  String codecName,
+  int versionStart,
+  int versionEnd,
+  IOContext context)
+  throws IOException {
+
+String fileName =
+IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, extension);
+IndexInput input = state.directory.openInput(fileName, context);
+CodecUtil.checkIndexHeader(
+input, codecName, versionStart, versionEnd, state.segmentInfo.getId(), state.segmentSuffix);
+return input;
+  }
+
+  private Map.Entry parseNextField(SegmentReadState state) throws IOException {
+int fieldNumber = meta.readInt();
+if (fieldNumber == -1) {
+  return null;
+}
+
+FieldInfo fieldInfo = state.

Re: [PR] Fix resource leak in loadMainDataFromFile [lucene]

2025-05-27 Thread via GitHub


github-actions[bot] commented on PR #14726:
URL: https://github.com/apache/lucene/pull/14726#issuecomment-2914833524

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





[PR] Fix resource leak in loadMainDataFromFile [lucene]

2025-05-27 Thread via GitHub


xcx1r3 opened a new pull request, #14726:
URL: https://github.com/apache/lucene/pull/14726

   Use try-with-resources to auto-close DataInputStream
   ```
   try (DataInputStream dctFile = new DataInputStream(Files.newInputStream(Paths.get(dctFilePath)))) {
     ...
   }
   ```





Re: [I] Potential resource leakage in WordDictionary#loadMainDataFromFile [lucene]

2025-05-27 Thread via GitHub


xcx1r3 commented on issue #14719:
URL: https://github.com/apache/lucene/issues/14719#issuecomment-2914834339

   #14726  sure





Re: [PR] Fix resource leak in loadMainDataFromFile [lucene]

2025-05-27 Thread via GitHub


xcx1r3 closed pull request #14726: Fix resource leak in loadMainDataFromFile
URL: https://github.com/apache/lucene/pull/14726





Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-27 Thread via GitHub


kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r2109735507


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsReader.java:
##
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.codecs.faiss;
+
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.VERSION_CURRENT;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.VERSION_START;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.indexRead;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.indexSearch;
+
+import java.io.IOException;
+import java.lang.foreign.Arena;
+import java.lang.foreign.MemorySegment;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.KnnVectorsReader;
+import org.apache.lucene.codecs.hnsw.FlatVectorsReader;
+import org.apache.lucene.index.ByteVectorValues;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.FloatVectorValues;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.index.SegmentReadState;
+import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.search.KnnCollector;
+import org.apache.lucene.store.DataAccessHint;
+import org.apache.lucene.store.FileTypeHint;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.util.Bits;
+import org.apache.lucene.util.IOUtils;
+
+/**
+ * Read per-segment Faiss indexes and associated metadata.
+ *
+ * @lucene.experimental
+ */
+final class FaissKnnVectorsReader extends KnnVectorsReader {
+  private final FlatVectorsReader rawVectorsReader;
+  private final IndexInput meta, data;
+  private final Map indexMap;
+  private final Arena arena;
+  private boolean closed;
+
+  public FaissKnnVectorsReader(SegmentReadState state, FlatVectorsReader rawVectorsReader)
+  throws IOException {
+this.rawVectorsReader = rawVectorsReader;
+this.indexMap = new HashMap<>();
+this.arena = Arena.ofShared();
+this.closed = false;
+
+boolean failure = true;
+try {
+  meta =
+  openInput(
+  state,
+  META_EXTENSION,
+  META_CODEC_NAME,
+  VERSION_START,
+  VERSION_CURRENT,
+  state.context);
+  data =
+  openInput(
+  state,
+  DATA_EXTENSION,
+  DATA_CODEC_NAME,
+  VERSION_START,
+  VERSION_CURRENT,
+  state.context.withHints(FileTypeHint.DATA, DataAccessHint.RANDOM));
+
+  Map.Entry entry;
+  while ((entry = parseNextField(state)) != null) {
+this.indexMap.put(entry.getKey(), entry.getValue());
+  }
+
+  failure = false;
+} finally {
+  if (failure) {
+IOUtils.closeWhileHandlingException(this);
+  }
+}
+  }
+
+  @SuppressWarnings("SameParameterValue")
+  private IndexInput openInput(
+  SegmentReadState state,
+  String extension,
+  String codecName,
+  int versionStart,
+  int versionEnd,
+  IOContext context)
+  throws IOException {
+
+String fileName =
+IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, extension);
+IndexInput input = state.directory.openInput(fileName, context);
+CodecUtil.checkIndexHeader(
+input, codecName, versionStart, versionEnd, state.segmentInfo.getId(), state.segmentSuffix);
+return input;
+  }
+
+  private Map.Entry parseNextField(SegmentReadState state) throws IOException {
+int fieldNumber = meta.readInt();
+if (fieldNumber == -1) {
+  return null;
+}
+
+FieldInfo fieldInfo = state.

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-27 Thread via GitHub


kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r2109760361


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.codecs.faiss;
+
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.VERSION_CURRENT;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.createIndex;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.indexWrite;
+
+import java.io.IOException;
+import java.lang.foreign.Arena;
+import java.lang.foreign.MemorySegment;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.KnnFieldVectorsWriter;
+import org.apache.lucene.codecs.KnnVectorsWriter;
+import org.apache.lucene.codecs.hnsw.FlatFieldVectorsWriter;
+import org.apache.lucene.codecs.hnsw.FlatVectorsWriter;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.FloatVectorValues;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.index.MergeState;
+import org.apache.lucene.index.SegmentWriteState;
+import org.apache.lucene.index.Sorter;
+import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.search.DocIdSet;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.hnsw.IntToIntFunction;
+
+/**
+ * Write per-segment Faiss indexes and associated metadata.
+ *
+ * @lucene.experimental
+ */
+final class FaissKnnVectorsWriter extends KnnVectorsWriter {
+  private final String description, indexParams;
+  private final FlatVectorsWriter rawVectorsWriter;
+  private final IndexOutput meta, data;
+  private final Map> rawFields;
+  private boolean closed, finished;
+
+  public FaissKnnVectorsWriter(
+  String description,
+  String indexParams,
+  SegmentWriteState state,
+  FlatVectorsWriter rawVectorsWriter)
+  throws IOException {
+
+this.description = description;
+this.indexParams = indexParams;
+this.rawVectorsWriter = rawVectorsWriter;
+this.rawFields = new HashMap<>();
+this.closed = false;
+this.finished = false;
+
+boolean failure = true;
+try {
+  this.meta = openOutput(state, META_EXTENSION, META_CODEC_NAME);
+  this.data = openOutput(state, DATA_EXTENSION, DATA_CODEC_NAME);
+  failure = false;
+} finally {
+  if (failure) {
+IOUtils.closeWhileHandlingException(this);
+  }
+}
+  }
+
+  private IndexOutput openOutput(SegmentWriteState state, String extension, String codecName)
+  throws IOException {
+String fileName =
+IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, extension);
+IndexOutput output = state.directory.createOutput(fileName, state.context);
+CodecUtil.writeIndexHeader(
+output, codecName, VERSION_CURRENT, state.segmentInfo.getId(), state.segmentSuffix);
+return output;
+  }
+
+  @Override
+  public void mergeOneField(FieldInfo fieldInfo, MergeState mergeState) throws IOException {
+rawVectorsWriter.mergeOneField(fieldInfo, mergeState);
+switch (fieldInfo.getVectorEncoding()) {
+  case BYTE ->
+  // TODO: Support using SQ8 quantization, see:
+  //  - https://github.com/opensearch-project/k-NN/pull/2425
+  throw new UnsupportedOperationException("Byte vectors not supported");
+  case FLOAT32 -> {
+FloatVectorValues merged =
+KnnVectorsWriter.MergedVectorValues.mergeFloatVectorValues(fieldInfo, mergeState);
+writeFloatField(fieldInfo, merged, doc -> doc);
+  }
+}
+  }
+
+  @Override
+  public 

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-27 Thread via GitHub


kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r2109774033


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.codecs.faiss;
+
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.VERSION_CURRENT;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.createIndex;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.indexWrite;
+
+import java.io.IOException;
+import java.lang.foreign.Arena;
+import java.lang.foreign.MemorySegment;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.KnnFieldVectorsWriter;
+import org.apache.lucene.codecs.KnnVectorsWriter;
+import org.apache.lucene.codecs.hnsw.FlatFieldVectorsWriter;
+import org.apache.lucene.codecs.hnsw.FlatVectorsWriter;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.FloatVectorValues;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.index.MergeState;
+import org.apache.lucene.index.SegmentWriteState;
+import org.apache.lucene.index.Sorter;
+import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.search.DocIdSet;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.hnsw.IntToIntFunction;
+
+/**
+ * Write per-segment Faiss indexes and associated metadata.
+ *
+ * @lucene.experimental
+ */
+final class FaissKnnVectorsWriter extends KnnVectorsWriter {
+  private final String description, indexParams;
+  private final FlatVectorsWriter rawVectorsWriter;
+  private final IndexOutput meta, data;
+  private final Map<FieldInfo, FlatFieldVectorsWriter<?>> rawFields;
+  private boolean closed, finished;
+
+  public FaissKnnVectorsWriter(
+      String description,
+      String indexParams,
+      SegmentWriteState state,
+      FlatVectorsWriter rawVectorsWriter)
+      throws IOException {
+
+    this.description = description;
+    this.indexParams = indexParams;
+    this.rawVectorsWriter = rawVectorsWriter;
+    this.rawFields = new HashMap<>();
+    this.closed = false;
+    this.finished = false;
+
+    boolean failure = true;

Review Comment:
   Ah, found it (#14633); I will follow this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-27 Thread via GitHub


kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r2109779193


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsWriter.java:
##
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.codecs.faiss;
+
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.VERSION_CURRENT;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.createIndex;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.indexWrite;
+
+import java.io.IOException;
+import java.lang.foreign.Arena;
+import java.lang.foreign.MemorySegment;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.KnnFieldVectorsWriter;
+import org.apache.lucene.codecs.KnnVectorsWriter;
+import org.apache.lucene.codecs.hnsw.FlatFieldVectorsWriter;
+import org.apache.lucene.codecs.hnsw.FlatVectorsWriter;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.FloatVectorValues;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.index.MergeState;
+import org.apache.lucene.index.SegmentWriteState;
+import org.apache.lucene.index.Sorter;
+import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.search.DocIdSet;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.hnsw.IntToIntFunction;
+
+/**
+ * Write per-segment Faiss indexes and associated metadata.
+ *
+ * @lucene.experimental
+ */
+final class FaissKnnVectorsWriter extends KnnVectorsWriter {
+  private final String description, indexParams;
+  private final FlatVectorsWriter rawVectorsWriter;
+  private final IndexOutput meta, data;
+  private final Map<FieldInfo, FlatFieldVectorsWriter<?>> rawFields;
+  private boolean closed, finished;
+
+  public FaissKnnVectorsWriter(
+      String description,
+      String indexParams,
+      SegmentWriteState state,
+      FlatVectorsWriter rawVectorsWriter)
+      throws IOException {
+
+    this.description = description;
+    this.indexParams = indexParams;
+    this.rawVectorsWriter = rawVectorsWriter;
+    this.rawFields = new HashMap<>();
+    this.closed = false;
+    this.finished = false;
+
+    boolean failure = true;
+    try {
+      this.meta = openOutput(state, META_EXTENSION, META_CODEC_NAME);
+      this.data = openOutput(state, DATA_EXTENSION, DATA_CODEC_NAME);
+      failure = false;
+    } finally {
+      if (failure) {
+        IOUtils.closeWhileHandlingException(this);
+      }
+    }
+  }
+
+  private IndexOutput openOutput(SegmentWriteState state, String extension, String codecName)
+      throws IOException {
+    String fileName =
+        IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, extension);
+    IndexOutput output = state.directory.createOutput(fileName, state.context);
+    CodecUtil.writeIndexHeader(
+        output, codecName, VERSION_CURRENT, state.segmentInfo.getId(), state.segmentSuffix);
+    return output;
+  }
+
+  @Override
+  public void mergeOneField(FieldInfo fieldInfo, MergeState mergeState) throws IOException {
+    rawVectorsWriter.mergeOneField(fieldInfo, mergeState);
+    switch (fieldInfo.getVectorEncoding()) {
+      case BYTE ->
+          // TODO: Support using SQ8 quantization, see:
+          //  - https://github.com/opensearch-project/k-NN/pull/2425
+          throw new UnsupportedOperationException("Byte vectors not supported");
+      case FLOAT32 -> {
+        FloatVectorValues merged =
+            KnnVectorsWriter.MergedVectorValues.mergeFloatVectorValues(fieldInfo, mergeState);
+        writeFloatField(fieldInfo, merged, doc -> doc);
+      }
+    }
+  }
+
+  @Override
+  public 

Re: [PR] Cache high-order bits of hashcode to speed up BytesRefHash [lucene]

2025-05-27 Thread via GitHub


jpountz commented on code in PR #14720:
URL: https://github.com/apache/lucene/pull/14720#discussion_r2110084706


##
lucene/core/src/java/org/apache/lucene/util/BytesRefHash.java:
##
@@ -71,9 +72,13 @@ public BytesRefHash(ByteBlockPool pool) {
 
   /** Creates a new {@link BytesRefHash} */
   public BytesRefHash(ByteBlockPool pool, int capacity, BytesStartArray bytesStartArray) {
+    if ((capacity & (capacity - 1)) != 0) {

Review Comment:
   Can you use `BitUtil#isZeroOrPowerOfTwo`?
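
For readers following along, the check in the diff and the suggested helper rely on the same bit trick: a value with at most one bit set satisfies `(x & (x - 1)) == 0`. The sketch below is self-contained and does not depend on Lucene; the class `PowerOfTwoCheck` and the method `checkCapacity` are hypothetical names used only for illustration, and the extra `capacity <= 0` rejection is an assumption, not something the diff shows.

```java
/** Hypothetical helper for illustration only; it re-implements the bit trick locally. */
final class PowerOfTwoCheck {
  private PowerOfTwoCheck() {}

  /** Returns true when x is 0 or has exactly one bit set. */
  static boolean isZeroOrPowerOfTwo(int x) {
    // Subtracting 1 flips the lowest set bit and every bit below it,
    // so the AND is zero only when no other bit was set.
    return (x & (x - 1)) == 0;
  }

  /** Assumed validation: accepts only positive powers of two. */
  static int checkCapacity(int capacity) {
    if (capacity <= 0 || !isZeroOrPowerOfTwo(capacity)) {
      throw new IllegalArgumentException(
          "capacity must be a positive power of two, got: " + capacity);
    }
    return capacity;
  }
}
```

Note that the bare bit test also accepts 0, which is why the helper's name mentions zero; a caller that needs a strictly positive capacity has to check that separately, as the sketch does.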






Re: [PR] Move HitQueue in TopScoreDocCollector to a LongHeap [lucene]

2025-05-27 Thread via GitHub


jpountz commented on PR #14714:
URL: https://github.com/apache/lucene/pull/14714#issuecomment-2913896479

   Indeed, I wasn't aware of this. OK to pass null then; I agree that there 
may be subclasses in the wild that rely on this API.





Re: [PR] Arg001 - no violations found [lucene]

2025-05-27 Thread via GitHub


github-actions[bot] commented on PR #14724:
URL: https://github.com/apache/lucene/pull/14724#issuecomment-2914476284

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] Arg001 - no violations found [lucene]

2025-05-27 Thread via GitHub


Mariah33 commented on PR #14724:
URL: https://github.com/apache/lucene/pull/14724#issuecomment-2914477522

   Opened on the wrong branch.





Re: [PR] Clarify filter fields usage in javadocs [lucene]

2025-05-27 Thread via GitHub


github-actions[bot] commented on PR #14660:
URL: https://github.com/apache/lucene/pull/14660#issuecomment-2914507503

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!





Re: [PR] No ruff violation [lucene]

2025-05-27 Thread via GitHub


github-actions[bot] commented on PR #14725:
URL: https://github.com/apache/lucene/pull/14725#issuecomment-2914529256

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





[PR] No ruff violation [lucene]

2025-05-27 Thread via GitHub


Mariah33 opened a new pull request, #14725:
URL: https://github.com/apache/lucene/pull/14725

   ### Description
   
   No violations of these ruff rules were found in the code.





[PR] Arg001 - no violations found [lucene]

2025-05-27 Thread via GitHub


Mariah33 opened a new pull request, #14724:
URL: https://github.com/apache/lucene/pull/14724

   ### Description
   
   This rule was not found in the code.
   





Re: [PR] Arg001 - no violations found [lucene]

2025-05-27 Thread via GitHub


Mariah33 closed pull request #14724: Arg001 - no violations found
URL: https://github.com/apache/lucene/pull/14724





Re: [PR] Apply minimal fix for ruff rule PATH103 using Path.resolve [lucene]

2025-05-27 Thread via GitHub


rmuir merged PR #14711:
URL: https://github.com/apache/lucene/pull/14711





[PR] deps(java): bump org.apache.groovy:groovy-all from 4.0.26 to 4.0.27 [lucene]

2025-05-27 Thread via GitHub


dependabot[bot] opened a new pull request, #14722:
URL: https://github.com/apache/lucene/pull/14722

   Bumps [org.apache.groovy:groovy-all](https://github.com/apache/groovy) from 
4.0.26 to 4.0.27.
   
   Commits
   
   See full diff in the [compare view](https://github.com/apache/groovy/commits).
   
   
   
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.apache.groovy:groovy-all&package-manager=gradle&previous-version=4.0.26&new-version=4.0.27)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   
   ---
   
   
   Dependabot commands and options
   
   
   You can trigger Dependabot actions by commenting on this PR:
   - `@dependabot rebase` will rebase this PR
   - `@dependabot recreate` will recreate this PR, overwriting any edits that 
have been made to it
   - `@dependabot merge` will merge this PR after your CI passes on it
   - `@dependabot squash and merge` will squash and merge this PR after your CI 
passes on it
   - `@dependabot cancel merge` will cancel a previously requested merge and 
block automerging
   - `@dependabot reopen` will reopen this PR if it is closed
   - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually
   - `@dependabot show <dependency name> ignore conditions` will show all of 
the ignore conditions of the specified dependency
   - `@dependabot ignore this major version` will close this PR and stop 
Dependabot creating any more for this major version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this minor version` will close this PR and stop 
Dependabot creating any more for this minor version (unless you reopen the PR 
or upgrade to it yourself)
   - `@dependabot ignore this dependency` will close this PR and stop 
Dependabot creating any more for this dependency (unless you reopen the PR or 
upgrade to it yourself)
   
   
   





[PR] deps(java): bump com.diffplug.spotless from 7.0.3 to 7.0.4 [lucene]

2025-05-27 Thread via GitHub


dependabot[bot] opened a new pull request, #14723:
URL: https://github.com/apache/lucene/pull/14723

   Bumps com.diffplug.spotless from 7.0.3 to 7.0.4.
   
   
   [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=com.diffplug.spotless&package-manager=gradle&previous-version=7.0.3&new-version=7.0.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
   
   Dependabot will resolve any conflicts with this PR as long as you don't 
alter it yourself. You can also trigger a rebase manually by commenting 
`@dependabot rebase`.
   
   





Re: [PR] deps(java): bump org.apache.groovy:groovy-all from 4.0.26 to 4.0.27 [lucene]

2025-05-27 Thread via GitHub


github-actions[bot] commented on PR #14722:
URL: https://github.com/apache/lucene/pull/14722#issuecomment-2914414329

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] deps(java): bump com.diffplug.spotless from 7.0.3 to 7.0.4 [lucene]

2025-05-27 Thread via GitHub


github-actions[bot] commented on PR #14723:
URL: https://github.com/apache/lucene/pull/14723#issuecomment-2914414463

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] Apply minimal fix for ruff rule PATH103 using Path.resolve [lucene]

2025-05-27 Thread via GitHub


github-actions[bot] commented on PR #14711:
URL: https://github.com/apache/lucene/pull/14711#issuecomment-2914446223

   This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. 
If the PR doesn't need a changelog entry, then add the skip-changelog-check 
label to it and you will stop receiving this reminder on future updates to the 
PR.





Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-27 Thread via GitHub


kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r2109479695


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsFormat.java:
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.codecs.faiss;
+
+import java.io.IOException;
+import java.util.Locale;
+import org.apache.lucene.codecs.KnnVectorsFormat;
+import org.apache.lucene.codecs.KnnVectorsReader;
+import org.apache.lucene.codecs.KnnVectorsWriter;
+import org.apache.lucene.codecs.hnsw.FlatVectorScorerUtil;
+import org.apache.lucene.codecs.hnsw.FlatVectorsFormat;
+import org.apache.lucene.codecs.lucene99.Lucene99FlatVectorsFormat;
+import org.apache.lucene.index.SegmentReadState;
+import org.apache.lucene.index.SegmentWriteState;
+
+/**
+ * A format which uses <a href="https://github.com/facebookresearch/faiss">Faiss</a> to create and
+ * search vector indexes, using {@link LibFaissC} to interact with the native library.
+ *
+ * <p>A separate Faiss index is created per-segment, and uses the following files:
+ *
+ * <ul>
+ *   <li><code>.faissm</code> (metadata file): stores field number, offset and length of actual
+ *       Faiss index in data file.</li>
+ *   <li><code>.faissd</code> (data file): stores concatenated Faiss indexes for all fields.</li>
+ *   <li>All files required by {@link Lucene99FlatVectorsFormat} for storing raw vectors.</li>
+ * </ul>
+ *
+ * <p>Note: Set the {@code $OMP_NUM_THREADS} environment variable to control internal threading.

Review Comment:
   Makes sense, I'll add it



##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsReader.java:
##
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.codecs.faiss;
+
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.VERSION_CURRENT;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.VERSION_START;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.indexRead;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.indexSearch;
+
+import java.io.IOException;
+import java.lang.foreign.Arena;
+import java.lang.foreign.MemorySegment;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.KnnVectorsReader;
+import org.apache.lucene.codecs.hnsw.FlatVectorsReader;
+import org.apache.lucene.index.ByteVectorValues;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.FloatVectorValues;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.index.SegmentReadState;
+import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.search.KnnCollector;
+import org.apache.lucene.store.DataAccessHint;
+import org.apache.lucene.store.FileTypeHint;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.util.Bits;
+import org.apache.lucene.util.IOUtils;
+
+/**
+ * Read per-segment Faiss indexes and ass

Re: [PR] Move HitQueue in TopScoreDocCollector to a LongHeap [lucene]

2025-05-27 Thread via GitHub


gf2121 commented on PR #14714:
URL: https://github.com/apache/lucene/pull/14714#issuecomment-2913036055

   Thanks for the suggestion!
   
   > It's a bit ugly to pass null as a HitQueue in the constructor of 
TopScoreDocCollector. Can we only keep method signatures on TopDocsCollector 
and move the current impls to some other class?
   
   FWIW, passing a null PQ is mentioned in `TopDocsCollector`'s Javadoc: 
https://github.com/apache/lucene/blob/6b3c3e4803dfe3edba75569e289fe492d8cc5cd2/lucene/core/src/java/org/apache/lucene/search/TopDocsCollector.java#L25-L28
   
   I agree it is ugly to copy the large `topDocs(int start, int howMany)`, so I 
was looking to extract the PQ logic into a protected method, but I'm not sure we 
should touch this public API class, since passing null does not seem to break the 
original intention of the design. In case you did not notice the Javadoc, I'd 
like to ask for your suggestion again :)
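
The option discussed in this comment, keeping the slicing logic of `topDocs(start, howMany)` in one place and letting a subclass that passes a null queue override only the queue access, could look roughly like the sketch below. Every name in it (`SketchCollector`, `PrimitiveCollector`, `size`, `popLeast`, `top`) is hypothetical, and plain `long` values stand in for hits; it is not Lucene's `TopDocsCollector` API nor the code in the PR.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Hypothetical base collector; a stand-in, not Lucene's TopDocsCollector. */
class SketchCollector {
  /** Min-heap of hits; may be null when a subclass stores hits elsewhere. */
  protected final PriorityQueue<Long> pq;

  SketchCollector(PriorityQueue<Long> pq) {
    this.pq = pq;
  }

  /** Hook: number of hits currently held. */
  protected int size() {
    return pq.size();
  }

  /** Hook: removes and returns the least competitive hit. */
  protected long popLeast() {
    return pq.poll();
  }

  /** Shared slicing logic, written once against the two hooks; best hit first. */
  public final long[] top(int start, int howMany) {
    int size = size();
    if (start < 0 || start >= size || howMany <= 0) {
      return new long[0];
    }
    howMany = Math.min(size - start, howMany);
    // Drop the hits that rank below the requested window.
    for (int i = size - start - howMany; i > 0; i--) {
      popLeast();
    }
    // Remaining pops arrive worst-first, so fill the result back to front.
    long[] results = new long[howMany];
    for (int i = howMany - 1; i >= 0; i--) {
      results[i] = popLeast();
    }
    return results;
  }
}

/** Hypothetical subclass that passes null and keeps hits as primitive longs. */
final class PrimitiveCollector extends SketchCollector {
  private final long[] ascending; // hits sorted from least to most competitive
  private int next = 0; // index of the next least competitive hit

  PrimitiveCollector(long... hits) {
    super(null);
    ascending = hits.clone();
    Arrays.sort(ascending);
  }

  @Override
  protected int size() {
    return ascending.length - next;
  }

  @Override
  protected long popLeast() {
    return ascending[next++];
  }
}
```

With this shape the subclass never touches the null `pq` field, and the large slicing method is not copied. Whether adding such hooks to a public API class is acceptable is exactly the open question raised above.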
   
   





Re: [PR] Adding profiling support for concurrent segment search [lucene]

2025-05-27 Thread via GitHub


jainankitk commented on PR #14413:
URL: https://github.com/apache/lucene/pull/14413#issuecomment-2913552519

   I submitted a talk on this topic (`Profiling Concurrent Search in Lucene: A 
Deep Dive into Parallel Execution`) for the ASF conference 
(https://communityovercode.org/schedule/), and it was selected. I would love to 
iterate and get this PR merged before then!





Re: [PR] Reduce NeighborArray heap memory [lucene]

2025-05-27 Thread via GitHub


weizijun commented on code in PR #14527:
URL: https://github.com/apache/lucene/pull/14527#discussion_r2109476565


##
.gitignore:
##
@@ -32,3 +32,10 @@ __pycache__
 
 # SDKMAN
 .sdkmanrc
+
+# Java class files
+*.class
+
+# Ignore bin directories
+bin/
+**/bin/

Review Comment:
   Oh, sorry, that was a mistake, I'll delete it.






Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-27 Thread via GitHub


kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r2109729290


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsReader.java:
##
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.codecs.faiss;
+
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.DATA_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_CODEC_NAME;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.META_EXTENSION;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.VERSION_CURRENT;
+import static org.apache.lucene.sandbox.codecs.faiss.FaissKnnVectorsFormat.VERSION_START;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.indexRead;
+import static org.apache.lucene.sandbox.codecs.faiss.LibFaissC.indexSearch;
+
+import java.io.IOException;
+import java.lang.foreign.Arena;
+import java.lang.foreign.MemorySegment;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.lucene.codecs.CodecUtil;
+import org.apache.lucene.codecs.KnnVectorsReader;
+import org.apache.lucene.codecs.hnsw.FlatVectorsReader;
+import org.apache.lucene.index.ByteVectorValues;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.FloatVectorValues;
+import org.apache.lucene.index.IndexFileNames;
+import org.apache.lucene.index.SegmentReadState;
+import org.apache.lucene.index.VectorSimilarityFunction;
+import org.apache.lucene.search.KnnCollector;
+import org.apache.lucene.store.DataAccessHint;
+import org.apache.lucene.store.FileTypeHint;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.util.Bits;
+import org.apache.lucene.util.IOUtils;
+
+/**
+ * Read per-segment Faiss indexes and associated metadata.
+ *
+ * @lucene.experimental
+ */
+final class FaissKnnVectorsReader extends KnnVectorsReader {
+  private final FlatVectorsReader rawVectorsReader;
+  private final IndexInput meta, data;
+  private final Map indexMap;
+  private final Arena arena;
+  private boolean closed;
+
+  public FaissKnnVectorsReader(SegmentReadState state, FlatVectorsReader rawVectorsReader)
+      throws IOException {
+    this.rawVectorsReader = rawVectorsReader;
+    this.indexMap = new HashMap<>();
+    this.arena = Arena.ofShared();
+    this.closed = false;
+
+    boolean failure = true;
+    try {
+      meta =
+          openInput(
+              state,
+              META_EXTENSION,
+              META_CODEC_NAME,
+              VERSION_START,
+              VERSION_CURRENT,
+              state.context);
+      data =
+          openInput(
+              state,
+              DATA_EXTENSION,
+              DATA_CODEC_NAME,
+              VERSION_START,
+              VERSION_CURRENT,
+              state.context.withHints(FileTypeHint.DATA, DataAccessHint.RANDOM));
+
+      Map.Entry entry;
+      while ((entry = parseNextField(state)) != null) {
+        this.indexMap.put(entry.getKey(), entry.getValue());
+      }
+
+      failure = false;
+    } finally {
+      if (failure) {
+        IOUtils.closeWhileHandlingException(this);
+      }
+    }
+  }
+
+  @SuppressWarnings("SameParameterValue")
+  private IndexInput openInput(
+      SegmentReadState state,
+      String extension,
+      String codecName,
+      int versionStart,
+      int versionEnd,
+      IOContext context)
+      throws IOException {
+
+    String fileName =
+        IndexFileNames.segmentFileName(state.segmentInfo.name, state.segmentSuffix, extension);
+    IndexInput input = state.directory.openInput(fileName, context);
+    CodecUtil.checkIndexHeader(
+        input, codecName, versionStart, versionEnd, state.segmentInfo.getId(), state.segmentSuffix);
+    return input;
+  }
+
+  private Map.Entry parseNextField(SegmentReadState state) throws IOException {
+    int fieldNumber = meta.readInt();
+    if (fieldNumber == -1) {
+      return null;
+    }
+
+    FieldInfo fieldInfo = state.

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-27 Thread via GitHub


kaivalnp commented on code in PR #14178:
URL: https://github.com/apache/lucene/pull/14178#discussion_r2109499282


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/codecs/faiss/FaissKnnVectorsFormat.java:
##
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.codecs.faiss;
+
+import java.io.IOException;
+import java.util.Locale;
+import org.apache.lucene.codecs.KnnVectorsFormat;
+import org.apache.lucene.codecs.KnnVectorsReader;
+import org.apache.lucene.codecs.KnnVectorsWriter;
+import org.apache.lucene.codecs.hnsw.FlatVectorScorerUtil;
+import org.apache.lucene.codecs.hnsw.FlatVectorsFormat;
+import org.apache.lucene.codecs.lucene99.Lucene99FlatVectorsFormat;
+import org.apache.lucene.index.SegmentReadState;
+import org.apache.lucene.index.SegmentWriteState;
+
+/**
+ * A format which uses <a href="https://github.com/facebookresearch/faiss">Faiss</a> to create and
+ * search vector indexes, using {@link LibFaissC} to interact with the native library.
+ *
+ * A separate Faiss index is created per-segment, and uses the following files:
+ *
+ * 
+ *   .faissm (metadata file): stores field number, offset and length of actual
+ *   Faiss index in data file.
+ *   .faissd (data file): stores concatenated Faiss indexes for all fields.
+ *   All files required by {@link Lucene99FlatVectorsFormat} for storing raw vectors.
+ * 
+ *
+ * Note: Set the {@code $OMP_NUM_THREADS} environment variable to control internal threading.
+ *
+ * @lucene.experimental

Review Comment:
   I do see some [references](https://github.com/search?q=repo%3Afacebookresearch%2Ffaiss%20compatibility&type=code) to backwards compatibility, and an [old comment](https://github.com/facebookresearch/faiss/issues/2373#issuecomment-1175895577) which says that newer versions of Faiss can read older indexes -- but I couldn't find documentation for it.
   
   Further, we may change some internals of the codec, making it incompatible with earlier versions -- but I'll add a comment saying there's no guarantee today, and a TODO to figure that out.






Re: [I] Create a bot to add milestones to new PRs [lucene]

2025-05-27 Thread via GitHub


stefanvodita commented on issue #14190:
URL: https://github.com/apache/lucene/issues/14190#issuecomment-2913105749

   #14697 is a nice example of the bot modifying the milestone after we moved 
the CHANGES entry to a different section!





Re: [PR] Reduce NeighborArray heap memory [lucene]

2025-05-27 Thread via GitHub


weizijun commented on PR #14527:
URL: https://github.com/apache/lucene/pull/14527#issuecomment-2914720940

   Here are the statistics for an HNSW graph with 1,000,000 (100w) nodes, with m = 16 and ef = 100:
   Level count = 5:
   ```
   level: 0, node count: 1000000
   level: 1, node count: 62835
   level: 2, node count: 3926
   level: 3, node count: 235
   level: 4, node count: 12
   ```
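As a rough sanity check, these per-level counts are close to what geometric level assignment would predict. The sketch below assumes each node reaches level `l` with probability `m^-l` (the usual HNSW choice, not confirmed here for this exact build) and that level 0 holds 1,000,000 nodes, which is what the level-0 histogram further down sums to. The class and method names are hypothetical.

```java
import java.util.Locale;

public class HnswLevelEstimate {
  /**
   * Expected number of nodes present on {@code level}, assuming a node reaches
   * that level with probability m^-level (an assumption, see the text above).
   */
  static double expectedNodes(long totalNodes, int m, int level) {
    return totalNodes / Math.pow(m, level);
  }

  public static void main(String[] args) {
    long totalNodes = 1_000_000L; // assumed size of level 0
    int m = 16;
    // Observed node counts per level, copied from the statistics above.
    long[] observed = {1_000_000L, 62_835L, 3_926L, 235L, 12L};
    for (int level = 0; level < observed.length; level++) {
      System.out.printf(
          Locale.ROOT,
          "level %d: expected %.1f, observed %d%n",
          level,
          expectedNodes(totalNodes, m, level),
          observed[level]);
    }
  }
}
```

Under that assumption the upper levels shrink by about a factor of 16 each, so most of the NeighborArray heap cost sits at level 0.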
   
   Average number of neighbors per level:
   ```
   level: 0, avg neighbor count: 8.026909
   level: 1, avg neighbor count: 7.539985676772499
   level: 2, avg neighbor count: 8.596535914416709
   level: 3, avg neighbor count: 8.353191489361702
   level: 4, avg neighbor count: 3.8333333333333335
   ```
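Each average above can be cross-checked as the weighted mean of that level's neighbor-count histogram (listed below). This is a minimal sketch using the level 3 histogram from this comment. The class and method names are hypothetical, and it assumes the histogram lists every node on the level.

```java
public class NeighborHistogram {
  /** Total number of nodes, where counts[i] is the number of nodes with (i + 1) neighbors. */
  static long totalNodes(int[] counts) {
    long nodes = 0;
    for (int count : counts) {
      nodes += count;
    }
    return nodes;
  }

  /** Weighted mean of the neighbor count over all nodes in the histogram. */
  static double averageNeighbors(int[] counts) {
    long neighbors = 0;
    for (int i = 0; i < counts.length; i++) {
      neighbors += (long) (i + 1) * counts[i];
    }
    return (double) neighbors / totalNodes(counts);
  }

  public static void main(String[] args) {
    // Level 3 histogram: node counts for neighbor counts 1..16.
    int[] level3 = {18, 14, 11, 14, 17, 20, 19, 11, 23, 12, 12, 9, 7, 10, 4, 34};
    System.out.println("nodes: " + totalNodes(level3));
    System.out.println("avg neighbor count: " + averageNeighbors(level3));
  }
}
```

The same check applied to the other levels should reproduce their node counts and averages.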
   
   Detailed neighbor count distribution for each level:
   level: 0
   ```
   level: 0, neighbor count: 1, node count: 141664
   level: 0, neighbor count: 2, node count: 130484
   level: 0, neighbor count: 3, node count: 111485
   level: 0, neighbor count: 4, node count: 91141
   level: 0, neighbor count: 5, node count: 72929
   level: 0, neighbor count: 6, node count: 59030
   level: 0, neighbor count: 7, node count: 47796
   level: 0, neighbor count: 8, node count: 39864
   level: 0, neighbor count: 9, node count: 33320
   level: 0, neighbor count: 10, node count: 27923
   level: 0, neighbor count: 11, node count: 23972
   level: 0, neighbor count: 12, node count: 20777
   level: 0, neighbor count: 13, node count: 17986
   level: 0, neighbor count: 14, node count: 15510
   level: 0, neighbor count: 15, node count: 13725
   level: 0, neighbor count: 16, node count: 12296
   level: 0, neighbor count: 17, node count: 10947
   level: 0, neighbor count: 18, node count: 9826
   level: 0, neighbor count: 19, node count: 8765
   level: 0, neighbor count: 20, node count: 7947
   level: 0, neighbor count: 21, node count: 7348
   level: 0, neighbor count: 22, node count: 6639
   level: 0, neighbor count: 23, node count: 6045
   level: 0, neighbor count: 24, node count: 5413
   level: 0, neighbor count: 25, node count: 5101
   level: 0, neighbor count: 26, node count: 4569
   level: 0, neighbor count: 27, node count: 4105
   level: 0, neighbor count: 28, node count: 3965
   level: 0, neighbor count: 29, node count: 3564
   level: 0, neighbor count: 30, node count: 3330
   level: 0, neighbor count: 31, node count: 3019
   level: 0, neighbor count: 32, node count: 49515
   ```
   level: 1
   ```
   level: 1, neighbor count: 1, node count: 6760
   level: 1, neighbor count: 2, node count: 6707
   level: 1, neighbor count: 3, node count: 6127
   level: 1, neighbor count: 4, node count: 5277
   level: 1, neighbor count: 5, node count: 4420
   level: 1, neighbor count: 6, node count: 3805
   level: 1, neighbor count: 7, node count: 3321
   level: 1, neighbor count: 8, node count: 2827
   level: 1, neighbor count: 9, node count: 2502
   level: 1, neighbor count: 10, node count: 2093
   level: 1, neighbor count: 11, node count: 1849
   level: 1, neighbor count: 12, node count: 1645
   level: 1, neighbor count: 13, node count: 1521
   level: 1, neighbor count: 14, node count: 1257
   level: 1, neighbor count: 15, node count: 1163
   level: 1, neighbor count: 16, node count: 11561
   ```
   level: 2
   ```
   level: 2, neighbor count: 1, node count: 298
   level: 2, neighbor count: 2, node count: 302
   level: 2, neighbor count: 3, node count: 309
   level: 2, neighbor count: 4, node count: 278
   level: 2, neighbor count: 5, node count: 267
   level: 2, neighbor count: 6, node count: 251
   level: 2, neighbor count: 7, node count: 196
   level: 2, neighbor count: 8, node count: 209
   level: 2, neighbor count: 9, node count: 178
   level: 2, neighbor count: 10, node count: 159
   level: 2, neighbor count: 11, node count: 153
   level: 2, neighbor count: 12, node count: 134
   level: 2, neighbor count: 13, node count: 125
   level: 2, neighbor count: 14, node count: 75
   level: 2, neighbor count: 15, node count: 106
   level: 2, neighbor count: 16, node count: 886
   ```
   level: 3
   ```
   level: 3, neighbor count: 1, node count: 18
   level: 3, neighbor count: 2, node count: 14
   level: 3, neighbor count: 3, node count: 11
   level: 3, neighbor count: 4, node count: 14
   level: 3, neighbor count: 5, node count: 17
   level: 3, neighbor count: 6, node count: 20
   level: 3, neighbor count: 7, node count: 19
   level: 3, neighbor count: 8, node count: 11
   level: 3, neighbor count: 9, node count: 23
   level: 3, neighbor count: 10, node count: 12
   level: 3, neighbor count: 11, node count: 12
   level: 3, neighbor count: 12, node count: 9
   level: 3, neighbor count: 13, node count: 7
   level: 3, neighbor count: 14, node count: 10
   level: 3, neighbor count: 15, node count: 4
   level: 3, neighbor count: 16, node count: 34
   ```
   level: 4
   ```
   level: 4, neighbor count: 1, node count: 1
   level: 4, neighbor count: 2, node count: 2
   level: 4, neighbor count: 3, node count: 5
   level: 4, neighbor count: 5, node count: 2
   level: 4, neighbor count: 8, node count: 2