[GitHub] [lucene] benwtrent commented on pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub


benwtrent commented on PR #12529:
URL: https://github.com/apache/lucene/pull/12529#issuecomment-1711551199

   @msokolov what say you? It seems like encapsulating random vector seeking & 
scoring into one thing makes the code simpler.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub


jpountz commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1319777844


##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90HnswVectorsReader.java:
##
@@ -423,8 +422,12 @@ public RandomAccessVectorValues copy() {
 
 @Override
 public float[] vectorValue(int targetOrd) throws IOException {
+  if (lastOrd == targetOrd) {
+return value;
+  }

Review Comment:
   I'm curious, is it common to read the same ord multiple times in a row?



##
lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/Word2VecSynonymProvider.java:
##
@@ -23,14 +23,11 @@
 import java.io.IOException;
 import java.util.LinkedList;
 import java.util.List;
-import org.apache.lucene.index.VectorEncoding;
 import org.apache.lucene.index.VectorSimilarityFunction;
 import org.apache.lucene.search.KnnCollector;
 import org.apache.lucene.search.TopDocs;
 import org.apache.lucene.util.BytesRef;
-import org.apache.lucene.util.hnsw.HnswGraphBuilder;
-import org.apache.lucene.util.hnsw.HnswGraphSearcher;
-import org.apache.lucene.util.hnsw.OnHeapHnswGraph;
+import org.apache.lucene.util.hnsw.*;

Review Comment:
   Nit: We prefer to avoid star imports.



##
lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java:
##
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.hnsw;
+
+import java.io.IOException;
+import org.apache.lucene.index.VectorSimilarityFunction;
+
+/** A provider that creates {@link RandomVectorScorer} from an ordinal. */
+public interface RandomVectorScorerProvider {

Review Comment:
   Naming nit: the naming convention seems to be to call things "Supplier"s 
rather than "Provider"s.



##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/OffHeapByteVectorValues.java:
##
@@ -60,13 +61,17 @@ public int size() {
 
   @Override
   public byte[] vectorValue(int targetOrd) throws IOException {
-readValue(targetOrd);
+if (lastOrd != targetOrd) {
+  readValue(targetOrd);
+  lastOrd = targetOrd;
+}
 return binaryValue;
   }
 
   private void readValue(int targetOrd) throws IOException {
 slice.seek((long) targetOrd * byteSize);
 slice.readBytes(byteBuffer.array(), byteBuffer.arrayOffset(), byteSize);
+lastOrd = targetOrd;

Review Comment:
   we already set `lastOrd` in `vectorValue()`, so it's not needed here?



##
lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene94/Lucene94HnswVectorsWriter.java:
##
@@ -47,12 +47,8 @@
 import org.apache.lucene.util.IOUtils;
 import org.apache.lucene.util.InfoStream;
 import org.apache.lucene.util.RamUsageEstimator;
-import org.apache.lucene.util.hnsw.HnswGraph;
+import org.apache.lucene.util.hnsw.*;

Review Comment:
   nit: star import



##
lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene92/Lucene92HnswVectorsWriter.java:
##
@@ -34,16 +34,12 @@
 import org.apache.lucene.index.FloatVectorValues;
 import org.apache.lucene.index.IndexFileNames;
 import org.apache.lucene.index.SegmentWriteState;
-import org.apache.lucene.index.VectorEncoding;
 import org.apache.lucene.index.VectorSimilarityFunction;
 import org.apache.lucene.search.DocIdSetIterator;
 import org.apache.lucene.store.IndexInput;
 import org.apache.lucene.store.IndexOutput;
 import org.apache.lucene.util.IOUtils;
-import org.apache.lucene.util.hnsw.HnswGraphBuilder;
-import org.apache.lucene.util.hnsw.NeighborArray;
-import org.apache.lucene.util.hnsw.OnHeapHnswGraph;
-import org.apache.lucene.util.hnsw.RandomAccessVectorValues;
+import org.apache.lucene.util.hnsw.*;

Review Comment:
   nit: star import



##
lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java:
##
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * 

[GitHub] [lucene] mikemccand commented on issue #12542: Lucene's FST Builder should have a simpler "knob" to trade off memory/CPU required against minimality

2023-09-08 Thread via GitHub


mikemccand commented on issue #12542:
URL: https://github.com/apache/lucene/issues/12542#issuecomment-1711608900

   Digging into this a bit, I think I found some silly performance bugs in our 
current FST impl:
 * We seem to create a `PagedGrowableWriter` with [page size 128 MB 
here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java#L34),
 meaning even when building a small FST, we are allocating at least 128 MB 
pages?
 * When we rehash, we create a new `PagedGrowableWriter`, with too small 
estimated `bitsRequired` [since we pass `count` (the number of nodes in the 
hash) instead of the most recently added `long 
node`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java#L182).
  We are actually storing the `long node` values, so it really should be `node` 
not `count`.  The effect of this is we make `PagedGrowableWriter` work harder 
than necessary to reallocate when we store the next `node` that doesn't fit in 
that `bitsRequired`.
   
   I'll try to get the LRU hash working, but if that takes too long, we should 
separately fix these performance bugs (if I'm right that these are really 
bugs!).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna commented on pull request #12544: Close index readers in tests

2023-09-08 Thread via GitHub


javanna commented on PR #12544:
URL: https://github.com/apache/lucene/pull/12544#issuecomment-1711628290

   thanks @jpountz !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna merged pull request #12544: Close index readers in tests

2023-09-08 Thread via GitHub


javanna merged PR #12544:
URL: https://github.com/apache/lucene/pull/12544


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on issue #12542: Lucene's FST Builder should have a simpler "knob" to trade off memory/CPU required against minimality

2023-09-08 Thread via GitHub


dweiss commented on issue #12542:
URL: https://github.com/apache/lucene/issues/12542#issuecomment-1711706053

   With regard to automata/ FSTs - they're nearly the same thing, conceptually. 
Automata are logically transducers producing a constant epsilon value (no 
value). This knowledge can be used to make them smaller but they're the same 
animal, really.
   
   The root of the automaton/fst difference in Lucene is historically the 
source of code: brics package for constructing arbitrary automata, Daciuk/Mihov 
algorithm implementing minimal (at least at first) FST construction directly 
from sorted data (no intermediate "unoptimized" transitions).
   
   > Non-minimal FST means the index is a wee bit bigger, and perhaps lookups 
through the FST are a bit slower since we must have more bytes hot / fewer 
bytes cache local. But it's likely these effects are miniscule relative to the 
RAM savings during construction.
   
   If we allow non-optimal "transition graphs" then in theory we could also 
build FSTs in parallel: just build them independently from different prefix 
blocks. Yes, it wouldn't be optimal, but it if we don't care then so be it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub


jimczi commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1319990727


##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90HnswVectorsReader.java:
##
@@ -423,8 +422,12 @@ public RandomAccessVectorValues copy() {
 
 @Override
 public float[] vectorValue(int targetOrd) throws IOException {
+  if (lastOrd == targetOrd) {
+return value;
+  }

Review Comment:
   Not on normal search where only the query vector is used multiple times but 
it happens frequently when we build the index during the diversification phase. 
There, we compare single ordinals with all their neighbors sequentially so 
instead of having the optimisation in the graph I moved it to the reader since 
it's straightforward to add and shouldn't add any overhead to check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub


jimczi commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r132915


##
lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java:
##
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.hnsw;
+
+import java.io.IOException;
+import org.apache.lucene.index.VectorSimilarityFunction;
+
+/** A provider that creates {@link RandomVectorScorer} from an ordinal. */
+public interface RandomVectorScorerProvider {
+  /**
+   * This creates a {@link RandomVectorScorer} for scoring random nodes in 
batches against the given
+   * ordinal.
+   *
+   * @param ord the ordinal of the node to compare
+   * @return a new {@link RandomVectorScorer}
+   */
+  RandomVectorScorer scorer(int ord) throws IOException;
+
+  /**
+   * Creates a {@link RandomVectorScorerProvider} to compare float vectors.
+   *
+   * WARNING: The {@link RandomAccessVectorValues} given can contain 
stateful buffers. Avoid
+   * using it after calling this function. If you plan to use it again outside 
the returned {@link
+   * RandomVectorScorer}, think about passing a copied version ({@link
+   * RandomAccessVectorValues#copy}).
+   *
+   * @param vectors the underlying storage for vectors
+   * @param similarityFunction the similarity function to score vectors
+   */
+  static RandomVectorScorerProvider createFloats(
+  final RandomAccessVectorValues vectors,
+  final VectorSimilarityFunction similarityFunction)
+  throws IOException {
+final RandomAccessVectorValues vectorsCopy = vectors.copy();
+return queryOrd ->
+(RandomVectorScorer)
+cand ->
+similarityFunction.compare(
+vectors.vectorValue(queryOrd), 
vectorsCopy.vectorValue(cand));

Review Comment:
   I'm trying limit the number of copies we make. In this model, we only make 
one copy for the entire supplier. We depend on the fact that when we call 
vectors.vectorValue twice in a row with the same order, it will use the 
previous value.
   
   Your suggestion would result in making a copy every time the supplier 
creates a scorer. Since we create a scorer every time we diversify a node, I 
believe this would have a noticeable impact.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub


jimczi commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1320004511


##
lucene/core/src/java/org/apache/lucene/codecs/lucene95/OffHeapByteVectorValues.java:
##
@@ -60,13 +61,17 @@ public int size() {
 
   @Override
   public byte[] vectorValue(int targetOrd) throws IOException {
-readValue(targetOrd);
+if (lastOrd != targetOrd) {
+  readValue(targetOrd);
+  lastOrd = targetOrd;
+}
 return binaryValue;
   }
 
   private void readValue(int targetOrd) throws IOException {
 slice.seek((long) targetOrd * byteSize);
 slice.readBytes(byteBuffer.array(), byteBuffer.arrayOffset(), byteSize);
+lastOrd = targetOrd;

Review Comment:
   thanks, addressed in 
https://github.com/apache/lucene/pull/12529/commits/31d8b549e62a288358c58cd666767d09bf17f65c



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub


jimczi commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1320004724


##
lucene/core/src/java/org/apache/lucene/util/hnsw/RandomVectorScorerProvider.java:
##
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.util.hnsw;
+
+import java.io.IOException;
+import org.apache.lucene.index.VectorSimilarityFunction;
+
+/** A provider that creates {@link RandomVectorScorer} from an ordinal. */
+public interface RandomVectorScorerProvider {

Review Comment:
   Addressed in 
https://github.com/apache/lucene/pull/12529/commits/31d8b549e62a288358c58cd666767d09bf17f65c



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jimczi commented on a diff in pull request #12529: Introduce a random vector scorer in HNSW builder/searcher

2023-09-08 Thread via GitHub


jimczi commented on code in PR #12529:
URL: https://github.com/apache/lucene/pull/12529#discussion_r1320004018


##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene91/Lucene91HnswVectorsReader.java:
##
@@ -42,9 +41,7 @@
 import org.apache.lucene.util.Bits;
 import org.apache.lucene.util.IOUtils;
 import org.apache.lucene.util.RamUsageEstimator;
-import org.apache.lucene.util.hnsw.HnswGraph;
-import org.apache.lucene.util.hnsw.HnswGraphSearcher;
-import org.apache.lucene.util.hnsw.RandomAccessVectorValues;
+import org.apache.lucene.util.hnsw.*;

Review Comment:
   I have to remove this automatic rule from my idea. I upgraded to a new 
version and forgot to copy my previous settings. Thanks for spotting!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on pull request #12489: Add support for recursive graph bisection.

2023-09-08 Thread via GitHub


mikemccand commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1712097368

   @jpountz did you measure any change to index size with the reordered docids?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #12489: Add support for recursive graph bisection.

2023-09-08 Thread via GitHub


jpountz commented on PR #12489:
URL: https://github.com/apache/lucene/pull/12489#issuecomment-1712166542

   I did. My wikimedium file is sorted by title, which already gives some 
compression compared to random ordering. Disappointedly, recursive graph 
bisection only improved compression of postings (doc) by 1.5%. It significantly 
hurts stored fields though, I suspect it's because the `title` field is stored, 
and stored fields take advantage of splits of the same article being next to 
one another.
   
   | File | before (MB) | after (MB) |
   | - | - | - |
   | terms (tim) | 307 |315 |
   | postings (doc) | 1706 | 1685 |
   | positions (pos) | 2563 | 2540 |
   | points (kdd) | 122 | 126 |
   | doc values (dvd) | 686 | 693 |
   | stored fields (fdt) | 255 | 364 |
   | norms (nvd) | 20 | 20 |
   | total | 5664 |5747 |
   
   It gave me doubts whether the algorithm was correctly implemented in the 
beginning, but the query speedups suggest it is not completely wrong.
   
   I should run on wikibigall too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna opened a new pull request, #12544: Close index readers in tests

2023-09-08 Thread via GitHub


javanna opened a new pull request, #12544:
URL: https://github.com/apache/lucene/pull/12544

   There are a few places where tests don't close index readers. This has not 
caused problems so far, but it becomes an issue when the reader gets an 
executor, because its shutdown happens as a closing listener of the reader. 
This has become more evident since we now offload sequential execution to the 
executor. If there's an executor, but it's never used, no threads are created, 
and no threads are leaked. If we do use the executor, and the reader is not 
closed, the test leaks threads.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on a diff in pull request #12544: Close index readers in tests

2023-09-08 Thread via GitHub


jpountz commented on code in PR #12544:
URL: https://github.com/apache/lucene/pull/12544#discussion_r1319678371


##
lucene/core/src/test/org/apache/lucene/search/TestSort.java:
##
@@ -849,11 +849,10 @@ public void testMultiSort() throws IOException {
   }
 
   public void testRewrite() throws IOException {
-try (Directory dir = newDirectory()) {
-  RandomIndexWriter writer = new RandomIndexWriter(random(), dir);
-  IndexSearcher searcher = newSearcher(writer.getReader());
-  writer.close();
-
+try (Directory dir = newDirectory();
+RandomIndexWriter writer = new RandomIndexWriter(random(), dir);
+DirectoryReader reader = writer.getReader()) {
+  IndexSearcher searcher = newSearcher(reader);

Review Comment:
   Oh good catch. I think `MockDirectoryWrapper` would generally catch such 
problems, except here because the index is empty, so no files get open?



##
lucene/monitor/src/test/org/apache/lucene/monitor/TestForceNoBulkScoringQuery.java:
##
@@ -60,18 +60,18 @@ public void testRewrite() throws IOException {
   iw.addDocument(doc);
   iw.commit();
 
-  IndexReader reader = DirectoryReader.open(dir);
+  try (IndexReader reader = DirectoryReader.open(dir)) {

Review Comment:
   Let's use a `newDirectory()` instead of `new ByteBuffersDirectory()` to get 
additional checks enabled?



##
lucene/queries/src/test/org/apache/lucene/queries/payloads/TestPayloadSpans.java:
##
@@ -60,15 +60,22 @@ public class TestPayloadSpans extends LuceneTestCase {
   protected IndexReader indexReader;
   private IndexReader closeIndexReader;
   private Directory directory;
+  private PayloadHelper helper;
 
   @Override
   public void setUp() throws Exception {
 super.setUp();
-PayloadHelper helper = new PayloadHelper();
+helper = new PayloadHelper();
 searcher = helper.setUp(random(), similarity, 1000);
 indexReader = searcher.getIndexReader();
   }
 
+  @Override
+  public void tearDown() throws Exception {
+super.tearDown();
+helper.tearDown();

Review Comment:
   nit: do `helper.tearDown()` before `super.tearDown()`, ie. release resources 
that are specific to this test case before other resources?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna commented on a diff in pull request #12544: Close index readers in tests

2023-09-08 Thread via GitHub


javanna commented on code in PR #12544:
URL: https://github.com/apache/lucene/pull/12544#discussion_r1319723656


##
lucene/core/src/test/org/apache/lucene/search/TestSort.java:
##
@@ -849,11 +849,10 @@ public void testMultiSort() throws IOException {
   }
 
   public void testRewrite() throws IOException {
-try (Directory dir = newDirectory()) {
-  RandomIndexWriter writer = new RandomIndexWriter(random(), dir);
-  IndexSearcher searcher = newSearcher(writer.getReader());
-  writer.close();
-
+try (Directory dir = newDirectory();
+RandomIndexWriter writer = new RandomIndexWriter(random(), dir);
+DirectoryReader reader = writer.getReader()) {
+  IndexSearcher searcher = newSearcher(reader);

Review Comment:
   I am not too sure, I caught this by creating an executor at all times. Then 
all tests that use newSearcher and have a cache reader helper need their reader 
closed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org