[I] TestIDVersionPostingsFormat.testGlobalVersions FAILED maxSeqNo must be greater or equal to 4204 but was 4203 [lucene]

2024-06-02 Thread via GitHub


ChrisHegarty opened a new issue, #13446:
URL: https://github.com/apache/lucene/issues/13446

   This doesn't reproduce for me, even after several hundred iterations of the 
test, but I'm filing this issue to track any observed failures.
   
   ```
   ./gradlew test --tests TestIDVersionPostingsFormat.testGlobalVersions -Dtests.seed=6C67E4EA11B6B5D -Dtests.locale=ne-Deva-NP -Dtests.timezone=Asia/Dushanbe -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.iters=100
   ```
   
   ```
   org.apache.lucene.sandbox.codecs.idversion.TestIDVersionPostingsFormat > testGlobalVersions FAILED
   java.lang.AssertionError: maxSeqNo must be greater or equal to 4204 but was 4203
   at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.DocumentsWriterDeleteQueue.close(DocumentsWriterDeleteQueue.java:325)
   at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:662)
   at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:577)
   at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:382)
   at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:356)
   at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:346)
   at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170)
   at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:144)
   at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:52)
   at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167)
   at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213)
   at org.apache.lucene.sandbox.codecs.idversion.TestIDVersionPostingsFormat.testGlobalVersions(TestIDVersionPostingsFormat.java:907)
   at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
   at java.base/java.lang.reflect.Method.invoke(Method.java:580)
   ...
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] TestIDVersionPostingsFormat.testGlobalVersions FAILED maxSeqNo must be greater or equal to 4204 but was 4203 [lucene]

2024-06-02 Thread via GitHub


benwtrent commented on issue #13446:
URL: https://github.com/apache/lucene/issues/13446#issuecomment-2143853261

   Closing as duplicate: https://github.com/apache/lucene/issues/13127





Re: [I] TestIDVersionPostingsFormat.testGlobalVersions FAILED maxSeqNo must be greater or equal to 4204 but was 4203 [lucene]

2024-06-02 Thread via GitHub


benwtrent closed issue #13446: TestIDVersionPostingsFormat.testGlobalVersions FAILED maxSeqNo must be greater or equal to 4204 but was 4203
URL: https://github.com/apache/lucene/issues/13446





Re: [PR] Improve BaseRangeFieldQueryTestCase#verify failure output [lucene]

2024-06-02 Thread via GitHub


github-actions[bot] commented on PR #13382:
URL: https://github.com/apache/lucene/pull/13382#issuecomment-2144077619

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!





Re: [PR] Rewrite LongRange.ValueSourceQuery/MultiValueSourceQuery to FieldExistsQuery on max range [lucene]

2024-06-02 Thread via GitHub


github-actions[bot] commented on PR #13383:
URL: https://github.com/apache/lucene/pull/13383#issuecomment-2144077592

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!





Re: [PR] Convert more classes to record classes [lucene]

2024-06-02 Thread via GitHub


shubhamvishu commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1623687790


##
lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java:
##
@@ -447,14 +446,8 @@ private static int commonCharacterPositionScore(String s1, String s2) {
     return commonScore;
   }
 
-  private static class Weighted<T extends Comparable<T>> implements Comparable<Weighted<T>> {
-    final T word;
-    final int score;
-
-    Weighted(T word, int score) {
-      this.word = word;
-      this.score = score;
-    }
+  private record Weighted<T extends Comparable<T>>(T word, int score)
+      implements Comparable<Weighted<T>> {
 
     @Override
     public boolean equals(Object o) {

Review Comment:
   Removed
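
   For context, a record derives `equals`, `hashCode`, and `toString` from its 
components, which is why a hand-written `equals` can become redundant after 
such a conversion. The following is a minimal sketch, not the code from the 
PR: `WeightedRecordDemo` is a hypothetical wrapper class, and the generic 
bounds and the ordering used in `compareTo` are assumptions made for 
illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; only the shape of Weighted mirrors the diff above.
class WeightedRecordDemo {

  // Assumed generic bounds: the word type must itself be comparable.
  record Weighted<T extends Comparable<T>>(T word, int score)
      implements Comparable<Weighted<T>> {

    // Illustrative ordering (an assumption): by score, ties broken by word.
    @Override
    public int compareTo(Weighted<T> other) {
      int cmp = Integer.compare(score, other.score);
      return cmp != 0 ? cmp : word.compareTo(other.word);
    }
  }

  public static void main(String[] args) {
    Weighted<String> a = new Weighted<>("lucene", 3);
    Weighted<String> b = new Weighted<>("lucene", 3);

    // equals/hashCode are generated from (word, score); no override is needed.
    System.out.println(a.equals(b));
    System.out.println(a.hashCode() == b.hashCode());

    // The record still participates in sorting through compareTo.
    List<Weighted<String>> list = new ArrayList<>();
    list.add(new Weighted<>("b", 2));
    list.add(new Weighted<>("a", 1));
    Collections.sort(list);
    System.out.println(list.get(0).word());
  }
}
```

   An explicit `equals` is still worth keeping when equality should differ 
from component-wise comparison, for example when a component is an array.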






Re: [PR] Convert more classes to record classes [lucene]

2024-06-02 Thread via GitHub


shubhamvishu commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1623688124


##
lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/TermAndBoost.java:
##
@@ -18,14 +18,13 @@
 
 import org.apache.lucene.util.BytesRef;
 
-/** Wraps a term and boost */
-public class TermAndBoost {
-  /** the term */
-  public final BytesRef term;
-
-  /** the boost */
-  public final float boost;
-
+/**
+ * Wraps a term and boost
+ *
+ * @param term the term
+ * @param boost the boost
+ */
+public record TermAndBoost(BytesRef term, float boost) {
   /** Creates a new TermAndBoost */
   public TermAndBoost(BytesRef term, float boost) {

Review Comment:
   Done! I also changed the other such occurrences that I could find.
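
   As background for this hunk: a record already declares a canonical 
constructor that assigns every component, so an explicit constructor that only 
copies its arguments adds nothing. A compact constructor is enough when 
validation is wanted. This is a minimal sketch under stated assumptions: 
`TermAndBoostDemo` is a hypothetical wrapper, `String` stands in for `BytesRef` 
to keep the example self-contained, and the null check is illustrative rather 
than taken from the PR.

```java
import java.util.Objects;

// Hypothetical demo class, not the Lucene source.
class TermAndBoostDemo {

  /**
   * Wraps a term and boost.
   *
   * @param term the term (a String here; the real class uses BytesRef)
   * @param boost the boost
   */
  record TermAndBoost(String term, float boost) {
    // Compact canonical constructor: runs before the implicit field
    // assignments, so it only needs to hold validation logic.
    TermAndBoost {
      Objects.requireNonNull(term, "term");
    }
  }

  public static void main(String[] args) {
    TermAndBoost tb = new TermAndBoost("search", 2.0f);
    // Components are read through accessor methods, not public fields.
    System.out.println(tb.term() + " " + tb.boost());
    System.out.println(tb);
  }
}
```

   Note that converting a class with public final fields to a record changes 
call sites from `tb.term` to `tb.term()`.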






Re: [PR] Convert more classes to record classes [lucene]

2024-06-02 Thread via GitHub


shubhamvishu commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1623688408


##
lucene/codecs/src/java/org/apache/lucene/codecs/blocktreeords/OrdsBlockTreeTermsWriter.java:
##
@@ -139,18 +139,17 @@ public final class OrdsBlockTreeTermsWriter extends FieldsConsumer {
   final PostingsWriterBase postingsWriter;
   final FieldInfos fieldInfos;
 
-  private static class FieldMetaData {
-    public final FieldInfo fieldInfo;
-    public final Output rootCode;
-    public final long numTerms;
-    public final long indexStartFP;
-    public final long sumTotalTermFreq;
-    public final long sumDocFreq;
-    public final int docCount;
-    public final BytesRef minTerm;
-    public final BytesRef maxTerm;
-
-    public FieldMetaData(
+  private record FieldMetaData(

Review Comment:
   addressed in the new revision






Re: [PR] Convert more classes to record classes [lucene]

2024-06-02 Thread via GitHub


shubhamvishu commented on code in PR #13328:
URL: https://github.com/apache/lucene/pull/13328#discussion_r1623688659


##
lucene/core/src/java/org/apache/lucene/document/NearestNeighbor.java:
##
@@ -34,20 +34,16 @@
 /** KNN search on top of 2D lat/lon indexed points. */
 class NearestNeighbor {
 
-  static class Cell implements Comparable<Cell> {
-    final int readerIndex;
-    final byte[] minPacked;
-    final byte[] maxPacked;
-    final PointTree index;
-
-    /**
-     * The closest distance from a point in this cell to the query point, computed as a sort key
-     * through {@link SloppyMath#haversinSortKey}. Note that this is an approximation to the closest
-     * distance, and there could be a point in the cell that is closer.
-     */
-    final double distanceSortKey;
-
-    public Cell(
+  /**
+   * @param distanceSortKey The closest distance from a point in this cell to the query point,
+   *     computed as a sort key through {@link SloppyMath#haversinSortKey}. Note that this is an
+   *     approximation to the closest distance, and there could be a point in the cell that is
+   *     closer.
+   */
+  record Cell(
+      PointTree index, int readerIndex, byte[] minPacked, byte[] maxPacked, double distanceSortKey)
+      implements Comparable<Cell> {
+    Cell(

Review Comment:
   done






Re: [PR] Convert more classes to record classes [lucene]

2024-06-02 Thread via GitHub


shubhamvishu commented on PR #13328:
URL: https://github.com/apache/lucene/pull/13328#issuecomment-2144096151

   Hi @uschindler , thanks for the review. I took another pass at the changes 
and pushed some commits addressing your comments. Please have a look when you 
get a chance. Thanks!





Re: [PR] Fix test failure on TestPoint#testEqualsAndHashCode [lucene]

2024-06-02 Thread via GitHub


easyice commented on PR #13433:
URL: https://github.com/apache/lucene/pull/13433#issuecomment-2144176070

   Thanks for reviewing, Mike!





Re: [PR] Fix test failure on TestPoint#testEqualsAndHashCode [lucene]

2024-06-02 Thread via GitHub


easyice merged PR #13433:
URL: https://github.com/apache/lucene/pull/13433





Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-06-02 Thread via GitHub


vsop-479 commented on code in PR #13359:
URL: https://github.com/apache/lucene/pull/13359#discussion_r1623732526


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##
@@ -307,6 +309,30 @@ private boolean setEOF() {
     return true;
   }
 
+  @Override
+  public void prepareSeekExact(BytesRef target) throws IOException {
+    if (fr.index == null) {
+      throw new IllegalStateException("terms index was not loaded");
+    }
+
+    if (fr.size() == 0 || target.compareTo(fr.getMin()) < 0 || target.compareTo(fr.getMax()) > 0) {
+      return;
+    }
+
+    // TODO: should we try to reuse the current state of this terms enum when applicable?
+    BytesRefFSTEnum<BytesRef> indexEnum = new BytesRefFSTEnum<>(fr.index);
+    InputOutput<BytesRef> output = indexEnum.seekFloor(target);
+    if (output != null) { // should never be null since we already checked against fr.getMin()?
+      final long code =
+          fr.readVLongOutput(
+              new ByteArrayDataInput(
+                  output.output.bytes, output.output.offset, output.output.length));
+      final long fpSeek = code >>> Lucene90BlockTreeTermsReader.OUTPUT_FLAGS_NUM_BITS;
+      initIndexInput();
+      in.prefetch(fpSeek, 1); // TODO: could we know the length of the block?

Review Comment:
   Actually, I was mistaken: I thought we were talking about 
`SegmentTermsEnum`. Sorry ;)
   
   > But for a non-leaf blocks, first all leaf blocks under them are written 
(in order), and THEN the non-leaf block is written only when we are done with 
all those recursions and writing any straggler terms that live in the non-leaf 
block.
   
   Does this mean that subtracting the `fp` of a non-leaf block from the `fp` 
of the next block gives the total length of its sub-blocks?
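
   One way to reason about this question is a toy model of the write order 
described in the quote: children are written first, then the block itself, and 
each block records the file pointer where it starts. The sketch below is only 
that toy model. `BlockOffsetsDemo`, `Block`, and `Written` are hypothetical 
names, the block sizes are made up, and nothing here reads or reflects the 
actual terms dictionary format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a post-order block write; not Lucene code.
class BlockOffsetsDemo {

  // A block: a name, the byte length of its own entries, and child blocks.
  record Block(String name, int ownLength, List<Block> children) {}

  // One written block: where it starts in the file and how long it is.
  record Written(String name, long fp, int length) {}

  // Post-order write: all children first, then the block itself.
  // Returns the file pointer just past the last byte written.
  static long write(Block block, long fp, List<Written> out) {
    for (Block child : block.children()) {
      fp = write(child, fp, out);
    }
    out.add(new Written(block.name(), fp, block.ownLength()));
    return fp + block.ownLength();
  }

  public static void main(String[] args) {
    Block root =
        new Block(
            "root",
            10,
            List.of(new Block("leafA", 30, List.of()), new Block("leafB", 20, List.of())));

    List<Written> out = new ArrayList<>();
    long end = write(root, 0L, out);

    // For each block, print its start and the gap to the next block's start.
    for (int i = 0; i < out.size(); i++) {
      long next = i + 1 < out.size() ? out.get(i + 1).fp() : end;
      Written w = out.get(i);
      System.out.println(w.name() + " fp=" + w.fp() + " nextFp-fp=" + (next - w.fp()));
    }
  }
}
```

   In this toy model the gap between a block's `fp` and the next block's `fp` 
equals the block's own length, while its sub-blocks occupy the bytes written 
before the block's `fp`. Whether the real on-disk layout allows the same 
arithmetic is exactly the open question in this thread.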






Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-06-02 Thread via GitHub


vsop-479 commented on code in PR #13359:
URL: https://github.com/apache/lucene/pull/13359#discussion_r1623732526


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##
@@ -307,6 +309,30 @@ private boolean setEOF() {
     return true;
   }
 
+  @Override
+  public void prepareSeekExact(BytesRef target) throws IOException {
+    if (fr.index == null) {
+      throw new IllegalStateException("terms index was not loaded");
+    }
+
+    if (fr.size() == 0 || target.compareTo(fr.getMin()) < 0 || target.compareTo(fr.getMax()) > 0) {
+      return;
+    }
+
+    // TODO: should we try to reuse the current state of this terms enum when applicable?
+    BytesRefFSTEnum<BytesRef> indexEnum = new BytesRefFSTEnum<>(fr.index);
+    InputOutput<BytesRef> output = indexEnum.seekFloor(target);
+    if (output != null) { // should never be null since we already checked against fr.getMin()?
+      final long code =
+          fr.readVLongOutput(
+              new ByteArrayDataInput(
+                  output.output.bytes, output.output.offset, output.output.length));
+      final long fpSeek = code >>> Lucene90BlockTreeTermsReader.OUTPUT_FLAGS_NUM_BITS;
+      initIndexInput();
+      in.prefetch(fpSeek, 1); // TODO: could we know the length of the block?

Review Comment:
   Actually, I was mistaken: I thought we were talking about 
`SegmentTermsEnum`. Sorry about that ;)
   
   > But for a non-leaf blocks, first all leaf blocks under them are written 
(in order), and THEN the non-leaf block is written only when we are done with 
all those recursions and writing any straggler terms that live in the non-leaf 
block.
   
   Does this mean that subtracting the `fp` of a non-leaf block from the `fp` 
of the next block gives the total length of its sub-blocks?






Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-06-02 Thread via GitHub


RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2144392824

   Attaching a [preliminary PR](https://github.com/apache/lucene/pull/13409) 
for the POC related to the above issue to share my understanding. Please note 
that this is not the final PR.





Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-06-02 Thread via GitHub


RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2144396439

   Attaching a [preliminary PR](https://github.com/apache/lucene/pull/13409) 
for the POC related to the above issue to share my understanding. Please note 
that this is not the final PR.





Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-06-02 Thread via GitHub


jpountz commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2144402163

   > However, implementing this approach would lead to significant overhead on 
the client side (such as OpenSearch) both in the terms of code changes and 
operational overhead like metadata management.
   
   Can you give more details? The main difference that comes to mind is that 
using multiple `IndexWriter`s requires multiple `Directory`s as well and 
OpenSearch may have a strong assumption that there is a 1:1 mapping between 
shards and folders on disk. But this could be worked around with a filter 
`Directory` that flags each index file with a prefix that identifies the group 
that each index file belongs to?
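
   The prefixing idea can be sketched independently of Lucene as a name 
mapping between a group's local file names and the names stored in one shared 
folder. Everything in this sketch is hypothetical: `GroupPrefixDemo`, 
`GroupNames`, the `-` separator, and the sample file names are assumptions for 
illustration. A real implementation would presumably apply a mapping like this 
inside a `Directory` wrapper, which the sketch does not attempt.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of per-group file-name prefixing; not a Lucene API.
class GroupPrefixDemo {

  // Maps file names of one group to and from names in a shared folder.
  record GroupNames(String group) {
    GroupNames {
      // Forbid the separator in group names so prefixes cannot be ambiguous.
      if (group.isEmpty() || group.contains("-")) {
        throw new IllegalArgumentException("invalid group name: " + group);
      }
    }

    String prefix() {
      return group + "-";
    }

    // Name used in the shared folder for a file of this group.
    String toShared(String localName) {
      return prefix() + localName;
    }

    // Whether a file in the shared folder belongs to this group.
    boolean owns(String sharedName) {
      return sharedName.startsWith(prefix());
    }

    // Name as seen by the group's own writer.
    String toLocal(String sharedName) {
      if (!owns(sharedName)) {
        throw new IllegalArgumentException(sharedName + " is not in group " + group);
      }
      return sharedName.substring(prefix().length());
    }

    // The listing this group should see, given the shared folder's listing.
    List<String> listLocal(List<String> sharedListing) {
      return sharedListing.stream()
          .filter(this::owns)
          .map(this::toLocal)
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) {
    GroupNames g1 = new GroupNames("g1");
    List<String> shared = List.of("g1-_0.cfs", "g2-_0.cfs", "g1-segments_1");
    System.out.println(g1.toShared("_1.cfs"));
    System.out.println(g1.listLocal(shared));
  }
}
```

   With a mapping like this, each group sees only its own files while all 
groups still live in a single folder on disk, which is the property the 1:1 
shard-to-folder assumption would need.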

