[I] TestIDVersionPostingsFormat.testGlobalVersions FAILED maxSeqNo must be greater or equal to 4204 but was 4203 [lucene]
ChrisHegarty opened a new issue, #13446: URL: https://github.com/apache/lucene/issues/13446

This doesn't reproduce for me, even with several hundred iterations of the test, but I'm filing this issue to track any observed failures.

```
./gradlew test --tests TestIDVersionPostingsFormat.testGlobalVersions -Dtests.seed=6C67E4EA11B6B5D -Dtests.locale=ne-Deva-NP -Dtests.timezone=Asia/Dushanbe -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.iters=100
```

```
org.apache.lucene.sandbox.codecs.idversion.TestIDVersionPostingsFormat > testGlobalVersions FAILED
    java.lang.AssertionError: maxSeqNo must be greater or equal to 4204 but was 4203
        at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.DocumentsWriterDeleteQueue.close(DocumentsWriterDeleteQueue.java:325)
        at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:662)
        at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:577)
        at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:382)
        at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:356)
        at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:346)
        at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170)
        at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:144)
        at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:52)
        at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167)
        at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213)
        at org.apache.lucene.sandbox.codecs.idversion.TestIDVersionPostingsFormat.testGlobalVersions(TestIDVersionPostingsFormat.java:907)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        ...
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] TestIDVersionPostingsFormat.testGlobalVersions FAILED maxSeqNo must be greater or equal to 4204 but was 4203 [lucene]
benwtrent commented on issue #13446: URL: https://github.com/apache/lucene/issues/13446#issuecomment-2143853261 Closing as duplicate: https://github.com/apache/lucene/issues/13127
Re: [I] TestIDVersionPostingsFormat.testGlobalVersions FAILED maxSeqNo must be greater or equal to 4204 but was 4203 [lucene]
benwtrent closed issue #13446: TestIDVersionPostingsFormat.testGlobalVersions FAILED maxSeqNo must be greater or equal to 4204 but was 4203 URL: https://github.com/apache/lucene/issues/13446
Re: [PR] Improve BaseRangeFieldQueryTestCase#verify failure output [lucene]
github-actions[bot] commented on PR #13382: URL: https://github.com/apache/lucene/pull/13382#issuecomment-2144077619 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] Rewrite LongRange.ValueSourceQuery/MultiValueSourceQuery to FieldExistsQuery on max range [lucene]
github-actions[bot] commented on PR #13383: URL: https://github.com/apache/lucene/pull/13383#issuecomment-2144077592 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] Convert more classes to record classes [lucene]
shubhamvishu commented on code in PR #13328: URL: https://github.com/apache/lucene/pull/13328#discussion_r1623687790

## lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/GeneratingSuggester.java: ##

    @@ -447,14 +446,8 @@ private static int commonCharacterPositionScore(String s1, String s2) {
         return commonScore;
       }
     
    -  private static class Weighted<T extends Comparable<T>> implements Comparable<Weighted<T>> {
    -    final T word;
    -    final int score;
    -
    -    Weighted(T word, int score) {
    -      this.word = word;
    -      this.score = score;
    -    }
    +  private record Weighted<T extends Comparable<T>>(T word, int score)
    +      implements Comparable<Weighted<T>> {
     
         @Override
         public boolean equals(Object o) {

Review Comment: Removed
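For readers following the record conversion: a record derives `equals`, `hashCode`, and `toString` from its components, so a hand-written `equals` that compares the same fields is usually redundant. Below is a minimal standalone sketch of a record shaped like `Weighted`; the `compareTo` ordering (higher score first, ties broken by word) is an assumption for illustration, not necessarily what `GeneratingSuggester` does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: mirrors the shape of the record in the diff above.
// The ordering rule (descending score, then ascending word) is an assumption.
record Weighted<T extends Comparable<T>>(T word, int score)
    implements Comparable<Weighted<T>> {

  @Override
  public int compareTo(Weighted<T> other) {
    int cmp = Integer.compare(other.score, score); // higher score sorts first
    return cmp != 0 ? cmp : word.compareTo(other.word);
  }
}

class WeightedDemo {
  public static void main(String[] args) {
    // equals/hashCode are generated from (word, score); no override needed.
    Weighted<String> a = new Weighted<>("apple", 3);
    Weighted<String> b = new Weighted<>("apple", 3);
    System.out.println(a.equals(b));                  // true
    System.out.println(a.hashCode() == b.hashCode()); // true

    List<Weighted<String>> list =
        new ArrayList<>(
            List.of(new Weighted<>("pear", 1), new Weighted<>("fig", 5), new Weighted<>("apple", 5)));
    Collections.sort(list);
    System.out.println(list.get(0).word()); // apple (score 5, alphabetically before fig)
  }
}
```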
Re: [PR] Convert more classes to record classes [lucene]
shubhamvishu commented on code in PR #13328: URL: https://github.com/apache/lucene/pull/13328#discussion_r1623688124

## lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/word2vec/TermAndBoost.java: ##

    @@ -18,14 +18,13 @@
     
     import org.apache.lucene.util.BytesRef;
     
    -/** Wraps a term and boost */
    -public class TermAndBoost {
    -  /** the term */
    -  public final BytesRef term;
    -
    -  /** the boost */
    -  public final float boost;
    -
    +/**
    + * Wraps a term and boost
    + *
    + * @param term the term
    + * @param boost the boost
    + */
    +public record TermAndBoost(BytesRef term, float boost) {
       /** Creates a new TermAndBoost */
       public TermAndBoost(BytesRef term, float boost) {

Review Comment: Done! I changed other such occurrences that I could find as well.
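Relatedly, a record gets an implicit canonical constructor, so an explicit one that only assigns the fields can be dropped; a compact constructor is only worth keeping when it validates or normalizes its arguments. In the sketch below, `String` stands in for `BytesRef` so it compiles without Lucene, and the null check is a hypothetical example of such logic. Callers also switch from field access (`tb.term`) to accessor calls (`tb.term()`).

```java
import java.util.Objects;

// Sketch: String stands in for Lucene's BytesRef so this compiles standalone.
record TermAndBoost(String term, float boost) {
  // Compact constructor: runs before the implicit field assignments.
  // The null check is a hypothetical example of logic worth keeping;
  // a constructor that merely assigns the fields can simply be deleted.
  TermAndBoost {
    Objects.requireNonNull(term, "term");
  }
}

class TermAndBoostDemo {
  public static void main(String[] args) {
    TermAndBoost tb = new TermAndBoost("lucene", 1.5f);
    // Former public fields become accessor methods.
    System.out.println(tb.term() + " " + tb.boost()); // lucene 1.5
  }
}
```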
Re: [PR] Convert more classes to record classes [lucene]
shubhamvishu commented on code in PR #13328: URL: https://github.com/apache/lucene/pull/13328#discussion_r1623688408

## lucene/codecs/src/java/org/apache/lucene/codecs/blocktreeords/OrdsBlockTreeTermsWriter.java: ##

    @@ -139,18 +139,17 @@ public final class OrdsBlockTreeTermsWriter extends FieldsConsumer {
       final PostingsWriterBase postingsWriter;
       final FieldInfos fieldInfos;
     
    -  private static class FieldMetaData {
    -    public final FieldInfo fieldInfo;
    -    public final Output rootCode;
    -    public final long numTerms;
    -    public final long indexStartFP;
    -    public final long sumTotalTermFreq;
    -    public final long sumDocFreq;
    -    public final int docCount;
    -    public final BytesRef minTerm;
    -    public final BytesRef maxTerm;
    -
    -    public FieldMetaData(
    +  private record FieldMetaData(

Review Comment: Addressed in the new revision.
Re: [PR] Convert more classes to record classes [lucene]
shubhamvishu commented on code in PR #13328: URL: https://github.com/apache/lucene/pull/13328#discussion_r1623688659

## lucene/core/src/java/org/apache/lucene/document/NearestNeighbor.java: ##

    @@ -34,20 +34,16 @@
     
     /** KNN search on top of 2D lat/lon indexed points. */
     class NearestNeighbor {
    -  static class Cell implements Comparable<Cell> {
    -    final int readerIndex;
    -    final byte[] minPacked;
    -    final byte[] maxPacked;
    -    final PointTree index;
    -
    -    /**
    -     * The closest distance from a point in this cell to the query point, computed as a sort key
    -     * through {@link SloppyMath#haversinSortKey}. Note that this is an approximation to the closest
    -     * distance, and there could be a point in the cell that is closer.
    -     */
    -    final double distanceSortKey;
    -
    -    public Cell(
    +  /**
    +   * @param distanceSortKey The closest distance from a point in this cell to the query point,
    +   *     computed as a sort key through {@link SloppyMath#haversinSortKey}. Note that this is an
    +   *     approximation to the closest distance, and there could be a point in the cell that is
    +   *     closer.
    +   */
    +  record Cell(
    +      PointTree index, int readerIndex, byte[] minPacked, byte[] maxPacked, double distanceSortKey)
    +      implements Comparable<Cell> {
    +    Cell(

Review Comment: Done
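One subtlety worth keeping in mind with `Cell`: a record's generated `equals` and `hashCode` compare array components such as `minPacked` and `maxPacked` by reference, not by content. The sketch below drops the `PointTree` component so it compiles standalone and assumes cells are ordered by `distanceSortKey`, as a nearest-neighbor priority queue would typically need; both are assumptions, not the actual Lucene code.

```java
import java.util.PriorityQueue;

// Sketch of a record shaped like Cell, minus the PointTree component
// (omitted so this compiles standalone). Ordering by distanceSortKey is assumed.
record Cell(int readerIndex, byte[] minPacked, byte[] maxPacked, double distanceSortKey)
    implements Comparable<Cell> {

  @Override
  public int compareTo(Cell other) {
    return Double.compare(distanceSortKey, other.distanceSortKey);
  }
}

class CellDemo {
  public static void main(String[] args) {
    PriorityQueue<Cell> queue = new PriorityQueue<>();
    queue.add(new Cell(0, new byte[] {1}, new byte[] {2}, 7.5));
    queue.add(new Cell(1, new byte[] {3}, new byte[] {4}, 0.25));
    System.out.println(queue.poll().readerIndex()); // 1: closest cell first

    // Caveat: generated equals() compares array components by reference,
    // so two cells holding equal bytes in distinct arrays are NOT equal.
    Cell a = new Cell(0, new byte[] {1}, new byte[] {2}, 1.0);
    Cell b = new Cell(0, new byte[] {1}, new byte[] {2}, 1.0);
    System.out.println(a.equals(b)); // false
  }
}
```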
Re: [PR] Convert more classes to record classes [lucene]
shubhamvishu commented on PR #13328: URL: https://github.com/apache/lucene/pull/13328#issuecomment-2144096151 Hi @uschindler, thanks for the review. I took another pass at the changes and pushed some commits addressing your comments. Please have a look when you get a chance. Thanks!
Re: [PR] Fix test failure on TestPoint#testEqualsAndHashCode [lucene]
easyice commented on PR #13433: URL: https://github.com/apache/lucene/pull/13433#issuecomment-2144176070 Thanks for reviewing, Mike!
Re: [PR] Fix test failure on TestPoint#testEqualsAndHashCode [lucene]
easyice merged PR #13433: URL: https://github.com/apache/lucene/pull/13433
Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]
vsop-479 commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1623732526

## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ##

    @@ -307,6 +309,30 @@ private boolean setEOF() {
         return true;
       }
     
    +  @Override
    +  public void prepareSeekExact(BytesRef target) throws IOException {
    +    if (fr.index == null) {
    +      throw new IllegalStateException("terms index was not loaded");
    +    }
    +
    +    if (fr.size() == 0 || target.compareTo(fr.getMin()) < 0 || target.compareTo(fr.getMax()) > 0) {
    +      return;
    +    }
    +
    +    // TODO: should we try to reuse the current state of this terms enum when applicable?
    +    BytesRefFSTEnum<BytesRef> indexEnum = new BytesRefFSTEnum<>(fr.index);
    +    InputOutput<BytesRef> output = indexEnum.seekFloor(target);
    +    if (output != null) { // should never be null since we already checked against fr.getMin()?
    +      final long code =
    +          fr.readVLongOutput(
    +              new ByteArrayDataInput(
    +                  output.output.bytes, output.output.offset, output.output.length));
    +      final long fpSeek = code >>> Lucene90BlockTreeTermsReader.OUTPUT_FLAGS_NUM_BITS;
    +      initIndexInput();
    +      in.prefetch(fpSeek, 1); // TODO: could we know the length of the block?

Review Comment: Actually, I was mistaken: I thought it was `SegmentTermsEnum` we were talking about. Sorry about that ;)

> But for non-leaf blocks, first all leaf blocks under them are written (in order), and THEN the non-leaf block is written only when we are done with all those recursions and writing any straggler terms that live in the non-leaf block.

Does this mean that if we take the difference between the `fp` of a non-leaf block and that of the next block, we get the total length of its sub-blocks?
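The `code >>> Lucene90BlockTreeTermsReader.OUTPUT_FLAGS_NUM_BITS` step in `prepareSeekExact` unpacks a block's file pointer from a value whose low bits carry per-block flags. A standalone sketch of that kind of packing follows; the value `2` and the assignment of the floor and has-terms bits are assumptions for illustration, and `BlockCode` is a hypothetical helper, not a Lucene class.

```java
// Hypothetical helper: the file pointer lives in the high bits, flags in the low bits.
// The bit count and flag assignments below are assumptions for illustration.
final class BlockCode {
  static final int OUTPUT_FLAGS_NUM_BITS = 2; // assumed
  static final long FLAG_IS_FLOOR = 0x1;      // assumed bit assignment
  static final long FLAG_HAS_TERMS = 0x2;     // assumed bit assignment

  static long encode(long fp, boolean hasTerms, boolean isFloor) {
    return (fp << OUTPUT_FLAGS_NUM_BITS)
        | (hasTerms ? FLAG_HAS_TERMS : 0)
        | (isFloor ? FLAG_IS_FLOOR : 0);
  }

  // Same unsigned shift as in prepareSeekExact: drops the flag bits.
  static long filePointer(long code) {
    return code >>> OUTPUT_FLAGS_NUM_BITS;
  }

  static boolean isFloor(long code) {
    return (code & FLAG_IS_FLOOR) != 0;
  }

  static boolean hasTerms(long code) {
    return (code & FLAG_HAS_TERMS) != 0;
  }

  public static void main(String[] args) {
    long code = encode(4096L, true, false);
    System.out.println(code);                                 // 16386
    System.out.println(filePointer(code));                    // 4096
    System.out.println(hasTerms(code) + " " + isFloor(code)); // true false
  }
}
```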
Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2144392824 Attaching a [preliminary PR](https://github.com/apache/lucene/pull/13409) for the POC related to the above issue, to share my understanding. Please note that this is not the final PR.
Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]
jpountz commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2144402163

> However, implementing this approach would lead to significant overhead on the client side (such as OpenSearch), both in terms of code changes and operational overhead like metadata management.

Can you give more details? The main difference that comes to mind is that using multiple `IndexWriter`s requires multiple `Directory`s as well, and OpenSearch may have a strong assumption that there is a 1:1 mapping between shards and folders on disk. But couldn't this be worked around with a filter `Directory` that flags each index file with a prefix identifying the group it belongs to?
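A minimal sketch of the file-naming side of that suggestion, assuming a hypothetical `GroupFileNames` helper and a `.` separator that never appears in group ids. In Lucene the mapping would presumably sit in a `FilterDirectory` subclass that rewrites names in methods such as `createOutput`, `openInput`, `listAll`, and `deleteFile`; that wiring is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical naming scheme: several IndexWriters share one on-disk folder,
// and each physical file name carries a prefix naming its group.
final class GroupFileNames {
  private static final char SEP = '.'; // assumes group ids never contain '.'

  static String toPhysical(String group, String logicalName) {
    if (group.indexOf(SEP) >= 0) {
      throw new IllegalArgumentException("group id must not contain '" + SEP + "'");
    }
    return group + SEP + logicalName;
  }

  /** Returns the logical names of the files that belong to {@code group}. */
  static List<String> listGroup(String group, List<String> physicalNames) {
    String prefix = group + SEP; // including SEP keeps "g1" from matching "g10"
    List<String> result = new ArrayList<>();
    for (String name : physicalNames) {
      if (name.startsWith(prefix)) {
        result.add(name.substring(prefix.length()));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> onDisk =
        List.of(
            toPhysical("g1", "_0.cfs"), toPhysical("g1", "segments_2"), toPhysical("g2", "_0.cfs"));
    System.out.println(onDisk);                  // [g1._0.cfs, g1.segments_2, g2._0.cfs]
    System.out.println(listGroup("g1", onDisk)); // [_0.cfs, segments_2]
  }
}
```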