ChrisHegarty opened a new issue, #13426: URL: https://github.com/apache/lucene/issues/13426
A recent change, #13406, added an assertion that may be incorrect. The assertion checks that the number of entries matches the number of inputs processed. This may not be the case when a duplicate entry is passed in.

For example, add a duplicate entry:

```
$ git diff
diff --git a/lucene/analysis/nori/src/test/org/apache/lucene/analysis/ko/userdict.txt b/lucene/analysis/nori/src/test/org/apache/lucene/analysis/ko/userdict.txt
index 045b64eaa07..4513885a36b 100644
--- a/lucene/analysis/nori/src/test/org/apache/lucene/analysis/ko/userdict.txt
+++ b/lucene/analysis/nori/src/test/org/apache/lucene/analysis/ko/userdict.txt
@@ -5,6 +5,7 @@ C샤프
 세종시 세종 시
 대한민국날씨
 대한민국
+대한민국
 날씨
 21세기대한민국
 세기
\ No newline at end of file
```

Then run the tokenizer tests:

```
$ ./gradlew :lucene:analysis:nori:test --tests "org.apache.lucene.analysis.ko.TestKoreanTokenizer"
```

The run fails with:

```
reproduce with: gradlew test --tests TestKoreanTokenizer.testPartOfSpeechsWithCompound -Dtests.seed=6445594235429961 -Dtests.locale=bs-BA -Dtests.timezone=Atlantic/Faeroe -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> java.lang.AssertionError
>     at __randomizedtesting.SeedInfo.seed([6445594235429961:9CEA448B4AB5C1F2]:0)
>     at org.apache.lucene.analysis.ko.dict.UserDictionary.<init>(UserDictionary.java:137)
>     at org.apache.lucene.analysis.ko.dict.UserDictionary.open(UserDictionary.java:69)
>     at org.apache.lucene.analysis.ko.TestKoreanTokenizer.readDict(TestKoreanTokenizer.java:51)
>     at org.apache.lucene.analysis.ko.TestKoreanTokenizer.setUp(TestKoreanTokenizer.java:63)
>     at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:580)
>     at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
>     at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:980)
>     at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
>     at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
>     at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
>     at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
>     at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
>     at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
>     at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
>     at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>     at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
>     at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
>     at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
>     at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
>     at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
>     at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
>     at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
>     at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
...
```

I hit this assertion while testing a snapshot of the Lucene branch with Elasticsearch. The testNoriAnalyzerDuplicateUserDictRuleWithLegacyVersion test fails because it trips the assertion; see https://github.com/elastic/elasticsearch/blob/main/plugins/analysis-nori/src/test/java/org/elasticsearch/plugin/analysis/nori/NoriAnalysisTests.java#L132C5-L145C1
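
To illustrate why an "entries == inputs processed" check can fail on duplicates, here is a minimal sketch. It is not the `UserDictionary` code: the class `DuplicateEntrySketch`, the method `countKept`, and the ASCII entries are hypothetical stand-ins, and it assumes that the builder skips a line whose surface form repeats the previous one after sorting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not taken from the Lucene source.
public class DuplicateEntrySketch {

  /** Counts the entries kept when lines with a repeated surface form are skipped. */
  static int countKept(List<String> lines) {
    List<String> sorted = new ArrayList<>(lines);
    Collections.sort(sorted); // sorting places identical lines next to each other
    String lastSurface = null;
    int kept = 0;
    for (String line : sorted) {
      // Assumption: the first whitespace-separated token is the surface form.
      String surface = line.split("\\s+")[0];
      if (surface.equals(lastSurface)) {
        continue; // duplicate surface form: no new entry is created
      }
      lastSurface = surface;
      kept++;
    }
    return kept;
  }

  public static void main(String[] args) {
    // ASCII stand-ins for the user dictionary lines, with "korea" listed twice.
    List<String> lines =
        List.of("sejongsi sejong si", "koreaweather", "korea", "korea", "weather");
    int kept = countKept(lines);
    // Prints "inputs=5 kept=4": any check that kept == lines.size() would fail here.
    System.out.println("inputs=" + lines.size() + " kept=" + kept);
  }
}
```

Under that assumption, a check that tolerates duplicates would compare against the number of distinct surface forms rather than the raw input count.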