ChrisHegarty opened a new issue, #13426:
URL: https://github.com/apache/lucene/issues/13426

   A recent change, #13406, added an assertion that may be incorrect.
   
   The assertion checks that the number of entries matches the number of 
inputs processed. This may not hold when a duplicate entry is passed in. 
For example,
   
   
   Add a duplicate entry:
   ```
   $ git diff
   diff --git a/lucene/analysis/nori/src/test/org/apache/lucene/analysis/ko/userdict.txt b/lucene/analysis/nori/src/test/org/apache/lucene/analysis/ko/userdict.txt
   index 045b64eaa07..4513885a36b 100644
   --- a/lucene/analysis/nori/src/test/org/apache/lucene/analysis/ko/userdict.txt
   +++ b/lucene/analysis/nori/src/test/org/apache/lucene/analysis/ko/userdict.txt
   @@ -5,6 +5,7 @@ C샤프
    세종시 세종 시
    대한민국날씨
    대한민국
   +대한민국
    날씨
    21세기대한민국
    세기
   \ No newline at end of file
   ```
   
   ```
   $ ./gradlew :lucene:analysis:nori:test --tests "org.apache.lucene.analysis.ko.TestKoreanTokenizer"
   ```
   
   ```
   reproduce with: gradlew test --tests TestKoreanTokenizer.testPartOfSpeechsWithCompound -Dtests.seed=6445594235429961 -Dtests.locale=bs-BA -Dtests.timezone=Atlantic/Faeroe -Dtests.asserts=true -Dtests.file.encoding=UTF-8
      >     java.lang.AssertionError
      >         at __randomizedtesting.SeedInfo.seed([6445594235429961:9CEA448B4AB5C1F2]:0)
      >         at org.apache.lucene.analysis.ko.dict.UserDictionary.<init>(UserDictionary.java:137)
      >         at org.apache.lucene.analysis.ko.dict.UserDictionary.open(UserDictionary.java:69)
      >         at org.apache.lucene.analysis.ko.TestKoreanTokenizer.readDict(TestKoreanTokenizer.java:51)
      >         at org.apache.lucene.analysis.ko.TestKoreanTokenizer.setUp(TestKoreanTokenizer.java:63)
      >         at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
      >         at java.base/java.lang.reflect.Method.invoke(Method.java:580)
      >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
      >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:980)
      >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
      >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
      >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
      >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
      >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
      >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
      >         at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
      >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
      >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
      >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
      >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
      >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
      >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
      >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
      >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
      >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   ...
   ```
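
The mismatch can be illustrated with a minimal sketch, assuming the dictionary builder sorts its input lines and skips a line whose term equals the previous one. The class `DuplicateEntrySketch`, its `build` method, and the `Counts` type are hypothetical names used only for illustration; they are not taken from `UserDictionary`, and the ASCII strings stand in for the entries in the diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified model of a dictionary builder; not Lucene's UserDictionary. */
final class DuplicateEntrySketch {

  /** The two counts that the assertion is described as comparing. */
  static final class Counts {
    final int inputsProcessed;
    final int entriesAdded;

    Counts(int inputsProcessed, int entriesAdded) {
      this.inputsProcessed = inputsProcessed;
      this.entriesAdded = entriesAdded;
    }
  }

  /** Sorts the lines, then adds one entry per distinct term (assumed behaviour). */
  static Counts build(List<String> lines) {
    List<String> sorted = new ArrayList<>(lines);
    Collections.sort(sorted); // duplicates become adjacent after sorting

    List<String> entries = new ArrayList<>();
    String lastTerm = null;
    int processed = 0;
    for (String line : sorted) {
      processed++;
      String term = line.split("\\s+")[0]; // first column is the term
      if (term.equals(lastTerm)) {
        continue; // duplicate term: counted as processed, but no entry is added
      }
      entries.add(term);
      lastTerm = term;
    }
    return new Counts(processed, entries.size());
  }

  public static void main(String[] args) {
    // ASCII stand-ins for four dictionary lines, one of them duplicated.
    Counts c = build(List.of("daehanminguk-nalssi", "daehanminguk", "daehanminguk", "nalssi"));
    // Prints "inputs processed: 4, entries added: 3", so a check that the two
    // counts are equal would fail for this input.
    System.out.println(
        "inputs processed: " + c.inputsProcessed + ", entries added: " + c.entriesAdded);
  }
}
```

Under these assumptions the two counts agree only when every term is distinct, which is why a single duplicated line in `userdict.txt` is enough to trip an equality check.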
   
   I hit this assertion while testing a snapshot of the Lucene branch with 
Elasticsearch. The testNoriAnalyzerDuplicateUserDictRuleWithLegacyVersion 
test fails because it trips the assertion. See 
https://github.com/elastic/elasticsearch/blob/main/plugins/analysis-nori/src/test/java/org/elasticsearch/plugin/analysis/nori/NoriAnalysisTests.java#L132C5-L145C1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

