[ https://issues.apache.org/jira/browse/LUCENE-10361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated LUCENE-10361:
-----------------------------------
    Labels: random-chains  (was: )

> KoreanNumberFilter messes up offsets
> ------------------------------------
>
>                 Key: LUCENE-10361
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10361
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Priority: Major
>              Labels: random-chains
>
> It is a tokenfilter, tries to change offsets, so of course TestRandomChains finds bugs in it:
> {noformat}
> NOTE: reproduce with: gradlew test --tests TestRandomChains.testRandomChains -Dtests.seed=12BC606B774693E4 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=om-Latn-ET -Dtests.timezone=Australia/Yancowinna -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> {noformat}
> {noformat}
> org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved to /home/rmuir/workspace/lucene/lucene/analysis/integration.tests/build/test-results/test_16/outputs/OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, copied below:
>   2> stage 0: 뱅<[0-1] +1> Ƒ<[1-2] +1> ė<[3-4] +1> 履<[6-7] +1> jEqyzUT<[8-15] +1>
>   2> stage 1: 000000<[0-1] +1> Ƒ<[1-2] +1> ė<[3-4] +1> 000000<[6-7] +1> 154300<[8-15] +1> 454300<[8-15] +0>
>   2> last stage: 0<[0-1] +1> Ƒ<[1-2] +1> ė<[3-4] +1> 000000<[6-7] +1> 454300<[8-15] +0>
>   2> TEST FAIL: useCharFilter=false text='\ubc45\u0191(\u0117\ud8ad\udf0a\uf9df jEqyzUT '
>   2> Exception from random analyzer:
>   2> charfilters=
>   2>   org.apache.lucene.analysis.cjk.CJKWidthCharFilter(java.io.StringReader@17af5384)
>   2>   org.apache.lucene.analysis.charfilter.MappingCharFilter(org.apache.lucene.analysis.charfilter.NormalizeCharMap@33e5bdbb, org.apache.lucene.analysis.cjk.CJKWidthCharFilter@1aafd271)
>   2> tokenizer=
>   2>   org.apache.lucene.analysis.icu.segmentation.ICUTokenizer(org.apache.lucene.analysis.icu.segmentation.DefaultICUTokenizerConfig@4e6f4690)
>   2> filters=
>   2>   Conditional:org.apache.lucene.analysis.phonetic.DaitchMokotoffSoundexFilter(OneTimeWrapper@34215eb7 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,script=Common, false)
>   2>   org.apache.lucene.analysis.ko.KoreanNumberFilter(ValidatingTokenFilter@7b4a2a5b term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,script=Common,keyword=false)
>
>     java.lang.IllegalStateException: last stage: inconsistent startOffset at pos=3: 6 vs 8; token=454300
>
>         at __randomizedtesting.SeedInfo.seed([12BC606B774693E4:2F5D490A30548E24]:0)
>         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:138)
>         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130)
>         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:1028)
>         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:922)
>         at org.apache.lucene.analysis.tests@10.0.0-SNAPSHOT/org.apache.lucene.analysis.tests.TestRandomChains.testRandomChains(TestRandomChains.java:915)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
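
The exception in the log says that two tokens at the same position report different start offsets. As a rough illustration, the standalone Java sketch below replays the "last stage" tokens from the log through a simplified version of that rule. The `Token` class and `firstInconsistency` method are hypothetical names made up for this example, not Lucene APIs, and the real ValidatingTokenFilter performs more checks than this one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OffsetConsistencySketch {

  /** Hypothetical stand-in for a token's term, offsets, and position increment. */
  static final class Token {
    final String term;
    final int startOffset;
    final int endOffset;
    final int posInc;

    Token(String term, int startOffset, int endOffset, int posInc) {
      this.term = term;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
      this.posInc = posInc;
    }
  }

  /**
   * Simplified rule (an assumption for illustration): every token that lands on
   * the same position must report the same start offset. Returns a message for
   * the first violation, or null if the stream is consistent.
   */
  static String firstInconsistency(List<Token> tokens) {
    Map<Integer, Integer> startByPos = new HashMap<>();
    int pos = -1; // the first token with posInc=1 lands on position 0
    for (Token t : tokens) {
      pos += t.posInc;
      Integer seen = startByPos.putIfAbsent(pos, t.startOffset);
      if (seen != null && seen != t.startOffset) {
        return "inconsistent startOffset at pos=" + pos + ": " + seen
            + " vs " + t.startOffset + "; token=" + t.term;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Tokens copied from the "last stage" line of the log: term<[start-end] +posInc>
    List<Token> lastStage = List.of(
        new Token("0", 0, 1, 1),
        new Token("\u0191", 1, 2, 1),
        new Token("\u0117", 3, 4, 1),
        new Token("000000", 6, 7, 1),
        new Token("454300", 8, 15, 0)); // posInc=0 stacks it on pos=3, but its start offset is 8
    System.out.println(firstInconsistency(lastStage));
    // prints: inconsistent startOffset at pos=3: 6 vs 8; token=454300
  }
}
```

The last token has a position increment of 0, so it shares position 3 with `000000<[6-7]>`, yet it carries the offsets `[8-15]`. That mismatch is what the sketch reports.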