[ 
https://issues.apache.org/jira/browse/LUCENE-10362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-10362:
-----------------------------------
    Labels: random-chains  (was: )

> JapaneseNumberFilter messes up offsets
> --------------------------------------
>
>                 Key: LUCENE-10362
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10362
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Priority: Major
>              Labels: random-chains
>
> JapaneseNumberFilter is a TokenFilter that tries to change offsets, so of
> course TestRandomChains finds bugs in it:
> {noformat}
>  2> NOTE: reproduce with: gradlew test --tests TestRandomChains.testRandomChains -Dtests.seed=CE566FFD0024BDB0 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=en-PG -Dtests.timezone=CST -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> {noformat}
> {noformat}
> org.apache.lucene.analysis.tests.TestRandomChains > test suite's output saved to /home/rmuir/workspace/lucene/lucene/analysis/integration.tests/build/test-results/test_16/outputs/OUTPUT-org.apache.lucene.analysis.tests.TestRandomChains.txt, copied below:
>   2> stage 0: m<[0-1] +1> mi<[0-2] +1> i<[1-2] +1> iy<[1-3] +1> y<[2-3] +1> 
> yn<[2-4] +1> n<[3-4] +1> nk<[3-5] +1> k<[4-5] +1> kt<[4-6] +1> t<[5-6] +1> t 
> <[5-7] +1>  <[6-7] +1>  2<[6-8] +1> 2<[7-8] +1> 26<[7-9] +1> 6<[8-9] +1> 
> 64<[8-10] +1>
>   2> stage 1: m<[0-1] +1> mi<[0-2] +1> i<[1-2] +1> iy<[1-3] +1> y<[2-3] +1> 
> yn<[2-4] +1> n<[3-4] +1> nk<[3-5] +1> k<[4-5] +1> kt<[4-6] +1> t<[5-6] +1> t 
> <[5-7] +1>  <[6-7] +1>  2<[6-8] +1> 2<[7-8] +1> 26<[7-9] +1> 6<[8-9] +1> 
> 64<[8-10] +1>
>   2> stage 2: n<[3-4] +1> nk<[3-5] +1> <HIRAGANA>word<[3-5] +0> k<[4-5] +1> 
> <HIRAGANA>word<[4-5] +0> kt<[4-6] +1> <HIRAGANA>word<[4-6] +0> t<[5-6] +1> 
> <HIRAGANA>word<[5-6] +0> t <[5-7] +1>  <[6-7] +1>  2<[6-8] +1> 
> <HIRAGANA>word<[6-8] +0> 2<[7-8] +1> <HIRAGANA>word<[7-8] +0> 26<[7-9] +1> 
> <HIRAGANA>word<[7-9] +0> 6<[8-9] +1> 64<[8-10] +1> <HIRAGANA>word<[8-10] +0>
>   2> last stage: yn<[2-4] +1> n<[3-4] +1> nk<[3-5] +1> <HIRAGANA>word<[3-5] 
> +0> k<[4-5] +1> <HIRAGANA>word<[4-5] +0> kt<[4-6] +1> <HIRAGANA>word<[4-6] 
> +0> t<[5-6] +1> <HIRAGANA>word<[5-6] +0> t <[5-7] +1>  <[6-7] +1>  2<[6-8] 
> +1> <HIRAGANA>word<[6-8] +0> 2<[7-8] +1> <HIRAGANA>word<[7-8] +0> 26<[7-9] 
> +1> <HIRAGANA>word<[7-9] +0> 6<[8-9] +1> <HIRAGANA>word<[8-10] +0>
>   2> TEST FAIL: useCharFilter=false text='miynkt 264957329&#'
>   2> Exception from random analyzer:
>   2> charfilters=
>   2> tokenizer=
>   2>   org.apache.lucene.analysis.ngram.NGramTokenizer()
>   2> filters=
>   2>   Conditional:org.apache.lucene.analysis.icu.ICUNormalizer2Filter(OneTimeWrapper@3b5fdc7f term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1, com.ibm.icu.impl.Norm2AllModes$ComposeNormalizer2@5ef6381c)
>   2>   Conditional:org.apache.lucene.analysis.miscellaneous.TypeAsSynonymFilter(OneTimeWrapper@3e803db2 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,flags=0, <HIRAGANA>)
>   2>   Conditional:org.apache.lucene.analysis.ja.JapaneseNumberFilter(OneTimeWrapper@20de0223 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,flags=0,keyword=false)
>    >     java.lang.IllegalStateException: last stage: inconsistent endOffset at pos=17: 9 vs 10; token=<HIRAGANA>word
>    >         at __randomizedtesting.SeedInfo.seed([CE566FFD0024BDB0:F3B7469C4736A070]:0)
>    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:164)
>    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:1130)
> {noformat}
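
For readers unfamiliar with the "inconsistent endOffset" failure above, here is a rough, Lucene-free sketch of the kind of invariant being checked: tokens that start at the same position should share a startOffset, and tokens that end at the same position should share an endOffset. This is only an illustration under that assumption, not the actual ValidatingTokenFilter code; the class OffsetConsistencySketch, the Token holder, and firstInconsistency are hypothetical names. The two sample tokens are taken from the end of the "last stage" output (6<[8-9] +1> followed by <HIRAGANA>word<[8-10] +0>), assuming a position length of 1 for both.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not part of Lucene.
public class OffsetConsistencySketch {

    // Minimal stand-in for the attributes printed in the log above.
    static final class Token {
        final String term;
        final int startOffset, endOffset, posInc, posLength;

        Token(String term, int startOffset, int endOffset, int posInc, int posLength) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.posInc = posInc;
            this.posLength = posLength;
        }
    }

    // Returns a description of the first offset inconsistency, or null if none is found.
    static String firstInconsistency(List<Token> tokens) {
        Map<Integer, Integer> startOffsetByPos = new HashMap<>();
        Map<Integer, Integer> endOffsetByPos = new HashMap<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.posInc;
            // All tokens starting at this position must agree on startOffset.
            Integer prevStart = startOffsetByPos.putIfAbsent(pos, t.startOffset);
            if (prevStart != null && prevStart != t.startOffset) {
                return "inconsistent startOffset at pos=" + pos + ": "
                        + prevStart + " vs " + t.startOffset + "; token=" + t.term;
            }
            // All tokens ending at this position must agree on endOffset.
            int endPos = pos + t.posLength;
            Integer prevEnd = endOffsetByPos.putIfAbsent(endPos, t.endOffset);
            if (prevEnd != null && prevEnd != t.endOffset) {
                return "inconsistent endOffset at pos=" + endPos + ": "
                        + prevEnd + " vs " + t.endOffset + "; token=" + t.term;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("6", 8, 9, 1, 1));
        tokens.add(new Token("<HIRAGANA>word", 8, 10, 0, 1)); // same position, different endOffset
        System.out.println(firstInconsistency(tokens));
    }
}
```

In this reduced example the synonym token sits at the same position as "6" but carries the offsets [8-10], so two tokens ending at the same position disagree on endOffset (9 vs 10), which matches the shape of the reported exception. The position number differs from the log because only two tokens are fed in.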



