[ https://issues.apache.org/jira/browse/LUCENE-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated LUCENE-4983: ---------------------------------- Labels: random-chains (was: ) > CommonGramsFilter assumes all input tokens have a length of 1 > ------------------------------------------------------------- > > Key: LUCENE-4983 > URL: https://issues.apache.org/jira/browse/LUCENE-4983 > Project: Lucene - Core > Issue Type: Bug > Reporter: Adrien Grand > Priority: Major > Labels: random-chains > > CommonGramsFilter set posLenAttribute to 2 for bi-grams, no matter the length > of the input tokens. Here is an example seed that produces a failure: > {noformat} > [junit4:junit4] <JUnit4> says Привет! Master seed: 3296009A5B3B7A05 > [junit4:junit4] Executing 1 suite with 1 JVM. > [junit4:junit4] > [junit4:junit4] Started J0 PID(23946@RD-38). > [junit4:junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains > [junit4:junit4] 2> TEST FAIL: useCharFilter=true text='apuqdgtr wjco mpc ' > [junit4:junit4] 2> Exception from random analyzer: > [junit4:junit4] 2> charfilters= > [junit4:junit4] 2> > org.apache.lucene.analysis.pattern.PatternReplaceCharFilter(a, <APOSTROPHE>, > java.io.StringReader@699f982d) > [junit4:junit4] 2> > org.apache.lucene.analysis.charfilter.HTMLStripCharFilter(org.apache.lucene.analysis.pattern.PatternReplaceCharFilter@6cbfe887, > [<NUM>]) > [junit4:junit4] 2> tokenizer= > [junit4:junit4] 2> > org.apache.lucene.analysis.core.LetterTokenizer(LUCENE_44, > org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory@4fb3c3d9, > > org.apache.lucene.analysis.core.TestRandomChains$CheckThatYouDidntReadAnythingReaderWrapper@2b3b2ed8) > [junit4:junit4] 2> filters= > [junit4:junit4] 2> > org.apache.lucene.analysis.util.ElisionFilter(org.apache.lucene.analysis.ValidatingTokenFilter@0, > [iez]) > [junit4:junit4] 2> > org.apache.lucene.analysis.MockGraphTokenFilter(java.util.Random@3a807d14, > org.apache.lucene.analysis.ValidatingTokenFilter@20) > [junit4:junit4] 2> > org.apache.lucene.analysis.commongrams.CommonGramsFilter(LUCENE_44, > org.apache.lucene.analysis.ValidatingTokenFilter@37caea, [bbtzjxco, , > jafehvlp, kujsm, znpfw, xqfni]) > [junit4:junit4] 2> > org.apache.lucene.analysis.bg.BulgarianStemFilter(org.apache.lucene.analysis.ValidatingTokenFilter@6c1927b) > [junit4:junit4] 2> offsetsAreCorrect=true > [junit4:junit4] 2> NOTE: reproduce with: ant test > -Dtestcase=TestRandomChains -Dtests.method=testRandomChains > -Dtests.seed=3296009A5B3B7A05 -Dtests.multiplier=3 -Dtests.slow=true > -Dtests.locale=ar_YE -Dtests.timezone=Europe/London > -Dtests.file.encoding=US-ASCII > [junit4:junit4] ERROR 14.7s | TestRandomChains.testRandomChains <<< > [junit4:junit4] > Throwable #1: java.lang.IllegalStateException: stage 3: > inconsistent endOffset at pos=2: 13 vs 8; token=[㑮ٯb_ > [junit4:junit4] > at > __randomizedtesting.SeedInfo.seed([3296009A5B3B7A05:F7729FB1C2967C5]:0) > [junit4:junit4] > at > org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:135) > [junit4:junit4] > at > org.apache.lucene.analysis.bg.BulgarianStemFilter.incrementToken(BulgarianStemFilter.java:48) > [junit4:junit4] > at > org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:78) > [junit4:junit4] > at > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:635) > [junit4:junit4] > at > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:546) > [junit4:junit4] > at > org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:447) > [junit4:junit4] > at > org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:944) > [junit4:junit4] > at java.lang.Thread.run(Thread.java:679) > [junit4:junit4] 2> NOTE: test params are: codec=Lucene42: > {dummy=PostingsFormat(name=Direct)}, docValues:{}, sim=DefaultSimilarity, > locale=ar_YE, timezone=Europe/London > [junit4:junit4] 2> NOTE: Linux 3.5.0-27-generic amd64/Sun Microsystems Inc. > 1.6.0_27 (64-bit)/cpus=2,threads=1,free=96085824,total=223412224 > [junit4:junit4] 2> NOTE: All tests run in this JVM: [TestRandomChains] > [junit4:junit4] Completed in 16.32s, 1 test, 1 error <<< FAILURES! > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org