[ https://issues.apache.org/jira/browse/LUCENE-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061547#comment-17061547 ]
Alan Woodward commented on LUCENE-9283:
---------------------------------------

{code}
20:39:23 [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains
20:39:23 [junit4] 2> TEST FAIL: useCharFilter=false text='\u3121\u3121\u312f\u3111\u3104\u3116 \u6412 \ud847\udd35\ud85d\ude23 tatd lcdqpn - ve v my \ud800\udd9a\ud800\uddcc imie jzi \ufbf2\u0128 \u034e\u035c\u0368 nlocx wklihk'
20:39:23 [junit4] 2> Exception from random analyzer:
20:39:23 [junit4] 2> charfilters=
20:39:23 [junit4] 2> tokenizer=
20:39:23 [junit4] 2> org.apache.lucene.analysis.core.LetterTokenizer()
20:39:23 [junit4] 2> filters=
20:39:23 [junit4] 2> org.apache.lucene.analysis.standard.ClassicFilter(ValidatingTokenFilter@244ccec3 term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1)
20:39:23 [junit4] 2> Conditional:org.apache.lucene.analysis.cz.CzechStemFilter(OneTimeWrapper@2ae1194a term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false)
20:39:23 [junit4] 2> org.apache.lucene.analysis.boost.DelimitedBoostTokenFilter(ValidatingTokenFilter@7590871e term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false,boost=1.0, ?)
20:39:23 [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChainsWithLargeStrings -Dtests.seed=ACB1BAA709A1F2B2 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=ur-IN -Dtests.timezone=PNT -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
20:39:23 [junit4] ERROR 0.03s J0 | TestRandomChains.testRandomChainsWithLargeStrings <<<
20:39:23 [junit4] > Throwable #1: java.lang.NumberFormatException: For input string: "Ĩ"
20:39:23 [junit4] > at __randomizedtesting.SeedInfo.seed([ACB1BAA709A1F2B2:C6EA05B650EFD241]:0)
20:39:23 [junit4] > at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2054)
20:39:23 [junit4] > at java.base/jdk.internal.math.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
20:39:23 [junit4] > at java.base/java.lang.Float.parseFloat(Float.java:455)
20:39:23 [junit4] > at org.apache.lucene.analysis.boost.DelimitedBoostTokenFilter.incrementToken(DelimitedBoostTokenFilter.java:52)
20:39:23 [junit4] > at org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:77)
20:39:23 [junit4] > at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:716)
20:39:23 [junit4] > at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:630)
20:39:23 [junit4] > at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:558)
20:39:23 [junit4] > at org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:899)
20:39:23 [junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
20:39:23 [junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
20:39:23 [junit4] > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
20:39:23 [junit4] > at java.base/java.lang.reflect.Method.invoke(Method.java:566)
20:39:23 [junit4] > at java.base/java.lang.Thread.run(Thread.java:834)
20:39:23 [junit4] 2> NOTE: test params are: codec=Asserting(Lucene84): {dummy=PostingsFormat(name=LuceneVarGapDocFreqInterval)}, docValues:{}, maxPointsInLeafNode=1932, maxMBSortInHeap=6.861957584994992, sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@e215be6), locale=ur-IN, timezone=PNT
{code}

> DelimitedBoostTokenFilter can fail testRandomChains
> ---------------------------------------------------
>
> Key: LUCENE-9283
> URL: https://issues.apache.org/jira/browse/LUCENE-9283
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
>
> DelimitedBoostTokenFilter expects tokens of the form `token` or
> `token|number` and throws a NumberFormatException if the `number` part can't
> be parsed. This can cause test failures when we build random chains and
> throw random data through them.
> We can either exclude DelimitedBoostTokenFilter when building a random
> analyzer, or add a flag to ignore badly-formed tokens. I lean towards doing
> the former, as I don't really want to make leniency the default here.
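
For illustration, the failure mode in the issue description can be reproduced without Lucene. This is a minimal standalone sketch, not the filter's actual code: the class `BoostParseSketch` and the methods `parseBoostStrict` and `parseBoostLenient` are hypothetical names, and the default boost of 1.0 is an assumption based on the `boost=1.0` value printed in the log. Only `Float.parseFloat`, which appears in the stack trace, is taken from the source.

```java
class BoostParseSketch {

    // Assumed default boost for tokens that carry no delimiter.
    static final float DEFAULT_BOOST = 1.0f;

    /**
     * Strict parsing: a token is either "token" or "token<delimiter>number".
     * Throws NumberFormatException if the part after the first delimiter
     * is not a valid float, which is the behaviour the issue describes.
     */
    public static float parseBoostStrict(String token, char delimiter) {
        int idx = token.indexOf(delimiter);
        if (idx < 0) {
            return DEFAULT_BOOST; // plain "token", no boost attached
        }
        return Float.parseFloat(token.substring(idx + 1));
    }

    /**
     * Lenient parsing (the hypothetical "ignore badly-formed tokens" flag):
     * falls back to the default boost instead of propagating the exception.
     */
    public static float parseBoostLenient(String token, char delimiter) {
        try {
            return parseBoostStrict(token, delimiter);
        } catch (NumberFormatException e) {
            return DEFAULT_BOOST;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBoostStrict("foo|2.5", '|'));
        System.out.println(parseBoostStrict("foo", '|'));
        System.out.println(parseBoostLenient("foo|\u0128", '|'));
        try {
            // Same kind of input as in the log: a non-numeric boost part.
            parseBoostStrict("foo|\u0128", '|');
        } catch (NumberFormatException e) {
            System.out.println("strict parse failed: " + e.getMessage());
        }
    }
}
```

The lenient variant corresponds to the second option in the description (a flag that ignores badly-formed tokens). The first option, excluding the filter when building a random analyzer, keeps the strict behaviour as the only one and needs no change to the parsing itself.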