[jira] [Commented] (LUCENE-9283) DelimitedBoostTokenFilter can fail testRandomChains

Alan Woodward (Jira) Wed, 18 Mar 2020 02:39:49 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-9283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061547#comment-17061547 ]


Alan Woodward commented on LUCENE-9283:
---------------------------------------

{code}
20:39:23    [junit4] Suite: org.apache.lucene.analysis.core.TestRandomChains
20:39:23    [junit4]   2> TEST FAIL: useCharFilter=false 
text='\u3121\u3121\u312f\u3111\u3104\u3116 \u6412  \ud847\udd35\ud85d\ude23  
tatd lcdqpn - ve v my \ud800\udd9a\ud800\uddcc  imie jzi \ufbf2\u0128 
\u034e\u035c\u0368  nlocx  wklihk'
20:39:23    [junit4]   2> Exception from random analyzer: 
20:39:23    [junit4]   2> charfilters=
20:39:23    [junit4]   2> tokenizer=
20:39:23    [junit4]   2>   org.apache.lucene.analysis.core.LetterTokenizer()
20:39:23    [junit4]   2> filters=
20:39:23    [junit4]   2>   
org.apache.lucene.analysis.standard.ClassicFilter(ValidatingTokenFilter@244ccec3
 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1)
20:39:23    [junit4]   2>   
Conditional:org.apache.lucene.analysis.cz.CzechStemFilter(OneTimeWrapper@2ae1194a
 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false)
20:39:23    [junit4]   2>   
org.apache.lucene.analysis.boost.DelimitedBoostTokenFilter(ValidatingTokenFilter@7590871e
 
term=,bytes=[],startOffset=0,endOffset=0,positionIncrement=1,positionLength=1,type=word,termFrequency=1,keyword=false,boost=1.0,
 ?)
20:39:23    [junit4]   2> NOTE: reproduce with: ant test -Dtestcase=TestRandomChains -Dtests.method=testRandomChainsWithLargeStrings -Dtests.seed=ACB1BAA709A1F2B2 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=ur-IN -Dtests.timezone=PNT -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
20:39:23    [junit4] ERROR   0.03s J0 | 
TestRandomChains.testRandomChainsWithLargeStrings <<<
20:39:23    [junit4]    > Throwable #1: java.lang.NumberFormatException: For 
input string: "Ĩ"
20:39:23    [junit4]    >       at 
__randomizedtesting.SeedInfo.seed([ACB1BAA709A1F2B2:C6EA05B650EFD241]:0)
20:39:23    [junit4]    >       at 
java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2054)
20:39:23    [junit4]    >       at 
java.base/jdk.internal.math.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
20:39:23    [junit4]    >       at 
java.base/java.lang.Float.parseFloat(Float.java:455)
20:39:23    [junit4]    >       at 
org.apache.lucene.analysis.boost.DelimitedBoostTokenFilter.incrementToken(DelimitedBoostTokenFilter.java:52)
20:39:23    [junit4]    >       at 
org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:77)
20:39:23    [junit4]    >       at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:716)
20:39:23    [junit4]    >       at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:630)
20:39:23    [junit4]    >       at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:558)
20:39:23    [junit4]    >       at 
org.apache.lucene.analysis.core.TestRandomChains.testRandomChainsWithLargeStrings(TestRandomChains.java:899)
20:39:23    [junit4]    >       at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
20:39:23    [junit4]    >       at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
20:39:23    [junit4]    >       at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
20:39:23    [junit4]    >       at 
java.base/java.lang.reflect.Method.invoke(Method.java:566)
20:39:23    [junit4]    >       at 
java.base/java.lang.Thread.run(Thread.java:834)
20:39:23    [junit4]   2> NOTE: test params are: codec=Asserting(Lucene84): 
{dummy=PostingsFormat(name=LuceneVarGapDocFreqInterval)}, docValues:{}, 
maxPointsInLeafNode=1932, maxMBSortInHeap=6.861957584994992, 
sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@e215be6),
 locale=ur-IN, timezone=PNT
{code}

> DelimitedBoostTokenFilter can fail testRandomChains
> ---------------------------------------------------
>
>                 Key: LUCENE-9283
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9283
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>
> DelimitedBoostTokenFilter expects tokens of the form `token` or 
> `token|number` and throws a NumberFormatException if the `number` part can't 
> be parsed.  This can cause test failures when we build random chains and 
> throw random data through them.
> We can either exclude DelimitedBoostTokenFilter when building a random 
> analyzer, or add a flag to ignore badly-formed tokens. I lean towards doing 
> the former, as I don't really want to make leniency the default here.
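
To make the failure mode in the description concrete, here is a minimal
standalone sketch in plain Java with no Lucene dependency. It is not the
filter's actual code: the class DelimitedBoostSketch and the methods
parseStrict and parseLenient are hypothetical names, and the delimiter
'\ufbf2' is an assumption chosen so that the token "\ufbf2\u0128" from the
failing text yields the same "Ĩ" input seen in the stack trace. parseLenient
illustrates the "flag to ignore badly-formed tokens" option only.

```java
class DelimitedBoostSketch {

    // Strict variant: everything after the first delimiter must parse as a float.
    // Tokens without a delimiter get the default boost of 1.0.
    static float parseStrict(String token, char delimiter) {
        int idx = token.indexOf(delimiter);
        if (idx < 0) {
            return 1.0f;
        }
        // Throws NumberFormatException when the suffix is not a number.
        return Float.parseFloat(token.substring(idx + 1));
    }

    // Lenient variant (hypothetical): fall back to a default boost
    // instead of propagating the parse failure.
    static float parseLenient(String token, char delimiter, float defaultBoost) {
        try {
            return parseStrict(token, delimiter);
        } catch (NumberFormatException e) {
            return defaultBoost;
        }
    }

    public static void main(String[] args) {
        // Well-formed input of the form token|number.
        System.out.println(parseStrict("foo|2.5", '|'));

        // Random data: the assumed delimiter happens to occur inside the token,
        // so the "number" part is the single letter U+0128.
        String randomToken = "\ufbf2\u0128";
        try {
            parseStrict(randomToken, '\ufbf2');
        } catch (NumberFormatException e) {
            System.out.println("strict parse failed: " + e.getMessage());
        }
        System.out.println(parseLenient(randomToken, '\ufbf2', 1.0f));
    }
}
```

With random delimiters and random text, the strict path can fail on any
token that contains the delimiter, which is why the description weighs
excluding the filter from random chains against adding a leniency flag.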



