[ https://issues.apache.org/jira/browse/LUCENE-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolás Lichtmaier updated LUCENE-8723:
---------------------------------------
    Affects Version/s: 8.3

> Bad interaction between WordDelimiterGraphFilter, StopFilter and FlattenGraphFilter
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-8723
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8723
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 7.7.1, 8.0, 8.3
>            Reporter: Nicolás Lichtmaier
>            Priority: Major
>
> I was debugging an issue (missing tokens after analysis), and when I enabled
> Java assertions I uncovered a bug that occurs when using
> WordDelimiterGraphFilter + StopFilter + FlattenGraphFilter.
> I could reproduce the issue in a small piece of code. This code fails with an
> assertion error when assertions are enabled (the -ea java option):
> {code:java}
>     Builder builder = CustomAnalyzer.builder();
>     builder.withTokenizer(StandardTokenizerFactory.class);
>     builder.addTokenFilter(WordDelimiterGraphFilterFactory.class, "preserveOriginal", "1");
>     builder.addTokenFilter(StopFilterFactory.class);
>     builder.addTokenFilter(FlattenGraphFilterFactory.class);
>     Analyzer analyzer = builder.build();
>
>     TokenStream ts = analyzer.tokenStream("*", new StringReader("x7in"));
>     ts.reset();
>     while(ts.incrementToken())
>         ;
> {code}
> This gives:
> {code}
> Exception in thread "main" java.lang.AssertionError: 2
>      at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:195)
>      at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:258)
>      at com.wolfram.textsearch.AnalyzerError.main(AnalyzerError.java:32)
> {code}
> Maybe removing stop words after WordDelimiterGraphFilter is wrong, I don't
> know. However, it is the only way to process stop words generated by that
> filter. In any case, it should not eat tokens or trigger assertion failures.
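
One plausible reading of the failure, not confirmed in the report, is that removing "in" leaves a node in the token graph that a token arrives at but none departs from. The sketch below models this with plain Java and no Lucene dependency. Everything in it is an assumption made for illustration: the class TokenGraphSketch, its helpers, and in particular the token layout for "x7in" (original token spanning three positions, followed by "x", "7", "in") are hypothetical and were not taken from Lucene's actual output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TokenGraphSketch {

    /** Simplified token: term text, position increment, position length. */
    public static final class Tok {
        public final String term;
        public final int posInc;
        public final int posLen;

        public Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** ASSUMED output of the word-delimiter step for "x7in" with preserveOriginal. */
    public static List<Tok> exampleGraph() {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("x7in", 1, 3)); // original token, nodes 0 -> 3
        tokens.add(new Tok("x", 0, 1));    // nodes 0 -> 1
        tokens.add(new Tok("7", 1, 1));    // nodes 1 -> 2
        tokens.add(new Tok("in", 1, 1));   // nodes 2 -> 3
        return tokens;
    }

    /**
     * Drops stop words and adds the increment of each removed token to the
     * next surviving token. Position lengths are left untouched.
     */
    public static List<Tok> removeStopWords(List<Tok> in, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0;
        for (Tok t : in) {
            if (stopWords.contains(t.term)) {
                skipped += t.posInc;
            } else {
                out.add(new Tok(t.term, t.posInc + skipped, t.posLen));
                skipped = 0;
            }
        }
        return out;
    }

    /** Nodes where some token ends but none starts, excluding the final node. */
    public static Set<Integer> danglingNodes(List<Tok> tokens) {
        Set<Integer> starts = new TreeSet<>();
        Set<Integer> ends = new TreeSet<>();
        int pos = -1;
        int last = 0;
        for (Tok t : tokens) {
            pos += t.posInc;
            starts.add(pos);
            ends.add(pos + t.posLen);
            last = Math.max(last, pos + t.posLen);
        }
        Set<Integer> dangling = new TreeSet<>(ends);
        dangling.removeAll(starts);
        dangling.remove(last);
        return dangling;
    }

    public static void main(String[] args) {
        List<Tok> graph = exampleGraph();
        List<Tok> filtered = removeStopWords(graph, Collections.singleton("in"));
        System.out.println("before stop removal: " + danglingNodes(graph));
        System.out.println("after stop removal:  " + danglingNodes(filtered));
    }
}
```

Under these assumptions the graph is well formed before stop-word removal, and afterwards node 2 is a dead end: "7" arrives there, but the only token that left it was the removed "in". Whether this is the exact condition checked at FlattenGraphFilter.java:195 is not established here.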