[ https://issues.apache.org/jira/browse/LUCENE-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189288#comment-17189288 ]
Michael McCandless commented on LUCENE-8985:
--------------------------------------------

Thank you [~nppoly] for the thorough PR; I will try to review again soon.

Holes are a challenge for graph token filters.

I have long felt that stop words (and other holey tokens) should not be removed from the token stream. Instead, a new {{DeletedAttribute}} would mark the token as deleted (and not to be indexed), but the token would remain in the stream (and still be returned by {{incrementToken}}) to record its metadata for the sake of later token filter stages.

This would allow analyzers to mark stop words for deletion while a later {{SynonymGraphFilter}} could still apply over them. For example, {{lord of the rings}} could still match properly even if {{of}} and {{the}} were marked as deleted.

But that is a much larger change, and there is no need to hold up this first approach for it!

> SynonymGraphFilter cannot handle input stream with tokens filtered.
> -------------------------------------------------------------------
>
>                 Key: LUCENE-8985
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8985
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Chongchen Chen
>            Priority: Major
>             Fix For: 8.3
>
>         Attachments: SGF_SF_interaction.patch.txt
>
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> [~janhoy] found the bug.
> In an analyzer with e.g. StopFilter, where tokens are removed from the stream
> and replaced with a “hole”, SynonymGraphFilter will not preserve these holes
> but removes them, resulting in certain phrase queries failing.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
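
The {{DeletedAttribute}} idea from the comment can be illustrated with a small, self-contained sketch. It deliberately does not use the Lucene API: {{DeletedAttribute}} is only a proposal and does not exist in Lucene, so the {{Token}} class, its {{deleted}} flag, and the helper methods below are all hypothetical names chosen for illustration. The sketch assumes a stop-word stage that marks tokens instead of removing them, a later stage that matches a multi-word phrase over every token (deleted or not), and a final stage that indexes only the tokens not marked as deleted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeletedTokenSketch {

    /** Hypothetical token: the `deleted` flag stands in for the proposed DeletedAttribute. */
    public static final class Token {
        public final String term;
        public final boolean deleted;

        public Token(String term, boolean deleted) {
            this.term = term;
            this.deleted = deleted;
        }
    }

    /** Stop-word stage: marks stop words as deleted but keeps them in the stream. */
    public static List<Token> markStopWords(List<String> terms, List<String> stopWords) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Token(term, stopWords.contains(term)));
        }
        return out;
    }

    /**
     * Later stage (standing in for a synonym matcher): looks for a multi-word
     * phrase across all tokens, including those marked as deleted.
     * Returns the start index of the first match, or -1 if there is none.
     */
    public static int findPhrase(List<Token> tokens, List<String> phrase) {
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            boolean match = true;
            for (int i = 0; i < phrase.size(); i++) {
                if (!tokens.get(start + i).term.equals(phrase.get(i))) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return start;
            }
        }
        return -1;
    }

    /** Final stage: only tokens not marked as deleted would be indexed. */
    public static List<String> indexedTerms(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token token : tokens) {
            if (!token.deleted) {
                out.add(token.term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = markStopWords(
            Arrays.asList("lord", "of", "the", "rings"),
            Arrays.asList("of", "the"));
        System.out.println(findPhrase(tokens, Arrays.asList("lord", "of", "the", "rings")));
        System.out.println(indexedTerms(tokens));
    }
}
```

In this sketch the phrase {{lord of the rings}} is still found at index 0 because {{of}} and {{the}} remain visible to the matching stage, while only {{lord}} and {{rings}} reach the final indexed list. A real implementation would presumably carry the flag as a Lucene attribute on the token stream rather than as a field on a token object.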