[jira] [Commented] (LUCENE-8985) SynonymGraphFilter cannot handle input stream with tokens filtered.

Michael McCandless (Jira) Wed, 02 Sep 2020 07:55:54 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189288#comment-17189288 ]


Michael McCandless commented on LUCENE-8985:
--------------------------------------------

Thank you [~nppoly] for the thorough PR – I will try to review again soon.

Holes are a challenge for graph token filters.  I have long felt that stop 
words (and other "holey" tokens) should not be removed from the token stream. 
Instead, a new {{DeletedAttribute}} would mark the token as deleted (so it is 
not indexed), but the token would remain in the stream (and still be 
{{incrementToken}}'d), recording its metadata for later token filter stages. 
For example, an analyzer could mark stopwords for deletion while a 
{{SynonymGraphFilter}} still applies over them, so {{lord of the rings}} would 
still match properly even though {{of}} and {{the}} were marked as deleted. 
But that is a much larger change, and there is no need to hold up this first 
approach for it!
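The {{DeletedAttribute}} idea can be illustrated with a toy model. This is a plain-Java sketch, not Lucene code: Lucene has no {{DeletedAttribute}} today, and the {{Token}} class and the {{markStopwords}}, {{containsPhrase}} and {{indexable}} helpers are hypothetical names for illustration. It assumes stopwords are marked (not removed) before a greatly simplified multi-word synonym match runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only: Lucene has no DeletedAttribute; this sketch
// mimics the idea of marking tokens as deleted instead of dropping them.
public class DeletedTokenSketch {

    // A token with its text, absolute position, and a "deleted" mark
    // (a stand-in for the proposed DeletedAttribute).
    static final class Token {
        final String text;
        final int position;
        final boolean deleted;

        Token(String text, int position, boolean deleted) {
            this.text = text;
            this.position = position;
            this.deleted = deleted;
        }
    }

    // Stage 1: a stop "filter" that marks stopwords as deleted but keeps them.
    static List<Token> markStopwords(List<String> words, List<String> stopwords) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            out.add(new Token(w, i, stopwords.contains(w)));
        }
        return out;
    }

    // Stage 2: a toy multi-word matcher that still sees deleted tokens,
    // so "lord of the rings" matches even though "of"/"the" are marked.
    static boolean containsPhrase(List<Token> tokens, List<String> phrase) {
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            boolean match = true;
            for (int j = 0; j < phrase.size(); j++) {
                if (!tokens.get(start + j).text.equals(phrase.get(j))) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    // Final stage: only non-deleted tokens are "indexed"; they keep their
    // original positions, so the hole left by the stopwords survives.
    static List<Token> indexable(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        for (Token t : tokens) {
            if (!t.deleted) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = markStopwords(
                List.of("lord", "of", "the", "rings"), List.of("of", "the"));
        System.out.println(containsPhrase(tokens, List.of("lord", "of", "the", "rings"))); // true
        for (Token t : indexable(tokens)) {
            System.out.println(t.text + "@" + t.position); // lord@0, then rings@3
        }
    }
}
```

In this model the synonym stage sees all four words, while the indexing stage emits only {{lord}} at position 0 and {{rings}} at position 3, so the gap where the stopwords were is preserved.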

> SynonymGraphFilter cannot handle input stream with tokens filtered.
> -------------------------------------------------------------------
>
>                 Key: LUCENE-8985
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8985
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Chongchen Chen
>            Priority: Major
>             Fix For: 8.3
>
>         Attachments: SGF_SF_interaction.patch.txt
>
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> [~janhoy] found the bug.
> In an analyzer with, e.g., a StopFilter, where tokens are removed from the 
> stream and replaced with a “hole”, SynonymGraphFilter does not preserve these 
> holes but removes them, causing certain phrase queries to fail.
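A simplified model of the hole problem described above, assuming holes are represented as in Lucene: a position increment greater than 1 on the token that follows the removed words. The {{HoleSketch}} class and its helpers are hypothetical, and {{dropHoles}} only mimics the reported symptom (every increment reset to 1); it is not how SynonymGraphFilter is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model (not Lucene code): each surviving token carries a
// position increment; removed stopwords show up as an increment > 1.
public class HoleSketch {

    static final class Tok {
        final String text;
        final int posInc;

        Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }
    }

    // Drop stopwords, adding their positions to the next kept token's increment.
    static List<Tok> stopFilter(List<String> words, List<String> stopwords) {
        List<Tok> out = new ArrayList<>();
        int pending = 0;
        for (String w : words) {
            pending++;
            if (stopwords.contains(w)) {
                continue; // leave a hole
            }
            out.add(new Tok(w, pending));
            pending = 0;
        }
        return out;
    }

    // Buggy downstream stage: re-emits every token with increment 1,
    // which is what "not preserving holes" amounts to in this model.
    static List<Tok> dropHoles(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(new Tok(t.text, 1));
        }
        return out;
    }

    // Absolute positions are the running sum of increments, starting at -1.
    static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int p = -1;
        for (int i = 0; i < toks.size(); i++) {
            p += toks.get(i).posInc;
            pos[i] = p;
        }
        return pos;
    }

    public static void main(String[] args) {
        List<Tok> withHoles = stopFilter(
                List.of("lord", "of", "the", "rings"), List.of("of", "the"));
        System.out.println(Arrays.toString(positions(withHoles)));            // [0, 3]
        System.out.println(Arrays.toString(positions(dropHoles(withHoles)))); // [0, 1]
    }
}
```

With the hole kept, {{rings}} sits three positions after {{lord}}; once the hole is dropped it sits one position after. If one side of a phrase match (index or query) keeps the hole and the other does not, the relative positions disagree and the phrase query fails.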



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
