Geoffrey Lawson created LUCENE-9585:
---------------------------------------

             Summary: Make preserving original token in CompoundWordTokenFilterBase configurable
                 Key: LUCENE-9585
                 URL: https://issues.apache.org/jira/browse/LUCENE-9585
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
    Affects Versions: 8.5.1
            Reporter: Geoffrey Lawson


A subclass of CompoundWordTokenFilterBase always outputs the original input 
token along with the decomposed tokens, if there are any. As a result, 
documents that originally contained the compound form are indexed with both 
the compound and the decomposed forms, while documents that originally 
contained the decomposed form are indexed with only the decomposed form. Only 
queries in the decomposed form match more documents when this filter is used; 
a query in the compound form still matches only the documents that contained 
the compound.

If the filter could also be run at query time, compound forms in queries 
would be decomposed and match additional documents. For this to work, the 
filter needs to be able to return only the decomposed form whenever a 
decomposed form exists.
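
The requested behavior can be illustrated with a minimal, self-contained 
sketch. This is not Lucene code and not a proposed patch: the class 
CompoundSketch, the method decompose, and the preserveOriginal flag are 
hypothetical names chosen for illustration, and the toy dictionary lookup 
only stands in for whatever a real subclass of CompoundWordTokenFilterBase 
does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CompoundSketch {

    /**
     * Toy decompounder (hypothetical, not the Lucene API): collects every
     * dictionary word that is a proper substring of the token.
     *
     * preserveOriginal = true  models the current behavior: the original
     *                          token is always emitted.
     * preserveOriginal = false models the requested behavior: the original
     *                          token is emitted only if nothing was decomposed.
     */
    static List<String> decompose(String token, Set<String> dictionary,
                                  boolean preserveOriginal) {
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int end = start + 1; end <= token.length(); end++) {
                String candidate = token.substring(start, end);
                // Skip the whole token itself; only real subwords count.
                if (candidate.length() < token.length()
                        && dictionary.contains(candidate)) {
                    parts.add(candidate);
                }
            }
        }
        List<String> out = new ArrayList<>();
        if (preserveOriginal || parts.isEmpty()) {
            out.add(token);
        }
        out.addAll(parts);
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("fuss", "ball");
        System.out.println(decompose("fussball", dict, true));  // [fussball, fuss, ball]
        System.out.println(decompose("fussball", dict, false)); // [fuss, ball]
        System.out.println(decompose("tor", dict, false));      // [tor]
    }
}
```

With the flag set to false, a compound in a query would be reduced to its 
parts, so it could match documents that were written in the decomposed form. 
A token with no decomposition is still passed through unchanged.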



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

