Geoffrey Lawson created LUCENE-9585:
---------------------------------------
Summary: Make preserving original token in CompoundWordTokenFilterBase configurable
Key: LUCENE-9585
URL: https://issues.apache.org/jira/browse/LUCENE-9585
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Affects Versions: 8.5.1
Reporter: Geoffrey Lawson
When using a subclass of CompoundWordTokenFilterBase, the filter always
outputs the original input token along with the decomposed tokens, if there are
any. As a result, documents that originally contained the compound form are
indexed with both the compound and the decomposed forms, while documents that
originally contained the decomposed form are indexed with only the decomposed
form. With this filter, only queries in the decomposed form match more
documents.

If the filter could also be run at query time, compound forms in queries could
be decomposed and match additional documents. To do this, the filter needs to
be able to return only the decomposed form when a decomposed form exists.
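
To make the requested behavior concrete, here is a minimal plain-Java sketch.
It does not use Lucene classes: the class CompoundSketch, the methods subwords
and filter, and the preserveOriginal flag are hypothetical names chosen for
illustration, and the brute-force dictionary lookup is a simplification, not
the logic of any existing filter. With preserveOriginal set to true it mimics
the current behavior; with false it returns only the decomposed form when one
exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustration only: a plain-Java model of the behavior, not Lucene code.
class CompoundSketch {

    // Collect every proper substring of the token that is at least
    // minSubwordSize characters long and appears in the dictionary.
    // The whole token is skipped so that a plain dictionary word is not
    // reported as its own subword (a simplification for this sketch).
    static List<String> subwords(String token, Set<String> dictionary, int minSubwordSize) {
        List<String> found = new ArrayList<>();
        for (int start = 0; start + minSubwordSize <= token.length(); start++) {
            for (int end = start + minSubwordSize; end <= token.length(); end++) {
                String candidate = token.substring(start, end);
                if (candidate.length() < token.length() && dictionary.contains(candidate)) {
                    found.add(candidate);
                }
            }
        }
        return found;
    }

    // preserveOriginal is a hypothetical option name.
    // true:  emit the original token, then any subwords (current behavior).
    // false: emit only the subwords; fall back to the original token
    //        when the token cannot be decomposed.
    static List<String> filter(String token, Set<String> dictionary, boolean preserveOriginal) {
        List<String> parts = subwords(token, dictionary, 3);
        List<String> out = new ArrayList<>();
        if (preserveOriginal || parts.isEmpty()) {
            out.add(token);
        }
        out.addAll(parts);
        return out;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("soft", "ball");
        System.out.println(filter("softball", dictionary, true));  // [softball, soft, ball]
        System.out.println(filter("softball", dictionary, false)); // [soft, ball]
        System.out.println(filter("ball", dictionary, false));     // [ball]
    }
}
```

In this model, a query for "softball" run through filter with preserveOriginal
set to false yields only "soft" and "ball", so it can match documents that were
written with the decomposed form.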
--
This message was sent by Atlassian Jira
(v8.3.4#803005)