[ https://issues.apache.org/jira/browse/LUCENE-9585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223362#comment-17223362 ]
Geoffrey Lawson commented on LUCENE-9585:
-----------------------------------------
Looking at LUCENE-5620, there is already a discussion about how to handle token
filters where the user may or may not want to preserve the original token.
Following the preserve/restore pattern described there is problematic for
CompoundWordTokenFilterBase. This filter already preserves the original token,
so we would first need to change that behavior. No longer preserving the
original token would be a large change in behavior for existing users.
> Make preserving original token in CompoundWordTokenFilterBase configurable
> --------------------------------------------------------------------------
>
> Key: LUCENE-9585
> URL: https://issues.apache.org/jira/browse/LUCENE-9585
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: 8.5.1
> Reporter: Geoffrey Lawson
> Priority: Minor
>
> When using a subclass of CompoundWordTokenFilterBase, the filter always
> outputs the original input token along with the decomposed tokens, if there
> are any. As a result, documents that originally contained the compound form
> are indexed with both the compound and decomposed forms, while documents that
> originally contained the decomposed form are indexed with only the decomposed
> form. With this filter, only queries in the decomposed form match the
> additional documents.
> If the filter could also be run at query time, compound forms in queries
> could be decomposed and match additional documents. To do this, the filter
> needs to be able to return only the decomposed form when one exists.
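
To make the requested behavior concrete, here is a minimal sketch in plain
Java. It does not use Lucene and is not the actual CompoundWordTokenFilterBase
code: the class name CompoundSketch, the method decompose, and the
preserveOriginal flag are all hypothetical names chosen for illustration, and
the dictionary matching is deliberately naive. It only shows the difference
between always emitting the original token and emitting it only when no
decomposition is found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: hypothetical names, not part of the Lucene API.
public class CompoundSketch {

    // Returns the tokens a compound filter might emit for one input token.
    // preserveOriginal = true mimics the current behavior (original + parts).
    // preserveOriginal = false emits only the parts when any were found,
    // and falls back to the original token otherwise.
    static List<String> decompose(String token, Set<String> dictionary,
                                  boolean preserveOriginal) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> parts = new ArrayList<>();
        // Naive scan: every proper substring that is a dictionary word.
        for (int start = 0; start < lower.length(); start++) {
            for (int end = start + 1; end <= lower.length(); end++) {
                String candidate = lower.substring(start, end);
                if (candidate.length() < lower.length()
                        && dictionary.contains(candidate)) {
                    parts.add(candidate);
                }
            }
        }
        List<String> out = new ArrayList<>();
        if (preserveOriginal || parts.isEmpty()) {
            out.add(token);
        }
        out.addAll(parts);
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("fuss", "ball");
        System.out.println(decompose("fussball", dict, true));  // [fussball, fuss, ball]
        System.out.println(decompose("fussball", dict, false)); // [fuss, ball]
        System.out.println(decompose("tor", dict, false));      // [tor]
    }
}
```

In this sketch the flag would default to true in any real change, so that
existing users keep the current output unless they opt in.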