Geoffrey Lawson created LUCENE-9585:
---------------------------------------

             Summary: Make preserving original token in CompoundWordTokenFilterBase configurable
                 Key: LUCENE-9585
                 URL: https://issues.apache.org/jira/browse/LUCENE-9585
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
    Affects Versions: 8.5.1
            Reporter: Geoffrey Lawson


A subclass of CompoundWordTokenFilterBase always emits the original input token along with the decomposed tokens, if there are any. As a result, documents that contained the compound form are indexed with both the compound and the decomposed forms, while documents that contained the decomposed form are indexed with the decomposed form only. With this filter, only queries written in the decomposed form match more documents.

If the filter could also be run at query time, compound forms in a query could be decomposed and match those additional documents. For that to work, the filter needs to be able to return only the decomposed tokens whenever a decomposition exists.
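
To illustrate the requested behavior, here is a minimal sketch in plain Java. It does not use Lucene: `CompoundSketch`, `decompose`, `MIN_SUBWORD_LENGTH`, and the `preserveOriginal` flag are hypothetical names chosen for this example, and the substring lookup is only a simplified stand-in for what a dictionary-based decompounding filter does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy stand-in for a compound word filter; this is NOT the Lucene API. */
class CompoundSketch {

    /** Hypothetical minimum subword length, chosen for this sketch only. */
    static final int MIN_SUBWORD_LENGTH = 2;

    /**
     * Returns the tokens a filter would emit for one input token.
     * The original token is kept when preserveOriginal is true,
     * or when no dictionary subword was found.
     */
    static List<String> decompose(String token, Set<String> dictionary, boolean preserveOriginal) {
        String lower = token.toLowerCase(Locale.ROOT);

        // Collect every substring of the token that is a dictionary word.
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < lower.length(); start++) {
            for (int end = start + MIN_SUBWORD_LENGTH; end <= lower.length(); end++) {
                String candidate = lower.substring(start, end);
                if (dictionary.contains(candidate)) {
                    parts.add(candidate);
                }
            }
        }

        List<String> out = new ArrayList<>();
        if (parts.isEmpty() || preserveOriginal) {
            out.add(token);
        }
        out.addAll(parts);
        return out;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("foot", "ball");
        System.out.println(decompose("football", dictionary, true));  // [football, foot, ball]
        System.out.println(decompose("football", dictionary, false)); // [foot, ball]
        System.out.println(decompose("goal", dictionary, false));     // [goal]
    }
}
```

With `preserveOriginal` set to false, a query for "football" would produce the same tokens as a document that contained "foot ball", which is the query-time use described above. A token with no decomposition is passed through unchanged in either mode.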