[ https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961985#comment-16961985 ]
David Smiley commented on LUCENE-9018: -------------------------------------- Thanks for contributing! * Factory: see method {{getChar}} instead of simply {{get}} * I think we should use the same factory parameter name for this that ShingleFilterFactory and FixedShingleFilterFactory use – "tokenSeparator". Unfortunately I see inconsistency -- FingerprintFilterFactory uses "separator" but that filter is more niche so I prefer to standardize on the choice made by the more common filter. * I think the semantics of both "preserveSep" (a boolean) and "separator" (the char) as you have defined it, is confusing. You've made the separator preservation an OR between those two. I think it's clearer to keep preserveSep as the toggle that decides if we need to preserve a separator at all, and use "separator" to be the setting that determines _what_ the separator char should be (only honored when preserveSep==true). The latter should simply default to SEP_LABEL. The end effect will be a couple fewer lines of code and a slightly simpler conditional, and moreover something I find easier to understand. The documentation on preserveSep would need a slight adjustment to point to separator setting since the separator won't always be SEP_LABEL anymore. * I think ConcatenateGraphFilter could have just one Character separator that may be null. This would replace preserveSep. > Separator for ConcatenateGraphFilterFactory > ------------------------------------------- > > Key: LUCENE-9018 > URL: https://issues.apache.org/jira/browse/LUCENE-9018 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis > Reporter: Stanislav Mikulchik > Assignee: David Smiley > Priority: Minor > Attachments: LUCENE-9018.patch > > > I would like to have an option to choose a separator to use for token > concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" > symbol. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org