[ 
https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961985#comment-16961985
 ] 

David Smiley commented on LUCENE-9018:
--------------------------------------

Thanks for contributing!
* Factory: see method {{getChar}} instead of simply {{get}}
* I think we should use the same factory parameter name for this that 
ShingleFilterFactory and FixedShingleFilterFactory use – "tokenSeparator".  
Unfortunately I see inconsistency -- FingerprintFilterFactory uses "separator" 
but that filter is more niche so I prefer to standardize on the choice made by 
the more common filter.
* I think the semantics of both "preserveSep" (a boolean) and "separator" (the 
char) as you have defined it, is confusing. You've made the separator 
preservation an OR between those two.  I think it's clearer to keep preserveSep 
as the toggle that decides if we need to preserve a separator at all, and use 
"separator" to be the setting that determines _what_ the separator char should 
be (only honored when preserveSep==true).  The latter should simply default to 
SEP_LABEL. The end effect will be a couple fewer lines of code and a slightly 
simpler conditional, and moreover something I find easier to understand.  The 
documentation on preserveSep would need a slight adjustment to point to 
separator setting since the separator won't always be SEP_LABEL anymore.
* I think ConcatenateGraphFilter could have just one Character separator that 
may be null.  This would replace preserveSep.

> Separator for ConcatenateGraphFilterFactory
> -------------------------------------------
>
>                 Key: LUCENE-9018
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9018
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Stanislav Mikulchik
>            Assignee: David Smiley
>            Priority: Minor
>         Attachments: LUCENE-9018.patch
>
>
> I would like to have an option to choose a separator to use for token 
> concatenation. Currently ConcatenateGraphFilterFactory can use only "\u001F" 
> symbol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to