rmuir commented on a change in pull request #26:
URL: https://github.com/apache/lucene/pull/26#discussion_r598268124
##########
File path: lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseAnalyzer.java
##########
@@ -39,21 +41,28 @@
   private final Mode mode;
   private final Set<String> stoptags;
   private final UserDictionary userDict;
+  private final boolean charNormalization;
 
   public JapaneseAnalyzer() {
     this(
         null,
         JapaneseTokenizer.DEFAULT_MODE,
         DefaultSetHolder.DEFAULT_STOP_SET,
-        DefaultSetHolder.DEFAULT_STOP_TAGS);
+        DefaultSetHolder.DEFAULT_STOP_TAGS,
+        true);
   }
 
   public JapaneseAnalyzer(
-      UserDictionary userDict, Mode mode, CharArraySet stopwords, Set<String> stoptags) {
+      UserDictionary userDict,
+      Mode mode,
+      CharArraySet stopwords,
+      Set<String> stoptags,
+      boolean charNormalization) {

Review comment:
I think this is a bit confusing: if it is set to `false`, character normalization is still performed, just at a different place in the chain. Do we really need this parameter? I think it would be better to document the change well in CHANGES.txt. If users want different behavior, they can easily build an Analyzer from the individual components.
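For illustration, a user who wants width normalization applied as a token filter after tokenization could assemble that chain themselves from the Kuromoji components. That placement is only assumed here to be what the `false` setting selects; the comment itself just says normalization then happens at a different place in the chain. This is a minimal sketch assuming the Lucene 9.x analysis API, with `lucene-analysis-kuromoji` and `lucene-analysis-common` on the classpath. The class name `WidthFilterJapaneseAnalyzer` and the `analyze` helper are hypothetical, and the filter order is an assumption modeled on the stock `JapaneseAnalyzer` chain, not code from this PR.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cjk.CJKWidthFilter;
import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
import org.apache.lucene.analysis.ja.JapaneseBaseFormFilter;
import org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter;
import org.apache.lucene.analysis.ja.JapanesePartOfSpeechStopFilter;
import org.apache.lucene.analysis.ja.JapaneseTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical user-owned analyzer: stock Kuromoji components, with
// full-width/half-width normalization done by a token filter AFTER the tokenizer.
// The filter order is an assumption modeled on the stock JapaneseAnalyzer chain.
public class WidthFilterJapaneseAnalyzer extends Analyzer {
  private final CharArraySet stopwords = JapaneseAnalyzer.getDefaultStopSet();
  private final Set<String> stoptags = JapaneseAnalyzer.getDefaultStopTags();

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    // No user dictionary, discard punctuation, default tokenizer mode.
    Tokenizer tokenizer = new JapaneseTokenizer(null, true, JapaneseTokenizer.DEFAULT_MODE);
    TokenStream stream = new JapaneseBaseFormFilter(tokenizer);
    stream = new JapanesePartOfSpeechStopFilter(stream, stoptags);
    // Width normalization applied to tokens, i.e. after tokenization.
    stream = new CJKWidthFilter(stream);
    stream = new StopFilter(stream, stopwords);
    stream = new JapaneseKatakanaStemFilter(stream);
    stream = new LowerCaseFilter(stream);
    return new TokenStreamComponents(tokenizer, stream);
  }

  @Override
  protected TokenStream normalize(String fieldName, TokenStream in) {
    // Keep normalization of query terms consistent with the indexing chain.
    return new LowerCaseFilter(new CJKWidthFilter(in));
  }

  // Hypothetical helper: runs the analyzer over text and collects the emitted terms.
  public static List<String> analyze(Analyzer analyzer, String text) {
    List<String> terms = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("field", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return terms;
  }

  public static void main(String[] args) {
    try (Analyzer analyzer = new WidthFilterJapaneseAnalyzer()) {
      System.out.println(analyze(analyzer, "東京"));
    }
  }
}
```

Overriding `Analyzer.initReader` with a width-normalizing char filter would instead apply the normalization before tokenization. Either way, the change lives in a few lines of a user-owned Analyzer rather than in a constructor flag on `JapaneseAnalyzer`.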