[GitHub] [lucene] rmuir commented on a change in pull request #26: LUCENE-9853: Use CJKWidthCharFilter as the default character width normalizer in JapaneseAnalyzer

GitBox Mon, 22 Mar 2021 05:26:07 -0700


rmuir commented on a change in pull request #26:
URL: https://github.com/apache/lucene/pull/26#discussion_r598268124




##########
File path: 
lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseAnalyzer.java
##########
@@ -39,21 +41,28 @@
   private final Mode mode;
   private final Set<String> stoptags;
   private final UserDictionary userDict;
+  private final boolean charNormalization;
 
   public JapaneseAnalyzer() {
     this(
         null,
         JapaneseTokenizer.DEFAULT_MODE,
         DefaultSetHolder.DEFAULT_STOP_SET,
-        DefaultSetHolder.DEFAULT_STOP_TAGS);
+        DefaultSetHolder.DEFAULT_STOP_TAGS,
+        true);
   }
 
   public JapaneseAnalyzer(
-      UserDictionary userDict, Mode mode, CharArraySet stopwords, Set<String> stoptags) {
+      UserDictionary userDict,
+      Mode mode,
+      CharArraySet stopwords,
+      Set<String> stoptags,
+      boolean charNormalization) {

Review comment:
       I think this is a bit confusing: if it is set to `false`, character normalization is still performed, just at a different place in the chain.
   
   Do we really need this parameter? I think it would be better to document the change well in CHANGES.txt. If users want different behavior, they can easily build their own Analyzer from the individual components.
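A minimal sketch of what "build their own Analyzer from the individual components" could look like. It assumes a Lucene version that ships `org.apache.lucene.analysis.cjk.CJKWidthCharFilter` and assumes that, with the char filter in place, the token-level `CJKWidthFilter` is dropped from `JapaneseAnalyzer`'s existing filter chain. The class name `CharFilterJapaneseAnalyzer` and the `analyze` helper are hypothetical and are not part of the PR:

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cjk.CJKWidthCharFilter;
import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
import org.apache.lucene.analysis.ja.JapaneseBaseFormFilter;
import org.apache.lucene.analysis.ja.JapaneseKatakanaStemFilter;
import org.apache.lucene.analysis.ja.JapanesePartOfSpeechStopFilter;
import org.apache.lucene.analysis.ja.JapaneseTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Hypothetical user-built analyzer: JapaneseAnalyzer's default filters, with
 * width normalization done on raw characters (before tokenization) by
 * CJKWidthCharFilter instead of on tokens by CJKWidthFilter (assumption).
 */
public class CharFilterJapaneseAnalyzer extends Analyzer {
  private final CharArraySet stopwords = JapaneseAnalyzer.getDefaultStopSet();
  private final Set<String> stoptags = JapaneseAnalyzer.getDefaultStopTags();

  @Override
  protected Reader initReader(String fieldName, Reader reader) {
    // Full-width ASCII -> half-width, half-width katakana -> full-width,
    // applied before JapaneseTokenizer sees the text.
    return new CJKWidthCharFilter(reader);
  }

  @Override
  protected Reader initReaderForNormalization(String fieldName, Reader reader) {
    // Keep Analyzer.normalize(...) consistent with the indexing chain.
    return new CJKWidthCharFilter(reader);
  }

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    // No user dictionary, discard punctuation, default (search) mode.
    Tokenizer tokenizer = new JapaneseTokenizer(null, true, JapaneseTokenizer.DEFAULT_MODE);
    TokenStream stream = new JapaneseBaseFormFilter(tokenizer);
    stream = new JapanesePartOfSpeechStopFilter(stream, stoptags);
    stream = new StopFilter(stream, stopwords);
    stream = new JapaneseKatakanaStemFilter(stream);
    stream = new LowerCaseFilter(stream);
    return new TokenStreamComponents(tokenizer, stream);
  }

  /** Runs text through an analyzer and collects the emitted terms. */
  public static List<String> analyze(Analyzer analyzer, String text) throws IOException {
    List<String> terms = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    }
    return terms;
  }
}
```

Because the char filter runs before `JapaneseTokenizer`, Kuromoji's dictionary lookup sees already-normalized characters. A user who prefers token-level normalization could instead remove the two `initReader*` overrides and add `new CJKWidthFilter(stream)` after the tokenizer.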




