jackjlli commented on code in PR #9708:
URL: https://github.com/apache/pinot/pull/9708#discussion_r1012074635
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/text/LuceneTextIndexCreator.java:
##########
@@ -55,10 +59,15 @@ public class LuceneTextIndexCreator implements TextIndexCreator {
private int _nextDocId = 0;
- public static final CharArraySet ENGLISH_STOP_WORDS_SET = new CharArraySet(Arrays
- .asList("a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no",
- "not", "of", "on", "or", "such", "that", "the", "their", "then", "than", "there", "these", "they", "this",
- "to", "was", "will", "with", "those"), true);
+ public static HashSet<String> getDefaultEnglishStopWordsSet() {
+ return new HashSet<>(
Review Comment:
nit: use `ImmutableSet.of("a", "an", ...)` here, since the default set won't be changed?
Also make it `final`.
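A minimal sketch of what this suggestion could look like. It uses the JDK's `Set.of` (Java 9+) as a stand-in for Guava's `ImmutableSet.of` so that it compiles without extra dependencies. The class name `StopWordsDefaults` and the constant name are hypothetical; the word list is copied from the removed `ENGLISH_STOP_WORDS_SET`.

```java
import java.util.Set;

// Hypothetical holder class for illustration; not part of the PR.
final class StopWordsDefaults {
  // Set.of returns an unmodifiable set. Guava's ImmutableSet.of would serve
  // the same purpose here; the JDK call is used only to avoid a dependency.
  static final Set<String> DEFAULT_ENGLISH_STOP_WORDS = Set.of(
      "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
      "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
      "such", "that", "the", "their", "then", "than", "there", "these",
      "they", "this", "to", "was", "will", "with", "those");

  private StopWordsDefaults() {
  }

  public static void main(String[] args) {
    System.out.println(DEFAULT_ENGLISH_STOP_WORDS.contains("the"));
    try {
      // Any attempt to mutate the shared default fails fast.
      DEFAULT_ENGLISH_STOP_WORDS.add("foo");
    } catch (UnsupportedOperationException e) {
      System.out.println("default set cannot be modified");
    }
  }
}
```

Callers that need a customized list would copy the constant into their own mutable set, so the shared default stays untouched.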
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/text/LuceneTextIndexCreator.java:
##########
@@ -82,16 +91,20 @@ public class LuceneTextIndexCreator implements TextIndexCreator {
* no need to commit the index from the realtime side. So when the realtime segment
* is destroyed (which is after the realtime segment has been committed and converted
* to offline), we close this lucene index writer to release resources but don't commit.
- * This is the reason to have commit flag part of the constructor.
+ * @param stopWordsInclude the words to include in addition to the default stop word list
Review Comment:
What if `stopWordsInclude` and `stopWordsExclude` contain the same word? Will
the code throw an exception in that case?
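To make the question concrete, here is a hedged sketch of one way the two lists could be combined. It is not taken from the PR: the class `StopWordsResolver` and the methods `resolve` and `overlap` are hypothetical, and the order of operations (add the includes first, then remove the excludes) is an assumption. Under that order, a word present in both lists ends up excluded and no exception is thrown. If such input should be rejected instead, an explicit overlap check would be needed.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not the PR's actual implementation.
final class StopWordsResolver {
  private StopWordsResolver() {
  }

  // Assumed order: start from the defaults, add includes, then remove excludes.
  // A word in both lists is therefore excluded from the result.
  static Set<String> resolve(Set<String> defaults, List<String> include, List<String> exclude) {
    Set<String> result = new HashSet<>(defaults);
    if (include != null) {
      result.addAll(include);
    }
    if (exclude != null) {
      result.removeAll(exclude);
    }
    return result;
  }

  // Returns the words present in both lists, so a caller can warn or fail.
  static Set<String> overlap(List<String> include, List<String> exclude) {
    Set<String> common = new HashSet<>(include);
    common.retainAll(exclude);
    return common;
  }

  public static void main(String[] args) {
    Set<String> defaults = Set.of("a", "the");
    List<String> include = List.of("foo", "bar");
    List<String> exclude = List.of("foo", "the");
    System.out.println(resolve(defaults, include, exclude));
    System.out.println(overlap(include, exclude));
  }
}
```

Whichever order the PR uses, documenting it in the Javadoc for `stopWordsInclude` and `stopWordsExclude` would remove the ambiguity.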