deemoliu commented on code in PR #12392:
URL: https://github.com/apache/pinot/pull/12392#discussion_r1501241895


##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java:
##########
@@ -570,6 +572,107 @@ public static String[] split(String input, String delimiter, int limit) {
     return StringUtils.splitByWholeSeparator(input, delimiter, limit);
   }
 
+  /**
+   * @param input an input string for prefix strings generations.
+   * @param maxlength the max length of the prefix strings for the string.
+   * @return generate an array of prefix strings of the string that are shorter than the specified length.
+   */
+  @ScalarFunction
+  public static String[] prefixes(String input, int maxlength) {
+    ObjectSet<String> prefixSet = new ObjectLinkedOpenHashSet<>();
+    for (int prefixLength = 1; prefixLength <= maxlength && prefixLength <= input.length(); prefixLength++) {
+      prefixSet.add(input.substring(0, prefixLength));
+    }
+    return prefixSet.toArray(new String[0]);
+  }
+
+  /**
+   * @param input an input string for prefix strings generations.
+   * @param maxlength the max length of the prefix strings for the string.
+   * @param regexChar the character for regex matching to be added to prefix strings generated. e.g. '^'
+   * @return generate an array of prefix matchers of the string that are shorter than the specified length.
+   */
+  @ScalarFunction
+  public static String[] prefixMatchers(String input, int maxlength, String regexChar) {

Review Comment:
   Sounds good. Regex matching is only one use of this function; there are 
other cases, such as key-value pair matching.
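
To illustrate the point about non-regex uses, here is a minimal sketch, not the PR's code. The class `PrefixMarkerSketch` and the method `prefixesWithMarker` are hypothetical names made up for this example; the loop mirrors the quoted `prefixMatchers` body, but uses a plain `ArrayList` instead of fastutil's `ObjectLinkedOpenHashSet` to stay dependency-free (prefixes of different lengths are always distinct, so deduplication is not needed here). The `"name="` marker is an assumed example of a key-value style tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the prefixMatchers function under review.
final class PrefixMarkerSketch {

  // Prepends an arbitrary marker string to every prefix of length 1..maxLength.
  // A null marker is treated as "no marker", like the null check in the PR.
  static String[] prefixesWithMarker(String input, int maxLength, String marker) {
    String lead = marker == null ? "" : marker;
    List<String> out = new ArrayList<>();
    for (int len = 1; len <= maxLength && len <= input.length(); len++) {
      out.add(lead + input.substring(0, len));
    }
    return out.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // Regex-style anchor, as in the Javadoc example.
    System.out.println(String.join(",", prefixesWithMarker("abc", 2, "^")));
    // prints: ^a,^ab

    // Key-value style tag (assumed example), which is not a regex character at all.
    System.out.println(String.join(",", prefixesWithMarker("abc", 3, "name=")));
    // prints: name=a,name=ab,name=abc
  }
}
```

Since the third argument is just concatenated, a name that does not mention regex may describe the parameter more accurately.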



##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java:
##########
@@ -570,6 +572,107 @@ public static String[] split(String input, String delimiter, int limit) {
     return StringUtils.splitByWholeSeparator(input, delimiter, limit);
   }
 
+  /**
+   * @param input an input string for prefix strings generations.
+   * @param maxlength the max length of the prefix strings for the string.
+   * @return generate an array of prefix strings of the string that are shorter than the specified length.
+   */
+  @ScalarFunction
+  public static String[] prefixes(String input, int maxlength) {
+    ObjectSet<String> prefixSet = new ObjectLinkedOpenHashSet<>();
+    for (int prefixLength = 1; prefixLength <= maxlength && prefixLength <= input.length(); prefixLength++) {
+      prefixSet.add(input.substring(0, prefixLength));
+    }
+    return prefixSet.toArray(new String[0]);
+  }
+
+  /**
+   * @param input an input string for prefix strings generations.
+   * @param maxlength the max length of the prefix strings for the string.
+   * @param regexChar the character for regex matching to be added to prefix strings generated. e.g. '^'
+   * @return generate an array of prefix matchers of the string that are shorter than the specified length.
+   */
+  @ScalarFunction
+  public static String[] prefixMatchers(String input, int maxlength, String regexChar) {
+    if (regexChar == null) {
+      return prefixes(input, maxlength);
+    }
+    ObjectSet<String> prefixSet = new ObjectLinkedOpenHashSet<>();
+    for (int prefixLength = 1; prefixLength <= maxlength && prefixLength <= input.length(); prefixLength++) {
+      prefixSet.add(regexChar + input.substring(0, prefixLength));
+    }
+    return prefixSet.toArray(new String[0]);
+  }
+
+  /**
+   * @param input an input string for suffix strings generations.
+   * @param maxlength the max length of the suffix strings for the string.
+   * @return generate an array of suffix strings of the string that are shorter than the specified length.
+   */
+  @ScalarFunction
+  public static String[] suffixes(String input, int maxlength) {
+    ObjectSet<String> suffixSet = new ObjectLinkedOpenHashSet<>();
+    for (int suffixLength = 1; suffixLength <= maxlength && suffixLength <= input.length(); suffixLength++) {
+      suffixSet.add(input.substring(input.length() - suffixLength));
+    }
+    return suffixSet.toArray(new String[0]);
+  }
+
+  /**
+   * @param input an input string for suffix strings generations.
+   * @param maxlength the max length of the suffix strings for the string.
+   * @param regexChar the character for regex matching to be added to suffix strings generated. e.g. '$'
+   * @return generate an array of suffix matchers of the string that are shorter than the specified length.
+   */
+  @ScalarFunction
+  public static String[] suffixMatchers(String input, int maxlength, String regexChar) {
+    if (regexChar == null) {
+      return suffixes(input, maxlength);
+    }
+    ObjectSet<String> suffixSet = new ObjectLinkedOpenHashSet<>();
+    for (int suffixLength = 1; suffixLength <= maxlength && suffixLength <= input.length(); suffixLength++) {
+      suffixSet.add(input.substring(input.length() - suffixLength) + regexChar);
+    }
+    return suffixSet.toArray(new String[0]);
+  }
+
+  /**
+   * @param input an input string for ngram generations.
+   * @param length the max length of the ngram for the string.
+   * @return generate an array of ngram of the string that length are exactly matching the specified length.
+   */
+  @ScalarFunction
+  public static String[] ngrams(String input, int length) {

Review Comment:
   Sounds good.
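
For readers following along: the quoted hunk stops at the `ngrams` signature, so its body is not shown here. The following is only a sketch of what the Javadoc describes (all substrings whose length exactly matches `length`), written under stated assumptions: a sliding window over the input, deduplication with insertion order preserved (by analogy with the `ObjectLinkedOpenHashSet` used in the sibling functions, swapped here for `java.util.LinkedHashSet`), and an empty result for a non-positive or too-large `length`. The class name `NgramSketch` is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; the actual ngrams body in the PR may differ.
final class NgramSketch {

  // Returns the distinct substrings of exactly `length` characters,
  // in order of first appearance.
  static String[] ngrams(String input, int length) {
    // Assumption: out-of-range lengths yield an empty array rather than an error.
    if (length <= 0 || length > input.length()) {
      return new String[0];
    }
    Set<String> grams = new LinkedHashSet<>();
    for (int start = 0; start + length <= input.length(); start++) {
      grams.add(input.substring(start, start + length));
    }
    return grams.toArray(new String[0]);
  }

  public static void main(String[] args) {
    System.out.println(String.join(",", ngrams("banana", 2)));
    // prints: ba,an,na
  }
}
```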



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

