ankitsultana commented on code in PR #12392: URL: https://github.com/apache/pinot/pull/12392#discussion_r1486692865
##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java:
##########
@@ -570,6 +572,81 @@ public static String[] split(String input, String delimiter, int limit) {
     return StringUtils.splitByWholeSeparator(input, delimiter, limit);
   }

+  /**
+   * @param input an input string for prefix strings generations.
+   * @param length the max length of the prefix strings for the string.
+   * @param regexChar the character for regex matching to be added to prefix strings generated. e.g. '^'
+   * @return generate an array of prefix strings of the string that are shorter than the specified length.
+   */
+  @ScalarFunction
+  public static String[] prefix(String input, int length, String regexChar) {
+    ObjectSet<String> prefixSet = new ObjectLinkedOpenHashSet<>();
+    for (int prefixLength = 1; prefixLength <= length && prefixLength <= input.length(); prefixLength++) {
+      if (regexChar != null) {
+        prefixSet.add(regexChar + input.substring(0, prefixLength));
+      } else {
+        prefixSet.add(input.substring(0, prefixLength));
+      }
+    }
+    return prefixSet.toArray(new String[0]);
+  }
+
+  /**
+   * @param input an input string for suffix strings generations.
+   * @param length the max length of the suffix strings for the string.
+   * @param regexChar the character for regex matching to be added to suffix strings generated. e.g. '$'
+   * @return generate an array of suffix strings of the string that are shorter than the specified length.
+   */
+  @ScalarFunction
+  public static String[] suffix(String input, int length, String regexChar) {
+    ObjectSet<String> suffixSet = new ObjectLinkedOpenHashSet<>();
+    for (int suffixLength = 1; suffixLength <= length && suffixLength <= input.length(); suffixLength++) {
+      if (regexChar != null) {
+        suffixSet.add(input.substring(input.length() - suffixLength) + regexChar);
+      } else {
+        suffixSet.add(input.substring(input.length() - suffixLength));
+      }
+    }
+    return suffixSet.toArray(new String[0]);
+  }
+
+  /**
+   * @param input an input string for ngram generations.
+   * @param length the max length of the ngram for the string.
+   * @return generate an array of ngram of the string that length are exactly matching the specified length.
+   */
+  @ScalarFunction
+  public static String[] ngram(String input, int length) {

Review Comment:
   Suggest renaming this to `ngrams` to be consistent with ClickHouse: https://clickhouse.com/docs/en/sql-reference/functions/splitting-merging-functions#ngrams

##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java:
##########
@@ -570,6 +572,81 @@ public static String[] split(String input, String delimiter, int limit) {
     return StringUtils.splitByWholeSeparator(input, delimiter, limit);
   }

+  /**
+   * @param input an input string for prefix strings generations.
+   * @param length the max length of the prefix strings for the string.
+   * @param regexChar the character for regex matching to be added to prefix strings generated. e.g. '^'
+   * @return generate an array of prefix strings of the string that are shorter than the specified length.
+   */
+  @ScalarFunction
+  public static String[] prefix(String input, int length, String regexChar) {

Review Comment:
   The name of the function may lead users to think that it is equivalent to `input.substring(0, arg)`. Instead, it returns all prefixes of length <= the given length.
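
   To make the naming concern concrete, here is a minimal standalone sketch of the behavior, assuming the loop in the diff above is what ships. The class name `PrefixSketch` and method name `allPrefixes` are hypothetical, and `java.util.LinkedHashSet` stands in for fastutil's `ObjectLinkedOpenHashSet` so the snippet runs without extra dependencies.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical standalone sketch; not the class or method names used in the PR.
final class PrefixSketch {
  // Mirrors the loop in the PR's prefix(): collects every prefix of length
  // 1..maxLength (capped at the input length), optionally prepending regexChar.
  // Assumption: LinkedHashSet replaces ObjectLinkedOpenHashSet; both keep insertion order.
  static String[] allPrefixes(String input, int maxLength, String regexChar) {
    Set<String> prefixes = new LinkedHashSet<>();
    for (int len = 1; len <= maxLength && len <= input.length(); len++) {
      String p = input.substring(0, len);
      prefixes.add(regexChar != null ? regexChar + p : p);
    }
    return prefixes.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // What the name "prefix" may suggest: a single string.
    System.out.println("pinot".substring(0, 3)); // pin
    // What the function in the diff returns: every prefix up to that length.
    System.out.println(Arrays.toString(allPrefixes("pinot", 3, null))); // [p, pi, pin]
    System.out.println(Arrays.toString(allPrefixes("pinot", 3, "^"))); // [^p, ^pi, ^pin]
  }
}
```

   A plural name (for example `prefixes`) might signal the array return more clearly, but that is only one option.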
##########
pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java:
##########
@@ -570,6 +572,81 @@ public static String[] split(String input, String delimiter, int limit) {
     return StringUtils.splitByWholeSeparator(input, delimiter, limit);
   }

+  /**
+   * @param input an input string for prefix strings generations.
+   * @param length the max length of the prefix strings for the string.
+   * @param regexChar the character for regex matching to be added to prefix strings generated. e.g. '^'

Review Comment:
   What's the role of this?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org