deemoliu opened a new pull request, #12392: URL: https://github.com/apache/pinot/pull/12392
`feature`: Adding ngram, prefix, and postfix UDFs

Context: We are onboarding a use case and trying to increase query throughput. In our tests, QPS could not be improved further with the existing REGEXP_LIKE or TEXT_MATCH queries. The queries are as follows:

- `select col1, col2 from table where REGEXP_LIKE(col3, '^data*')`
- `select col1, col2 from table where REGEXP_LIKE(col3, 'data$')`
- `select col1, col2 from table where REGEXP_LIKE(col3, '.*data.*')`
- `select col1, col2 from table where TEXT_MATCH(col3, '/data*/')`
- ...

The plan is to generate derived columns that persist the prefixes, postfixes, and ngrams of a field, so that inverted indexes on those columns can filter rows quickly. Text match indexes are then added to validate the rows that survive filtering, which removes false positives.

This patch generates prefixes, postfixes (via `suffix`), and ngrams for a field. They can be produced with the following transformation config:

```
{ "columnName": "col_prefix", "transformFunction": "prefix(col, 3, null)" },
{ "columnName": "col_suffix", "transformFunction": "suffix(col, 3, null)" },
{ "columnName": "col_ngram", "transformFunction": "ngram(col, 3)" },
{ "columnName": "col_ngram_1_3", "transformFunction": "ngram(col, 1, 3)" }
```
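The PR text does not spell out the exact output of the three transforms, so the following Java sketch shows one plausible reading. It is hypothetical and is not Pinot's scalar-function code. Assumptions, none confirmed by the PR: `prefix`/`suffix` emit every prefix or suffix of length 1 up to `maxLength`, the third argument (passed as `null` in the config) is an optional marker string attached to each value, and `ngram` emits each distinct gram once. The class name `NgramSketch` is made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch of the prefix / suffix / ngram transforms (not Pinot's code).
 * Assumed semantics: prefix/suffix emit every prefix or suffix of length 1..maxLength,
 * the optional marker is prepended to prefixes / appended to suffixes (null = none),
 * and ngram emits each distinct gram once, in first-occurrence order.
 */
final class NgramSketch {
  private NgramSketch() {}

  /** All prefixes of length 1..maxLength, e.g. "database", 3 -> d, da, dat. */
  static String[] prefix(String input, int maxLength, String marker) {
    int n = Math.max(0, Math.min(maxLength, input.length()));
    String[] out = new String[n];
    for (int len = 1; len <= n; len++) {
      String p = input.substring(0, len);
      out[len - 1] = marker == null ? p : marker + p;
    }
    return out;
  }

  /** All suffixes of length 1..maxLength, e.g. "database", 3 -> e, se, ase. */
  static String[] suffix(String input, int maxLength, String marker) {
    int n = Math.max(0, Math.min(maxLength, input.length()));
    String[] out = new String[n];
    for (int len = 1; len <= n; len++) {
      String s = input.substring(input.length() - len);
      out[len - 1] = marker == null ? s : s + marker;
    }
    return out;
  }

  /** Distinct grams of exactly length n, mirroring ngram(col, 3). */
  static String[] ngram(String input, int n) {
    return ngram(input, n, n);
  }

  /** Distinct grams with lengths in [minGram, maxGram], mirroring ngram(col, 1, 3). */
  static String[] ngram(String input, int minGram, int maxGram) {
    if (minGram < 1 || maxGram < minGram) {
      throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
    }
    Set<String> grams = new LinkedHashSet<>();
    for (int len = minGram; len <= maxGram; len++) {
      for (int start = 0; start + len <= input.length(); start++) {
        grams.add(input.substring(start, start + len));
      }
    }
    return grams.toArray(new String[0]);
  }

  public static void main(String[] args) {
    System.out.println(String.join(",", prefix("database", 3, null))); // d,da,dat
    System.out.println(String.join(",", suffix("database", 3, "$")));  // e$,se$,ase$
    System.out.println(String.join(",", ngram("banana", 3)));          // ban,ana,nan
    System.out.println(String.join(",", ngram("abc", 1, 3)));          // a,b,c,ab,bc,abc
  }
}
```

Emitting every length up to `maxLength`, rather than one fixed-length prefix, is the reading chosen here because it lets a single indexed column serve prefix lookups of any length up to 3; longer literals would be filtered on their first 3 characters and then validated.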
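The filter-then-validate plan can be illustrated with a toy in-memory model. In this hypothetical sketch, a `HashMap` from 3-gram to row ids stands in for Pinot's inverted index on a derived ngram column, and a plain `String.contains` check stands in for the TEXT_MATCH / REGEXP_LIKE validation step. The class `NgramFilterSketch` and its methods are illustrative names only. The point it shows is why validation is needed: a row can contain every 3-gram of the search literal without containing the literal itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical toy model of "filter with an ngram inverted index, then validate".
 * The HashMap stands in for an inverted index on a derived 3-gram column; the
 * String.contains check stands in for the TEXT_MATCH / REGEXP_LIKE validation.
 */
final class NgramFilterSketch {
  private static final int N = 3;

  private final List<String> rows;
  // 3-gram -> sorted ids of the rows that contain it (the "posting list").
  private final Map<String, Set<Integer>> index = new HashMap<>();

  NgramFilterSketch(List<String> rows) {
    this.rows = rows;
    for (int rowId = 0; rowId < rows.size(); rowId++) {
      for (String gram : grams(rows.get(rowId))) {
        index.computeIfAbsent(gram, k -> new TreeSet<>()).add(rowId);
      }
    }
  }

  /** Distinct N-grams of a string. */
  private static Set<String> grams(String s) {
    Set<String> out = new TreeSet<>();
    for (int i = 0; i + N <= s.length(); i++) {
      out.add(s.substring(i, i + N));
    }
    return out;
  }

  /** Step 1: rows whose posting lists contain every N-gram of the literal. */
  Set<Integer> candidates(String literal) {
    Set<String> queryGrams = grams(literal);
    Set<Integer> result = new TreeSet<>();
    if (queryGrams.isEmpty()) {
      // Literal shorter than N: the index cannot prune, so every row is a candidate.
      for (int rowId = 0; rowId < rows.size(); rowId++) {
        result.add(rowId);
      }
      return result;
    }
    boolean first = true;
    for (String gram : queryGrams) {
      Set<Integer> postings = index.getOrDefault(gram, Collections.emptySet());
      if (first) {
        result.addAll(postings);
        first = false;
      } else {
        result.retainAll(postings); // intersect posting lists
      }
    }
    return result;
  }

  /** Step 2: validate candidates against the real predicate to drop false positives. */
  List<Integer> matches(String literal) {
    List<Integer> out = new ArrayList<>();
    for (int rowId : candidates(literal)) {
      if (rows.get(rowId).contains(literal)) {
        out.add(rowId);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    NgramFilterSketch table =
        new NgramFilterSketch(List.of("database", "metadata", "atab dat", "date"));
    System.out.println(table.candidates("datab")); // [0, 2]
    System.out.println(table.matches("datab"));    // [0]
  }
}
```

With rows `database`, `metadata`, `atab dat` and `date`, searching for `datab` intersects the postings of `ata`, `dat` and `tab`, yielding candidates 0 and 2. Validation then drops row 2, whose 3-grams all match but whose text does not contain `datab`.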