deemoliu opened a new pull request, #12392:
URL: https://github.com/apache/pinot/pull/12392

   `feature`: Adding ngram, prefix, postfix UDFs
   
   Context:
   
   We are onboarding a use case and trying to increase query throughput. In our 
tests, QPS could not be improved further with the existing REGEXP_LIKE or 
TEXT_MATCH queries. The queries are as follows:
   `select col1, col2 from table where REGEXP_LIKE(col3, '^data*')`
   `select col1, col2 from table where REGEXP_LIKE(col3, 'data$')`
   `select col1, col2 from table where REGEXP_LIKE(col3, '*data*')`
   `select col1, col2 from table where TEXT_MATCH(col3, '/data*/')`
   ...
   
   The plan is to generate derived columns that persist the prefixes, postfixes, 
and ngrams of a field, use inverted indexes on those columns to filter results 
quickly, and add text match indexes to validate the filtered results and avoid 
false positives. 
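
To illustrate why the validation step is needed, here is a minimal in-memory sketch of the filter-then-validate idea. It is not Pinot code: the class `NgramFilterSketch`, its methods, and the sample values are hypothetical, a plain `Map` stands in for the inverted index, and `String.contains` stands in for the text match check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from Pinot.
class NgramFilterSketch {
  static final int N = 3;

  // Split a value into its length-N substrings (empty if the value is shorter than N).
  static List<String> ngrams(String value) {
    List<String> grams = new ArrayList<>();
    for (int i = 0; i + N <= value.length(); i++) {
      grams.add(value.substring(i, i + N));
    }
    return grams;
  }

  // Stand-in for an inverted index on a derived ngram column: ngram -> doc ids.
  static Map<String, Set<Integer>> buildIndex(List<String> docs) {
    Map<String, Set<Integer>> index = new HashMap<>();
    for (int docId = 0; docId < docs.size(); docId++) {
      for (String gram : ngrams(docs.get(docId))) {
        index.computeIfAbsent(gram, g -> new TreeSet<>()).add(docId);
      }
    }
    return index;
  }

  // Step 1: intersect the posting lists of every ngram in the search term.
  static Set<Integer> candidates(Map<String, Set<Integer>> index, String term) {
    if (term.length() < N) {
      throw new IllegalArgumentException("term must have at least " + N + " characters");
    }
    Set<Integer> result = null;
    for (String gram : ngrams(term)) {
      Set<Integer> postings = index.getOrDefault(gram, Collections.<Integer>emptySet());
      if (result == null) {
        result = new TreeSet<>(postings);
      } else {
        result.retainAll(postings);
      }
    }
    return result;
  }

  // Step 2: validate each candidate against the raw value to drop false positives.
  static Set<Integer> search(List<String> docs, Map<String, Set<Integer>> index, String term) {
    Set<Integer> matches = new TreeSet<>();
    for (int docId : candidates(index, term)) {
      if (docs.get(docId).contains(term)) {
        matches.add(docId);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<String> docs = Arrays.asList("metadata", "datxata", "other");
    Map<String, Set<Integer>> index = buildIndex(docs);
    System.out.println(candidates(index, "data")); // [0, 1]
    System.out.println(search(docs, index, "data")); // [0]
  }
}
```

In this sketch "datxata" contains both trigrams of "data" ("dat" and "ata") without containing "data" itself, so it passes the ngram filter and is only rejected by the validation step.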
   
   This patch adds UDFs that generate the prefixes, postfixes, and ngrams of a field.
   They can be used in a transformation config such as the following:
   ```
           {
             "columnName": "col_prefix",
             "transformFunction": "prefix(col, 3, null)"
           },
           {
             "columnName": "col_suffix",
             "transformFunction": "suffix(col, 3, null)"
           },
           {
             "columnName": "col_ngram",
             "transformFunction": "ngram(col, 3)"
           },
           {
             "columnName": "col_ngram_1_3",
             "transformFunction": "ngram(col, 1, 3)"
           }
   ```
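
As a rough sketch of what such transforms could produce, the following standalone Java class generates prefixes, suffixes, and ngrams for a string. The semantics are assumptions, not taken from the patch: here the length argument is treated as a maximum, every shorter prefix or suffix is emitted as well, and the third `null` argument from the config above is not modelled. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers; the actual UDFs in the patch may behave differently.
class DerivedColumnSketch {
  // Assumed semantics: every leading substring of length 1..maxLength.
  static List<String> prefixes(String value, int maxLength) {
    List<String> out = new ArrayList<>();
    int limit = Math.min(maxLength, value.length());
    for (int len = 1; len <= limit; len++) {
      out.add(value.substring(0, len));
    }
    return out;
  }

  // Assumed semantics: every trailing substring of length 1..maxLength.
  static List<String> suffixes(String value, int maxLength) {
    List<String> out = new ArrayList<>();
    int limit = Math.min(maxLength, value.length());
    for (int len = 1; len <= limit; len++) {
      out.add(value.substring(value.length() - len));
    }
    return out;
  }

  // All substrings whose length is between minLength and maxLength (inclusive).
  // Duplicates are kept; a caller could collect into a Set if they are unwanted.
  static List<String> ngrams(String value, int minLength, int maxLength) {
    List<String> out = new ArrayList<>();
    for (int n = minLength; n <= maxLength; n++) {
      for (int i = 0; i + n <= value.length(); i++) {
        out.add(value.substring(i, i + n));
      }
    }
    return out;
  }

  // Fixed-length variant, mirroring the two-argument form in the config.
  static List<String> ngrams(String value, int length) {
    return ngrams(value, length, length);
  }

  public static void main(String[] args) {
    System.out.println(prefixes("data", 3)); // [d, da, dat]
    System.out.println(suffixes("data", 3)); // [a, ta, ata]
    System.out.println(ngrams("data", 3)); // [dat, ata]
    System.out.println(ngrams("data", 1, 3)); // [d, a, t, a, da, at, ta, dat, ata]
  }
}
```

Under these assumptions, a query for values starting with "dat" could be answered by an exact match on the derived prefix column instead of a regular expression scan.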
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

