kirkrodrigues opened a new pull request, #11006:
URL: https://github.com/apache/pinot/pull/11006

   tags: feature
   
   This adds a query rewriter to make it easier to call `clpDecode`. Recall 
that the [`CLPLogMessageDecoder`](https://github.com/apache/pinot/pull/9942) 
encodes unstructured log fields into three columns, each with a common prefix 
(and different suffixes). We refer to this common prefix as the column-group 
name. This PR adds support for calling `clpDecode` using the column-group name 
rather than specifying the three individual columns.
   
   E.g., if the `message` field was encoded with CLP, users currently have to 
call `clpDecode` as follows to reconstruct the field's original value:
   
   `clpDecode(message_logtype, message_dictionaryVars, message_encodedVars)`
   
   After this PR, users can call `clpDecode` as follows:
   
   `clpDecode(message)`
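
   As a rough illustration of the expansion described above, here is a minimal string-level sketch. It is not the rewriter's implementation: the real `CLPDecodeRewriter` works on the parsed query, whereas this only builds the equivalent call text. The class name `ClpDecodeExpansionSketch` and the method `expand` are hypothetical. The three suffixes are taken from the example in this description.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of Pinot.
class ClpDecodeExpansionSketch {
    // Column suffixes as they appear in the example above.
    static final List<String> SUFFIXES =
            List.of("_logtype", "_dictionaryVars", "_encodedVars");

    // Builds the three-argument clpDecode call for a column-group name.
    public static String expand(String columnGroup) {
        StringBuilder sb = new StringBuilder("clpDecode(");
        for (int i = 0; i < SUFFIXES.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(columnGroup).append(SUFFIXES.get(i));
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        // Prints the long form that clpDecode(message) stands for.
        System.out.println(expand("message"));
    }
}
```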
   
   To use the rewriter, users need to change their broker config to add 
`org.apache.pinot.sql.parsers.rewriter.CLPDecodeRewriter` to 
`pinot.broker.query.rewriter.class.names`. Assuming the default set of query 
rewriters, that would look like:
   
   ```
org.apache.pinot.sql.parsers.rewriter.CompileTimeFunctionsInvoker,org.apache.pinot.sql.parsers.rewriter.SelectionsRewriter,org.apache.pinot.sql.parsers.rewriter.PredicateComparisonRewriter,org.apache.pinot.sql.parsers.rewriter.OrdinalsUpdater,org.apache.pinot.sql.parsers.rewriter.CLPDecodeRewriter,org.apache.pinot.sql.parsers.rewriter.AliasApplier,org.apache.pinot.sql.parsers.rewriter.NonAggregationGroupByToDistinctQueryRewriter
   ```
   Note that we added it before the `AliasApplier` so that any aliasing of 
`message` happens only after the `clpDecode` rewrite.
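
   For anyone who generates the broker config programmatically, the ordering constraint can be sketched as follows. This is only an illustration under stated assumptions: the class `RewriterConfigSketch` and its method are hypothetical, and `DEFAULTS` is the default rewriter list as implied by the value shown in this description (that value minus `CLPDecodeRewriter`). It produces the string for `pinot.broker.query.rewriter.class.names` with the new rewriter placed immediately before `AliasApplier`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pinot.
class RewriterConfigSketch {
    static final String PKG = "org.apache.pinot.sql.parsers.rewriter.";

    // Default rewriters, assumed from the config value in this description.
    static final List<String> DEFAULTS = List.of(
            PKG + "CompileTimeFunctionsInvoker",
            PKG + "SelectionsRewriter",
            PKG + "PredicateComparisonRewriter",
            PKG + "OrdinalsUpdater",
            PKG + "AliasApplier",
            PKG + "NonAggregationGroupByToDistinctQueryRewriter");

    // Returns the comma-separated config value with CLPDecodeRewriter
    // inserted directly before AliasApplier.
    public static String withClpDecodeRewriter(List<String> rewriters) {
        String clp = PKG + "CLPDecodeRewriter";
        List<String> result = new ArrayList<>(rewriters);
        if (!result.contains(clp)) {
            int aliasIdx = result.indexOf(PKG + "AliasApplier");
            if (aliasIdx < 0) {
                throw new IllegalArgumentException("AliasApplier not found in rewriter list");
            }
            result.add(aliasIdx, clp);
        }
        return String.join(",", result);
    }

    public static void main(String[] args) {
        System.out.println(withClpDecodeRewriter(DEFAULTS));
    }
}
```

   If the list already contains `CLPDecodeRewriter`, the sketch leaves it unchanged, so applying it twice gives the same value.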
   
   This is part of the change requested in #9819 and described in this [design 
doc](https://docs.google.com/document/d/1nHZb37re4mUwEA258x3a2pgX13EWLWMJ0uLEDk1dUyU/edit#heading=h.x12tsj9ok16d).
   
   Note also that this is a precursor to `clpMatch`, which will be a much more 
involved query rewriter.
   
   # Testing performed
   * Added new unit tests.
   * Validated that fields encoded with CLP can be decoded correctly using the 
column-group name.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

