kirkrodrigues opened a new pull request, #11006:
URL: https://github.com/apache/pinot/pull/11006
tags: feature

This adds a query rewriter to make it easier to call `clpDecode`. Recall that the [`CLPLogMessageDecoder`](https://github.com/apache/pinot/pull/9942) encodes unstructured log fields into three columns, each with a common prefix (and different suffixes). We refer to this common prefix as the column-group name. This PR adds support for calling `clpDecode` using the column-group name rather than specifying the three individual columns.

E.g., if the `message` field was encoded with CLP, users currently have to call `clpDecode` as follows to reconstruct the field's original value:

`clpDecode(message_logtype, message_dictionaryVars, message_encodedVars)`

After this PR, users can call `clpDecode` as follows:

`clpDecode(message)`

To use the rewriter, users need to add `org.apache.pinot.sql.parsers.rewriter.CLPDecodeRewriter` to `pinot.broker.query.rewriter.class.names` in their broker config. Assuming the default set of query rewriters, that would look like:

```
org.apache.pinot.sql.parsers.rewriter.CompileTimeFunctionsInvoker,org.apache.pinot.sql.parsers.rewriter.SelectionsRewriter,org.apache.pinot.sql.parsers.rewriter.PredicateComparisonRewriter,org.apache.pinot.sql.parsers.rewriter.OrdinalsUpdater,org.apache.pinot.sql.parsers.rewriter.CLPDecodeRewriter,org.apache.pinot.sql.parsers.rewriter.AliasApplier,org.apache.pinot.sql.parsers.rewriter.NonAggregationGroupByToDistinctQueryRewriter
```

Note that we added it before the `AliasApplier` so that any aliasing of `message` happens only after the `clpDecode` rewrite.

This is part of the change requested in #9819 and described in this [design doc](https://docs.google.com/document/d/1nHZb37re4mUwEA258x3a2pgX13EWLWMJ0uLEDk1dUyU/edit#heading=h.x12tsj9ok16d).

Note also that this is a precursor to `clpMatch`, which will be a much more involved query rewriter.

# Testing performed

* Added new unit tests.
* Validated that fields encoded with CLP could be decoded correctly using the column-group name.
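The rewrite this PR describes, expanding `clpDecode(message)` into `clpDecode(message_logtype, message_dictionaryVars, message_encodedVars)`, can be sketched on a toy expression tree. This is a minimal illustration under stated assumptions, not the `CLPDecodeRewriter` code from this PR: the `Expr`, `Identifier`, and `Call` types are hypothetical stand-ins for Pinot's real query representation, and only the single-argument form of `clpDecode` is handled.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the clpDecode column-group rewrite.
// Expr, Identifier, and Call are HYPOTHETICAL stand-ins, not Pinot's query AST.
public class ClpDecodeRewriteSketch {

  /** Hypothetical expression node: an identifier or a function call. */
  sealed interface Expr permits Identifier, Call {}

  record Identifier(String name) implements Expr {
    @Override public String toString() { return name; }
  }

  record Call(String name, List<Expr> args) implements Expr {
    @Override public String toString() {
      List<String> parts = new ArrayList<>();
      for (Expr arg : args) parts.add(arg.toString());
      return name + "(" + String.join(", ", parts) + ")";
    }
  }

  // Suffixes of the three columns that share the column-group name.
  static final String[] SUFFIXES = {"_logtype", "_dictionaryVars", "_encodedVars"};

  /** Recursively replaces clpDecode(group) with clpDecode(group_logtype, ...). */
  static Expr rewrite(Expr expr) {
    if (!(expr instanceof Call call)) {
      return expr;  // bare identifiers are left untouched
    }
    // Rewrite arguments first so nested calls such as upper(clpDecode(message)) work.
    List<Expr> newArgs = new ArrayList<>();
    for (Expr arg : call.args()) newArgs.add(rewrite(arg));

    // Only the single-identifier form is expanded; other calls keep their arguments.
    if (call.name().equalsIgnoreCase("clpDecode")
        && newArgs.size() == 1
        && newArgs.get(0) instanceof Identifier group) {
      List<Expr> expanded = new ArrayList<>();
      for (String suffix : SUFFIXES) expanded.add(new Identifier(group.name() + suffix));
      return new Call(call.name(), expanded);
    }
    return new Call(call.name(), newArgs);
  }

  public static void main(String[] args) {
    Expr query = new Call("clpDecode", List.of(new Identifier("message")));
    // prints clpDecode(message_logtype, message_dictionaryVars, message_encodedVars)
    System.out.println(rewrite(query));
  }
}
```

Because the sketch only matches a bare identifier argument, a call that already lists the three columns is passed through unchanged. Running the rewrite before any alias handling mirrors the ordering rationale above: the expansion sees the original column-group name rather than an alias.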