rajagopr opened a new pull request, #15359:
URL: https://github.com/apache/pinot/pull/15359

   Make https://github.com/apache/pinot/pull/15267 prod ready.
   
   Added support for applying enrichments to incoming records before the record is passed to the complex type transformer. With this change, a record undergoes transformations in the following order within the transform pipeline:
   - Apply record pre-enricher transformations
   - Apply complex type transformations
   - Apply plain record transformations, including enrichments
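
   The ordering above can be sketched as follows. This is a minimal illustration under stated assumptions: records are modeled as `Map<String, Object>`, and `TransformPipelineSketch` and its stage fields are hypothetical names, not Pinot's actual transformer classes or interfaces. It only shows why the output of a pre-enricher is visible to the complex type stage, and why the plain stages run once per un-nested row.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.function.Function;
   import java.util.function.UnaryOperator;

   // Hypothetical sketch of the stage ordering only; these names are not Pinot classes.
   public class TransformPipelineSketch {
     // Stage 1: enrichers that run before the complex type transformer.
     private final List<UnaryOperator<Map<String, Object>>> preEnrichers;
     // Stage 2: complex type transformer; un-nesting may turn one record into several.
     private final Function<Map<String, Object>, List<Map<String, Object>>> complexTypeTransformer;
     // Stage 3: plain record transformers, including enrichments.
     private final List<UnaryOperator<Map<String, Object>>> plainTransformers;

     public TransformPipelineSketch(
         List<UnaryOperator<Map<String, Object>>> preEnrichers,
         Function<Map<String, Object>, List<Map<String, Object>>> complexTypeTransformer,
         List<UnaryOperator<Map<String, Object>>> plainTransformers) {
       this.preEnrichers = preEnrichers;
       this.complexTypeTransformer = complexTypeTransformer;
       this.plainTransformers = plainTransformers;
     }

     public List<Map<String, Object>> transform(Map<String, Object> record) {
       // Pre-enrichers see the raw record, before any un-nesting happens.
       Map<String, Object> current = record;
       for (UnaryOperator<Map<String, Object>> enricher : preEnrichers) {
         current = enricher.apply(current);
       }
       // Every record produced by the complex type stage goes through the plain stages.
       List<Map<String, Object>> results = new ArrayList<>();
       for (Map<String, Object> row : complexTypeTransformer.apply(current)) {
         Map<String, Object> out = row;
         for (UnaryOperator<Map<String, Object>> transformer : plainTransformers) {
           out = transformer.apply(out);
         }
         results.add(out);
       }
       return results;
     }

     public static void main(String[] args) {
       List<String> order = new ArrayList<>();
       TransformPipelineSketch pipeline = new TransformPipelineSketch(
           List.of(r -> { order.add("pre-enrich"); return r; }),
           r -> { order.add("complex-type"); return List.of(r); },
           List.of(r -> { order.add("plain"); return r; }));
       pipeline.transform(new HashMap<>());
       System.out.println(order); // [pre-enrich, complex-type, plain]
     }
   }
   ```

   Keeping the pre-enrichers as a separate stage, rather than reordering the whole transformer chain, means existing enrichments keep running after the complex type stage as before.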
   
   ## Problem
   The ComplexType transformer helps un-nest and flatten records when desired. However, every field to un-nest must be named explicitly in the complex type config and must exist under that name in the incoming data. Often, the incoming record does not have a fixed field name and instead carries dynamic (schema-less) data, so there is no stable name to put in the config.
   
   ## Solution
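   The solution is to enrich the record before passing it to the complex type transformer. The enrichment can derive a field with a fixed name from the dynamic data (for example, `c_score_array` below), and the complex type config can then reference that field.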
   
   ## Example
   
   Consider the following JSON input, where the keys under `cscores` are dynamic.
   ```json
   {
     "cscores": {
       "ctype1": {
         "score": 7.93,
         "score_type": "modelName"
       },
       "ctype2": {
         "score": 56.0,
         "score_type": "modelName"
       },
       "ctype3": {
         "score": 8.5,
         "score_type": "modelName"
       }
     }
   }
   ```
   
   The ingestion configuration, containing both the enrichment config and the complex type config.
   
   ```json
   {
     "enrichmentConfigs": [
       {
         "enricherType": "generateColumn",
         "properties": {
           "fieldToFunctionMap": {
             "c_score_array": "Groovy({ def outputList = cscores.collect { dimension, data -> return [dimension: dimension, score: data.score, score_type: data.score_type ]}}, cscores)"
           }
         }
       }
     ],
     "complexTypeConfig": {
       "fieldsToUnnest": [
         "c_score_array"
       ],
       "delimiter": ".",
       "collectionNotUnnestedToJson": "NON_PRIMITIVE"
     }
   }
   ```
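
   To make the Groovy expression easier to follow, here is a rough Java equivalent of what it computes for the input above. `CScoresEnrichmentSketch`, `toScoreArray`, `enrich`, and `score` are hypothetical names for illustration only; in Pinot, the `generateColumn` enricher evaluates the Groovy expression itself. The key point is that each dynamic key under `cscores` becomes a `dimension` value inside an array stored under the fixed name `c_score_array`, which `fieldsToUnnest` can then reference.

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical Java equivalent of the Groovy enrichment; Pinot evaluates the Groovy itself.
   public class CScoresEnrichmentSketch {

     // Turns {ctypeN: {score, score_type}} into [{dimension: ctypeN, score, score_type}, ...].
     public static List<Map<String, Object>> toScoreArray(Map<String, Object> cscores) {
       List<Map<String, Object>> array = new ArrayList<>();
       for (Map.Entry<String, Object> entry : cscores.entrySet()) {
         Map<?, ?> data = (Map<?, ?>) entry.getValue();
         Map<String, Object> element = new LinkedHashMap<>();
         element.put("dimension", entry.getKey()); // the dynamic key becomes a value
         element.put("score", data.get("score"));
         element.put("score_type", data.get("score_type"));
         array.add(element);
       }
       return array;
     }

     // Adds the fixed-name field "c_score_array" that the complex type config can un-nest.
     public static Map<String, Object> enrich(Map<String, Object> record) {
       Object cscores = record.get("cscores");
       if (cscores instanceof Map) {
         @SuppressWarnings("unchecked")
         Map<String, Object> typed = (Map<String, Object>) cscores;
         record.put("c_score_array", toScoreArray(typed));
       }
       return record;
     }

     // Helper that builds one {score, score_type} entry of the sample input.
     public static Map<String, Object> score(double score, String scoreType) {
       Map<String, Object> data = new LinkedHashMap<>();
       data.put("score", score);
       data.put("score_type", scoreType);
       return data;
     }

     public static void main(String[] args) {
       Map<String, Object> cscores = new LinkedHashMap<>();
       cscores.put("ctype1", score(7.93, "modelName"));
       cscores.put("ctype2", score(56.0, "modelName"));
       cscores.put("ctype3", score(8.5, "modelName"));
       Map<String, Object> record = new HashMap<>();
       record.put("cscores", cscores);
       System.out.println(enrich(record).get("c_score_array"));
     }
   }
   ```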
   
   The table schema.
   
   ```json
   {
     "schemaName": "cscores001",
     "dimensionFieldSpecs": [
       {
         "name": "c_score_array.dimension",
         "dataType": "STRING",
         "fieldType": "DIMENSION"
       },
       {
         "name": "c_score_array.score",
         "dataType": "FLOAT",
         "fieldType": "DIMENSION"
       },
       {
         "name": "c_score_array.score_type",
         "dataType": "STRING",
         "fieldType": "DIMENSION"
       }
     ]
   }
   ```
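
   The schema columns follow from un-nesting `c_score_array` with the `.` delimiter: each array element becomes its own row, and each element key becomes a `c_score_array.<key>` column. The sketch below illustrates that mapping under simplifying assumptions. `UnnestSketch`, `unnest`, and `element` are hypothetical names, and Pinot's actual complex type transformer also handles nested collections and the `collectionNotUnnestedToJson` setting, which this sketch ignores.

   ```java
   import java.util.ArrayList;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   // Simplified, hypothetical sketch of un-nesting a single array field.
   public class UnnestSketch {

     // Emits one row per array element; map keys become "<field><delimiter><key>" columns.
     // Note: an empty array yields no rows in this sketch.
     public static List<Map<String, Object>> unnest(
         Map<String, Object> record, String field, String delimiter) {
       List<Map<String, Object>> rows = new ArrayList<>();
       Object value = record.get(field);
       if (!(value instanceof List)) {
         rows.add(record); // nothing to un-nest
         return rows;
       }
       for (Object element : (List<?>) value) {
         // Start from a copy of the parent record, minus the array field itself.
         Map<String, Object> row = new LinkedHashMap<>(record);
         row.remove(field);
         if (element instanceof Map) {
           for (Map.Entry<?, ?> entry : ((Map<?, ?>) element).entrySet()) {
             row.put(field + delimiter + entry.getKey(), entry.getValue());
           }
         } else {
           row.put(field, element); // in this sketch, primitive elements keep the field name
         }
         rows.add(row);
       }
       return rows;
     }

     // Builds one element shaped like the output of the enrichment above.
     public static Map<String, Object> element(String dimension, double score) {
       Map<String, Object> e = new LinkedHashMap<>();
       e.put("dimension", dimension);
       e.put("score", score);
       e.put("score_type", "modelName");
       return e;
     }

     public static void main(String[] args) {
       Map<String, Object> record = new LinkedHashMap<>();
       record.put("c_score_array", List.of(
           element("ctype1", 7.93), element("ctype2", 56.0), element("ctype3", 8.5)));
       // Prints 3 rows, the first being
       // {c_score_array.dimension=ctype1, c_score_array.score=7.93, c_score_array.score_type=modelName}
       for (Map<String, Object> row : unnest(record, "c_score_array", ".")) {
         System.out.println(row);
       }
     }
   }
   ```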
   
   Query results.
   <img width="1276" alt="image" src="https://github.com/user-attachments/assets/66efc2b1-475e-4029-a0bc-3a061e8fc44e" />
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-h...@pinot.apache.org