gortiz opened a new pull request, #11907:
URL: https://github.com/apache/pinot/pull/11907

   As the title says, this PR modifies `TimeValidationTransformer` to mark 
rows as invalid when the primary time column is out of range.
   
   `TimeValidationTransformer` has always verified that the received row 
contains a value for the primary time column that is between 1971 (inclusive) 
and 2071 (exclusive). When the value is outside this range, 
`TimeValidationTransformer`:
   1. sets the field to null, which `NullValueTransformer` later replaces 
with the millis since epoch at ingestion time
   2. if `tableConfig.ingestionConfig.continueOnError` is false, aborts the 
execution
   3. otherwise logs a message at debug level
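
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not Pinot's actual code: the class and method names are hypothetical, the bounds are assumed to be UTC year boundaries, and the sketch assumes that execution aborts only when `continueOnError` is false, as the flag's name suggests.

```java
import java.time.Instant;

// Hypothetical stand-in for the range check; not Pinot's actual class.
final class TimeRangeSketch {
    // Bounds from the description: 1971 (inclusive) to 2071 (exclusive).
    // Treating them as UTC year boundaries is an assumption of this sketch.
    static final long MIN_MILLIS = Instant.parse("1971-01-01T00:00:00Z").toEpochMilli();
    static final long MAX_MILLIS = Instant.parse("2071-01-01T00:00:00Z").toEpochMilli();

    static boolean inValidRange(long timeMillis) {
        return timeMillis >= MIN_MILLIS && timeMillis < MAX_MILLIS;
    }

    // Returns the value to keep, or null when the value is out of range and
    // errors are tolerated; throws when errors are not tolerated.
    static Long validate(long timeMillis, boolean continueOnError) {
        if (inValidRange(timeMillis)) {
            return timeMillis;
        }
        if (!continueOnError) {
            throw new IllegalStateException("Time value out of range: " + timeMillis);
        }
        return null; // a later step would substitute the ingestion time
    }

    public static void main(String[] args) {
        System.out.println(inValidRange(MIN_MILLIS)); // lower bound is inclusive
        System.out.println(inValidRange(MAX_MILLIS)); // upper bound is exclusive
        System.out.println(validate(0L, true));       // tolerated: value dropped
    }
}
```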
   
   The log in point 3 is not useful. When debug logging is disabled, nothing 
is logged. When it is enabled, a message is printed for each invalid row. When 
the error affects most rows (for example, when the rows contain seconds since 
epoch but the schema declares millis since epoch), this log is very spammy.
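
To illustrate why a seconds/millis mismatch puts nearly every row out of range, here is a small sketch using only `java.time`. The class name and the sample value are chosen for illustration and do not come from the PR.

```java
import java.time.Instant;

// Illustration only: the same raw number read as seconds vs. as millis.
final class SecondsAsMillisSketch {
    static Instant interpretAsSeconds(long raw) {
        return Instant.ofEpochSecond(raw);
    }

    static Instant interpretAsMillis(long raw) {
        return Instant.ofEpochMilli(raw);
    }

    public static void main(String[] args) {
        long raw = 1_700_000_000L; // a plausible "seconds since epoch" value
        System.out.println(interpretAsSeconds(raw)); // 2023-11-14T22:13:20Z
        System.out.println(interpretAsMillis(raw));  // 1970-01-20T16:13:20Z
    }
}
```

Read as millis, the value lands in January 1970, before the 1971 lower bound, so every such row would fail validation and produce its own debug message.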
   
   We already support a way to mark a row as incorrect, which most 
transformers use but `TimeValidationTransformer` does not. This PR modifies 
`TimeValidationTransformer` to do so.
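
As a rough sketch of the idea, the transformer can flag the row itself instead of logging per row, so that a later stage can count or report flagged rows in aggregate. This message does not show Pinot's row type or marker, so the sketch uses a plain `Map` as the row and a hypothetical marker key; the UTC bounds are also an assumption.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the row, and the marker key is invented.
final class MarkInvalidRowSketch {
    static final String INCOMPLETE_MARKER = "__incomplete__"; // hypothetical key
    static final long MIN_MILLIS = Instant.parse("1971-01-01T00:00:00Z").toEpochMilli();
    static final long MAX_MILLIS = Instant.parse("2071-01-01T00:00:00Z").toEpochMilli();

    // Nulls out an out-of-range time value and flags the row; no per-row logging.
    static Map<String, Object> transform(Map<String, Object> row, String timeColumn) {
        Object value = row.get(timeColumn);
        if (value instanceof Long) {
            long timeMillis = (Long) value;
            if (timeMillis < MIN_MILLIS || timeMillis >= MAX_MILLIS) {
                row.put(timeColumn, null);
                row.put(INCOMPLETE_MARKER, Boolean.TRUE);
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("ts", 1_700_000_000L); // seconds stored in a millis column
        transform(row, "ts");
        System.out.println(row.containsKey(INCOMPLETE_MARKER)); // row is flagged
    }
}
```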
   
   cc @Jackie-Jiang @swaminathanmanish @snleee 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

