gortiz opened a new pull request, #13877:
URL: https://github.com/apache/pinot/pull/13877

   `RealtimeSegmentConverter` cannot reuse the schema of the realtime segment 
as is, because it contains some virtual columns. For example, `$segmentName` and 
`$docId` will be different in the sealed segment. Therefore 
`RealtimeSegmentConverter` copies the schema, removing the virtual columns in 
the process.
   
   That copy was done manually in `RealtimeSegmentConverter` and was only a 
partial copy. For example, the generated schema didn't keep the schema name. By 
chance, the partial copy had not affected the sealing process. But once 
https://github.com/apache/pinot/pull/11960 was merged, the partial copy in 
`RealtimeSegmentConverter` had an important side effect: column based null 
handling was lost.
   
   That means that the realtime segment contains null vectors for its columns, 
but once the segment is sealed these vectors are ignored.
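
The failure mode described above can be sketched in isolation. This is a minimal, self-contained illustration and not the code touched by this PR: `SketchSchema`, `SketchField` and `SchemaCopySketch` are hypothetical stand-ins and do not mirror Pinot's actual `Schema` API. It only shows how a hand-written copy that carries over the fields alone silently resets every other property to its default.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutable schema; NOT Pinot's Schema class.
class SketchSchema {
  String name;
  boolean columnBasedNullHandling;
  final List<SketchField> fields = new ArrayList<>();
}

// Hypothetical stand-in for a field spec, flagged as virtual or physical.
class SketchField {
  final String name;
  final boolean virtual;

  SketchField(String name, boolean virtual) {
    this.name = name;
    this.virtual = virtual;
  }
}

class SchemaCopySketch {
  // Partial copy: only the physical fields are carried over, so the schema
  // name and the null handling flag fall back to their defaults.
  static SketchSchema partialCopyWithoutVirtualColumns(SketchSchema source) {
    SketchSchema copy = new SketchSchema();
    for (SketchField field : source.fields) {
      if (!field.virtual) {
        copy.fields.add(field);
      }
    }
    return copy;
  }

  // Full copy: same field filtering, plus every schema level property.
  static SketchSchema fullCopyWithoutVirtualColumns(SketchSchema source) {
    SketchSchema copy = partialCopyWithoutVirtualColumns(source);
    copy.name = source.name;
    copy.columnBasedNullHandling = source.columnBasedNullHandling;
    return copy;
  }

  // Builds an example schema with one physical and two virtual columns.
  static SketchSchema sampleRealtimeSchema() {
    SketchSchema schema = new SketchSchema();
    schema.name = "events";
    schema.columnBasedNullHandling = true;
    schema.fields.add(new SketchField("userId", false));
    schema.fields.add(new SketchField("$segmentName", true));
    schema.fields.add(new SketchField("$docId", true));
    return schema;
  }
}
```

Both copies drop the virtual columns, but only the full copy still reports column based null handling as enabled. Any property added to the schema later has the same problem with the partial variant, which is why a mutable schema makes such copies hard to audit.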
   
   This PR fixes that issue and adds some regression tests, but given that 
`Schema` is mutable, it is very difficult to verify that there are no more 
incorrect copies in the code. A refactor of the `Schema` class to make it safer 
is needed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

