gortiz opened a new pull request, #13877: URL: https://github.com/apache/pinot/pull/13877
`RealtimeSegmentConverter` cannot reuse the schema of the realtime segment because it contains some virtual columns. For example, `$segmentName` and `$docId` will be different in the sealed segment. Therefore `RealtimeSegmentConverter` copies the schema, removing the virtual columns in the process. That copy was done manually in `RealtimeSegmentConverter` and was only a partial copy. For example, the generated schema doesn't keep the schema name.

By chance, the fact that this copy was partial didn't affect the sealing process. But when https://github.com/apache/pinot/pull/11960 was introduced, the partial copy in `RealtimeSegmentConverter` gained an important side effect: column-based null handling was lost. That means that the realtime segment contains null value vectors for its columns, but once the segment is sealed these vectors are ignored.

This PR fixes that issue and adds some regression tests. However, given that `Schema` is mutable, it is very difficult to verify that there are no other incorrect copies in the code. A refactor of the `Schema` class is needed to make it safer against this kind of mistake.
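To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use Pinot's classes: `ToySchema`, `ToyField`, `partialCopy`, `copyWithoutVirtualColumns` and `sampleSchema` are hypothetical stand-ins invented for this example, and the field names only loosely mirror the real `Schema`. It shows how a hand-written copy that only transfers the non-virtual columns silently drops schema-level settings such as the name and the column-based null handling flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; these are NOT Pinot's Schema/FieldSpec classes.
class SchemaCopySketch {

  /** Toy stand-in for a column definition. */
  static final class ToyField {
    final String name;
    final boolean virtual;

    ToyField(String name, boolean virtual) {
      this.name = name;
      this.virtual = virtual;
    }
  }

  /** Toy stand-in for a mutable schema with schema-level settings. */
  static final class ToySchema {
    String schemaName;
    boolean columnBasedNullHandling;
    final List<ToyField> fields = new ArrayList<>();
  }

  /** Manual copy that only transfers non-virtual columns (the buggy pattern). */
  static ToySchema partialCopy(ToySchema source) {
    ToySchema copy = new ToySchema();
    for (ToyField field : source.fields) {
      if (!field.virtual) {
        copy.fields.add(field);
      }
    }
    return copy; // schemaName and columnBasedNullHandling are left at defaults
  }

  /** Copy that drops virtual columns but keeps the schema-level settings. */
  static ToySchema copyWithoutVirtualColumns(ToySchema source) {
    ToySchema copy = partialCopy(source);
    copy.schemaName = source.schemaName;
    copy.columnBasedNullHandling = source.columnBasedNullHandling;
    return copy;
  }

  /** Builds a small schema with one regular and two virtual columns. */
  static ToySchema sampleSchema() {
    ToySchema schema = new ToySchema();
    schema.schemaName = "events";
    schema.columnBasedNullHandling = true;
    schema.fields.add(new ToyField("userId", false));
    schema.fields.add(new ToyField("$segmentName", true));
    schema.fields.add(new ToyField("$docId", true));
    return schema;
  }

  public static void main(String[] args) {
    ToySchema source = sampleSchema();
    ToySchema partial = partialCopy(source);
    ToySchema full = copyWithoutVirtualColumns(source);
    System.out.println(partial.schemaName + " / " + partial.columnBasedNullHandling); // null / false
    System.out.println(full.schemaName + " / " + full.columnBasedNullHandling);       // events / true
  }
}
```

Even the second copy method has to be updated by hand whenever a new schema-level setting is added, which is the kind of fragility the proposed `Schema` refactor would address.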