timsants commented on a change in pull request #6046: URL: https://github.com/apache/incubator-pinot/pull/6046#discussion_r497970567
##########
File path: pinot-plugins/pinot-input-format/pinot-orc/src/main/java/org/apache/pinot/plugin/inputformat/orc/ORCRecordReader.java
##########
@@ -72,7 +73,7 @@
   private int _nextRowId;

   @Override
-  public void init(File dataFile, Set<String> fieldsToRead, @Nullable RecordReaderConfig recordReaderConfig)
+  public void init(File dataFile, @Nullable Set<String> fieldsToRead, @Nullable RecordReaderConfig recordReaderConfig)

Review comment:
   I had the same initial thought and asked Neha the same thing. It's because ORC's columnar format doesn't quite fit the `RecordExtractor` interface. The method `GenericRow extract(T from, GenericRow to)` expects a single record/row to be extracted, but the ORC record reader is unique in that it reads rows in batches. In addition, ColumnVectors have an optimization for repeating values: when every row in the batch has the same value, only the first row in the row batch contains that value.
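
   To illustrate the repeating-value point, here is a minimal sketch. It is not the `ORCRecordReader` code and does not use the ORC library: `RepeatingColumnSketch`, `LongColumn`, `valueAt` and `readBatch` are hypothetical names, modelled loosely on the `isRepeating` flag of ORC's column vectors. It assumes a single long column with no nulls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for an ORC row batch with one long column. */
final class RepeatingColumnSketch {

  /** Mimics the shape of a column vector: a value array plus an isRepeating flag. */
  static final class LongColumn {
    final long[] vector;
    final boolean isRepeating;

    LongColumn(long[] vector, boolean isRepeating) {
      this.vector = vector;
      this.isRepeating = isRepeating;
    }
  }

  /** When the column is repeating, only index 0 holds a meaningful value. */
  static long valueAt(LongColumn column, int rowId) {
    return column.vector[column.isRepeating ? 0 : rowId];
  }

  /** Walks a batch of the given size and returns one value per row. */
  static List<Long> readBatch(LongColumn column, int batchSize) {
    List<Long> rows = new ArrayList<>(batchSize);
    for (int rowId = 0; rowId < batchSize; rowId++) {
      rows.add(valueAt(column, rowId));
    }
    return rows;
  }

  public static void main(String[] args) {
    // Slots 1 and 2 of the repeating column are filler and must not be read.
    LongColumn plain = new LongColumn(new long[] {1L, 2L, 3L}, false);
    LongColumn repeating = new LongColumn(new long[] {7L, 0L, 0L}, true);
    System.out.println(readBatch(plain, 3));
    System.out.println(readBatch(repeating, 3));
  }
}
```

   The row index has to be resolved against the whole batch rather than against a single self-contained record, which is one way to see why a per-row `extract(T from, GenericRow to)` call is an awkward fit here.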