[GitHub] [incubator-pinot] timsants commented on a change in pull request #6046: Deep Extraction Support for ORC, Thrift, and ProtoBuf Records

GitBox Wed, 30 Sep 2020 21:02:56 -0700


timsants commented on a change in pull request #6046:
URL: https://github.com/apache/incubator-pinot/pull/6046#discussion_r497970567




##########
File path: pinot-plugins/pinot-input-format/pinot-orc/src/main/java/org/apache/pinot/plugin/inputformat/orc/ORCRecordReader.java
##########
@@ -72,7 +73,7 @@
   private int _nextRowId;
 
   @Override
-  public void init(File dataFile, Set<String> fieldsToRead, @Nullable RecordReaderConfig recordReaderConfig)
+  public void init(File dataFile, @Nullable Set<String> fieldsToRead, @Nullable RecordReaderConfig recordReaderConfig)

Review comment:
I had the same initial thought and asked Neha the same thing.

It's because ORC's columnar format doesn't quite fit the `RecordExtractor`
interface. The method `GenericRow extract(T from, GenericRow to)` expects a
single record/row to be extracted, but the ORC record reader is unique in that
it reads rows in batches. In addition, ColumnVectors have an optimization for
repeating values: when a value repeats across the row batch, only the first
row in the batch contains it, so the value for a given row cannot always be
read at that row's own index.
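
To make the repeating-value point concrete, here is a minimal sketch. It does
not use the real ORC classes: `LongVectorSketch` and `OrcBatchSketch` are
hypothetical stand-ins written for illustration, modeled loosely on a column
vector that carries a values array and an `isRepeating` flag. It only shows
why a per-row extractor needs the batch context to find a row's value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for an ORC long column vector (not the real class).
final class LongVectorSketch {
    final long[] vector;   // column values for one row batch
    boolean isRepeating;   // when true, only vector[0] is meaningful

    LongVectorSketch(int capacity) {
        vector = new long[capacity];
    }
}

final class OrcBatchSketch {
    // Index that actually holds the value for the given row of the batch.
    static int valueIndex(LongVectorSketch col, int rowId) {
        return col.isRepeating ? 0 : rowId;
    }

    // Reads all rows of one batch, resolving repeating values to index 0.
    static List<Long> readBatch(LongVectorSketch col, int batchSize) {
        List<Long> rows = new ArrayList<>(batchSize);
        for (int rowId = 0; rowId < batchSize; rowId++) {
            rows.add(col.vector[valueIndex(col, rowId)]);
        }
        return rows;
    }

    public static void main(String[] args) {
        LongVectorSketch col = new LongVectorSketch(4);
        col.vector[0] = 7L;      // only the first slot is populated
        col.isRepeating = true;  // every row in the batch shares this value
        System.out.println(readBatch(col, 4)); // prints [7, 7, 7, 7]
    }
}
```

A row-at-a-time `extract(from, to)` call has no natural place for the batch and
row index that `valueIndex` depends on, which is roughly why the reader handles
extraction itself in this sketch's terms.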




