mayankshriv commented on a change in pull request #5873:
URL: https://github.com/apache/incubator-pinot/pull/5873#discussion_r471181999



##########
File path: pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mappers/SegmentCreationMapper.java
##########
@@ -353,8 +368,71 @@ protected void addAdditionalSegmentGeneratorConfigs(SegmentGeneratorConfig segme
       int sequenceId) {
   }
 
+  public void validateSchema(SegmentGeneratorConfig segmentGeneratorConfig, RecordReader recordReader) {
+    if (recordReader instanceof AvroRecordReader) {

Review comment:
       It seems we will have to either write pairwise validators (pinot-avro, pinot-orc, pinot-json, etc.), or write pairwise schema converters (avro->pinot, orc->pinot, json->pinot) so that the schema validator only compares two Pinot schemas (the one provided as input and the one derived from the input format). At this point I see pros and cons in both, but I'm leaning towards the former, as it provides dedicated validation for each format.
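A rough sketch of the converter alternative, assuming a toy model in which both schemas are reduced to column-to-type maps. `SimpleSchema`, `SchemaConverter`, `ToyAvroConverter` and `PinotSchemaComparator` are hypothetical names, not existing Pinot classes, and the Avro type mapping covers only three primitives:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Pinot schema: column name -> Pinot data type name.
// The real Pinot Schema class is much richer; this is only for illustration.
final class SimpleSchema {
  final Map<String, String> columns = new LinkedHashMap<>();

  SimpleSchema with(String column, String dataType) {
    columns.put(column, dataType);
    return this;
  }
}

// Hypothetical converter contract: one implementation per input format (avro->pinot, orc->pinot, ...).
interface SchemaConverter<S> {
  SimpleSchema toPinotSchema(S formatSchema);
}

// Hypothetical toy converter: the "Avro schema" is just field name -> Avro primitive type name.
final class ToyAvroConverter implements SchemaConverter<Map<String, String>> {
  @Override
  public SimpleSchema toPinotSchema(Map<String, String> avroFields) {
    SimpleSchema schema = new SimpleSchema();
    for (Map.Entry<String, String> field : avroFields.entrySet()) {
      schema.with(field.getKey(), mapType(field.getValue()));
    }
    return schema;
  }

  private static String mapType(String avroType) {
    switch (avroType) {
      case "int": return "INT";
      case "long": return "LONG";
      case "string": return "STRING";
      default: return "UNKNOWN"; // anything outside the toy mapping
    }
  }
}

// Format-agnostic validator: it only ever compares two Pinot-side schemas.
final class PinotSchemaComparator {
  static List<String> mismatches(SimpleSchema expected, SimpleSchema derived) {
    List<String> problems = new ArrayList<>();
    for (Map.Entry<String, String> column : expected.columns.entrySet()) {
      String actual = derived.columns.get(column.getKey());
      if (actual == null) {
        problems.add(column.getKey() + ": missing in input");
      } else if (!actual.equals(column.getValue())) {
        problems.add(column.getKey() + ": expected " + column.getValue() + " but input has " + actual);
      }
    }
    return problems;
  }
}
```

The comparison logic is written once here; the trade-off is that any format-specific nuance must survive the conversion step, which is one argument for dedicated pairwise validation instead.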
   
   However, in either approach, I'd recommend creating interfaces/impls: for example, a validator interface (with pairwise impls), or a converter interface (with pairwise converters, so that the validator works purely against the interface).
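A minimal sketch of the validator-interface variant, assuming schemas are modeled as column-to-type maps. `IngestionSchemaCheck`, `AvroPinotSchemaCheck` and `SchemaChecks` are hypothetical names for illustration only, not Pinot APIs:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract: one implementation per (input format, Pinot) pair.
// Schemas are modeled as column -> type maps purely for illustration.
interface IngestionSchemaCheck {
  String inputFormat();

  List<String> validate(Map<String, String> pinotSchema, Map<String, String> inputSchema);
}

// Hypothetical Avro/Pinot pair: knows which Avro primitive matches each (toy) Pinot type.
final class AvroPinotSchemaCheck implements IngestionSchemaCheck {
  private static final Map<String, String> PINOT_TO_AVRO =
      Map.of("INT", "int", "LONG", "long", "STRING", "string");

  @Override
  public String inputFormat() {
    return "avro";
  }

  @Override
  public List<String> validate(Map<String, String> pinotSchema, Map<String, String> avroSchema) {
    List<String> problems = new ArrayList<>();
    for (Map.Entry<String, String> column : pinotSchema.entrySet()) {
      String avroType = avroSchema.get(column.getKey());
      String wanted = PINOT_TO_AVRO.get(column.getValue());
      if (avroType == null) {
        problems.add(column.getKey() + ": not present in Avro schema");
      } else if (!avroType.equals(wanted)) {
        problems.add(column.getKey() + ": Pinot " + column.getValue() + " vs Avro " + avroType);
      }
    }
    return problems;
  }
}

// Hypothetical registry: callers depend only on the interface and pick an impl by format,
// instead of branching on the concrete RecordReader type.
final class SchemaChecks {
  private static final Map<String, IngestionSchemaCheck> BY_FORMAT = new LinkedHashMap<>();

  static {
    register(new AvroPinotSchemaCheck());
  }

  static void register(IngestionSchemaCheck check) {
    BY_FORMAT.put(check.inputFormat(), check);
  }

  static IngestionSchemaCheck forFormat(String format) {
    IngestionSchemaCheck check = BY_FORMAT.get(format);
    if (check == null) {
      throw new IllegalArgumentException("No schema check registered for format: " + format);
    }
    return check;
  }
}
```

With this shape, the mapper would look up a check by input format rather than testing `recordReader instanceof AvroRecordReader`, and supporting ORC or JSON would mean registering one more implementation.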
   
   

##########
File path: pinot-plugins/pinot-batch-ingestion/v0_deprecated/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mappers/SegmentCreationMapper.java
##########
@@ -243,14 +257,15 @@ protected void map(LongWritable key, Text value, Context context)
     addAdditionalSegmentGeneratorConfigs(segmentGeneratorConfig, hdfsInputFile, sequenceId);
 
     _logger.info("Start creating segment with sequence id: {}", sequenceId);
-    SegmentIndexCreationDriver driver = new SegmentIndexCreationDriverImpl();
+    SegmentIndexCreationDriverImpl driver = new SegmentIndexCreationDriverImpl();

Review comment:
       It seems we are breaking the interface here; what's the reasoning for that? Either the API should be justified as part of the interface, or the design is broken somehow.
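To illustrate the concern, a minimal sketch assuming the new code needs some extra capability from the driver. `SegmentDriver`, `DefaultSegmentDriver`, `validationReport()` and `MapperExample` are hypothetical names, not Pinot's real interface or methods:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a driver interface such as SegmentIndexCreationDriver;
// the method names are illustrative only.
interface SegmentDriver {
  void build();

  // If callers genuinely need this, it belongs on the interface. A default method keeps
  // any other existing implementations compiling while the capability is adopted.
  default List<String> validationReport() {
    return Collections.emptyList();
  }
}

// Hypothetical concrete driver that actually produces a report.
final class DefaultSegmentDriver implements SegmentDriver {
  private final List<String> report = new ArrayList<>();

  @Override
  public void build() {
    report.add("schema check passed");
  }

  @Override
  public List<String> validationReport() {
    return Collections.unmodifiableList(report);
  }
}

// Hypothetical caller: it keeps the interface type, so no dependency on the concrete impl leaks in.
final class MapperExample {
  static List<String> createSegment(SegmentDriver driver) {
    driver.build();
    return driver.validationReport();
  }
}
```

The caller never needs to declare `DefaultSegmentDriver`; if it had to, that would be the signal that either the interface is missing an API or the design needs another look.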




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


