pvary commented on code in PR #10449: URL: https://github.com/apache/iceberg/pull/10449#discussion_r1635167684
##########
mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java:
##########
@@ -125,11 +125,9 @@ public List<InputSplit> getSplits(JobContext context) {
     }
     String schemaStr = conf.get(InputFormatConfig.READ_SCHEMA);
     if (schemaStr != null) {
-      scan.project(SchemaParser.fromJson(schemaStr));
-    }
-    String[] selectedColumns = conf.getStrings(InputFormatConfig.SELECTED_COLUMNS);
-    if (selectedColumns != null) {
-      scan.select(selectedColumns);
+      scan = scan.project(SchemaParser.fromJson(schemaStr));
+    } else if (conf.getStrings(InputFormatConfig.SELECTED_COLUMNS) != null) {

Review Comment:
   This is a questionable decision in my mind. We allow the user to set both columns and a schema, but behind the scenes we decide to use only one of them?
   If this were new code, I would ask for validation: we should throw an exception if both of them are set.
   OTOH, it is a bit questionable what we should do now, as some users might expect to get away with setting both, since both were ignored before.
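A minimal sketch of the validation described above, to make the "throw if both are set" option concrete. It is not the PR's code: the class `ProjectionConfigCheck`, the `Projection` enum, and the two key strings are hypothetical, and a plain `Map` stands in for the Hadoop `Configuration` so the example runs on its own.

```java
import java.util.Map;

// Hypothetical stand-in for the job configuration: a plain Map replaces
// Hadoop's Configuration so the sketch runs without Iceberg or Hadoop.
final class ProjectionConfigCheck {
  // Placeholder keys for illustration only; the code under review reads
  // InputFormatConfig.READ_SCHEMA and InputFormatConfig.SELECTED_COLUMNS.
  static final String READ_SCHEMA = "example.read.schema";
  static final String SELECTED_COLUMNS = "example.selected.columns";

  // Which projection source the scan should use.
  enum Projection { SCHEMA, COLUMNS, NONE }

  // Rejects the ambiguous case instead of silently preferring one setting.
  static Projection resolve(Map<String, String> conf) {
    boolean hasSchema = conf.get(READ_SCHEMA) != null;
    boolean hasColumns = conf.get(SELECTED_COLUMNS) != null;
    if (hasSchema && hasColumns) {
      throw new IllegalArgumentException(
          "Set either " + READ_SCHEMA + " or " + SELECTED_COLUMNS + ", not both");
    }
    if (hasSchema) {
      return Projection.SCHEMA;
    }
    if (hasColumns) {
      return Projection.COLUMNS;
    }
    return Projection.NONE;
  }

  public static void main(String[] args) {
    // Only one setting present: resolves without complaint.
    System.out.println(resolve(Map.of(READ_SCHEMA, "{}")));
    // Both settings present: rejected up front.
    try {
      resolve(Map.of(READ_SCHEMA, "{}", SELECTED_COLUMNS, "id,name"));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

For existing users who already set both, a softer variant could log a warning in place of the `throw` and keep the schema-first precedence from the diff; which of the two is appropriate is exactly the open question in the comment.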