amogh-jahagirdar commented on code in PR #11839:
URL: https://github.com/apache/iceberg/pull/11839#discussion_r1927328119


##########
flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##########
@@ -199,41 +204,53 @@ public void testTwoLevelList() throws IOException {
     }
   }
 
-  private void writeAndValidate(Iterable<Record> iterable, Schema schema) throws IOException {
+  private void writeAndValidate(
+      Iterable<Record> iterable, Schema writeSchema, Schema expectedSchema) throws IOException {
     File testFile = File.createTempFile("junit", null, temp.toFile());
     assertThat(testFile.delete()).isTrue();
 
     try (FileAppender<Record> writer =
         Parquet.write(Files.localOutput(testFile))
-            .schema(schema)
+            .schema(writeSchema)
             .createWriterFunc(GenericParquetWriter::buildWriter)
             .build()) {
       writer.addAll(iterable);
     }
 
     try (CloseableIterable<RowData> reader =
         Parquet.read(Files.localInput(testFile))
-            .project(schema)
-            .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, type))
+            .project((expectedSchema != null) ? expectedSchema : writeSchema)
+            .createReaderFunc(
+                type ->
+                    FlinkParquetReaders.buildReader(
+                        (expectedSchema != null) ? expectedSchema : writeSchema, type))

Review Comment:
   I feel like it's a bit easier to read if the caller of this 
`writeAndValidate` helper explicitly passes in the same `writeSchema` and 
`expectedSchema` (specifically for the case where there's no expected 
difference) instead of this helper having the null check and handling it 
internally. Not blocking on my side though
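
   The suggested shape can be sketched roughly as below. This is a hypothetical, simplified stand-in (the `Schema` record and the helper body are placeholders, not the Iceberg types): the helper takes a non-null `expectedSchema`, so a caller with no expected difference simply passes the same schema twice instead of `null`.

   ```java
   import java.util.List;

   // Hypothetical sketch of the suggested refactor (not the Iceberg API):
   // the helper requires both schemas, so it needs no internal null check.
   public class WriteAndValidateSketch {

     // Placeholder stand-in for org.apache.iceberg.Schema.
     record Schema(String name) {}

     // Suggested shape: expectedSchema is always non-null; the caller decides.
     static String writeAndValidate(
         Iterable<String> records, Schema writeSchema, Schema expectedSchema) {
       // ... write records with writeSchema, then read/project with expectedSchema ...
       return "wrote with " + writeSchema.name() + ", read with " + expectedSchema.name();
     }

     public static void main(String[] args) {
       Schema schema = new Schema("table_schema");
       // No expected difference: pass the same schema twice instead of null.
       System.out.println(writeAndValidate(List.of("r1", "r2"), schema, schema));
     }
   }
   ```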



##########
flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java:
##########
@@ -199,41 +204,53 @@ public void testTwoLevelList() throws IOException {
     }
   }
 
-  private void writeAndValidate(Iterable<Record> iterable, Schema schema) throws IOException {
+  private void writeAndValidate(
+      Iterable<Record> iterable, Schema writeSchema, Schema expectedSchema) throws IOException {
     File testFile = File.createTempFile("junit", null, temp.toFile());
     assertThat(testFile.delete()).isTrue();
 
     try (FileAppender<Record> writer =
         Parquet.write(Files.localOutput(testFile))
-            .schema(schema)
+            .schema(writeSchema)
             .createWriterFunc(GenericParquetWriter::buildWriter)
             .build()) {
       writer.addAll(iterable);
     }
 
     try (CloseableIterable<RowData> reader =
         Parquet.read(Files.localInput(testFile))
-            .project(schema)
-            .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, type))
+            .project((expectedSchema != null) ? expectedSchema : writeSchema)
+            .createReaderFunc(
+                type ->
+                    FlinkParquetReaders.buildReader(
+                        (expectedSchema != null) ? expectedSchema : writeSchema, type))
             .build()) {
       Iterator<Record> expected = iterable.iterator();
       Iterator<RowData> rows = reader.iterator();
-      LogicalType rowType = FlinkSchemaUtil.convert(schema);
+      LogicalType rowType = FlinkSchemaUtil.convert(writeSchema);
       for (int i = 0; i < NUM_RECORDS; i += 1) {
         assertThat(rows).hasNext();
-      TestHelpers.assertRowData(schema.asStruct(), rowType, expected.next(), rows.next());
+      TestHelpers.assertRowData(writeSchema.asStruct(), rowType, expected.next(), rows.next());
       }
       assertThat(rows).isExhausted();
     }
   }
 
   @Override
   protected void writeAndValidate(Schema schema) throws IOException {
-    writeAndValidate(RandomGenericData.generate(schema, NUM_RECORDS, 19981), schema);
+    writeAndValidate(RandomGenericData.generate(schema, NUM_RECORDS, 19981), schema, null);

Review Comment:
   In relation to my comment above, in this case I feel like we can just pass in `schema` twice. The helper then doesn't need to do the null check/fallback.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

