slessard commented on code in PR #10953:
URL: https://github.com/apache/iceberg/pull/10953#discussion_r1720293723
##########
arrow/src/test/java/org/apache/iceberg/arrow/vectorized/ArrowReaderTest.java:
##########
@@ -262,6 +263,89 @@ public void testReadColumnFilter2() throws Exception {
         scan, NUM_ROWS_PER_MONTH, 12 * NUM_ROWS_PER_MONTH, ImmutableList.of("timestamp"));
   }

+  @Test
+  public void testReadColumnThatDoesNotExistInParquetSchema() throws Exception {
+    setMaxStackTraceElementsDisplayed(15);
+    rowsWritten = Lists.newArrayList();
+    tables = new HadoopTables();
+
+    Schema schema =
+        new Schema(
+            Types.NestedField.required(1, "a", Types.IntegerType.get()),
+            Types.NestedField.optional(2, "b", Types.IntegerType.get()));
+
+    PartitionSpec spec = PartitionSpec.builderFor(schema).build();
+    Table table1 = tables.create(schema, spec, tableLocation);
+
+    // Add one record to the table
+    GenericRecord rec = GenericRecord.create(schema);
+    rec.setField("a", 1);
+    List<GenericRecord> genericRecords = Lists.newArrayList();
+    genericRecords.add(rec);
+
+    AppendFiles appendFiles = table1.newAppend();
+    appendFiles.appendFile(writeParquetFile(table1, genericRecords));
+    appendFiles.commit();
+
+    // Alter the table schema by adding a new, optional column.
+    // Do not add any data for this new column in the one existing row in the table
+    // and do not insert any new rows into the table.
+    Table table = tables.load(tableLocation);
+    table.updateSchema().addColumn("z", Types.IntegerType.get()).commit();
+
+    // Select all columns, all rows from the table
+    TableScan scan = table.newScan().select("*");
+
+    List<String> columns = ImmutableList.of("a", "b", "z");
+    // Read the data and verify that the returned ColumnarBatches match expected rows.
+    int rowIndex = 0;
+    try (VectorizedTableScanIterable itr = new VectorizedTableScanIterable(scan, 1, false)) {
+      for (ColumnarBatch batch : itr) {
+        List<GenericRecord> expectedRows = rowsWritten.subList(rowIndex, rowIndex + 1);
+
+        Map<String, Integer> columnNameToIndex = Maps.newHashMap();
+        for (int i = 0; i < columns.size(); i++) {
+          columnNameToIndex.put(columns.get(i), i);
+        }
+        Set<String> columnSet = columnNameToIndex.keySet();
+
+        assertThat(batch.numRows()).isEqualTo(1);
+        assertThat(batch.numCols()).isEqualTo(columns.size());
+
+        checkColumnarArrayValues(
+            1,
+            expectedRows,
+            batch,
+            0,
+            columnSet,
+            "a",
+            (records, i) -> records.get(i).getField("a"),
+            ColumnVector::getInt);
+        checkColumnarArrayValues(
+            1,
+            expectedRows,
+            batch,
+            1,
+            columnSet,
+            "b",
+            (records, i) -> records.get(i).getField("b"),
+            (array, i) -> array.isNullAt(i) ? null : array.getInt(i));

Review Comment:
This column does exist in the Parquet schema, and its value is null. But I do not know the correct way to read a null value from an optional int column.
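For illustration only, the null-check-then-read pattern in the lambda above could be factored into a reusable helper. This is a minimal sketch, assuming only the `isNullAt(int)` and `getInt(int)` methods of `ColumnVector` that this diff already uses; the class and method names (`NullSafeReads`, `readNullableInt`) are hypothetical:

```java
import org.apache.iceberg.arrow.vectorized.ColumnVector;

public class NullSafeReads {

  // Null-safe read of an optional int column: consult the vector's
  // validity information first, and only dereference the value when
  // the slot is non-null. (The names here are hypothetical; isNullAt
  // and getInt are the same calls used for column "b" in the diff.)
  static Integer readNullableInt(ColumnVector vector, int rowId) {
    return vector.isNullAt(rowId) ? null : vector.getInt(rowId);
  }
}
```

In the test, this would replace the inline lambda, e.g. `(array, i) -> NullSafeReads.readNullableInt(array, i)`.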