szehon-ho commented on code in PR #7120:
URL: https://github.com/apache/iceberg/pull/7120#discussion_r1146566963
##########
spark/v3.1/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java:
##########
@@ -566,12 +578,14 @@ private static class WriterFactory implements
DataWriterFactory, StreamingDataWr
protected WriterFactory(
Broadcast<Table> tableBroadcast,
FileFormat format,
+ PartitionSpec outputSpec,
Review Comment:
How about passing spec id and looking up PartitionSpec on the last part? It
will be less to serailize (I think WriterFactory is serialized and sent to
executors)
##########
spark/v3.1/spark/src/test/java/org/apache/iceberg/spark/sql/TestPartitionedWrites.java:
##########
@@ -157,4 +158,64 @@ public void testViewsReturnRecentResults() {
ImmutableList.of(row(1L, "a"), row(1L, "a")),
sql("SELECT * FROM tmp"));
}
+
+ @Test
+ public void testWriteWithOutputSpec() throws NoSuchTableException {
+ Table table = validationCatalog.loadTable(tableIdent);
+
+ final int originalSpecId = table.spec().specId();
+ table.updateSpec().addField("data").commit();
+
+ table.refresh();
+ sql("REFRESH TABLE %s", tableName);
+
+ // By default, we write to the current spec.
+ sql("INSERT INTO TABLE %s VALUES (10, 'a')", tableName);
+
+ List<Object[]> expected = ImmutableList.of(row(10L, "a",
table.spec().specId()));
+ assertEquals(
+ "Rows must match",
+ expected,
+ sql("SELECT id, data, _spec_id FROM %s WHERE id >= 10 ORDER BY id",
tableName));
+
+ // Output spec ID should be respected when present.
+ List<SimpleRecord> data =
+ ImmutableList.of(new SimpleRecord(11, "b"), new SimpleRecord(12, "c"));
+ spark
+ .createDataFrame(data, SimpleRecord.class)
+ .toDF()
+ .writeTo(tableName)
+ .option("output-spec-id", Integer.toString(originalSpecId))
+ .append();
+
+ expected =
+ ImmutableList.of(
+ row(10L, "a", table.spec().specId()),
+ row(11L, "b", originalSpecId),
+ row(12L, "c", originalSpecId));
+ assertEquals(
+ "Rows must match",
+ expected,
+ sql("SELECT id, data, _spec_id FROM %s WHERE id >= 10 ORDER BY id",
tableName));
Review Comment:
Can we check partitions table to be sure right partitions are added?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]