robinsinghstudios opened a new issue, #9653: URL: https://github.com/apache/iceberg/issues/9653
### Query engine

Iceberg Java API 1.4.3

### Question

For context, I am new to Java and might be missing something simple, but after being stuck on this issue for a long while I decided to post my question here. I am using Iceberg 1.4.3 and Java 20. Using Iceberg's `PartitionedFanoutWriter`, I can dynamically generate the partitions and write the files successfully into the respective partitions. However, when I read the data back, the partition column values are always null. My code looks like this:

```java
org.apache.iceberg.Table table;
PartitionSpec spec;
SortOrder srt;

if (catalog.tableExists(name)) {
    table = catalog.loadTable(name);
} else {
    spec = PartitionSpec.builderFor(schema)
            .identity("temp")
            .build();
    srt = SortOrder.builderFor(schema)
            .asc(keyColumn)
            .build();
    Map<String, String> tblProps = new HashMap<>();
    tblProps.put("write.parquet.compression-codec", "uncompressed");
    tblProps.put("write.distribution-mode", "range");
    tblProps.put("format-version", "2");
    table = catalog.buildTable(name, schema)
            .withPartitionSpec(spec)
            .withProperties(tblProps)
            .withSortOrder(srt)
            .create();
}

GenericAppenderFactory appenderFactory = new GenericAppenderFactory(table.schema());
int partitionId = 1, taskId = 1;
OutputFileFactory outputFileFactory = OutputFileFactory.builderFor(table, partitionId, taskId)
        .format(FileFormat.PARQUET)
        .build();
appenderFactory.setAll(table.properties());

final PartitionKey partitionKey = new PartitionKey(table.spec(), table.spec().schema());

// partitionedFanoutWriter will auto-partition each record and create the partitioned writer
PartitionedFanoutWriter<Record> partitionedFanoutWriter =
        new PartitionedFanoutWriter<>(table.spec(), FileFormat.PARQUET, appenderFactory,
                outputFileFactory, table.io(), TARGET_FILE_SIZE_IN_BYTES) {
    @Override
    protected PartitionKey partition(Record record) {
        partitionKey.partition(record);
        return partitionKey;
    }
};

GenericRecord genericRecord = GenericRecord.create(table.schema());

// FanoutDataWriter
value.forEach(val -> {
    try {
        GenericRecord record = genericRecord.copy();
        val.toMap().forEach(record::setField);
        partitionedFanoutWriter.write(record);
        LOGGER.info(val.toString());
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
});

AppendFiles appendFiles = table.newAppend();

// submit data files to the table
try {
    Arrays.stream(partitionedFanoutWriter.dataFiles()).forEach(appendFiles::appendFile);
} catch (IOException e) {
    throw new RuntimeException(e);
}

// submit snapshot
appendFiles.apply();
appendFiles.commit();
```

And I am reading the table like this:

```java
table.refresh();
IcebergGenerics.ScanBuilder scanBuilder = IcebergGenerics.read(table);
CloseableIterable<Record> result = scanBuilder.build();
for (Record r : result) {
    LOGGER.info(r.toString());
}
```

The table is being partitioned on the "temp" column as I want, but when I read the rows, this is the result:

```
Record(1022, first_name, last_name, annii...@noanswer.org, null, false)
```

I see null in the partitioned column. It would be great if anyone could help out with this.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
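One detail in the snippet above that may be worth double-checking (an assumption on my part, not a confirmed diagnosis): `GenericAppenderFactory` is constructed with only the table schema, but it also has an overload that accepts the table's `PartitionSpec`, which is the spec-aware form intended for partitioned writes. A hedged sketch of that variant:

```java
// Hypothesis, not a confirmed fix: construct the appender factory with the
// table's partition spec so the appenders are spec-aware, rather than
// new GenericAppenderFactory(table.schema()) with the schema alone.
GenericAppenderFactory appenderFactory =
        new GenericAppenderFactory(table.schema(), table.spec());
appenderFactory.setAll(table.properties());
```

The rest of the write path (the `OutputFileFactory` and `PartitionedFanoutWriter` setup) would stay as in the question.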