ahmedabu98 opened a new issue, #11900: URL: https://github.com/apache/iceberg/issues/11900
### Apache Iceberg version

1.7.1 (latest release)

### Query engine

None

### Please describe the bug 🐞

Part of our workflow in Apache Beam's Iceberg connector requires recreating DataFiles, but this process isn't smooth when the file is partitioned by month or hour. See the following reproducible code:

```java
org.apache.iceberg.Schema schema =
    new org.apache.iceberg.Schema(
        Types.NestedField.required(1, "month", Types.TimestampType.withoutZone()),
        Types.NestedField.required(2, "hour", Types.TimestampType.withoutZone()));
PartitionSpec spec = PartitionSpec.builderFor(schema).month("month").hour("hour").build();
Table table = catalog.createTable(TableIdentifier.parse("db.table"), schema, spec);

LocalDateTime val = LocalDateTime.parse("2024-10-08T13:18:20.053");
Record rec =
    GenericRecord.create(schema)
        .copy(ImmutableMap.of("month", val, "hour", val));
Record partitionableRec = getPartitionableRecord(rec, spec, schema);

PartitionKey pk = new PartitionKey(spec, schema);
pk.partition(partitionableRec);

DataWriter<Record> writer =
    Parquet.writeData(
            table
                .io()
                .newOutputFile(table.locationProvider().newDataLocation(spec, pk, "test_file")))
        .createWriterFunc(GenericParquetWriter::buildWriter)
        .schema(table.schema())
        .withSpec(table.spec())
        .withPartition(pk)
        .overwrite()
        .build();
writer.write(rec);
writer.close();

DataFile file = writer.toDataFile();

// recreate data file using the original file
DataFiles.builder(spec)
    .withPath(file.path().toString())
    .withFormat(file.format())
    .withPartition(file.partition())
    .withFileSizeInBytes(file.fileSizeInBytes())
    .withRecordCount(file.recordCount())
    .withPartitionPath(spec.partitionToPath(file.partition()))
    .build();
```

The last bit fails with the following error:

```
java.lang.NumberFormatException: For input string: "2024-10"
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.base/java.lang.Integer.parseInt(Integer.java:652)
	at java.base/java.lang.Integer.valueOf(Integer.java:983)
	at org.apache.iceberg.types.Conversions.fromPartitionString(Conversions.java:51)
	at org.apache.iceberg.DataFiles.fillFromPath(DataFiles.java:86)
	at org.apache.iceberg.DataFiles$Builder.withPartitionPath(DataFiles.java:266)
```

I would expect that the result of `spec.partitionToPath(file.partition())` could be naturally used when recreating the DataFile, but the [logic here](https://github.com/apache/iceberg/blob/e3f50e5c62d01f3f31239d197ef281fc36cf31fa/core/src/main/java/org/apache/iceberg/DataFiles.java#L78-L87) doesn't seem to be robust enough.

We've been able to use this [work around](https://github.com/apache/beam/blob/18ec3317e500a6fee72fc8c24552c21808437bef/sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/RecordWriterManager.java#L237-L259), replicated below:

<details>
<summary><b>Work around</b></summary>

```java
static String getPartitionDataPath(
    String partitionPath, Map<String, PartitionField> partitionFieldMap) {
  if (partitionPath.isEmpty() || partitionFieldMap.isEmpty()) {
    return partitionPath;
  }
  List<String> resolved = new ArrayList<>();
  for (String partition : Splitter.on('/').splitToList(partitionPath)) {
    List<String> nameAndValue = Splitter.on('=').splitToList(partition);
    String name = nameAndValue.get(0);
    String value = nameAndValue.get(1);
    String transformName =
        Preconditions.checkArgumentNotNull(partitionFieldMap.get(name)).transform().toString();
    if (Transforms.month().toString().equals(transformName)) {
      int month = YearMonth.parse(value).getMonthValue();
      value = String.valueOf(month);
    } else if (Transforms.hour().toString().equals(transformName)) {
      long hour = ChronoUnit.HOURS.between(EPOCH, LocalDateTime.parse(value, HOUR_FORMATTER));
      value = String.valueOf(hour);
    }
    resolved.add(name + "=" + value);
  }
  return String.join("/", resolved);
}
```

</details>

But I would expect the Iceberg API to take care of this by itself.
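For reference, here is a minimal, self-contained sketch of the conversion such a workaround needs, using only `java.time`. It assumes the `2024-10` form from the error above for months and a `yyyy-MM-dd-HH` form for hours, and it assumes (per the Iceberg table spec's description of the `month` and `hour` transforms) that the stored partition values are months and hours elapsed since 1970-01-01. Under that assumption the month value would be months since the epoch rather than the month of year returned by `getMonthValue()`. `PartitionOrdinals`, `monthOrdinal`, and `hourOrdinal` are hypothetical names used for illustration, not Iceberg or Beam API.

```java
import java.time.LocalDateTime;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper (not part of Iceberg or Beam): converts human-readable
// partition path values into epoch-based ordinals.
final class PartitionOrdinals {
  private static final YearMonth EPOCH_MONTH = YearMonth.of(1970, 1);
  private static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);
  // Assumed human-readable hour format, e.g. "2024-10-08-13".
  private static final DateTimeFormatter HOUR_FORMATTER =
      DateTimeFormatter.ofPattern("yyyy-MM-dd-HH");

  private PartitionOrdinals() {}

  // "2024-10" -> whole months between 1970-01 and 2024-10.
  static long monthOrdinal(String humanMonth) {
    return ChronoUnit.MONTHS.between(EPOCH_MONTH, YearMonth.parse(humanMonth));
  }

  // "2024-10-08-13" -> whole hours between 1970-01-01T00:00 and 2024-10-08T13:00.
  static long hourOrdinal(String humanHour) {
    return ChronoUnit.HOURS.between(EPOCH, LocalDateTime.parse(humanHour, HOUR_FORMATTER));
  }

  public static void main(String[] args) {
    System.out.println(monthOrdinal("2024-10"));      // 657
    System.out.println(hourOrdinal("2024-10-08-13")); // 480109
  }
}
```

With the timestamp from the reproduction (`2024-10-08T13:18:20.053`), this sketch would yield `month=657/hour=480109` as the integer-valued path segments, which is the kind of input `Integer.valueOf` in the stack trace can parse.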
### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org