dramaticlly commented on code in PR #7190:
URL: https://github.com/apache/iceberg/pull/7190#discussion_r1160365118
##########
core/src/main/java/org/apache/iceberg/PartitionsTable.java:
##########
@@ -96,27 +94,63 @@ private static StaticDataTask.Row convertPartition(Partition partition) {
}
private static Iterable<Partition> partitions(Table table, StaticTableScan scan) {
- CloseableIterable<FileScanTask> tasks = planFiles(scan);
Types.StructType normalizedPartitionType =
Partitioning.partitionType(table);
PartitionMap partitions = new PartitionMap();
// cache a position map needed by each partition spec to normalize partitions to final schema
Map<Integer, int[]> normalizedPositionsBySpec =
Maps.newHashMapWithExpectedSize(table.specs().size());
- for (FileScanTask task : tasks) {
- PartitionData original = (PartitionData) task.file().partition();
- int[] normalizedPositions =
- normalizedPositionsBySpec.computeIfAbsent(
- task.spec().specId(),
- specId -> normalizedPositions(table, specId, normalizedPartitionType));
+ int[] normalizedPositions =
+ normalizedPositionsBySpec.computeIfAbsent(
+ table.spec().specId(),
+ specId -> normalizedPositions(table, specId, normalizedPartitionType));
+
+ CloseableIterable<DataFile> datafiles = planDataFiles(scan);
+
+ for (DataFile dataFile : datafiles) {
+ PartitionData original = (PartitionData) dataFile.partition();
PartitionData normalized =
normalizePartition(original, normalizedPartitionType,
normalizedPositions);
- partitions.get(normalized).update(task.file());
+ partitions.get(normalized).update(dataFile);
}
+
return partitions.all();
}
+ @VisibleForTesting
+ static CloseableIterable<DataFile> planDataFiles(StaticTableScan scan) {
Review Comment:
Thank you @ajantha-bhat. As Szehon mentioned, I think most of this code can be
reused to read delete files, but it would go through
`ManifestReader.readDeleteManifest()` instead, and I assume the file count and
file size will be aggregated differently. So I didn't move this to ContentFile
and kept the scope of this change as a refactoring.
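
To illustrate why the aggregation would likely differ once delete files are
read, here is a minimal sketch. Everything in it is hypothetical:
`PartitionStatsSketch`, `FileInfo`, `Content`, and `Stats` are stand-ins written
for this example, not Iceberg's real `ContentFile`, `DataFile`, or `Partition`
types, and the choice of counters is an assumption, not what this PR implements.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionStatsSketch {

  // Hypothetical stand-in for the kind of content a file holds.
  enum Content { DATA, POSITION_DELETES, EQUALITY_DELETES }

  // Hypothetical, minimal stand-in for a content file; not Iceberg's interface.
  static final class FileInfo {
    final Content content;
    final long recordCount;
    final long fileSizeInBytes;

    FileInfo(Content content, long recordCount, long fileSizeInBytes) {
      this.content = content;
      this.recordCount = recordCount;
      this.fileSizeInBytes = fileSizeInBytes;
    }
  }

  // Per-partition counters; data and delete files are tracked separately
  // (an assumption made for this sketch).
  static final class Stats {
    long dataRecordCount;
    int dataFileCount;
    long dataFileSizeInBytes;
    long deleteRecordCount;
    int deleteFileCount;

    void update(FileInfo file) {
      if (file.content == Content.DATA) {
        dataRecordCount += file.recordCount;
        dataFileCount += 1;
        dataFileSizeInBytes += file.fileSizeInBytes;
      } else {
        // Delete files do not add to the data totals in this sketch.
        deleteRecordCount += file.recordCount;
        deleteFileCount += 1;
      }
    }
  }

  // Folds every file of one partition into a single Stats object.
  static Stats aggregate(Iterable<FileInfo> files) {
    Stats stats = new Stats();
    for (FileInfo file : files) {
      stats.update(file);
    }
    return stats;
  }

  public static void main(String[] args) {
    List<FileInfo> files =
        Arrays.asList(
            new FileInfo(Content.DATA, 100L, 1024L),
            new FileInfo(Content.DATA, 50L, 512L),
            new FileInfo(Content.POSITION_DELETES, 10L, 128L));
    Stats stats = aggregate(files);
    System.out.println("data files: " + stats.dataFileCount);
    System.out.println("delete files: " + stats.deleteFileCount);
  }
}
```

With a single `update` over a common file type, the branch on content kind
would have to live inside the aggregation, which is why I kept this change
limited to data files.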
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]