sweetpythoncode opened a new issue, #7027:
URL: https://github.com/apache/iceberg/issues/7027
### Apache Iceberg version
1.1.0 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
Example of the source layout:
`s3://bucket/data/id=123/name=test/date=321/result.orc`
```
CALL iceberg_catalog.system.add_files(
  table => 'test.test_name',
  source_table => '`orc`.`s3://bucket/data/`',
  partition_filter => map('id', '3'),
  check_duplicate_files => false
)
```
The `partition_filter` option does not take the order of the partition
columns into account, so nested folders are scanned until the first match is
found. Should we apply the filter to each partition level, in order, before
running the nested `Listing leaf files and directories` step?
**Example of current flow:**
```
s3://bucket/data/id=1/name=test/date=321/result.orc -> Listing leaf files and directories on each sub folder
s3://bucket/data/id=2/name=test/date=321/result.orc -> Listing leaf files and directories on each sub folder
s3://bucket/data/id=3/name=test/date=321/result.orc -> Matches the partition_filter; the following folders are ignored
```
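
**Sketch of the proposed flow:**

To make the question concrete, here is a minimal Python sketch of listing with the filter applied at each partition level. It is an illustration only, not Iceberg's or Spark's implementation: `build_tree`, `list_dir`, and `list_pruned` are hypothetical names, the object store is simulated with an in-memory dict, and any entry without `=` in its name is assumed to be a data file.

```python
from typing import Callable, Dict, List

ROOT = "s3://bucket/data"


def build_tree(ids: List[int]) -> Dict[str, List[str]]:
    """Hypothetical in-memory stand-in for the bucket: directory -> children."""
    tree = {ROOT: [f"id={i}" for i in ids]}
    for i in ids:
        base = f"{ROOT}/id={i}"
        tree[base] = ["name=test"]
        tree[f"{base}/name=test"] = ["date=321"]
        tree[f"{base}/name=test/date=321"] = ["result.orc"]
    return tree


def list_pruned(
    list_dir: Callable[[str], List[str]],
    root: str,
    partition_filter: Dict[str, str],
) -> List[str]:
    """Walk partition directories top-down and skip any `key=value`
    directory that contradicts the filter, so its subtree is never listed."""
    files: List[str] = []
    stack = [root]
    while stack:
        current = stack.pop()
        for child in list_dir(current):
            path = f"{current}/{child}"
            key, sep, value = child.partition("=")
            if not sep:
                # Assumption: names without '=' are data files.
                files.append(path)
            elif key in partition_filter and partition_filter[key] != value:
                # Prune: this partition value does not match the filter.
                continue
            else:
                stack.append(path)
    return files


# Record every directory that actually gets listed.
tree = build_tree([1, 2, 3])
listed: List[str] = []


def list_dir(path: str) -> List[str]:
    listed.append(path)
    return tree.get(path, [])


files = list_pruned(list_dir, ROOT, {"id": "3"})
print(files)
print(len(listed), "directories listed")
```

In this simulation only the root and the three directories under `id=3` are listed; the `id=1` and `id=2` subtrees are skipped without any listing call.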
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]