[GitHub] [iceberg] sweetpythoncode opened a new issue, #7027: Iceberg add_files procedure with partition_filter scan non needed folders

via GitHub Mon, 06 Mar 2023 07:54:34 -0800


sweetpythoncode opened a new issue, #7027:
URL: https://github.com/apache/iceberg/issues/7027


   ### Apache Iceberg version
   
   1.1.0 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Source structure example: `s3://bucket/data/id=123/name=test/date=321/result.orc`
   ```
   CALL iceberg_catalog.system.add_files(
       table => 'test.test_name',
       source_table => '`orc`.`s3://bucket/data/`',
       partition_filter => map('id', '3'),
       check_duplicate_files => false
   )
   ```
   The `partition_filter` option does not take the order of the partition 
columns into account, so nested folders are scanned until the first match is 
found. Should the filter be applied to each partition level in order, before 
the nested `Listing leaf files and directories` step runs?
   
   **Example of current flow:**
    
   ```
   s3://bucket/data/id=1/name=test/date=321/result.orc -> Listing leaf files and directories on each subfolder
   s3://bucket/data/id=2/name=test/date=321/result.orc -> Listing leaf files and directories on each subfolder
   s3://bucket/data/id=3/name=test/date=321/result.orc -> Matches the partition_filter; subsequent folders are ignored
   ```
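
The ordered filtering asked about above can be illustrated with a small sketch. This is not Iceberg's or Spark's listing code: the function `list_matching_files`, the nested-dict `tree` standing in for an S3 listing, and the `visited` bookkeeping list are all hypothetical names introduced for illustration. The idea is only that a partition folder whose key appears in the filter with a different value is skipped before anything beneath it is listed.

```python
def list_matching_files(tree, partition_filter, prefix="", visited=None):
    """Return file paths under `tree`, pruning partition folders level by level.

    `tree` is a hypothetical stand-in for an object-store listing:
    a folder maps child names to sub-trees, a file maps to None.
    """
    files = []
    for name, child in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if child is None:
            # Leaf file: every folder above it already passed the filter.
            files.append(path)
            continue
        key, sep, value = name.partition("=")
        # Prune before descending: if this folder's partition key is in the
        # filter and the value differs, nothing below it can match.
        if sep and key in partition_filter and partition_filter[key] != value:
            continue
        if visited is not None:
            visited.append(path)  # record which folders were actually listed
        files.extend(list_matching_files(child, partition_filter, path, visited))
    return files


# Layout mirroring the example flow above.
tree = {
    "id=1": {"name=test": {"date=321": {"result.orc": None}}},
    "id=2": {"name=test": {"date=321": {"result.orc": None}}},
    "id=3": {"name=test": {"date=321": {"result.orc": None}}},
}
visited = []
files = list_matching_files(tree, {"id": "3"}, "s3://bucket/data", visited)
```

With this ordering, `visited` contains only the folders under `id=3`; the `id=1` and `id=2` subtrees are never listed. The sketch does not verify that every filter key actually occurs in a path, which a real implementation would probably need to handle.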
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

