[PR] feat: make file scan task serializable [iceberg-rust]

via GitHub Tue, 21 May 2024 01:20:25 -0700


ZENOTME opened a new pull request, #377:
URL: https://github.com/apache/iceberg-rust/pull/377


   There is a use case for file scan tasks in compute engines:
   1. Compute the file scan tasks and shuffle them to the compute nodes.
   2. The compute nodes perform the scan work in parallel.
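
The two-step workflow above (plan, shuffle, scan in parallel) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not iceberg-rust code: `FileScanTask` here is a hypothetical stand-in carrying only a data file path, `distribute` and `scan_in_parallel` are hypothetical helpers, round-robin assignment stands in for whatever shuffle the engine really uses, and local threads stand in for remote compute nodes.

```rust
use std::thread;

/// Hypothetical stand-in for iceberg-rust's `FileScanTask`;
/// it only carries the path of the data file to read.
#[derive(Debug, Clone, PartialEq)]
struct FileScanTask {
    data_file_path: String,
}

/// Step 1 (illustrative): assign planned tasks to `num_nodes` nodes round-robin.
/// A real engine would use its own shuffle/partitioning strategy.
fn distribute(tasks: Vec<FileScanTask>, num_nodes: usize) -> Vec<Vec<FileScanTask>> {
    assert!(num_nodes > 0, "need at least one compute node");
    let mut buckets: Vec<Vec<FileScanTask>> = vec![Vec::new(); num_nodes];
    for (i, task) in tasks.into_iter().enumerate() {
        buckets[i % num_nodes].push(task);
    }
    buckets
}

/// Step 2 (illustrative): each "node" is a thread that scans its own tasks.
/// The scan is stubbed out: it only returns the paths it would have read.
fn scan_in_parallel(buckets: Vec<Vec<FileScanTask>>) -> Vec<Vec<String>> {
    let handles: Vec<_> = buckets
        .into_iter()
        .map(|bucket| {
            thread::spawn(move || {
                bucket
                    .into_iter()
                    .map(|task| task.data_file_path)
                    .collect::<Vec<String>>()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("scan thread panicked"))
        .collect()
}

fn main() {
    let tasks: Vec<FileScanTask> = (0..5)
        .map(|i| FileScanTask { data_file_path: format!("data/file-{}.parquet", i) })
        .collect();
    let buckets = distribute(tasks, 2);
    let results = scan_in_parallel(buckets);
    for (node, files) in results.iter().enumerate() {
        println!("node {} scanned {:?}", node, files);
    }
}
```

Because each task is independent once planning is done, any node can pick up any task; this is what makes it attractive to ship tasks across the cluster rather than re-planning on every node.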
   
   In this case, the compute engine needs to be able to:
   1. Access `FileScanTask` directly.
   2. Serialize and deserialize `FileScanTask`.
   
   I drafted this PR to make both of these possible. Serializing and
   deserializing `ManifestEntry` would take more work, and I found that the
   reader only needs the file path from `ManifestEntry`. The rest of the metadata
   in `ManifestEntry` seems to be used during the planning phase to prune files;
   once planning is complete, most of it is no longer needed. We can add any
   metadata the scan needs in the future. So I made `FileScanTask` contain only
   the data file path. Please let me know if this assumption is wrong.

