pvary commented on issue #13438:
URL: https://github.com/apache/iceberg/issues/13438#issuecomment-3088940093
Having a detailed document at this point would be premature.
The steps which are needed:
1. Create an API layer for readers and writers - #12298
2. Create a test framework to verify that the readers and writers
work according to the specification
- After this point if a file format (like Lance) decides that it wants
to support Iceberg tables, it can create a test implementation which could be
used for functional and performance testing
3. The community needs to decide if/how it wants to support new file
formats. I see the following (not exclusive) possibilities myself:
a. Add the new certified format to the FileFormat enum
b. Change the FileFormat enum to a String, and enable "any" file formats
c. Add an intermediate layer to allow testing of file formats which
implement this intermediate layer
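
To make the difference between these options concrete, here is a minimal
sketch. Every name in it (CertifiedFormat, FormatPlugin, FormatRegistry) is
hypothetical and made up for illustration; none of it is the existing Iceberg
API, and listing LANCE in the enum is only an assumption for the example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Option (a): a closed enum. Adding a certified format means changing this
// type and releasing a new version. (Hypothetical enum, not Iceberg's FileFormat.)
enum CertifiedFormat { PARQUET, ORC, AVRO, LANCE }

// Option (c): a hypothetical intermediate layer that a format implements once.
interface FormatPlugin {
  String name();

  String fileExtension();
}

// Option (b): formats are identified by a String and resolved at runtime,
// so "any" format can be plugged in without touching a central enum.
final class FormatRegistry {
  private final Map<String, FormatPlugin> plugins = new ConcurrentHashMap<>();

  // Registers a plugin under its lower-cased name; duplicates are rejected.
  void register(FormatPlugin plugin) {
    String key = plugin.name().toLowerCase(Locale.ROOT);
    if (plugins.putIfAbsent(key, plugin) != null) {
      throw new IllegalArgumentException("Format already registered: " + key);
    }
  }

  // Case-insensitive lookup; empty when no plugin is registered for the name.
  Optional<FormatPlugin> lookup(String name) {
    return Optional.ofNullable(plugins.get(name.toLowerCase(Locale.ROOT)));
  }
}

final class FormatRegistrySketch {
  public static void main(String[] args) {
    FormatRegistry registry = new FormatRegistry();
    registry.register(new FormatPlugin() {
      @Override
      public String name() {
        return "lance";
      }

      @Override
      public String fileExtension() {
        return "lance";
      }
    });
    System.out.println(registry.lookup("Lance").isPresent());
    System.out.println(registry.lookup("unknown").isPresent());
  }
}
```

The trade-off the sketch tries to show: the enum keeps the set of formats
reviewable in one place, while the String-keyed registry moves validation to
runtime, where an unknown name is only detected on lookup.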
The benefits of integrating a new file format could be:
1. Performance benefits, if reading/writing the format is more
efficient than with Parquet/ORC/Avro
2. Compatibility benefit, as more engines could read the data. This is more
pronounced with 3.c, since there is no need to implement the readers and
writers for every FileFormat+Engine combination
3. Catalog integration benefit, as tables could be organized by a single
catalog.
About @jackye1995's document:
I like the integration solution he proposes. There is one point where
we need to enhance the current File Format API proposal to support it.
Currently the readers and writers work on a single data stream. To enable
the integration mentioned in the proposal, we need to push the FileIO to the
readers and writers so they can open additional data/index files.
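
A rough sketch of what "pushing the FileIO to the readers" could look like.
All types here (SketchFileIO, InMemoryFileIO, IndexAwareReader) and the ".idx"
naming are assumptions made up for illustration; this is not the FileIO
interface from Iceberg nor the API proposed in #12298.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a FileIO-like abstraction (not Iceberg's interface).
interface SketchFileIO {
  InputStream open(String location);
}

// In-memory implementation so the sketch runs without any storage backend.
final class InMemoryFileIO implements SketchFileIO {
  private final Map<String, byte[]> files = new HashMap<>();

  void put(String location, String content) {
    files.put(location, content.getBytes(StandardCharsets.UTF_8));
  }

  @Override
  public InputStream open(String location) {
    byte[] bytes = files.get(location);
    if (bytes == null) {
      throw new UncheckedIOException(new IOException("No such file: " + location));
    }
    return new ByteArrayInputStream(bytes);
  }
}

// The reader receives the IO object instead of one pre-opened stream,
// so it can locate and open extra files (here: a sidecar index) on its own.
final class IndexAwareReader {
  private final SketchFileIO io;

  IndexAwareReader(SketchFileIO io) {
    this.io = io;
  }

  String readWithIndex(String dataLocation) {
    String indexLocation = dataLocation + ".idx"; // hypothetical naming scheme
    try (InputStream data = io.open(dataLocation);
        InputStream index = io.open(indexLocation)) {
      String rows = new String(data.readAllBytes(), StandardCharsets.UTF_8);
      String idx = new String(index.readAllBytes(), StandardCharsets.UTF_8);
      return idx + "|" + rows;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

final class FileIOSketch {
  public static void main(String[] args) {
    InMemoryFileIO io = new InMemoryFileIO();
    io.put("table/data-1.lance", "rows");
    io.put("table/data-1.lance.idx", "index");
    System.out.println(new IndexAwareReader(io).readWithIndex("table/data-1.lance"));
  }
}
```

With a single-stream reader the index file above would be unreachable, since
the reader never sees anything but the bytes of the data file.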
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]