pvary commented on PR #12298: URL: https://github.com/apache/iceberg/pull/12298#issuecomment-3000257013
> Let's say we want to add support for Vortex file format. Here are the steps needed after this effort > > [api] add vortex to FileFormat enum > [vortex] implement VortexFileAccessFactory > [flink/spark] implement vortex row reader and writer > [flink/spark] add the factory implementation and register to the global factory (if keeping the registry model) > > Peter thought 2-4 can be consolidated to the `iceberg-vortex` module. But it means that `iceberg-vortex` module would depend on `iceberg-flink` and `iceberg-spark` modules. > > From my perspective, it does not seem right to have file format module (like `iceberg-vortex`) depends on engine module. The reverse dependency model (engine depends on file format module) makes more sense to me. The engine integration code for file format (readers for engine data type like Flink `RowData`) should exist in the engine module. I know it means a new file format support would requires changes in engine modules. But that would be the same for engines living outside iceberg repo. There is a part of the code where both Vortex/Spark and Vortex/Flink is needed. We can't get away from it. We have: 1. Iceberg File Format API + Vortex specific code for the readers/writers 2. Vortex + Spark specific code for the Spark readers 3. Vortex + Flink specific code for the Flink readers We can add 2 to Spark, and add 3 to Flink, or we can decouple Spark and Flink and have an independent module for 2 and 3. Another option is to merge 1, 2, 3 together. Maybe the best would be keeping all of them separate, but if the size is not too big, I would prefer to put all of them to a single module and share in a single jar. If we keep the Vortex/Spark, Vortex/Flink, or for that matter ORC/Spark and ORC/Flink in their separate modules, then the Registry based solution enables us to avoid changing the Spark/Flink code when we decide to add Vortex, or remove ORC support. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org