pvary commented on PR #12298:
URL: https://github.com/apache/iceberg/pull/12298#issuecomment-3000257013

   > Let's say we want to add support for Vortex file format. Here are the 
steps needed after this effort
   > 
   > [api] add vortex to FileFormat enum
   > [vortex] implement VortexFileAccessFactory
   > [flink/spark] implement vortex row reader and writer
   > [flink/spark] add the factory implementation and register to the global 
factory (if keeping the registry model)
   >
   > Peter thought 2-4 can be consolidated to the `iceberg-vortex` module. But 
it means that `iceberg-vortex` module would depend on `iceberg-flink` and 
`iceberg-spark` modules.
   >
   > From my perspective, it does not seem right to have file format module 
(like `iceberg-vortex`) depends on engine module. The reverse dependency model 
(engine depends on file format module) makes more sense to me. The engine 
integration code for file format (readers for engine data type like Flink 
`RowData`) should exist in the engine module. I know it means a new file format 
support would requires changes in engine modules. But that would be the same 
for engines living outside iceberg repo.
   
   There is a part of the code where both Vortex/Spark and Vortex/Flink is 
needed. We can't get away from it.
   We have:
   1. Iceberg File Format API + Vortex specific code for the readers/writers
   2. Vortex + Spark specific code for the Spark readers
   3. Vortex + Flink specific code for the Flink readers
   
   We can add 2 to Spark, and add 3 to Flink, or we can decouple Spark and 
Flink and have an independent module for 2 and 3. Another option is to merge 1, 
2, 3 together.
   
   Maybe the best would be keeping all of them separate, but if the size is not 
too big, I would prefer to put all of them to a single module and share in a 
single jar.
   
   If we keep the Vortex/Spark, Vortex/Flink, or for that matter ORC/Spark and 
ORC/Flink in their separate modules, then the Registry based solution enables 
us to avoid changing the Spark/Flink code when we decide to add Vortex, or 
remove ORC support.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to