stevenzwu commented on PR #12298: URL: https://github.com/apache/iceberg/pull/12298#issuecomment-2998113103
Right now, there are two questions that Peter and I are discussing offline. 1) is it necessary to have a globally shared `FileAccessFactoryRegistry` in the iceberg-data module and have engine to register the factories class (e.g. `FlinkObjectModels`)? The usage pattern in the engine (like Flink) is not much different in both models (with or without registry). here is the usage pattern for with registry model. ``` WriteBuilder<?, RowType, RowData> builder = FileAccessFactoryRegistry.writeBuilder( format, FlinkObjectModels.FLINK_OBJECT_MODEL, EncryptedFiles.plainAsEncryptedOutput(outputFile)); ``` Here is the usage pattern for without registry model (from Peter's testing PR #13257). ``` WriteBuilder<?, RowType, RowData> builder = FlinkFileAccessor.INSTANCE.writeBuilder( format, EncryptedFiles.plainAsEncryptedOutput(outputFile)); ``` From my perspective, I don't see much value for sharing the Flink and Spark factory registration globally. Each engine knows the factory it should use. 2) Peter was thinking about consolidating engine integrations in the file format module. Let's say we want to add support for `Vortex` file format. Here are the steps needed after this effort 1. [api] add `vortex` to `FileFormat` enum 2. [vortex] implement `VortexFileAccessFactory` 3. [flink/spark] implement vortex row reader and writer 4. [flink/spark] add the factory implementation and register to the global factory (if keeping the registry model) Peter thought 2-4 can be consolidated to the `iceberg-vortex` module. But it means that `iceberg-vortex` module would depend on `iceberg-flink` and `iceberg-spark` modules. From my perspective, it is not right to have file format module (like `iceberg-vortex`) depends on engine module. The reverse dependency model (engine depends on file format module) makes more sense to me. The engine integration code for file format (readers for engine data type like Flink `RowData`) should exist in the engine module. I know it means a new file format support would requires changes in engine modules. But that would be the same for engines living outside iceberg repo. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org