Fokko opened a new issue, #7332: URL: https://github.com/apache/iceberg/issues/7332
### Feature Request / Improvement Playing around with `pyflink` and noticed that the Hadoop dependency is required when using the REST catalog: ```python ➜ ~ python3.9 Python 3.9.16 (main, Dec 7 2022, 10:06:04) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import os >>> >>> from pyflink.datastream import StreamExecutionEnvironment >>> >>> env = StreamExecutionEnvironment.get_execution_environment() >>> iceberg_flink_runtime_jar = "/Users/fokkodriesprong/Desktop/iceberg/flink/v1.17/flink-runtime/build/libs/iceberg-flink-runtime-1.17-1.3.0-SNAPSHOT.jar" >>> >>> env.add_jars("file://{}".format(iceberg_flink_runtime_jar)) >>> >>> from pyflink.table import StreamTableEnvironment >>> >>> table_env = StreamTableEnvironment.create(env) >>> >>> table_env.execute_sql(""" ... CREATE CATALOG tabular WITH ( ... 'type'='iceberg', ... 'catalog-type'='rest', ... 'uri'='https://api.tabular.io/ws', ... 'credential'='t-tcEe4Ihp4eM:pyTlx_4ayKV7N54gXuBmMotVFLU' ... ) ... """) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/opt/homebrew/lib/python3.9/site-packages/pyflink/table/table_environment.py", line 837, in execute_sql return TableResult(self._j_tenv.executeSql(stmt)) File "/opt/homebrew/lib/python3.9/site-packages/py4j/java_gateway.py", line 1322, in __call__ return_value = get_return_value( File "/opt/homebrew/lib/python3.9/site-packages/pyflink/util/exceptions.py", line 146, in deco return f(*a, **kw) File "/opt/homebrew/lib/python3.9/site-packages/py4j/protocol.py", line 326, in get_return_value raise Py4JJavaError( py4j.protocol.Py4JJavaError: An error occurred while calling o23.executeSql. : java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration at org.apache.iceberg.flink.FlinkCatalogFactory.clusterHadoopConf(FlinkCatalogFactory.java:211) at org.apache.iceberg.flink.FlinkCatalogFactory.createCatalog(FlinkCatalogFactory.java:139) at org.apache.flink.table.factories.FactoryUtil.createCatalog(FactoryUtil.java:414) at org.apache.flink.table.api.internal.TableEnvironmentImpl.createCatalog(TableEnvironmentImpl.java:1466) at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1212) at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:765) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374) at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282) at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79) at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581) at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) ... 17 more ``` When using a Hadoop or Hive catalog, this makes perfect sense but would be nice to make it optional when using the REST catalog. ### Query engine Flink -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org