Fokko opened a new issue, #7332:
URL: https://github.com/apache/iceberg/issues/7332

   ### Feature Request / Improvement
   
   I was playing around with `pyflink` and noticed that the Hadoop dependency is 
required even when using the REST catalog:
   
   ```python
   ➜  ~ python3.9                             
   Python 3.9.16 (main, Dec  7 2022, 10:06:04) 
   [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import os
   >>> 
   >>> from pyflink.datastream import StreamExecutionEnvironment
   >>> 
   >>> env = StreamExecutionEnvironment.get_execution_environment()
   >>> iceberg_flink_runtime_jar = "/Users/fokkodriesprong/Desktop/iceberg/flink/v1.17/flink-runtime/build/libs/iceberg-flink-runtime-1.17-1.3.0-SNAPSHOT.jar"
   >>> 
   >>> env.add_jars("file://{}".format(iceberg_flink_runtime_jar))
   >>> 
   >>> from pyflink.table import StreamTableEnvironment
   >>> 
   >>> table_env = StreamTableEnvironment.create(env)
   >>> 
   >>> table_env.execute_sql("""
   ... CREATE CATALOG tabular WITH (
   ...     'type'='iceberg', 
   ...     'catalog-type'='rest',
   ...     'uri'='https://api.tabular.io/ws',
   ...     'credential'='t-tcEe4Ihp4eM:pyTlx_4ayKV7N54gXuBmMotVFLU'
   ... )
   ... """)
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
  File "/opt/homebrew/lib/python3.9/site-packages/pyflink/table/table_environment.py", line 837, in execute_sql
    return TableResult(self._j_tenv.executeSql(stmt))
  File "/opt/homebrew/lib/python3.9/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/opt/homebrew/lib/python3.9/site-packages/pyflink/util/exceptions.py", line 146, in deco
    return f(*a, **kw)
  File "/opt/homebrew/lib/python3.9/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
   py4j.protocol.Py4JJavaError: An error occurred while calling o23.executeSql.
   : java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
     at org.apache.iceberg.flink.FlinkCatalogFactory.clusterHadoopConf(FlinkCatalogFactory.java:211)
     at org.apache.iceberg.flink.FlinkCatalogFactory.createCatalog(FlinkCatalogFactory.java:139)
     at org.apache.flink.table.factories.FactoryUtil.createCatalog(FactoryUtil.java:414)
     at org.apache.flink.table.api.internal.TableEnvironmentImpl.createCatalog(TableEnvironmentImpl.java:1466)
     at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1212)
     at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:765)
     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
     at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
     at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
     at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
     at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
     at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
     at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
     at java.base/java.lang.Thread.run(Thread.java:829)
   Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
     at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
     at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
     ... 17 more
   ```
   
   When using a Hadoop or Hive catalog this dependency makes perfect sense, but it 
would be nice to make it optional when using the REST catalog.
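
What the stack trace suggests is that `FlinkCatalogFactory.clusterHadoopConf` resolves the Hadoop `Configuration` eagerly, before the catalog type is even considered. The ask boils down to resolving that dependency lazily and tolerating its absence for catalogs that do not need it. A minimal Python sketch of the pattern (module and function names here are illustrative, not Iceberg's actual API):

```python
import importlib


def load_hadoop_conf():
    """Return a Hadoop configuration object if the dependency is present, else None.

    Illustrative stand-in for clusterHadoopConf: the import happens only
    here, so a missing dependency is not fatal by itself.
    """
    try:
        hadoop = importlib.import_module("hadoop_conf")  # hypothetical module name
    except ImportError:
        return None
    return hadoop.Configuration()


def create_catalog(catalog_type):
    """Fail on the missing Hadoop dependency only for catalogs that need it."""
    conf = load_hadoop_conf()
    if catalog_type in ("hadoop", "hive") and conf is None:
        raise RuntimeError(
            f"'{catalog_type}' catalogs require Hadoop on the classpath"
        )
    # REST catalogs proceed without any Hadoop configuration.
    return {"catalog-type": catalog_type, "hadoop-conf": conf}
```

With this shape, creating a `rest` catalog succeeds when Hadoop is absent, while `hadoop` and `hive` catalogs still fail loudly at creation time.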
   
   ### Query engine
   
   Flink


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

