minwoo-jung opened a new issue, #15215: URL: https://github.com/apache/pinot/issues/15215
Hello,

We are actively using Apache Pinot in the open-source [Pinpoint project](https://github.com/pinpoint-apm/pinpoint), where we process large-scale data in real time. Since we store over 100TB of data, we rely heavily on the HDFS as Deep Storage feature. However, after upgrading Pinot from 1.2.0 to 1.3.0, we encountered an issue.

### Issue

After upgrading Pinot from 1.2.0 to 1.3.0, the [HDFS as Deep Storage](https://docs.pinot.apache.org/basics/getting-started/hdfs-as-deepstorage) feature stopped functioning correctly.

**Error Message**

```
[2025-03-06 11:08:48.037] WARN [DataStreamer] [Thread-85] DataStreamer Exception
java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto tried to access method 'org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList.emptyList()' (org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto and org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList are in unnamed module of loader 'app')
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto.<init>(ClientNamenodeProtocolProtos.java:18353) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto.<clinit>(ClientNamenodeProtocolProtos.java:19955) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:486) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[?:?]
    at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:437) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:170) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:162) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:100) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:366) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at jdk.proxy2/jdk.proxy2.$Proxy55.addBlock(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1143) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:2035) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForCreate(DataStreamer.java:1830) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:752) [pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
```

```
[2025-03-06 11:08:48.041] ERROR [LLCSegmentCompletionHandlers] [grizzly-http-server-7] Caught exception while uploading segment: systemMetricDouble__52__1760__20250305T2048Z from instance: SERVER_INSTANCE_NAME_SERVER_INSTANCE_NAME_SERVER_INSTANCE_NAME__SERVER_INSTANCE_NAME
java.io.IOException: java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto tried to access method 'org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList.emptyList()' (org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto and org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList are in unnamed module of loader 'app')
    at org.apache.hadoop.hdfs.ExceptionLastSeen.set(ExceptionLastSeen.java:45) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:862) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
Caused by: java.lang.IllegalAccessError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto tried to access method 'org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList.emptyList()' (org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto and org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList are in unnamed module of loader 'app')
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto.<init>(ClientNamenodeProtocolProtos.java:18353) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto.<clinit>(ClientNamenodeProtocolProtos.java:19955) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:486) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[?:?]
    at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:437) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:170) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:162) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:100) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:366) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at jdk.proxy2/jdk.proxy2.$Proxy55.addBlock(Unknown Source) ~[?:?]
    at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1143) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:2035) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForCreate(DataStreamer.java:1830) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:752) ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
```

### Root Cause

- Pinot uses shaded dependencies, including `hadoop-common:3.4.1` and `hadoop-shaded-protobuf_3_21:1.2.0`.
- The `hadoop-shaded-protobuf_3_21:1.2.0` library is explicitly declared in Pinot’s `pom.xml`.
- However, with `hadoop-shaded-protobuf_3_21:1.2.0` packaged, the HDFS protocol classes cannot access the `LazyStringArrayList.emptyList()` method, which causes the failures shown in the error messages.
- To avoid this issue, the Pinot package should use `hadoop-shaded-protobuf_3_25:1.3.0`, which is required by `hadoop-common:3.4.1`.

### Solution

- Since Pinot uses a shaded packaging approach, users cannot directly modify dependencies. Therefore, Pinot should adjust its dependencies to resolve this issue.

### Additional Notes

- The [HDFS as Deep Storage](https://docs.pinot.apache.org/basics/getting-started/hdfs-as-deepstorage) documentation needs to be updated; please review it.
  - Previously, Pinot did not include HDFS-related libraries in its package, requiring users to manually add them to `CLASSPATH_PREFIX`.
  - However, it seems that the current Pinot package includes HDFS-related libraries. The documentation should be updated accordingly.
- I have looked into the related commits for this issue.
  - https://github.com/apache/pinot/commit/aeaf7cf8131014107e783da7d9ce2ef80923e728
  - https://github.com/apache/pinot/commit/7879e07b172d7cabe48da496c748273b1a690446

### I will submit a PR myself.
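
For anyone who wants to check the root cause described in this issue on their own deployment, the following is a minimal diagnostic sketch, not part of Pinot. The class name `ShadedMethodCheck` and the helper `describe` are hypothetical names introduced for illustration; the class and method being inspected are taken from the stack trace. It loads a class without initializing it, reports which jar it came from, and reports whether a no-argument method is public. A "not public" result for `LazyStringArrayList.emptyList()` would be consistent with the `IllegalAccessError` reported here, though that reading is an assumption rather than something this issue confirms.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

public class ShadedMethodCheck {

    /** Reports where a class was loaded from and whether one of its no-arg methods is public. */
    static String describe(String className, String methodName) {
        try {
            // Load without initializing, so static initializers do not run.
            Class<?> cls = Class.forName(className, false, ShadedMethodCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself typically have no code source.
            String origin = (src == null || src.getLocation() == null)
                    ? "<JDK runtime>"
                    : src.getLocation().toString();
            Method m = cls.getDeclaredMethod(methodName);
            String visibility = Modifier.isPublic(m.getModifiers()) ? "public" : "not public";
            return className + "#" + methodName + "() is " + visibility + ", loaded from " + origin;
        } catch (ClassNotFoundException e) {
            return className + " is not on the classpath";
        } catch (NoSuchMethodException e) {
            return className + " has no no-arg method " + methodName + "()";
        }
    }

    public static void main(String[] args) {
        // Class and method names as they appear in the stack trace.
        System.out.println(describe(
                "org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList", "emptyList"));
    }
}
```

To be meaningful, the check has to run with the same classpath as the Pinot controller or server; without the Pinot jars it simply reports that the class is not on the classpath.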