minwoo-jung opened a new issue, #15215:
URL: https://github.com/apache/pinot/issues/15215

   Hello,
   
   We are actively using Apache Pinot in the [open-source Pinpoint project](https://github.com/pinpoint-apm/pinpoint), where we process large-scale data in real time.
   Since we store over 100 TB of data, we rely heavily on the HDFS as Deep Storage feature.
   However, after upgrading Pinot from 1.2.0 to 1.3.0, we encountered an issue.
   
   ### Issue
   
   After upgrading Pinot from 1.2.0 to 1.3.0, the [HDFS as Deep 
Storage](https://docs.pinot.apache.org/basics/getting-started/hdfs-as-deepstorage)
 feature stopped functioning correctly.
   
   **Error Message**
   ```
   [2025-03-06 11:08:48.037] WARN [DataStreamer] [Thread-85] DataStreamer 
Exception
   java.lang.IllegalAccessError: class 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto
 tried to access method 
'org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList 
org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList.emptyList()' 
(org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto
 and org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList are in unnamed 
module of loader 'app')
           at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto.<init>(ClientNamenodeProtocolProtos.java:18353)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto.<clinit>(ClientNamenodeProtocolProtos.java:19955)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:486)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
 ~[?:?]
           at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:437)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:170)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:162)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:100)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:366)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at jdk.proxy2/jdk.proxy2.$Proxy55.addBlock(Unknown Source) ~[?:?]
           at 
org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1143) 
~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:2035)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForCreate(DataStreamer.java:1830)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:752) 
[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
   ```
   
   ```
   [2025-03-06 11:08:48.041] ERROR [LLCSegmentCompletionHandlers] 
[grizzly-http-server-7] Caught exception while uploading segment: 
systemMetricDouble__52__1760__20250305T2048Z from instance: 
SERVER_INSTANCE_NAME_SERVER_INSTANCE_NAME_SERVER_INSTANCE_NAME__SERVER_INSTANCE_NAME
   java.io.IOException: java.lang.IllegalAccessError: class 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto
 tried to access method 
'org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList 
org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList.emptyList()' 
(org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto
 and org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList are in unnamed 
module of loader 'app')
           at 
org.apache.hadoop.hdfs.ExceptionLastSeen.set(ExceptionLastSeen.java:45) 
~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:862) 
~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
   Caused by: java.lang.IllegalAccessError: class 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto
 tried to access method 
'org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList 
org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList.emptyList()' 
(org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto
 and org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList are in unnamed 
module of loader 'app')
           at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto.<init>(ClientNamenodeProtocolProtos.java:18353)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AddBlockRequestProto.<clinit>(ClientNamenodeProtocolProtos.java:19955)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:486)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
 ~[?:?]
           at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:437)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:170)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:162)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:100)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:366)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at jdk.proxy2/jdk.proxy2.$Proxy55.addBlock(Unknown Source) ~[?:?]
           at 
org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1143) 
~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:2035)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForCreate(DataStreamer.java:1830)
 ~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
           at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:752) 
~[pinot-orc-1.3.0-shaded.jar:1.3.0-c0023da298126af6a01b802a04b66da34ba16134]
   ```
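   To narrow this down, here is a small diagnostic of my own (not part of Pinot; the class names are copied from the stack trace above). Run it with the same classpath as the Pinot server to see which jar each of the conflicting classes is actually loaded from:

   ```java
   import java.security.CodeSource;

   // Diagnostic sketch: prints the jar (code source) that provides each of the
   // classes named in the IllegalAccessError above.
   public class ShadedJarCheck {
       // Class names copied from the stack trace; adjust if your trace differs.
       static final String[] SUSPECTS = {
               "org.apache.hadoop.thirdparty.protobuf.LazyStringArrayList",
               "org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos"
       };

       public static void main(String[] args) {
           for (String name : SUSPECTS) {
               try {
                   Class<?> c = Class.forName(name);
                   CodeSource cs = c.getProtectionDomain().getCodeSource();
                   System.out.println(name + " -> "
                           + (cs == null ? "bootstrap class loader" : cs.getLocation()));
               } catch (ClassNotFoundException e) {
                   System.out.println(name + " -> not on classpath");
               }
           }
       }
   }
   ```

   In our environment both classes resolve to the shaded `pinot-orc-1.3.0-shaded.jar`, which is consistent with the error message ("unnamed module of loader 'app'").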
   
   
   ### Root Cause
   - Pinot ships shaded dependencies, including `hadoop-common:3.4.1` and `hadoop-shaded-protobuf_3_21:1.2.0`.
   - The `hadoop-shaded-protobuf_3_21:1.2.0` library is explicitly declared in Pinot's pom.xml.
   - However, the HDFS client classes cannot access the `LazyStringArrayList.emptyList()` method in `hadoop-shaded-protobuf_3_21:1.2.0`, which triggers the `IllegalAccessError` above and makes segment uploads to HDFS fail.
   - To avoid this issue, the Pinot package should use `hadoop-shaded-protobuf_3_25:1.3.0`, the version required by `hadoop-common:3.4.1`.
   
   ### Solution
   - Since Pinot uses a shaded packaging approach, users cannot directly modify its dependencies; Pinot itself should adjust them to resolve this issue.
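   As a sketch of the dependency change described above (the coordinates come from this report; the exact location of the declaration in Pinot's pom.xml is an assumption to verify), the fix would replace the explicitly declared artifact:

   ```xml
   <!-- Sketch only: replace the declared shaded-protobuf artifact with the
        version that hadoop-common:3.4.1 is built against. -->

   <!-- before -->
   <dependency>
     <groupId>org.apache.hadoop.thirdparty</groupId>
     <artifactId>hadoop-shaded-protobuf_3_21</artifactId>
     <version>1.2.0</version>
   </dependency>

   <!-- after -->
   <dependency>
     <groupId>org.apache.hadoop.thirdparty</groupId>
     <artifactId>hadoop-shaded-protobuf_3_25</artifactId>
     <version>1.3.0</version>
   </dependency>
   ```

   Because the artifactId changes between the two versions (`_3_21` vs. `_3_25`), a plain version override in `dependencyManagement` would not be enough; the declaration itself has to change.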
   
   ### Additional Notes
   - The [HDFS as Deep Storage](https://docs.pinot.apache.org/basics/getting-started/hdfs-as-deepstorage) documentation needs to be updated; please review it.
   - Previously, Pinot did not bundle HDFS-related libraries in its package, so users had to add them to CLASSPATH_PREFIX manually.
   - However, the current Pinot package appears to include HDFS-related libraries; the documentation should be updated accordingly.
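   For context, the manual setup the documentation still describes looks roughly like the following (a sketch; the paths, version, and jar list are illustrative examples from our environment, not authoritative):

   ```shell
   # Pre-bundling setup: HDFS client jars had to be put on Pinot's classpath
   # via CLASSPATH_PREFIX before starting the Pinot components.
   export HADOOP_HOME=/opt/hadoop
   export HADOOP_VERSION=3.4.1
   export CLASSPATH_PREFIX="${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-client-${HADOOP_VERSION}.jar:${HADOOP_HOME}/share/hadoop/common/hadoop-common-${HADOOP_VERSION}.jar"
   ```

   If the shaded Pinot package now ships these libraries itself, this step is obsolete and, worse, can reintroduce classpath conflicts like the one reported here.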
   
   - I have looked into the related commits for this issue.
        - 
https://github.com/apache/pinot/commit/aeaf7cf8131014107e783da7d9ce2ef80923e728
        - 
https://github.com/apache/pinot/commit/7879e07b172d7cabe48da496c748273b1a690446
   
   
   ### I will submit a PR myself.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
