Hi,

Over the last 24 hours our node managers have kept crashing with SIGSEGV.
The only info I could find was in the hs_err_XXXX.pid files, which include the following Java stack:

Stack: [0x00007f756a30f000,0x00007f756a410000], sp=0x00007f756a40dea0, free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libleveldbjni-64-1-5625225739273738004.8+0x2aaac]  leveldb::log::Writer::EmitPhysicalRecord(leveldb::log::RecordType, char const*, unsigned long)+0x7c

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.fusesource.leveldbjni.internal.NativeDB$DBJNI.Put(JLorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)J+0
j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)V+11
j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeBuffer;Lorg/fusesource/leveldbjni/internal/NativeBuffer;)V+18
j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;[B[B)V+36
j  org.fusesource.leveldbjni.internal.JniDB.put([B[BLorg/iq80/leveldb/WriteOptions;)Lorg/iq80/leveldb/Snapshot;+28
j  org.fusesource.leveldbjni.internal.JniDB.put([B[B)V+10
j  org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeDeletionTask(ILorg/apache/hadoop/yarn/proto/YarnServerNodemanagerRecoveryProtos$DeletionServiceDeleteTaskProto;)V+32
j  org.apache.hadoop.yarn.server.nodemanager.DeletionService.recordDeletionTaskInStateStore(Lorg/apache/hadoop/yarn/server/nodemanager/DeletionService$FileDeletionTask;)V+245
j  org.apache.hadoop.yarn.server.nodemanager.DeletionService.delete(Ljava/lang/String;Lorg/apache/hadoop/fs/Path;[Lorg/apache/hadoop/fs/Path;)V+44
j  org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run()V+271
v  ~StubRoutines::call_stub

The culprit seems to be the native frame:

[libleveldbjni-64-1-5625225739273738004.8+0x2aaac]  leveldb::log::Writer::EmitPhysicalRecord(leveldb::log::RecordType, char const*, unsigned long)+0x7c

Any ideas on what that is and how to solve it?

Thank you.
Daniel
