[ https://issues.apache.org/jira/browse/MAPREDUCE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490348#comment-17490348 ]
Raman Chodźka commented on MAPREDUCE-4950:
------------------------------------------
I am also experiencing this issue.
The culprit seems to be an exception that occurs earlier. For example, in my
case an exception is thrown inside eventHandlingThread in
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler:
{code}
2022-02-10 12:21:58,913 ERROR [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error writing History Event: org.apache.hadoop.mapreduce.jobhistory.MapAttemptFinishedEvent@5da2cfca
java.io.IOException: All datanodes [DatanodeInfoWithStorage[195.201.110.185:50010,DS-fe52ee42-b47a-4ad1-8d4c-8400d6c95b18,DISK]] are bad. Aborting...
        at org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1537)
        at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1472)
        at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1244)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:663)
{code}
The exception is thrown in eventHandlingThread while writing to EventWriter,
which passes the event to a DatumWriter<Event>. The DatumWriter serializes
through a JsonEncoder, and JsonEncoder uses a Parser to validate the output
against the schema during serialization.
Apparently the aforementioned IOException leaves the Parser in an invalid
state (and eventHandlingThread probably terminates as well).
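To make the failure mode concrete, here is a minimal, self-contained sketch (my own simplified model, not Avro's actual classes) of a stateful encoder whose parser position advances before bytes reach the sink. An IOException mid-record strands the parser, so the next write attempt fails with a state error, analogous to the AvroTypeException shown below:

```java
import java.io.IOException;

// Simplified model (NOT Avro's real code) of a schema-validating encoder.
// An internal parser tracks how far into the current record we are; it
// advances before the bytes reach the sink, so an IOException mid-record
// leaves the parser stranded and poisons every subsequent write.
class StatefulEncoder {
    private final int fieldsPerRecord;
    private int parserPos = 0; // parser position within the current record

    StatefulEncoder(int fieldsPerRecord) {
        this.fieldsPerRecord = fieldsPerRecord;
    }

    void startRecord() {
        if (parserPos != 0) {
            // Same class of error as "Attempt to process a enum when a
            // item-end was expected": the caller's write no longer matches
            // the parser's notion of where it is in the schema.
            throw new IllegalStateException(
                "Attempt to start a record when field " + parserPos + " was expected");
        }
    }

    void writeField(boolean sinkFails) throws IOException {
        parserPos++; // parser advances first, before the sink is touched
        if (sinkFails) {
            // Simulates the "All datanodes are bad" failure from DataStreamer.
            throw new IOException("All datanodes are bad. Aborting...");
        }
        if (parserPos == fieldsPerRecord) {
            parserPos = 0; // record complete, parser resets for the next one
        }
    }
}
```

Once the sink failure fires, every later `startRecord()` throws the state error, which is why the flush attempt from serviceStop() fails even though it is a brand-new event.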
Finally, when all tasks are complete, JobHistoryEventHandler in serviceStop()
tries to write one more event via the same EventWriter, which results in
{code}
2022-02-10 12:21:58,994 WARN [Thread-71] org.apache.hadoop.service.CompositeService: When stopping the service JobHistoryEventHandler : org.apache.avro.AvroTypeException: Attempt to process a enum when a item-end was expected.
org.apache.avro.AvroTypeException: Attempt to process a enum when a item-end was expected.
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:93)
        at org.apache.avro.io.JsonEncoder.writeEnum(JsonEncoder.java:234)
        at org.apache.avro.specific.SpecificDatumWriter.writeEnum(SpecificDatumWriter.java:59)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
        at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
        at org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:95)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:1607)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:645)
        at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:443)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:222)
        at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
        at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:104)
        at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
        at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1855)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:222)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1293)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:653)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:732)
{code}
In my case I increased the replication factor from 1 to 2 (the replication
factor was so low because those datanodes belong to a QA environment), which
made the "IOException: All datanodes ... are bad." error less likely.
One might also try setting {{mapreduce.jobhistory.jhist.format}} to {{binary}},
since BinaryEncoder doesn't seem to perform validation during serialization.
But I didn't check whether that works. Even if it does, an exception thrown
while writing an event to HDFS could leave the event partially written,
potentially leaving the events file in a corrupt state.
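For reference, both workarounds boil down to configuration. A sketch of the relevant properties (the replication value is the one I used; the binary-format setting is the untested mitigation discussed above):

```xml
<!-- hdfs-site.xml: replication factor of 2 instead of 1, so a single bad
     datanode does not abort the write pipeline for the .jhist file -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

<!-- mapred-site.xml: write job history files with Avro's BinaryEncoder
     instead of JsonEncoder (untested mitigation, see caveat above) -->
<property>
  <name>mapreduce.jobhistory.jhist.format</name>
  <value>binary</value>
</property>
```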
> MR App Master fails to write the history due to AvroTypeException
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-4950
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4950
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver, mr-am
> Reporter: Devaraj Kavali
> Priority: Critical
>
> {code:xml}
> 2013-01-19 19:31:27,269 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: In stop, writing event MAP_ATTEMPT_STARTED
> 2013-01-19 19:31:27,269 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.service.CompositeService: Error stopping JobHistoryEventHandler
> org.apache.avro.AvroTypeException: Attempt to process a enum when a array-start was expected.
>         at org.apache.avro.io.parsing.Parser.advance(Parser.java:93)
>         at org.apache.avro.io.JsonEncoder.writeEnum(JsonEncoder.java:210)
>         at org.apache.avro.specific.SpecificDatumWriter.writeEnum(SpecificDatumWriter.java:54)
>         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
>         at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
>         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
>         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
>         at org.apache.hadoop.mapreduce.jobhistory.EventWriter.write(EventWriter.java:66)
>         at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$MetaInfo.writeEvent(JobHistoryEventHandler.java:825)
>         at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:517)
>         at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.stop(JobHistoryEventHandler.java:346)
>         at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
>         at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:445)
>         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler.handle(MRAppMaster.java:406)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>         at java.lang.Thread.run(Thread.java:662)
> 2013-01-19 19:31:27,271 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory hdfs://hacluster /root/staging-dir/root/.staging/job_1358603069474_0135
> {code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)