steveloughran commented on PR #4352:
URL: https://github.com/apache/hadoop/pull/4352#issuecomment-1189482342

   
   I now have a branch of Spark set up to use this, though it is not production ready.
   
   
https://github.com/steveloughran/hadoop/tree/s3/HADOOP-17461-iostatisticsContext
   
   * TaskMetrics includes an IOStatisticsSnapshot in its serialized data
   * IOStatisticsContext is retrieved and reset at start of read/write work; 
updated at end.
   
   This placement is based on where the read bytes/write bytes counters were read.
   
   Having done that, I've realised that the RDDs and writers are not quite the 
correct place, as the IOStats collect all IO on the thread, and both the 
readers and writers will be working on that thread.
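
To illustrate the point about thread scope, here is a minimal, self-contained sketch. It does not use the Hadoop or Spark APIs: `ThreadIOStatsSketch`, its `reset`/`increment`/`snapshot` methods and the counter names are all hypothetical stand-ins for a per-thread statistics context. It only shows that a reader and a writer running on the same thread both land in the one context, so a reset at the start of the task and a snapshot at the end capture both.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-thread IO statistics context; not the Hadoop API. */
public class ThreadIOStatsSketch {

    // One counter map per thread, mimicking a thread-scoped statistics context.
    private static final ThreadLocal<Map<String, Long>> CONTEXT =
        ThreadLocal.withInitial(HashMap::new);

    /** Clear the current thread's counters (done at the start of a task). */
    static void reset() {
        CONTEXT.get().clear();
    }

    /** Add a value to a named counter of the current thread. */
    static void increment(String key, long value) {
        CONTEXT.get().merge(key, value, Long::sum);
    }

    /** Copy the current thread's counters (done at the end of a task). */
    static Map<String, Long> snapshot() {
        return new HashMap<>(CONTEXT.get());
    }

    /** A task whose reader and writer both run on the calling thread. */
    static Map<String, Long> runTask() {
        reset();
        increment("bytes_read", 1024L);    // work done by the reader
        increment("bytes_written", 256L);  // work done by the writer
        return snapshot();                 // contains both counters
    }

    public static void main(String[] args) {
        System.out.println(runTask());
    }
}
```

Because the counters are keyed by thread rather than by reader or writer, resetting them separately inside each RDD or writer would discard the other side's IO, which is why the task boundary is the natural place for the reset and the snapshot.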
   
   The place to do it is actually in the ResultTask, with IOStatisticsSnapshot 
being one of the accumulators which is sent back. The Spark driver would keep 
the core IOStatisticsSnapshot up to date with results from successful and failed 
tasks.
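
The driver-side merge could look roughly like the sketch below. This is an assumption-laden illustration, not Spark's accumulator API or Hadoop's `IOStatisticsSnapshot`: `DriverAggregationSketch`, `TaskResult`, `aggregate` and the counter names are hypothetical, and plain maps stand in for the snapshots. The only point it makes is that snapshots from failed tasks are summed in alongside those from successful ones.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical driver-side aggregation; not the Spark or Hadoop API. */
public class DriverAggregationSketch {

    /** Hypothetical result sent back by a task: its outcome plus its IO counters. */
    static final class TaskResult {
        final boolean succeeded;
        final Map<String, Long> ioSnapshot;

        TaskResult(boolean succeeded, Map<String, Long> ioSnapshot) {
            this.succeeded = succeeded;
            this.ioSnapshot = ioSnapshot;
        }
    }

    /** Sum every task's counters, regardless of whether the task succeeded. */
    static Map<String, Long> aggregate(List<TaskResult> results) {
        Map<String, Long> total = new TreeMap<>();
        for (TaskResult result : results) {
            // Failed tasks still performed IO, so their counters are included.
            result.ioSnapshot.forEach((key, value) -> total.merge(key, value, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<TaskResult> results = List.of(
            new TaskResult(true, Map.of("bytes_read", 1024L, "bytes_written", 256L)),
            new TaskResult(false, Map.of("bytes_read", 512L)));
        System.out.println(aggregate(results)); // prints {bytes_read=1536, bytes_written=256}
    }
}
```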


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

