This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new a74d50b95970 [SPARK-53156][CORE] Track Driver Memory Metrics when the Application ends
a74d50b95970 is described below

commit a74d50b959705a9f3d85ff72e915cfb76dfaa5c3
Author: ForVic <[email protected]>
AuthorDate: Sun Aug 31 23:56:48 2025 -0500

    [SPARK-53156][CORE] Track Driver Memory Metrics when the Application ends
    
    ### What changes were proposed in this pull request?
    Report a heartbeat on the driver when the application stops.
    
    ### Why are the changes needed?
    When the application proactively terminates because of a memory issue at
    the driver (SparkOOM, result size too large, etc.), metric sampling often
    misses the problem in the memory metrics and in the event log: the job is
    aborted before accurate metrics for the driver are captured. If we report
    an additional heartbeat (metric collection at the driver) on application
    termination, then we will be able to better reflect the memory usage in
    the event log,  [...]
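
    The sampling gap described above can be illustrated with a minimal,
    self-contained sketch. It is not Spark's `Heartbeater`: `PeakSampler`,
    `readUsage`, `sample`, and `peakUsage` are hypothetical names chosen for
    illustration, and the final sample is taken in `stop()` and guarded with
    `NonFatal` rather than in `postApplicationEnd` with a `Throwable` catch.

    ```scala
    import java.util.concurrent.{Executors, TimeUnit}
    import scala.util.control.NonFatal

    // Hypothetical stand-in for a periodic driver metrics reporter.
    final class PeakSampler(readUsage: () => Long, intervalMs: Long) {
      private var peak = 0L
      private val scheduler = Executors.newSingleThreadScheduledExecutor()

      // Record one sample, keeping the largest value seen so far.
      def sample(): Unit = synchronized {
        peak = math.max(peak, readUsage())
      }

      // Sample periodically; a spike between two ticks is never observed.
      def start(): Unit = {
        val task = new Runnable { def run(): Unit = sample() }
        scheduler.scheduleAtFixedRate(task, intervalMs, intervalMs, TimeUnit.MILLISECONDS)
      }

      // Stop periodic sampling, then take one last sample so that usage
      // reached after the last tick is still recorded.
      def stop(): Unit = {
        scheduler.shutdown()
        try sample()
        catch { case NonFatal(e) => println(s"final sample failed: $e") }
      }

      def peakUsage: Long = synchronized(peak)
    }

    object PeakSamplerDemo {
      def main(args: Array[String]): Unit = {
        var usage = 10L
        // The interval is long enough that no periodic tick fires in this demo.
        val sampler = new PeakSampler(() => usage, intervalMs = 60000L)
        sampler.start()
        usage = 500L // spike shortly before shutdown, between ticks
        sampler.stop()
        println(sampler.peakUsage) // prints 500
      }
    }
    ```

    Without the final `sample()` call in `stop()`, the recorded peak in this
    demo would stay at 0, which mirrors the missed driver peak described
    above.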
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    Tested with a custom job that collected a large amount of data to the
    driver and otherwise had very low driver memory usage (few partitions, no
    other data structures used at the driver). Without the change, the
    reported peak memory usage at the driver was low (below ~100 MiB); with
    this change, the higher memory usage is reflected.
    <img width="1723" height="230" alt="image" src="https://github.com/user-attachments/assets/fb442550-a262-453e-b6e2-f47e1e9f11b1" />
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No
    
    Closes #51882 from ForVic/vsunderl/report_driver_heartbeat.
    
    Lead-authored-by: ForVic <[email protected]>
    Co-authored-by: Victor Sunderland <[email protected]>
    Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
---
 core/src/main/scala/org/apache/spark/Heartbeater.scala  | 7 +++++++
 core/src/main/scala/org/apache/spark/SparkContext.scala | 8 +++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/Heartbeater.scala b/core/src/main/scala/org/apache/spark/Heartbeater.scala
index 090458eecf18..8302aa17a994 100644
--- a/core/src/main/scala/org/apache/spark/Heartbeater.scala
+++ b/core/src/main/scala/org/apache/spark/Heartbeater.scala
@@ -48,6 +48,13 @@ private[spark] class Heartbeater(
     heartbeater.scheduleAtFixedRate(heartbeatTask, initialDelay, intervalMs, TimeUnit.MILLISECONDS)
   }
 
+  /**
+   * Reports a heartbeat.
+   */
+  def doReportHeartbeat(): Unit = {
+    reportHeartbeat()
+  }
+
   /** Stops the heartbeat thread. */
   def stop(): Unit = {
     heartbeater.shutdown()
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index b38504c241c7..d65b5f297dad 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -2934,8 +2934,14 @@ class SparkContext(config: SparkConf) extends Logging {
     _driverLogger.foreach(_.startSync(_hadoopConfiguration))
   }
 
-  /** Post the application end event */
+  /** Post the application end event and report the final heartbeat */
   private def postApplicationEnd(exitCode: Int): Unit = {
+    try {
+      _heartbeater.doReportHeartbeat()
+    } catch {
+      case t: Throwable =>
+        logInfo("Unable to report driver heartbeat metrics when stopping spark context", t);
+    }
     listenerBus.post(SparkListenerApplicationEnd(System.currentTimeMillis, Some(exitCode)))
   }
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
