This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 19440a146a4c [SPARK-51824][ML][CONNECT][TESTS] Force to clean up the ML cache after each test
19440a146a4c is described below

commit 19440a146a4c000d513eb926670d8f157089e5cb
Author: Ruifeng Zheng <ruife...@apache.org>
AuthorDate: Thu Apr 17 11:12:59 2025 +0800

    [SPARK-51824][ML][CONNECT][TESTS] Force to clean up the ML cache after each test

    ### What changes were proposed in this pull request?
    Force a cleanup of the ML cache after each test.

    ### Why are the changes needed?
    To make sure the ML cache is clean for the next test.

    ### Does this PR introduce _any_ user-facing change?
    No, test-only.

    ### How was this patch tested?
    Manually checked with:

    ```
    In [1]: from pyspark.ml.linalg import Vectors
       ...: from pyspark.ml.classification import *
       ...: from pyspark.ml.regression import *
       ...:
       ...: df = spark.createDataFrame([
       ...:     (1.0, 2.0, Vectors.dense(1.0)),
       ...:     (0.0, 2.0, Vectors.sparse(1, [], []))], ["label", "weight", "features"])
       ...:
       ...: lr = LinearRegression(regParam=0.0, solver="normal")
       ...: model = lr.fit(df)
    25/04/17 09:18:02 WARN Instrumentation: [a5693be2] regParam is zero, which might cause numerical instability and overfitting.
    25/04/17 09:18:03 WARN InstanceBuilder: Failed to load implementation from:dev.ludovic.netlib.lapack.JNILAPACK

    In [2]: spark.client
    Out[2]: <pyspark.sql.connect.client.core.SparkConnectClient at 0x11854e510>

    In [3]: model
    LinearRegressionModel: uid=LinearRegression_a03f0711070f, numFeatures=1

    In [4]: spark.client._cleanup_ml()

    In [5]: model
    ---------------------------------------------------------------------------
    SparkException                            Traceback (most recent call last)
    ...
    SparkException: [CONNECT_ML.CACHE_INVALID] Generic Spark Connect ML error. Cannot retrieve object c34300da-302f-4ec4-afd3-c82bba321eff from the ML cache. It is probably because the entry has been evicted. SQLSTATE: XX000
    ```

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #50614 from zhengruifeng/ml_test_cleanup.
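For context, here is a minimal, hypothetical sketch (not part of this commit) of what a Connect ML test inheriting `ReusedConnectTestCase` looks like once this change lands: the test body no longer calls `spark.client._cleanup_ml()` itself, because the new `tearDown` added in the diff below runs it after every test. The class and test names are illustrative; `ReusedConnectTestCase` itself starts a local Spark Connect session in `setUpClass`.

```
from pyspark.ml.linalg import Vectors
from pyspark.ml.regression import LinearRegression
from pyspark.testing.connectutils import ReusedConnectTestCase


class MLCacheCleanupExample(ReusedConnectTestCase):
    """Hypothetical test class, shown only to illustrate the new behavior."""

    def test_fit_without_manual_cache_cleanup(self):
        df = self.spark.createDataFrame(
            [(1.0, 2.0, Vectors.dense(1.0)), (0.0, 2.0, Vectors.sparse(1, [], []))],
            ["label", "weight", "features"],
        )
        model = LinearRegression(regParam=0.0, solver="normal").fit(df)
        self.assertEqual(model.numFeatures, 1)
        # No explicit spark.client._cleanup_ml() here: with this patch,
        # ReusedConnectTestCase.tearDown() calls it after every test, so the
        # server-side cache entry backing `model` is released before the next test.
```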
Authored-by: Ruifeng Zheng <ruife...@apache.org>
Signed-off-by: Ruifeng Zheng <ruife...@apache.org>
---
 python/pyspark/testing/connectutils.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/testing/connectutils.py b/python/pyspark/testing/connectutils.py
index e1e0356f4d42..20043bf9bba0 100644
--- a/python/pyspark/testing/connectutils.py
+++ b/python/pyspark/testing/connectutils.py
@@ -16,7 +16,6 @@
 #
 import shutil
 import tempfile
-import typing
 import os
 import functools
 import unittest
@@ -188,6 +187,10 @@ class ReusedConnectTestCase(unittest.TestCase, SQLTestUtils, PySparkErrorTestUti
         shutil.rmtree(cls.tempdir.name, ignore_errors=True)
         cls.spark.stop()
 
+    def tearDown(self) -> None:
+        # force to clean up the ML cache after each test
+        self.spark.client._cleanup_ml()
+
     def test_assert_remote_mode(self):
         from pyspark.sql import is_remote

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org