This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 19440a146a4c [SPARK-51824][ML][CONNECT][TESTS] Force to clean up the ML cache after each test
19440a146a4c is described below

commit 19440a146a4c000d513eb926670d8f157089e5cb
Author: Ruifeng Zheng <ruife...@apache.org>
AuthorDate: Thu Apr 17 11:12:59 2025 +0800

    [SPARK-51824][ML][CONNECT][TESTS] Force to clean up the ML cache after each test

    ### What changes were proposed in this pull request?
    Force a cleanup of the ML cache after each test.

    ### Why are the changes needed?
    To make sure the ML cache is clean for the next test.

    ### Does this PR introduce _any_ user-facing change?
    No, test-only.

    ### How was this patch tested?
    Manually checked with:

    ```
    In [1]: from pyspark.ml.linalg import Vectors
       ...: from pyspark.ml.classification import *
       ...: from pyspark.ml.regression import *
       ...:
       ...: df = spark.createDataFrame([
       ...:     (1.0, 2.0, Vectors.dense(1.0)),
       ...:     (0.0, 2.0, Vectors.sparse(1, [], []))], ["label", "weight", "features"])
       ...:
       ...: lr = LinearRegression(regParam=0.0, solver="normal")
       ...: model = lr.fit(df)
    25/04/17 09:18:02 WARN Instrumentation: [a5693be2] regParam is zero, which might cause numerical instability and overfitting.
    25/04/17 09:18:03 WARN InstanceBuilder: Failed to load implementation from:dev.ludovic.netlib.lapack.JNILAPACK

    In [2]: spark.client
    Out[2]: <pyspark.sql.connect.client.core.SparkConnectClient at 0x11854e510>

    In [3]: model
    LinearRegressionModel: uid=LinearRegression_a03f0711070f, numFeatures=1

    In [4]: spark.client._cleanup_ml()

    In [5]: model
    ---------------------------------------------------------------------------
    SparkException                            Traceback (most recent call last)
    ...
    SparkException: [CONNECT_ML.CACHE_INVALID] Generic Spark Connect ML error. Cannot retrieve object c34300da-302f-4ec4-afd3-c82bba321eff from the ML cache. It is probably because the entry has been evicted. SQLSTATE: XX000
    ```

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #50614 from zhengruifeng/ml_test_cleanup.
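For context, here is a minimal, hypothetical sketch (not part of this commit) of what a Connect ML test inheriting `ReusedConnectTestCase` looks like once this change lands: the test body no longer calls `spark.client._cleanup_ml()` itself, because the new `tearDown` added in the diff below runs it after every test. The class and test names are illustrative; `ReusedConnectTestCase` itself starts a local Spark Connect session in `setUpClass`.

```
from pyspark.ml.linalg import Vectors
from pyspark.ml.regression import LinearRegression
from pyspark.testing.connectutils import ReusedConnectTestCase


class MLCacheCleanupExample(ReusedConnectTestCase):
    """Hypothetical test class, shown only to illustrate the new behavior."""

    def test_fit_without_manual_cache_cleanup(self):
        df = self.spark.createDataFrame(
            [(1.0, 2.0, Vectors.dense(1.0)), (0.0, 2.0, Vectors.sparse(1, [], []))],
            ["label", "weight", "features"],
        )
        model = LinearRegression(regParam=0.0, solver="normal").fit(df)
        self.assertEqual(model.numFeatures, 1)
        # No explicit spark.client._cleanup_ml() here: with this patch,
        # ReusedConnectTestCase.tearDown() calls it after every test, so the
        # server-side cache entry backing `model` is released before the next test.
```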
Authored-by: Ruifeng Zheng <ruife...@apache.org>
Signed-off-by: Ruifeng Zheng <ruife...@apache.org>
---
 python/pyspark/testing/connectutils.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/testing/connectutils.py b/python/pyspark/testing/connectutils.py
index e1e0356f4d42..20043bf9bba0 100644
--- a/python/pyspark/testing/connectutils.py
+++ b/python/pyspark/testing/connectutils.py
@@ -16,7 +16,6 @@
 #
 import shutil
 import tempfile
-import typing
 import os
 import functools
 import unittest
@@ -188,6 +187,10 @@ class ReusedConnectTestCase(unittest.TestCase, SQLTestUtils, PySparkErrorTestUti
         shutil.rmtree(cls.tempdir.name, ignore_errors=True)
         cls.spark.stop()
 
+    def tearDown(self) -> None:
+        # force to clean up the ML cache after each test
+        self.spark.client._cleanup_ml()
+
     def test_assert_remote_mode(self):
         from pyspark.sql import is_remote

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org