This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new d5c8a3180cfb [SPARK-54784][ML][DOCS] Document the security policy on ml models
d5c8a3180cfb is described below

commit d5c8a3180cfbf83ae0b5d0e1e78f52f631898736
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Thu Feb 12 11:14:23 2026 -0800

    [SPARK-54784][ML][DOCS] Document the security policy on ml models
    
    ### What changes were proposed in this pull request?
    Document the security policy on ml models
    
    ### Why are the changes needed?
    For security purposes.
    
    ### Does this PR introduce _any_ user-facing change?
    yes, doc-only change
    
    ### How was this patch tested?
    Manually checked.
    
    <img width="1038" height="1206" alt="image" 
src="https://github.com/user-attachments/assets/f47b89dc-b8c4-4ff8-93b2-38070544aa4d";
 />
    
    ### Was this patch authored or co-authored using generative AI tooling?
    with ChatGPT
    
    Closes #54246 from zhengruifeng/doc_ml_security.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 docs/_includes/nav-left-wrapper-ml.html |  1 +
 docs/ml-security.md                     | 74 +++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/docs/_includes/nav-left-wrapper-ml.html b/docs/_includes/nav-left-wrapper-ml.html
index 00ac6cc0dbc7..9240cd6ac545 100644
--- a/docs/_includes/nav-left-wrapper-ml.html
+++ b/docs/_includes/nav-left-wrapper-ml.html
@@ -4,5 +4,6 @@
         {% include nav-left.html nav=include.nav-ml %}
         <h3><a href="mllib-guide.html">MLlib: RDD-based API Guide</a></h3>
         {% include nav-left.html nav=include.nav-mllib %}
+        <h3><a href="ml-security.html">ML Model Security</a></h3>
     </div>
 </div>
\ No newline at end of file
diff --git a/docs/ml-security.md b/docs/ml-security.md
new file mode 100644
index 000000000000..e295f83ebf37
--- /dev/null
+++ b/docs/ml-security.md
@@ -0,0 +1,74 @@
+---
+layout: global
+title: "ML Model Security"
+displayTitle: "Spark ML Model Security"
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# Overview
+
+In Apache Spark, loading a machine learning (ML) model is fundamentally equivalent to loading and executing code.
+Spark ML models often contain serialized objects, transformation logic, and execution graphs that are evaluated by the Spark runtime during model loading and inference.
+This principle is not unique to Spark; it applies equally to scikit-learn, PyTorch, TensorFlow, and other modern ML ecosystems.
+As a result, loading a model from an untrusted source introduces the same security risks as executing untrusted software.
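+
+For illustration, the minimal sketch below shows how little code separates an on-disk model artifact from logic running inside your application; the model path and the DataFrame `input_df` are hypothetical placeholders:
+
+```python
+from pyspark.ml import PipelineModel
+
+# Loading a persisted pipeline reconstructs its stages (and any custom
+# transformer classes they reference) inside the Spark application, so the
+# artifact deserves the same vetting as any executable dependency.
+model = PipelineModel.load("/models/churn_pipeline")  # hypothetical path
+predictions = model.transform(input_df)               # input_df: an existing DataFrame
+```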
+
+# Why Loading an ML Model Is Equivalent to Loading Code
+
+Spark ML frameworks serialize not only data (such as weights and parameters) but also executable structures and behaviors.
+Because of this, model loading is not merely data parsing. It involves interpreting and executing instructions, which means a malicious model can:
+
+* Execute arbitrary commands
+* Access or exfiltrate data
+* Modify system state
+* Install backdoors or malware
+
+In practice, loading a model from an untrusted source is equivalent to running a program downloaded from the internet.
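+
+To make this concrete, the generic illustration below (not specific to Spark ML's own persistence format, but representative of the pickle-based artifacts common across the wider ML ecosystem) shows that deserialization alone can trigger code execution:
+
+```python
+import os
+import pickle
+
+# Unpickling honors __reduce__, so a crafted payload executes a command the
+# moment the artifact is loaded -- the user never calls anything beyond
+# pickle.loads().
+class MaliciousPayload:
+    def __reduce__(self):
+        return (os.system, ("echo 'code executed during model load'",))
+
+blob = pickle.dumps(MaliciousPayload())
+pickle.loads(blob)  # the echo command runs here, during deserialization
+```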
+
+# Security Implications
+
+Because Spark ML models can embed executable logic, loading untrusted models can lead to:
+
+* Remote code execution (RCE)
+* Data exfiltration from Spark jobs
+* Compromise of cluster nodes
+* Privilege escalation within Spark environments
+* Supply-chain attacks through model distribution
+
+These risks are amplified in distributed environments, where a malicious model may execute across multiple cluster nodes.
+
+# Responsibility of End Users
+
+Because loading ML models is equivalent to loading executable code, the responsibility for security ultimately lies with the end user or deploying organization.
+End users are responsible for ensuring that ML models are subject to the same security assessment, validation, and operational controls as any third-party software.
+This includes:
+This includes:
+
+* Verifying the source and authenticity of the model
+* Ensuring integrity and provenance
+* Applying organizational security policies
+* Performing risk assessments before deployment
+
+Frameworks and libraries can provide safeguards, but they cannot guarantee security when loading arbitrary third-party models.
+
+# Best Practices
+
+* Load models only from trusted and verified sources
+* Validate cryptographic hashes or digital signatures (see the sketch after this list)
+* Execute models in isolated environments
+* Restrict filesystem, network, and credential access
+* Keep Spark, ML libraries, and dependencies fully patched
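+
+As one deliberately simplified sketch of the hash-validation practice above, the snippet below assumes the model is distributed as a single archive file and that the expected digest was obtained from a trusted channel; all names, paths, and values are hypothetical placeholders:
+
+```python
+import hashlib
+
+from pyspark.ml import PipelineModel
+
+EXPECTED_SHA256 = "0123456789abcdef..."          # placeholder digest from a trusted source
+MODEL_ARCHIVE = "/downloads/churn_pipeline.zip"  # hypothetical packaged model
+
+def sha256_of(path: str) -> str:
+    digest = hashlib.sha256()
+    with open(path, "rb") as f:
+        for chunk in iter(lambda: f.read(1 << 20), b""):
+            digest.update(chunk)
+    return digest.hexdigest()
+
+if sha256_of(MODEL_ARCHIVE) != EXPECTED_SHA256:
+    raise ValueError("Model archive failed the integrity check; refusing to load it")
+
+# Only after verification: unpack the archive to a trusted location and load it.
+model = PipelineModel.load("/models/churn_pipeline")
+```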
+


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
