This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 2ae7575b9ed1 [SPARK-46530][PYTHON][SQL][FOLLOW-UP] Avoid checking Py4J and PySpark libraries during the initial lookup of Python Data Sources
2ae7575b9ed1 is described below

commit 2ae7575b9ed16aadaeed0e8279df6d42d1eb813d
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Wed Jan 24 12:57:19 2024 +0900

    [SPARK-46530][PYTHON][SQL][FOLLOW-UP] Avoid checking Py4J and PySpark libraries during the initial lookup of Python Data Sources
    
    ### What changes were proposed in this pull request?
    
    This PR partially reverts https://github.com/apache/spark/commit/d6334a3ba87c39fff6ace04e43e760d86674551e (together with https://github.com/apache/spark/commit/b303eced7f8639887278db34e0080ffa0c19bd0c) by removing the check for the Py4J and PySpark libraries during the initial lookup of Python Data Sources.
    
    ### Why are the changes needed?
    
    We already guard this case with a try-catch, so we do not need to check for the libraries' existence up front.
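
    As a rough illustration of that guard, here is a minimal, self-contained sketch; the names below are hypothetical and are not the actual DataSourceManager internals. Any failure during the initial lookup simply degrades to "no Python Data Sources":

    ```scala
    import scala.util.control.NonFatal

    object PythonLookupSketch {
      // Stand-in for the real lookup that spawns a Python worker; here it
      // simulates an environment where Py4J/PySpark cannot be loaded.
      private def lookupAllDataSourcesInPython(): Map[String, String] =
        throw new RuntimeException("Python worker could not be started")

      // The try-catch guard: any failure during the initial lookup is
      // swallowed, and Spark proceeds with no Python Data Sources registered.
      lazy val builders: Map[String, String] =
        try {
          lookupAllDataSourcesInPython()
        } catch {
          case NonFatal(_) => Map.empty
        }

      def main(args: Array[String]): Unit =
        println(s"loaded ${builders.size} Python Data Sources")
    }
    ```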
    
    Some users might want to use a system-installed Py4J instead. In addition, the location of the PySpark sources can vary.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Virtually no. The main change has not been released yet. This PR relaxes the condition under which Python Data Sources are initially loaded.
    
    ### How was this patch tested?
    
    Manually tested.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #44858 from HyukjinKwon/SPARK-46530-followup2.
    
    Authored-by: Hyukjin Kwon <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 .../apache/spark/sql/execution/datasources/DataSourceManager.scala | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala
index ef18a3c67cf4..f63157b91efb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala
@@ -17,10 +17,8 @@
 
 package org.apache.spark.sql.execution.datasources
 
-import java.io.File
 import java.util.Locale
 import java.util.concurrent.ConcurrentHashMap
-import java.util.regex.Pattern
 
 import org.apache.spark.api.python.PythonUtils
 import org.apache.spark.internal.Logging
@@ -94,10 +92,7 @@ object DataSourceManager extends Logging {
   // Visible for testing
   private[spark] var dataSourceBuilders: Option[Map[String, UserDefinedPythonDataSource]] = None
   private lazy val shouldLoadPythonDataSources: Boolean = {
-    Utils.checkCommandAvailable(PythonUtils.defaultPythonExec) &&
-      // Make sure PySpark zipped files also exist.
-      PythonUtils.sparkPythonPath
-        .split(Pattern.quote(File.separator)).forall(new File(_).exists())
+    Utils.checkCommandAvailable(PythonUtils.defaultPythonExec)
   }
 
   private def normalize(name: String): String = name.toLowerCase(Locale.ROOT)
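
For context, the check that survives this change only asks whether the default Python executable can be launched at all. Below is a minimal, hypothetical sketch of such a check; it is illustrative only, and the real Utils.checkCommandAvailable implementation may differ (e.g. by using `command -v` or `where`):

```scala
import scala.sys.process.{Process, ProcessLogger}
import scala.util.Try

object CheckCommandSketch {
  // Returns true iff the command can be launched and exits successfully.
  // A missing binary throws an IOException, which Try converts to false.
  def commandAvailable(cmd: String): Boolean =
    Try(Process(Seq(cmd, "--version")).!(ProcessLogger(_ => ()))).toOption.contains(0)

  def main(args: Array[String]): Unit =
    println(s"python3 available: ${commandAvailable("python3")}")
}
```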


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
