This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 2ae7575b9ed1 [SPARK-46530][PYTHON][SQL][FOLLOW-UP] Avoid checking the Py4J and PySpark libraries during the initial lookup of Python Data Sources
2ae7575b9ed1 is described below
commit 2ae7575b9ed16aadaeed0e8279df6d42d1eb813d
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Wed Jan 24 12:57:19 2024 +0900
[SPARK-46530][PYTHON][SQL][FOLLOW-UP] Avoid checking the Py4J and PySpark libraries during the initial lookup of Python Data Sources
### What changes were proposed in this pull request?
This PR partially reverts https://github.com/apache/spark/commit/d6334a3ba87c39fff6ace04e43e760d86674551e (together with https://github.com/apache/spark/commit/b303eced7f8639887278db34e0080ffa0c19bd0c) by removing the check for the Py4J and PySpark libraries during the initial lookup of Python Data Sources.
### Why are the changes needed?
We already guard this case with a try-catch, so we do not need to check for the libraries' existence up front.
Some users might also want to use a system-installed Py4J instead of the bundled one. In addition, the PySpark source location might vary as well.
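For context, the existing guard follows the shape sketched below (a minimal, hypothetical illustration, not the actual DataSourceManager code; lookupPythonDataSources and the warning text are made-up names): any failure while spawning Python, importing Py4J, or locating PySpark is caught and turned into a logged warning plus an empty result, which is why a separate up-front existence check adds nothing.

object PythonDataSourceLookupSketch {
  // Stand-in for the real lookup, which may spawn a Python process;
  // here it simply fails the way a missing interpreter would.
  private def lookupPythonDataSources(): Map[String, String] =
    throw new RuntimeException("python3: command not found")

  // The try-catch is the guard referred to above: errors degrade to a
  // warning and an empty map instead of failing session startup.
  def safeLookup(): Map[String, String] =
    try {
      lookupPythonDataSources()
    } catch {
      case e: Exception =>
        Console.err.println(s"Skipping Python Data Sources: ${e.getMessage}")
        Map.empty
    }

  def main(args: Array[String]): Unit =
    println(safeLookup()) // prints Map() after the warning
}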
### Does this PR introduce _any_ user-facing change?
Virtually no. The main change has not been released yet. This relaxes the condition for loading Python Data Sources at the initial lookup.
### How was this patch tested?
Manually tested.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44858 from HyukjinKwon/SPARK-46530-followup2.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
.../apache/spark/sql/execution/datasources/DataSourceManager.scala | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala
index ef18a3c67cf4..f63157b91efb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceManager.scala
@@ -17,10 +17,8 @@
 package org.apache.spark.sql.execution.datasources
 
-import java.io.File
 import java.util.Locale
 import java.util.concurrent.ConcurrentHashMap
-import java.util.regex.Pattern
 
 import org.apache.spark.api.python.PythonUtils
 import org.apache.spark.internal.Logging
@@ -94,10 +92,7 @@ object DataSourceManager extends Logging {
   // Visible for testing
   private[spark] var dataSourceBuilders: Option[Map[String, UserDefinedPythonDataSource]] = None
   private lazy val shouldLoadPythonDataSources: Boolean = {
-    Utils.checkCommandAvailable(PythonUtils.defaultPythonExec) &&
-      // Make sure PySpark zipped files also exist.
-      PythonUtils.sparkPythonPath
-        .split(Pattern.quote(File.separator)).forall(new File(_).exists())
+    Utils.checkCommandAvailable(PythonUtils.defaultPythonExec)
   }
 
   private def normalize(name: String): String = name.toLowerCase(Locale.ROOT)
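For readers unfamiliar with the helper that is kept, Utils.checkCommandAvailable essentially probes whether the given executable can be resolved at all. A rough, POSIX-only sketch of that idea (an assumption about its behavior, not Spark's actual implementation, which among other things also handles Windows):

import scala.sys.process.{Process, ProcessLogger}
import scala.util.Try

object CommandCheckSketch {
  // True iff the shell can resolve `command` on PATH.
  // `command -v` is a POSIX shell builtin; all output is discarded.
  def commandAvailable(command: String): Boolean = {
    val attempt = Try(
      Process(Seq("sh", "-c", s"command -v $command"))
        .run(ProcessLogger(_ => ()))
        .exitValue())
    attempt.isSuccess && attempt.get == 0
  }

  def main(args: Array[String]): Unit =
    println(commandAvailable("python3")) // typically true where Python is installed
}

With this change, an available Python executable is the only precondition for attempting the lookup; whether Py4J and the PySpark sources can actually be imported is left to the guarded lookup itself.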
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]