spark git commit: [SPARK-6330] Fix filesystem bug in newParquet relation

adav Mon, 16 Mar 2015 12:17:26 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-1.3 684ff2476 -> 67fa6d1f8



[SPARK-6330] Fix filesystem bug in newParquet relation

If I'm running this locally and my path points to S3, this would currently 
error out because of incorrect FS.
I tested this in a scenario that previously didn't work, this change seemed to 
fix the issue.

Author: Volodymyr Lyubinets <[email protected]>

Closes #5020 from vlyubin/parquertbug and squashes the following commits:

a645ad5 [Volodymyr Lyubinets] Fix filesystem bug in newParquet relation


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/67fa6d1f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/67fa6d1f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/67fa6d1f

Branch: refs/heads/branch-1.3
Commit: 67fa6d1f830dee37244b5a30684d797093c7c134
Parents: 684ff24
Author: Volodymyr Lyubinets <[email protected]>
Authored: Mon Mar 16 12:13:18 2015 -0700
Committer: Aaron Davidson <[email protected]>
Committed: Mon Mar 16 12:14:41 2015 -0700

----------------------------------------------------------------------
 .../main/scala/org/apache/spark/sql/parquet/newParquet.scala    | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/67fa6d1f/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
----------------------------------------------------------------------
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
index 234e6bb..c38b6e8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
@@ -19,6 +19,7 @@ package org.apache.spark.sql.parquet
 import java.io.IOException
 import java.lang.{Double => JDouble, Float => JFloat, Long => JLong}
 import java.math.{BigDecimal => JBigDecimal}
+import java.net.URI
 import java.text.SimpleDateFormat
 import java.util.{Date, List => JList}
 
@@ -244,11 +245,10 @@ private[sql] case class ParquetRelation2(
      * Refreshes `FileStatus`es, footers, partition spec, and table schema.
      */
     def refresh(): Unit = {
-      val fs = FileSystem.get(sparkContext.hadoopConfiguration)
-
       // Support either reading a collection of raw Parquet part-files, or a 
collection of folders
       // containing Parquet files (e.g. partitioned Parquet table).
       val baseStatuses = paths.distinct.map { p =>
+        val fs = FileSystem.get(URI.create(p), 
sparkContext.hadoopConfiguration)
         val qualified = fs.makeQualified(new Path(p))
 
         if (!fs.exists(qualified) && maybeSchema.isDefined) {
@@ -262,6 +262,7 @@ private[sql] case class ParquetRelation2(
 
       // Lists `FileStatus`es of all leaf nodes (files) under all base 
directories.
       val leaves = baseStatuses.flatMap { f =>
+        val fs = FileSystem.get(f.getPath.toUri, 
sparkContext.hadoopConfiguration)
         SparkHadoopUtil.get.listLeafStatuses(fs, f.getPath).filter { f =>
           isSummaryFile(f.getPath) ||
             !(f.getPath.getName.startsWith("_") || 
f.getPath.getName.startsWith("."))


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

spark git commit: [SPARK-6330] Fix filesystem bug in newParquet relation

Reply via email to