Repository: spark
Updated Branches:
  refs/heads/master 12a345adc -> d19efeddc


[SPARK-6330] Fix filesystem bug in newParquet relation

When running locally with a path that points to S3, this currently errors out
because the wrong FileSystem is resolved.
I tested this in a scenario that previously failed, and this change fixed the
issue.
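The underlying issue: Hadoop's `FileSystem.get(conf)` returns the filesystem configured as the cluster default (`fs.defaultFS`), regardless of which path is being read, whereas `FileSystem.get(uri, conf)` dispatches on the URI's own scheme. A minimal sketch of the scheme parsing that drives this dispatch, using plain `java.net.URI` (the bucket and namenode names below are hypothetical):

```java
import java.net.URI;

public class FsSchemeDemo {
    public static void main(String[] args) {
        // FileSystem.get(conf) ignores the path and always returns the
        // filesystem for fs.defaultFS (e.g. file:// when running locally).
        // FileSystem.get(uri, conf), as used in the fix, selects the
        // implementation from the path's own scheme instead.
        URI s3Path = URI.create("s3n://my-bucket/data/table");   // hypothetical S3 path
        URI hdfsPath = URI.create("hdfs://namenode:8020/data");  // hypothetical HDFS path

        System.out.println(s3Path.getScheme());   // "s3n"  -> S3 filesystem selected
        System.out.println(hdfsPath.getScheme()); // "hdfs" -> HDFS selected
    }
}
```

With the old code, a local run (default scheme `file://`) would try to open an `s3n://` path through the local filesystem and fail; resolving the filesystem per path avoids this.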

Author: Volodymyr Lyubinets <[email protected]>

Closes #5020 from vlyubin/parquertbug and squashes the following commits:

a645ad5 [Volodymyr Lyubinets] Fix filesystem bug in newParquet relation


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d19efedd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d19efedd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d19efedd

Branch: refs/heads/master
Commit: d19efeddc0cb710c9496af11e447d39e1ad61b31
Parents: 12a345a
Author: Volodymyr Lyubinets <[email protected]>
Authored: Mon Mar 16 12:13:18 2015 -0700
Committer: Aaron Davidson <[email protected]>
Committed: Mon Mar 16 12:13:18 2015 -0700

----------------------------------------------------------------------
 .../main/scala/org/apache/spark/sql/parquet/newParquet.scala    | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/d19efedd/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
index 234e6bb..c38b6e8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/parquet/newParquet.scala
@@ -19,6 +19,7 @@ package org.apache.spark.sql.parquet
 import java.io.IOException
 import java.lang.{Double => JDouble, Float => JFloat, Long => JLong}
 import java.math.{BigDecimal => JBigDecimal}
+import java.net.URI
 import java.text.SimpleDateFormat
 import java.util.{Date, List => JList}
 
@@ -244,11 +245,10 @@ private[sql] case class ParquetRelation2(
      * Refreshes `FileStatus`es, footers, partition spec, and table schema.
      */
     def refresh(): Unit = {
-      val fs = FileSystem.get(sparkContext.hadoopConfiguration)
-
      // Support either reading a collection of raw Parquet part-files, or a collection of folders
       // containing Parquet files (e.g. partitioned Parquet table).
       val baseStatuses = paths.distinct.map { p =>
+        val fs = FileSystem.get(URI.create(p), sparkContext.hadoopConfiguration)
         val qualified = fs.makeQualified(new Path(p))
 
         if (!fs.exists(qualified) && maybeSchema.isDefined) {
@@ -262,6 +262,7 @@ private[sql] case class ParquetRelation2(
 
      // Lists `FileStatus`es of all leaf nodes (files) under all base directories.
       val leaves = baseStatuses.flatMap { f =>
+        val fs = FileSystem.get(f.getPath.toUri, sparkContext.hadoopConfiguration)
         SparkHadoopUtil.get.listLeafStatuses(fs, f.getPath).filter { f =>
           isSummaryFile(f.getPath) ||
            !(f.getPath.getName.startsWith("_") || f.getPath.getName.startsWith("."))


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
