Repository: spark
Updated Branches:
  refs/heads/branch-1.2 c2b4633f0 -> 26410a2d3


[SPARK-5227] [SPARK-5679] Disable FileSystem cache in 
WholeTextFileRecordReaderSuite

This patch fixes two difficult-to-reproduce Jenkins test failures in 
InputOutputMetricsSuite (SPARK-5227 and SPARK-5679). The problem was that 
WholeTextFileRecordReaderSuite modified the `fs.local.block.size` Hadoop 
configuration property, and the change leaked into subsequent test suites 
because Hadoop caches FileSystem instances (see HADOOP-8490 for more details).

The fix implemented here is to disable FileSystem caching in 
WholeTextFileRecordReaderSuite.
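
For illustration only (not part of this patch), here is a minimal standalone
sketch of the caching behavior described above and of the workaround. It
assumes only a Hadoop client on the classpath; the FsCacheDemo object name and
the file:/// URIs are hypothetical:

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    object FsCacheDemo {
      def main(args: Array[String]): Unit = {
        val conf1 = new Configuration()
        conf1.setLong("fs.local.block.size", 32)
        val fs1 = FileSystem.get(new URI("file:///"), conf1)

        // The cache key is (scheme, authority, user), not the Configuration, so a
        // second, unrelated Configuration still receives the same cached instance.
        val conf2 = new Configuration()
        val fs2 = FileSystem.get(new URI("file:///"), conf2)
        println(fs1 eq fs2)  // true: later callers share fs1's state

        // Disabling the cache for the "file" scheme makes get() return a fresh instance.
        val conf3 = new Configuration()
        conf3.setBoolean("fs.file.impl.disable.cache", true)
        val fs3 = FileSystem.get(new URI("file:///"), conf3)
        println(fs1 eq fs3)  // false: no state leaks into later users
      }
    }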

Author: Josh Rosen <joshro...@databricks.com>

Closes #4599 from JoshRosen/inputoutputsuite-fix and squashes the following 
commits:

47dc447 [Josh Rosen] [SPARK-5227] [SPARK-5679] Disable FileSystem cache in 
WholeTextFileRecordReaderSuite

(cherry picked from commit d06d5ee9b33505774ef1e5becc01b47492f1a2dc)
Signed-off-by: Patrick Wendell <patr...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/26410a2d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/26410a2d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/26410a2d

Branch: refs/heads/branch-1.2
Commit: 26410a2d3d81ac90f1fd781041efe0c9b2882769
Parents: c2b4633
Author: Josh Rosen <joshro...@databricks.com>
Authored: Fri Feb 13 17:45:31 2015 -0800
Committer: Patrick Wendell <patr...@databricks.com>
Committed: Fri Feb 13 17:45:45 2015 -0800

----------------------------------------------------------------------
 .../spark/input/WholeTextFileRecordReaderSuite.scala    | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/26410a2d/core/src/test/scala/org/apache/spark/input/WholeTextFileRecordReaderSuite.scala
----------------------------------------------------------------------
diff --git a/core/src/test/scala/org/apache/spark/input/WholeTextFileRecordReaderSuite.scala b/core/src/test/scala/org/apache/spark/input/WholeTextFileRecordReaderSuite.scala
index 98b0a16..2e58c15 100644
--- a/core/src/test/scala/org/apache/spark/input/WholeTextFileRecordReaderSuite.scala
+++ b/core/src/test/scala/org/apache/spark/input/WholeTextFileRecordReaderSuite.scala
@@ -28,7 +28,7 @@ import org.scalatest.FunSuite
 
 import org.apache.hadoop.io.Text
 
-import org.apache.spark.SparkContext
+import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.util.Utils
 import org.apache.hadoop.io.compress.{DefaultCodec, CompressionCodecFactory, GzipCodec}
 
@@ -42,7 +42,15 @@ class WholeTextFileRecordReaderSuite extends FunSuite with BeforeAndAfterAll {
   private var factory: CompressionCodecFactory = _
 
   override def beforeAll() {
-    sc = new SparkContext("local", "test")
+    // Hadoop's FileSystem caching does not use the Configuration as part of its cache key, which
+    // can cause FileSystem.get(Configuration) to return a cached instance created with a different
+    // configuration than the one passed to get() (see HADOOP-8490 for more details). This caused
+    // hard-to-reproduce test failures, since any suites that were run after this one would inherit
+    // the new value of "fs.local.block.size" (see SPARK-5227 and SPARK-5679). To work around this,
+    // we disable FileSystem caching in this suite.
+    val conf = new SparkConf().set("spark.hadoop.fs.file.impl.disable.cache", "true")
+
+    sc = new SparkContext("local", "test", conf)
 
     // Set the block size of local file system to test whether files are split right or not.
     sc.hadoopConfiguration.setLong("fs.local.block.size", 32)
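
A hedged side note, not part of the diff above: Spark copies SparkConf entries
prefixed with "spark.hadoop." (minus the prefix) into the SparkContext's Hadoop
Configuration, which is how the setting in the patch reaches the FileSystem.get()
calls made inside the suite. A minimal sketch, assuming a local Spark 1.x build:

    import org.apache.spark.{SparkConf, SparkContext}

    // "spark.hadoop.*" keys on SparkConf end up in sc.hadoopConfiguration, so the
    // cache-disabling flag is visible to Hadoop's FileSystem lookups in the test.
    val conf = new SparkConf().set("spark.hadoop.fs.file.impl.disable.cache", "true")
    val sc = new SparkContext("local", "test", conf)
    assert(sc.hadoopConfiguration.get("fs.file.impl.disable.cache") == "true")
    sc.stop()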

