Repository: spark
Updated Branches:
  refs/heads/master f5ebb18c4 -> 8c49cebce


[SPARK-14966] SizeEstimator should ignore classes in the scala.reflect package

In local profiling, I noticed SizeEstimator spending a large share of its time 
estimating the size of objects that contain TypeTag or ClassTag fields. These 
tags reference global Scala reflection objects, which in turn reference many 
singletons, such as TestHive. This skews the size estimate and wastes time 
traversing a huge object graph.

As a result, I think that SizeEstimator should skip any object whose class 
name starts with `scala.reflect`, i.e. classes in the `scala.reflect` package 
and its subpackages.
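
For illustration, here is a minimal standalone sketch of the name-prefix check. 
The object `ScalaReflectSkipSketch`, the helper `isScalaReflectClass`, and the 
`Tagged` case class are hypothetical names made up for this example; they are 
not part of Spark. Only the `getName.startsWith("scala.reflect")` test mirrors 
the patch.

```scala
import scala.reflect.{ClassTag, classTag}

object ScalaReflectSkipSketch {
  // Hypothetical helper (not a Spark API): the same class-name prefix test
  // that the patch adds to SizeEstimator.
  def isScalaReflectClass(obj: AnyRef): Boolean =
    obj.getClass.getName.startsWith("scala.reflect")

  // Hypothetical record type that carries a ClassTag field, similar to the
  // objects described in the commit message.
  final case class Tagged[T](value: T, tag: ClassTag[T])

  def main(args: Array[String]): Unit = {
    val record = Tagged("hello", classTag[String])
    // The ClassTag field is the kind of object the estimator would skip.
    println(s"tag skipped:    ${isScalaReflectClass(record.tag)}")
    // The payload and the enclosing record would still be measured.
    println(s"value skipped:  ${isScalaReflectClass(record.value)}")
    println(s"record skipped: ${isScalaReflectClass(record)}")
  }
}
```

Because the check is on the runtime class name, it relies on the concrete 
ClassTag and TypeTag implementations living under `scala.reflect`, which is the 
assumption this sketch makes about the Scala standard library.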

Author: Josh Rosen <[email protected]>

Closes #12741 from JoshRosen/ignore-scala-reflect-in-size-estimator.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8c49cebc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8c49cebc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8c49cebc

Branch: refs/heads/master
Commit: 8c49cebce572330fc84362662a9e3e8f7625bf5d
Parents: f5ebb18
Author: Josh Rosen <[email protected]>
Authored: Wed Apr 27 17:34:55 2016 -0700
Committer: Reynold Xin <[email protected]>
Committed: Wed Apr 27 17:34:55 2016 -0700

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/util/SizeEstimator.scala | 3 +++
 1 file changed, 3 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/8c49cebc/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala b/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
index 6861a75..386fdfd 100644
--- a/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
+++ b/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
@@ -207,6 +207,9 @@ object SizeEstimator extends Logging {
     val cls = obj.getClass
     if (cls.isArray) {
       visitArray(obj, cls, state)
+    } else if (cls.getName.startsWith("scala.reflect")) {
+      // Many objects in the scala.reflect package reference global reflection objects which, in
+      // turn, reference many other large global objects. Do nothing in this case.
     } else if (obj.isInstanceOf[ClassLoader] || obj.isInstanceOf[Class[_]]) {
       // Hadoop JobConfs created in the interpreter have a ClassLoader, which greatly confuses
       // the size estimator since it references the whole REPL. Do nothing in this case. In

