Repository: spark
Updated Branches:
  refs/heads/branch-2.0 2e3ead20c -> e11046457
[SPARK-15649][SQL] Avoid to serialize MetastoreRelation in HiveTableScanExec

## What changes were proposed in this pull request?

In HiveTableScanExec, `schema` is a lazy val derived from `relation.attributeMap`, so referencing it inside the partition-processing closure forces the MetastoreRelation to be serialized into the task binary. Capturing the schema in a local variable before the closure avoids serializing MetastoreRelation.

## How was this patch tested?

Author: Lianhui Wang <[email protected]>

Closes #13397 from lianhuiwang/avoid-serialize.

(cherry picked from commit 2bfc4f15214a870b3e067f06f37eb506b0070a1f)
Signed-off-by: Reynold Xin <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e1104645
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e1104645
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e1104645

Branch: refs/heads/branch-2.0
Commit: e110464571554942bc261ab93ee9e6503bb12516
Parents: 2e3ead2
Author: Lianhui Wang <[email protected]>
Authored: Tue May 31 09:21:51 2016 -0700
Committer: Reynold Xin <[email protected]>
Committed: Tue May 31 09:21:56 2016 -0700

----------------------------------------------------------------------
 .../org/apache/spark/sql/hive/execution/HiveTableScanExec.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/e1104645/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
index e29864f..cc3e74b 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala
@@ -152,8 +152,10 @@ case class HiveTableScanExec(
       }
     }
     val numOutputRows = longMetric("numOutputRows")
+    // Avoid to serialize MetastoreRelation because schema is lazy. (see SPARK-15649)
+    val outputSchema = schema
     rdd.mapPartitionsInternal { iter =>
-      val proj = UnsafeProjection.create(schema)
+      val proj = UnsafeProjection.create(outputSchema)
       iter.map { r =>
         numOutputRows += 1
         proj(r)
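The patch applies a standard Scala/Spark pattern: copy a field into a local val before building a closure, so the closure captures only that value rather than the enclosing object. A minimal self-contained sketch of the idea (all class and method names here are illustrative, not from the patch — `HeavyRelation` stands in for MetastoreRelation):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureDemo {
  // Stand-in for MetastoreRelation: deliberately NOT Serializable.
  class HeavyRelation {
    val attributeMap: Map[String, String] = Map("col" -> "int")
  }

  class Scan(relation: HeavyRelation) extends Serializable {
    // Like HiveTableScanExec.schema: lazily derived from the relation field.
    lazy val schema: Seq[String] = relation.attributeMap.keys.toSeq

    // BAD: `schema` means `this.schema`, so the closure captures `this`,
    // dragging the non-serializable `relation` field into serialization.
    def badClosure: Int => Int = { i => i + schema.size }

    // GOOD: copy into a local val first; the closure captures only the local.
    def goodClosure: Int => Int = {
      val outputSchema = schema
      i => i + outputSchema.size
    }
  }

  // Returns true iff `obj` survives Java serialization.
  def serializes(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val scan = new Scan(new HeavyRelation)
    assert(!serializes(scan.badClosure))  // capturing `this` fails
    assert(serializes(scan.goodClosure))  // capturing only the local val works
    println("local-val capture serialized fine")
  }
}
```

This is exactly why the commit introduces `val outputSchema = schema` before `mapPartitionsInternal`: the task closure then references a plain serializable value instead of a member of the operator.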
