spark git commit: [SPARK-9254] [BUILD] [HOTFIX] sbt-launch-lib.bash should support HTTP/HTTPS redirection

2015-07-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 ff5e5f228 -> 712e13bba [SPARK-9254] [BUILD] [HOTFIX] sbt-launch-lib.bash should support HTTP/HTTPS redirection Target file(s) can be hosted on CDN nodes. HTTP/HTTPS redirection must be supported to download these files. Author: Cheng

spark git commit: [SPARK-9385] [HOT-FIX] [PYSPARK] Comment out Python style check

2015-07-27 Thread yhuai
job/Spark-Master-SBT/3088/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/console Author: Yin Huai Closes #7702 from yhuai/SPARK-9385 and squashes the following commits: 146e6ef [Yin Huai] Comment out Python style check because of error shown in https://amplab.cs.berkeley.edu/jenkins/job/Sp

spark git commit: [SPARK-9385] [PYSPARK] Enable PEP8 but disable installing pylint.

2015-07-27 Thread yhuai
uai Closes #7704 from yhuai/SPARK-9385 and squashes the following commits: 0056359 [Yin Huai] Enable PEP8 but disable installing pylint. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dafe8d85 Tree: http://git-wip-us.apache.

spark git commit: [SPARK-9386] [SQL] Feature flag for metastore partition pruning

2015-07-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 8ddfa52c2 -> ce89ff477 [SPARK-9386] [SQL] Feature flag for metastore partition pruning Since we have been seeing a lot of failures related to this new feature, lets put it behind a flag and turn it off by default. Author: Michael Armbrust
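
The commit above puts the new metastore pruning path behind a default-off flag. A minimal Python sketch of that pattern; the `Conf` class here is a toy stand-in for Spark's SQLConf, not its actual API:

```python
# Hedged sketch of the feature-flag pattern from SPARK-9386. The Conf class is
# illustrative only; spark.sql.hive.metastorePartitionPruning is the conf key
# this commit introduces.

class Conf:
    def __init__(self, settings=None):
        self.settings = settings or {}

    def get_boolean(self, key, default):
        value = self.settings.get(key)
        return default if value is None else value.lower() == "true"

conf = Conf()
# Off by default: the still-failing pruning path only runs when a user opts in.
prune_in_metastore = conf.get_boolean(
    "spark.sql.hive.metastorePartitionPruning", False)
```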

spark git commit: [SPARK-9422] [SQL] Remove the placeholder attributes used in the aggregation buffers

2015-07-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e78ec1a8f -> 3744b7fd4 [SPARK-9422] [SQL] Remove the placeholder attributes used in the aggregation buffers https://issues.apache.org/jira/browse/SPARK-9422 Author: Yin Huai Closes #7737 from yhuai/removePlaceHolder and squashes

spark git commit: [SPARK-9361] [SQL] Refactor new aggregation code to reduce the times of checking compatibility

2015-07-30 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 7bbf02f0b -> 5363ed715 [SPARK-9361] [SQL] Refactor new aggregation code to reduce the times of checking compatibility JIRA: https://issues.apache.org/jira/browse/SPARK-9361 Currently, we call `aggregate.Utils.tryConvert` in many places to

spark git commit: [SPARK-8640] [SQL] Enable Processing of Multiple Window Frames in a Single Window Operator

2015-07-31 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 0a1d2ca42 -> 39ab199a3 [SPARK-8640] [SQL] Enable Processing of Multiple Window Frames in a Single Window Operator This PR enables the processing of multiple window frames in a single window operator. This should improve the performance of

spark git commit: [SPARK-9466] [SQL] Increase two timeouts in CliSuite.

2015-07-31 Thread yhuai
Repository: spark Updated Branches: refs/heads/master fbef566a1 -> 815c8245f [SPARK-9466] [SQL] Increase two timeouts in CliSuite. Hopefully this can resolve the flakiness of this suite. JIRA: https://issues.apache.org/jira/browse/SPARK-9466 Author: Yin Huai Closes # from yhuai/SP

spark git commit: [SPARK-9233] [SQL] Enable code-gen in window function unit tests

2015-07-31 Thread yhuai
233 Author: Yin Huai Closes #7832 from yhuai/SPARK-9233 and squashes the following commits: 4e4e4cc [Yin Huai] style ca80e07 [Yin Huai] Test window function with codegen. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3fc0cb92 T

spark git commit: [SPARK-2205] [SQL] Avoid unnecessary exchange operators in multi-way joins

2015-08-02 Thread yhuai
rom t1 join t2 on (t1.x = t2.x) join t3 on (t2.x = t3.x)` will only have three Exchange operators (when shuffled joins are needed) instead of four. The code in this PR was authored by yhuai; I'm opening this PR to factor out this change from #7685, a larger pull request which contains t

spark git commit: [SPARK-9141] [SQL] Remove project collapsing from DataFrame API

2015-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 34dcf1010 -> 23d982204 [SPARK-9141] [SQL] Remove project collapsing from DataFrame API Currently we collapse successive projections that are added by `withColumn`. However, this optimization violates the constraint that adding nodes to a

spark git commit: [SPARK-9141] [SQL] Remove project collapsing from DataFrame API

2015-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 eedb996dd -> 125827a4f [SPARK-9141] [SQL] Remove project collapsing from DataFrame API Currently we collapse successive projections that are added by `withColumn`. However, this optimization violates the constraint that adding nodes t

spark git commit: [SPARK-9141] [SQL] [MINOR] Fix comments of PR #7920

2015-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 7a969a696 -> 1f8c364b9 [SPARK-9141] [SQL] [MINOR] Fix comments of PR #7920 This is a follow-up of https://github.com/apache/spark/pull/7920 to fix comments. Author: Yin Huai Closes #7964 from yhuai/SPARK-9141-follow-up and squashes

spark git commit: [SPARK-9141] [SQL] [MINOR] Fix comments of PR #7920

2015-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 03bcf627d -> 19018d542 [SPARK-9141] [SQL] [MINOR] Fix comments of PR #7920 This is a follow-up of https://github.com/apache/spark/pull/7920 to fix comments. Author: Yin Huai Closes #7964 from yhuai/SPARK-9141-follow-up and squas

spark git commit: [SPARK-9649] Fix flaky test MasterSuite - randomize ports

2015-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/master eb5b8f4a6 -> 5f0fb6466 [SPARK-9649] Fix flaky test MasterSuite - randomize ports ``` Error Message Failed to bind to: /127.0.0.1:7093: Service 'sparkMaster' failed after 16 retries! Stacktrace java.net.BindException: Failed to bind
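
The bind failure above comes from tests competing for a fixed port. A small sketch of the usual randomization fix (not the MasterSuite code itself): bind to port 0 and let the OS choose a free ephemeral port.

```python
import socket

# Sketch of the port-randomization idea behind SPARK-9649: binding to port 0
# lets the OS pick a free ephemeral port, so concurrent test runs cannot
# collide on a hard-coded port like 7093.

def free_port():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]  # the port the OS actually assigned
```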

spark git commit: [SPARK-9649] Fix flaky test MasterSuite - randomize ports

2015-08-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 b8136d7e0 -> 05cbf133d [SPARK-9649] Fix flaky test MasterSuite - randomize ports ``` Error Message Failed to bind to: /127.0.0.1:7093: Service 'sparkMaster' failed after 16 retries! Stacktrace java.net.BindException: Failed to

spark git commit: [SPARK-9611] [SQL] Fixes a few corner cases when we spill a UnsafeFixedWidthAggregationMap

2015-08-05 Thread yhuai
an UnsafeInMemorySorter if its BytesToBytesMap is empty. 2. We will not spill an InMemorySorter if it is empty. 3. We will not add a SpillReader to a SpillMerger if this SpillReader is empty. JIRA: https://issues.apache.org/jira/browse/SPARK-9611 Author: Yin Huai Closes #7948 from yh

spark git commit: [SPARK-9611] [SQL] Fixes a few corner cases when we spill a UnsafeFixedWidthAggregationMap

2015-08-05 Thread yhuai
an UnsafeInMemorySorter if its BytesToBytesMap is empty. 2. We will not spill an InMemorySorter if it is empty. 3. We will not add a SpillReader to a SpillMerger if this SpillReader is empty. JIRA: https://issues.apache.org/jira/browse/SPARK-9611 Author: Yin Huai Closes #7948 from yh

spark git commit: [SPARK-9593] [SQL] Fixes Hadoop shims loading

2015-08-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 c39d5d144 -> 11c28a568 [SPARK-9593] [SQL] Fixes Hadoop shims loading This PR is used to workaround CDH Hadoop versions like 2.0.0-mr1-cdh4.1.1. Internally, Hive `ShimLoader` tries to load different versions of Hadoop shims by checking

spark git commit: [SPARK-9593] [SQL] [HOTFIX] Makes the Hadoop shims loading fix more robust

2015-08-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 93085c992 -> 9f94c85ff [SPARK-9593] [SQL] [HOTFIX] Makes the Hadoop shims loading fix more robust This is a follow-up of #7929. We found that Jenkins SBT master build still fails because of the Hadoop shims loading issue. But the failure

spark git commit: [SPARK-9593] [SQL] [HOTFIX] Makes the Hadoop shims loading fix more robust

2015-08-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 11c28a568 -> cc4c569a8 [SPARK-9593] [SQL] [HOTFIX] Makes the Hadoop shims loading fix more robust This is a follow-up of #7929. We found that Jenkins SBT master build still fails because of the Hadoop shims loading issue. But the fail

spark git commit: [SPARK-9632] [SQL] [HOT-FIX] Fix build.

2015-08-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 2382b483a -> b51159def [SPARK-9632] [SQL] [HOT-FIX] Fix build. seems https://github.com/apache/spark/pull/7955 breaks the build. Author: Yin Huai Closes #8001 from yhuai/SPARK-9632-fixBuild and squashes the following comm

spark git commit: [SPARK-9632] [SQL] [HOT-FIX] Fix build.

2015-08-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 2eca46a17 -> cdd53b762 [SPARK-9632] [SQL] [HOT-FIX] Fix build. seems https://github.com/apache/spark/pull/7955 breaks the build. Author: Yin Huai Closes #8001 from yhuai/SPARK-9632-fixBuild and squashes the following commits: 6c25

spark git commit: [SPARK-9736] [SQL] JoinedRow.anyNull should delegate to the underlying rows.

2015-08-07 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 ff0abca2b -> 70bf170b9 [SPARK-9736] [SQL] JoinedRow.anyNull should delegate to the underlying rows. JoinedRow.anyNull currently loops through every field to check for null, which is inefficient if the underlying rows are UnsafeRows. It

spark git commit: [SPARK-9736] [SQL] JoinedRow.anyNull should delegate to the underlying rows.

2015-08-07 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 2432c2e23 -> 9897cc5e3 [SPARK-9736] [SQL] JoinedRow.anyNull should delegate to the underlying rows. JoinedRow.anyNull currently loops through every field to check for null, which is inefficient if the underlying rows are UnsafeRows. It sho
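
The inefficiency described above can be sketched in a few lines; this is a simplified model, not Spark's InternalRow API. The point is that `anyNull` on a joined row should delegate to the two underlying rows, which may have a cheap row-level null check (UnsafeRow keeps a null bitmap), instead of probing every field one by one.

```python
# Toy model of the SPARK-9736 fix (names simplified from Spark's API).

class ArrayRow:
    def __init__(self, values):
        self.values = values  # None marks a null field

    def num_fields(self):
        return len(self.values)

    def is_null_at(self, i):
        return self.values[i] is None

    def any_null(self):
        # Stand-in for a cheap row-level check, e.g. UnsafeRow's null bitmap.
        return any(v is None for v in self.values)

class JoinedRow:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def is_null_at(self, i):
        n = self.left.num_fields()
        return self.left.is_null_at(i) if i < n else self.right.is_null_at(i - n)

    def any_null(self):
        # The fix: delegate to the underlying rows instead of looping over
        # every joined field via is_null_at.
        return self.left.any_null() or self.right.any_null()
```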

spark git commit: [SPARK-9674] Re-enable ignored test in SQLQuerySuite

2015-08-07 Thread yhuai
tly before that so we never caught it. This patch re-enables the test and adds the code necessary to make it pass. JoshRosen yhuai Author: Andrew Or Closes #8015 from andrewor14/SPARK-9674 and squashes the following commits: 225eac2 [Andrew Or] Merge branch 'master' of github.com:apac

spark git commit: [SPARK-9674] Re-enable ignored test in SQLQuerySuite

2015-08-07 Thread yhuai
tly before that so we never caught it. This patch re-enables the test and adds the code necessary to make it pass. JoshRosen yhuai Author: Andrew Or Closes #8015 from andrewor14/SPARK-9674 and squashes the following commits: 225eac2 [Andrew Or] Merge branch 'master' of github.com:apac

spark git commit: [SPARK-6212] [SQL] The EXPLAIN output of CTAS only shows the analyzed plan

2015-08-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 874b9d855 -> 251d1eef4 [SPARK-6212] [SQL] The EXPLAIN output of CTAS only shows the analyzed plan JIRA: https://issues.apache.org/jira/browse/SPARK-6212 Author: Yijie Shen Closes #7986 from yjshen/ctas_explain and squashes the follow

spark git commit: [SPARK-6212] [SQL] The EXPLAIN output of CTAS only shows the analyzed plan

2015-08-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 25c363e93 -> 3ca995b78 [SPARK-6212] [SQL] The EXPLAIN output of CTAS only shows the analyzed plan JIRA: https://issues.apache.org/jira/browse/SPARK-6212 Author: Yijie Shen Closes #7986 from yjshen/ctas_explain and squashes the following

spark git commit: [SPARK-8930] [SQL] Throw an AnalysisException with meaningful messages if DataFrame#explode takes a star in expressions

2015-08-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e9c36938b -> 68ccc6e18 [SPARK-8930] [SQL] Throw an AnalysisException with meaningful messages if DataFrame#explode takes a star in expressions Author: Yijie Shen Closes #8057 from yjshen/explode_star and squashes the following commits: e

spark git commit: [SPARK-8930] [SQL] Throw an AnalysisException with meaningful messages if DataFrame#explode takes a star in expressions

2015-08-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 b12f0737f -> 1ce5061bb [SPARK-8930] [SQL] Throw an AnalysisException with meaningful messages if DataFrame#explode takes a star in expressions Author: Yijie Shen Closes #8057 from yjshen/explode_star and squashes the following commits

spark git commit: [SPARK-9703] [SQL] Refactor EnsureRequirements to avoid certain unnecessary shuffles

2015-08-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/master a863348fd -> 23cf5af08 [SPARK-9703] [SQL] Refactor EnsureRequirements to avoid certain unnecessary shuffles This pull request refactors the `EnsureRequirements` planning rule in order to avoid the addition of certain unnecessary shuffles.

spark git commit: [SPARK-9703] [SQL] Refactor EnsureRequirements to avoid certain unnecessary shuffles

2015-08-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 1ce5061bb -> 323d68606 [SPARK-9703] [SQL] Refactor EnsureRequirements to avoid certain unnecessary shuffles This pull request refactors the `EnsureRequirements` planning rule in order to avoid the addition of certain unnecessary shuff

spark git commit: [SPARK-9743] [SQL] Fixes JSONRelation refreshing

2015-08-10 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 f75c64b0c -> 94b2f5b32 [SPARK-9743] [SQL] Fixes JSONRelation refreshing PR #7696 added two `HadoopFsRelation.refresh()` calls ([this] [1], and [this] [2]) in `DataSourceStrategy` to make test case `InsertSuite.save directly to the pat

spark git commit: [SPARK-9743] [SQL] Fixes JSONRelation refreshing

2015-08-10 Thread yhuai
Repository: spark Updated Branches: refs/heads/master be80def0d -> e3fef0f9e [SPARK-9743] [SQL] Fixes JSONRelation refreshing PR #7696 added two `HadoopFsRelation.refresh()` calls ([this] [1], and [this] [2]) in `DataSourceStrategy` to make test case `InsertSuite.save directly to the path of

spark git commit: [SPARK-9785] [SQL] HashPartitioning compatibility should consider expression ordering

2015-08-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master d378396f8 -> dfe347d2c [SPARK-9785] [SQL] HashPartitioning compatibility should consider expression ordering HashPartitioning compatibility is currently defined w.r.t the _set_ of expressions, but the ordering of those expressions matters

spark git commit: [SPARK-9785] [SQL] HashPartitioning compatibility should consider expression ordering

2015-08-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 84ba990f2 -> efcae3a70 [SPARK-9785] [SQL] HashPartitioning compatibility should consider expression ordering HashPartitioning compatibility is currently defined w.r.t the _set_ of expressions, but the ordering of those expressions mat
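
Why the ordering matters can be shown with a toy compatibility check (illustrative, not Spark's classes): hash-partitioning rows on `(a, b)` does not produce the same distribution as hash-partitioning on `(b, a)`, so comparing the expressions as an unordered set wrongly reports the two as compatible and skips a needed shuffle.

```python
# Toy illustration of the SPARK-9785 bug and fix.

def compatible_as_set(exprs1, exprs2):
    # Buggy check: treats the partitioning expressions as an unordered set.
    return set(exprs1) == set(exprs2)

def compatible_ordered(exprs1, exprs2):
    # Fixed check: the ordering of the expressions matters.
    return list(exprs1) == list(exprs2)
```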

[2/2] spark git commit: [SPARK-9646] [SQL] Add metrics for all join and aggregate operators

2015-08-11 Thread yhuai
[SPARK-9646] [SQL] Add metrics for all join and aggregate operators This PR added metrics for all join and aggregate operators. However, I found the metrics may be confusing in the following two case: 1. The iterator is not totally consumed and the metric values will be less. 2. Recreating the it

[1/2] spark git commit: [SPARK-9646] [SQL] Add metrics for all join and aggregate operators

2015-08-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 71460b889 -> 767ee1884 http://git-wip-us.apache.org/repos/asf/spark/blob/767ee188/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --

[1/2] spark git commit: [SPARK-9646] [SQL] Add metrics for all join and aggregate operators

2015-08-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5b8bb1b21 -> 5831294a7 http://git-wip-us.apache.org/repos/asf/spark/blob/5831294a/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala -- diff

[2/2] spark git commit: [SPARK-9646] [SQL] Add metrics for all join and aggregate operators

2015-08-11 Thread yhuai
[SPARK-9646] [SQL] Add metrics for all join and aggregate operators This PR added metrics for all join and aggregate operators. However, I found the metrics may be confusing in the following two case: 1. The iterator is not totally consumed and the metric values will be less. 2. Recreating the it

spark git commit: [SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be backward compatible

2015-08-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 b7497e3a2 -> ec7a4b9b0 [SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be backward compatible DirectParquetOutputCommitter was moved in SPARK-9763. However, users can explicitly set the class as a config option,

spark git commit: [SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be backward compatible

2015-08-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5a5bbc299 -> afa757c98 [SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be backward compatible DirectParquetOutputCommitter was moved in SPARK-9763. However, users can explicitly set the class as a config option, so w

spark git commit: [SPARK-10005] [SQL] Fixes schema merging for nested structs

2015-08-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/master cf016075a -> ae2370e72 [SPARK-10005] [SQL] Fixes schema merging for nested structs In case of schema merging, we only handled first level fields when converting Parquet groups to `InternalRow`s. Nested struct fields are not properly handle

spark git commit: [SPARK-10005] [SQL] Fixes schema merging for nested structs

2015-08-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 e2c6ef810 -> 90245f65c [SPARK-10005] [SQL] Fixes schema merging for nested structs In case of schema merging, we only handled first level fields when converting Parquet groups to `InternalRow`s. Nested struct fields are not properly ha

spark git commit: [SPARK-7289] [SPARK-9949] [SQL] Backport SPARK-7289 and SPARK-9949 to branch 1.4

2015-08-17 Thread yhuai
1.4 (https://github.com/apache/spark/pull/6780). Also, we need to backport the fix of `TakeOrderedAndProject` as well (https://github.com/apache/spark/pull/8179). Author: Wenchen Fan Author: Yin Huai Closes #8252 from yhuai/backport7289And9949. Project: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in aggregation.

2015-08-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 43e013542 -> b4f4e91c3 [SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in aggregation. This improves performance by ~ 20 - 30% in one of my local tests and should fix the performance regression from 1.4 to 1.5 o

spark git commit: [SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in aggregation.

2015-08-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 675e22494 -> 5be517584 [SPARK-10100] [SQL] Eliminate hash table lookup if there is no grouping key in aggregation. This improves performance by ~ 20 - 30% in one of my local tests and should fix the performance regression from 1.4 to 1
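
The optimization above can be sketched as follows (a simplified model, not Spark's aggregation operators): grouped aggregation pays a hash-table lookup per input row to find the right buffer, while a global aggregate with no grouping key can keep a single mutable buffer and skip the hash table entirely.

```python
# Sketch of the idea behind SPARK-10100.

def sum_by_key(rows):
    buffers = {}
    for key, value in rows:
        buffers[key] = buffers.get(key, 0) + value  # hash lookup per row
    return buffers

def sum_global(values):
    buffer = 0  # a single aggregation buffer, no hash table at all
    for value in values:
        buffer += value
    return buffer
```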

spark git commit: [SPARK-10143] [SQL] Use parquet's block size (row group size) setting as the min split size if necessary.

2015-08-21 Thread yhuai
and 42s), but it is much better than our master. Author: Yin Huai Closes #8346 from yhuai/parquetMinSplit. (cherry picked from commit e3355090d4030daffed5efb0959bf1d724c13c13) Signed-off-by: Yin Huai Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apach

spark git commit: [SPARK-10143] [SQL] Use parquet's block size (row group size) setting as the min split size if necessary.

2015-08-21 Thread yhuai
and 42s), but it is much better than our master. Author: Yin Huai Closes #8346 from yhuai/parquetMinSplit. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e3355090 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e3355

spark git commit: [SPARK-15280] [Input/Output] Refactored OrcOutputWriter and moved serialization to a new class.

2016-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 4148a9c2c -> 6871deb93 [SPARK-15280] [Input/Output] Refactored OrcOutputWriter and moved serialization to a new class. ## What changes were proposed in this pull request? Refactoring: Separated ORC serialization logic from OrcOutputWrit

spark git commit: [SPARK-15280] [Input/Output] Refactored OrcOutputWriter and moved serialization to a new class.

2016-05-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 201a51f36 -> c18fa464f [SPARK-15280] [Input/Output] Refactored OrcOutputWriter and moved serialization to a new class. ## What changes were proposed in this pull request? Refactoring: Separated ORC serialization logic from OrcOutputWriter a

spark git commit: [SPARK-15532][SQL] SQLContext/HiveContext's public constructors should use SparkSession.build.getOrCreate

2016-05-26 Thread yhuai
's public constructor to use SparkSession.build.getOrCreate and removes isRootContext from SQLContext. ## How was this patch tested? Existing tests. Author: Yin Huai Closes #13310 from yhuai/SPARK-15532. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.

spark git commit: [SPARK-15532][SQL] SQLContext/HiveContext's public constructors should use SparkSession.build.getOrCreate

2016-05-26 Thread yhuai
's public constructor to use SparkSession.build.getOrCreate and removes isRootContext from SQLContext. ## How was this patch tested? Existing tests. Author: Yin Huai Closes #13310 from yhuai/SPARK-15532. (cherry picked from commit 3ac2363d757cc9cebc627974f17ecda3a263efdf) Signed-off-by: Yin Hu

spark git commit: [SPARK-15583][SQL] Disallow altering datasource properties

2016-05-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 6ab973ec5 -> 3fca635b4 [SPARK-15583][SQL] Disallow altering datasource properties ## What changes were proposed in this pull request? Certain table properties (and SerDe properties) are in the protected namespace `spark.sql.sources.`, whi

spark git commit: [SPARK-15583][SQL] Disallow altering datasource properties

2016-05-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 702755f92 -> 8e26b74fc [SPARK-15583][SQL] Disallow altering datasource properties ## What changes were proposed in this pull request? Certain table properties (and SerDe properties) are in the protected namespace `spark.sql.sources.`,

spark git commit: [SPARK-15431][SQL][HOTFIX] ignore 'list' command testcase from CliSuite for now

2016-05-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master d5911d117 -> 6f95c6c03 [SPARK-15431][SQL][HOTFIX] ignore 'list' command testcase from CliSuite for now ## What changes were proposed in this pull request? The test cases for `list` command added in `CliSuite` by PR #13212 can not run in s

spark git commit: [SPARK-15431][SQL][HOTFIX] ignore 'list' command testcase from CliSuite for now

2016-05-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 b3845fede -> b430aa98c [SPARK-15431][SQL][HOTFIX] ignore 'list' command testcase from CliSuite for now ## What changes were proposed in this pull request? The test cases for `list` command added in `CliSuite` by PR #13212 can not run

spark git commit: [SPARK-15565][SQL] Add the File Scheme to the Default Value of WAREHOUSE_PATH

2016-05-27 Thread yhuai
user.dir")/spark-warehouse`. Since `System.getProperty("user.dir")` is a local dir, we should explicitly set the scheme to local filesystem. cc yhuai How was this patch tested? Added two test cases Author: gatorsmile Closes #13348 from gatorsmile/addSchemeToDefaultWarehou

spark git commit: [SPARK-15565][SQL] Add the File Scheme to the Default Value of WAREHOUSE_PATH

2016-05-27 Thread yhuai
user.dir")/spark-warehouse`. Since `System.getProperty("user.dir")` is a local dir, we should explicitly set the scheme to local filesystem. cc yhuai How was this patch tested? Added two test cases Author: gatorsmile Closes #13348 from gatorsmile/addSchemeToDefaultWarehou

spark git commit: [SPARK-15431][SQL][BRANCH-2.0-TEST] rework the clisuite test cases

2016-05-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 dcf498e8a -> 9c137b2e3 [SPARK-15431][SQL][BRANCH-2.0-TEST] rework the clisuite test cases ## What changes were proposed in this pull request? This PR reworks on the CliSuite test cases for `LIST FILES/JARS` commands. CC yhuai Tha

spark git commit: [SPARK-15431][SQL][BRANCH-2.0-TEST] rework the clisuite test cases

2016-05-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 21b2605dc -> 019afd9c7 [SPARK-15431][SQL][BRANCH-2.0-TEST] rework the clisuite test cases ## What changes were proposed in this pull request? This PR reworks on the CliSuite test cases for `LIST FILES/JARS` commands. CC yhuai Tha

spark git commit: [SPARK-15594][SQL] ALTER TABLE SERDEPROPERTIES does not respect partition spec

2016-05-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 dc6e94157 -> 80a40e8e2 [SPARK-15594][SQL] ALTER TABLE SERDEPROPERTIES does not respect partition spec ## What changes were proposed in this pull request? These commands ignore the partition spec and change the storage properties of th

spark git commit: [SPARK-15594][SQL] ALTER TABLE SERDEPROPERTIES does not respect partition spec

2016-05-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 776d183c8 -> 4a2fb8b87 [SPARK-15594][SQL] ALTER TABLE SERDEPROPERTIES does not respect partition spec ## What changes were proposed in this pull request? These commands ignore the partition spec and change the storage properties of the ta

spark git commit: [SPARK-15636][SQL] Make aggregate expressions more concise in explain

2016-05-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 74c1b79f3 -> 472f16181 [SPARK-15636][SQL] Make aggregate expressions more concise in explain ## What changes were proposed in this pull request? This patch reduces the verbosity of aggregate expressions in explain (but does not actually re

spark git commit: [SPARK-15636][SQL] Make aggregate expressions more concise in explain

2016-05-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 a2f68ded2 -> f3570bcea [SPARK-15636][SQL] Make aggregate expressions more concise in explain ## What changes were proposed in this pull request? This patch reduces the verbosity of aggregate expressions in explain (but does not actuall

spark git commit: [SPARK-15658][SQL] UDT serializer should declare its data type as udt instead of udt.sqlType

2016-05-31 Thread yhuai
Repository: spark Updated Branches: refs/heads/master d67c82e4b -> 2bfed1a0c [SPARK-15658][SQL] UDT serializer should declare its data type as udt instead of udt.sqlType ## What changes were proposed in this pull request? When we build serializer for UDT object, we should declare its data ty

spark git commit: [SPARK-15658][SQL] UDT serializer should declare its data type as udt instead of udt.sqlType

2016-05-31 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 6347ff512 -> 29b94fdb3 [SPARK-15658][SQL] UDT serializer should declare its data type as udt instead of udt.sqlType ## What changes were proposed in this pull request? When we build serializer for UDT object, we should declare its dat

spark git commit: [SPARK-15622][SQL] Wrap the parent classloader of Janino's classloader in the ParentClassLoader.

2016-05-31 Thread yhuai
fd7e1b4ea0R81) and `test-only *ReplSuite -- -z "SPARK-2576 importing implicits"` still passes the test (without the change in `CodeGenerator`, this test does not pass with the change in `ExecutorClassLoader`). Author: Yin Huai Closes #13366 from yhuai/SPARK-15622. Project: http:

spark git commit: [SPARK-15622][SQL] Wrap the parent classloader of Janino's classloader in the ParentClassLoader.

2016-05-31 Thread yhuai
expand=1#diff-bb538fda94224dd0af01d0fd7e1b4ea0R81) and `test-only *ReplSuite -- -z "SPARK-2576 importing implicits"` still passes the test (without the change in `CodeGenerator`, this test does not pass with the change in `ExecutorClassLoader`). Author: Yin Huai Closes #13366 from yhuai/SPARK-15622. (ch

spark git commit: [SPARK-12988][SQL] Can't drop top level columns that contain dots

2016-05-31 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 0f2471346 -> 06514d689 [SPARK-12988][SQL] Can't drop top level columns that contain dots ## What changes were proposed in this pull request? Fixes "Can't drop top level columns that contain dots". This work is based on dilipbiswal's https

spark git commit: [SPARK-12988][SQL] Can't drop top level columns that contain dots

2016-05-31 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 7f240eaee -> b8de4ad7d [SPARK-12988][SQL] Can't drop top level columns that contain dots ## What changes were proposed in this pull request? Fixes "Can't drop top level columns that contain dots". This work is based on dilipbiswal's h

spark git commit: [SPARK-15596][SPARK-15635][SQL] ALTER TABLE RENAME fixes

2016-06-01 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5b08ee639 -> 9e2643b21 [SPARK-15596][SPARK-15635][SQL] ALTER TABLE RENAME fixes ## What changes were proposed in this pull request? **SPARK-15596**: Even after we renamed a cached table, the plan would remain in the cache with the old tab

spark git commit: [SPARK-15596][SPARK-15635][SQL] ALTER TABLE RENAME fixes

2016-06-01 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 46d5f7f38 -> 44052a707 [SPARK-15596][SPARK-15635][SQL] ALTER TABLE RENAME fixes ## What changes were proposed in this pull request? **SPARK-15596**: Even after we renamed a cached table, the plan would remain in the cache with the old
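
The stale-cache symptom above can be modeled in a few lines (a sketch of the general fix pattern, not Spark's CacheManager): a plan cache keyed by table name must be re-keyed, or at least invalidated, when the table is renamed, otherwise lookups under the new name miss while the entry under the old name goes stale.

```python
# Toy model of the SPARK-15596 symptom and fix.

plan_cache = {"t_old": "cached plan"}

def rename_table(old_name, new_name):
    # Re-key the cached plan so it follows the table across the rename.
    plan = plan_cache.pop(old_name, None)
    if plan is not None:
        plan_cache[new_name] = plan

rename_table("t_old", "t_new")
```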

spark git commit: [SPARK-15839] Fix Maven doc-jar generation when JAVA_7_HOME is set

2016-06-09 Thread yhuai
vn clean install -DskipTests=true` when `JAVA_7_HOME` was set. Also manually inspected the effective POM diff to verify that the final POM changes were scoped correctly: https://gist.github.com/JoshRosen/f889d1c236fad14fa25ac4be01654653 /cc vanzin and yhuai for review. Author: Josh Rosen Close

spark git commit: [SPARK-15839] Fix Maven doc-jar generation when JAVA_7_HOME is set

2016-06-09 Thread yhuai
uild/mvn clean install -DskipTests=true` when `JAVA_7_HOME` was set. Also manually inspected the effective POM diff to verify that the final POM changes were scoped correctly: https://gist.github.com/JoshRosen/f889d1c236fad14fa25ac4be01654653 /cc vanzin and yhuai for review. Author: Josh Rosen

spark git commit: [SPARK-15676][SQL] Disallow Column Names as Partition Columns For Hive Tables

2016-06-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 2a0da84dc -> 8c4050a5a [SPARK-15676][SQL] Disallow Column Names as Partition Columns For Hive Tables What changes were proposed in this pull request? When creating a Hive Table (not data source tables), a common error users might

spark git commit: [SPARK-15676][SQL] Disallow Column Names as Partition Columns For Hive Tables

2016-06-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/master a6a18a457 -> 3b7fb84cf [SPARK-15676][SQL] Disallow Column Names as Partition Columns For Hive Tables What changes were proposed in this pull request? When creating a Hive Table (not data source tables), a common error users might make

spark git commit: [SPARK-15530][SQL] Set #parallelism for file listing in listLeafFilesInParallel

2016-06-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 3b7fb84cf -> 5ad4e32d4 [SPARK-15530][SQL] Set #parallelism for file listing in listLeafFilesInParallel ## What changes were proposed in this pull request? This pr is to set the number of parallelism to prevent file listing in `listLeafFile

spark git commit: [SPARK-15530][SQL] Set #parallelism for file listing in listLeafFilesInParallel

2016-06-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 8c4050a5a -> d9db8a9c8 [SPARK-15530][SQL] Set #parallelism for file listing in listLeafFilesInParallel ## What changes were proposed in this pull request? This pr is to set the number of parallelism to prevent file listing in `listLeaf
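The SPARK-15530 change bounds the parallelism used when listing leaf files. A sketch of the idea, assuming a hard cap similar to the one the fix introduces (the exact cap value here is an assumption for illustration):

```python
def listing_parallelism(num_paths, default_parallelism, hard_cap=10000):
    """Pick a partition count for a parallel file-listing job.

    Sketch of the idea in SPARK-15530: bound the job's parallelism by
    both the number of input paths and a hard cap, so a huge path list
    cannot launch an unbounded number of tasks.
    """
    return max(1, min(num_paths, default_parallelism, hard_cap))
```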

spark git commit: [SPARK-15887][SQL] Bring back the hive-site.xml support for Spark 2.0

2016-06-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/master c654ae214 -> c4b1ad020 [SPARK-15887][SQL] Bring back the hive-site.xml support for Spark 2.0 ## What changes were proposed in this pull request? Right now, Spark 2.0 does not load hive-site.xml. Based on users' feedback, it seems make sen

spark git commit: [SPARK-15887][SQL] Bring back the hive-site.xml support for Spark 2.0

2016-06-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 97fe1d8ee -> b148b0364 [SPARK-15887][SQL] Bring back the hive-site.xml support for Spark 2.0 ## What changes were proposed in this pull request? Right now, Spark 2.0 does not load hive-site.xml. Based on users' feedback, it seems make

spark git commit: [SPARK-15808][SQL] File Format Checking When Appending Data

2016-06-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 7b9071eea -> 5827b65e2 [SPARK-15808][SQL] File Format Checking When Appending Data What changes were proposed in this pull request? **Issue:** Got wrong results or strange errors when append data to a table with mismatched file format

spark git commit: [SPARK-15808][SQL] File Format Checking When Appending Data

2016-06-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 774014250 -> 55c1fac21 [SPARK-15808][SQL] File Format Checking When Appending Data What changes were proposed in this pull request? **Issue:** Got wrong results or strange errors when append data to a table with mismatched file fo
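The SPARK-15808 change makes appends fail fast when the incoming data's file format does not match the existing table's format, instead of producing wrong results later. A hedged sketch of such a check (names and message are illustrative, not Spark's actual code):

```python
def check_append_format(table_format, append_format):
    """Fail fast on a mismatched file format during append.

    Sketch of the behaviour described in SPARK-15808: appending, e.g.,
    parquet data to a table stored as json should raise an error
    instead of silently writing unreadable mixed data. Formats are
    compared case-insensitively here as an assumption.
    """
    if table_format.lower() != append_format.lower():
        raise ValueError(
            "The format of the existing table (%s) does not match "
            "the format of the data being appended (%s)"
            % (table_format, append_format))
```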

spark git commit: [SPARK-15663][SQL] SparkSession.catalog.listFunctions shouldn't include the list of built-in functions

2016-06-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/master baa3e633e -> 1842cdd4e [SPARK-15663][SQL] SparkSession.catalog.listFunctions shouldn't include the list of built-in functions ## What changes were proposed in this pull request? SparkSession.catalog.listFunctions currently returns all func

spark git commit: [SPARK-15663][SQL] SparkSession.catalog.listFunctions shouldn't include the list of built-in functions

2016-06-13 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 1a57bf0f4 -> 2841bbac4 [SPARK-15663][SQL] SparkSession.catalog.listFunctions shouldn't include the list of built-in functions ## What changes were proposed in this pull request? SparkSession.catalog.listFunctions currently returns all
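The SPARK-15663 change stops the catalog's function listing from reporting built-in functions alongside user-defined ones. A minimal sketch of that filtering (hypothetical names, not the SparkSession.catalog API):

```python
def list_user_functions(all_functions, builtin_functions):
    """Return only user-defined functions, excluding built-ins.

    Sketch of the behaviour change in SPARK-15663: the function
    listing should not include built-in functions such as abs or
    concat as if they were registered user functions.
    """
    builtins = set(builtin_functions)
    return [f for f in all_functions if f not in builtins]
```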

spark git commit: [SPARK-15914][SQL] Add deprecated method back to SQLContext for backward source code compatibility

2016-06-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 53bb03084 -> 6e8cdef0c [SPARK-15914][SQL] Add deprecated method back to SQLContext for backward source code compatibility ## What changes were proposed in this pull request? Revert partial changes in SPARK-12600, and add some deprecated m

spark git commit: [SPARK-15914][SQL] Add deprecated method back to SQLContext for backward source code compatibility

2016-06-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 e90ba2287 -> 73beb9fb3 [SPARK-15914][SQL] Add deprecated method back to SQLContext for backward source code compatibility ## What changes were proposed in this pull request? Revert partial changes in SPARK-12600, and add some deprecat

spark git commit: [SPARK-15895][SQL] Filters out metadata files while doing partition discovery

2016-06-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 515937046 -> e03c25193 [SPARK-15895][SQL] Filters out metadata files while doing partition discovery ## What changes were proposed in this pull request? Take the following directory layout as an example: ``` dir/ +- p0=0/ |-_metada

spark git commit: [SPARK-15895][SQL] Filters out metadata files while doing partition discovery

2016-06-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master df4ea6614 -> bd39ffe35 [SPARK-15895][SQL] Filters out metadata files while doing partition discovery ## What changes were proposed in this pull request? Take the following directory layout as an example: ``` dir/ +- p0=0/ |-_metadata
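The SPARK-15895 change excludes metadata files from partition discovery, so entries like `_metadata` under `p0=0/` do not confuse partition-column inference. A sketch of that path filtering, assuming the usual convention that hidden and metadata file names start with `_` or `.`:

```python
def data_paths_for_discovery(paths):
    """Drop metadata/hidden files before inferring partition columns.

    Sketch of the filtering described in SPARK-15895: files whose
    names start with '_' or '.' (e.g. _metadata, _common_metadata,
    .DS_Store) should not participate in partition discovery.
    """
    def is_data_file(path):
        name = path.rsplit("/", 1)[-1]
        return not name.startswith(("_", "."))
    return [p for p in paths if is_data_file(p)]
```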

spark git commit: [SPARK-15247][SQL] Set the default number of partitions for reading parquet schemas

2016-06-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master bd39ffe35 -> dae4d5db2 [SPARK-15247][SQL] Set the default number of partitions for reading parquet schemas ## What changes were proposed in this pull request? This pr sets the default number of partitions when reading parquet schemas. SQLC

spark git commit: [SPARK-15247][SQL] Set the default number of partitions for reading parquet schemas

2016-06-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 24539223b -> 9adba414c [SPARK-15247][SQL] Set the default number of partitions for reading parquet schemas ## What changes were proposed in this pull request? This pr sets the default number of partitions when reading parquet schemas.

svn commit: r1748776 [2/2] - in /spark: news/_posts/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2016-06-16 Thread yhuai
Modified: spark/site/news/spark-tips-from-quantifind.html URL: http://svn.apache.org/viewvc/spark/site/news/spark-tips-from-quantifind.html?rev=1748776&r1=1748775&r2=1748776&view=diff == --- spark/site/news/spark-tips-from

svn commit: r1748776 [1/2] - in /spark: news/_posts/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2016-06-16 Thread yhuai
Author: yhuai Date: Thu Jun 16 22:14:05 2016 New Revision: 1748776 URL: http://svn.apache.org/viewvc?rev=1748776&view=rev Log: Add a new news for CFP of Spark Summit 2016 EU Added: spark/news/_posts/2016-06-16-submit-talks-to-spark-summit-eu-2016.md spark/site/news/submit-talks-to-s

spark git commit: [SPARK-15706][SQL] Fix Wrong Answer when using IF NOT EXISTS in INSERT OVERWRITE for DYNAMIC PARTITION

2016-06-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 3994372f4 -> b82abde06 [SPARK-15706][SQL] Fix Wrong Answer when using IF NOT EXISTS in INSERT OVERWRITE for DYNAMIC PARTITION What changes were proposed in this pull request? `IF NOT EXISTS` in `INSERT OVERWRITE` should not suppor

spark git commit: [SPARK-15706][SQL] Fix Wrong Answer when using IF NOT EXISTS in INSERT OVERWRITE for DYNAMIC PARTITION

2016-06-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5ada60614 -> e5d703bca [SPARK-15706][SQL] Fix Wrong Answer when using IF NOT EXISTS in INSERT OVERWRITE for DYNAMIC PARTITION What changes were proposed in this pull request? `IF NOT EXISTS` in `INSERT OVERWRITE` should not support dy
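The SPARK-15706 change restricts `IF NOT EXISTS` in `INSERT OVERWRITE` to fully static partition specs. A sketch of that check, assuming a partition spec where a `None` value marks a dynamic partition (an assumption of this sketch, not Spark's internal representation):

```python
def check_insert_overwrite(if_not_exists, partition_spec):
    """Disallow IF NOT EXISTS together with dynamic partitions.

    Sketch of the rule from SPARK-15706: INSERT OVERWRITE ... IF NOT
    EXISTS is only valid when every partition value is given
    statically; here a None value marks a dynamic partition.
    """
    if if_not_exists and any(v is None for v in partition_spec.values()):
        raise ValueError(
            "IF NOT EXISTS cannot be used with dynamic partitions")
```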

spark git commit: [SPARK-16033][SQL] insertInto() can't be used together with partitionBy()

2016-06-17 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ebb9a3b6f -> 10b671447 [SPARK-16033][SQL] insertInto() can't be used together with partitionBy() ## What changes were proposed in this pull request? When inserting into an existing partitioned table, partitioning columns should always be

spark git commit: [SPARK-16033][SQL] insertInto() can't be used together with partitionBy()

2016-06-17 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 b22b20db6 -> 57feaa572 [SPARK-16033][SQL] insertInto() can't be used together with partitionBy() ## What changes were proposed in this pull request? When inserting into an existing partitioned table, partitioning columns should always
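The SPARK-16033 change makes `insertInto()` reject an accompanying `partitionBy()`, since an existing partitioned table already defines its own partitioning. A minimal sketch of that validation (error wording is illustrative, not Spark's actual message):

```python
def validate_insert(partition_by_columns):
    """Reject insertInto() combined with partitionBy().

    Sketch of the rule from SPARK-16033: when inserting into an
    existing partitioned table, the table's own partitioning is used,
    so specifying partitionBy() alongside insertInto() is an error.
    """
    if partition_by_columns:
        raise ValueError(
            "insertInto() can't be used together with partitionBy(); "
            "partition columns are defined by the existing table")
```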

spark git commit: [SPARK-16036][SPARK-16037][SQL] fix various table insertion problems

2016-06-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 f159eb521 -> 8d2fc010b [SPARK-16036][SPARK-16037][SQL] fix various table insertion problems ## What changes were proposed in this pull request? The current table insertion has some weird behaviours: 1. inserting into a partitioned tab

spark git commit: [SPARK-16036][SPARK-16037][SQL] fix various table insertion problems

2016-06-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e574c9973 -> 3d010c837 [SPARK-16036][SPARK-16037][SQL] fix various table insertion problems ## What changes were proposed in this pull request? The current table insertion has some weird behaviours: 1. inserting into a partitioned table w
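Part of the insertion behaviour cleaned up by SPARK-16036/16037 concerns how input columns line up against a partitioned table. A hedged sketch of the usual convention, under the assumption that data columns come first and partition columns are appended at the end (an illustration, not the actual planner code):

```python
def arrange_insert_columns(table_columns, partition_columns):
    """Order columns for insertion into a partitioned table.

    Sketch under an assumed convention related to SPARK-16036/16037:
    input rows supply the data columns first, followed by the
    partition columns, regardless of their order in the table
    definition.
    """
    data_columns = [c for c in table_columns if c not in partition_columns]
    return data_columns + list(partition_columns)
```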
