Author: pwendell
Date: Fri May 30 09:29:14 2014
New Revision: 1598518

URL: http://svn.apache.org/r1598518
Log:
Adding release notes for Spark 1.0.0

Added:
    spark/releases/_posts/2014-05-30-spark-release-1-0-0.md
    spark/site/releases/spark-release-1-0-0.html

Added: spark/releases/_posts/2014-05-30-spark-release-1-0-0.md
URL: 
http://svn.apache.org/viewvc/spark/releases/_posts/2014-05-30-spark-release-1-0-0.md?rev=1598518&view=auto
==============================================================================
--- spark/releases/_posts/2014-05-30-spark-release-1-0-0.md (added)
+++ spark/releases/_posts/2014-05-30-spark-release-1-0-0.md Fri May 30 09:29:14 
2014
@@ -0,0 +1,183 @@
+---
+layout: post
+title: Spark Release 1.0.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+
+Spark 1.0.0 is a major release marking the start of the 1.X line. This release 
brings both a variety of new features and strong API compatibility guarantees 
throughout the 1.X line. Spark 1.0 adds a new major component, [Spark 
SQL]({{site.url}}docs/1.0.0/sql-programming-guide.html), for loading and 
manipulating structured data in Spark. It includes major extensions to all of 
Spark’s existing standard libraries 
([MLlib]({{site.url}}docs/1.0.0/mllib-guide.html), 
[Streaming]({{site.url}}docs/1.0.0/streaming-programming-guide.html), and 
[GraphX]({{site.url}}docs/1.0.0/graphx-programming-guide.html)) while also 
enhancing language support in Java and Python. Finally, Spark 1.0 brings 
operational improvements including full support for the Hadoop/YARN security 
model and a unified submission process for all supported cluster managers.
+
+You can download Spark 1.0.0 as either a 
+<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0.tgz" onClick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.0.tgz'); return false;">source package</a>
+(5 MB tgz) or a prebuilt package for 
+<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0-bin-hadoop1.tgz" onClick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.0-bin-hadoop1.tgz'); return false;">Hadoop 1 / CDH3</a>, 
+<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0-bin-cdh4.tgz" onClick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.0-bin-cdh4.tgz'); return false;">CDH4</a>, or
+<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0-bin-hadoop2.tgz" onClick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.0-bin-hadoop2.tgz'); return false;">Hadoop 2 / CDH5 / HDP2</a>
+(160 MB tgz). Release signatures and checksums are available at the official [Apache download site](http://www.apache.org/dist/spark/spark-1.0.0/).
+
+### API Stability
+Spark 1.0.0 is the first release in the 1.X major line. Spark is guaranteeing stability of its core API for all 1.X releases. Historically, Spark has been very conservative with API changes, but this guarantee codifies our commitment to application writers. The project has also clearly annotated experimental, alpha, and developer APIs to provide guidance on future API changes in newer components.
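+
+As a rough illustration, these markers appear as annotations in the Spark source (a sketch only; the annotations live in `org.apache.spark.annotation`, but the class names below are hypothetical):
+
+```scala
+import org.apache.spark.annotation.{DeveloperApi, Experimental}
+
+// Components that may still change (such as newly added modules) are
+// tagged Experimental; low-level hooks aimed at advanced developers are
+// tagged DeveloperApi. Untagged public APIs fall under the 1.X guarantee.
+@Experimental
+class ShinyNewFeature        // hypothetical class, for illustration
+
+@DeveloperApi
+trait LowLevelSchedulerHook  // hypothetical trait, for illustration
+```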
+
+### Integration with YARN Security
+For users running in secured Hadoop environments, Spark now integrates with 
the Hadoop/YARN security model. Spark will authenticate job submission, 
securely transfer HDFS credentials, and authenticate communication between 
components.
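+
+As a minimal sketch, assuming the `spark.authenticate` switch described in the 1.0 security documentation, opting in to authentication looks like this (on YARN the shared secret is generated and distributed automatically):
+
+```scala
+import org.apache.spark.{SparkConf, SparkContext}
+
+// Enable authentication of connections between Spark components.
+// On YARN, Spark generates and distributes the shared secret itself.
+val conf = new SparkConf()
+  .setAppName("SecureApp")
+  .set("spark.authenticate", "true")
+val sc = new SparkContext(conf)
+```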
+
+### Operational and Packaging Improvements
+This release significantly simplifies the process of bundling and submitting a Spark application. A new [spark-submit tool]({{site.url}}docs/1.0.0/submitting-applications.html) allows users to submit an application to any Spark cluster, including local clusters, Mesos, or YARN, through a common process. The documentation for bundling Spark applications has been substantially expanded. We’ve also added a history server for Spark’s web UI, allowing users to view Spark application data after individual applications are finished.
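+
+A minimal application written for spark-submit might look like the sketch below (the class, jar, and paths are hypothetical). Because no master URL is hard-coded, the same jar runs unchanged locally, on Mesos, or on YARN:
+
+```scala
+import org.apache.spark.{SparkConf, SparkContext}
+
+// Launched with something like:
+//   bin/spark-submit --class example.SimpleApp --master yarn-cluster \
+//     --executor-memory 2g simple-app.jar hdfs://host:9000/data.txt
+// spark-submit supplies the master, so none is set here.
+object SimpleApp {
+  def main(args: Array[String]) {
+    val conf = new SparkConf().setAppName("Simple App")
+    val sc = new SparkContext(conf)
+    println("Line count: " + sc.textFile(args(0)).count())
+    sc.stop()
+  }
+}
+```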
+
+### Spark SQL
+This release introduces [Spark 
SQL]({{site.url}}docs/1.0.0/sql-programming-guide.html) as a new alpha 
component. Spark SQL provides support for loading and manipulating structured 
data in Spark, either from external structured data sources (currently Hive and 
Parquet) or by adding a schema to an existing RDD. Spark SQL’s API 
interoperates with the RDD data model, allowing users to interleave Spark code 
with SQL statements. Under the hood, Spark SQL uses the Catalyst optimizer to 
choose an efficient execution plan, and can automatically push predicates into 
storage formats like Parquet. In future releases, Spark SQL will also provide a 
common API to other storage systems.
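+
+As a brief sketch of the new API (following the pattern in the SQL programming guide; the file, table, and class names are illustrative), a schema can be attached to an RDD of case classes and queried with SQL:
+
+```scala
+import org.apache.spark.sql.SQLContext
+
+case class Person(name: String, age: Int)
+
+val sqlContext = new SQLContext(sc)  // sc is an existing SparkContext
+import sqlContext._                  // implicit RDD-to-SchemaRDD conversion
+
+// Attach a schema to an ordinary RDD and register it as a table.
+val people = sc.textFile("people.txt")
+  .map(_.split(","))
+  .map(p => Person(p(0), p(1).trim.toInt))
+people.registerAsTable("people")
+
+// SQL results are RDDs themselves, so SQL and Spark code interleave freely.
+val teenagers = sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
+teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
+```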
+
+### MLlib Improvements
+In 1.0.0, Spark’s MLlib adds support for sparse feature vectors in Scala, 
Java, and Python. It takes advantage of sparsity in both storage and 
computation in linear methods, k-means, and naive Bayes. In addition, this 
release adds several new algorithms: scalable decision trees for both 
classification and regression, distributed matrix algorithms including SVD and 
PCA, model evaluation functions, and L-BFGS as an optimization primitive. The 
programming guide and code examples for MLlib have also been greatly expanded.
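+
+For instance, the new sparse vector type can be constructed as in this sketch (a size-5 vector with non-zeros at indices 1 and 3); linear methods, k-means, and naive Bayes then exploit the sparsity automatically:
+
+```scala
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+
+// Dense and sparse representations of the same vector.
+val dense: Vector  = Vectors.dense(0.0, 7.0, 0.0, 2.0, 0.0)
+val sparse: Vector = Vectors.sparse(5, Array(1, 3), Array(7.0, 2.0))
+```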
+
+### GraphX and Streaming Improvements
+In addition to usability and maintainability improvements, GraphX in Spark 1.0 
brings substantial performance boosts in graph loading, edge reversal, and 
neighborhood computation. These operations now require less communication and 
produce simpler RDD graphs. Spark’s Streaming module has added performance optimizations for stateful stream transformations, along with improved Flume support and automated state cleanup for long-running jobs.
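+
+A stateful transformation of the kind these optimizations target looks roughly like this running word count (a sketch; the host and port are placeholders):
+
+```scala
+import org.apache.spark.streaming.{Seconds, StreamingContext}
+import org.apache.spark.streaming.StreamingContext._
+
+val ssc = new StreamingContext(sc, Seconds(1))
+ssc.checkpoint("checkpoint")  // stateful operations require checkpointing
+
+// Maintain a running count per word across batches.
+val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
+val counts = words.map((_, 1)).updateStateByKey[Int] {
+  (newValues: Seq[Int], state: Option[Int]) =>
+    Some(state.getOrElse(0) + newValues.sum)
+}
+counts.print()
+ssc.start()
+ssc.awaitTermination()
+```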
+
+### Extended Java and Python Support
+Spark 1.0 adds support for Java 8’s [new lambda syntax](http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/Lambda-QuickStart/index.html#section2) in its Java bindings. Java 8 supports a concise syntax for writing anonymous functions, similar to the closure syntax in Scala and Python. This required small changes for users of the current Java API, which are noted in the documentation. Spark’s Python API has been extended to support several new functions. We’ve also included several stability improvements in the Python API, particularly for large datasets. PySpark now supports running on YARN as well.
+
+### Documentation
+Spark’s programming guide has been significantly expanded to centrally cover 
all supported languages and discuss more operators and aspects of the 
development life cycle. The MLlib guide has also been expanded with 
significantly more detail and examples for each algorithm, while the documents on configuration, YARN, and Mesos have also been revamped.
+
+### Smaller Changes
+- PySpark now works with more Python versions than before -- Python 2.6+ 
instead of 2.7+, and NumPy 1.4+ instead of 1.7+.
+- Spark has upgraded to Avro 1.7.6, adding support for Avro specific types.
+- Internal instrumentation has been added to allow applications to monitor and 
instrument Spark jobs.
+- Support for off-heap storage in Tachyon has been added via a special build 
target.
+- Datasets persisted with `DISK_ONLY` now write directly to disk, 
significantly improving memory usage for large datasets.
+- Intermediate state created during a Spark job is now garbage collected when 
the corresponding RDDs become unreferenced, improving performance.
+- Spark now includes a [Javadoc 
version]({{site.url}}docs/1.0.0/api/java/index.html) of all its API docs and a 
[unified Scaladoc]({{site.url}}docs/1.0.0/api/scala/index.html) for all modules.
+- A new `SparkContext.wholeTextFiles` method lets you operate on small text files as individual records (see the sketch below).
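+
+A sketch of the new method (the path is a placeholder): each file becomes a single (path, content) record rather than one record per line, which suits directories of many small files:
+
+```scala
+// Read a directory of small files as (fileName, fileContent) pairs.
+val files = sc.wholeTextFiles("hdfs://host:9000/user/data/small-files")
+val sizes = files.map { case (name, content) => (name, content.length) }
+sizes.collect().foreach(println)
+```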
+
+
+### Migrating to Spark 1.0
+While most of the Spark API remains the same as in 0.x versions, a few changes 
have been made for long-term flexibility, especially in the Java API (to 
support Java 8 lambdas). The documentation includes [migration 
information]({{site.url}}docs/1.0.0/programming-guide.html#migrating-from-pre-10-versions-of-spark)
 to upgrade your applications.
+
+### Contributors
+The following developers contributed to this release:
+
+ * Aaron Davidson -- packaging and deployment improvements, several bug fixes, 
local[*] mode
+ * Aaron Kimball -- documentation improvements
+ * Abhishek Kumar -- Python configuration fixes
+ * Ahir Reddy -- PySpark build, fixes, and cancellation support
+ * Allan Douglas R. de Oliveira -- Improvements to spark-ec2 scripts
+ * Andre Schumacher -- Parquet support and optimizations
+ * Andrew Ash -- Mesos documentation and other doc improvements, bug fixes
+ * Andrew Or -- history server (lead), garbage collection (lead), 
spark-submit, PySpark and YARN improvements
+ * Andrew Tulloch -- MLlib contributions and code clean-up
+ * Andy Konwinski -- documentation fix
+ * Anita Tailor -- Cassandra example
+ * Ankur Dave -- GraphX (lead) optimizations, documentation, and usability
+ * Archer Shao -- bug fixes
+ * Arun Ramakrishnan -- improved random sampling
+ * Baishuo -- test improvements
+ * Bernardo Gomez Palacio -- spark-shell improvements and Mesos updates
+ * Bharath Bhushan -- bug fix
+ * Bijay Bisht -- bug fixes
+ * Binh Nguyen -- dependency fix
+ * Bouke van der Bijl -- fixes for PySpark on Mesos and other Mesos fixes
+ * Bryn Keller -- improvement to HBase support and unit tests
+ * Chen Chao -- documentation, bug fix, and code clean-up
+ * Cheng Hao -- performance and feature improvements in Spark SQL
+ * Cheng Lian -- column storage and other improvements in Spark SQL
+ * Christian Lundgren -- improvement to spark-ec2 scripts
+ * DB Tsai -- L-BFGS optimizer in MLlib, MLlib documentation and fixes
+ * Dan McClary -- improvement to stats counter
+ * Daniel Darabos -- GraphX performance improvement
+ * Davis Shepherd -- bug fix
+ * Diana Carroll -- documentation and bug fix
+ * Egor Pakhomov -- local iterator for RDDs
+ * Emtiaz Ahmed -- bug fix
+ * Erik Selin -- bug fix
+ * Ethan Jewett -- documentation improvement
+ * Evan Chan -- automatic clean-up of application data
+ * Evan Sparks -- MLlib optimizations and doc improvement
+ * Frank Dai -- code clean-up in MLlib
+ * Guoquiang Li -- build improvements and several bug fixes
+ * Ghidireac -- bug fix
+ * Haoyuan Li -- Tachyon storage level for RDDs
+ * Harvey Feng -- spark-ec2 update
+ * Henry Saputra -- code clean-up
+ * Henry Cook -- Spark SQL improvements
+ * Holden Karau -- cross validation in MLlib, Python and core engine 
improvements
+ * Ivan Wick -- Mesos bug fix
+ * Jey Kottalam -- sbt build improvement
+ * Jerry Shao -- Spark metrics and Spark SQL improvements
+ * Jiacheng Guo -- bug fix
+ * Jianghan -- bug fix
+ * Jianping J Wang -- JBLAS support in MLlib
+ * Joseph E. Gonzalez -- GraphX improvements, fixes, and documentation
+ * Josh Rosen -- PySpark improvements and bug fixes
+ * Jyotiska NK -- documentation, test improvements, and bug fix
+ * Kan Zhang -- bug fixes in Spark core, SQL, and PySpark
+ * Kay Ousterhout -- bug fixes and code refactoring in scheduler
+ * Kelvin Chu -- automatic clean-up of application data
+ * Kevin Mader -- example fix
+ * Koert Kuipers -- code visibility fix
+ * Kousuke Saruta -- documentation and build fixes
+ * Kyle Ellrott -- improved memory usage for DISK_ONLY persistence
+ * Larva Boy -- approximate counts in Spark SQL
+ * Madhu Siddalingaiah -- ec2 fixes
+ * Manish Amde -- decision trees in MLlib
+ * Marcelo Vanzin -- improvements and fixes to YARN support, dependency 
clean-up
+ * Mark Grover -- build fixes
+ * Mark Hamstra -- build and dependency improvements, scheduler bug fixes
+ * Martin Jaggi -- MLlib documentation improvements
+ * Matei Zaharia -- Python versions of several MLlib algorithms, spark-submit 
improvements, bug fixes, and documentation improvements
+ * Michael Armbrust -- Spark SQL (lead), including schema support for RDDs, Catalyst optimizer, and Hive support
+ * Mridul Muralidharan -- code visibility changes and bug fixes
+ * Nan Zhu -- bug and stability fixes, code clean-up, documentation, and new 
features
+ * Neville Li -- bug fix
+ * Nick Lanham -- Tachyon bundling in distribution script
+ * Nirmal Reddy -- code clean-up
+ * OuYang Jin -- local mode and json improvements
+ * Patrick Wendell -- release manager, build improvements, bug fixes, and code 
clean-up
+ * Petko Nikolov -- new utility functions
+ * Prabeesh K -- typo fix
+ * Prabin Banka -- new PySpark APIs
+ * Prashant Sharma -- PySpark improvements, Java 8 lambda support, and build 
improvements
+ * Punya Biswal -- Java API improvements
+ * Qiuzhuang Lian -- bug fixes
+ * Rahul Singhal -- build improvements, bug fixes
+ * Raymond Liu -- YARN build fixes and UI improvements
+ * Reynold Xin -- bug fixes, internal changes, Spark SQL improvements, build 
fixes, and style improvements
+ * Reza Zadeh -- SVD implementation in MLlib and other MLlib contributions
+ * Roman Pastukhov -- clean-up of broadcast files
+ * Rong Gu -- Tachyon storage level for RDDs
+ * Sandeep Singh -- several bug fixes, MLlib improvements, and fixes to Spark examples
+ * Sandy Ryza -- spark-submit script and several YARN improvements
+ * Saurabh Rawat -- Java API improvements
+ * Sean Owen -- several build improvements, code clean-up, and MLlib fixes
+ * Semih Salihoglu -- GraphX improvements
+ * Shaocun Tian -- bug fix in MLlib
+ * Shivaram Venkataraman -- bug fixes
+ * Shixiong Zhu -- code style and correctness fixes
+ * Shiyun Wxm -- typo fix
+ * Stevo Slavic -- bug fix
+ * Sumedh Mungee -- documentation fix
+ * Sundeep Narravula -- “cancel” button in Spark UI
+ * Takuya Ueshin -- bug fixes and improvements to Spark SQL
+ * Tathagata Das -- web UI and other improvements to Spark Streaming (lead), 
bug fixes, state clean-up, and release manager
+ * Timothy Chen -- Spark SQL improvements
+ * Ted Malaska -- improved Flume support
+ * Tom Graves -- Hadoop security integration (lead) and YARN support
+ * Tianshuo Deng -- bug fix
+ * Tor Myklebust -- improvements to ALS
+ * Wangfei -- Spark SQL docs
+ * Wang Tao -- code clean-up
+ * William Bendon -- JSON support changes and bug fixes
+ * Xiangrui Meng -- several improvements to MLlib (lead)
+ * Xuan Nguyen -- build fix
+ * Xusen Yin -- MLlib contributions and bug fix
+ * Ye Xianjin -- test fixes
+ * Yinan Li -- addFile improvement
+ * Yin Hua -- Spark SQL improvements
+ * Zheng Peng -- bug fixes
+
+_Thanks to everyone who contributed!_

Added: spark/site/releases/spark-release-1-0-0.html
URL: 
http://svn.apache.org/viewvc/spark/site/releases/spark-release-1-0-0.html?rev=1598518&view=auto
==============================================================================
--- spark/site/releases/spark-release-1-0-0.html (added)
+++ spark/site/releases/spark-release-1-0-0.html Fri May 30 09:29:14 2014
@@ -0,0 +1,361 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="utf-8">
+  <meta http-equiv="X-UA-Compatible" content="IE=edge">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+  <title>
+     Spark Release 1.0.0 | Apache Spark
+    
+  </title>
+
+  
+
+  <!-- Bootstrap core CSS -->
+  <link href="/css/cerulean.min.css" rel="stylesheet">
+  <link href="/css/custom.css" rel="stylesheet">
+
+  <script type="text/javascript">
+  <!-- Google Analytics initialization -->
+  var _gaq = _gaq || [];
+  _gaq.push(['_setAccount', 'UA-32518208-2']);
+  _gaq.push(['_trackPageview']);
+  (function() {
+    var ga = document.createElement('script'); ga.type = 'text/javascript'; 
ga.async = true;
+    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 
'http://www') + '.google-analytics.com/ga.js';
+    var s = document.getElementsByTagName('script')[0]; 
s.parentNode.insertBefore(ga, s);
+  })();
+
+  <!-- Adds slight delay to links to allow async reporting -->
+  function trackOutboundLink(link, category, action) {  
+    try { 
+      _gaq.push(['_trackEvent', category , action]); 
+    } catch(err){}
+ 
+    setTimeout(function() {
+      document.location.href = link.href;
+    }, 100);
+  }
+  </script>
+
+  <!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media queries -->
+  <!--[if lt IE 9]>
+  <script src="https://oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
+  <script src="https://oss.maxcdn.com/libs/respond.js/1.3.0/respond.min.js"></script>
+  <![endif]-->
+</head>
+
+<body>
+
+<div class="container" style="max-width: 1200px;">
+
+<div class="masthead">
+  
+    <p class="lead">
+      <a href="/">
+      <img src="/images/spark-logo.png"
+        style="height:100px; width:auto; vertical-align: bottom; margin-top: 
20px;"></a><span class="tagline">
+          Lightning-fast cluster computing
+      </span>
+    </p>
+  
+</div>
+
+<nav class="navbar navbar-default" role="navigation">
+  <!-- Brand and toggle get grouped for better mobile display -->
+  <div class="navbar-header">
+    <button type="button" class="navbar-toggle" data-toggle="collapse"
+            data-target="#navbar-collapse-1">
+      <span class="sr-only">Toggle navigation</span>
+      <span class="icon-bar"></span>
+      <span class="icon-bar"></span>
+      <span class="icon-bar"></span>
+    </button>
+  </div>
+
+  <!-- Collect the nav links, forms, and other content for toggling -->
+  <div class="collapse navbar-collapse" id="navbar-collapse-1">
+    <ul class="nav navbar-nav">
+      <li><a href="/downloads.html">Download</a></li>
+      <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">
+          Related Projects <b class="caret"></b>
+        </a>
+        <ul class="dropdown-menu">
+          
+          <li><a href="http://shark.cs.berkeley.edu";>Shark (SQL)</a></li>
+          <li><a href="/streaming/">Spark Streaming</a></li>
+          <li><a href="/mllib/">MLlib (machine learning)</a></li>
+          <li><a href="http://amplab.github.io/graphx/";>GraphX (graph)</a></li>
+        </ul>
+      </li>
+      <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">
+          Documentation <b class="caret"></b>
+        </a>
+        <ul class="dropdown-menu">
+          <li><a href="/documentation.html">Overview</a></li>
+          <li><a href="/docs/latest/">Latest Release</a></li>
+          <li><a href="/examples.html">Examples</a></li>
+        </ul>
+      </li>
+      <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">
+          Community <b class="caret"></b>
+        </a>
+        <ul class="dropdown-menu">
+          <li><a href="/community.html">Mailing Lists</a></li>
+          <li><a href="/community.html#events">Events and Meetups</a></li>
+          <li><a href="/community.html#history">Project History</a></li>
+          <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark">Powered By</a></li>
+        </ul>
+      </li>
+      <li><a href="/faq.html">FAQ</a></li>
+    </ul>
+  </div>
+  <!-- /.navbar-collapse -->
+</nav>
+
+
+<div class="row">
+  <div class="col-md-3 col-md-push-9">
+    <div class="news" style="margin-bottom: 20px;">
+      <h5>Latest News</h5>
+      <ul class="list-unstyled">
+        
+          <li><a href="/news/spark-summit-agenda-posted.html">Spark Summit 
agenda posted</a>
+          <span class="small">(May 11, 2014)</span></li>
+        
+          <li><a href="/news/spark-0-9-1-released.html">Spark 0.9.1 
released</a>
+          <span class="small">(Apr 09, 2014)</span></li>
+        
+          <li><a 
href="/news/submit-talks-to-spark-summit-2014.html">Submissions and 
registration open for Spark Summit 2014</a>
+          <span class="small">(Mar 20, 2014)</span></li>
+        
+          <li><a href="/news/spark-becomes-tlp.html">Spark becomes top-level 
Apache project</a>
+          <span class="small">(Feb 27, 2014)</span></li>
+        
+      </ul>
+      <p class="small" style="text-align: right;"><a 
href="/news/index.html">Archive</a></p>
+    </div>
+    <div class="hidden-xs hidden-sm">
+      <a href="/downloads.html" class="btn btn-success btn-lg btn-block" 
style="margin-bottom: 30px;">
+        Download Spark
+      </a>
+      <p style="font-size: 16px; font-weight: 500; color: #555;">
+        Related Projects:
+      </p>
+      <ul class="list-narrow">
+        
+        <li><a href="http://shark.cs.berkeley.edu";>Shark (SQL)</a></li>
+        <li><a href="/streaming/">Spark Streaming</a></li>
+        <li><a href="/mllib/">MLlib (machine learning)</a></li>
+        <li><a href="http://amplab.github.io/graphx/";>GraphX (graph)</a></li>
+      </ul>
+    </div>
+  </div>
+
+  <div class="col-md-9 col-md-pull-3">
+    <h2>Spark Release 1.0.0</h2>
+
+
+<p>Spark 1.0.0 is a major release marking the start of the 1.X line. This 
release brings both a variety of new features and strong API compatibility 
guarantees throughout the 1.X line. Spark 1.0 adds a new major component, <a 
href="/docs/1.0.0/sql-programming-guide.html">Spark SQL</a>, for loading and 
manipulating structured data in Spark. It includes major extensions to all of 
Spark’s existing standard libraries (<a href="/docs/1.0.0/mllib-guide.html">MLlib</a>, <a 
href="/docs/1.0.0/streaming-programming-guide.html">Streaming</a>, and <a 
href="/docs/1.0.0/graphx-programming-guide.html">GraphX</a>) while also 
enhancing language support in Java and Python. Finally, Spark 1.0 brings 
operational improvements including full support for the Hadoop/YARN security 
model and a unified submission process for all supported cluster managers.</p>
+
+<p>You can download Spark 1.0.0 as either a 
+<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0.tgz" onclick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.0.tgz'); return false;">source package</a>
+(5 MB tgz) or a prebuilt package for 
+<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0-bin-hadoop1.tgz" onclick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.0-bin-hadoop1.tgz'); return false;">Hadoop 1 / CDH3</a>, 
+<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0-bin-cdh4.tgz" onclick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.0-bin-cdh4.tgz'); return false;">CDH4</a>, or
+<a href="http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0-bin-hadoop2.tgz" onclick="trackOutboundLink(this, 'Release Download Links', 'cloudfront_spark-1.0.0-bin-hadoop2.tgz'); return false;">Hadoop 2 / CDH5 / HDP2</a>
+(160 MB tgz). Release signatures and checksums are available at the official <a href="http://www.apache.org/dist/spark/spark-1.0.0/">Apache download site</a>.</p>
+
+<h3 id="api-stability">API Stability</h3>
+<p>Spark 1.0.0 is the first release in the 1.X major line. Spark is guaranteeing stability of its core API for all 1.X releases. Historically, Spark has been very conservative with API changes, but this guarantee codifies our commitment to application writers. The project has also clearly annotated experimental, alpha, and developer APIs to provide guidance on future API changes in newer components.</p>
+
+<h3 id="integration-with-yarn-security">Integration with YARN Security</h3>
+<p>For users running in secured Hadoop environments, Spark now integrates with 
the Hadoop/YARN security model. Spark will authenticate job submission, 
securely transfer HDFS credentials, and authenticate communication between 
components.</p>
+
+<h3 id="operational-and-packaging-improvements">Operational and Packaging 
Improvements</h3>
+<p>This release significantly simplifies the process of bundling and 
submitting a Spark application. A new <a 
href="/docs/1.0.0/submitting-applications.html">spark-submit tool</a> allows 
users to submit an application to any Spark cluster, including local clusters, 
Mesos, or YARN, through a common process. The documentation for bundling Spark 
applications has been substantially expanded. We’ve also added a history 
server for Spark’s web UI, allowing users to view Spark application data 
after individual applications are finished.</p>
+
+<h3 id="spark-sql">Spark SQL</h3>
+<p>This release introduces <a 
href="/docs/1.0.0/sql-programming-guide.html">Spark SQL</a> as a new alpha 
component. Spark SQL provides support for loading and manipulating structured 
data in Spark, either from external structured data sources (currently Hive and 
Parquet) or by adding a schema to an existing RDD. Spark SQL’s API 
interoperates with the RDD data model, allowing users to interleave Spark code 
with SQL statements. Under the hood, Spark SQL uses the Catalyst optimizer to 
choose an efficient execution plan, and can automatically push predicates into 
storage formats like Parquet. In future releases, Spark SQL will also provide a 
common API to other storage systems.</p>
+
+<h3 id="mllib-improvements">MLlib Improvements</h3>
+<p>In 1.0.0, Spark’s MLlib adds support for sparse feature vectors in Scala, 
Java, and Python. It takes advantage of sparsity in both storage and 
computation in linear methods, k-means, and naive Bayes. In addition, this 
release adds several new algorithms: scalable decision trees for both 
classification and regression, distributed matrix algorithms including SVD and 
PCA, model evaluation functions, and L-BFGS as an optimization primitive. The 
programming guide and code examples for MLlib have also been greatly 
expanded.</p>
+
+<h3 id="graphx-and-streaming-improvements">GraphX and Streaming 
Improvements</h3>
+<p>In addition to usability and maintainability improvements, GraphX in Spark 
1.0 brings substantial performance boosts in graph loading, edge reversal, and 
neighborhood computation. These operations now require less communication and 
produce simpler RDD graphs. Spark’s Streaming module has added performance optimizations for stateful stream transformations, along with improved Flume support and automated state cleanup for long-running jobs.</p>
+
+<h3 id="extended-java-and-python-support">Extended Java and Python Support</h3>
+<p>Spark 1.0 adds support for Java 8’s <a href="http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/Lambda-QuickStart/index.html#section2">new lambda syntax</a> in its Java bindings. Java 8 supports a concise syntax for writing anonymous functions, similar to the closure syntax in Scala and Python. This required small changes for users of the current Java API, which are noted in the documentation. Spark’s Python API has been extended to support several new functions. We’ve also included several stability improvements in the Python API, particularly for large datasets. PySpark now supports running on YARN as well.</p>
+
+<h3 id="documentation">Documentation</h3>
+<p>Spark’s programming guide has been significantly expanded to centrally 
cover all supported languages and discuss more operators and aspects of the 
development life cycle. The MLlib guide has also been expanded with 
significantly more detail and examples for each algorithm, while the documents on configuration, YARN, and Mesos have also been revamped.</p>
+
+<h3 id="smaller-changes">Smaller Changes</h3>
+<ul>
+  <li>PySpark now works with more Python versions than before &#8211; Python 
2.6+ instead of 2.7+, and NumPy 1.4+ instead of 1.7+.</li>
+  <li>Spark has upgraded to Avro 1.7.6, adding support for Avro specific 
types.</li>
+  <li>Internal instrumentation has been added to allow applications to monitor 
and instrument Spark jobs.</li>
+  <li>Support for off-heap storage in Tachyon has been added via a special 
build target.</li>
+  <li>Datasets persisted with <code>DISK_ONLY</code> now write directly to 
disk, significantly improving memory usage for large datasets.</li>
+  <li>Intermediate state created during a Spark job is now garbage collected 
when the corresponding RDDs become unreferenced, improving performance.</li>
+  <li>Spark now includes a <a href="/docs/1.0.0/api/java/index.html">Javadoc 
version</a> of all its API docs and a <a 
href="/docs/1.0.0/api/scala/index.html">unified Scaladoc</a> for all 
modules.</li>
+  <li>A new <code>SparkContext.wholeTextFiles</code> method lets you operate on small text files as individual records.</li>
+</ul>
+
+<h3 id="migrating-to-spark-10">Migrating to Spark 1.0</h3>
+<p>While most of the Spark API remains the same as in 0.x versions, a few 
changes have been made for long-term flexibility, especially in the Java API 
(to support Java 8 lambdas). The documentation includes <a 
href="/docs/1.0.0/programming-guide.html#migrating-from-pre-10-versions-of-spark">migration
 information</a> to upgrade your applications.</p>
+
+<h3 id="contributors">Contributors</h3>
+<p>The following developers contributed to this release:</p>
+
+<ul>
+  <li>Aaron Davidson &#8211; packaging and deployment improvements, several 
bug fixes, local[*] mode</li>
+  <li>Aaron Kimball &#8211; documentation improvements</li>
+  <li>Abhishek Kumar &#8211; Python configuration fixes</li>
+  <li>Ahir Reddy &#8211; PySpark build, fixes, and cancellation support</li>
+  <li>Allan Douglas R. de Oliveira &#8211; Improvements to spark-ec2 
scripts</li>
+  <li>Andre Schumacher &#8211; Parquet support and optimizations</li>
+  <li>Andrew Ash &#8211; Mesos documentation and other doc improvements, bug 
fixes</li>
+  <li>Andrew Or &#8211; history server (lead), garbage collection (lead), 
spark-submit, PySpark and YARN improvements</li>
+  <li>Andrew Tulloch &#8211; MLlib contributions and code clean-up</li>
+  <li>Andy Konwinski &#8211; documentation fix</li>
+  <li>Anita Tailor &#8211; Cassandra example</li>
+  <li>Ankur Dave &#8211; GraphX (lead) optimizations, documentation, and 
usability</li>
+  <li>Archer Shao &#8211; bug fixes</li>
+  <li>Arun Ramakrishnan &#8211; improved random sampling</li>
+  <li>Baishuo &#8211; test improvements</li>
+  <li>Bernardo Gomez Palacio &#8211; spark-shell improvements and Mesos 
updates</li>
+  <li>Bharath Bhushan &#8211; bug fix</li>
+  <li>Bijay Bisht &#8211; bug fixes</li>
+  <li>Binh Nguyen &#8211; dependency fix</li>
+  <li>Bouke van der Bijl &#8211; fixes for PySpark on Mesos and other Mesos 
fixes</li>
+  <li>Bryn Keller &#8211; improvement to HBase support and unit tests</li>
+  <li>Chen Chao &#8211; documentation, bug fix, and code clean-up</li>
+  <li>Cheng Hao &#8211; performance and feature improvements in Spark SQL</li>
+  <li>Cheng Lian &#8211; column storage and other improvements in Spark 
SQL</li>
+  <li>Christian Lundgren &#8211; improvement to spark-ec2 scripts</li>
+  <li>DB Tsai &#8211; L-BFGS optimizer in MLlib, MLlib documentation and fixes</li>
+  <li>Dan McClary &#8211; improvement to stats counter</li>
+  <li>Daniel Darabos &#8211; GraphX performance improvement</li>
+  <li>Davis Shepherd &#8211; bug fix</li>
+  <li>Diana Carroll &#8211; documentation and bug fix</li>
+  <li>Egor Pakhomov &#8211; local iterator for RDDs</li>
+  <li>Emtiaz Ahmed &#8211; bug fix</li>
+  <li>Erik Selin &#8211; bug fix</li>
+  <li>Ethan Jewett &#8211; documentation improvement</li>
+  <li>Evan Chan &#8211; automatic clean-up of application data</li>
+  <li>Evan Sparks &#8211; MLlib optimizations and doc improvement</li>
+  <li>Frank Dai &#8211; code clean-up in MLlib</li>
+  <li>Guoquiang Li &#8211; build improvements and several bug fixes</li>
+  <li>Ghidireac &#8211; bug fix</li>
+  <li>Haoyuan Li &#8211; Tachyon storage level for RDDs</li>
+  <li>Harvey Feng &#8211; spark-ec2 update</li>
+  <li>Henry Saputra &#8211; code clean-up</li>
+  <li>Henry Cook &#8211; Spark SQL improvements</li>
+  <li>Holden Karau &#8211; cross validation in MLlib, Python and core engine 
improvements</li>
+  <li>Ivan Wick &#8211; Mesos bug fix</li>
+  <li>Jey Kottalam &#8211; sbt build improvement</li>
+  <li>Jerry Shao &#8211; Spark metrics and Spark SQL improvements</li>
+  <li>Jiacheng Guo &#8211; bug fix</li>
+  <li>Jianghan &#8211; bug fix</li>
+  <li>Jianping J Wang &#8211; JBLAS support in MLlib</li>
+  <li>Joseph E. Gonzalez &#8211; GraphX improvements, fixes, and 
documentation</li>
+  <li>Josh Rosen &#8211; PySpark improvements and bug fixes</li>
+  <li>Jyotiska NK &#8211; documentation, test improvements, and bug fix</li>
+  <li>Kan Zhang &#8211; bug fixes in Spark core, SQL, and PySpark</li>
+  <li>Kay Ousterhout &#8211; bug fixes and code refactoring in scheduler</li>
+  <li>Kelvin Chu &#8211; automatic clean-up of application data</li>
+  <li>Kevin Mader &#8211; example fix</li>
+  <li>Koert Kuipers &#8211; code visibility fix</li>
+  <li>Kousuke Saruta &#8211; documentation and build fixes</li>
+  <li>Kyle Ellrott &#8211; improved memory usage for DISK_ONLY persistence</li>
+  <li>Larva Boy &#8211; approximate counts in Spark SQL</li>
+  <li>Madhu Siddalingaiah &#8211; ec2 fixes</li>
+  <li>Manish Amde &#8211; decision trees in MLlib</li>
+  <li>Marcelo Vanzin &#8211; improvements and fixes to YARN support, 
dependency clean-up</li>
+  <li>Mark Grover &#8211; build fixes</li>
+  <li>Mark Hamstra &#8211; build and dependency improvements, scheduler bug 
fixes</li>
+  <li>Martin Jaggi &#8211; MLlib documentation improvements</li>
+  <li>Matei Zaharia &#8211; Python versions of several MLlib algorithms, 
spark-submit improvements, bug fixes, and documentation improvements</li>
+  <li>Michael Armbrust &#8211; Spark SQL (lead), including schema support for RDDs, Catalyst optimizer, and Hive support</li>
+  <li>Mridul Muralidharan &#8211; code visibility changes and bug fixes</li>
+  <li>Nan Zhu &#8211; bug and stability fixes, code clean-up, documentation, 
and new features</li>
+  <li>Neville Li &#8211; bug fix</li>
+  <li>Nick Lanham &#8211; Tachyon bundling in distribution script</li>
+  <li>Nirmal Reddy &#8211; code clean-up</li>
+  <li>OuYang Jin &#8211; local mode and json improvements</li>
+  <li>Patrick Wendell &#8211; release manager, build improvements, bug fixes, 
and code clean-up</li>
+  <li>Petko Nikolov &#8211; new utility functions</li>
+  <li>Prabeesh K &#8211; typo fix</li>
+  <li>Prabin Banka &#8211; new PySpark APIs</li>
+  <li>Prashant Sharma &#8211; PySpark improvements, Java 8 lambda support, and 
build improvements</li>
+  <li>Punya Biswal &#8211; Java API improvements</li>
+  <li>Qiuzhuang Lian &#8211; bug fixes</li>
+  <li>Rahul Singhal &#8211; build improvements, bug fixes</li>
+  <li>Raymond Liu &#8211; YARN build fixes and UI improvements</li>
+  <li>Reynold Xin &#8211; bug fixes, internal changes, Spark SQL improvements, 
build fixes, and style improvements</li>
+  <li>Reza Zadeh &#8211; SVD implementation in MLlib and other MLlib 
contributions</li>
+  <li>Roman Pastukhov &#8211; clean-up of broadcast files</li>
+  <li>Rong Gu &#8211; Tachyon storage level for RDDs</li>
+  <li>Sandeep Singh &#8211; several bug fixes, MLlib improvements, and fixes to Spark examples</li>
+  <li>Sandy Ryza &#8211; spark-submit script and several YARN improvements</li>
+  <li>Saurabh Rawat &#8211; Java API improvements</li>
+  <li>Sean Owen &#8211; several build improvements, code clean-up, and MLlib 
fixes</li>
+  <li>Semih Salihoglu &#8211; GraphX improvements</li>
+  <li>Shaocun Tian &#8211; bug fix in MLlib</li>
+  <li>Shivaram Venkataraman &#8211; bug fixes</li>
+  <li>Shixiong Zhu &#8211; code style and correctness fixes</li>
+  <li>Shiyun Wxm &#8211; typo fix</li>
+  <li>Stevo Slavic &#8211; bug fix</li>
+  <li>Sumedh Mungee &#8211; documentation fix</li>
+  <li>Sundeep Narravula &#8211; “cancel” button in Spark UI</li>
+  <li>Takuya Ueshin &#8211; bug fixes and improvements to Spark SQL</li>
+  <li>Tathagata Das &#8211; web UI and other improvements to Spark Streaming 
(lead), bug fixes, state clean-up, and release manager</li>
+  <li>Timothy Chen &#8211; Spark SQL improvements</li>
+  <li>Ted Malaska &#8211; improved Flume support</li>
+  <li>Tom Graves &#8211; Hadoop security integration (lead) and YARN 
support</li>
+  <li>Tianshuo Deng &#8211; bug fix</li>
+  <li>Tor Myklebust &#8211; improvements to ALS</li>
+  <li>Wangfei &#8211; Spark SQL docs</li>
+  <li>Wang Tao &#8211; code clean-up</li>
+  <li>William Bendon &#8211; JSON support changes and bug fixes</li>
+  <li>Xiangrui Meng &#8211; several improvements to MLlib (lead)</li>
+  <li>Xuan Nguyen &#8211; build fix</li>
+  <li>Xusen Yin &#8211; MLlib contributions and bug fix</li>
+  <li>Ye Xianjin &#8211; test fixes</li>
+  <li>Yinan Li &#8211; addFile improvement</li>
+  <li>Yin Hua &#8211; Spark SQL improvements</li>
+  <li>Zheng Peng &#8211; bug fixes</li>
+</ul>
+
+<p><em>Thanks to everyone who contributed!</em></p>
+
+
+<p>
+<br/>
+<a href="/news/">Spark News Archive</a>
+</p>
+
+  </div>
+</div>
+
+
+
+<footer class="small">
+  <hr>
+  Apache Spark, Spark, Apache, and the Spark logo are trademarks of
+  <a href="http://www.apache.org";>The Apache Software Foundation</a>.
+</footer>
+
+</div>
+
+<script src="https://code.jquery.com/jquery.js";></script>
+<script 
src="//netdna.bootstrapcdn.com/bootstrap/3.0.3/js/bootstrap.min.js"></script>
+<script src="/js/lang-tabs.js"></script>
+
+</body>
+</html>

