Repository: spark
Updated Branches:
  refs/heads/master 2550533a2 -> b52603b03
[SPARK-2013] Documentation for saveAsPickleFile and pickleFile in Python

Author: Kan Zhang <[email protected]>

Closes #983 from kanzhang/SPARK-2013 and squashes the following commits:

0e128bb [Kan Zhang] [SPARK-2013] minor update
e728516 [Kan Zhang] [SPARK-2013] Documentation for saveAsPickleFile and pickleFile in Python

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b52603b0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b52603b0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b52603b0

Branch: refs/heads/master
Commit: b52603b039cdfa0f8e58ef3c6229d79e732ffc58
Parents: 2550533
Author: Kan Zhang <[email protected]>
Authored: Sat Jun 14 13:22:30 2014 -0700
Committer: Reynold Xin <[email protected]>
Committed: Sat Jun 14 13:22:30 2014 -0700

----------------------------------------------------------------------
 docs/programming-guide.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/b52603b0/docs/programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 7978468..ef0c0e3 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -377,13 +377,15 @@ Some notes on reading files with Spark:
 
 * The `textFile` method also takes an optional second argument for controlling the number of slices of the file. By default, Spark creates one slice for each block of the file (blocks being 64MB by default in HDFS), but you can also ask for a higher number of slices by passing a larger value. Note that you cannot have fewer slices than blocks.
 
-Apart from reading files as a collection of lines,
-`SparkContext.wholeTextFiles` lets you read a directory containing multiple small text files, and returns each of them as (filename, content) pairs. This is in contrast with `textFile`, which would return one record per line in each file.
+Apart from text files, Spark's Python API also supports several other data formats:
 
-### SequenceFile and Hadoop InputFormats
+* `SparkContext.wholeTextFiles` lets you read a directory containing multiple small text files, and returns each of them as (filename, content) pairs. This is in contrast with `textFile`, which would return one record per line in each file.
+
+* `RDD.saveAsPickleFile` and `SparkContext.pickleFile` support saving an RDD in a simple format consisting of pickled Python objects. Batching is used on pickle serialization, with default batch size 10.
 
-In addition to reading text files, PySpark supports reading ```SequenceFile```
-and any arbitrary ```InputFormat```.
+* Details on reading `SequenceFile` and arbitrary Hadoop `InputFormat` are given below.
+
+### SequenceFile and Hadoop InputFormats
 
 **Note** this feature is currently marked ```Experimental``` and is intended for advanced users. It may be replaced in future with read/write support based on SparkSQL, in which case SparkSQL is the preferred approach.
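The batching behavior the new `saveAsPickleFile` bullet describes (pickling records in groups, default batch size 10) can be sketched in plain Python, outside Spark. The helpers `pickle_batches` and `unpickle_batches` below are illustrative names, not PySpark API, and the real `saveAsPickleFile` writes the pickled batches into a SequenceFile on disk rather than returning them as a list:

```python
import pickle

def pickle_batches(records, batch_size=10):
    """Pickle records in groups of `batch_size`, mirroring the batched
    pickle serialization described for saveAsPickleFile."""
    return [pickle.dumps(records[i:i + batch_size])
            for i in range(0, len(records), batch_size)]

def unpickle_batches(blobs):
    """Flatten pickled batches back into the original record stream,
    as pickleFile would when reading."""
    records = []
    for blob in blobs:
        records.extend(pickle.loads(blob))
    return records

data = list(range(25))
blobs = pickle_batches(data)   # 25 records -> 3 pickled batches of <= 10
print(len(blobs))              # 3
assert unpickle_batches(blobs) == data
```

Batching amortizes pickle overhead across records instead of paying it once per element, which is why a batch size is exposed at all.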

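The `wholeTextFiles` vs. `textFile` contrast in the reworded bullet can likewise be illustrated with a local sketch. The functions below are hypothetical stand-ins for the PySpark methods, operating on an ordinary directory instead of an RDD:

```python
import os
import tempfile

def whole_text_files(directory):
    """Like SparkContext.wholeTextFiles: one (filename, content) pair
    per file in the directory."""
    pairs = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        with open(path) as f:
            pairs.append((path, f.read()))
    return pairs

def text_file(directory):
    """Like SparkContext.textFile on a directory: one record per line
    across all files, with filenames discarded."""
    lines = []
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as f:
            lines.extend(f.read().splitlines())
    return lines

# Two small files, three lines total.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "a.txt"), "w") as f:
        f.write("line1\nline2\n")
    with open(os.path.join(d, "b.txt"), "w") as f:
        f.write("line3\n")
    print(len(whole_text_files(d)))  # 2 -- one record per file
    print(len(text_file(d)))         # 3 -- one record per line
```

This is why `wholeTextFiles` suits many small files whose identity matters, while `textFile` suits line-oriented data.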