spark git commit: [SPARK-7035] Encourage __getitem__ over __getattr__ on column access in the Python DataFrame API

rxin Thu, 07 May 2015 01:03:01 -0700

Repository: spark
Updated Branches:
  refs/heads/master fa8fddffd -> fae4e2d60



[SPARK-7035] Encourage __getitem__ over __getattr__ on column access in the 
Python DataFrame API

Author: ksonj <[email protected]>

Closes #5971 from ksonj/doc and squashes the following commits:

dadfebb [ksonj] __getitem__ is cleaner than __getattr__


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fae4e2d6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fae4e2d6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fae4e2d6

Branch: refs/heads/master
Commit: fae4e2d6094de57a438ee4188ce47fc5b01b96fe
Parents: fa8fddf
Author: ksonj <[email protected]>
Authored: Thu May 7 01:02:00 2015 -0700
Committer: Reynold Xin <[email protected]>
Committed: Thu May 7 01:02:00 2015 -0700

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/fae4e2d6/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index b8233ae..df4c123 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -139,7 +139,6 @@ DataFrames provide a domain-specific language for structured data manipulation i
 
 Here we include some basic examples of structured data processing using DataFrames:
 
-
 <div class="codetabs">
 <div data-lang="scala"  markdown="1">
 {% highlight scala %}
@@ -242,6 +241,12 @@ df.groupBy("age").count().show();
 </div>
 
 <div data-lang="python"  markdown="1">
+In Python it's possible to access a DataFrame's columns either by attribute
+(`df.age`) or by indexing (`df['age']`). While the former is convenient for
+interactive data exploration, users are highly encouraged to use the
+latter form, which is future proof and won't break with column names that
+are also attributes on the DataFrame class.
+
 {% highlight python %}
 from pyspark.sql import SQLContext
 sqlContext = SQLContext(sc)
@@ -270,14 +275,14 @@ df.select("name").show()
 ## Justin
 
 # Select everybody, but increment the age by 1
-df.select(df.name, df.age + 1).show()
+df.select(df['name'], df['age'] + 1).show()
 ## name    (age + 1)
 ## Michael null
 ## Andy    31
 ## Justin  20
 
 # Select people older than 21
-df.filter(df.age > 21).show()
+df.filter(df['age'] > 21).show()
 ## age name
 ## 30  Andy
 


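The paragraph added to the guide says that indexing "won't break with column names that are also attributes on the DataFrame class". A minimal sketch of why is given below. It uses a hypothetical `TinyFrame` class as a toy stand-in, not PySpark's actual implementation, and relies only on the Python rule that `__getattr__` is consulted after normal attribute lookup has failed. The column names and values are made up for illustration.

```python
class TinyFrame:
    """Hypothetical toy stand-in for a DataFrame (not PySpark's code)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def count(self):
        # An ordinary method, playing the role of DataFrame.count().
        first_column = next(iter(self._columns.values()), [])
        return len(first_column)

    def __getitem__(self, name):
        # Indexing always performs a column lookup.
        return self._columns[name]

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so real attributes and methods shadow same-named columns.
        columns = self.__dict__.get("_columns", {})
        try:
            return columns[name]
        except KeyError:
            raise AttributeError(name)


frame = TinyFrame({"age": [None, 30, 19], "count": [1, 2, 3]})

age_by_attr = frame.age          # no attribute named "age": falls back to the column
count_by_index = frame["count"]  # always the column
count_by_attr = frame.count      # the bound method, not the column
```

In this sketch `frame.age` and `frame["age"]` agree, but `frame.count` resolves to the method while `frame["count"]` still returns the column, which is the kind of collision the documentation change warns about.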