Re: Python Hadoop Example

Nascimento, Rodrigo Mon, 17 Jun 2019 14:08:40 -0700

Wei-Chiu,

I see people using Python with Spark (PySpark).

{
  "Name"  : "Rodrigo Nascimento",
  "Title" : "Solutions Architect – Open Ecosystems"
}

From: Wei-Chiu Chuang <[email protected]>
Date: Sunday, June 16, 2019 at 2:01 PM
To: Artem Ervits <[email protected]>
Cc: Mike IT Expert <[email protected]>, user <[email protected]>
Subject: Re: Python Hadoop Example


Thanks Artem,
Looks interesting. I honestly didn't know what the Hadoop Streaming API was used for.
Here are more references: 
https://hadoop.apache.org/docs/r3.2.0/hadoop-streaming/HadoopStreaming.html
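
For readers new to the streaming model, here is a minimal word-count sketch in
Python. It is not taken from the linked documentation; the function names
(map_lines, reduce_lines, run_locally) are made up for illustration, and the
local sort only approximates what Hadoop does between the map and reduce phases.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts for each word.

    Assumes the records arrive sorted by key, as they would after the
    shuffle/sort step of a streaming job.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in a single process for quick testing."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    for record in run_locally(["hello hadoop", "hello python"]):
        print(record)
```

In a real streaming job, each function would presumably live in its own script
that reads sys.stdin and prints to stdout, with the scripts passed through the
-mapper and -reducer options.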

I think this raises another question: how do we treat Python as a first-class 
citizen? Especially for data science use cases, Python is *the* language.
For example, we have Java, C, and (in Hadoop 3.2) C++ clients for HDFS, but 
Hadoop does not ship a Python client.
I see a number of Python libraries that support WebHDFS. It's not clear to me 
how well they perform, or whether they support more advanced features like 
encryption/Kerberos.
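
To make the WebHDFS option concrete, here is a rough sketch of talking to the
REST API with only the Python standard library. The host name and user are
placeholders, port 9870 assumes the Hadoop 3 default for the NameNode HTTP
address, and the helper names are hypothetical. It uses simple user.name
authentication only; it does not handle Kerberos/SPNEGO or TLS, which is
exactly the gap raised above.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, path, op, port=9870, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path and operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def list_status(host, path, user):
    """Return the entry names under an HDFS directory via op=LISTSTATUS.

    Needs a reachable NameNode with WebHDFS enabled and no Kerberos.
    """
    url = webhdfs_url(host, path, "LISTSTATUS", **{"user.name": user})
    with urlopen(url) as response:
        body = json.load(response)
    return [entry["pathSuffix"] for entry in body["FileStatuses"]["FileStatus"]]


if __name__ == "__main__":
    # Placeholder host and user; this only prints the URL, no network call.
    print(webhdfs_url("namenode.example.com", "/user/mike", "LISTSTATUS",
                      **{"user.name": "mike"}))
```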

The NFS gateway is a possibility. Fuse-dfs is another option. But we know they 
don't work at scale, and the community seems to have lost steam for improving 
NFS/fuse-dfs.

Thoughts?

On Sun, Jun 16, 2019 at 6:52 AM Artem Ervits <[email protected]> wrote:
https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

On Sun, Jun 16, 2019, 9:18 AM Mike IT Expert <[email protected]> wrote:
Please let me know where I can find a good, simple example of MapReduce Python 
code running on Hadoop, such as a tutorial.

Thank you
