Hello,

The script is a simple Pig script that reads files from a remote HDFS and then 
processes them. It runs three MapReduce jobs on YARN.
On the cluster it takes several hours to run on the dataset.
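Since the job runs for hours, this is the detach pattern I had in mind, following the nohup suggestion below. A sketch with a stand-in command: the real invocation would be `nohup pig -x mapreduce myscript.pig > pig.log 2>&1 &` (script and log names are hypothetical).

```shell
# Detach a long-running client from the terminal: nohup ignores the
# hangup signal, output is redirected to a log file, and '&' backgrounds
# the process. 'sleep 2' stands in for 'pig -x mapreduce myscript.pig'.
nohup sleep 2 > pig.log 2>&1 &
PIG_PID=$!
echo "launched as PID $PIG_PID"
# The prompt returns immediately; the process survives logout.
```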

Do you mean that I have to run the script on the server?

Thanks,
Regards

> On 16 Feb 2020, at 04:31, Tushar Kapila <[email protected]> wrote:
> 
> Depends on what the script does. If it's launching a job on a remote cluster, 
> then yes.
> 
> But if the script does something more and needs to run for longer, then no.
> 
> But if the script is on a remote system (not what you asked, but as an 
> alternative) see 
> https://stackoverflow.com/questions/39574653/error-executing-pigserver-in-java
> On Sat, 15 Feb 2020, 19:16 Daniel Santos <[email protected]> wrote:
> Hello,
> 
> What I was thinking was: launch the Pig script from my laptop, the Hadoop 
> cluster would be left executing it, and I could shut down the laptop.
> 
> Is this possible?
> 
> Thanks,
> Regards
> 
>> On 12 Feb 2020, at 02:06, Shashwat Shriparv <[email protected]> wrote:
>> 
>> nohup <your pig command> &
>> 
>> 
>> Warm Regards,
>> Shashwat Shriparv
>> http://helpmetocode.blogspot.in/
>> https://about.me/shriparv
>> 
>> 
>> 
>> On Wed, 12 Feb 2020 at 04:48, Daniel Santos <[email protected]> wrote:
>> Hello,
>> 
>> I managed to create a properties file with the following contents:
>> 
>> fs.defaultFS=hdfs://hadoopnamenode:9000
>> mapreduce.framework.name=yarn
>> yarn.resourcemanager.address=hadoopresourcemanager:8032
>> 
>> It is now submitting the jobs to the cluster. I also set HADOOP_HOME on 
>> my laptop to point to the same version of Hadoop that is running on the 
>> cluster (2.7.0). I am running Pig version 0.17.
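>> For reference, one way to keep those three settings together and hand 
>> them to the Pig client in one go (a sketch; Pig's -propertyFile option 
>> and the file name cluster.properties are assumptions):

```shell
# Write the cluster settings to a hypothetical properties file.
cat > cluster.properties <<'EOF'
fs.defaultFS=hdfs://hadoopnamenode:9000
mapreduce.framework.name=yarn
yarn.resourcemanager.address=hadoopresourcemanager:8032
EOF
# Then launch with: pig -propertyFile cluster.properties myscript.pig
grep -c '=' cluster.properties   # sanity check: three key=value pairs
```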
>> 
>> Then a "main class not found" error happened on the YARN nodes where the job 
>> was scheduled to run. I had to add the following to yarn-site.xml and 
>> restart YARN and the nodes:
>> 
>>         <property>
>>                 <name>mapreduce.application.classpath</name>
>>                 
>> <value>/home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*,/home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*,/home/hadoop/hadoop-2.7.0/share/hadoop/common/*,/home/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*,/home/hadoop/hadoop-2.7.0/share/hadoop/yarn/*,/home/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*,/home/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*,/home/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*</value>
>>         </property>
>> 
>> After this change, the script ran, but the pig command only returned after 
>> the job finished.
>> Does anyone know how to launch the script and exit immediately to the shell?
>> If the job takes a long time I will have to keep the terminal open.
>> 
>> Thanks,
>> Regards
>> 
>> 
>> > On 11 Feb 2020, at 05:25, Vinod Kumar Vavilapalli <[email protected]> wrote:
>> > 
>> > It’s running the job in local mode (LocalJobRunner), that’s why. Please 
>> > check your configuration files and make sure that the right directories 
>> > are on the classpath. Also look in mapred-site.xml for 
>> > mapreduce.framework.name (should be yarn).
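>> > A minimal sketch of that mapred-site.xml entry (the surrounding 
>> > <configuration> element is assumed to already exist in the file):

```xml
<configuration>
  <!-- Without this set to yarn, Hadoop clients fall back to LocalJobRunner -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```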
>> > 
>> > Thanks
>> > +Vinod
>> > 
>> >> On Feb 11, 2020, at 2:09 AM, Daniel Santos <[email protected]> wrote:
>> >> 
>> >> Hello all,
>> >> 
>> >> I have developed a script on my laptop. The script is now ready to be 
>> >> unleashed on a non-secured cluster.
>> >> But when I do: pig -x mapreduce <script name>, it doesn’t return to the 
>> >> shell immediately. It prints things like [LocalJobRunner Map Task Executor 
>> >> #0].
>> >> 
>> >> I have exported the PIG_CLASSPATH shell variable to point to a directory 
>> >> with the cluster’s configuration, and it is operating on the files located 
>> >> there.
>> >> But I would expect the job to be launched, the shell prompt returned, and 
>> >> the job left executing independently on the cluster.
>> >> 
>> >> Another thing I noticed while developing the script and running it both 
>> >> locally and on the cluster is that the ResourceManager web page does not 
>> >> show the MapReduce jobs that Pig generates. What do I have to do to be 
>> >> able to see them?
>> >> 
>> >> Thanks,
>> >> Regards
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
>> >> 
>> > 
>> 
>> 
>> 
> 
