Re: Running pig script on a remote cluster

Daniel Santos Tue, 11 Feb 2020 15:18:27 -0800

Hello,

I managed to create a properties file with the following contents:


fs.defaultFS=hdfs://hadoopnamenode:9000
mapreduce.framework.name=yarn
yarn.resourcemanager.address=hadoopresourcemanager:8032

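These three keys work together. Hadoop 2.x defaults mapreduce.framework.name to local, which runs the job in-process through the LocalJobRunner, and it defaults fs.defaultFS to the local file system. The Python sketch below parses a simple key=value file and reports which settings would keep a Pig job off the cluster. The helper names parse_properties and check_remote_submission are hypothetical and not part of Pig or Hadoop. The parser handles only plain key=value lines, not the full Java properties syntax.

```python
# Hypothetical helpers (not part of Pig or Hadoop): sanity-check a
# properties file before submitting a Pig script to a remote cluster.
# Only plain "key=value" lines are handled; Java properties features such
# as line continuations, escapes and ':' separators are ignored.


def parse_properties(text):
    """Return a dict of key -> value from simple key=value lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or '!' as in Java properties).
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_remote_submission(props):
    """Return a list of problems that would keep jobs off the cluster."""
    problems = []
    # Hadoop 2.x defaults mapreduce.framework.name to "local",
    # which runs the job in-process via LocalJobRunner.
    framework = props.get("mapreduce.framework.name", "local")
    if framework != "yarn":
        problems.append("mapreduce.framework.name is %r, expected 'yarn'" % framework)
    # fs.defaultFS defaults to the local file system (file:///).
    if not props.get("fs.defaultFS", "").startswith("hdfs://"):
        problems.append("fs.defaultFS does not point at an HDFS namenode")
    # The RM address can also be derived from yarn.resourcemanager.hostname.
    if ("yarn.resourcemanager.address" not in props
            and "yarn.resourcemanager.hostname" not in props):
        problems.append("no resource manager address or hostname is set")
    return problems


if __name__ == "__main__":
    example = """
fs.defaultFS=hdfs://hadoopnamenode:9000
mapreduce.framework.name=yarn
yarn.resourcemanager.address=hadoopresourcemanager:8032
"""
    print(check_remote_submission(parse_properties(example)))  # prints []
```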
It is now submitting the jobs to the cluster. I also set HADOOP_HOME on my
laptop to point to the same version of Hadoop that is running on the cluster
(2.7.0). I am running Pig version 0.17.

Then a "main class not found" error occurred on the YARN nodes where the job
was scheduled to run. I had to add the following to yarn-site.xml and restart
YARN and the nodes:

        <property>
                <name>mapreduce.application.classpath</name>
                <value>/home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*,/home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*,/home/hadoop/hadoop-2.7.0/share/hadoop/common/*,/home/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*,/home/hadoop/hadoop-2.7.0/share/hadoop/yarn/*,/home/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*,/home/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*,/home/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*</value>
        </property>
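The long classpath value above follows a regular pattern: for each of the four component directories under share/hadoop (mapreduce, common, yarn, hdfs), it lists the component's own jars and the jars in its lib/ subdirectory. The Python sketch below regenerates the same string from the install directory, which may make it easier to adapt if the Hadoop path on the nodes changes. It assumes the /home/hadoop/hadoop-2.7.0 layout from this message; the function name mapreduce_classpath is hypothetical.

```python
# Hypothetical helper: rebuild the mapreduce.application.classpath value
# from the Hadoop install directory on the cluster nodes.
import posixpath

# Component order matches the value in the yarn-site.xml snippet above.
COMPONENTS = ("mapreduce", "common", "yarn", "hdfs")


def mapreduce_classpath(hadoop_home):
    """Return a comma-separated classpath of each component's jars and lib/ jars."""
    entries = []
    for component in COMPONENTS:
        base = posixpath.join(hadoop_home, "share", "hadoop", component)
        entries.append(base + "/*")      # the component's own jars
        entries.append(base + "/lib/*")  # the jars it depends on
    return ",".join(entries)


if __name__ == "__main__":
    print(mapreduce_classpath("/home/hadoop/hadoop-2.7.0"))
```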

After this change, the script ran, but the pig command only returned after
the job finished.
Does anyone know how to launch the script and return to the shell immediately?
If the job takes a long time, I will have to keep the terminal open.

Thanks,
Regards


> On 11 Feb 2020, at 05:25, Vinod Kumar Vavilapalli <[email protected]> wrote:
> 
> It’s running the job in local mode (LocalJobRunner); that’s why. Please check 
> your configuration files and make sure that the right directories are on the 
> classpath. Also look in mapred-site.xml for mapreduce.framework.name (should 
> be yarn).
> 
> Thanks
> +Vinod
> 
>> On Feb 11, 2020, at 2:09 AM, Daniel Santos <[email protected]> wrote:
>> 
>> Hello all,
>> 
>> I have developed a script on my laptop. The script is now ready to be 
>> unleashed on a non-secured cluster.
>> But when I run pig -x mapreduce <script name>, it doesn’t return to the 
>> shell immediately. It prints output like [LocalJobRunner Map Task Executor #0].
>> 
>> I have exported the PIG_CLASSPATH shell variable to point to a directory 
>> with the cluster’s configuration, and it’s operating on the files located 
>> there.
>> But I would expect the job to be launched, the shell prompt to be returned, 
>> and the job to be left running independently on the cluster.
>> 
>> Another thing I noticed while developing the script and running it both 
>> locally and on the cluster is that the web page for the resource manager 
>> does not show the MapReduce jobs that Pig generates. What do I have to do 
>> to be able to see them?
>> 
>> Thanks,
>> Regards
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>> 
> 

