Hello,
I created a properties file with the following contents:
fs.defaultFS=hdfs://hadoopnamenode:9000
mapreduce.framework.name=yarn
yarn.resourcemanager.address=hadoopresourcemanager:8032
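
For reference, a minimal sketch of how a properties file like the one above could be handed to Pig. The message does not say how the file was actually passed, so this is only one possibility: it assumes the file is saved as cluster.properties and the script is called myscript.pig (both names are hypothetical), and it uses Pig's -P (propertyFile) option. The command is only printed here, not executed.

```shell
#!/bin/sh
# Hypothetical file name; the original message does not name the file.
PROPS_FILE="cluster.properties"

# Write the three settings quoted above into the properties file.
cat > "$PROPS_FILE" <<'EOF'
fs.defaultFS=hdfs://hadoopnamenode:9000
mapreduce.framework.name=yarn
yarn.resourcemanager.address=hadoopresourcemanager:8032
EOF

# Build the Pig command line. myscript.pig is a placeholder script name.
# -x mapreduce selects MapReduce execution; -P points Pig at the properties file.
PIG_CMD="pig -x mapreduce -P $PROPS_FILE myscript.pig"

# Print the command instead of running it, so it can be reviewed first.
echo "$PIG_CMD"
```

Running the printed command by hand, on a machine where pig is on the PATH, would submit the script with these settings.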
Pig is now submitting the jobs to the cluster. I also set HADOOP_HOME on my
laptop to point to the same version of Hadoop that is running on the cluster
(2.7.0). I am running Pig version 0.17.
Then a "main class not found" error occurred on the YARN nodes where the job was
scheduled to run. I had to add the following to yarn-site.xml and restart YARN
and the nodes:
<property>
<name>mapreduce.application.classpath</name>
<value>/home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*,/home/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*,/home/hadoop/hadoop-2.7.0/share/hadoop/common/*,/home/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*,/home/hadoop/hadoop-2.7.0/share/hadoop/yarn/*,/home/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*,/home/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*,/home/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*</value>
</property>
After this change, the script ran, but the pig command only returned after the
job finished.
Does anyone know how to launch the script and return to the shell immediately?
If the job takes a long time, I will have to keep the terminal open.
Thanks,
Regards
> On 11 Feb 2020, at 05:25, Vinod Kumar Vavilapalli <[email protected]> wrote:
>
> It’s running the job in local mode (LocalJobRunner), that’s why. Please check
> your configuration files and make sure that the right directories are on the
> classpath. Also look in mapred-site.xml for mapreduce.framework.name (should
> be yarn).
>
> Thanks
> +Vinod
>
>> On Feb 11, 2020, at 2:09 AM, Daniel Santos <[email protected]> wrote:
>>
>> Hello all,
>>
>> I have developed a script on my laptop. The script is now ready to be
>> unleashed on a non-secured cluster.
>> But when I run: pig -x mapreduce <script name> it doesn’t return to the
>> shell immediately. It prints stuff like [LocalJobRunner Map Task Executor #0]
>>
>> I have exported the PIG_CLASSPATH shell variable to point to a directory
>> with the cluster’s configuration, and it’s operating on the files located
>> there.
>> But I would expect the job to be launched, the shell prompt to be returned,
>> and the job to be left executing independently on the cluster.
>>
>> Another thing I noticed while developing the script and running it both
>> locally and on the cluster is that the web page for the resource manager
>> does not show the MapReduce jobs that Pig generates. What do I have to do
>> to be able to see them?
>>
>> Thanks,
>> Regards
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>