Greetings all! I am using Cloudera CDH3 for our Hadoop deployment. We have 7 nodes: 5 are used for a fully distributed cluster, 1 runs in pseudo-distributed mode, and 1 serves as a management node.

Fully distributed cluster: HDFS, MapReduce & HBase
Pseudo-distributed mode: All

I have read that Pig, Hive & Sqoop can be installed on a client node, with no need to install them on the cluster itself.

What is the client node, actually? Can I use my management node as a client?
What is the best practice for installing Pig, Hive & Sqoop?
For the fully distributed cluster, do we need to install Pig, Hive & Sqoop on each node?
MySQL is needed for Hive as a metastore, and Sqoop can import a MySQL database into HDFS, Hive, or Pig, so can we make use of MySQL databases residing on another node?

--
Thanks & Regards
----
Manu S
SI Engineer - OpenSource & HPC
Wipro Infotech
Mob: +91 8861302855
Skype: manuspkd
www.opensourcetalk.co.in
