Honestly that is a hassle; going from 205 to cdh3u3 is probably more of a cross-grade than an upgrade or downgrade. I would just stick it out. But yes, as Michel said, run two clusters on the same gear and distcp. If you are using RF=3 you could also lower your replication to RF=2 ('hadoop dfs -setrep 2 <path>') to clear headroom as you are moving stuff.
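A minimal sketch of that replication trick, run against the old cluster (note the subcommand is `-setrep`, and the path is illustrative):

```shell
# Drop replication from 3 to 2 on a subtree (or / for everything) to free
# roughly a third of the used space while migrating.
# -R recurses into directories; -w waits until replication has actually
# dropped before returning.
hadoop dfs -setrep -R -w 2 /user/austin

# Confirm the reclaimed headroom before starting the copy.
hadoop dfsadmin -report
```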
On Thu, May 3, 2012 at 7:25 AM, Michel Segel <[email protected]> wrote:
> Ok... When you get your new hardware...
>
> Set up one server as your new NN, JT, SN.
> Set up the others as DNs.
> (Cloudera CDH3u3)
>
> On your existing cluster...
> Remove your old log files, temp files on HDFS, anything you don't need.
> This should give you some more space.
> Start copying some of the directories/files to the new cluster.
> As you gain space, decommission a node, rebalance, add the node to the new cluster...
>
> It's a slow process.
>
> Should I remind you to make sure you up your bandwidth setting, and to clean
> up the hdfs directories when you repurpose the nodes?
>
> Does this make sense?
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 3, 2012, at 5:46 AM, Austin Chungath <[email protected]> wrote:
>
>> Yeah I know :-)
>> and this is not a production cluster ;-) and yes there is more hardware
>> coming :-)
>>
>> On Thu, May 3, 2012 at 4:10 PM, Michel Segel <[email protected]> wrote:
>>
>>> Well, you've kind of painted yourself into a corner...
>>> Not sure why you didn't get a response from the Cloudera lists, but it's a
>>> generic question...
>>>
>>> 8 out of 10 TB. Are you talking effective storage or actual disks?
>>> And please tell me you've already ordered more hardware... Right?
>>>
>>> And please tell me this isn't your production cluster...
>>>
>>> (Strong hint to Strata and Cloudera... You really want to accept my
>>> upcoming proposal talk... ;-)
>>>
>>> Sent from a remote device. Please excuse any typos...
>>>
>>> Mike Segel
>>>
>>> On May 3, 2012, at 5:25 AM, Austin Chungath <[email protected]> wrote:
>>>
>>>> Yes. This was first posted on the Cloudera mailing list. There were no
>>>> responses.
>>>>
>>>> But this is not related to Cloudera as such.
>>>>
>>>> cdh3 is based on apache hadoop 0.20 as the base. My data is in apache
>>>> hadoop 0.20.205.
>>>>
>>>> There is an upgrade namenode option when we are migrating to a higher
>>>> version, say from 0.20 to 0.20.205,
>>>> but here I am downgrading from 0.20.205 to 0.20 (cdh3).
>>>> Is this possible?
>>>>
>>>> On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi <[email protected]> wrote:
>>>>
>>>>> Seems like a matter of upgrade. I am not a Cloudera user so would not know
>>>>> much, but you might find some help moving this to the Cloudera mailing list.
>>>>>
>>>>> On Thu, May 3, 2012 at 2:51 AM, Austin Chungath <[email protected]> wrote:
>>>>>
>>>>>> There is only one cluster. I am not copying between clusters.
>>>>>>
>>>>>> Say I have a cluster running apache 0.20.205 with 10 TB storage capacity
>>>>>> and about 8 TB of data.
>>>>>> Now how can I migrate the same cluster to use cdh3 and use that same 8 TB
>>>>>> of data?
>>>>>>
>>>>>> I can't copy 8 TB of data using distcp because I have only 2 TB of free
>>>>>> space.
>>>>>>
>>>>>> On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar <[email protected]> wrote:
>>>>>>
>>>>>>> you can actually look at distcp
>>>>>>>
>>>>>>> http://hadoop.apache.org/common/docs/r0.20.0/distcp.html
>>>>>>>
>>>>>>> but this means that you have two different sets of clusters available to do
>>>>>>> the migration
>>>>>>>
>>>>>>> On Thu, May 3, 2012 at 12:51 PM, Austin Chungath <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks for the suggestions,
>>>>>>>> My concern is that I can't actually copyToLocal from the dfs because the
>>>>>>>> data is huge.
>>>>>>>>
>>>>>>>> Say if my hadoop was 0.20 and I am upgrading to 0.20.205 I can do a
>>>>>>>> namenode upgrade. I don't have to copy data out of dfs.
>>>>>>>>
>>>>>>>> But here I have Apache hadoop 0.20.205 and I want to use CDH3 now,
>>>>>>>> which is based on 0.20.
>>>>>>>> Now it is actually a downgrade, as 0.20.205's namenode info has to be used
>>>>>>>> by 0.20's namenode.
>>>>>>>>
>>>>>>>> Any idea how I can achieve what I am trying to do?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> i can think of the following options:
>>>>>>>>>
>>>>>>>>> 1) write a simple get and put code which gets the data from DFS and loads
>>>>>>>>> it in dfs
>>>>>>>>> 2) see if distcp between both versions is compatible
>>>>>>>>> 3) this is what I had done (and my data was hardly a few hundred GB) ..
>>>>>>>>> did a dfs -copyToLocal and then in the new grid did a copyFromLocal
>>>>>>>>>
>>>>>>>>> On Thu, May 3, 2012 at 11:41 AM, Austin Chungath <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> I am migrating from Apache hadoop 0.20.205 to CDH3u3.
>>>>>>>>>> I don't want to lose the data that is in the HDFS of Apache hadoop
>>>>>>>>>> 0.20.205.
>>>>>>>>>> How do I migrate to CDH3u3 but keep the data that I have on 0.20.205?
>>>>>>>>>> What are the best practices/techniques to do this?
>>>>>>>>>>
>>>>>>>>>> Thanks & Regards,
>>>>>>>>>> Austin
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Nitin Pawar
>>>>>>>
>>>>>>> --
>>>>>>> Nitin Pawar
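The decommission-and-rebalance step in Michel's plan can be sketched as follows (hostname and file path are illustrative; on 0.20-era Hadoop the exclude file is whatever `dfs.hosts.exclude` points at in hdfs-site.xml):

```shell
# 1) Add the datanode being drained to the exclude file named by
#    dfs.hosts.exclude in the old cluster's hdfs-site.xml.
echo "datanode3.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2) Tell the namenode to re-read its host lists; the node then starts
#    replicating its blocks elsewhere until it shows as "Decommissioned".
hadoop dfsadmin -refreshNodes

# 3) Watch progress, then rebalance what is left across the remaining nodes.
hadoop dfsadmin -report
hadoop balancer -threshold 10
```

Once the node is decommissioned and wiped, it can be reinstalled with CDH3u3 and added to the new cluster's datanode list.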

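For the cross-version copy the thread circles around (option 2 in Nitin's list), the usual route is distcp over HFTP, which is read-only and works across HDFS versions, so the job is run on the destination (CDH3u3) cluster pulling from the old one. A sketch, with hostnames and ports illustrative (50070 being the old namenode's default HTTP port, 8020 the new namenode's RPC port):

```shell
# Run from the CDH3u3 cluster. HFTP reads via the old namenode's HTTP
# interface, so the wire-incompatible RPC versions never have to talk
# to each other directly.
hadoop distcp \
  hftp://old-namenode.example.com:50070/user/austin \
  hdfs://new-namenode.example.com:8020/user/austin
```

Combined with the earlier suggestions (free space first, copy a subtree, decommission a drained node, repurpose it into the new cluster), this lets the 8 TB move over incrementally without ever needing 8 TB of spare local disk.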