I don't understand shuffle progress

2013-09-17 Thread Juan Manuel Formoso
I am running shuffle on a cluster after upgrading to 1.2.X, and I don't
understand how to check progress.

I'm counting the lines of output from cassandra-shuffle ls, and the count
decreases VERY slowly. Sometimes it doesn't decrease at all after 24 hours
of processing.

Is that value accurate? Does the shuffle operation support
disabling/re-enabling (or restarting the cluster) and resuming from the
last position? Or does it start over?
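
(For the record, the exact check I'm running on one of the nodes is

  cassandra-shuffle ls | wc -l

which, if I understand the tool correctly, prints one line per pending
range relocation.)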

-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: I don't understand shuffle progress

2013-09-17 Thread Juan Manuel Formoso
> If your shuffle succeeds, you will be the first reported case of
> shuffle succeeding on a non-test cluster.

Awesome! :O

I'll try to migrate to a new cluster then.

Can anybody suggest a better alternative than creating a small application
that reads from one cluster and inserts into the new one?

On Tuesday, September 17, 2013, Robert Coli wrote:

> On Tue, Sep 17, 2013 at 12:13 PM, Juan Manuel Formoso wrote:
>
> > I am running shuffle on a cluster after upgrading to 1.2.X, and I don't
> > understand how to check progress.
> >
>
> If your shuffle succeeds, you will be the first reported case of shuffle
> succeeding on a non-test cluster. Until I hear a report of someone having
> real world success, I recommend against using shuffle.
>
> If you want to enable vnodes on a cluster with existing data, IMO you
> should fork writes and bulk load a replacement cluster.
>
>
> > I'm counting the lines of output from cassandra-shuffle ls, and the count
> > decreases VERY slowly. Sometimes it doesn't decrease at all after 24 hours
> > of processing.
> >
>
> I have heard other reports, like this, of shuffle taking an insanely long
> amount of time.
>
>
> > Is that value accurate?
> >
>
> Probably.
>
>
> > Does the shuffle operation support disabling/re-enabling (or restarting
> > the cluster) and resuming from the last position? Or does it start over?
> >
>
> Yes, via the arguments "enable" and "disable". "clear" is what you use if
> you want to clear the queue and start over.
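>
> For example, against the default JMX host/port:
>
>   cassandra-shuffle disable   # pause pending relocations
>   cassandra-shuffle enable    # resume from where it left off
>   cassandra-shuffle clear     # drop the queue and start over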
>
> Note that once you have started shuffle, you don't want to add/remove a
> node until the shuffle is complete.
>
> https://issues.apache.org/jira/browse/CASSANDRA-5525
>
> =Rob
>


-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


How can I switch from multiple disks to a single disk?

2013-09-17 Thread Juan Manuel Formoso
Because I ran out of space when shuffling, I was forced to add multiple
disks on my Cassandra nodes.

When I finish compacting, cleaning up, and repairing, I'd like to remove
them and return to one disk per node.

What is the procedure to make the switch?
Can I just kill cassandra, move the data from one disk to the other, remove
the configuration for the second disk, and restart cassandra? (Rough sketch
below.)

I assume the files will not have the same names and thus won't be
overwritten; is that the case? Will Cassandra pick them up just like that?
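
Concretely, this is what I have in mind per node (the paths and service
commands are just examples for my setup):

  nodetool drain && sudo service cassandra stop
  # merge the second data directory into the first, preserving the
  # keyspace/columnfamily layout
  rsync -av /mnt/disk2/cassandra/data/ /mnt/disk1/cassandra/data/
  # remove /mnt/disk2/cassandra/data from data_file_directories in
  # cassandra.yaml, then:
  sudo service cassandra start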

Thanks

-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: I don't understand shuffle progress

2013-09-17 Thread Juan Manuel Formoso
I have been trying to make it work non-stop since Friday afternoon. I
officially gave up today and I'm going to go the sstableloader route.

I wrote a little of what I tried here:
http://seniorgeek.com.ar/blog/2013/09/16/tips-for-running-cassandra-shuffle/
(I have yet to update it with the fact that I had to give up)

I would strongly recommend you don't use shuffle unless you have very
little data to move around.
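
For anyone curious, the plan per keyspace/column family is roughly the
following, with the hosts and paths as placeholders:

  sstableloader -d new-node-1,new-node-2 /path/to/MyKeyspace/MyColumnFamily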


On Tue, Sep 17, 2013 at 10:41 PM, Paulo Motta wrote:

> That is very disappointing to hear. Vnodes support is one of the main
> reasons we're upgrading from 1.1.X to 1.2.X.
>
> So you're saying the only feasible way of enabling VNodes on an upgraded C*
> 1.2 is by doing fork writes to a brand new cluster + bulk load of sstables
> from the old cluster? Or is it possible to succeed on shuffling, even if
> that means waiting some weeks for the shuffle to complete?
>
>
> 2013/9/17 Robert Coli 
>
> > On Tue, Sep 17, 2013 at 4:00 PM, Juan Manuel Formoso wrote:
> >
> > > Can anybody suggest a better alternative than creating a small
> > > application that reads from one cluster and inserts into the new one?
> > >
> > >
> > http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra
> >
> > In theory if you wanted to do the "copy-the-files" method while enabling
> > vnodes on the target cluster, you could:
> >
> > 1) create new target cluster with vnodes enabled
> > 2) fork writes so they go to both source and target cluster
> > 3) copy 100% of sstables from all source nodes to all target nodes (being
> > sure to avoid sstable filename collisions, e.g. by adding a few hundred or
> > thousand to the generation numbers from each source node in a predictable
> > fashion; see the sketch after this list)
> > 4) be certain that you did not accidentally resurrect data from purged
> > source sstables in 3)
> > 5) run cleanup compaction on all nodes in target cluster
> > 6) turn off writes to old source cluster
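> >
> > For 3), the rename could look something like this on each source node (a
> > sketch; it assumes 1.2-era "ic" sstable names, and the paths are examples):
> >
> >   OFFSET=100000   # pick a different offset per source node
> >   cd /var/lib/cassandra/data/MyKeyspace/MyCF
> >   for f in MyKeyspace-MyCF-ic-*; do
> >     gen=$(echo "$f" | sed -E 's/.*-ic-([0-9]+)-.*/\1/')
> >     mv "$f" "$(echo "$f" | sed "s/-ic-$gen-/-ic-$((gen + OFFSET))-/")"
> >   done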
> >
> > =Rob
> > * notes that this process would make a good blog post.. :D
> >
>
>
>
> --
> Paulo Ricardo
>
> European Master in Distributed Computing
> Royal Institute of Technology - KTH
> Instituto Superior Técnico - IST
> http://paulormg.com
>



-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: I don't understand shuffle progress

2013-09-18 Thread Juan Manuel Formoso
I really like this idea. I can create a new cluster and have it replicate
the old one; after it finishes, I can remove the original.

Can anybody recommend a good resource that explains how to add a new
datacenter to a live single-DC cluster?


On Wed, Sep 18, 2013 at 9:58 AM, Chris Burroughs wrote:

> On 09/17/2013 09:41 PM, Paulo Motta wrote:
>
>> So you're saying the only feasible way of enabling VNodes on an upgraded
>> C*
>> 1.2 is by doing fork writes to a brand new cluster + bulk load of sstables
>> from the old cluster? Or is it possible to succeed on shuffling, even if
>> that means waiting some weeks for the shuffle to complete?
>>
>
> In a multi "DC" cluster situation you *should* be able to bring up a new
> DC with vnodes, bootstrap it, and then decommission the old cluster.
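>
> Roughly (a sketch; the DC names are placeholders):
>
>   # bring the new-DC nodes up with auto_bootstrap: false, add the new DC
>   # to the keyspace replication settings, then on each new node:
>   nodetool rebuild OldDC
>   # once the new DC is serving traffic, on each old node:
>   nodetool decommission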
>



-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: I don't understand shuffle progress

2013-09-18 Thread Juan Manuel Formoso
Awesome, thanks!

A few final questions:
1) Can I change the Snitch in the live source cluster? I'm using
SimpleSnitch, and I'd change it to GossipingPropertyFileSnitch (in
preparation for changing the replication strategy when the new cluster is
up and running). (See the sketch below.)
2) Can I have different Partitioners on the 2 clusters? I have
RandomPartitioner in the current one, and I'd like to use Murmur3 on the
new one (it will be empty at first). Are partitioners only required to be
the same within a cluster, or also across clusters in different DCs?
3) Will I be able to remove the old DC from the cluster when I finish
rebuilding?
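
For 1), this is the cassandra-rackdc.properties I'd push to every existing
node before flipping the snitch, so the reported topology stays identical to
what SimpleSnitch already reports (datacenter1/rack1, if I read the docs
right):

  dc=datacenter1
  rack=rack1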

Thanks again!


On Wed, Sep 18, 2013 at 11:41 AM, Chris Burroughs wrote:

> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html
>
> This is a basic outline.
>
>
>
> On 09/18/2013 10:32 AM, Juan Manuel Formoso wrote:
>
>> I really like this idea. I can create a new cluster and have it replicate
>> the old one; after it finishes, I can remove the original.
>>
>> Can anybody recommend a good resource that explains how to add a new
>> datacenter to a live single-DC cluster?
>>
>>
>> On Wed, Sep 18, 2013 at 9:58 AM, Chris Burroughs wrote:
>>
>>  On 09/17/2013 09:41 PM, Paulo Motta wrote:
>>>
>>>> So you're saying the only feasible way of enabling VNodes on an upgraded
>>>> C*
>>>> 1.2 is by doing fork writes to a brand new cluster + bulk load of
>>>> sstables
>>>> from the old cluster? Or is it possible to succeed on shuffling, even if
>>>> that means waiting some weeks for the shuffle to complete?
>>>>
>>>>
>>> In a multi "DC" cluster situation you *should* be able to bring up a new
>>> DC with vnodes, bootstrap it, and then decommission the old cluster.
>>>
>>>
>>
>>
>>
>


-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Error after decommissioning a datacenter

2013-09-20 Thread Juan Manuel Formoso
I just finished decommissioning a datacenter, and I'm seeing this error on
the nodes in the new one whenever I restart them.

ERROR 17:25:21,471 Exception in thread Thread[GossipStage:6,5,main]
java.lang.NumberFormatException: For input string: "10035671368176114840423743305116601547"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:422)
    at java.lang.Long.parseLong(Long.java:468)
    at org.apache.cassandra.service.StorageService.extractExpireTime(StorageService.java:1660)
    at org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1515)
    at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1234)
    at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1958)
    at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:841)
    at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:892)
    at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

The old nodes appear as down, but I'm not worried because no client is
writing or reading against that datacenter, and my replication strategy no
longer references it.

However, I don't want to see them forever listed there. Is there a way to
manually remove them from the "known replicas list" or wherever they are
stored?
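
(The closest thing I've found so far, assuming it's even the right tool for
the job, is the unsafeAssassinateEndpoint operation on the Gossiper MBean,
e.g. via jmxterm, with the IP being one of the old nodes:

  java -jar jmxterm.jar -l localhost:7199
  $> bean org.apache.cassandra.net:type=Gossiper
  $> run unsafeAssassinateEndpoint 10.0.0.1

but I have no idea whether that's safe here, hence the question.)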

-- 
Juan Manuel Formoso
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP