Re: Questions related to the data in SSTable files

2013-10-23 Thread Robert Coli
On Wed, Oct 23, 2013 at 5:23 AM, java8964 java8964 wrote:
> We enabled the major repair on every node every 7 days.
>
This is almost certainly the cause of your many duplicates. If you don't DELETE heavily, consider changing gc_grace_seconds to 34 days and then doing a repair on the first of the

RE: Questions related to the data in SSTable files

2013-10-23 Thread java8964 java8964
Date: Tue, 22 Oct 2013 17:52:24 -0700
Subject: Re: Questions related to the data in SSTable files
From: rc...@eventbrite.com
To: user@cassandra.apache.org
On Tue, Oct 22, 2013 at 5:17 PM, java8964 java8964 wrote:
Any way I can verify how often the system being "repaired"? I can ask a

Re: Questions related to the data in SSTable files

2013-10-22 Thread Robert Coli
On Tue, Oct 22, 2013 at 5:17 PM, java8964 java8964 wrote:
> Any way I can verify how often the system being "repaired"? I can ask
> another group who maintain the Cassandra cluster. But do you mean that even
> the failed writes will be stored in the SSTable files?
>
"repair" sessions are logged i

RE: Questions related to the data in SSTable files

2013-10-22 Thread java8964 java8964
he regular good data in memtable, then in the SSTable files.
Yong
Date: Tue, 22 Oct 2013 14:50:07 -0700
Subject: Re: Questions related to the data in SSTable files
From: rc...@eventbrite.com
To: user@cassandra.apache.org
On Tue, Oct 22, 2013 at 2:29 PM, java8964 java8964 wrote:
1) In the da

Re: Questions related to the data in SSTable files

2013-10-22 Thread Robert Coli
On Tue, Oct 22, 2013 at 2:29 PM, java8964 java8964 wrote:
> 1) In the data of full snapshot, I see more than 10% of duplication data.
> What I mean duplication is that there are event_activities with the same
> (entity_1_id, entity_2_id, entity_3_id, entity_4_id, created_on_timestamp,
> column_tim

Questions related to the data in SSTable files

2013-10-22 Thread java8964 java8964
Hi, I have some questions related to the data in the SSTable files. Our production environment has 36 boxes, so in theory 12 of them will make one group of data without replication. Right now, I got all the SSTable files from 12 nodes of the cluster (based on my understanding, these 12 nodes are one