Re: Questions related to the data in SSTable files

2013-10-23 Thread Robert Coli
On Wed, Oct 23, 2013 at 5:23 AM, java8964 java8964 wrote:
> We enabled the major repair on every node every 7 days.

This is almost certainly the cause of your many duplicates. If you don't DELETE heavily, consider changing gc_grace_seconds to 34 days and then doing a repair on the first of the
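The arithmetic behind that suggestion can be checked quickly. Every node should be repaired at least once per gc_grace_seconds, so that tombstones reach all replicas before they are purged. The Python sketch below is illustrative only (the helper name max_repair_gap_days is hypothetical). It checks that running repair on the 1st of each month never leaves a gap longer than 31 days, which fits inside a 34-day gc_grace_seconds with about three days of slack for a repair that starts late or runs long.

```python
from datetime import date

# gc_grace_seconds value suggested in the thread: 34 days, expressed in seconds
GC_GRACE_SECONDS = 34 * 24 * 60 * 60  # 2,937,600

def max_repair_gap_days(year: int) -> int:
    """Longest gap, in days, between repairs started on the 1st of each month
    of `year`, including the gap into January 1 of the following year.
    (Hypothetical helper, for illustration only.)"""
    starts = [date(year, m, 1) for m in range(1, 13)] + [date(year + 1, 1, 1)]
    return max((later - earlier).days for earlier, later in zip(starts, starts[1:]))

gap = max_repair_gap_days(2013)
print(gap)                             # 31
print(gap * 86400 < GC_GRACE_SECONDS)  # True: monthly repair fits in 34 days
```

A weekly repair also fits inside this window, but as the reply notes, repairing that often appears to be what produced the many duplicates.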

RE: Questions related to the data in SSTable files

2013-10-23 Thread java8964 java8964
Date: Tue, 22 Oct 2013 17:52:24 -0700
Subject: Re: Questions related to the data in SSTable files
From: rc...@eventbrite.com
To: user@cassandra.apache.org

On Tue, Oct 22, 2013 at 5:17 PM, java8964 java8964 wrote:
Any way I can verify how often the system being "repaired"? I can ask a

Re: Questions related to the data in SSTable files

2013-10-22 Thread Robert Coli
On Tue, Oct 22, 2013 at 5:17 PM, java8964 java8964 wrote:
> Any way I can verify how often the system being "repaired"? I can ask
> another group who maintain the Cassandra cluster. But do you mean that even
> the failed writes will be stored in the SSTable files?

"repair" sessions are logged i

RE: Questions related to the data in SSTable files

2013-10-22 Thread java8964 java8964
the regular good data in memtable, then in the SSTable files.
Yong

Date: Tue, 22 Oct 2013 14:50:07 -0700
Subject: Re: Questions related to the data in SSTable files
From: rc...@eventbrite.com
To: user@cassandra.apache.org

On Tue, Oct 22, 2013 at 2:29 PM, java8964 java8964 wrote:
1) In the da

Re: Questions related to the data in SSTable files

2013-10-22 Thread Robert Coli
On Tue, Oct 22, 2013 at 2:29 PM, java8964 java8964 wrote:
> 1) In the data of full snapshot, I see more than 10% of duplication data.
> What I mean duplication is that there are event_activities with the same
> (entity_1_id, entity_2_id, entity_3_id, entity_4_id, created_on_timestamp,
> column_tim
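The duplication described here can be measured by counting rows beyond the first copy of each key. This is a minimal sketch, assuming each row has already been reduced to a key tuple. The key list in the message is cut off after created_on_timestamp, so the tuples in the example are hypothetical placeholders. duplication_rate is an illustrative name and is not part of any Cassandra tool. Counting only the extra copies is one possible definition. The original poster may have counted every copy of a duplicated key instead.

```python
from collections import Counter
from typing import Hashable, Iterable

def duplication_rate(keys: Iterable[Hashable]) -> float:
    """Fraction of rows that are extra copies of a key already seen,
    i.e. (total rows - distinct keys) / total rows. Hypothetical helper."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - len(counts)) / total

# Hypothetical key tuples: (entity_1_id, entity_2_id, entity_3_id,
# entity_4_id, created_on_timestamp); the real key has more columns.
rows = [
    (1, 2, 3, 4, 1000),
    (1, 2, 3, 4, 1000),  # same key as the row above: one extra copy
    (5, 6, 7, 8, 2000),
    (1, 2, 3, 4, 1001),  # different timestamp, so a distinct key
]
print(duplication_rate(rows))  # 0.25
```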