Re: SSTable count at 10K during repair (won't decrease)

2016-05-20 Thread Fabrice Facorat
Are you using repairParallelism = sequential or parallel? As Alain said: try to decrease stream throughput to avoid flooding nodes with a lot of (small) streamed sstables; if you are using parallel repair, switch to sequential; don't start too many repairs simultaneously. Do you really need…
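For reference, a rough sketch of the corresponding nodetool commands (keyspace name and throughput value are placeholders, and repair flags vary a little between versions):

    nodetool getstreamthroughput            # check the current cap first
    nodetool setstreamthroughput 50         # megabits/s; lower it so repair streams stop flooding nodes
    nodetool repair -pr my_keyspace         # one keyspace at a time, primary range only
    # sequential repair is the default on 2.1; later versions take -seq/--sequential, and -par forces parallel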

Re: Increase compaction performance

2016-05-20 Thread Fabrice Facorat
@Alain: Indeed, when repairing (or bootstrapping) all sstables end up in L0, as the original level is not passed along to the receiving node. So Cassandra ends up compacting a lot of sstables in L0 before trying to move them to upper levels. The issue still exists in 2.1 and is even worse, as you have less…
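If it helps, a few commands for watching and temporarily speeding up the L0 backlog after a repair or bootstrap (the sstable path is a placeholder, and the throttle should be restored once the backlog clears):

    nodetool compactionstats                # pending compactions while L0 drains
    nodetool setcompactionthroughput 0      # MB/s; 0 removes the compaction throttle for now
    sstablemetadata /path/to/ks-table-xxx-Data.db | grep "SSTable Level"   # confirm a streamed sstable landed in L0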

Re: SSTable count at 10K during repair (won't decrease)

2016-05-20 Thread Alain RODRIGUEZ
Thanks for the detailed information, which definitely deserves an answer, even a bit late. Any suggestions as to what I should look at? Is the node receiving a lot of streams during the repair? What do these show: 'nodetool netstats -H' (or 'nodetool netstats -H | grep -v 100%') and 'iftop'? …
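For completeness, the exact commands being asked about (the interface name passed to iftop is a guess for your setup):

    nodetool netstats -H                    # streaming sessions, human-readable sizes
    nodetool netstats -H | grep -v 100%     # hide files that have already finished streaming
    iftop -i eth0                           # live per-peer network throughput on the node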

Consistency level ONE and using withLocalDC

2016-05-20 Thread George Sigletos
Hello, Using withLocalDC="myLocalDC" and withUsedHostsPerRemoteDc>0 will guarantee that you will connect to one of the nodes in "myLocalDC", but DOES NOT guarantee that your read/write request will be acknowledged by a "myLocalDC" node. It may well be acknowledged by a remote DC node as well…
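To make the point concrete, here is a minimal sketch with the DataStax Java driver: withLocalDc/withUsedHostsPerRemoteDc only steer which coordinator is contacted, while a LOCAL_* consistency level (e.g. LOCAL_ONE) is what actually restricts the acknowledgement to the local DC. The contact point and DC name are placeholders:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

    public class LocalDcOnlyExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("10.0.0.1")                    // placeholder contact point
                    .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
                            .withLocalDc("myLocalDC")               // prefer coordinators in this DC
                            .withUsedHostsPerRemoteDc(1)            // remote nodes kept only as a fallback
                            .build())
                    .withQueryOptions(new QueryOptions()
                            .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE)) // ack must come from myLocalDC
                    .build();
            Session session = cluster.connect();
            // ... run statements; per statement: statement.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE)
            cluster.close();
        }
    }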

Re: on-disk size vs partition-size in cfhistograms

2016-05-20 Thread Alain RODRIGUEZ
Hi Joseph,
> The approach I took was to insert an increasing number of rows into a replica of the table to be sized,
> watch the size of the "data" directory (after doing nodetool flush and compact), and calculate the
> average size per row (total directory size / count of rows). Can this be considered a…
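For reference, the measurement described above boils down to something like this (keyspace, table and data path are placeholders):

    nodetool flush my_keyspace my_table
    nodetool compact my_keyspace my_table
    du -sh /var/lib/cassandra/data/my_keyspace/my_table-*    # on-disk size after flush + compact
    nodetool cfhistograms my_keyspace my_table               # compare with the reported partition sizes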

Thrift client creates massive amounts of network packets

2016-05-20 Thread Ralf Steppacher
Hi all, tl;dr: The Titan 0.5.4 cassandrathrift client + C* 2.0.8/2.2.6 create massive amounts of network packets for multiget_slice queries. Is there a way to avoid the “packet storm”? Details... We are using Titan 0.5.4 with its cassandrathrift storage engine to connect to a single-node…
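In case it is useful for reproducing the behaviour outside Titan, below is a minimal sketch of a raw Thrift multiget_slice call against the rpc port, so packet counts can be watched with tcpdump/iftop while varying the number of keys per request. Host, port, keyspace and column family names are assumptions/placeholders:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class MultigetSliceProbe {
        public static void main(String[] args) throws Exception {
            // plain Thrift connection to a single node (host, port and keyspace are placeholders)
            TTransport transport = new TFramedTransport(new TSocket("127.0.0.1", 9160));
            transport.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            client.set_keyspace("titan");

            // same shape of request Titan issues: many row keys in a single multiget_slice
            List<ByteBuffer> keys = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                keys.add(ByteBuffer.wrap(("key-" + i).getBytes(StandardCharsets.UTF_8)));
            }
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(new SliceRange(
                    ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 100));

            Map<ByteBuffer, List<ColumnOrSuperColumn>> result = client.multiget_slice(
                    keys, new ColumnParent("edgestore"), predicate, ConsistencyLevel.ONE);
            System.out.println("rows returned: " + result.size());

            transport.close();
        }
    }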