Re: Bloom filter false positives high

2019-05-16 Thread Martin Mačura
I've decreased bloom_filter_fp_chance from 0.01 to 0.001. The sstableupgrade took 3 days to complete. And this is the result:
node1
Bloom filter false positives: 380965
Bloom filter false ratio: 0.46560
Bloom filter space used: 27.1 MiB
Blo

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
Lastly, I wonder whether that number is the same on every node you connect nodetool to. Do all nodes see a very similar false positive ratio / number?

On Wed, 17 Apr 2019 at 21:41, Stefan Miklosovic wrote:
> One thing comes to my mind but my reasoning is questionable as I am
> not an expert in

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
One thing comes to my mind, but my reasoning is questionable as I am not an expert in this. If you think about it, the whole point of a Bloom filter is to check whether some record is in a particular SSTable. A false positive means that, obviously, the filter thought it was there but in fact it is not. So Cass
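To make the false-positive idea above concrete, here is a toy Bloom filter in Python. It is a sketch only: the SHA-256-based double hashing, the class name `BloomFilter` and the sizes used are illustrative assumptions, not Cassandra's implementation, which has its own hashing and sizing code. What it does show is the property the reasoning relies on: a key that was added is always reported as possibly present, while an absent key is occasionally reported as present too.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter (illustrative, not Cassandra's): k bit positions per key."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing: derive k positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


if __name__ == "__main__":
    # Pretend these 1,000 keys are the partitions stored in one SSTable.
    bf = BloomFilter(num_bits=10_000, num_hashes=7)
    for i in range(1_000):
        bf.add(f"partition-{i}")
    # Probe keys that were never added: every "True" here is a false positive.
    probes = 100_000
    false_positives = sum(bf.might_contain(f"other-{i}") for i in range(probes))
    print(f"observed false-positive rate: {false_positives / probes:.4f}")
```

With 10 bits per key and 7 hash functions the observed rate should come out below 1%, near the textbook estimate of about 0.8%; the exact figure depends on the hash.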

Re: Bloom filter false positives high

2019-04-17 Thread Martin Mačura
We cannot run any repairs on these tables. Whenever we tried (incremental, full or partitioner-range), it caused a node to run out of disk space during anticompaction. We'll try again once Cassandra 4.0 is released.

On Wed, Apr 17, 2019 at 1:07 PM Stefan Miklosovic < stefan.mikloso...@insta

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
If you invoke nodetool, it gets the false positives number from this metric:
https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/metrics/TableMetrics.java#L564-L578
You get a high number of false positives, so this accumulates them:
https://github.com/apache/cassandr
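The ratio nodetool prints can be modelled as below. This assumes, based on my reading of the cassandra-3.11 `BloomFilterFalseRatio` gauge in TableMetrics.java, that it sums false and true positive counts over live SSTables and divides the false positives by all positive answers. `SSTableFilterStats` and `bloom_filter_false_ratio` are hypothetical names, so check the linked source before relying on the detail.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SSTableFilterStats:
    # Hypothetical per-SSTable counters, modelled on what nodetool reports.
    false_positives: int
    true_positives: int


def bloom_filter_false_ratio(sstables: Iterable[SSTableFilterStats]) -> float:
    """False positives as a share of all positive filter answers.

    Assumption: mirrors falseCount / (falseCount + trueCount) summed over
    live SSTables, as I read the cassandra-3.11 gauge.
    """
    false_count = 0
    true_count = 0
    for s in sstables:
        false_count += s.false_positives
        true_count += s.true_positives
    if false_count == 0 and true_count == 0:
        return 0.0  # no positive answers recorded yet
    return false_count / (false_count + true_count)


if __name__ == "__main__":
    sample = [
        SSTableFilterStats(false_positives=3, true_positives=1),
        SSTableFilterStats(false_positives=1, true_positives=3),
    ]
    print(bloom_filter_false_ratio(sample))  # 4 / (4 + 4) = 0.5
```

One consequence, if the assumption holds: negative filter answers never enter the denominator, so this ratio is not the same quantity as bloom_filter_fp_chance, which bounds the chance that a lookup for an absent key comes back positive.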

Re: Bloom filter false positives high

2019-04-17 Thread Martin Mačura
Both tables use the default bloom_filter_fp_chance of 0.01 ...

CREATE TABLE ... (
    a int,
    b int,
    bucket timestamp,
    ts timeuuid,
    c int,
    ...
    PRIMARY KEY ((a, b, bucket), ts, c)
) WITH CLUSTERING ORDER BY (ts DESC, monitor ASC)
    AND bloom_filter_fp_chance = 0.01
    AND compaction =

Re: Bloom filter false positives high

2019-04-17 Thread Stefan Miklosovic
What is your bloom_filter_fp_chance for each table? I guess it is bigger for the first one: the bigger that number (it is between 0 and 1), the less memory the filter uses (17 MiB against 54.9 MiB), which means the more false positives you will get.

On Wed, 17 Apr 2019 at 19:59, Martin Mačura wrote:
> Hi,
> I have
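The memory/false-positive trade-off described above follows from the textbook sizing formula for an optimally configured Bloom filter, m/n = -ln(p) / (ln 2)^2 bits per key, with k = (m/n) ln 2 hash functions. The sketch below evaluates it. Note that Cassandra sizes its filters with its own lookup table, so the real per-key cost will differ somewhat from these idealized numbers; the function names are hypothetical.

```python
import math


def optimal_bits_per_key(fp_chance: float) -> float:
    # Textbook optimum for a Bloom filter: m/n = -ln(p) / (ln 2)^2
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def optimal_num_hashes(bits_per_key: float) -> int:
    # k = (m/n) * ln 2, rounded to a whole number of hash functions (at least 1)
    return max(1, round(bits_per_key * math.log(2)))


if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001):
        bpk = optimal_bits_per_key(p)
        print(f"fp_chance={p}: {bpk:.2f} bits/key, k={optimal_num_hashes(bpk)}")
```

By this formula, going from 0.01 to 0.001 raises the cost from about 9.6 to about 14.4 bits per key, exactly 1.5 times as much, because ln(1000) / ln(100) = 1.5.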

Bloom filter false positives high

2019-04-17 Thread Martin Mačura
Hi,
I have a table with a poor Bloom filter false ratio:
SSTable count: 1223
Space used (live): 726.58 GiB
Number of partitions (estimate): 8592749
Bloom filter false positives: 35796352
Bloom filter false ratio: 0.68472
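A rough model may help explain how a false ratio like this can sit far above 0.01 even when every filter behaves as configured. It assumes, purely for illustration, that a read checks `consulted` SSTable filters, that the partition lives in `present_in` of them, and that each remaining filter independently says "maybe" with probability `fp_chance`. `expected_false_ratio` is a hypothetical helper, and how many SSTables a real read on this table consults is not known from the thread.

```python
def expected_false_ratio(consulted: int, present_in: int, fp_chance: float) -> float:
    """Ratio of expected false positives to expected positive answers for one read.

    Simplified, illustrative model: filters for SSTables holding the partition
    always answer "maybe" (Bloom filters have no false negatives); each other
    filter answers "maybe" independently with probability fp_chance.
    """
    if not 0 < present_in <= consulted:
        raise ValueError("need 0 < present_in <= consulted")
    expected_fp = (consulted - present_in) * fp_chance
    expected_tp = present_in
    return expected_fp / (expected_fp + expected_tp)


if __name__ == "__main__":
    for consulted in (2, 11, 101, 1001):
        ratio = expected_false_ratio(consulted, present_in=1, fp_chance=0.01)
        print(f"{consulted:>5} filters checked -> expected false ratio {ratio:.3f}")
```

If the reported ratio is summed false positives over summed positive answers, then over many reads it tends toward this ratio of expected counts. For example, a partition found in 1 of 101 consulted SSTables gives 1 expected false positive per true positive, a ratio of 0.5.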