Re: downgrade from 1.1.4 to 1.0.X

2012-10-01 Thread Daniel Doubleday
Since I was just fiddling around with sst2json: if you have row level deletes you might get problems since row level deletion info is not exported in at least 1.0. But if you're not using those you might be fine. Віталій Тимчишин wrote: I suppose the way is to convert all SST to json, then i

SST Inconsistency

2012-10-01 Thread Daniel Doubleday
Hi all we are running c* 1.0.8 and found some strange row level tombstone problems. Some rows (~50 in around 2B keys) have markedForDeleteAt timestamps in the future (so they 'drop' all writes) and 0 values as localDeletionTime. A non-thorough check didn't bring up any code paths that could l

Re: architectural understanding of write operation node flow

2012-01-23 Thread Daniel Doubleday
t sure if hints are written when request time out but CL is reached. On Jan 23, 2012, at 6:47 PM, Daniel Doubleday wrote: > Your first thought was pretty much correct: > > 1. The node which is called by the client is the coordinator > 2. The coordinator determines the nodes in th

Re: architectural understanding of write operation node flow

2012-01-23 Thread Daniel Doubleday
Your first thought was pretty much correct: 1. The node which is called by the client is the coordinator 2. The coordinator determines the nodes in the ring which can handle the request ordered by expected latency (via snitch). The coordinator may or may not be part of these nodes 3. Given the c

Re: Second Cassandra users survey

2011-11-08 Thread Daniel Doubleday
>> server was heavier than the others, the only choice was to "scale up" >> the hardware. >> >> My understanding of Cassandra's current sharding is consistent and >> random. Does the new feature sit some where in-between? Are you >> thinking of a p

Re: Second Cassandra users survey

2011-11-07 Thread Daniel Doubleday
was heavier than the others, the only choice was to "scale up" > the hardware. > > My understanding of Cassandra's current sharding is consistent and > random. Does the new feature sit some where in-between? Are you > thinking of a pluggable API so that you can provide

Re: Second Cassandra users survey

2011-11-07 Thread Daniel Doubleday
Allow for deterministic / manual sharding of rows. Right now it seems that there is no way to force rows with different row keys to be stored on the same nodes in the ring. This is our number one reason why we get data inconsistencies when nodes fail. Sometimes a logical transaction requires w

Re: data model for unique users in a time period

2011-11-01 Thread Daniel Doubleday
With leveled compaction this should work pretty nicely. If you need fast access and want to use the row cache you will need to do some further patching though. This is early brainstorming phase so any comments would be welcome Cheers, Daniel Doubleday smeet.com On Oct 31, 2011, at 7:08 PM, Ed

Re: Can not repair

2011-07-21 Thread Daniel Doubleday
Sounds like this one: http://comments.gmane.org/gmane.comp.db.cassandra.user/15828 or http://comments.gmane.org/gmane.comp.db.cassandra.user/15936 Hope you have a backup. That would make your life much easier ... On Jul 21, 2011, at 4:54 PM, cbert...@libero.it wrote: > Hi all, > I can't get

Re: JNA to avoid swap but physical memory increase

2011-07-18 Thread Daniel Doubleday
http://permalink.gmane.org/gmane.comp.db.cassandra.user/14225 but given https://issues.apache.org/jira/browse/CASSANDRA-2868 and me thinking 2 secs longer I guess it was the leaked native memory from gc inspector that has been swapped out. (I didn't believe that mlockall is broken but at that

Re: JNA to avoid swap but physical memory increase

2011-07-15 Thread Daniel Doubleday
When using jna the mlockall call will result in all pages locked in rss and thus reported there so you have either configured -Xms650M or you are running on a small box and the start script calculated it for you. Also our experience shows that the jna call does not prevent swapping so the gener

Re: Cassandra memory problem

2011-07-07 Thread Daniel Doubleday
res, but the OS can page it out > instead of killing the process. > > On Mon, Jul 4, 2011 at 5:52 AM, Daniel Doubleday > wrote: >> Hi all, >> we have a mem problem with cassandra. res goes up without bounds (well until >> the os kills the process because we dont have

Re: Cassandra memory problem

2011-07-04 Thread Daniel Doubleday
've switched it to Sun and this part of the issue stabilized. The other > issues we had were Heap going through the roof and then OOM under load. > > > On Mon, Jul 4, 2011 at 11:01 AM, Daniel Doubleday > wrote: > Just to make sure: > You were seeing that res mem was mor

Re: Cassandra memory problem

2011-07-04 Thread Daniel Doubleday
score 13723 or a child On Jul 4, 2011, at 2:42 PM, Jonathan Ellis wrote: > mmap'd data will be attributed to res, but the OS can page it out > instead of killing the process. > > On Mon, Jul 4, 2011 at 5:52 AM, Daniel Doubleday > wrote: >> Hi all, >> we have a

Re: Cassandra memory problem

2011-07-04 Thread Daniel Doubleday
stabilized. The other > issues we had were Heap going through the roof and then OOM under load. > > > On Mon, Jul 4, 2011 at 11:01 AM, Daniel Doubleday > wrote: > Just to make sure: > You were seeing that res mem was more than twice of max java heap and that > did c

Re: Cassandra memory problem

2011-07-04 Thread Daniel Doubleday
eads, writes? > > SC > > On Mon, Jul 4, 2011 at 6:52 AM, Daniel Doubleday > wrote: > Hi all, > > we have a mem problem with cassandra. res goes up without bounds (well until > the os kills the process because we dont have swap) > > I found a thread that

Re: Row cache

2011-07-04 Thread Daniel Doubleday
Just to make sure: The yaml doesn't matter. The cache config is stored in the system tables. It's the "CREATE ... WITH ..." stuff you did via cassandra-cli to create the CF. In JConsole, do you see that the cache capacity is > 0? On Jul 4, 2011, at 11:18 AM, Shay Assulin wrote: > Hi, > > The row c

Cassandra memory problem

2011-07-04 Thread Daniel Doubleday
Hi all, we have a mem problem with cassandra. res goes up without bounds (well until the os kills the process because we dont have swap) I found a thread that's about the same problem but on OpenJDK: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-n

Alternative Row Cache Implementation

2011-06-30 Thread Daniel Doubleday
Hi all - or rather devs we have been working on an alternative implementation to the existing row cache(s) We have 2 main goals: - Decrease memory -> get more rows in the cache without suffering a huge performance penalty - Reduce gc pressure This sounds a lot like we should be using the new

Re: Row cache

2011-06-30 Thread Daniel Doubleday
Here's my understanding of things ... (this applies only for the regular heap implementation of row cache) > Why Cassandra does not cache a row that was requested few times? What does the cache capacity read. Is it > 0? > What the ReadCount attribute in ColumnFamilies indicates and why it rema

JRockit

2011-06-01 Thread Daniel Doubleday
Hi all now that JRockit is available for free and the claims are there that it has better performance and gc I wanted to know if anybody out here has done any testing / benchmarking yet. Also interested in deterministic gc ... maybe it's worth the 300 bucks? Cheers, Daniel

Re: Database grows 10X bigger after running nodetool repair

2011-05-25 Thread Daniel Doubleday
I can't remember. Easiest way is to configure it to listen only on localhost and restart. Thirdly does anyone know if the problem is contagious i.e. should I consider decommissioning the whole node and try to rebuild from replicas? No. That should not be necessary Good luck Thank

Re: Database grows 10X bigger after running nodetool repair

2011-05-25 Thread Daniel Doubleday
We are having problems with repair too. It sounds like yours are the same. From today: http://permalink.gmane.org/gmane.comp.db.cassandra.user/16619 On May 25, 2011, at 4:52 PM, Dominic Williams wrote: > Hi, > > I've got a strange problem, where the database on a node has inflated 10X > after

Re: repair question

2011-05-25 Thread Daniel Doubleday
ing. So I guess my next repair will be scheduled in 0.8.1. But I don't understand why this did not hit others so hard that it is considered more critical. We seem to use cassandra in unusual ways. Thanks again. Daniel On May 24, 2011, at 9:05 PM, Daniel Doubleday wrote: > Ok th

Re: repair question

2011-05-24 Thread Daniel Doubleday
this book Daniel If you're interested here's the log: http://dl.dropbox.com/u/5096376/system.log.gz I also lied about total size of one node. It wasn't 320 but 280. All nodes On May 24, 2011, at 3:41 PM, Sylvain Lebresne wrote: > On Tue, May 24, 2011 at 12:40 AM, Daniel Doubled

Re: repair question

2011-05-23 Thread Daniel Doubleday
We are performing the repair on one node only. Other nodes receive reasonable amounts of data (~500MB). It's only the repairing node itself which 'explodes'. I must admit that I'm a noob when it comes to aes/repair. It's just strange that a cluster that is up and running with no probs is doing

Re: repair question

2011-05-23 Thread Daniel Doubleday
ds of data for that CF from the other nodes. Sigh... On May 23, 2011, at 7:48 PM, Sylvain Lebresne wrote: > On Mon, May 23, 2011 at 7:17 PM, Daniel Doubleday > wrote: >> Hi all >> >> I'm a bit lost: I tried a repair yesterday with only one CF and that didn

repair question

2011-05-23 Thread Daniel Doubleday
Hi all I'm a bit lost: I tried a repair yesterday with only one CF and that didn't really work the way I expected but I thought that would be a bug which only affects that special case. So I tried again for all CFs. I started with a nicely compacted machine with around 320GB of load. Total dis

Documentation of Known Issues

2011-05-20 Thread Daniel Doubleday
Hi all I was wondering if there might be some way to better communicate known issues. We do try to track jira issues but at times some slip through or we miss implications. Things like the broken repair of specific CFs. (https://issues.apache.org/jira/browse/CASSANDRA-2670). I know that this

Berlin Buzzword Hackathon

2011-05-18 Thread Daniel Doubleday
Hi all was wondering if there's anybody here planning to go to the Berlin Buzzwords and attend the cassandra hackathon. I'm still indecisive but it might be good to have the chance to talk about experiences in more detail. Cheers, Daniel

Dynamic Snitch Problem

2011-05-17 Thread Daniel Doubleday
Hi all after upgrading to 0.7 we have a small problem with dynamic snitch: we have rf=3, quorum read/write and read repair prop set to 0. Thus cassandra always shortcuts reads to only 2 hosts. Problem is that one of our nodes gets ignored unless we use a little patch that initializes the scores.

Re: Monitoring bytes read per cf

2011-05-13 Thread Daniel Doubleday
Thanks - yes I agree. Didn't want to judge solely based on this figure. It should just add to the picture. But since we know access patterns and other stats like key and row cache hit ratios we hope to be able to make a more educated guess whats going on. On May 13, 2011, at 9:08 AM, Peter Sch

Monitoring bytes read per cf

2011-05-12 Thread Daniel Doubleday
Hi all got a question for folks with some code insight again. To be able to better understand where our IO load is coming from we want to monitor the number of bytes read from disc per cf. (we love stats) What I have done is wrapping the FileDataInput in SSTableReader to sum the bytes read in

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-05 Thread Daniel Doubleday
it's the root cause of my > problems, something something encoding error, but that doesn't really help > me. :-) > > However, I've done all my tests with 0.7.5, I'm gonna try them again with > 0.7.4, just to see how that version reacts. > > > /Henrik >

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-05 Thread Daniel Doubleday
ssandra Developer >> @aaronmorton >> http://www.thelastpickle.com >> >> On 5 May 2011, at 22:36, aaron morton wrote: >> >>> Interesting but as we are dealing with keys it should not matter as they >>> are treated as byte buffers. >>> >

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-04 Thread Daniel Doubleday
This is a bit of a wild guess but Windows and encoding and 0.7.5 sounds like https://issues.apache.org/jira/browse/CASSANDRA-2367 On May 3, 2011, at 5:15 PM, Henrik Schröder wrote: > Hey everyone, > > We did some tests before upgrading our Cassandra cluster from 0.6 to 0.7, > just to make su

Re: Strange corrupt sstable

2011-05-02 Thread Daniel Doubleday
apparently happened during compaction was 1. read sst and generate string based order rows 2. write the new file based on that order 3. read the compacted file based on raw bytes order -> crash That bug never made it to production so we are fine. On Apr 29, 2011, at 10:32 AM, Daniel Doubleday wr

Re: best way to backup

2011-04-29 Thread Daniel Doubleday
What we are about to set up is a time machine like backup. This is more like an add on to the s3 backup. Our boxes have an additional larger drive for local backup. We create a new backup snapshot every x hours which hardlinks the files in the previous snapshot (bit like cassandra's incremental_b
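
The hard-link step described in the message above could look roughly like the following. This is only a sketch under assumptions: the function name and the directory layout are hypothetical and do not come from the original mail, and it covers only the linking of a previous snapshot into a new one, not the copying of files that changed since then.

```python
import os


def hardlink_snapshot(previous: str, new: str) -> int:
    """Mirror the directory tree of `previous` under `new`, hard-linking each file.

    Hard links share the underlying data blocks, so unchanged files take
    no additional disk space. Both paths must be on the same filesystem.
    Returns the number of files linked.
    """
    linked = 0
    for root, _dirs, files in os.walk(previous):
        rel = os.path.relpath(root, previous)
        target_dir = os.path.normpath(os.path.join(new, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(target_dir, name))
            linked += 1
    return linked
```

A scheduled job would call this once per interval, e.g. `hardlink_snapshot("/backup/snap-0001", "/backup/snap-0002")` (hypothetical paths), and then bring the new snapshot up to date with whatever changed.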

Re: Strange corrupt sstable

2011-04-29 Thread Daniel Doubleday
Bad == Broken That means you cannot rely on 1 == 1. In such a scenario everything can happen including data loss. That's why you want ECC mem on production servers. Our cheapo dev boxes don't. On Apr 28, 2011, at 7:46 PM, mcasandra wrote: > What do you mean by Bad memory? Is it less heap size,

Strange corrupt sstable

2011-04-28 Thread Daniel Doubleday
Hi all on one of our dev machines we ran into this: INFO [CompactionExecutor:1] 2011-04-28 15:07:35,174 SSTableWriter.java (line 108) Last written key : DecoratedKey(12707736894140473154801792860916528374, 74657374) INFO [CompactionExecutor:1] 2011-04-28 15:07:35,174 SSTableWriter.java (line

Re: Advice on mmap related swapping issue

2011-03-23 Thread Daniel Doubleday
FWIW: For whatever reason jna memlockall does not work for us. jna call is successful but cassandra process swaps anyway. see: http://www.mail-archive.com/user@cassandra.apache.org/msg11235.html We disabled swap entirely. On Mar 22, 2011, at 8:56 PM, Chris Goffinet wrote: > The easiest way to

Re: cassandra nodes with mixed hard disk sizes

2011-03-22 Thread Daniel Doubleday
On Mar 22, 2011, at 5:09 AM, aaron morton wrote: > 1) You should use nodes with the same capacity (CPU, RAM, HDD), cassandra > assumes they are all equal. Care to elaborate? While equal node will certainly make life easier I would have thought that dynamic snitch would take care of performan

Re: nodetool repair on cluster

2011-03-15 Thread Daniel Doubleday
At least if you are using RackUnawareStrategy Cheers, Daniel On Mar 15, 2011, at 6:44 PM, Huy Le wrote: > Hi, > > We have a cluster with 12 servers and use RF=3. When running nodetool > repair, do we have to run it on all nodes on the cluster or can we run on > every 3rd node? Thanks! > >

jna and swapping

2011-03-15 Thread Daniel Doubleday
Hi all strange things here: we are using jna. Log file says mlockall was successful. We start with -Xms2000M -Xmx2000M and run cassandra as root process so RLIMIT_MEMLOCK limit should have no relevance. Still cassandra is swapping ... Used swap varies between 100MB - 800MB We removed the swap

Increase flush writer queue

2011-03-14 Thread Daniel Doubleday
Hi all, on 0.6: we are facing increased write latencies every now and then when an unfortunate write command thread becomes the flush writer for a mem table because of an already running mem table flush. I was thinking of setting the work queue in CFS.flushWriterPool to new LinkedBlockingQu

mixed cluster 0.6.9 and 0.6.12

2011-03-09 Thread Daniel Doubleday
Hi all we are still on 0.6.9 and plan to upgrade to 0.6.12 but are a little concerned about: https://issues.apache.org/jira/browse/CASSANDRA-2170 I thought of upgrading only one node (of 5) to .12 and monitor for a couple of days. Is this a bad idea? Thanks, Daniel

Re: Alternative to repair

2011-03-08 Thread Daniel Doubleday
cluster load is lower. For the time being I guess thats good enough and we hope that 0.7 works a little smoother when doing repairs. Cheers, Daniel On Mar 7, 2011, at 7:22 PM, Jonathan Ellis wrote: > On Mon, Mar 7, 2011 at 11:18 AM, Daniel Doubleday > wrote: >> Since we alre

Alternative to repair

2011-03-07 Thread Daniel Doubleday
Hi all we're still on 0.6 and are facing problems with repairs. I.e. a repair for one CF takes around 60h and we have to do that twice (RF=3, 5 nodes). During that time the cluster is under pretty heavy IO load. It kinda works but during peak times we see lots of dropped messages (including wr

Re: Does variation in no of columns in rows over the column family has any performance impact ?

2011-02-07 Thread Daniel Doubleday
It depends a little on your write pattern: - Wide rows tend to get distributed over more sstables so more disk reads are necessary. This will become noticeable when you have high io load and reads actually hit the discs. - If you delete a lot slice query performance might suffer: extreme example

Re: Using Cassandra to store files

2011-02-04 Thread Daniel Doubleday
brendan.po...@new-law.co.uk > 029 2078 4283 > www.new-law.co.uk > > > > > > From: Daniel Doubleday [mailto:daniel.double...@gmx.net] > Sent: 03 February 2011 17:21 > To: user@cassandra.apache.org > Subject: Re: Using Cassandra to store files

Re: Using Cassandra to store files

2011-02-03 Thread Daniel Doubleday
Hundreds of thousands doesn't sound too bad. Good old NFS would do with an ok directory structure. We are doing this. Our documents are pretty small though (a few kb). We have around 40M right now with around 300GB total. Generally the problem is that much data usually means that cassandra beco

Re: monitoring with Zabbix

2011-01-10 Thread Daniel Doubleday
We use zabbix and cassandra like so: http://www.mail-archive.com/user@cassandra.apache.org/msg08100.html Daniel Doubleday, smeet.com On Jan 9, 2011, at 1:09 AM, ruslan usifov wrote: > Zapcat is a simple bridge between JMX and zabbix protocol, and it imho > doesn't allow collec

Row Cache / Slice Cache

2011-01-03 Thread Daniel Doubleday
/ tried something in that direction. Cheers, Daniel Doubleday smeet.com, Berlin

Problematic usage pattern

2010-12-22 Thread Daniel Doubleday
Hi all wanted to share a cassandra usage pattern you might want to avoid (if you can). The combination of - heavy rows, - large volume and - many updates (overwriting columns) will lead to a higher count of live ssts (at least if you're not starting major compactions a lot) with many ssts ac

Re: Virtual IP / hardware load balancing for cassandra nodes

2010-12-20 Thread Daniel Doubleday
You will lose part of the retry / fallback functionality offered by hector. The job of the client lib is not only load-balancing. I.e. if a node is bootstrapping it will accept TCP connections but throw an exception which will be communicated via thrift. The client lib is supposed to handle tha

Re: Read Latency Degradation

2010-12-19 Thread Daniel Doubleday
On 19.12.10 03:05, Wayne wrote: Rereading through everything again I am starting to wonder if the page cache is being affected by compaction. Oh yes ... http://chbits.blogspot.com/2010/06/lucene-and-fadvisemadvise.html https://issues.apache.org/jira/browse/CASSANDRA-1470 We have been heavily

Re: Read Latency Degradation

2010-12-17 Thread Daniel Doubleday
serve up 15+TB of data. Based on what we have seen we need 100 Cassandra > > nodes with rf=3 to give us good read latency (by keeping the node data sizes > > down). The cost/value equation just does not add up. > > > > Thanks in advance for any advice/experience you can provi

Re: Dynamic Snitch / Read Path Questions

2010-12-17 Thread Daniel Doubleday
> the purpose of your thread is: How far are you away from being I/O > bound (say in terms of % utilization - last column of iostat -x 1 - > assuming you don't have a massive RAID underneath the block device) No my cheap boss didn't want to buy me a stack of these http://www.ocztechnology.com/prod

Cassandra Monitoring

2010-12-17 Thread Daniel Doubleday
How / what are you monitoring? Best practices someone? Cheers, Daniel Doubleday, smeet.com, Berlin

Re: Read Latency Degradation

2010-12-17 Thread Daniel Doubleday
On Dec 16, 2010, at 11:35 PM, Wayne wrote: > I have read that read latency goes up with the total data size, but to what > degree should we expect a degradation in performance? What is the "normal" > read latency range if there is such a thing for a small slice of scol/cols? > Can we really pu

Re: org.apache.cassandra.service.ReadResponseResolver question

2010-12-15 Thread Daniel Doubleday
... Thanks, Daniel smeet.com, Berlin > On Tue, Dec 14, 2010 at 1:55 PM, Daniel Doubleday > wrote: > Hi > > I'm sorry - don't want to be a pain in the neck with source questions. So > please just ignore me if this is stupid: > > Isn't org.apache.cassa

org.apache.cassandra.service.ReadResponseResolver question

2010-12-14 Thread Daniel Doubleday
Hi I'm sorry - don't want to be a pain in the neck with source questions. So please just ignore me if this is stupid: Isn't org.apache.cassandra.service.ReadResponseResolver supposed to throw a DigestMismatchException if it receives a digest which does not match the digest of a read message? If

Re: Dynamic Snitch / Read Path Questions

2010-12-14 Thread Daniel Doubleday
On Dec 14, 2010, at 2:29 AM, Brandon Williams wrote: > On Mon, Dec 13, 2010 at 6:43 PM, Daniel Doubleday > wrote: > Oh - well but I see that the coordinator is actually using its own score for > ordering. I was only concerned that dropped messages are ignored when > calculatin

Re: Dynamic Snitch / Read Path Questions

2010-12-13 Thread Daniel Doubleday
On 13.12.10 21:15, Brandon Williams wrote: On Sun, Dec 12, 2010 at 10:49 AM, Daniel Doubleday <daniel.double...@gmx.net> wrote: Hi again. It would be great if someone could comment whether the following is true or not. I tried to understand the consequences of

Re: Dynamic Snitch / Read Path Questions

2010-12-13 Thread Daniel Doubleday
Hi Peter I should have started with the why instead of what ... Background Info (I try to be brief ...) We have a very small production cluster (started with 3 nodes, now we have 5). Most of our data is currently in mysql but we want to slowly move the larger tables which are killing our mysql

Dynamic Snitch / Read Path Questions

2010-12-12 Thread Daniel Doubleday
Hi again. It would be great if someone could comment whether the following is true or not. I tried to understand the consequences of using -Dcassandra.dynamic_snitch=true for the read path and that's what I came up with: 1) If using CL > 1 then using the dynamic snitch will result in a dat

Re: Stuck with adding nodes

2010-12-10 Thread Daniel Doubleday
Thanks for your help Peter. We gave up and rolled back to our mysql implementation (we did all writes to our old store in parallel so we did not lose anything). Problem was that every solution we came up with would require at least one major compaction before the new nodes could join and our clus

Stuck with adding nodes

2010-12-09 Thread Daniel Doubleday
Hi good people. I underestimated load during peak times and now I'm stuck with our production cluster. Right now its 3 nodes, rf 3 so everything is everywhere. We have ~300GB data load. ~10MB/sec incoming traffic and ~50 (peak) reads/sec to the cluster The problem derives from our quorum read

Re: Dont bogart that connection my friend

2010-12-04 Thread Daniel Doubleday
sion of Cassandra is this? On Fri, Dec 3, 2010 at 7:27 PM, Daniel Doubleday wrote: Yes. I thought that would make sense, no? I guessed that the quorum read forces the slowest of the 3 nodes to keep the pace of the faster ones. But it cant. No matter how small the performance diff is. So it will

Re: Dont bogart that connection my friend

2010-12-03 Thread Daniel Doubleday
rs shooting their own foot as I did. On 03.12.10 23:36, Jonathan Ellis wrote: Am I understanding correctly that you had all connections going to one cassandra node, which caused one of the *other* nodes to die, and spreading the connections around the cluster fixed it? On Fri, Dec 3, 2010 at 4:00 AM, D

Re: Best Practice for Data Center Migration

2010-12-03 Thread Daniel Doubleday
imary node or the node after the primary (when the primary was located in the switched off dc) Daniel Doubleday smeet.com, Berlin On Dec 2, 2010, at 6:11 PM, Jonathan Ellis wrote: > On Thu, Dec 2, 2010 at 4:08 AM, Jake Maizel wrote: >> Hello, >> >> We have a ring of 1

Dont bogart that connection my friend

2010-12-03 Thread Daniel Doubleday
f all rows on one node is enough. But the same thing will probably happen if you scan by continuous tokens (meaning that you will read from the same node a long time). Cheers, Daniel Doubleday smeet.com, Berlin

Re: High BloomFilterFalseRation

2010-11-02 Thread Daniel Doubleday
exist this ratio will always show 1.0 Meaning it is rather a measure of how many of your queries ask for non existing values. Cheers, Daniel On Oct 28, 2010, at 1:10 PM, Daniel Doubleday wrote: > Hi Ryan > > I took a sample of one sstable (just flushed, not compacted). >

Re: High BloomFilterFalseRation

2010-10-28 Thread Daniel Doubleday
file size: 110730565 bytes rows: 47432 FILTER FILE file size: 96565 bytes bloom filter bitset size: 771904 bloom filter bitset cardinality: 354610 On Oct 27, 2010, at 6:41 PM, Ryan King wrote: > On Wed, Oct 27, 2010 at 3:24 AM, Daniel Doubleday > wrote: >> Hi people >> >>
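
As a sanity check on the figures sampled above, the bitset size and the number of set bits give the bits allocated per row and the fraction of the filter that is filled. The sketch below is not part of the original mail; the function name and the hash count `k` are assumptions for illustration, since the number of hash functions used by this Cassandra version is not stated in the thread.

```python
def bloom_stats(rows, bitset_bits, bits_set, k):
    """Derive simple bloom filter figures from a sampled sstable.

    rows        -- number of rows (keys) in the sstable
    bitset_bits -- size of the filter bitset in bits
    bits_set    -- number of bits set (the bitset cardinality)
    k           -- assumed number of hash functions (not given in the thread)
    """
    bits_per_row = bitset_bits / rows
    fill = bits_set / bitset_bits
    # Textbook estimate: a lookup for an absent key is a false positive
    # when all k probed bits happen to be set.
    estimated_fp = fill ** k
    return bits_per_row, fill, estimated_fp


# Figures from the sample above; k=5 is purely an illustrative guess.
bits_per_row, fill, estimated_fp = bloom_stats(47432, 771904, 354610, k=5)
print(f"{bits_per_row:.2f} bits/row, fill {fill:.4f}, estimated fp {estimated_fp:.4f}")
```

If the filter itself looks healthy by these numbers, a high reported ratio is more likely to come from the query pattern than from the filter sizing.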

Re: High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday
l data model it's not unlikely that this sort of skew exists since you'd tend to query for items towards the root of the hierarchy more frequently. Mike On Wed, Oct 27, 2010 at 2:14 PM, Daniel Doubleday <daniel.double...@gmx.net> wrote: Hm - not sure if I u

Re: High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday
> couple outlier rows causing the false positives that are being queried > over and over then that could just be the luck of the draw. > > On Wed, Oct 27, 2010 at 5:24 AM, Daniel Doubleday > wrote: >> Hi people >> >> We are currently moving our second use case from mysq

High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday
Hi people We are currently moving our second use case from mysql to cassandra. While importing the data (ongoing) I noticed that the BloomFilterFalseRation seems to be pretty high compared to another CF which is in use in production right now. It's a hierarchical data model and I cannot avoid t

Re: nodetool repair

2010-09-22 Thread Daniel Doubleday
Hi all, just wanted to make sure that I get this right: What this means is that I have to schedule repairs only on every RF-th node? So with 4 nodes and RF=2 I would repair nodes 1 and 3 and with 6 nodes and RF=3 I would repair nodes 1 and 4 and that would lead to a synched cluster? > On Thu, Jul 15
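
The node selection asked about above can be written down as a small helper. This is only a sketch of the arithmetic in the question (every RF-th node, counted from 1 in ring order); the function name is hypothetical, and whether repairing just these nodes really leaves the whole cluster in sync is exactly what the poster is asking, so it should not be read as confirmed.

```python
def nodes_to_repair(node_count, rf):
    """Return the 1-based ring positions of every rf-th node, starting at node 1."""
    return list(range(1, node_count + 1, rf))


print(nodes_to_repair(4, 2))  # the 4 nodes, RF=2 case from the question
print(nodes_to_repair(6, 3))  # the 6 nodes, RF=3 case from the question
```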

Read before Write

2010-08-27 Thread Daniel Doubleday
Hi people I was wondering if anyone already benchmarked such a situation: I have: day of year (row key) -> SomeId (column key) -> byte[0] I need to make sure that I write SomeId, but in around 80% of the cases it will be already present (so I would essentially replace it with itself). RF will