Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-20 Thread Rob Coli
On 8/20/10 1:58 PM, Julie wrote: Julie <nextcentury.com> writes: Please see previous post but is hinted handoff a factor if the CL is set to ALL? Your previous post looks like a flush or compaction is causing the node to mark its neighbors down. Do you see correlation between memtable flush

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-20 Thread Julie
Julie <nextcentury.com> writes: Please see previous post but is hinted handoff a factor if the CL is set to ALL?

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-20 Thread Julie
Robert Coli <digg.com> writes: > Check the size of the Hinted Handoff CF? If your nodes are flapping > under sustained write, they could be storing a non-trivial number of > hinted handoff rows? Probably not 5x usage though.. > > http://wiki.apache.org/cassandra/Operations > " > The reason why yo

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-19 Thread Robert Coli
On Thu, Aug 19, 2010 at 7:23 AM, Julie wrote: > At this point, I logged in.  The data distribution on this node was 122GB.  I > started performing a manual nodetool cleanup. Check the size of the Hinted Handoff CF? If your nodes are flapping under sustained write, they could be storing a non-triv

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-19 Thread Julie
Peter Schuller <infidyne.com> writes: > Without necessarily dumping all the information - approximately what > do they contain? Do they contain anything about compactions, > anti-compactions, streaming, etc? > > With an idle node after taking writes, I *think* the only expected > disk I/O (once i

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-18 Thread Peter Schuller
> I actually have the log files from all 8 nodes if it helps to diagnose what > activity was going on behind the scenes.  I really need to understand how this > happened. Without necessarily dumping all the information - approximately what do they contain? Do they contain anything about compaction

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-18 Thread Julie
Jonathan Ellis <gmail.com> writes: > > If you read the stack traces you pasted, the node in question ran out > of diskspace. When you have < 25% space free this is not surprising. > > But fundamentally you are missing something important from your story > here. Disk space doesn't just increase

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-18 Thread Edward Capriolo
On Wed, Aug 18, 2010 at 10:51 AM, Jonathan Ellis wrote: > If you read the stack traces you pasted, the node in question ran out > of diskspace.  When you have < 25% space free this is not surprising. > > But fundamentally you are missing something important from your story > here.  Disk space does

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-18 Thread Jonathan Ellis
If you read the stack traces you pasted, the node in question ran out of diskspace. When you have < 25% space free this is not surprising. But fundamentally you are missing something important from your story here. Disk space doesn't just increase spontaneously with "absolutely no activity." On

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-18 Thread Julie
Rob Coli <digg.com> writes: > As I understand Julie's case, she is : > a) initializing her cluster > b) inserting some number of unique keys with CL.ALL > c) noticing that more disk space (6x?) than is expected is used > d) but that she gets expected usage if she does a major compaction > In oth

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-16 Thread Julie
Rob Coli <digg.com> writes: > As I understand Julie's case, she is : > > a) initializing her cluster > b) inserting some number of unique keys with CL.ALL > c) noticing that more disk space (6x?) than is expected is used > d) but that she gets expected usage if she does a major compaction Yes, t

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-06 Thread Peter Schuller
> sstables waiting for the GC to trigger actual file removal. *However*, > and this is what I meant with my follow-up, that still does not > explain the data from her post unless 'nodetool ring' reports total > sstable size rather than the total size of live sstables. As far as I can tell, the inf

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-06 Thread Rob Coli
On 8/6/10 3:30 PM, Peter Schuller wrote: *However*, and this is what I meant with my follow-up, that still does not explain the data from her post unless 'nodetool ring' reports total sstable size rather than the total size of live sstables. Relatively limited time available to respond to this

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-06 Thread Peter Schuller
> Your post refers to "obsolete" sstables, but the only thing that makes them > "obsolete" in this case is that they have been compacted? Yes. > As I understand Julie's case, she is : > > a) initializing her cluster > b) inserting some number of unique keys with CL.ALL > c) noticing that more dis

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-06 Thread Rob Coli
On 8/5/10 11:51 AM, Peter Schuller wrote: Also, the variation in disk space in your most recent post looks entirely as expected to me and nothing really extreme. The temporary disk space occupied during the compact/cleanup would easily be as high as your original disk space usage to begin with, a

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-06 Thread Peter Schuller
> One thing to keep in mind is that SSTables are not actually removed > from disk until the garbage collector has identified the relevant > in-memory structures as garbage (there is a note on the wiki about However I forgot that the 'load' reported by nodetool ring does not, I think, represent on-

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-05 Thread Peter Schuller
Oh and, > Nodetool cleanup works so beautifully, that I am wondering if there is any > harm > in using "nodetool cleanup" in a cron job on a live system that is actively > processing reads and writes to the database? since a cleanup/compact is supposed to trigger a full compaction, that's genera

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-05 Thread Peter Schuller
> So the manual compaction did help somewhat but did not get the nodes down to > the > size of their raw data.  There are still multiple SSTables on most nodes. > > At 4:02pm, ran nodetool cleanup on every node. > > At 4:12pm, nodes are taking up the expected amount of space and all nodes are > us

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-04 Thread Julie
Jonathan Ellis <gmail.com> writes: > > did you try compact instead of cleanup, anyway? > Hi Jonathan, Thanks for your reply. Actually, I didn't use compact, I used cleanup. But I did some testing with compact today since you mentioned it. Using nodetool compact does improve my disk usage on e

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-30 Thread Jonathan Ellis
did you try compact instead of cleanup, anyway? On Tue, Jul 27, 2010 at 1:08 PM, Julie wrote: > Peter Schuller <infidyne.com> writes: > >> > a) cleanup is a superset of compaction, so if you've been doing >> > overwrites at all then it will reduce space used for that reason >> > > Hi Peter and Jo

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-27 Thread Julie
Peter Schuller <infidyne.com> writes: > > a) cleanup is a superset of compaction, so if you've been doing > > overwrites at all then it will reduce space used for that reason > Hi Peter and Jonathan, In my test, I write 80,000 rows (100KB each row) to an 8 node cluster. The 80,000 rows all hav

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-27 Thread Peter Schuller
> Minor compactions (see > http://wiki.apache.org/cassandra/MemtableSSTable) will try to keep the > growth in check but it is by no means limited to 2x. Sorry I was being unclear. I was rather thinking along the lines of a doubling of data triggering an implicit major compaction. However I was wro

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-27 Thread Jonathan Ellis
On Tue, Jul 27, 2010 at 9:26 AM, Peter Schuller wrote: > I had failed to consider over-writes as a possible culprit (since > removals were stated not to be done). However thinking about it I > believe the effect of this should be limited to roughly a doubling of > disk space in the absolute worst

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-27 Thread Peter Schuller
> a) cleanup is a superset of compaction, so if you've been doing > overwrites at all then it will reduce space used for that reason I had failed to consider over-writes as a possible culprit (since removals were stated not to be done). However thinking about it I believe the effect of this should

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-27 Thread Jonathan Ellis
On Fri, Jul 23, 2010 at 8:57 AM, Julie wrote: > But in my focused testing today I see that if I run nodetool "cleanup" on the > nodes taking up way more space than I expect, I see multiple SS Tables being > combined into 1 or 2 and the live disk usage going way down, down to what I > know > the r

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-23 Thread Julie
Jonathan Ellis <gmail.com> writes: > > then obsolete sstables is not your culprit. > I believe I figured out how to force my node disk usage to go down. I had been letting Cassandra perform its own data management, and did not use nodetool to force anything since in our real system, the data w

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-09 Thread Jonathan Ellis
then obsolete sstables is not your culprit. On Thu, Jul 8, 2010 at 8:32 AM, Julie wrote: > Jonathan Ellis <gmail.com> writes: > >> "SSTables that are obsoleted by a compaction are deleted >> asynchronously when the JVM performs a GC. You can force a GC from >> jconsole if necessary, but Cassandra

RE: Help! Cassandra disk space utilization WAY higher than I would expect

2010-07-09 Thread Stu Hood
"Julie" Sent: Friday, July 9, 2010 9:58am To: user@cassandra.apache.org Subject: Help! Cassandra disk space utilization WAY higher than I would expect Hi guys, I am on the hook to explain why 30GB of data is filling up 106GB of disk space since this is concerning information for my project.

Help! Cassandra disk space utilization WAY higher than I would expect

2010-07-09 Thread Julie
Hi guys, I am on the hook to explain why 30GB of data is filling up 106GB of disk space since this is concerning information for my project. We are very excited about the possibility of using Cassandra but need to understand this anomaly in order to feel confident. Does anyone know why this cou

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-08 Thread Julie
Jonathan Ellis <gmail.com> writes: > "SSTables that are obsoleted by a compaction are deleted > asynchronously when the JVM performs a GC. You can force a GC from > jconsole if necessary, but Cassandra will force one itself if it > detects that it is low on space. A compaction marker is also added

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-07 Thread Jonathan Ellis
On Wed, Jul 7, 2010 at 1:22 PM, Julie wrote: > Jonathan Ellis <gmail.com> writes: > >> On Wed, Jul 7, 2010 at 12:10 PM, Julie <nextcentury.com> > wrote: >> > >> > This doesn't explain why 30 GB of data is taking up 106 GB of disk 24 hours >> > after all writes have completed.  Compactions should b

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-07 Thread Julie
Jonathan Ellis <gmail.com> writes: > On Wed, Jul 7, 2010 at 12:10 PM, Julie <nextcentury.com> wrote: > > > > This doesn't explain why 30 GB of data is taking up 106 GB of disk 24 hours > > after all writes have completed.  Compactions should be complete, no? > > http://wiki.apache.org/cassandra/

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-07 Thread Julie
Rob Coli <digg.com> writes: > Is your workload straight INSERT or does it contain UPDATE and/or > DELETE? If your workload contains UPDATE/DELETE and GCGraceSeconds (10 > days by default) hasn't passed, you might have a non-trivial number of > tombstone rows. Only major compactions clean up to

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-07 Thread Rob Coli
On 7/7/10 10:10 AM, Julie wrote: This doesn't explain why 30 GB of data is taking up 106 GB of disk 24 hours after all writes have completed. Compactions should be complete, no? Is your workload straight INSERT or does it contain UPDATE and/or DELETE? If your workload contains UPDATE/DELETE

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-07 Thread Jonathan Ellis
On Wed, Jul 7, 2010 at 12:10 PM, Julie wrote: > I am thinking that the timestamps and column names should be included in the > column family stats, which basically says 300,000 rows that are 100KB each=30 > GB.  My rows only have 1 column so there should only be one timestamp.  My > column name is

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-07 Thread Jordan Pittier - Rezel
I see the same thing here. I have tried to do the math including timestamps, column names, keys and raw data, but in the end Cassandra reports a cluster size 2 to 3 times bigger than the raw data. I am surely missing something in my formula, and I have a lot of free hard drive space, so it's not

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-07 Thread Peter Schuller
> I am thinking that the timestamps and column names should be included in the > column family stats, which basically says 300,000 rows that are 100KB each=30 > GB.  My rows only have 1 column so there should only be one timestamp.  My > column name is only 10 bytes long. > > This doesn't explain w

Re: Cassandra disk space utilization WAY higher than I would expect

2010-07-07 Thread Julie
Peter Schuller <infidyne.com> writes: > > Keep in mind that there is additional data storage overhead, including > > timestamps and column names. Because the schema can vary from row to row, > > the column names are stored with each row, in addition to the data. Disk > > space-efficiency is not

Re: Cassandra disk space utilization

2010-07-07 Thread Peter Schuller
> Keep in mind that there is additional data storage overhead, including > timestamps and column names. Because the schema can vary from row to row, > the column names are stored with each row, in addition to the data. Disk > space-efficiency is not a primary design goal for Cassandra. If the row'

Re: Cassandra disk space utilization

2010-07-07 Thread Mason Hale
Hi Julie -- Keep in mind that there is additional data storage overhead, including timestamps and column names. Because the schema can vary from row to row, the column names are stored with each row, in addition to the data. Disk space-efficiency is not a primary design goal for Cassandra. Mason

Cassandra disk space utilization

2010-07-07 Thread Julie
Hi guys, I have what may be a dumb question but I am confused by how much disk space is being used by my Cassandra nodes. I have 10 nodes in my cluster with a replication factor of 3. After I write 1,000,000 rows to the database (100kB each), I see that they have been distributed very evenly,
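
For reference, the figures quoted in this thread (10 nodes, replication factor 3, 1,000,000 rows of 100kB each) can be checked with a short back-of-the-envelope calculation. The sketch below is only an illustration: the function names are hypothetical, it assumes decimal units (1 GB = 1,000,000 kB) and perfectly even distribution, and it ignores per-row overhead such as keys, column names and timestamps. The 106 GB observed figure is the one reported later in the thread.

```python
def expected_per_node_gb(rows, row_kb, replication_factor, nodes):
    """Raw data each node should hold, in decimal GB.

    Assumes even distribution and no storage overhead (hypothetical helper).
    """
    total_kb = rows * row_kb * replication_factor
    return total_kb / nodes / 1_000_000


def overhead_ratio(observed_gb, expected_gb):
    """How many times larger the observed usage is than the raw data."""
    return observed_gb / expected_gb


per_node = expected_per_node_gb(rows=1_000_000, row_kb=100,
                                replication_factor=3, nodes=10)
print(per_node)                                  # 30.0 GB of raw data per node
print(round(overhead_ratio(106, per_node), 2))   # 3.53x observed vs. expected
```

Under these assumptions each node holds about 300,000 rows, or 30 GB of raw data, which matches the "30 GB of data is taking up 106 GB of disk" figures discussed in the replies.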