On 8/20/10 1:58 PM, Julie wrote:
Julie nextcentury.com> writes:
Please see previous post but is hinted handoff a factor if the CL is set to ALL?
Your previous post looks like a flush or compaction is causing the node
to mark its neighbors down. Do you see correlation between memtable
flush
Julie nextcentury.com> writes:
Please see previous post but is hinted handoff a factor if the CL is set to ALL?
Robert Coli digg.com> writes:
> Check the size of the Hinted Handoff CF? If your nodes are flapping
> under sustained write, they could be storing a non-trivial number of
> hinted handoff rows? Probably not 5x usage though..
>
> http://wiki.apache.org/cassandra/Operations
> "
> The reason why yo
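One quick way to check that theory is to pull cfstats from each node and look at the space used by the hints column family. A minimal sketch, assuming nodetool is on the PATH and that the hints live in a column family whose name contains "Hints" (the node addresses below are placeholders):

import subprocess

for host in ("10.0.0.1", "10.0.0.2"):      # placeholder node addresses
    out = subprocess.run(["nodetool", "-h", host, "cfstats"],
                         capture_output=True, text=True).stdout
    lines = out.splitlines()
    for i, line in enumerate(lines):
        if "Hints" in line:                # e.g. a "Column Family: Hints..." header
            print(host)
            print("\n".join(lines[i:i + 8]))   # the stats block that follows it

If the space reported there is small, hinted handoff is not where the extra disk space is going.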
On Thu, Aug 19, 2010 at 7:23 AM, Julie wrote:
> At this point, I logged in. The data distribution on this node was 122GB. I
> started performing a manual nodetool cleanup.
Check the size of the Hinted Handoff CF? If your nodes are flapping
under sustained write, they could be storing a non-triv
Peter Schuller infidyne.com> writes:
> Without necessarily dumping all the information - approximately what
> do they contain? Do they contain anything about compactions,
> anti-compactions, streaming, etc?
>
> With an idle node after taking writes, I *think* the only expected
> disk I/O (once i
> I actually have the log files from all 8 nodes if it helps to diagnose what
> activity was going on behind the scenes. I really need to understand how this
> happened.
Without necessarily dumping all the information - approximately what
do they contain? Do they contain anything about compaction
Jonathan Ellis gmail.com> writes:
>
> If you read the stack traces you pasted, the node in question ran out
> of diskspace. When you have < 25% space free this is not surprising.
>
> But fundamentally you are missing something important from your story
> here. Disk space doesn't just increase
On Wed, Aug 18, 2010 at 10:51 AM, Jonathan Ellis wrote:
> If you read the stack traces you pasted, the node in question ran out
> of diskspace. When you have < 25% space free this is not surprising.
>
> But fundamentally you are missing something important from your story
> here. Disk space does
If you read the stack traces you pasted, the node in question ran out
of diskspace. When you have < 25% space free this is not surprising.
But fundamentally you are missing something important from your story
here. Disk space doesn't just increase spontaneously with "absolutely
no activity."
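One practical consequence of the low-free-space point: a major compaction (or cleanup) can transiently need roughly as much free space as the live data it rewrites, so headroom is worth checking before blaming Cassandra for spontaneous growth. A rough sketch, assuming the default data directory (adjust to whatever data directory is configured in storage-conf.xml):

import os
import shutil

DATA_DIR = "/var/lib/cassandra/data"   # assumed default; use your configured data directory

def dir_size(path):
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

live = dir_size(DATA_DIR)
free = shutil.disk_usage(DATA_DIR).free
print("sstable data: %.1f GB" % (live / 1e9))
print("free space:   %.1f GB" % (free / 1e9))
# a full compaction may briefly need about as much free space as the data it
# rewrites, so free space well below the data size is a warning sign
if free < live:
    print("WARNING: not enough headroom for a full compaction")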
On
Rob Coli digg.com> writes:
> As I understand Julie's case, she is :
> a) initializing her cluster
> b) inserting some number of unique keys with CL.ALL
> c) noticing that more disk space (6x?) than is expected is used
> d) but that she gets expected usage if she does a major compaction
> In oth
Rob Coli digg.com> writes:
> As I understand Julie's case, she is :
>
> a) initializing her cluster
> b) inserting some number of unique keys with CL.ALL
> c) noticing that more disk space (6x?) than is expected is used
> d) but that she gets expected usage if she does a major compaction
Yes, t
> sstables waiting for the GC to trigger actual file removal. *However*,
> and this is what I meant with my follow-up, that still does not
> explain the data from her post unless 'nodetool ring' reports total
> sstable size rather than the total size of live sstables.
As far as I can tell, the inf
On 8/6/10 3:30 PM, Peter Schuller wrote:
*However*,
and this is what I meant with my follow-up, that still does not
explain the data from her post unless 'nodetool ring' reports total
sstable size rather than the total size of live sstables.
Relatively limited time available to respond to this
> Your post refers to "obsolete" sstables, but the only thing that makes them
> "obsolete" in this case is that they have been compacted?
Yes.
> As I understand Julie's case, she is :
>
> a) initializing her cluster
> b) inserting some number of unique keys with CL.ALL
> c) noticing that more dis
On 8/5/10 11:51 AM, Peter Schuller wrote:
Also, the variation in disk space in your most recent post looks
entirely as expected to me and nothing really extreme. The temporary
disk space occupied during the compact/cleanup would easily be as high
as your original disk space usage to begin with, a
> One thing to keep in mind is that SSTables are not actually removed
> from disk until the garbage collector has identified the relevant
> in-memory structures as garbage (there is a note on the wiki about
However I forgot that the 'load' reported by nodetool ring does not, I
think, represent on-
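That distinction is easy to check directly: compare what nodetool ring reports as Load with what the filesystem actually holds. A small sketch along those lines (same assumed data directory as above); if the two numbers diverge a lot, obsolete sstables waiting for deletion are the likely difference:

import os
import subprocess

DATA_DIR = "/var/lib/cassandra/data"   # assumed default data directory

on_disk = 0
for root, _dirs, files in os.walk(DATA_DIR):
    for name in files:
        on_disk += os.path.getsize(os.path.join(root, name))

print("on disk: %.1f GB" % (on_disk / 1e9))
# compare against the Load column printed for this node
print(subprocess.run(["nodetool", "-h", "localhost", "ring"],
                     capture_output=True, text=True).stdout)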
Oh and,
> Nodetool cleanup works so beautifully that I am wondering if there is any harm
> in using "nodetool cleanup" in a cron job on a live system that is actively
> processing reads and writes to the database?
since a cleanup/compact is supposed to trigger a full compaction,
that's genera
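If cleanup does get scripted, it is probably worth running it one node at a time rather than firing it on the whole ring at once, since each run behaves like a full compaction. A hedged sketch of that kind of wrapper (node list and pause are illustrative):

import subprocess
import time

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # placeholder ring members

for host in NODES:
    # cleanup triggers a full compaction on the node, so run serially to keep
    # the temporary disk and I/O cost confined to one node at a time
    subprocess.run(["nodetool", "-h", host, "cleanup"], check=True)
    time.sleep(600)   # illustrative pause before moving to the next node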
> So the manual compaction did help somewhat but did not get the nodes down to
> the size of their raw data. There are still multiple SSTables on most nodes.
>
> At 4:02pm, ran nodetool cleanup on every node.
>
> At 4:12pm, nodes are taking up the expected amount of space and all nodes are
> us
Jonathan Ellis gmail.com> writes:
>
> did you try compact instead of cleanup, anyway?
>
Hi Jonathan,
Thanks for your reply. Actually, I didn't use compact, I used cleanup. But I
did some testing with compact today since you mentioned it. Using nodetool
compact does improve my disk usage on e
did you try compact instead of cleanup, anyway?
On Tue, Jul 27, 2010 at 1:08 PM, Julie wrote:
> Peter Schuller infidyne.com> writes:
>
>> > a) cleanup is a superset of compaction, so if you've been doing
>> > overwrites at all then it will reduce space used for that reason
>>
>
> Hi Peter and Jo
Peter Schuller infidyne.com> writes:
> > a) cleanup is a superset of compaction, so if you've been doing
> > overwrites at all then it will reduce space used for that reason
>
Hi Peter and Jonathan,
In my test, I write 80,000 rows (100KB each row) to an 8 node cluster. The
80,000 rows all hav
> Minor compactions (see
> http://wiki.apache.org/cassandra/MemtableSSTable) will try to keep the
> growth in check but it is by no means limited to 2x.
Sorry I was being unclear. I was rather thinking along the lines of a
doubling of data triggering an implicit major compaction. However I
was wro
On Tue, Jul 27, 2010 at 9:26 AM, Peter Schuller wrote:
> I had failed to consider over-writes as a possible culprit (since
> removals were stated not to be done). However thinking about it I
> believe the effect of this should be limited to roughly a doubling of
> disk space in the absolute worst
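To put a number on that worst case with the figures from this thread: if every 100 KB row is written and then overwritten once before a compaction merges the two versions, a node can briefly hold both copies. A back-of-the-envelope sketch (replication is counted separately):

rows = 80_000
row_kb = 100
unique_gb = rows * row_kb / 1e6           # ~8 GB of unique data per replica
worst_case_gb = unique_gb * 2             # both old and new versions on disk
print("unique data:            %.1f GB" % unique_gb)
print("worst case, overwrites: %.1f GB" % worst_case_gb)
# so overwrites alone top out around a doubling and cannot explain 5-6x growth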
> a) cleanup is a superset of compaction, so if you've been doing
> overwrites at all then it will reduce space used for that reason
I had failed to consider over-writes as a possible culprit (since
removals were stated not to be done). However thinking about it I
believe the effect of this should
On Fri, Jul 23, 2010 at 8:57 AM, Julie wrote:
> But in my focused testing today I see that if I run nodetool "cleanup" on the
> nodes taking up way more space than I expect, I see multiple SSTables being
> combined into 1 or 2 and the live disk usage going way down, down to what I
> know the r
Jonathan Ellis gmail.com> writes:
>
> then obsolete sstables is not your culprit.
>
I believe I figured out how to force my node disk usage to go down. I had been
letting Cassandra perform its own data management, and did not use nodetool to
force anything since in our real system, the data w
then obsolete sstables is not your culprit.
On Thu, Jul 8, 2010 at 8:32 AM, Julie wrote:
> Jonathan Ellis gmail.com> writes:
>
>> "SSTables that are obsoleted by a compaction are deleted
>> asynchronously when the JVM performs a GC. You can force a GC from
>> jconsole if necessary, but Cassandra
From: "Julie"
Sent: Friday, July 9, 2010 9:58am
To: user@cassandra.apache.org
Subject: Help! Cassandra disk space utilization WAY higher than I would expect
Hi guys,
I am on the hook to explain why 30GB of data is filling up 106GB of disk space
since this is concerning information for my project.
Hi guys,
I am on the hook to explain why 30GB of data is filling up 106GB of disk space
since this is concerning information for my project.
We are very excited about the possibility of using Cassandra but need to
understand this anomaly in order to feel confident. Does anyone know why this
cou
Jonathan Ellis gmail.com> writes:
> "SSTables that are obsoleted by a compaction are deleted
> asynchronously when the JVM performs a GC. You can force a GC from
> jconsole if necessary, but Cassandra will force one itself if it
> detects that it is low on space. A compaction marker is also added
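That compaction marker makes it possible to see, per node, how much of the on-disk total is live versus just waiting for a GC-triggered delete. A sketch that assumes the 0.6-era layout where an obsoleted sstable gets a zero-length companion file ending in "-Compacted" next to its "-Data.db" (verify the naming on your own data directory first):

import os

DATA_DIR = "/var/lib/cassandra/data"   # assumed default data directory

live = obsolete = 0
for root, _dirs, files in os.walk(DATA_DIR):
    names = set(files)
    for name in files:
        if not name.endswith("-Data.db"):
            continue
        size = os.path.getsize(os.path.join(root, name))
        # an sstable with a "-Compacted" marker has been superseded and is only
        # waiting for a GC before Cassandra unlinks it
        if name.replace("-Data.db", "-Compacted") in names:
            obsolete += size
        else:
            live += size

print("live sstables:          %.1f GB" % (live / 1e9))
print("obsolete, awaiting GC:  %.1f GB" % (obsolete / 1e9))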
On Wed, Jul 7, 2010 at 1:22 PM, Julie wrote:
> Jonathan Ellis gmail.com> writes:
>
>> On Wed, Jul 7, 2010 at 12:10 PM, Julie nextcentury.com> wrote:
>> >
>> > This doesn't explain why 30 GB of data is taking up 106 GB of disk 24 hours
>> > after all writes have completed. Compactions should b
Jonathan Ellis gmail.com> writes:
> On Wed, Jul 7, 2010 at 12:10 PM, Julie nextcentury.com> wrote:
> >
> > This doesn't explain why 30 GB of data is taking up 106 GB of disk 24 hours
> > after all writes have completed. Compactions should be complete, no?
>
> http://wiki.apache.org/cassandra/
Rob Coli digg.com> writes:
> Is your workload straight INSERT or does it contain UPDATE and/or
> DELETE? If your workload contains UPDATE/DELETE and GCGraceSeconds (10
> days by default) hasn't passed, you might have a non-trivial number of
> tombstone rows. Only major compactions clean up to
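For completeness, the arithmetic behind that: with the default GCGraceSeconds of 10 days, a tombstone written less than 864,000 seconds ago survives even a major compaction. A tiny illustration:

import time

gc_grace = 10 * 24 * 3600                 # default GCGraceSeconds: 864,000 s
deleted_at = time.time() - 2 * 24 * 3600  # e.g. a row deleted two days ago
purgeable = (time.time() - deleted_at) > gc_grace
print(gc_grace, purgeable)                # 864000 False: the tombstone stays

If the workload here really is insert-only with unique keys, as described later in the thread, tombstones should not be a factor.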
On 7/7/10 10:10 AM, Julie wrote:
This doesn't explain why 30 GB of data is taking up 106 GB of disk 24 hours
after all writes have completed. Compactions should be complete, no?
Is your workload straight INSERT or does it contain UPDATE and/or
DELETE? If your workload contains UPDATE/DELETE
On Wed, Jul 7, 2010 at 12:10 PM, Julie wrote:
> I am thinking that the timestamps and column names should be included in the
> column family stats, which basically says 300,000 rows that are 100KB each=30
> GB. My rows only have 1 column so there should only be one timestamp. My
> column name is
I see the same thing here. I have tried to do some maths including
timestamps, column names, keys and raw data, but in the end Cassandra reports
a cluster size from 2 to 3 times bigger than the raw data. I am surely
missing something in my formula, plus I have a lot of free hard drive space, so
it's not
> I am thinking that the timestamps and column names should be included in the
> column family stats, which basically says 300,000 rows that are 100KB each=30
> GB. My rows only have 1 column so there should only be one timestamp. My
> column name is only 10 bytes long.
>
> This doesn't explain w
Peter Schuller infidyne.com> writes:
> > Keep in mind that there is additional data storage overhead, including
> > timestamps and column names. Because the schema can vary from row to row,
> > the column names are stored with each row, in addition to the data. Disk
> > space-efficiency is not
> Keep in mind that there is additional data storage overhead, including
> timestamps and column names. Because the schema can vary from row to row,
> the column names are stored with each row, in addition to the data. Disk
> space-efficiency is not a primary design goal for Cassandra.
If the row'
Hi Julie --
Keep in mind that there is additional data storage overhead, including
timestamps and column names. Because the schema can vary from row to row,
the column names are stored with each row, in addition to the data. Disk
space-efficiency is not a primary design goal for Cassandra.
Mason
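To see how much that overhead can plausibly account for with 100 KB values, a rough per-row calculation (the per-row overhead figure is an illustrative guess, not an exact on-disk size):

rows = 300_000
value_bytes = 100 * 1024        # 100 KB value per row
col_name_bytes = 10             # column name length stated in the thread
timestamp_bytes = 8             # one 64-bit timestamp per column
assumed_row_overhead = 64       # guess: row key, length fields, index entry, ...

raw_gb = rows * value_bytes / 1e9
padded_gb = rows * (value_bytes + col_name_bytes
                    + timestamp_bytes + assumed_row_overhead) / 1e9
print("raw values:    %.2f GB" % raw_gb)
print("with overhead: %.2f GB" % padded_gb)
# on 100 KB values the per-row overhead is well under 1%, so it cannot account
# for 30 GB of data occupying ~106 GB of disk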
Hi guys,
I have what may be a dumb question but I am confused by how much disk space is
being used by my Cassandra nodes. I have 10 nodes in my cluster with a
replication factor of 3. After I write 1,000,000 rows to the database (100kB
each), I see that they have been distributed very evenly,
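For reference, the expected steady-state numbers for this setup, assuming a perfectly balanced ring and decimal gigabytes:

rows = 1_000_000
row_kb = 100
replication_factor = 3
nodes = 10

total_gb = rows * row_kb / 1e6            # ~100 GB of unique data
stored_gb = total_gb * replication_factor # ~300 GB written across the cluster
per_node_gb = stored_gb / nodes           # ~30 GB per node if perfectly balanced
print(total_gb, stored_gb, per_node_gb)   # 100.0 300.0 30.0

That is where the roughly 30 GB per node figure cited earlier in the thread comes from.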