Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-04 Thread Ertio Lew
Yes, one disadvantage I see of having more CFs, in terms of memory utilization, is this: if a CF is written less often than the other CFs, its memtable occupies memory until it is flushed, and that memory could have been put to much better use by a CF that's heavily writt

Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-04 Thread Ertio Lew
Thanks Tyler! I could not fully understand why more column families would mean more memory. If you have parameters like memtable_throughput & memtable_operations under your control, which are set on a per-column-family basis, then you can directly control & adjust by splitting the memory sp

Re: Pig not reading all cassandra data

2011-02-04 Thread Jonathan Ellis
On Fri, Feb 4, 2011 at 9:47 PM, Matt Kennedy wrote: > Found the culprit.  There is a new feature in Pig 0.8 that will try to > reduce the number of splits used to speed up the whole job.  Since the > ColumnFamilyInputFormat lists the input size as zero, this feature > eliminates all of the splits

Re: Pig not reading all cassandra data

2011-02-04 Thread Matt Kennedy
Found the culprit. There is a new feature in Pig 0.8 that will try to reduce the number of splits used to speed up the whole job. Since the ColumnFamilyInputFormat lists the input size as zero, this feature eliminates all of the splits except for one. The workaround is to disable this featu
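A toy model of why a reported input size of zero leaves only one split. This is a simplified greedy combiner written for illustration, not Pig's actual implementation; the function name and the size threshold are assumptions.

```python
def combine_splits(split_sizes, max_combined_size):
    """Greedily merge adjacent splits until the next one would exceed the limit.

    Returns a list of groups, each group being the indexes of the original
    splits that were merged together. Simplified model, not Pig's real code.
    """
    combined = []
    current, current_size = [], 0
    for index, size in enumerate(split_sizes):
        # Close the current group once adding this split would overflow it.
        if current and current_size + size > max_combined_size:
            combined.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        combined.append(current)
    return combined


# Splits that report a real size stay separate.
sized = combine_splits([100, 100, 100], max_combined_size=128)

# Splits that all report size zero never reach the limit, so they all merge.
zero_sized = combine_splits([0, 0, 0, 0, 0], max_combined_size=128)
```

With every split reporting zero bytes the running total never grows, so any size-based combiner of this shape ends up with a single group.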

Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-04 Thread Tyler Hobbs
> I read somewhere that more no of column families is not a good idea as > it consumes more memory and more compactions to occur This is primarily true, but not in every case. But the caching requirements may be different as they cater to two > different features. > This is a great reason to *n

Re: Unavalible Exception

2011-02-04 Thread Jonathan Ellis
Start with "grep -i down system.log" on each machine On Fri, Feb 4, 2011 at 7:37 PM, David King wrote: > We're going to need *way* more information than this > > On 03 Feb 2011, at 20:03, ruslan usifov wrote: > >> Hello >> >> Why i can get Unavalible Exception on live cluster (all nodes is up and

Merging the rows of two column families(with similar attributes) into one ??

2011-02-04 Thread Ertio Lew
I read somewhere that having more column families is not a good idea, as it consumes more memory and causes more compactions to occur, & thus I am trying to reduce the number of column families by adding the rows of other column families (with similar attributes) as separate rows into one. I have two kinds of d

Re: Unavalible Exception

2011-02-04 Thread David King
We're going to need *way* more information than this On 03 Feb 2011, at 20:03, ruslan usifov wrote: > Hello > > Why i can get Unavalible Exception on live cluster (all nodes is up and never > shutdown) > > PS: v 0.7.0

Re: Sorting in time order without using TimeUUID type column names

2011-02-04 Thread Aditya Narayan
Thanks Aaron, Yes I can put the column names without using the userId in the timeline row, and when I want to retrieve the row corresponding to that column name, I will attach the userId to get the row key. Yes I'll store it as a long & I guess I'll have to write with a custom comparator type (Re

Re: Unavalible Exception

2011-02-04 Thread aaron morton
Please provide some information: the client you are using, the client-side error stack, the command you are running, and the output from nodetool ring. Aaron On 5 Feb 2011, at 05:10, Oleg Proudnikov wrote: > ruslan usifov writes: > >> >> >> 2011/2/4 Oleg Proudnikov

Re: Sorting in time order without using TimeUUID type column names

2011-02-04 Thread aaron morton
IMHO, if you know the time of the event, store the time as a long rather than a UUID. It will make it easier to get back to a time and make it easier for you to compare columns. TimeUUIDs have a pseudo-random part as well as the time part; it could be set to a constant. But why bother if you k
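A minimal sketch of the "store the time as a long" idea, assuming millisecond timestamps packed as 8-byte big-endian values. The helper names are made up for illustration and are not part of any Cassandra client.

```python
import struct


def encode_time(ts_millis):
    """Pack a non-negative millisecond timestamp as an 8-byte big-endian long."""
    return struct.pack(">q", ts_millis)


def decode_time(column_name):
    """Unpack an 8-byte big-endian long back into a millisecond timestamp."""
    return struct.unpack(">q", column_name)[0]


events = [1296800000000, 1296700000000, 1296900000000]

# Sorting the packed names byte-wise gives the same order as sorting by time,
# because big-endian puts the most significant byte first.
names = sorted(encode_time(t) for t in events)
ordered = [decode_time(n) for n in names]
```

Getting back to a given time is then just a matter of encoding that time and using it as a slice start or end.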

Re: Problems with Python Stress Test

2011-02-04 Thread Sameer Farooqui
Brandon, Thanks for the response. I have also noticed that stress.py's progress interval gets thrown off in low memory situations. What did you mean by "contrib/stress on 0.7 instead"? I don't see that dir in the src version of 0.7. - Sameer On Thu, Feb 3, 2011 at 5:22 PM, Brandon Williams w

Re: New Generation Size guidelines

2011-02-04 Thread Ryan King
On Fri, Feb 4, 2011 at 1:45 PM, Oleg Proudnikov wrote: > > Hi All, > > I have a 3 server cluster with RF=2. My heap is 2G out of a 4G RAM. The > servers > have 4 cores. I used default heap settings. The Eden space ended up around 60M > and the Survivor spaces are around 7M. This feels a little bi

New Generation Size guidelines

2011-02-04 Thread Oleg Proudnikov
Hi All, I have a 3 server cluster with RF=2. My heap is 2G out of a 4G RAM. The servers have 4 cores. I used default heap settings. The Eden space ended up around 60M and the Survivor spaces are around 7M. This feels a little bit low for a process that creates so much short-lived garbage. I just w

Re: Tracking down read latency

2011-02-04 Thread sridhar basam
On Fri, Feb 4, 2011 at 2:44 PM, David Dabbs wrote: > > Our data is on sdb, commit logs on sdc. > So do I read this correctly that we're 'await'ing 6+millis on average for > data drive (sdb) > requests to be serviced? > > That is right. Those numbers look pretty good for rotational media. What sor

Re: read latency in cassandra

2011-02-04 Thread aaron morton
What operation are you calling ? Are you trying to read the entire row back? How many SSTables do you have for the CF? Does your data have a lot of overwrites ? Have you modified the default compaction settings ? Do you have row cache enabled ? How long does the second request take ? Can you

Re: How to delete bulk data from cassandra 0.6.3

2011-02-04 Thread Ali Ahsan
So do we need to write a script? Or is it something I can do as a system admin without involving any developer? If yes, please guide me in this case. On 02/04/2011 10:36 PM, Jonathan Ellis wrote: In that case, you should shut down the server before removing data files. On Fri, Feb 4, 2011 at

RE: Tracking down read latency

2011-02-04 Thread David Dabbs
Thank you both for your advice. See my updated iostats below. >From: sridhar.ba...@gmail.com [mailto:sridhar.ba...@gmail.com] On Behalf Of sridhar basam >Sent: Thursday, February 03, 2011 10:58 AM >To: user@cassandra.apache.org >Subject: Re: Tracking down read latency > >The data provided is als

Re: How to monitor Cassandra's throughput?

2011-02-04 Thread Oleg Proudnikov
The issue has been resolved; the fix is on Hector's GitHub. Oleg Proudnikov writes: > > I have posted on Hector ML: > > http://thread.gmane.org/gmane.comp.db.hector.user/1690 > > Oleg > >

read latency in cassandra

2011-02-04 Thread Dan Kuebrich
Hi all, It often takes more than two seconds to load: - one row of ~450 events comprising ~600k - cluster size of 1 - client is pycassa 1.04 - timeout on recv - cold read (I believe) - load generally < 0.5 on a 4-core machine, 2 EC2 instance store drives for cassandra - cpu wait generally < 1% O

Re: Using a synchronized counter that keeps track of no of users on the application & using it to allot UserIds/ keys to the new users after sign up

2011-02-04 Thread Aklin_81
Thanks so much Ryan for the links; I'll definitely take them into consideration. Just another thought which came to my mind:- perhaps it may be beneficial to store(or duplicate) some of the data like the Login credentials & particularly userId to User's Name mapping, etc (which is very heavily rea

Re: CF Read and Write Latency Histograms

2011-02-04 Thread Oleg Proudnikov
David Dabbs writes: > > Is this 0.7? > Yes

RE: CF Read and Write Latency Histograms

2011-02-04 Thread David Dabbs
Is this 0.7? -Original Message- From: Oleg Proudnikov [mailto:ol...@cloudorange.com] Sent: Friday, February 04, 2011 11:42 AM To: user@cassandra.apache.org Subject: CF Read and Write Latency Histograms Hi All, I suspect that Write and Read Latency column headers need to be swapped. I am

Re: Moving data

2011-02-04 Thread buddhasystem
FWIW, I'm working on migrating a large amount of data out of Oracle into my test cluster. The data has been warehoused as CSV files on Amazon S3. Having that in place allows me to not put extra load on the production service when doing many repeated tests. I then parse the data using CSV Python mo

Re: Using Cassandra to store files

2011-02-04 Thread sridhar basam
For the number of files the OP has, why not just use a traditional filesystem and Solr to index the PDF data? You get to search inside of the files for relevant information. Sri On Fri, Feb 4, 2011 at 12:47 PM, buddhasystem wrote: > > Even when storage is in NFS, Cassandra can still be quite us

Re: Moving data

2011-02-04 Thread Jonathan Ellis
I'm afraid there is no short answer. The long answer is, 1) Read about Cassandra data modeling at http://wiki.apache.org/cassandra/ArticlesAndPresentations. It is not as simple as "one table equals one columnfamily." 2) Write a program to read your data out of SQL Server and write it into Cassan

Re: Using Cassandra to store files

2011-02-04 Thread Aditya Narayan
Yes, definitely a database for mapping, of course! On Fri, Feb 4, 2011 at 11:17 PM, buddhasystem wrote: > > Even when storage is in NFS, Cassandra can still be quite useful as a file > catalog. Your physical storage can change, move etc. Therefore, it's a good > idea to provide mapping of logical n

Re: Using Cassandra to store files

2011-02-04 Thread buddhasystem
Even when storage is in NFS, Cassandra can still be quite useful as a file catalog. Your physical storage can change, move etc. Therefore, it's a good idea to provide mapping of logical names to physical store points (which in fact can be many). This is a standard technique used in mass storage.
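A rough sketch of the catalog idea, using an in-memory dict in place of a column family. The class, method names, and paths are all hypothetical; in practice the mapping would live in Cassandra with the logical name as the row key.

```python
from collections import defaultdict


class FileCatalog:
    """Toy catalog: one logical name maps to one or more physical locations."""

    def __init__(self):
        self._locations = defaultdict(list)

    def register(self, logical_name, physical_location):
        """Record a physical location for a logical name (duplicates ignored)."""
        if physical_location not in self._locations[logical_name]:
            self._locations[logical_name].append(physical_location)

    def relocate(self, logical_name, old_location, new_location):
        """Replace one physical location after the underlying storage moves."""
        locations = self._locations[logical_name]
        locations[locations.index(old_location)] = new_location

    def resolve(self, logical_name):
        """Return a copy of all known physical locations for a logical name."""
        return list(self._locations.get(logical_name, []))


catalog = FileCatalog()
catalog.register("contracts/2011/0001.pdf", "nfs://store-a/vol1/0001.pdf")
catalog.register("contracts/2011/0001.pdf", "nfs://store-b/vol7/0001.pdf")

# The physical store moves, but callers keep using the same logical name.
catalog.relocate("contracts/2011/0001.pdf",
                 "nfs://store-a/vol1/0001.pdf",
                 "nfs://store-c/vol2/0001.pdf")
```

Clients only ever ask for the logical name, so storage can be moved or replicated without touching them.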

Re: CF Read and Write Latency Histograms

2011-02-04 Thread Jonathan Ellis
Can you create a ticket? On Fri, Feb 4, 2011 at 9:41 AM, Oleg Proudnikov wrote: > Hi All, > > I suspect that Write and Read Latency column headers need to be swapped. I am > running a bulk load with no reads on this CF but I see Read column with values > while the Write column has zeros only. The

Re: Using Cassandra to store files

2011-02-04 Thread Aditya Narayan
I am also looking at possible solutions to store PDFs & Word documents. But why wouldn't you store them in the filesystem instead of a database, unless your files are very small, in which case it would be recommended to use a database? -Aditya On Fri, Feb 4, 2011 at 5:30 PM, Daniel Doubleday wrote

CF Read and Write Latency Histograms

2011-02-04 Thread Oleg Proudnikov
Hi All, I suspect that Write and Read Latency column headers need to be swapped. I am running a bulk load with no reads on this CF but I see Read column with values while the Write column has zeros only. The MBean shows the values correctly. Thank you, Oleg

Re: How to delete bulk data from cassandra 0.6.3

2011-02-04 Thread Jonathan Ellis
In that case, you should shut down the server before removing data files. On Fri, Feb 4, 2011 at 9:01 AM, wrote: > I thought truncate() was not available before 0.7 (in 0.6.3)was it? > > --- > Sent from BlackBerry > > -Original Message-

Re: How to delete bulk data from cassandra 0.6.3

2011-02-04 Thread roshandawrani
I thought truncate() was not available before 0.7 (in 0.6.3), was it? --- Sent from BlackBerry -Original Message- From: Jonathan Ellis Date: Fri, 4 Feb 2011 08:58:35 To: user Reply-To: user@cassandra.apache.org Subject: Re: How to delete

Re: get_range_slices and tombstones

2011-02-04 Thread Jonathan Ellis
You can't create a row with no columns without tombstones being involved somehow. :) There's no distinction between "a row with no columns because the individual columns were removed," and "a row with no columns because the row was removed." The latter is just a more efficient expression of the f
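Since a row with no columns and a removed row look the same, a client can simply skip any key that comes back empty. A minimal sketch, assuming the range scan result has already been turned into (key, columns) pairs; the data and function name are made up and this is not a real client API.

```python
def live_rows(range_slice_result):
    """Keep only the rows that came back with at least one column."""
    return [(key, columns) for key, columns in range_slice_result if columns]


# Hypothetical scan result: "user2" was deleted, so it comes back empty.
scan = [
    ("user1", {"name": "a"}),
    ("user2", {}),
    ("user3", {"name": "c"}),
]

remaining = live_rows(scan)
remaining_keys = [key for key, _ in remaining]
```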

Re: How to delete bulk data from cassandra 0.6.3

2011-02-04 Thread Jonathan Ellis
You should use truncate instead. (Then remove the snapshot truncate creates.) On Fri, Feb 4, 2011 at 2:05 AM, Ali Ahsan wrote: > Hi All > > Is there any way i can delete column families data (not removing column > families ) from Cassandra without effecting ring integrity.What if  i delete > some

Re: Move

2011-02-04 Thread Jonathan Ellis
Looks like https://issues.apache.org/jira/browse/CASSANDRA-1992, fixed for 0.7.1. On Fri, Feb 4, 2011 at 12:18 AM, Stu King wrote: > I am running a move on one node in a 5 node cluster. There are no writes to > the cluster during the move. > I am seeing an exception on one of the nodes (not the n

Re: performance degradation in cluster

2011-02-04 Thread Jonathan Ellis
Some potential problems: 1) sounds like you are using OPP/BOP and not adjusting tokens to balance the data on each node 2) 8 client threads is not enough to saturate 16 cassandra cores 3) if your commitlog is not on a separate device from your data directories you will have a lot of contention bet

Re: Using a synchronized counter that keeps track of no of users on the application & using it to allot UserIds/ keys to the new users after sign up

2011-02-04 Thread Ryan King
On Thu, Feb 3, 2011 at 9:12 PM, Aklin_81 wrote: > Thanks Matthew & Ryan, > > The main inspiration behind me trying to generate Ids in sequential > manner is to reduce the size of the userId, since I am using it for > heavy denormalization. UUIDs are 16 bytes long, but I can also have a > unique Id

Re: Unavalible Exception

2011-02-04 Thread Oleg Proudnikov
ruslan usifov writes: > > > 2011/2/4 Oleg Proudnikov > ruslan usifov writes: > > > > Hello Why i can get Unavalible Exception on live cluster (all nodes is up and never shutdown) PS: v 0.7.0 > Can the nodes see each other? Check Cassandra logs for messages

Re: Unavalible Exception

2011-02-04 Thread ruslan usifov
2011/2/4 Oleg Proudnikov > ruslan usifov writes: > > > > > Hello Why i can get Unavalible Exception on live cluster (all nodes is up > and > never shutdown) PS: v 0.7.0 > > > Can the nodes see each other? Check Cassandra logs for messages regarding > other > nodes. > > Yes they can, nod

Re: Unavalible Exception

2011-02-04 Thread Oleg Proudnikov
ruslan usifov writes: > > Hello Why i can get Unavalible Exception on live cluster (all nodes is up and never shutdown) PS: v 0.7.0 Can the nodes see each other? Check Cassandra logs for messages regarding other nodes. Oleg

Re: Column Sorting of integer names

2011-02-04 Thread Jonathan Ellis
create a ReversedIntegerType. On Fri, Feb 4, 2011 at 5:15 AM, Aditya Narayan wrote: > Is there any way to sort the columns named as integers in the descending > order ? > > > Regards > -Aditya > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professi
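The idea behind a reversed comparator can be sketched in a few lines. This is only an illustration of the ordering logic in Python, with made-up function names; an actual ReversedIntegerType would be a custom comparator class written in Java on the Cassandra side.

```python
from functools import cmp_to_key


def compare_int(a, b):
    """Ascending integer comparison: negative, zero, or positive."""
    return (a > b) - (a < b)


def compare_int_reversed(a, b):
    """Reversed comparison: swap the arguments of the ascending one."""
    return compare_int(b, a)


column_names = [3, 10, 1, 7]

# Columns ordered by the reversed comparator come out in descending order.
descending = sorted(column_names, key=cmp_to_key(compare_int_reversed))
```

Swapping the arguments is all the reversal needs; the comparison itself stays the same.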

Moving data

2011-02-04 Thread Morey, Gary
I have several large SQL Server 2005 tables. I need to load the data in these tables into Cassandra. FYI, the Cassandra installation is on a linux server running CentOS. Can anyone suggest the best way to accomplish this? I am a newbie to Cassandra, so any advice would be greatly appreciated

Column Sorting of integer names

2011-02-04 Thread Aditya Narayan
Is there any way to sort the columns named as integers in the descending order ? Regards -Aditya

Re: Using Cassandra to store files

2011-02-04 Thread Daniel Doubleday
We are doing this with cassandra. But we cache a lot. We get around 20 writes/s and 1k reads/s (~ 100Mbit/s) for that particular CF but only 1% of them hit our cassandra cluster (5 nodes, rf=3). /Daniel On Feb 4, 2011, at 9:37 AM, Brendan Poole wrote: > Hi Daniel > > When you say "We are do
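A quick back-of-the-envelope check on those numbers, assuming the 1% refers to reads, that Mbit means 10^6 bits, and that the ~100Mbit/s is spread over the ~1k reads/s. These are assumptions about the figures, not measurements.

```python
reads_per_second = 1000                    # "1k reads/s"
cluster_hit_percent = 1                    # "only 1% of them hit our cassandra cluster"
throughput_bits_per_second = 100_000_000   # "~ 100Mbit/s", taking Mbit as 10**6 bits

# Reads that get past the cache and reach the Cassandra cluster.
cluster_reads_per_second = reads_per_second * cluster_hit_percent / 100

# Average payload per read implied by the overall throughput.
avg_bytes_per_read = throughput_bits_per_second / 8 / reads_per_second
```

Under these assumptions the cluster itself serves roughly 10 reads/s, and an average read is about 12.5 kB.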

RE: CQL

2011-02-04 Thread Vivek Mishra
Thanks Eric. I am able to make it running. -Original Message- From: Eric Evans [mailto:eev...@rackspace.com] Sent: Wednesday, February 02, 2011 9:34 PM To: user@cassandra.apache.org Subject: Re: CQL On Wed, 2011-02-02 at 06:57 +, Vivek Mishra wrote: > I am trying to run CQL from a jav

get_range_slices and tombstones

2011-02-04 Thread Patrik Modesto
Hi! I'm getting tombstones from get_range_slices(). I know that's normal. But is there a way to know that a key is a tombstone? I know a tombstone has no columns, but I can create a row without any columns that would look like a tombstone in get_range_slices(). Regards, Patrik

How to delete bulk data from cassandra 0.6.3

2011-02-04 Thread Ali Ahsan
Hi All Is there any way I can delete column families' data (without removing the column families) from Cassandra without affecting ring integrity? What if I delete some column families' data in Linux with the rm command? -- S.Ali Ahsan Senior System Engineer e-Business (Pvt) Ltd 49-C Jail Road, Lahor

Re: Do supercolumns have a purpose?

2011-02-04 Thread Sylvain Lebresne
On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone wrote: > On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne wrote: > >> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote: >> >>> The advantage would be to enable secondary indexes on supercolumn >>> families. >>> >> >> Then I suggest opening a ticke

Recall: Using Cassandra to store files

2011-02-04 Thread Brendan Poole
Brendan Poole would like to recall the message, "Using Cassandra to store files". Brendan Poole Systems Developer NewLaw Solicitors Helmont House Churchill Way Cardiff brendan.po...@new-law.co.uk 029 2078 4283 www.new-law.co.uk Please consider the e

RE: Using Cassandra to store files

2011-02-04 Thread Brendan Poole
The first line on the couchDB website doesn't fill me with confidence... "The 1.0.0 release has a critical bug which can lead to data loss in the default configuration" Brendan Poole Systems Developer NewLaw Solicitors Helmont House Churchill Way Cardiff bre

RE: Using Cassandra to store files

2011-02-04 Thread Brendan Poole
Hi Daniel When you say "We are doing this" do you mean via NFS or Cassandra? Thanks Brendan Brendan Poole Systems Developer NewLaw Solicitors Helmont House Churchill Way Cardiff brendan.po...@new-law.co.uk 029 2078 4283 www.new-law.co.

Re: cassandra 0.6.11 binary package problem

2011-02-04 Thread Stephen Connolly
That's because of an issue I found in the ANT scripts while doing the maven-ant-tasks switch on 0.7.0. Any jar in build will be bundled... (so ivy goes into the bin dist... when I did the m-a-t version eric was wondering why i was including m-a-t in the bin dist, and I said I was being symmetric w

Re: for counters: does read have to be ALL ?

2011-02-04 Thread Sylvain Lebresne
On Thu, Feb 3, 2011 at 10:39 PM, Yang wrote: > the pdf at the design doc > > https://issues.apache.org/jira/secure/attachment/12459754/Partitionedcountersdesigndoc.pdf > > does say so: > page 2 "- strongly consistent read: requires consistency level ALL. > (QUORUM is insufficient.) > " > > but th

Move

2011-02-04 Thread Stu King
I am running a move on one node in a 5 node cluster. There are no writes to the cluster during the move. I am seeing an exception on one of the nodes (not the node which I am doing the move on). The exception stack is ERROR [CompactionExecutor:1] 2011-02-04 08:10:46,855 PrecompactedRow.java (lin