Re: Client connection and data distribution across nodes

2010-06-16 Thread Ran Tavory
On Thu, Jun 17, 2010 at 8:52 AM, Mubarak Seyed wrote: > Hi All, > > Regarding client thrift connection, I have 4 nodes which formed a ring, but the > client only knows the IP address of one node (and thrift RPC port > number). > How can the client connect to any other node without getting rin

Client connection and data distribution across nodes

2010-06-16 Thread Mubarak Seyed
Hi All, Regarding client thrift connection, I have 4 nodes which formed a ring, but the client only knows the IP address of one node (and thrift RPC port number). How can the client connect to any other node without getting ring information? Can we keep the load balancer and bind all the fo

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-16 Thread Jonathan Ellis
That is consistent with the https://issues.apache.org/jira/browse/CASSANDRA-1169 bug I mentioned, which is fixed in the 0.6 svn branch. On Wed, Jun 16, 2010 at 10:51 PM, Julie wrote: > Jonathan Ellis writes: > >> > Another thing that is odd is that even when the server nodes are quiesc

Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-16 Thread Julie
Jonathan Ellis writes: > > Another thing that is odd is that even when the server nodes are quiescent > > because compacting is complete, I am still seeing cpu usage stay at > > about 40%. Even after several hours, no reading or writing to the database > > and all compactions comple

RE: Some questions about using Cassandra

2010-06-16 Thread Anthony Ikeda
Thanks Gary, I'm looking at that plug-in feature at the moment but there seems to be very little documentation on how to use it. From the application point of view, though, we are still trying to model the process flows based on our technological approach. Mainly due to the fact that we need rea

Chiton not showing any keyspaces

2010-06-16 Thread John Schneider
Hi, I've got Chiton coming up on a Mac; however, when I connect to my Cassandra instance, it never comes back from "Fetching keyspaces..." Log from Chiton: chiton> bin/chiton-client /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/twisted/internet/_sslverif

Re: Strange Read Performance 1xN column slice or N column slice

2010-06-16 Thread Arya Goudarzi
Hey all, As Jonathan pointed out in CASSANDRA-1199, this issue seems to be related to https://issues.apache.org/jira/browse/THRIFT-788. If you experience slowness with multiget_slice, take a look at that bug. -Arya - Original Message - From: "Arya Goudarzi" To: user@cassandra.apache.o

Re: Cassandra uses more memory than Xmx

2010-06-16 Thread Jonathan Ellis
you're just seeing address space used by mmap, not actually allocated-by-the-jvm memory. On Wed, Jun 16, 2010 at 7:02 AM, György Dózsa wrote: > Hi, > I have a Cassandra cluster and I set java Xmx to 2G in cassandra.in.sh in > every node, but the memory usage of cassandra daemon is 3 or 4 times of

RE: Cassandra / Hadoop

2010-06-16 Thread Stu Hood
Hey Dave, This won't work out of the box, but it should be relatively easy to fix by implementing a TextColumnFamilyInputFormat that wraps ColumnFamilyInputFormat and converts the data structures it outputs to JSON/TSV/CSV. If you have time to work on this, there is an open ticket: https://issues.a

Re: Some questions about using Cassandra

2010-06-16 Thread Gary Dusbabek
On Tue, Jun 15, 2010 at 19:49, Anthony Ikeda wrote: > Is there any concept of Listeners such that when data is added to Cassandra > we can fire off another process to do something with that data? E.g. create > a copy in a secondary database for Business Intelligence reports? Send the > data to an

Cassandra / Hadoop

2010-06-16 Thread Dave Gardner
Hi all, Is it possible to use the Cassandra ColumnFamilyInputFormat in combination with the Hadoop "streaming" job? Within the Hadoop docs it says that you can specify other plugins, eg: -inputformat JavaClassName http://hadoop.apache.org/common/docs/r0.15.2/streaming.html#Specifying+Other+Plug

Replicating nodes through firewall

2010-06-16 Thread jeff
I am trying to set up replication among three Cassandra nodes, but it is not working. I believe this is because I am not setting the correct seed hostnames in storage-conf.xml. How can I set up node communication through the NAT on the firewall? Two nodes will be on my network with private IPs 192.168.15.86 and 192.168.15

Re: Cassandra timeouts under low load

2010-06-16 Thread Jonathan Ellis
10s is just not a long timeout for Hadoop map tasks. You should increase it. You may also want to experiment with running fewer simultaneous map tasks per tracker. On Tue, Jun 15, 2010 at 11:00 AM, Drew Dahlke wrote: > Hi, I'm running cassandra .6.2 on a dedicated 4 node cluster and I > also hav

Re: Update vs Delete/Insert

2010-06-16 Thread aaron morton
It may make sense to use a secondary index for the counts. You could store the counts in both places and use a batch mutation to update them. It does not give you a transaction guarantee, but it will mean you still make one request to Cassandra. e.g. { : : } The secondar

Cassandra uses more memory than Xmx

2010-06-16 Thread György Dózsa
Hi, I have a Cassandra cluster and I set Java Xmx to 2G in cassandra.in.sh on every node, but the memory usage of the Cassandra daemon is 3 or 4 times that (so 6-8 GB). Do you have any idea how to decrease the memory usage of Cassandra? There are 4 nodes in the cluster and the system has approx 1000

Re: Too many ParNew's

2010-06-16 Thread Peter Schuller
> I set the swappiness to 0 but the problem remained.  The only way I've > managed to avoid it is to use standard disk mode. Ok. But so can I take it you have confirmed that the machine *is* in fact swapping during these ParNew:s? I guess so since you mention it helps to move to standard disk mode

RE: Update vs Delete/Insert

2010-06-16 Thread Dr . Martin Grabmüller
Hi Colin, > From: Colin Vipurs [mailto:zodiac...@gmail.com] [...] > I've got some data that I'm doing counts on, stored in a CF as: > > { > : > : > > } [...] > { > : PLACEHOLDER > : PLACEHOLDER > } > > would be a better way of storing the data? Does anyone know t

Re: Too many ParNew's

2010-06-16 Thread Colin Vipurs
Hi Pete, I set the swappiness to 0 but the problem remained. The only way I've managed to avoid it is to use standard disk mode. On Sat, Jun 5, 2010 at 9:14 PM, Peter Schuller wrote: >>  INFO 17:54:18,567 GC for ParNew: 1522 ms, 69437384 reclaimed leaving >> 979692384 >> used; max is 442466304

Update vs Delete/Insert

2010-06-16 Thread Colin Vipurs
I've got some data that I'm doing counts on, stored in a CF as: { : : } With updates happening as an insert on the specific column. I need to extract the top X values by count and I was wondering if storing this as: { : PLACEHOLDER : PLACEHOLDER } would be a bett

Re: Some questions about using Cassandra

2010-06-16 Thread Oleg Anastasjev
Anthony Ikeda writes: > One factor I need to consider is our Business > Intelligence platform that will need to use the data stored for reporting > purposes. > > We are looking at using Cassandra for our real-time layer for > Active-Active data centre use and perhaps have