Re: column family names

2010-08-30 Thread Terje Marthinussen
Another option would of course be to store a mapping between dir/filenames and keyspaces/column families together with other info related to keyspaces and column families. Just add API/command line tools to look up the filenames and maybe store the values in the files as well for recovery purposes.

Re: column family names

2010-08-30 Thread Janne Jalkanen
I've been doing it for years with no technical problems. However, using "%" as the escape char tends to, in some cases, confuse a certain operating system whose name may or may not begin with "W", so using something else makes sense. It does, however, require an extra cognitive step for th
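The escaping idea discussed in this thread (URL-like escaping of file names, with an escape character other than "%") can be sketched as follows. This is only an illustration under stated assumptions: the helper names and the choice of "=" as the escape character are hypothetical and do not come from Cassandra or from the thread.

```python
from urllib.parse import quote, unquote

# Hypothetical choice: "=" replaces "%" as the escape character.
# quote(..., safe="") never emits a literal "=", so the swap is reversible.
ESCAPE_CHAR = "="


def escape_name(name: str) -> str:
    """Percent-encode everything except unreserved characters, then swap '%'."""
    return quote(name, safe="").replace("%", ESCAPE_CHAR)


def unescape_name(escaped: str) -> str:
    """Reverse escape_name: restore '%' and percent-decode."""
    return unquote(escaped.replace(ESCAPE_CHAR, "%"))
```

Because a literal "=" in the input is itself encoded (as "=3D" after the swap), the round trip stays unambiguous.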

Re: column family names

2010-08-30 Thread Terje Marthinussen
Beyond aesthetics, specific reasons? Terje On Tue, Aug 31, 2010 at 11:54 AM, Benjamin Black wrote: > URL encoding. > >

Re: column family names

2010-08-30 Thread Benjamin Black
URL encoding. On Mon, Aug 30, 2010 at 5:55 PM, Aaron Morton wrote: > under scores or URL encoding ? > Aaron > On 31 Aug, 2010,at 12:27 PM, Benjamin Black wrote: > > Please don't do this. > > On Mon, Aug 30, 2010 at 5:22 AM, Terje Marthinussen > wrote: >> Ah, sorry, I forgot that underscore was

Re: column family names

2010-08-30 Thread Aaron Morton
Underscores or URL encoding? Aaron On 31 Aug, 2010, at 12:27 PM, Benjamin Black wrote: Please don't do this. On Mon, Aug 30, 2010 at 5:22 AM, Terje Marthinussen wrote: > Ah, sorry, I forgot that underscore was part of \w. > That will do the trick for now. > > I do not see the big issue with file n

Re: column family names

2010-08-30 Thread Benjamin Black
Please don't do this. On Mon, Aug 30, 2010 at 5:22 AM, Terje Marthinussen wrote: > Ah, sorry, I forgot that underscore was part of \w. > That will do the trick for now. > > I do not see the big issue with file names though. Why not expand the > allowed characters a bit and escape the file names?

Re: cassandra for a inbox search with high reading qps

2010-08-30 Thread Chen Xinli
What's the average size of a user? As far as I know, Lucandra will first pull the data from Cassandra, then do the computation in the client. That's OK for small rows. But our rows are 1M on average, and some rows scale to 100M; at the same time, we expect high read QPS. Pulling this data to client machin

Re: cassandra for a inbox search with high reading qps

2010-08-30 Thread Todd Nine
We use Lucandra as well for searching for users, as well as geo-encoding. It really works well except for numeric fields. https://issues.apache.org/jira/browse/CASSANDRA-1235 That bug may be a bit of an issue, but after they release 0.6.5 all the Lucene functionality will be available to you. T

Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-08-30 Thread Jonathan Ellis
On Mon, Aug 30, 2010 at 5:18 PM, Peter Schuller wrote: > Has anyone run Cassandra with G1 in production for prolonged periods > of time? Not AFAIK. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com

Re: Follow-up post on cassandra configuration with some experiments on GC tuning

2010-08-30 Thread Peter Schuller
> collection runs for the cases tested. In most cases, I prefer having low > pauses due to any garbage collection runs and don't care too much about the > shape of the memory usage, and I guess, that's the reason why the low pause > collector is used by default for running cassandra. For myself, I

Re: get_slice sometimes returns previous result on php

2010-08-30 Thread Juho Mäkinen
I'm not using connection pooling where the same TCP socket is used between different PHP requests. I open a new Thrift connection with a new socket to the node, use the node throughout the request, and close it afterwards. The get_slice requests are all happening in the same request, so something odd ha

Re: get_slice sometimes returns previous result on php

2010-08-30 Thread Benjamin Black
On Mon, Aug 30, 2010 at 6:05 AM, Juho Mäkinen wrote: > The application is using the > same cassandra thrift connection (it doesn't close it in between) and > everything is happening inside same php process. > This is why you are seeing this problem (and is specific to connection reuse in certain

Re: Client developer mailing list

2010-08-30 Thread Mike Peters
I'm in! We really need a better PHP Thrift

Re: Client developer mailing list

2010-08-30 Thread Ran Tavory
awesome, thanks, I'm subscribed :) On Mon, Aug 30, 2010 at 10:05 PM, Jeremy Hanna wrote: > There has been a new mailing list created for those who are working on > Cassandra clients above thrift and/or avro. You can subscribe by sending an > email to client-dev-subscr...@cassandra.apache.org or

Re: Job opening cassandra Barcelona, Spain

2010-08-30 Thread Dimitry Lvovsky
Thanks for the suggestion. On Aug 30, 2010, at 8:01 PM, Norman Maurer wrote: > I think you should try jobs at apache.org too ;) > > Bye, > Norman > > 2010/8/25 Dimitry Lvovsky : >> Hi All, >> Please forgive the job offer spam. >> >> We're looking to add a developer with experience using Cassa

Client developer mailing list

2010-08-30 Thread Jeremy Hanna
There has been a new mailing list created for those who are working on Cassandra clients above thrift and/or avro. You can subscribe by sending an email to client-dev-subscr...@cassandra.apache.org or using the link at the bottom of http://cassandra.apache.org The list is meant to give client

Re: Dumping

2010-08-30 Thread aaron morton
sstable2json discussed here http://wiki.apache.org/cassandra/Operations may be what you are after, or the snapshot feature. Not sure what you want to use the dump for. If you do not know the keys in the CF in advance take a look at get_range_slices (http://wiki.apache.org/cassandra/API) it all

Re: Job opening cassandra Barcelona, Spain

2010-08-30 Thread Norman Maurer
I think you should try jobs at apache.org too ;) Bye, Norman 2010/8/25 Dimitry Lvovsky : > Hi All, > Please forgive the job offer spam. > > We're looking to add a developer with  experience using Cassandra, to join > our team in Barcelona.  An ideal candidate  will have a strong CS background >

Re: Cassandra & HAProxy

2010-08-30 Thread Edward Capriolo
On Mon, Aug 30, 2010 at 1:02 PM, Dave Viner wrote: > Hi Edward, > By "down hard", I assume you mean that the machine is no longer responding > on the cassandra thrift port.  That makes sense (and in fact is what I'm > doing currently).  But, it seems like the real improvement is something that > w

Dumping

2010-08-30 Thread Mark
Is there an easy way to retrieve all values from a CF, similar to a dump? How about retrieving all columns for a particular key? In the second use case a simple iteration would work using a start and finish, but how would this be accomplished across all keys for a particular CF when you don'

Re: Cassandra & HAProxy

2010-08-30 Thread Dave Viner
Hi Edward, By "down hard", I assume you mean that the machine is no longer responding on the cassandra thrift port. That makes sense (and in fact is what I'm doing currently). But, it seems like the real improvement is something that would allow for a simple monitor that goes beyond the simple "

Re: NodeTool won't connect remotely

2010-08-30 Thread Allan Carroll
Thanks! That did it. Looks like the connection happens on 10036 and then the server negotiates a separate port for continued communication. Found this article once I knew what to look for. It also describes how to get more consistency on port numbers to allow for ssh tunneling and firewalls. F

Re: Cassandra & HAProxy

2010-08-30 Thread Edward Capriolo
On Mon, Aug 30, 2010 at 12:40 PM, Dave Viner wrote: > FWIW - we've been using HAProxy in front of a cassandra cluster in > production and haven't run into any problems yet.  It sounds like our > cluster is tiny in comparison to Anthony M's cluster.  But I just wanted to > mentioned that others out

Re: cassandra disk usage

2010-08-30 Thread Terje Marthinussen
On Mon, Aug 30, 2010 at 10:10 PM, Jonathan Ellis wrote: > column names are stored per cell > > (moving to user@) > I think that is already accounted for in my numbers? What I listed was measured from the actual SSTable file (using the output from "strings ), so multiples of the supercolumn

Re: Cassandra & HAProxy

2010-08-30 Thread Dave Viner
FWIW - we've been using HAProxy in front of a Cassandra cluster in production and haven't run into any problems yet. It sounds like our cluster is tiny in comparison to Anthony M's cluster. But I just wanted to mention that others out there are doing the same. One thing in this thread that I t

Re: need help on cassandra client

2010-08-30 Thread Dave Viner
Hi Cassam, I'm using the Perl Thrift API against 0.6.4 currently. It sounds like you are having trouble getting Thrift installed and/or getting the Perl bindings built. If you already have Thrift installed, you can just run: % thrift --gen perl interface/cassandra.thrift The libraries are now

Re: NodeTool won't connect remotely

2010-08-30 Thread Juho Mäkinen
I think that JMX needs additional ports to function correctly. Try to disable all firewalls between the client and the server so that client can connect to any port in the server and try again. - Juho Mäkinen On Mon, Aug 30, 2010 at 7:07 PM, Allan Carroll wrote: > Hi, > > I'm trying to manage m

NodeTool won't connect remotely

2010-08-30 Thread Allan Carroll
Hi, I'm trying to manage my cassandra cluster from a remote box and having issues getting nodetool to connect. All the machines I'm using are running on AWS. Here's what happens when I try: /opt/apache-cassandra-0.6.4/bin/nodetool -h xxx.xxx.xxx.143 -p 10036 ring Error connecting to remote JMX

Re: Thrift + PHP: help!

2010-08-30 Thread Juho Mäkinen
On Mon, Aug 30, 2010 at 4:24 PM, Mike Peters wrote: > Have you considered instead of retrying the failing node, to iterate through > other nodes in your cluster? Yes, the $this->connect() does just that: it removes the previous node from the node list and gives the list back to thrift connection

Re: Thrift + PHP: help!

2010-08-30 Thread Mike Peters
Interesting! Thanks for sharing. Have you considered, instead of retrying the failing node, iterating through other nodes in your cluster? If one node is failing (let's assume it's overloaded for a minute), you're probably going to be better off having the client send the insert to the next

Re: Thrift + PHP: help!

2010-08-30 Thread Juho Mäkinen
Yes, I'm already planning to do so. The class still has some dependencies on our other functions, which I first need to clear out. Basically, each API call is wrapped inside a retry loop, as we can assume that each operation can be retried as many times as needed: $tries = 0;
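The original PHP snippet is cut off in this archive, so the following is not Juho's code. It is a minimal Python sketch of the same idea, wrapping an operation in a retry loop. The function name, the `max_tries` cap, and the `delay_seconds` parameter are assumptions added for illustration; the message itself only says operations can be retried as many times as needed.

```python
import time


def call_with_retry(operation, max_tries=5, delay_seconds=0.0):
    """Call operation() until it succeeds or max_tries attempts have failed.

    Assumes the operation is safe to repeat (idempotent), as the message
    above states for its API calls. The cap on attempts is an assumption.
    """
    tries = 0
    while True:
        tries += 1
        try:
            return operation()
        except Exception:
            # Give up and re-raise once the attempt budget is exhausted.
            if tries >= max_tries:
                raise
            time.sleep(delay_seconds)
```

A caller would pass the API call as a closure, for example `call_with_retry(lambda: client_call(args))`, where `client_call` stands in for whatever client method is being wrapped.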

Re: cassandra disk usage

2010-08-30 Thread Jonathan Ellis
column names are stored per cell (moving to user@) On Mon, Aug 30, 2010 at 6:58 AM, Terje Marthinussen wrote: > Hi, > > Was just looking at a SSTable file after loading a dataset. The data load > has no updates of data  but: > - Columns can in some rare cases be added to existing super columns >

Re: Calls block when using Thrift API

2010-08-30 Thread Gary Dusbabek
If you're only interested in accessing data natively, I suggest you try the "fat client." It brings up a node that participates in gossip, exposes the StorageProxy API, but does not receive a token and so does not have storage responsibilities. StorageService.instance.initClient(); in 0.7 you wi

get_slice sometimes returns previous result on php

2010-08-30 Thread Juho Mäkinen
I've run into a strange bug where get_slice returns the result from the previous query. My application iterates over a set of columns inside a supercolumn, and for some reason the results sometimes (quite rarely, but often enough that it shows up) get "shifted" around so that the application gets the

Re: RowMutationVerbHandler.java (line 78) Error in row mutation

2010-08-30 Thread Gary Dusbabek
Is it possible this was a new node with a manual token and autobootstrap turned off? If not, could you give more details about the node? Gary. On Fri, Aug 27, 2010 at 17:58, B. Todd Burruss wrote: > i got the latest code this morning.  i'm testing with 0.7 > > > ERROR [ROW-MUTATION-STAGE:388]

Re: TException: Error: TSocket: timed out reading 1024 bytes from 10.1.1.27:9160

2010-08-30 Thread Mike Peters
Hi guys, There are several patches you need to apply to Thrift to completely resolve all timeout errors. Here's a list of them along with a link to download a patched thrift library: http://www.softwareprojects.com/resources/programming/t-php-thrift-library-for-cassandra-1982.html http://www.

Re: cassandra for a inbox search with high reading qps

2010-08-30 Thread Mike Peters
Chen, Have you considered using Lucandra (http://www.slideshare.net/otisg/lucandra) for inbox search? We have a similar setup and are currently looking into using Lucandra over implementing the searching ourselves with pure Cassandra. -- View this message in context: http://cassandra-user-in

Re: Thrift + PHP: help!

2010-08-30 Thread Mike Peters
Juho, do you mind sharing your implementation with the group? We'd love to help as well with rewriting the Thrift interface, specifically TSocket.php, which seems to be where the majority of the problems are lurking. Has anyone tried compiling native Thrift support as described here https://wik

Re: column family names

2010-08-30 Thread Terje Marthinussen
Ah, sorry, I forgot that underscore was part of \w. That will do the trick for now. I do not see the big issue with file names though. Why not expand the allowed characters a bit and escape the file names? Maybe some sort of URL like escaping. Terje On Mon, Aug 30, 2010 at 6:29 PM, Aaron Morton

Re: column family names

2010-08-30 Thread Aaron Morton
Moving to the user list. The new restrictions were added as part of CASSANDRA-1377 for 0.6.5 and 0.7; AFAIK it's to ensure the file names created for the CFs can be correctly parsed. So it's probably not going to change. The names have to match the \w regex class, which includes the underscor
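As a rough illustration of the \w restriction mentioned above, a client could check a column family name before trying to create it. This is a sketch only: the function name is hypothetical, and the exact rule enforced by CASSANDRA-1377 is assumed here to be "one or more \w characters". ASCII mode is used because Java's \w covers only [A-Za-z0-9_] by default, whereas Python's would otherwise also accept non-ASCII letters.

```python
import re

# Assumption: a valid name is one or more \w characters, ASCII only.
_NAME_RE = re.compile(r"\w+", re.ASCII)


def is_valid_cf_name(name: str) -> bool:
    """Return True if the whole name consists of letters, digits, or underscores."""
    return _NAME_RE.fullmatch(name) is not None
```

Under this assumption, a name such as `user_data` passes, while `user-data` or `user data` would be rejected.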