Re: Querying all keys in a column family

2012-02-26 Thread aaron morton
When you say "query 1 million records", in my mind I'm saying "dump 1 million records to another system as a back-office job". Hadoop will split the job over multiple nodes and will assign a task to read the range "owned" by each node. From memory it uses CL ONE (by default) for the read so the n
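
To make the "one task per owned range" idea above concrete, here is a minimal sketch in plain Python. It does not use Hadoop or a Cassandra client. It assumes a RandomPartitioner-style ring with tokens in [0, 2**127), and the names `owned_ranges`, `plan_tasks` and the node labels are hypothetical, chosen only for illustration. The real Hadoop integration may subdivide these ranges further.

```python
from typing import Dict, List, Tuple

# Assumption: RandomPartitioner-style ring, tokens in [0, 2**127).
RING_SIZE = 2 ** 127


def owned_ranges(node_tokens: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (node, start_exclusive, end_inclusive) for each node's primary range.

    A node with token T owns (previous_token, T]. The node with the lowest
    token wraps around and takes the previous token from the highest node.
    """
    ordered = sorted(node_tokens.items(), key=lambda item: item[1])
    ranges = []
    for i, (node, token) in enumerate(ordered):
        prev_token = ordered[i - 1][1]  # i == 0 wraps to the last node
        ranges.append((node, prev_token, token))
    return ranges


def plan_tasks(node_tokens: Dict[str, int]) -> List[dict]:
    """Build one read task per node, covering only that node's primary range."""
    return [
        # "ONE" mirrors the default consistency level recalled in the message above.
        {"node": node, "start_token": start, "end_token": end, "consistency": "ONE"}
        for node, start, end in owned_ranges(node_tokens)
    ]


if __name__ == "__main__":
    # Hypothetical three-node cluster with evenly spaced tokens.
    tokens = {"node-a": 0, "node-b": RING_SIZE // 3, "node-c": 2 * RING_SIZE // 3}
    for task in plan_tasks(tokens):
        print(task)
```

Each task covers only the range its node owns, so a task scheduled next to that node can read locally rather than pulling rows across the cluster.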

Re: Querying all keys in a column family

2012-02-25 Thread Martin Arrowsmith
Hi Alexandru, Things got hectic and I put off the project until this weekend. I'm actually learning about Hadoop right now and how to implement it. I can respond to this thread when I have something running. In the meantime, I'd like to bump this email up and see if there are others who can provi

Re: Querying all keys in a column family

2012-02-24 Thread Alexandru Sicoe
Hi Aaron and Martin, Sorry about my previous reply, I thought you wanted to process only the row keys in the CF. I have an issue similar to Martin's because I see myself being forced to hit more than a million rows with a query (I only get a few columns from every row). Aaron, we've talked about thi

Re: Querying all keys in a column family

2012-02-14 Thread aaron morton
If you want to process 1 million rows, use Hadoop with Hive or Pig. If you use Hadoop, you are not doing things in real time. You may need to rephrase the problem. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/02/2012, at 11:00 AM, Ma

Re: Querying all keys in a column family

2012-02-14 Thread Alexandru Sicoe
Hey Martin, Have you tried the CQL query "SELECT FIRST 0 * FROM cfName"? Cheers, Alex On Mon, Feb 13, 2012 at 11:00 PM, Martin Arrowsmith < arrowsmith.mar...@gmail.com> wrote: > Hi Experts, > > My program is such that it queries all keys on Cassandra. I want to do > this as quickly as possible, in o

Querying all keys in a column family

2012-02-13 Thread Martin Arrowsmith
Hi Experts, My program is such that it queries all keys on Cassandra. I want to do this as quickly as possible, in order to get as close to real-time as possible. One solution I heard was to use the sstable2json tool, and read the data in as JSON. I understand that reading from each line in Cassan
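
For readers who want to walk every row key from a client instead, one common pattern is to page through the key range: fetch a page, then start the next page at the last key seen and skip that duplicate. The sketch below shows only the paging logic. `fetch_page` is a hypothetical callable standing in for whatever range query your client library offers, and `make_fake_store` is an in-memory stand-in so the example runs without a cluster. It assumes the store returns keys in a stable order and treats the start key as inclusive. With a random partitioner that order follows the tokens rather than the key text.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical signature: (start_key, limit) -> up to `limit` row keys,
# beginning at start_key inclusive ("" means "from the beginning").
FetchPage = Callable[[str, int], List[str]]


def iter_all_keys(fetch_page: FetchPage, page_size: int = 1000) -> Iterator[str]:
    """Yield every row key exactly once by paging through the key range."""
    if page_size < 2:
        # With a page size of 1 the overlap key would be the whole page.
        raise ValueError("page_size must be at least 2")
    start = ""
    first_page = True
    while True:
        page = fetch_page(start, page_size)
        # Every page after the first begins with the key we already yielded.
        keys = page if first_page else page[1:]
        for key in keys:
            yield key
        if len(page) < page_size:
            return  # a short page means the range is exhausted
        start = page[-1]
        first_page = False


def make_fake_store(all_keys: Iterable[str]) -> FetchPage:
    """In-memory stand-in for a cluster; sorted order is an illustration only."""
    ordered = sorted(all_keys)

    def fetch_page(start: str, limit: int) -> List[str]:
        return [k for k in ordered if k >= start][:limit]

    return fetch_page


if __name__ == "__main__":
    store = make_fake_store(f"row{i:03d}" for i in range(25))
    print(sum(1 for _ in iter_all_keys(store, page_size=10)))
```

This still reads every key through a single client, so for millions of rows it is a batch job, not something close to real time.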