Re: using hadoop + cassandra for CF mutations (delete)

2014-04-08 Thread William Oberman
I use PHP, and phpCassa to talk to cassandra from within my app. I'm using the below script's structure as a way to run a local mutation on each of my nodes: === describe_ring($keyspace); $startToken = null; $endToken = null; foreach ($ring as $ringDetails) { //There is an

Re: Fwd: using hadoop + cassandra for CF mutations (delete)

2014-04-07 Thread Suraj Nayak
e" (column family in cassandra) has something like a billion >>> rows >>> in it, and I want to say ~3TB of data. >>> -No matter what I tried(*), Pig/Hadoop decided this was worthy of 20 >>> tasks >>> >>> (*) I changed settings in the loadFun

Re: using hadoop + cassandra for CF mutations (delete)

2014-04-04 Thread William Oberman
Looking at the code, cassandra.input.split.size==Pig URL split_size, right? But, in cassandra 1.2.15 I'm wondering if there is a bug that would make the hadoop conf setting cassandra.input.split.size not be used unless you manually set the URI to splitSize=0 (because the abstract class defaults th

Re: using hadoop + cassandra for CF mutations (delete)

2014-04-04 Thread Paulo Ricardo Motta Gomes
You said you have tried the Pig URL split_size, but have you actually tried decreasing the value of the cassandra.input.split.size hadoop property? The default is 65536, so you may want to decrease that to see if the number of mappers increases. But at some point, even if you lower that value it will st
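A rough illustration of the arithmetic behind this advice, as a minimal sketch: it assumes the input format creates about one split (and so one map task) per cassandra.input.split.size estimated rows. That one-split-per-N-rows model and the function name estimated_splits are assumptions made for illustration, not taken from the Cassandra source. The row count of one billion is the figure quoted elsewhere in this thread.

```python
DEFAULT_SPLIT_SIZE = 65536  # default value quoted in the message above


def estimated_splits(total_rows: int, split_size: int = DEFAULT_SPLIT_SIZE) -> int:
    """Estimate the number of input splits for a column family.

    Assumption (illustrative only): one split per `split_size` rows,
    rounded up. Real split counts depend on per-node row estimates.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (total_rows + split_size - 1) // split_size


if __name__ == "__main__":
    rows = 1_000_000_000  # "something like a billion rows"
    for size in (65536, 16384, 4096):
        print(f"split size {size}: ~{estimated_splits(rows, size)} splits")
```

Under this model the default split size would suggest thousands of splits for a billion rows, so a job that gets only 20 tasks is more likely ignoring the setting than running into its default.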

using hadoop + cassandra for CF mutations (delete)

2014-04-04 Thread William Oberman
Hi, I have some history with cassandra + hadoop: 1.) Single DC + integrated hadoop = Was "ok" until I needed steady performance (the single DC was used in a production environment) 2.) Two DC's + integrated hadoop on 1 of 2 DCs = Was "ok" until my data grew and in AWS compute is expensive compared

Re: Data modelling for range retrieval. Was: Re: Hadoop/Cassandra for data transformation (rather than analysis)?

2013-08-14 Thread Aaron Morton
> Is it good practice then to find an attribute in my data that would allow me > to form wide row row keys with approx. 1000 values each? You can do that using get_range_slices() via Thrift. And via CQL 3 you use the token() function and LIMIT with a select statement. Check the DS docs for more in
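The token() plus LIMIT approach mentioned above can be sketched as a pair of CQL 3 statements: one for the first page, and one that resumes after the last partition key seen. The helper below only builds the query text. The table name products and key column product_id are hypothetical, and the ? placeholder is assumed to be bound to the last key of the previous page by whichever driver is in use; no driver calls are shown.

```python
def token_page_query(table: str, key_column: str, page_size: int,
                     resume: bool) -> str:
    """Build a CQL 3 SELECT that pages over partition keys in token order.

    resume=False -> first page (no WHERE clause).
    resume=True  -> next page, starting after the key bound to the `?`.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    where = f" WHERE token({key_column}) > token(?)" if resume else ""
    return f"SELECT * FROM {table}{where} LIMIT {page_size}"


if __name__ == "__main__":
    # Hypothetical table and key column, for illustration only.
    print(token_page_query("products", "product_id", 1000, resume=False))
    print(token_page_query("products", "product_id", 1000, resume=True))
```

Paging stops when a page comes back with fewer rows than the requested limit.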

Data modelling for range retrieval. Was: Re: Hadoop/Cassandra for data transformation (rather than analysis)?

2013-08-12 Thread Jan Algermissen
Aaron, On 12.08.2013, at 23:17, Aaron Morton wrote: >> As I do not have Billions of input records (but a max of 10 Million) the >> added benefit of scaling out the per-line processing is probably not worth >> the additional setup and operations effort of Hadoop. > I would start with a regul

Re: Hadoop/Cassandra for data transformation (rather than analysis)?

2013-08-12 Thread Aaron Morton
> As I do not have Billions of input records (but a max of 10 Million) the > added benefit of scaling out the per-line processing is probably not worth > the additional setup and operations effort of Hadoop. I would start with a regular app and then go to hadoop if needed, assuming you are on

Hadoop/Cassandra for data transformation (rather than analysis)?

2013-08-10 Thread Jan Algermissen
Hi, I have a specific use case to address with Cassandra and I can't get my head around whether using Hadoop on top creates any significant benefit or not. Situation: I have product data and each product 'contains' a number of articles (<100 / product), representing individual colors/sizes etc

Re: hadoop/cassandra integration using CL_ONE...

2013-07-29 Thread aaron morton
> Is it possible to use CL_ONE with hadoop/cassandra when doing an M/R job? That's the default. https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/hadoop/ConfigHelper.java#L383 > And more importantly is there a way to configure that such that if my RF

hadoop/cassandra integration using CL_ONE...

2013-07-26 Thread Hiller, Dean
Is it possible to use CL_ONE with hadoop/cassandra when doing an M/R job? And more importantly is there a way to configure that such that if my RF=3, that it only reads from 1 of the nodes in that 3. We have 12 nodes and ideally we would for example hope M/R runs on a2, a9, a5, a12 which happen

Re: Hadoop/Cassandra 1.2 timeouts

2013-06-26 Thread aaron morton
It's an inter node timeout waiting for the read to complete. Normally means the cluster is overloaded in some fashion, check for GC activity and/or overloaded IOPs. If you reduce the batch_size it should help. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand

Hadoop/Cassandra 1.2 timeouts

2013-06-24 Thread Brian Jeltema
I'm having problems with Hadoop job failures on a Cassandra 1.2 cluster due to Caused by: TimedOutException() 2013-06-24 11:29:11,953 INFO Driver -at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932) This is running on a 6-node cluster, RF=3

Re: Hadoop+Cassandra

2013-03-11 Thread oualid ait wafli
13/3/11 oualid ait wafli : > > Hi > > > > I need a tutorial for deploying Hadoop+Cassandra on single-nodes > > > > Thanks >

Re: Hadoop+Cassandra

2013-03-11 Thread Renato Marroquín Mogrovejo
Hi there, Check this out [1]. It's kinda old but I think it will help you get started. Renato M. [1] http://www.datastax.com/docs/0.7/map_reduce/hadoop_mr 2013/3/11 oualid ait wafli : > Hi > > I need a tutorial for deploying Hadoop+Cassandra on single-nodes > > Thanks

Hadoop+Cassandra

2013-03-11 Thread oualid ait wafli
Hi I need a tutorial for deploying Hadoop+Cassandra on single-nodes Thanks

Re: cryptic exception in Hadoop/Cassandra job

2013-01-30 Thread Brian Jeltema
ues.apache.org/jira/browse/CASSANDRA-4813) > > Kind regards, > Pieter > > > -Original Message- > From: Brian Jeltema [mailto:brian.jelt...@digitalenvoy.net] > Sent: woensdag 30 januari 2013 13:58 > To: user@cassandra.apache.org > Subject: Re: cryptic exception in

RE: cryptic exception in Hadoop/Cassandra job

2013-01-30 Thread Pieter Callewaert
@cassandra.apache.org Subject: Re: cryptic exception in Hadoop/Cassandra job Cassandra 1.1.5, using BulkOutputFormat Brian On Jan 30, 2013, at 7:39 AM, Pieter Callewaert wrote: > Hi Brian, > > Which version of cassandra are you using? And are you using the BOF to write > to Cassandr

Re: cryptic exception in Hadoop/Cassandra job

2013-01-30 Thread Brian Jeltema
- > From: Brian Jeltema [mailto:brian.jelt...@digitalenvoy.net] > Sent: woensdag 30 januari 2013 13:20 > To: user@cassandra.apache.org > Subject: cryptic exception in Hadoop/Cassandra job > > > I have a Hadoop/Cassandra map/reduce job that performs a simple > transforma

RE: cryptic exception in Hadoop/Cassandra job

2013-01-30 Thread Pieter Callewaert
exception in Hadoop/Cassandra job I have a Hadoop/Cassandra map/reduce job that performs a simple transformation on a table with very roughly 1 billion columns spread across roughly 4 million rows. During reduction, I see a relative handful of the following: Exception in thread "Streami

Re: Hybrid Hadoop Cassandra Cluster

2013-01-18 Thread Jeremy Hanna
commercial product. Jeremy On Jan 18, 2013, at 6:01 AM, Naveen Reddy wrote: > Hi, > > I want to setup a hybrid Hadoop Cassandra Cluster. Can anyone point me to a > sample documentation for this ? > > Thank you > Naveen

Re: inconsistent hadoop/cassandra results

2013-01-10 Thread Michael Kjellman
I found that overall Hadoop input/output from Cassandra could use a little more QA and input from the community. (Especially with large datasets). There were some serious BOF bugs in 1.1 that have been resolved in 1.2. (Yay!) But, the problems in 1.1 weren't immediately apparent. Testing in my d

Re: inconsistent hadoop/cassandra results

2013-01-10 Thread aaron morton
> But this is the first time I've tried to use the > wide-row support, which makes me a little suspicious. The wide-row support is > not > very well documented, so maybe I'm doing something wrong there in ignorance. This was the area I was thinking about. Can you drill in and see a pattern. Are

Re: inconsistent hadoop/cassandra results

2013-01-09 Thread Brian Jeltema
Sorry if this is a duplicate - I was having mailer problems last night: > Assuming there were no further writes, running repair or using CL all should > have fixed it. > > Can you describe the inconsistency between runs? Sure. The job output is generated by a single reducer and consists of a

Re: inconsistent hadoop/cassandra results

2013-01-08 Thread aaron morton
Assuming there were no further writes, running repair or using CL all should have fixed it. Can you describe the inconsistency between runs? Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 8/01/2013, at 2:16 AM, Br

Re: Hadoop + Cassandra

2012-01-06 Thread Jeremy Hanna
in order to get more statistics. > > I'll be glad to learn about any interesting things you learnt with your own > experiences with hadoop + Cassandra. > > Thanks in advance.

Hadoop + Cassandra

2012-01-06 Thread Alain RODRIGUEZ
em in order to get more statistics. I'll be glad to learn about any interesting things you learnt with your own experiences with hadoop + Cassandra. Thanks in advance.

RE: hadoop cassandra

2011-03-17 Thread Sagar Kohli
thanks Jeremy, it's a good pointer to start with regards Sagar From: Jeremy Hanna [jeremy.hanna1...@gmail.com] Sent: Thursday, March 17, 2011 7:34 PM To: user@cassandra.apache.org Subject: Re: hadoop cassandra You can start with a word count example that

Re: hadoop cassandra

2011-03-17 Thread Jeremy Hanna
You can start with a word count example that's only for hdfs. Then you can replace the reducer in that with the ReducerToCassandra that's in the cassandra word_count example. You need to match up your Mapper's output to the Reducer's input and set a couple of configuration variables to tell it

hadoop cassandra

2011-03-17 Thread Sagar Kohli
hi all, is there any example of hadoop and cassandra integration where input is from hdfs and output to cassandra NOTE: i have gone through word count example provided with the source code, but it does not have above case.. regards Sagar Are you exploring a