I use PHP, and phpCassa to talk to Cassandra from within my app. I'm using
the below script's structure to run a local mutation on each of my
nodes:
===
$ring = $client->describe_ring($keyspace); // $client: a Thrift client handle (hypothetical name)
$startToken = null;
$endToken = null;
foreach ($ring as $ringDetails) {
//There is an
e" (column family in Cassandra) has something like a billion rows
>>> in it, and I want to say ~3TB of data.
>>> - No matter what I tried(*), Pig/Hadoop decided this was worthy of 20 tasks
>>>
>>> (*) I changed settings in the loadFun
Looking at the code, cassandra.input.split.size == Pig URL split_size, right?
But in Cassandra 1.2.15 I'm wondering if there is a bug that would make
the hadoop conf setting cassandra.input.split.size not be used unless you
manually set the URI to splitSize=0 (because the abstract class defaults
th
You said you have tried the Pig URL split_size, but have you actually tried
decreasing the value of cassandra.input.split.size hadoop property? The
default is 65536, so you may want to decrease that to see if the number of
mappers increases. But at some point, even if you lower that value it will
st
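To make the two knobs discussed in this thread concrete, a minimal sketch, assuming the 1.2-era Hadoop integration (the keyspace/CF names and the value 16384 are made-up examples, not recommendations):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.cassandra.hadoop.ConfigHelper;

public class SplitSizeSetup {
    // Lower the split size so Hadoop creates more map tasks.
    static void configure(Configuration conf) {
        // Sets the cassandra.input.split.size property (default 65536 rows per split).
        ConfigHelper.setInputSplitSize(conf, 16384);
    }
    // The Pig route mentioned above passes the same knob in the storage URL, e.g.:
    //   rows = LOAD 'cassandra://MyKeyspace/MyCF?split_size=16384' USING CassandraStorage();
}
```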
Hi,
I have some history with cassandra + hadoop:
1.) Single DC + integrated hadoop = Was "ok" until I needed steady
performance (the single DC was used in a production environment)
2.) Two DC's + integrated hadoop on 1 of 2 DCs = Was "ok" until my data
grew and in AWS compute is expensive compared
> Is it good practice then to find an attribute in my data that would allow me
> to form wide-row keys with approx. 1000 values each?
You can do that using get_range_slice() via thrift.
And via CQL 3 you use the token() function and Limit with a select statement.
Check the DataStax docs for more information.
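As a concrete illustration of the CQL 3 route, a hedged sketch (table and column names are hypothetical):

```sql
-- First page:
SELECT k, v FROM articles LIMIT 1000;
-- Each following page resumes from the last partition key seen:
SELECT k, v FROM articles WHERE token(k) > token('last-key-seen') LIMIT 1000;
```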
Aaron,
On 12.08.2013, at 23:17, Aaron Morton wrote:
>> As I do not have Billions of input records (but a max of 10 Million) the
>> added benefit of scaling out the per-line processing is probably not worth
>> the additional setup and operations effort of Hadoop.
> I would start with a regul
> As I do not have Billions of input records (but a max of 10 Million) the
> added benefit of scaling out the per-line processing is probably not worth
> the additional setup and operations effort of Hadoop.
I would start with a regular app and then go to hadoop if needed, assuming you
are on
Hi,
I have a specific use case to address with Cassandra and I can't get my head
around whether using Hadoop on top creates any significant benefit or not.
Situation:
I have product data and each product 'contains' a number of articles (<100 /
product), representing individual colors/sizes etc
> Is it possible to use CL_ONE with hadoop/cassandra when doing an M/R job?
That's the default.
https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/hadoop/ConfigHelper.java#L383
> And more importantly is there a way to configure that such that if my RF
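For illustration, a hedged sketch of making the default explicit in job setup; it assumes the 1.2-era ConfigHelper API, so treat the method name as an assumption:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.cassandra.hadoop.ConfigHelper;

public class ConsistencySetup {
    static void configure(Configuration conf) {
        // CL_ONE is already the default; this just states it explicitly.
        ConfigHelper.setReadConsistencyLevel(conf, "ONE");
    }
}
```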
Is it possible to use CL_ONE with hadoop/cassandra when doing an M/R job? And
more importantly is there a way to configure that such that if my RF=3, that it
only reads from 1 of the nodes in that 3.
We have 12 nodes and ideally we would for example hope M/R runs on
a2, a9, a5, a12 which happen
It's an inter node timeout waiting for the read to complete. Normally means the
cluster is overloaded in some fashion, check for GC activity and/or overloaded
IOPs.
If you reduce the batch_size it should help.
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
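A minimal sketch of the batch-size reduction suggested above, assuming the 1.2-era ConfigHelper (the value 1024 is an arbitrary example):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.cassandra.hadoop.ConfigHelper;

public class BatchSizeSetup {
    static void configure(Configuration conf) {
        // Fewer rows per get_range_slices call means shorter server-side reads,
        // which makes inter-node read timeouts less likely (default is 4096).
        ConfigHelper.setRangeBatchSize(conf, 1024);
    }
}
```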
I'm having problems with Hadoop job failures on a Cassandra 1.2 cluster due to
Caused by: TimedOutException()
2013-06-24 11:29:11,953 INFO Driver -at
org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932)
This is running on a 6-node cluster, RF=3
13/3/11 oualid ait wafli :
> > Hi
> >
> > I need a tutorial for deploying Hadoop+Cassandra on a single node
> >
> > Thanks
>
Hi there,
Check this out [1]. It's kinda old but I think it will help you get started.
Renato M.
[1] http://www.datastax.com/docs/0.7/map_reduce/hadoop_mr
2013/3/11 oualid ait wafli :
> Hi
>
> I need a tutorial for deploying Hadoop+Cassandra on a single node
>
> Thanks
Hi
I need a tutorial for deploying Hadoop+Cassandra on a single node
Thanks
issues.apache.org/jira/browse/CASSANDRA-4813)
>
> Kind regards,
> Pieter
>
>
> -----Original Message-----
> From: Brian Jeltema [mailto:brian.jelt...@digitalenvoy.net]
> Sent: woensdag 30 januari 2013 13:58
> To: user@cassandra.apache.org
> Subject: Re: cryptic exception in
@cassandra.apache.org
Subject: Re: cryptic exception in Hadoop/Cassandra job
Cassandra 1.1.5, using BulkOutputFormat
Brian
On Jan 30, 2013, at 7:39 AM, Pieter Callewaert wrote:
> Hi Brian,
>
> Which version of cassandra are you using? And are you using the BOF to write
> to Cassandr
-
> From: Brian Jeltema [mailto:brian.jelt...@digitalenvoy.net]
> Sent: woensdag 30 januari 2013 13:20
> To: user@cassandra.apache.org
> Subject: cryptic exception in Hadoop/Cassandra job
>
>
> I have a Hadoop/Cassandra map/reduce job that performs a simple
> transforma
exception in Hadoop/Cassandra job
I have a Hadoop/Cassandra map/reduce job that performs a simple transformation
on a table with very roughly 1 billion columns spread across roughly 4 million
rows. During reduction, I see a relative handful of the following:
Exception in thread "Streami
commercial product.
Jeremy
On Jan 18, 2013, at 6:01 AM, Naveen Reddy wrote:
> Hi,
>
> I want to setup a hybrid Hadoop Cassandra Cluster. Can anyone point me to a
> sample documentation for this ?
>
> Thank you
> Naveen
I found that overall Hadoop input/output from Cassandra could use a little more
QA and input from the community. (Especially with large datasets). There were
some serious BOF bugs in 1.1 that have been resolved in 1.2. (Yay!) But, the
problems in 1.1 weren't immediately apparent. Testing in my d
> But this is the first time I've tried to use the
> wide-row support, which makes me a little suspicious. The wide-row support is
> not
> very well documented, so maybe I'm doing something wrong there in ignorance.
This was the area I was thinking about.
Can you drill in and see a pattern?
Are
Sorry if this is a duplicate - I was having mailer problems last night:
> Assuming there were no further writes, running repair or using CL ALL should
> have fixed it.
>
> Can you describe the inconsistency between runs?
Sure. The job output is generated by a single reducer and consists of a
Assuming there were no further writes, running repair or using CL ALL should
have fixed it.
Can you describe the inconsistency between runs?
Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 8/01/2013, at 2:16 AM, Br
in order to get more statistics.
>
> I'll be glad to learn about any interesting things you learnt with your own
> experiences with hadoop + Cassandra.
>
> Thanks in advance.
em in order to get more statistics.
I'll be glad to learn about any interesting things you learnt with your own
experiences with hadoop + Cassandra.
Thanks in advance.
Thanks Jeremy, it's a good pointer to start with
regards
Sagar
From: Jeremy Hanna [jeremy.hanna1...@gmail.com]
Sent: Thursday, March 17, 2011 7:34 PM
To: user@cassandra.apache.org
Subject: Re: hadoop cassandra
You can start with a word count example that
You can start with a word count example that's only for hdfs. Then you can
replace the reducer in that with the ReducerToCassandra that's in the cassandra
word_count example. You need to match up your Mapper's output to the Reducer's
input and set a couple of configuration variables to tell it
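The wiring described here might look roughly like the following sketch, based on the word_count example shipped with Cassandra; the keyspace, column family, input path, host, and mapper class name are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;

public class WordCountToCassandra {
    static Job buildJob(Configuration conf) throws Exception {
        Job job = new Job(conf, "wordcount-to-cassandra");
        job.setMapperClass(TokenizerMapper.class);      // your plain HDFS word-count mapper
        job.setReducerClass(ReducerToCassandra.class);  // from the word_count example
        job.setMapOutputKeyClass(Text.class);           // must match the reducer's input types
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path("/input/text"));
        // Tell the output format where to write:
        ConfigHelper.setOutputColumnFamily(job.getConfiguration(), "MyKeyspace", "word_counts");
        ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "127.0.0.1");
        ConfigHelper.setOutputPartitioner(job.getConfiguration(),
                "org.apache.cassandra.dht.RandomPartitioner");
        return job;
    }
}
```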
hi all,
is there any example of Hadoop and Cassandra integration where input is from
HDFS and output to Cassandra?
NOTE: I have gone through the word count example provided with the source code, but
it does not cover the above case.
regards
Sagar
Are you exploring a