Re: cassandra + spark / pyspark

2014-09-11 Thread Oleg Ruchovets
ctly, and we can work with you to get you started. > > Regards, > Rohit > > > *Founder & CEO, **Tuplejump, Inc.* > > www.tuplejump.com > *The Data Engineering Platform* > > On Thu, Sep 11, 2014 at 8:09 PM, Oleg Ruchovets > wrote: > >

Re: cassandra + spark / pyspark

2014-09-11 Thread Oleg Ruchovets
OK. DataStax and Stratio require Mesos, Hadoop YARN, or another third-party component to make the Spark cluster highly available. What about Calliope? Is Cassandra + Calliope + Spark sufficient to process aggregations? In my case we have quite a lot of data, so doing aggregation only in memory is impossi

Re: cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
'm not saying that Storm is not the right fit, it > may be totally suitable for some usages. > > But if you want to avoid the SPOF thing and don't want to bring in > resource management frameworks, the Spark/Cassandra integration is an > interesting alternative. > >

Re: cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Interesting, actually. We have Hadoop in our ecosystem. It has a single point of failure, and I am not sure about inter-data-center replication. The plan is to use Cassandra: no single point of failure, and it has data center replication. For aggregation/transformation we would use Spark. BUT Storm r

Re: cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
artition mapping directly to > Cassandra node having the primary partition range. I have still not played > with it into production though so I can't tell anything about stability. > > Maybe other guys on the list may give their thoughts about it ? > > Regards > > Duy Hai DO

Re: cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Great stuff, Paco. Thanks for sharing. A couple of questions: Does it require an additional installation, such as Apache Mesos, to be HA? Do you support PySpark? How stable / ready for production is it? Thanks, Oleg. On Thu, Sep 11, 2014 at 12:01 AM, Francisco Madrid-Salvador < pmad...@stratio.com> wrote: >

Re: multi datacenter replication

2014-09-10 Thread Oleg Ruchovets
ocumentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html > > http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_decomission_dc_t.html > > Hope you'll find everything you need. If some info is missing, come back > and ask. > > Ala

cassandra + spark / pyspark

2014-09-10 Thread Oleg Ruchovets
Hi, I am evaluating different options for Spark + Cassandra and I have a couple of questions. My aim is to use Cassandra + Spark without Hadoop: 1) Is it possible to use only Cassandra as the input/output source for PySpark? 2) In case I use Spark (Java, Scala), is it possible to use only cass

multi datacenter replication

2014-09-10 Thread Oleg Ruchovets
Hi all. Is multi-datacenter replication available in the community edition? If yes, can someone share their experience of how stable it is, and where can I read about best practices for it? Thanks, Oleg.

hardware sizing for cassandra

2014-09-09 Thread Oleg Ruchovets
Hi, Where can I find a document with best practices for sizing a Cassandra deployment? We have 1000 writes / reads per second, with a record size of 1k. Questions: 1) How many machines do we need? 2) How much RAM, and what disk size / type? 3) What should the network be? I understand that hardware
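
As a rough starting point for the sizing question above, the stated write rate and record size can be turned into a daily ingest figure. This is only a back-of-envelope sketch: it reads "1k" as 1000 bytes, treats the 1000 operations per second as writes, and assumes a replication factor of 3, which the post does not state. It ignores compaction overhead, compression, indexes and TTLs, so it does not answer how many machines are needed.

```python
# Back-of-envelope ingest estimate for the workload described in the post.
# Only the write rate and record size come from the post; everything else
# (replication factor, reading "1k" as 1000 bytes) is an assumption.

WRITES_PER_SECOND = 1000     # from the post
RECORD_SIZE_BYTES = 1000     # "1k" in the post, read here as 1000 bytes
REPLICATION_FACTOR = 3       # assumed, not stated in the post
SECONDS_PER_DAY = 24 * 60 * 60


def raw_bytes_per_day(writes_per_second, record_size_bytes):
    """Bytes written per day before replication."""
    return writes_per_second * record_size_bytes * SECONDS_PER_DAY


def replicated_bytes_per_day(writes_per_second, record_size_bytes, replication_factor):
    """Bytes stored per day across the cluster once every replica is counted."""
    return raw_bytes_per_day(writes_per_second, record_size_bytes) * replication_factor


if __name__ == "__main__":
    raw = raw_bytes_per_day(WRITES_PER_SECOND, RECORD_SIZE_BYTES)
    replicated = replicated_bytes_per_day(
        WRITES_PER_SECOND, RECORD_SIZE_BYTES, REPLICATION_FACTOR
    )
    print(f"raw ingest: {raw / 1e9:.1f} GB/day")
    print(f"with RF={REPLICATION_FACTOR}: {replicated / 1e9:.1f} GB/day")
```

Under these assumptions the raw write stream is about 1 MB/s, so disk growth and retention period, rather than network bandwidth, are likely the first numbers to pin down.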