Code review - Spark SQL command-line client for Cassandra

2015-06-19 Thread Matthew Johnson
Hi all, I have been struggling with Cassandra’s lack of adhoc query support (I know this is an anti-pattern of Cassandra, but sometimes management come over and ask me to run stuff and it’s impossible to explain that it will take me a while when it would take about 10 seconds in MySQL) so I have

RE: Lucene index plugin for Apache Cassandra

2015-06-15 Thread Matthew Johnson
Hi Andres, This looks awesome, many thanks for your work on this. Just out of curiosity, how does this compare to the DSE Cassandra with embedded Solr? Do they provide very similar functionality? Is there a list of obvious pros and cons of one versus the other? Thanks! Matthew *From:* A

RE: How to store denormalized data

2015-06-03 Thread Matthew Johnson
fact make part of your analysis job? Kind of a pre-process/prep step? Regards, Shahab On Wed, Jun 3, 2015 at 10:48 AM, Matthew Johnson wrote: Hi all, I am trying to store some data (user actions in our application) for future analysis (probably using Spark). I understand best practice i

How to store denormalized data

2015-06-03 Thread Matthew Johnson
Hi all, I am trying to store some data (user actions in our application) for future analysis (probably using Spark). I understand best practice is to store it in denormalized form, and this will definitely make some of our future queries much easier. But I have a problem with denormalizing the d

RE: Start with single node, move to 3-node cluster

2015-05-26 Thread Matthew Johnson
PM, Matthew Johnson wrote: Hi gurus, We have ordered some hardware for a 3-node cluster, but its ETA is 6 to 8 weeks. In the meantime, I have been lent a single server that I can use. I am wondering what the best way is to set up my single node (SN), so I can then move to the 3-node cluster

Start with single node, move to 3-node cluster

2015-05-26 Thread Matthew Johnson
Hi gurus, We have ordered some hardware for a 3-node cluster, but its ETA is 6 to 8 weeks. In the meantime, I have been lent a single server that I can use. I am wondering what the best way is to set up my single node (SN), so I can then move to the 3-node cluster (3N) when the hardware arrives.

RE: Inserting null values

2015-04-29 Thread Matthew Johnson
partition, and most of my records are written exactly once. So, I just let the tombstones get written and they’ll eventually get compacted out and life will go on. It’s annoying and not ideal, but what can you do? On Apr 29, 2015, at 2:36 AM, Matthew Johnson wrote: Hi all, I have some

Inserting null values

2015-04-29 Thread Matthew Johnson
Hi all, I have some fields that I am storing into Cassandra, but some of them could be null at any given point. As there are quite a lot of them, it makes the code much more readable if I don’t check each one for null before adding it to the INSERT. I can see a few Jiras around CQL 3 supporti

RE: Best Practice to add a node in a Cluster

2015-04-27 Thread Matthew Johnson
Hi Neha, I guess it depends why you are adding a new node – do you need more storage capacity, do you want better resilience, or are you trying to increase performance? If you add a new node with the same amount of storage as the previous two, but you increase the RF, you will use up all of t

RE: Creating 'Put' requests

2015-04-24 Thread Matthew Johnson
thing other than increase GC pauses. On Fri, Apr 24, 2015 at 11:50 AM Phil Yang wrote: 2015-04-23 22:16 GMT+08:00 Matthew Johnson : In HBase, we do something like: Put put = new Put(id); put.add(myPojo.getTimestamp(), myPojo.getValue()); put.add(myPojo.getMySecondTimes

RE: timeout creating table

2015-04-23 Thread Matthew Johnson
Hi Jimmy, I have very limited experience with Cassandra so far, but from following some tutorials to create keyspaces, create tables, and insert data, it definitely seems to me like creating keyspaces and tables is way slower than inserting data. Perhaps a more experienced user can confirm if th

RE: Creating 'Put' requests

2015-04-23 Thread Matthew Johnson
to issue an ‘ALTER TABLE’ statement for every new column. I read one suggestion, which is to use collections instead: basically have a single pre-defined column which is a Map, say, and then add ‘timestamp : value’ into that map instead of a new column for every timestamp. Would you say this is

RE: Creating 'Put' requests

2015-04-23 Thread Matthew Johnson
sSimpleClientBoundStatements_t.html Jim Witschey Software Engineer in Test | jim.witsc...@datastax.com On Thu, Apr 23, 2015 at 9:28 AM, Matthew Johnson wrote: Hi all, Currently looking at switching from HBase to Cassandra, and one big difference so far is that in HBas

Creating 'Put' requests

2015-04-23 Thread Matthew Johnson
Hi all, Currently looking at switching from HBase to Cassandra, and one big difference so far is that in HBase, we create a ‘Put’ object, add to it a set of column/value pairs, and send the Put to the server. So far in Cassandra 2.1.4 the tutorials seem to suggest using CQL3, which I really like

RE: unsubscribe

2015-04-22 Thread Matthew Johnson
Hi Bill, To remove your address from the list, send a message to: Cheers, Matt *From:* Bill Tsay [mailto:bt...@splunk.com] *Sent:* 22 April 2015 15:36 *To:* user@cassandra.apache.org *Subject:* unsubscribe *From: *Mich Talebzadeh *Reply-To: *"user@cassandra.apache.org" *Da

RE: Adhoc querying in Cassandra?

2015-04-22 Thread Matthew Johnson
do you expect - mostly read, or mostly write? On Wed, Apr 22, 2015 at 5:06 PM, Matthew Johnson wrote: Hi Ali, Brian, Thanks for the suggestion – we have previously used Solr (SolrCloud for distribution) for a lot of other products, presumably this will do the same job as ElasticSearch? Or does

RE: Adhoc querying in Cassandra?

2015-04-22 Thread Matthew Johnson
ght find it better to use elasticsearch for your aggregate queries and analytics. Cassandra is more of just a data store. On Apr 22, 2015 4:42 PM, "Matthew Johnson" wrote: Hi all, Currently we are setting up a “big” data cluster, but we are only going to have a couple of servers

Adhoc querying in Cassandra?

2015-04-22 Thread Matthew Johnson
Hi all, We are currently setting up a “big” data cluster. We will only have a couple of servers to start with, but we need to be able to scale out quickly when usage ramps up. Previously we have used Hadoop/HBase for our big data cluster, but since we are starting this one on only two

RE: Connecting to Cassandra cluster in AWS from local network

2015-04-21 Thread Matthew Johnson
on the same network, but if you can't be, you'll need to use the public ip in listen_address. On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson wrote: Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over

Connecting to Cassandra cluster in AWS from local network

2015-04-20 Thread Matthew Johnson
Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over their internal AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and sales3. I connect to it from my local dev environment using the

RE: Adding nodes to existing cluster

2015-04-20 Thread Matthew Johnson
Hi Colin, To remove your address from the list, send a message to: Cheers, Matt *From:* Colin Clark [mailto:co...@clark.ws] *Sent:* 20 April 2015 14:10 *To:* user@cassandra.apache.org *Subject:* Re: Adding nodes to existing cluster unsubscribe On Apr 20, 2015, at 8:08 AM, C