Re: How to store denormalized data

2015-06-03 Thread Shahab Yunus
A suggestion, or rather food for thought: do you expect to read/analyze the written data right away? Or will it be a batch process, kicked off later? What I am trying to say is that if the 'read/analysis' part is a) a batch process and b) kicked off later, then #3 is a fine solution.

Re: Data model suggestions

2015-04-26 Thread Shahab Yunus
Interesting approach, Oded. Is this similar to what has been described here: http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html Regards, Shahab On Sun, Apr 26, 2015 at 4:29 AM, Peer, Oded wrote: > I would maintain two tables. > > An “archive” table that holds all

Re: Machine Learning With Cassandra

2014-08-30 Thread Shahab Yunus
Spark is not storage; rather, it is a streaming framework meant to run on big data in a distributed architecture (a very high-level intro/definition). It provides a batched version of in-memory map/reduce-like jobs. It is not completely streaming like Storm but rather batches collections of tuples an

Re: Modeling multi-tenanted Cassandra schema

2013-11-13 Thread Shahab Yunus
Nate, (slightly OT), what client API/library is recommended now that Hector is sunsetting? Thanks. Regards, Shahab On Wed, Nov 13, 2013 at 9:28 AM, Nate McCall wrote: > You basically want option (c). Option (d) might work, but you would be > bending the paradigm a bit, IMO. Certainly do not u

Re: Deleting data using timestamp

2013-10-09 Thread Shahab Yunus
Ahh, yes, 'compaction'. I blanked out while mentioning repair and cleanup. That is in fact what needs to be done first and what I meant. Thanks Robert. Regards, Shahab On Wed, Oct 9, 2013 at 1:50 PM, Robert Coli wrote: > On Wed, Oct 9, 2013 at 7:35 AM, Ravikumar Govindarajan < > ravikumar.govi
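
To illustrate why compaction is the step that actually removes deleted data, here is a toy sketch. It assumes a simplified model: each SSTable is a dict of key to (timestamp, value), a delete is a tombstone written with a newer timestamp, and tombstones are purged only once they are older than a grace period (the role gc_grace_seconds plays in Cassandra). The function and variable names are hypothetical, and real compaction has further conditions that are not modeled here.

```python
TOMBSTONE = None  # marker for a deleted cell (illustrative only)


def compact(sstables, now, gc_grace):
    """Merge SSTables, keeping the newest cell per key (last write wins)
    and dropping tombstones that are at least gc_grace old."""
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return {
        key: (ts, value)
        for key, (ts, value) in merged.items()
        if not (value is TOMBSTONE and now - ts >= gc_grace)
    }


# Two hypothetical SSTables: the newer one deletes "a" and overwrites "b".
older = {"a": (100, "v1"), "b": (100, "v1")}
newer = {"a": (200, TOMBSTONE), "b": (150, "v2")}

after_grace = compact([older, newer], now=1000, gc_grace=500)
within_grace = compact([older, newer], now=300, gc_grace=500)
```

In this model the deleted key disappears from disk only when compaction runs after the grace period; before that, the tombstone is carried forward so the delete is not lost.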

Re: Deleting data using timestamp

2013-10-09 Thread Shahab Yunus
I might be missing something obvious here but can't you afford (time-wise) to run cleanup or repair after the deletion so that the deleted data is gone? Assuming that your columns are time-based data? Regards, Shahab On Wed, Oct 9, 2013 at 10:35 AM, Ravikumar Govindarajan < ravikumar.govindara.

Re: Deleting Row Key

2013-10-05 Thread Shahab Yunus
tool Regards, Shahab On Sat, Oct 5, 2013 at 7:06 PM, Shahab Yunus wrote: > Yes you can: > > http://hbase.apache.org/book/regions.arch.html#compaction > http://hbase.apache.org/book/important_configurations.html (Managed > Compaction section) > > Regards, > Shahab

Re: Deleting Row Key

2013-10-05 Thread Shahab Yunus
Yes you can: http://hbase.apache.org/book/regions.arch.html#compaction http://hbase.apache.org/book/important_configurations.html (Managed Compaction section) Regards, Shahab On Sat, Oct 5, 2013 at 6:02 PM, Sebastian Schmidt wrote: > Am 06.10.2013 00:00, schrieb Cem Cayiroglu: > > It will be

Re: get float column in cassandra mapreduce

2013-10-05 Thread Shahab Yunus
A couple of things I could think of; others might have better ideas. 1- The exception is about an encoding mismatch. Do you know what your source file's encoding is and what your system's default is? E.g. it can be ISO8859-1 on Windows, UTF-8 on Linux, etc., and your file has something else. You c
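
A minimal sketch of the kind of encoding mismatch described in point 1, assuming (hypothetically) that the source bytes were written as ISO-8859-1 and are then read as UTF-8. The sample text and the helper name are invented for illustration and are not taken from the original thread.

```python
from typing import Optional

# Hypothetical sample: text containing a non-ASCII character.
text = "café"

# Bytes as they would be stored in an ISO-8859-1 (Latin-1) file.
latin1_bytes = text.encode("iso-8859-1")


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded string, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Reading Latin-1 bytes as UTF-8 fails; reading them with the right encoding works.
as_utf8 = try_decode(latin1_bytes, "utf-8")
as_latin1 = try_decode(latin1_bytes, "iso-8859-1")
```

Checking a sample of the input this way against the system default is a quick test of whether the exception really comes from the file's encoding.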

Re: questions related to the SSTable file

2013-09-17 Thread Shahab Yunus
ich is the last change, right? > > > > Yes > > > > In MR world, each file COULD be processed by different Mapper, but will > be sent to the same reducer as both data will be shared same key. > > > > If that is the way you are writing it, then yes > >

Re: questions related to the SSTable file

2013-09-17 Thread Shahab Yunus
java8964, basically are you asking what will happen if we put a large amount of data in one column of one row at once? How will this blob of data, representing one column and one row (i.e. a cell), be split into multiple SSTables? Or in such particular cases will it always be one extra large SSTab

Re: Cassandra nodetool could not resolve '127.0.0.1': unknown host

2013-09-17 Thread Shahab Yunus
Have you tried specifying your hostname (not localhost) in cassandra.yaml and starting it? Regards, Shahab On Tue, Sep 17, 2013 at 8:39 AM, pradeep kumar wrote: > I am very new to cassandra. Just started exploring. > > I am running a single node cassandra server & facing a problem in seeing > stat

Re: VMs versus Physical machines

2013-09-12 Thread Shahab Yunus
ore columns at a time. Regards, Shahab On Thu, Sep 12, 2013 at 1:51 AM, Aaron Turner wrote: > > > > > On Wed, Sep 11, 2013 at 4:40 PM, Shahab Yunus wrote: > >> Thanks Aaron for the reply. Yes, VMs or the nodes will be in cloud if we >> don't go the phys

Re: VMs versus Physical machines

2013-09-11 Thread Shahab Yunus
or Safety. > -- Benjamin Franklin > > > > On Wed, Sep 11, 2013 at 4:21 PM, Shahab Yunus wrote: > >> Hello, >> >> We are deciding whether to get VMs or physical machines for a Cassandra >> cluster. I know this is a very high-level question depending on lot

VMs versus Physical machines

2013-09-11 Thread Shahab Yunus
Hello, We are deciding whether to get VMs or physical machines for a Cassandra cluster. I know this is a very high-level question that depends on lots of factors; in fact, I want to know how to tackle it and what factors we should take into consideration while trying to find the answer.

Re: Help on Cassandra Limitaions

2013-09-06 Thread Shahab Yunus
Also, Sylvain, you have a couple of great posts about relationships between CQL3/Thrift entities and naming issues: http://www.datastax.com/dev/blog/cql3-for-cassandra-experts http://www.datastax.com/dev/blog/thrift-to-cql3 I always refer to them when I get confused :) Regards, Shahab On Fri, Sep

Re: Cassandra Reads

2013-09-06 Thread Shahab Yunus
It only reads up to that column (a sequential scan, I believe) and does not read the whole row. It uses a row-level column index to reduce the amount of data read. Many more details at (the first 2-3 are must-reads, in fact): http://thelastpickle.com/blog/2011/07/04/Cassandra-Query-Plans.html http://www.d
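
The idea of a row-level column index followed by a short sequential scan can be sketched with a toy model. This is not Cassandra's on-disk format: the block size, the column names, and the function name are all assumptions made for illustration.

```python
from bisect import bisect_right

# Toy model of one wide row: sorted column names stored in fixed-size blocks.
BLOCK_SIZE = 4  # illustrative value only
columns = [f"col{i:03d}" for i in range(20)]
blocks = [columns[i:i + BLOCK_SIZE] for i in range(0, len(columns), BLOCK_SIZE)]

# The "column index" keeps only the first column name of each block.
index = [block[0] for block in blocks]


def find_column(name):
    """Return (column or None, number of columns scanned).

    The index picks a single block; only that block is scanned sequentially,
    and the scan stops as soon as the column is found.
    """
    # Locate the last block whose first column is <= name.
    block_no = bisect_right(index, name) - 1
    if block_no < 0:
        return None, 0
    scanned = 0
    for col in blocks[block_no]:
        scanned += 1
        if col == name:
            return col, scanned
    return None, scanned


found, cost = find_column("col010")
```

In this model a lookup touches at most one block of the row instead of all 20 columns, which is the saving the post describes.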

Re: Secondary Indexes On Partitioned Time Series Data Question

2013-08-01 Thread Shahab Yunus
Thanks a lot. Regards, Shahab On Thu, Aug 1, 2013 at 8:32 PM, Robert Coli wrote: > On Thu, Aug 1, 2013 at 2:34 PM, Shahab Yunus wrote: > >> Can you shed some more light (or point towards some other resource) that >> why you think built-in Secondary Indexes should not

Re: Secondary Indexes On Partitioned Time Series Data Question

2013-08-01 Thread Shahab Yunus
Hi Robert, Can you shed some more light on (or point towards some other resource explaining) why you think built-in Secondary Indexes should not be used lightly or without much consideration? Thanks. Regards, Shahab On Thu, Aug 1, 2013 at 3:53 PM, Robert Coli wrote: > On Thu, Aug 1, 2013 at 12:49 PM, G

Re: VM dimensions for running Cassandra and Hadoop

2013-07-31 Thread Shahab Yunus
Hi Jan, One question...you say "- I must make sure the disks are directly attached, to prevent problems when multiple nodes flush the commit log at the same time" What do you mean by that? Thanks, Shahab On Wed, Jul 31, 2013 at 3:10 AM, Jan Algermissen wrote: > Jon, > > On 31.07.2013, at

Re: MapReduce response time and speed

2013-07-24 Thread Shahab Yunus
You have a lot of questions there, so I can't answer them all, but for the following: *"Can a user of the system define new jobs in an ad-hoc fashion (like a query) or do map reduce jobs need to be prepared by a developer (e.g. in RIAK you do a developer to compile-in the job when you need the performance of

Re: Representation of dynamically added columns in table (column family) schema using cqlsh

2013-07-23 Thread Shahab Yunus
See this as this was discussed earlier: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Representation-of-dynamically-added-columns-in-table-column-family-schema-using-cqlsh-td7588997.html Regards, Shahab On Fri, Jul 12, 2013 at 11:13 AM, Shahab Yunus wrote: > A basic quest

Re: Unable to describe table in CQL 3

2013-07-23 Thread Shahab Yunus
Rahul, See this as it was discussed earlier: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Representation-of-dynamically-added-columns-in-table-column-family-schema-using-cqlsh-td7588997.html Regards, Shahab On Tue, Jul 23, 2013 at 2:51 PM, Rahul Gupta wrote: > I am using C

Re: Auto Discovery of Hosts by Clients

2013-07-22 Thread Shahab Yunus
//www.thelastpickle.com > > On 20/07/2013, at 9:32 AM, sankalp kohli wrote: > > With Auto discovery, you can provide the DC you are local to and it will > only use hosts from that. > > > On Fri, Jul 19, 2013 at 2:08 PM, Shahab Yunus wrote: > >> Hello, >>

Re: Socket buffer size

2013-07-20 Thread Shahab Yunus
I think the former is for client communication to the nodes and the latter for communication between the nodes themselves, as is evident from the name of the property. Please feel free to correct me if I am wrong. Regards, Shahab On Saturday, July 20, 2013, Mohammad Hajjat wrote: > Hi, > > What's the diff

Auto Discovery of Hosts by Clients

2013-07-19 Thread Shahab Yunus
Hello, I want my Thrift client(s) (using Hector 1.1-3) to randomly connect to any node in the Cassandra (1.2.4) cluster. 1- One way is that I pass in a comma-separated list of hosts and ports to the CassandraHostConfigurator object. 2- The other option is that I configure the auto discovery of ho
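
Option 1 can be pictured with a small client-side sketch: parse a comma-separated list of host:port entries and pick one node at random. This is plain Python for illustration, not Hector's API; the function names and the addresses are hypothetical, and 9160 is used because it is Cassandra's default Thrift RPC port.

```python
import random


def parse_hosts(host_list):
    """Split 'host:port,host:port' into a list of (host, port) tuples."""
    hosts = []
    for entry in host_list.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, _, port = entry.rpartition(":")
        hosts.append((host, int(port)))
    return hosts


def pick_host(hosts, rng=random):
    """Choose one node uniformly at random to connect to."""
    return rng.choice(hosts)


# Hypothetical node addresses for a three-node cluster.
nodes = parse_hosts("10.0.0.1:9160, 10.0.0.2:9160, 10.0.0.3:9160")
chosen = pick_host(nodes, random.Random(0))
```

With a static list like this, the client only knows the nodes it was given; auto discovery (option 2) is what lets the set of candidate hosts follow changes in the ring.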

Re: IllegalArgumentException on query with AbstractCompositeType

2013-07-13 Thread Shahab Yunus
Aaron Morton can confirm, but I think one problem could be that creating an index on a field with a small number of possible values is not a good idea. Regards, Shahab On Sat, Jul 13, 2013 at 9:14 AM, Tristan Seligmann wrote: > On Fri, Jul 12, 2013 at 10:38 AM, aaron morton wrote: > >> CREATE INDEX ON c

Re: Representation of dynamically added columns in table (column family) schema using cqlsh

2013-07-12 Thread Shahab Yunus
Thanks Eric for the explanation. Regards, Shahab On Fri, Jul 12, 2013 at 11:13 AM, Shahab Yunus wrote: > A basic question and it seems that I have a gap in my understanding. > > I have a simple table in Cassandra with multiple column families. I add > new columns to each of

Representation of dynamically added columns in table (column family) schema using cqlsh

2013-07-12 Thread Shahab Yunus
A basic question and it seems that I have a gap in my understanding. I have a simple table in Cassandra with multiple column families. I add new columns to each of these column families on the fly. When I view (using the 'DESCRIBE table' command) the schema of a particular column family, I see onl

Re: what happen if coordinator node fails during write

2013-06-29 Thread Shahab Yunus
Aaron, Can you explain a bit what you mean when you say that the client needs to support Atomic Batches in 1.2 and Hector doesn't support it? Does it mean that there is no way of using an atomic batch of inserts through Hector? Or did I misunderstand you? Feel free to point me to any link or resource, thanks. Reg

Re: block size

2013-06-20 Thread Shahab Yunus
ing we have key cache enabled) > > *From:* Shahab Yunus [mailto:shahab.yu...@gmail.com] > *Sent:* 20 June 2013 14:32 > *To:* user@cassandra.apache.org > *Subject:* Re: block size > > Have you seen this? > > http://www.da

Re: block size

2013-06-20 Thread Shahab Yunus
Have you seen this? http://www.datastax.com/dev/blog/cassandra-file-system-design Regards, Shahab On Thu, Jun 20, 2013 at 3:17 PM, Kanwar Sangha wrote: > Hi – What is the block size for Cassandra ? is it taken from the OS > defaults ? >

Re: Unit Testing Cassandra

2013-06-19 Thread Shahab Yunus
ther in the embedded database. > > > > Setup/tear down time is pretty reasonable. > > > > Ben > > > > From: Shahab Yunus [shahab.yu...@gmail.com] > > Sent: Wednesday, June 19, 2013 8:46 AM > > To: user@cassandra.apache.org > > Subject: Re: Unit Test

Re: Unit Testing Cassandra

2013-06-19 Thread Shahab Yunus
ole... That is performance testing. > > When searching for the above, you will not get much luck if you are > looking for them in the context of "unit testing" as those things are > *outside the scope of unit testing* > > > On Wednesday, 19 June 2013, Shahab Yunus w

Re: Dropped mutation messages

2013-06-19 Thread Shahab Yunus
Hello Arthur, What do you mean by "The queries need to be lightened"? Thanks, Shahab On Tue, Jun 18, 2013 at 8:47 PM, Arthur Zubarev wrote: > Cem hi, > > as per http://wiki.apache.org/cassandra/FAQ#dropped_messages > > > Internode messages which are received by a node, but do not get not to b

Unit Testing Cassandra

2013-06-18 Thread Shahab Yunus
Hello, Can anyone suggest good/popular unit test tools/frameworks/utilities for unit testing Cassandra stores? I am looking to test from a performance/load and monitoring perspective. I am using 1.2. Thanks a lot. Regards, Shahab

Re: Dynamic Columns Question Cassandra 1.2.5, Datastax Java Driver 1.0

2013-06-06 Thread Shahab Yunus
Dynamic columns are not supported in CQL3. We just had a discussion a day or two ago about this where Eric Stevens explained it. Please see this: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/CQL-3-returning-duplicate-keys-td7588181.html Regards, Shahab On Thu, Jun 6, 2013 at

Re: Multiple JBOD data directory

2013-06-05 Thread Shahab Yunus
Though I am a newbie, I just had a thought regarding your question 'How will it handle requests for data which unavailable?': wouldn't the data be served in that case from other nodes where it has been replicated? Regards, Shahab On Wed, Jun 5, 2013 at 5:32 AM, Christopher Wirt wrote: > Hell

Re: CQL 3 returning duplicate keys

2013-06-05 Thread Shahab Yunus
or standard column families representing as one row per key/column > pair, you can read more about that here: > http://www.datastax.com/dev/blog/thrift-to-cql3 - this is also in the > "Mixing static and dynamic" section, a little farther down. > > > > On Tue, Jun 4, 2013

Re: CQL 3 returning duplicate keys

2013-06-04 Thread Shahab Yunus
Thanks, Eric, for the detailed explanation, but can you point to a source or document for this restriction in CQL3 tables? Doesn't it take away the main feature of the NoSQL store? Or am I missing something obvious here? Regards, Shahab On Tue, Jun 4, 2013 at 2:12 PM, Eric Stevens wrote: > If