Re: Why so many vnodes?

2013-06-10 Thread Milind Parikh
There are n vnodes regardless of the size of the physical cluster. Regards Milind On Jun 10, 2013 7:48 AM, "Theo Hultberg" wrote: > Hi, > > The default number of vnodes is 256, is there any significance in this > number? Since Cassandra's vnodes don't work like for example Riak's, where > there i

Re: ETL Tools to transfer data from Cassandra into other relational databases

2012-12-13 Thread Milind Parikh
Why would you use Cassandra as the primary store for logging information? Have you considered Kafka? You could, of course, then fan out the logs to Cassandra (on a near-real-time basis) and, on a daily basis (if you wish), extract the "deltas" from Kafka into an RDBMS, with no Pig/Hive etc.

RE: Cassandra Counters

2012-09-24 Thread Milind Parikh
IMO, you would use Cassandra Counters (or another variation of distributed counting) once you have determined that a centralized version of counting is not going to work. You'd determine the infeasibility of centralized counting by figuring out the speed at which you need to sustain writes and rea

Re: Data aggregation -- help me design a solution

2012-08-21 Thread Milind Parikh
1. Assuming that the majority of the line items are new and 2. The lookup of an existing line-item will dictate the performance of the system because reads are slower than writes in C*. 3. Assuming that you are using counters in C*. Therefore, eliminate that problem by implementing a bloom filte
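A minimal sketch of the bloom-filter idea in the reply above, assuming the filter is consulted before reading an existing line item, so that items that are definitely new skip the slow read path. The class name, the helper `needs_lookup`, the filter size, and the hashing scheme are all illustrative assumptions, not taken from the thread.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def needs_lookup(seen, line_item_id):
    """Return True only if the item may already exist; record it either way."""
    maybe_existing = seen.might_contain(line_item_id)
    seen.add(line_item_id)
    return maybe_existing
```

When `needs_lookup` returns False, the line item has definitely not been seen before, so the write can proceed without a read. A True result can be a false positive and still needs the real lookup.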

Re: How to process new rows in parallel?

2012-08-03 Thread Milind Parikh
Kafka is relatively stable and has an active, well-supported newsgroup as well. As discussed by Brian, you would be inverting the store-process paradigm. Essentially, in your original approach, you are storing the messages first and then processing them after the fact. In the Kafka model, you wou

Re: CounterColumns with double, min/max

2012-05-25 Thread Milind Parikh
On 1, countandra.org. On 2, the issue is a little deeper (we have investigated this at Countandra). To approach it a little more comprehensively, the issue has more to do with events than counts (at least IMO). A similar issue is about averages... Countandra does sums and counts quite

Re: Flume and Cassandra

2012-02-22 Thread Milind Parikh
Cool. www.countandra.org calls them cascaded counters, and it will also be based on Kafka. /*** sent from my android...please pardon occasional typos as I respond @ the speed of thought / On Feb 22, 2012 7:22 PM, "Edward Capriolo" wrote: I have been

Re: Cassandra to Oracle?

2012-01-22 Thread Milind Parikh
My bad ~s/X:X-Value/Y:Y-Value/ after rereading the SELECT. On Jan 22, 2012 6:40 AM, "Milind Parikh" wrote: The composite-key approach with coun

Re: Cassandra to Oracle?

2012-01-22 Thread Milind Parikh
The composite-key approach with counters would work very well in this case. It will also obviate the concern of not knowing the exact column names a priori... although for efficiency, you might want to look at maintaining a secondary cache-like CF for lookup. Depending on your data patterns (not to hi

Re: Data Model Question

2012-01-21 Thread Milind Parikh
I used rainbird as inspiration for Countandra (& some of the publicly available data structures from the rainbird preso). That said, there are significant differences between the two architectures. Additionally, as Cassandra begins to provide triggers, some very interesting things will become possible in Co

Re: How to store unique visitors in cassandra

2012-01-19 Thread Milind Parikh
You might want to look at the code in countandra.org, regardless of whether you use it. It uses a model of dynamic composite keys (although static composite keys would have worked as well). For the actual query, only one row is hit. This of course only works because the data model is attuned for the query

Announcing Countandra 0.5

2012-01-10 Thread Milind Parikh
Inspired by Twitter's rainbird project, Countandra is a hierarchical distributed counting engine at scale. It provides a complete HTTP-based interface for both posting events and issuing queries. Event postings use a FORMS-compatible syntax. The result of the query is emitted in

Re: data agility

2011-11-20 Thread Milind Parikh
For 99% of current applications requiring a persistent datastore, Oracle, PgSQL and MySQL variants will suffice. For the other 1% of applications, consider C* if (a) you have given up on distributed transactions ("ACID"LY; but NOT "BASE"ICLY) (b) wondering about this newfangled ho

Re: Multi DC setup

2011-10-10 Thread Milind Parikh
Why have two rings? Cassandra manages the replication for you. One ring with physical nodes in two DCs might be a better option. Of course, depending on the inter-DC failure characteristics, you might need to endure split-brain for a while.

Re: Queue suggestion in Cassandra

2011-09-16 Thread Milind Parikh
Use ZooKeeper. Scott Fines has a great library on top of ZK. On Fri, Sep 16, 2011 at 7:08 PM, Daning Wang wrote: > We try to implement an ordered queue system in Cassandra(ver 0.8.5). In > initial design we use a row as queue, a column for each item in queue. > that means creating new column w

Re: Using Cassandra as a client data store

2011-08-18 Thread Milind Parikh
Why not use CouchDB for this use case? Milind On Aug 18, 2011 9:07 PM, "Nicholas Neuberger" wrote: I've been using Cassandra as a database storage device

Predictable low RW latency, SLABS and STW GC

2011-07-22 Thread Milind Parikh
In order to be predictable @ big data scale, the intensity and periodicity of STW garbage collection have to be brought down. Assume that SLABS (Cass 2252) will be available in the main line at some time, and assume that this will have the impact that other projects (HBase etc.) are reporting. I wonder

Re: one way to make counter delete work better

2011-06-14 Thread Milind Parikh
If I understand this correctly, then the epoch integer would be generated by each node. Since time always flows forward, the assumption would be, I suppose, that the epochs would be tagged with the node that generated them and additionally the counter would carry as much history as necessary (and p

Re: rainbird question (why is the 1minute buffer needed?)

2011-05-22 Thread Milind Parikh
I believe that the key reason is souped-up performance for the most recent data. And yes, "an intelligent flush" leaves you vulnerable to some data loss. On May

Re: Cassandra Vs. Oracle Coherence

2011-05-20 Thread Milind Parikh
Other interesting flavors of distributed cache: Terracotta, GemFire. Together with a complex event processing engine like OCEP, these drive a lot of low-latency, high-frequency trading, where nanoseconds matter.

Re: nodes reference by hostname and not IP

2011-04-27 Thread Milind Parikh
Most likely because in the wild, you can't assume a reliable DNS. Just as an aside... This question comes up often in the context of managing Cassandra clusters, especially in elastic situations. Most CMDBs assume a static name (host names/static IPs) for nodes. However this often proves to be mismatche

Re: IP address resolution in MultiDC setup (EC2)/VIP

2011-04-26 Thread Milind Parikh
At the risk of repeating the previous conclusions: (a) This configuration obviates the need for a patch that I had posted earlier. This is a good thing. (b) The reported latency (@Sasha) is less than ordinary latencies in EC2. The reasons behind this are not well understood. However I wouldn't look

Re: IP address resolution in MultiDC setup

2011-04-26 Thread Milind Parikh
You can't route traffic over private IPs across data centers. This is the point of the patch. On Apr 26, 2011 6:59 AM, "pankaj soni" wrote: one last d

Re: IP address resolution in MultiDC setup

2011-04-25 Thread Milind Parikh
> Just read your paper on this. Must say helped a great deal. > >> > >> 1 more query does amazon by default award both external and internal IP > >> address for each node? or we have to explicitly buy the external IP's? > >> > >> I am looking

Re: IP address resolution in MultiDC setup

2011-04-25 Thread Milind Parikh
On Apr 25, 2011 7:43 AM, "Milind Parikh" wrote: I have authored exactly this paper; please search this ML. Please be aware of EC2's internal network as you design your deployment. EC2 also does not support multicast, which is a pain, but not

Re: IP address resolution in MultiDC setup

2011-04-25 Thread Milind Parikh
unable to find documentation of any such deployment online. Because of this multi-regions the public-private IP address issue is important. pankaj On Mon, Apr 25, 2011 at 4:55 PM, Milind Parikh wrote: > > It will be thro...

Re: IP address resolution in MultiDC setup

2011-04-25 Thread Milind Parikh
It will be through an overlay n/w. Unfortunately, setting up such a n/w is complex. Look @ something like OpenVPN. If multicast is supported, it will be easier. With complex software such as Cassandra, it is much better to go with the expected flow rather than devising your own flows. My 2c.

Re: Manual Conflict Resolution in Cassandra

2011-04-25 Thread Milind Parikh
On Apr 25, 2011 3:54 AM, "David Strauss" wrote: On Fri, 2011-04-22 at 13:31 -0700, Milind Parikh wrote: > Is there a chance of getting manual confli... You can actually already perform "manual conflict resolution" in

Re: Consistency model

2011-04-17 Thread Milind Parikh
Same process or not: only successful QR reads after successful QW will behave with this guarantee. On Apr 17, 2011 10:04 AM, "James Cipar" wrote: > For a

Re: Consistency model

2011-04-17 Thread Milind Parikh
William, the issue is whether you will see A or B, with any guarantee of either. The discussion implies no, until the QW is complete. On Apr 17, 20

Re: Consistency model

2011-04-17 Thread Milind Parikh
Successful reads after a successful write @Q have the property that once a value is read @ one Q, the same value will be seen at any other Q. All others are details that will change with implementation but, IMO, are not bugs. James: in your case, I would think that you have not completed a successf
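The guarantee discussed in this thread rests on quorum overlap: with N replicas, any read set of size R and write set of size W must share at least one replica whenever R + W > N. A small sketch that checks this by brute-force enumeration; the function names are illustrative assumptions and not part of Cassandra.

```python
from itertools import combinations


def quorum(n):
    """Majority quorum size for n replicas."""
    return n // 2 + 1


def quorums_always_overlap(n, r, w):
    """Check that every r-subset of n replicas intersects every w-subset."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )
```

With N = 3, quorum reads and quorum writes (2 replicas each) always overlap, while reading from 1 replica after writing to 2 does not, which is why only a successful quorum read after a successful quorum write carries the guarantee.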

Re: EC2 - 2 regions

2011-03-23 Thread Milind Parikh
't think I need other ports for basic setup, right? If anyone could get 'nodetool repair' working with this patch (across regions), let me know. It may be I am doing something wrong. On Wed, Mar 23, 2011 at 1:08 AM, Milind Parikh wrote: > @aj > are you sure...

Re: EC2 - 2 regions

2011-03-22 Thread Milind Parikh
@aj are you sure that all ports are accessible from all nodes? @sasha I think that being able to have the semantics of address aNAT address can enable security from a different perspective. Describing an overlay n/w will take long here. But that may solve your security concerns over the internet.

Re: EC2 - 2 regions

2011-03-21 Thread Milind Parikh
gt;>> > >>> Great work here. Can you provide the patch against the 2 files? > >>> > >>> Perhaps there's some way to incorporate it into the trunk of cassandra > so that this is feasible (in a future release) without patching the source > code. &

Conflict resolution in Cassandra

2011-03-14 Thread Milind Parikh
https://docs.google.com/document/d/13Yc2t4d07290TdiRmSTchuAk9sbp4BeqOpqeYhbcDFM/edit?hl=en There was an excellent session on vector clocks and synchronous writes in Cassandra. Here are my gleanings from it.