Re: Grep for what?

2010-04-05 Thread Benjamin Black
ps auxww | grep cassandra

(and upgrade to 0.6)

On Mon, Apr 5, 2010 at 10:20 AM, JEFFERY SCHMITZ  wrote:
> Warning this is a newbie question -
>
> On startup I get
>
> [r...@marduk bin]# ./cassandra -f
> Listening for transport dt_socket at address: 
> INFO - Sampling index for 
> /var/lib/cassandra/data/system/LocationInfo-1-Data.db
> INFO - Replaying /var/lib/cassandra/commitlog/CommitLog-1270047913578.log
> INFO - LocationInfo has reached its threshold; switching in a fresh Memtable
> INFO - Enqueuing flush of Memtable(LocationInfo)@2132679615
> INFO - Sorting Memtable(LocationInfo)@2132679615
> INFO - Writing Memtable(LocationInfo)@2132679615
> INFO - Completed flushing 
> /var/lib/cassandra/data/system/LocationInfo-2-Data.db
> INFO - Log replay complete
> INFO - Saved Token found: 75598148438183751486026363636316999593
> INFO - Starting up server gossip
> INFO - Cassandra starting up...
>
> Okay, so it's running, but I can't figure out what the PID is. Grepping for
> cassandra or apache turns up zip.
>
> Thanks -
>
> Jeffery
>
>


Re: bloom filters

2010-04-07 Thread Benjamin Black
Please read this:
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

On Wed, Apr 7, 2010 at 1:27 PM, S Ahmed  wrote:
> Just reading up on bloom filters, a few questions.
>
> Basically, bloom filters give you a true/false answer for whether a
> particular key exists. They *may* give you a false positive (saying a key
> exists when it doesn't), but never a false negative (saying a key doesn't
> exist when it does).
>
> The core of a bloom filter is its hashing mechanism, which marks the
> in-memory bit array when a key is added.
>
> 1. Is the only place bloom filters are used in Cassandra when you want to
> see whether a particular key exists on a particular node?
>
> 2. Are bloom filters a fixed size, or do they adjust to the number of keys?
> Any example size?
>
> 3. Bloom filters don't give false negatives:
>    So you hit a node and perform a lookup in the bloom filter for a key. It
> says "yes", but the lookup returns null, so you flag that this node needs
> this particular key during replication.
>
>
> Have I grasped this concept?
>
> Really loving this project, and learning a lot from the code. It would be
> great if someone could do a walkthrough of common functionality in a
> detailed way :)
>
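For anyone following along, the false-positive/no-false-negative behavior is easy to see in a toy Bloom filter. This is a minimal illustrative sketch, not Cassandra's actual implementation (the hashing scheme and sizes here are made up for the example):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash-derived positions set bits in an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, key):
        # Derive k bit positions from k independent digests of the key.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # All k bits set -> "maybe present" (this is where false positives
        # come from: other keys may have set the same bits).
        # Any bit clear -> definitely absent, so never a false negative.
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("row-key-1")
assert bf.might_contain("row-key-1")  # an added key is never reported absent
```

Cassandra keeps one such filter per SSTable so a read can skip SSTables that definitely do not contain the key.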


Re: bloom filters

2010-04-07 Thread Benjamin Black
(Should mention: suggesting reading the Dynamo paper for general
background, not for Bloom filters, which are fantastically covered in
the Wikipedia entry).

On Wed, Apr 7, 2010 at 4:11 PM, Benjamin Black  wrote:
> Please read this:
> http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
>
> On Wed, Apr 7, 2010 at 1:27 PM, S Ahmed  wrote:
>> Just reading up on bloom filters, a few questions.
>>
>> Basically, bloom filters give you a true/false answer for whether a
>> particular key exists. They *may* give you a false positive (saying a key
>> exists when it doesn't), but never a false negative (saying a key doesn't
>> exist when it does).
>>
>> The core of a bloom filter is its hashing mechanism, which marks the
>> in-memory bit array when a key is added.
>>
>> 1. Is the only place bloom filters are used in Cassandra when you want to
>> see whether a particular key exists on a particular node?
>>
>> 2. Are bloom filters a fixed size, or do they adjust to the number of keys?
>> Any example size?
>>
>> 3. Bloom filters don't give false negatives:
>>    So you hit a node and perform a lookup in the bloom filter for a key. It
>> says "yes", but the lookup returns null, so you flag that this node needs
>> this particular key during replication.
>>
>>
>> Have I grasped this concept?
>>
>> Really loving this project, and learning a lot from the code. It would be
>> great if someone could do a walkthrough of common functionality in a
>> detailed way :)
>>
>


set_keyspace()

2010-06-06 Thread Benjamin Black
Can someone enlighten me as to the purpose of set_keyspace() and the
elimination of the keyspace args from calls?  I understand there was a
discussion of the issue before I joined the list several months ago.
For those with several or many keyspaces, as when using a
keyspace per customer, it is a huge step backwards.  The change will
require either a significant increase in the number of calls or a
complete redesign and implementation of connection management.
Neither is attractive.

Some insight into this decision would be appreciated.


b


Re: 3-node balanced system

2010-06-17 Thread Benjamin Black
in ruby:

def token(nodes); 1.upto(nodes) { |i| p((2**127 / nodes) * i) }; end

>> token(3)
56713727820156410577229101238628035242
113427455640312821154458202477256070484
170141183460469231731687303715884105726
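The same calculation in Python, for comparison: evenly spaced initial tokens over the RandomPartitioner's 2**127 token space, using integer division.

```python
def tokens(nodes):
    """Evenly spaced initial tokens for an n-node ring (2**127 token space)."""
    return [(2**127 // nodes) * i for i in range(1, nodes + 1)]

for t in tokens(3):
    print(t)
# 56713727820156410577229101238628035242
# 113427455640312821154458202477256070484
# 170141183460469231731687303715884105726
```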


On Thu, Jun 17, 2010 at 12:08 PM, Lev Stesin  wrote:
> Hi,
>
> What is the correct procedure to create a well-balanced cluster (in
> terms of key distribution)? From what I understand, whenever I add a
> new node it takes half the range from its neighbor. How can I make each
> node contain 1/3 of the keys in a 3-node cluster? Thanks.
>
> --
> Lev
>


Re: cassandra increment counters, Jira #1072

2010-08-12 Thread Benjamin Black
On Thu, Aug 12, 2010 at 10:23 AM, Kelvin Kakugawa  wrote:
>
> I think the underlying unanswered question is whether #1072 is a niche
> feature or whether it should be brought into trunk.
>

This should not be an unanswered question!  #1072 should be considered
essential, as it enables numerous use cases that currently require
bolting something like memcache or redis onto the side to handle
counters.

+1 on getting this into trunk ASAP.


b


Re: cassandra increment counters, Jira #1072

2010-08-13 Thread Benjamin Black
On Thu, Aug 12, 2010 at 8:54 PM, Jonathan Ellis  wrote:
> There are two concerns that give me pause.
>
> The first is that 1072 is tackling a use case that Cassandra already
> handles well: high volume of writes to a counter, with low volume
> reads.  (This can be done by inserting uuids into a counter row, and
> aggregating them either in the background or at read time or with some
> combination of these.  The counter rows can be sharded if necessary.)
>

This is simply not an acceptable alternative and can't be called
handling it "well".  It is equivalent to "make the users do it", which
could be said of almost anything.  The reasons #1072 is so valuable:

1) Does not require _any_ user action.
2) Does not change the EC-centric model of Cassandra.
3) Meets the requirements of many, major users who would otherwise
have to use another storage system.

> The second is that the approach in 1072 resembles an entirely separate
> system that happens to use part of Cassandra infrastructure -- the
> thrift API, the MessagingService, the sstable format -- but isn't
> really part of it.  ConsistencyLevel is not respected, and special
> cases abound to weld things in that don't fit, e.g. the AES/Streaming
> business.
>

Then let's find ways to make it as elegant as it can be.  Ultimately,
this functionality needs to be in Cassandra or users will simply
migrate someplace else for this extremely common use case.


b
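For reference, the workaround Jonathan describes (insert a UUID column per increment, aggregate at read time or in the background) looks roughly like this. The dict-of-dicts here is an in-memory stand-in for a counter row, not a real Cassandra client API:

```python
import uuid
from collections import defaultdict

# Stand-in for counter rows: in Cassandra this would be a row whose
# column names are UUIDs and whose values are the increment amounts.
counter_rows = defaultdict(dict)

def increment(row_key, amount=1):
    # Each increment is a brand-new column, so concurrent writers
    # never overwrite each other -- no read-before-write on the write path.
    counter_rows[row_key][uuid.uuid1()] = amount

def read_counter(row_key):
    # Aggregate at read time. A background job could periodically replace
    # the columns with a single rolled-up value, and rows can be sharded
    # if a single counter row grows too hot.
    return sum(counter_rows[row_key].values())

increment("page-views")
increment("page-views", 5)
print(read_counter("page-views"))  # -> 6
```

This is exactly the burden being debated: the aggregation and compaction of these columns falls on the client unless the database does it, which is what #1072 proposes.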


Re: cassandra increment counters, Jira #1072

2010-08-13 Thread Benjamin Black
On Fri, Aug 13, 2010 at 6:24 AM, Jonathan Ellis  wrote:
>>
>> This is simply not an acceptable alternative and just can't be called
>> handling it "well".
>
> What part is it handling poorly, at a technical level?  This is almost
> exactly what 1072 does internally -- we are concerned here with the
> high write, low read volume case.
>

Requiring clients to manage the counter rows directly in order to
periodically compress or segment them.  Yes, you can emulate the
behavior; no, that is not handling it well.

>>  It is equivalent to "make the users do it", which
>> is the case for almost anything.
>
> I strongly feel we should be in the business of providing building
> blocks, not special cases on top of that.  (But see below, I *do*
> think the 580 version vectors is the kind of building block we want!)
>

I agree, 580 is really valuable and should be in.  The problem for
high write rate, distributed counters is the requirement of read
before write inherent in such vector-based approaches.  Am I missing
some aspect of 580 that precludes that?

>>  The reasons #1072 is so valuable:
>>
>> 1) Does not require _any_ user action.
>
> This can be addressed at the library level.  Just as our first stab at
> ZK integration was a rather clunky patch; "cages" is better.
>

Certainly, but it would be hard to argue (and I am not) that the
tightly synchronized behavior of ZK is a good match for Cassandra
(mixing in Paxos could make for some neat options, but that's another
debate...).

>> 2) Does not change the EC-centric model of Cassandra.
>
> It does, though.  1072 is *not* a version vector-based approach --
> that would be 580.  Read the 1072 design doc, if you haven't.  (Thanks
> to Kelvin for writing that up!)
>

Nor is Cassandra right now.  I know 1072 isn't vector based, and I
think that is in its favor _for this application_.

> I'm referring in particular to reads requiring CL.ALL.  (My
> understanding is that in the previous design, a "master" replica was
> chosen and was always written to first.)  Both of these break "the
> EC-centric model" and that is precisely the objection I made when I
> said "ConsistencyLevel is not respected."  I don't think this is
> fixable in the 1072 approach.  I would be thrilled to be wrong.
>

It is EC in that the total for a counter is unknown until resolved on
read.  Yes, it does not respect CL, but since it can only be used in 1
way, I don't see that as a disadvantage.

>>> The second is that the approach in 1072 resembles an entirely separate
>>> system that happens to use part of Cassandra infrastructure -- the
>>> thrift API, the MessagingService, the sstable format -- but isn't
>>> really part of it.  ConsistencyLevel is not respected, and special
>>> cases abound to weld things in that don't fit, e.g. the AES/Streaming
>>> business.
>>
>> Then let's find ways to make it as elegant as it can be.  Ultimately,
>> this functionality needs to be in Cassandra or users will simply
>> migrate someplace else for this extremely common use case.
>
> This is what I've been pushing for.  The version vector approach to
> counting (i.e. 580 as opposed to 1072) is exactly the more elegant,
> EC-centric approach that addresses a case that we *don't* currently
> handle well (counters with a higher read volume than 1072).
>

Perhaps I missed something: does counting with 580 require a read before
each counter update (local to the node, not a client read)?


b
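The read-before-write concern can be seen in a minimal version-vector-style counter sketch (illustrative only, not the 580 patch itself): to apply an increment, the node must first read its current entry before writing the new one back.

```python
def increment(vector, node_id, amount=1):
    # One entry per node; entries merge by taking per-node sums.
    # Note the read of the current entry before the write -- this is
    # the read-before-write step at issue in the thread.
    current = vector.get(node_id, 0)
    vector[node_id] = current + amount
    return vector

def value(vector):
    # The counter's value is resolved by summing all per-node entries.
    return sum(vector.values())

v = {}
increment(v, "node-a")
increment(v, "node-a", 4)
increment(v, "node-b", 2)
print(value(v))  # -> 7
```

For a low write rate this local read is cheap; for a high write rate it is the bottleneck being discussed.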


Re: Locking in cassandra

2010-08-16 Thread Benjamin Black
This is the locking implementation:

http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/NotImplementedException.html

And you might benefit from reading these:

http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency


b

On Mon, Aug 16, 2010 at 6:07 AM, Maifi Khan  wrote:
> Hi
> How is the locking implemented in cassandra? Say, I have 10 nodes and
> I want to write to 6 nodes which is (n+1)/2.
> Will it lock 6 nodes first and then start writing? Or will it write
> one by one and see if it can write to 6 nodes.
> How is this implemented? What package does this locking?
> Thanks in advance.
>
> thanks
>
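To make the no-locking point concrete: the coordinator sends the write to every replica and simply waits for enough acknowledgements to satisfy the consistency level, with a quorum being floor(N/2) + 1. A sketch of that idea, assuming made-up names; this is not the actual Cassandra code path:

```python
class Replica:
    """Stand-in for a replica node; acks a write iff it is reachable."""
    def __init__(self, up=True):
        self.up = up

    def apply_write(self):
        return self.up

def quorum(n):
    # Smallest majority of N replicas: floor(N/2) + 1.
    return n // 2 + 1

def write(replicas, required_acks):
    # The write goes to every replica; nothing is locked anywhere.
    # Success means enough acks arrived, not that all replicas agreed first.
    acks = sum(1 for r in replicas if r.apply_write())
    return acks >= required_acks

replicas = [Replica(up=True)] * 6 + [Replica(up=False)] * 4
print(quorum(10))                   # -> 6
print(write(replicas, quorum(10)))  # -> True
```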


Re: Order preserving partitioning strategy

2010-08-22 Thread Benjamin Black
https://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/dht/OrderPreservingPartitioner.java

On Sun, Aug 22, 2010 at 10:46 AM, Hien. To Trong  wrote:
> Hi,
> I am developing a system with some features like cassandra.
> I want to add order preserving partitioning strategy, but I don't know how to 
> implement it.
>
> In cassandra paper - Cassandra - A Decentralized Structured Storage System
> "Cassandra partitions data across the cluster using consistent hashing but 
> uses an order pre-
> serving hash function (OPHF) to do so"
>
> I wonder:
>
> 1. Does Cassandra still use a hash function for OPP (the other strategy
> being the random partitioner)? If so, what is the algorithm of the OPHF? Is
> it a type of minimal perfect hash function (MPHF)?
>
> I have already read some papers about MPHF algorithms that preserve the
> order of hash values. However, in those papers the key space and the hash
> value space are equal in size, while in our application the hash value
> space would be much smaller than the key space (e.g. userid or usertaskid).
> How can I deal with that, or am I on the wrong track?
>
> 2. My system is simple. I have some servers and I use Berkeley DB to store
> key/value pairs (our data model is simple). Is the OPP strategy useful when
> I don't have a data model like Cassandra's (column families, for example)?
>
> Thanks so much.


Re: Order preserving partitioning strategy

2010-08-23 Thread Benjamin Black
Use OPP and prefix keys with a randomizing element when range queries
will not be required.  For keys that will be queried in ranges, don't
use such a prefix.
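A sketch of that scheme, assuming keys are plain strings (the bucket count and prefix format here are illustrative):

```python
import hashlib

def scatter_key(key, buckets=16):
    # Prefix with a hash-derived bucket so these keys spread evenly
    # around an OPP ring. Range queries over such keys are lost, since
    # lexical order no longer matches the original key order.
    bucket = int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets
    return f"{bucket:02d}:{key}"

def range_key(key):
    # No prefix: keys sort naturally on the ring, so range queries work,
    # at the cost of potential hot spots for skewed key distributions.
    return key

print(scatter_key("user123"))
```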

On Mon, Aug 23, 2010 at 8:36 PM, Hien. To Trong  wrote:
> Hi,
> OrderPreservingPartitioner enables efficient range queries but can cause
> unevenly distributed data.
> Does anyone have an idea for a HybridPartitioner that combines the
> advantages of RandomPartitioner and OPP, or at least trades off between
> them?


Re: handling client network connections in Cassandra

2010-09-01 Thread Benjamin Black
Think he means on the server side, yo.

On Wed, Sep 1, 2010 at 12:31 PM, Jonathan Ellis  wrote:
> You might want to build on top of something like Hector that handles
> the low level pooling, failover, etc. already instead of raw Thrift.
>
> On Wed, Sep 1, 2010 at 11:04 AM, Amol Deshpande
>  wrote:
>> As I've mentioned before, I'm looking at implementing a  protobuf
>> interface for clients to talk to Cassandra. Looking at the source, I
>> don't see a network thread/connection pool that I could easily piggyback
>> on. This is probably because both thrift and avro seem to have their own
>> internal connection management.
>>
>> Any opinions on apache MINA ?
>>
>> Thanks,
>> -amol
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: dropped messages

2010-09-24 Thread Benjamin Black
Complete information, including everything in tpstats, is available
for your monitoring systems via JMX.  For production clusters, it is
essential you at least collect the JMX stats, if not alarm on various
problems (such as backed up stages).


b

On Wed, Sep 22, 2010 at 6:47 AM, Carl Bruecken
 wrote:
>  On 9/22/10 9:37 AM, Jonathan Ellis wrote:
>>
>> it's easy to tell from tpstats which stage(s) are overloaded
>>
>> On Wed, Sep 22, 2010 at 8:29 AM, Carl Bruecken
>>   wrote:
>>>
>>>  With the current implementation, it's impossible to tell from the logs
>>> which message types (verbs) were dropped.  I read this was changed to
>>> avoid log spamming, but I think the behavior should be configurable:
>>> either aggregate counts of dropped messages, or log individual
>>> occurrences with the message verb.
>>>
>>> One suggestion is to pass the message into
>>> MessagingService.incrementDroppedMessages and have a configuration item
>>> or system property indicate the behavior.
>>>
>>
>>
> It's generally transient/bursty.  By the time I get around to checking
> tpstats, the active/pending counts are all back to 0 and I have no record of
> what occurred.
>