Re: Cassandra on top of B-Tree

2010-03-28 Thread David Strauss
On 2010-03-28 21:11, Primal Wijesekera wrote:
> I am a master's student in the UBC CS department. One of my lab mates and I are
> trying to implement Cassandra on top of a B-tree implementation rather
> than the DHT approach that we have right now. We hope to benchmark the
> two approaches and really want to see which one scales better.
> 
> In the lab we already have a project (which is not yet completed) on
> developing a distributed B-tree on top of a Sinfonia-like system. We would
> try to integrate the Cassandra source with the B-tree while preserving the rest
> of the Cassandra logic.
> 
> Since this experiment is still at a very early stage, we thought we would
> get your expert thoughts and comments on it. We were also wondering
> whether this could be a potential GSoC project.

I'm sorry, but it doesn't make much sense to run Cassandra on top of a
B-tree. Updating and reorganizing B-tree pages in place on writes goes
against one of Cassandra's primary design goals: streaming writes to
disk as efficiently as possible.

http://wiki.apache.org/cassandra/FAQ#reads_slower_writes
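
To make the write-path argument concrete, here is a toy, in-memory sketch of a log-structured store in the spirit of Cassandra's commit log, memtable and SSTable design. The class and method names are hypothetical, there is no real file I/O, and compaction, tombstones and timestamps are omitted. What it shows: a write is an append plus a memory update, a flush emits a new sorted run instead of rewriting existing pages, and a read may have to check several places.

```python
from __future__ import annotations

import bisect


class ToyLSMStore:
    """Toy, in-memory log-structured store (hypothetical; not Cassandra's code)."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable_limit = memtable_limit
        # Stand-in for an append-only commit log file: writes only ever append.
        self.commit_log: list[tuple[str, str]] = []
        # Recent writes, held in memory and mutable.
        self.memtable: dict[str, str] = {}
        # Flushed runs ("SSTables"), oldest first; each is sorted by key and never modified.
        self.sstables: list[list[tuple[str, str]]] = []

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # sequential append for durability
        self.memtable[key] = value            # in-memory update, no disk seek
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # One sequential write of a new sorted run; existing runs are left untouched,
        # unlike a B-tree, which updates (and occasionally splits) pages in place.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # the flushed entries no longer need replaying

    def read(self, key: str) -> str | None:
        # A read may consult the memtable and several runs, newest first,
        # which is why reads tend to cost more than writes in this design.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            # (key,) sorts before any (key, value), so this finds the first entry >= key.
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

For example, with `memtable_limit=2`, writing `a` and `b` produces one sorted run; a later overwrite of `a` lands in the memtable (and eventually a newer run), and the older run is never rewritten.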

Additionally, there are *so many* other systems that already use
B-trees. Why add one to Cassandra?

You may want to look at Project Voldemort, which can already distribute
data across servers similarly to Cassandra but (optionally) with
B-tree-based storage on each box. MongoDB also supports sharded data
with B-tree-based indexes. Finally, HBase is a distributed B-tree.

-- 
David Strauss
   | da...@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]



signature.asc
Description: OpenPGP digital signature


Re: writing and reading data

2010-04-04 Thread David Strauss
On 2010-04-05 02:23, S Ahmed wrote:
> For starters, I want to learn how keys are read and written from disk.

See "read" and "write":
http://wiki.apache.org/cassandra/ArchitectureOverview



Re: bloom filters

2010-04-07 Thread David Strauss
On 2010-04-07 20:34, Peter Schüller wrote:
> Re-sizing a bloom filter implies re-creating it from scratch.

Not necessarily. Depending on your hash, you can sometimes shrink
without regeneration (and without other penalties). It's also sometimes
possible to enlarge the bloom filter without regeneration at the cost of
increased false positives you wouldn't have if you regenerated.

Here's a trivial counterexample to your statement, starting with a bloom
filter with two positions: odd and even.

I can shrink it down to a single bit with an "or" operation. I can
enlarge it to modulo 10 by setting all five odd bits if my current odd
bit is set, and likewise all five even bits if my current even bit is set.

In the case of the shrink operation, I'm not regenerating or losing any
accuracy. In the case of the enlarge operation, I'll get considerably
more false positives than I would on regeneration, but my operation is
still correct.

Even with complex or cryptographic hashes, a bloom filter that uses,
say, the first X bits of the hash might be expandable or shrinkable
without regeneration.
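
The odd/even counterexample generalizes. A minimal sketch, assuming each bit position is a hash value taken modulo the filter size m (for m = 2^X, that is the low-order X bits of the hash): shrinking by a factor that divides m is an OR-fold, and enlarging to a multiple of m copies each old bit to every position congruent to it. `ModBloomFilter`, `hash_values`, `shrink` and `enlarge` are hypothetical names, and salted SHA-256 is only an illustrative hash; this is not Cassandra's bloom filter implementation.

```python
from __future__ import annotations

import hashlib


def hash_values(item: str, k: int) -> list[int]:
    """k 64-bit hash values for item, from salted SHA-256 (an illustrative choice)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big")
        for i in range(k)
    ]


class ModBloomFilter:
    """Bloom filter whose bit positions are hash % m (hypothetical helper class)."""

    def __init__(self, m: int, k: int = 3, bits: list[bool] | None = None) -> None:
        self.m = m
        self.k = k
        self.bits = bits if bits is not None else [False] * m

    def add(self, item: str) -> None:
        for h in hash_values(item, self.k):
            self.bits[h % self.m] = True

    def might_contain(self, item: str) -> bool:
        return all(self.bits[h % self.m] for h in hash_values(item, self.k))

    def shrink(self, factor: int) -> ModBloomFilter:
        # Needs m divisible by factor. Because (h % m) % new_m == h % new_m,
        # OR-ing the old bits that land on the same new position gives exactly
        # the filter a full rebuild would produce: no regeneration, no accuracy loss.
        if self.m % factor:
            raise ValueError("factor must divide m")
        new_m = self.m // factor
        new_bits = [False] * new_m
        for pos, bit in enumerate(self.bits):
            if bit:
                new_bits[pos % new_m] = True
        return ModBloomFilter(new_m, self.k, new_bits)

    def enlarge(self, factor: int) -> ModBloomFilter:
        # Because (h % new_m) % m == h % m, an item's new position is one of the
        # `factor` positions congruent to its old one. Copying each old bit to all
        # of them never produces a false negative, but it sets more bits (and so
        # gives more false positives) than rebuilding from the original keys.
        new_m = self.m * factor
        return ModBloomFilter(new_m, self.k, [self.bits[pos % self.m] for pos in range(new_m)])
```

With m = 2 and k = 1 this reduces to the odd/even example: `shrink(2)` ORs the two bits into one, and `enlarge(5)` yields a ten-position filter in which every position with the same parity as a set bit is set.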



Re: Announcing Riptano professional Cassandra support and services

2010-04-26 Thread David Strauss
On 2010-04-26 19:58, Jonathan Ellis wrote:
> Short version: Matt Pfeil and I have founded http://riptano.com to
> provide production Cassandra support, training, and professional
> services.  Yes, we're hiring.
> 
> Long version: 
> http://spyced.blogspot.com/2010/04/and-now-for-something-completely.html
> 
> We're happy to answer questions on- or off-list.

Does this mean you're no longer with Rackspace?



Re: Generated code?

2010-06-14 Thread David Strauss
On 2010-06-15 03:58, Masood Mortazavi wrote:
> Hi,
> 
> My assumption is that what one finds in
> 
>   interface/thrift/gen-java
> 
> is actually generated code.
> 
> If so, why is it checked in as source under SVN?
> 
> (Certainly, the avro generated code doesn't seem to be checked in.)
> 
> Regards,
> Masood
> 

It simplifies the end user's build process. If the code isn't in
Subversion, then you'd need to get all the Thrift dependencies and do
the generation yourself just to build Cassandra. Sure, there are other
methods that don't involve checking into Subversion, but they're more
complex.



Re: Cassandra Hack-a-thon in Austin

2010-08-25 Thread David Strauss
On Wed, 2010-08-25 at 09:58 -0500, Eric Evans wrote:
> Some of us from Rackspace are going offsite and heads-down for a day of
> hacking to see how much of the Avro support in Cassandra we can get
> knocked out.  We'll be at Austin Cowork
> (http://www.coworkaustin.com/about.php) on September 1 from 8am to 6pm,
> and we have space enough for 4 more if anyone is interested.
> 
> If you're in the Austin area and would like to join us, shoot me an
> email and give me an idea which area(s) you think you might like to work
> on.  The bigger areas I see are:
> 
> * RPC method implementations
> * Functional tests
> * Client support

I'd like to stop by and work on client support for PHP, Python, or C++.
