Load balancing - Order preserving partition - New Keys prob.

2010-09-16 Thread Hien. To Trong
Hi all,
Thank you for your comments.
I have read some papers on load balancing in P2P systems, some of which 
support range queries.
In the process, I found another problem: "NEW KEYS".

Karger - Simple efficient load balancing algorithms for p2p systems. (supports 
OPP)
John Byers - Simple load balancing for distributed hash tables.
Ananth Rao - Load balancing in structured P2P systems.
I think the first paper is the best one for background.

Prefix Hash Tree: An Indexing Data Structure over Distributed Hash Tables
and Range queries over DHTs
These papers use PHT to support range queries.

James Aspnes - Skip Graphs (the most recent, as far as I know)
and Distributed Balanced Tables: Not Making a Hash of It All.

All of these papers address load balancing and range queries with schemes or 
strategies based only on "load = number of keys on each node". They do not 
consider new keys or highly accessed keys. 
HOWEVER, in practice, "the most recent keys are likely to be accessed more 
frequently".
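
To make the distinction concrete, here is a minimal sketch that compares load 
measured as key count with load measured as accesses. The function name, node 
names, keys, and access counts are all made up for illustration and come from 
none of the papers above.

```python
def node_loads(placement, access_counts):
    """Return (key_load, access_load), each a dict of node -> load.

    placement: dict mapping key -> node that stores it (hypothetical input).
    access_counts: dict mapping key -> number of reads seen (hypothetical input).
    """
    key_load, access_load = {}, {}
    for key, node in placement.items():
        # Load as the papers define it: one unit per stored key.
        key_load[node] = key_load.get(node, 0) + 1
        # Load weighted by how often each key is actually read.
        access_load[node] = access_load.get(node, 0) + access_counts.get(key, 0)
    return key_load, access_load


# Two nodes hold the same number of keys, but N2 holds the new, hot keys.
placement = {"old1": "N1", "old2": "N1", "new1": "N2", "new2": "N2"}
access_counts = {"old1": 1, "old2": 1, "new1": 50, "new2": 40}
key_load, access_load = node_loads(placement, access_counts)
```

With these invented numbers the key counts look balanced while the access 
load does not, which is the gap described above.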

Suppose we have 400,000 "NEW KEYS" from the 2 most recent days (which are 
likely to be accessed more frequently). A possible scheme: these new keys are 
partitioned uniformly across a few successive nodes. For example, with 10 
nodes (N1...N10), keys from day 1 go to nodes N1 and N2, keys from day 2 go to 
nodes N3 and N4, and so on. The problem is that the number of keys and the 
number of nodes change from day to day (they are not fixed), so the number of 
nodes used to store each day's keys may also change (1 or 3 nodes, for 
example).
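
The scheme could be sketched roughly as follows. This is only an illustration 
under assumptions: `keys_per_node` (a target number of keys per node per day) 
and the function name are hypothetical, and nodes are numbered from 0 and 
reused in ring order once they run out.

```python
import math


def assign_days_to_nodes(keys_per_day, num_nodes, keys_per_node):
    """Map each day to a contiguous run of node indices (0-based).

    keys_per_day: list of key counts, one per day, oldest first.
    num_nodes: number of nodes currently in the ring.
    keys_per_node: assumed target number of keys one node takes per day.
    """
    assignment = []
    next_node = 0
    for count in keys_per_day:
        # At least one node per day, more when the day has more keys,
        # but never more nodes than exist.
        width = min(num_nodes, max(1, math.ceil(count / keys_per_node)))
        nodes = [(next_node + i) % num_nodes for i in range(width)]
        assignment.append(nodes)
        # The next day starts right after this day's last node, wrapping around.
        next_node = (next_node + width) % num_nodes
    return assignment


# Two days of 200,000 keys take 2 nodes each; a lighter day takes 1 node
# and a heavier day takes 3.
plan = assign_days_to_nodes([200000, 200000, 100000, 300000], 10, 100000)
```

The sketch shows the varying width per day, but it does not move existing 
keys when `num_nodes` changes, which is the hard part of the question.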

Does anyone have ideas, or know of papers, that deal with this problem (new 
keys and highly accessed keys)?

Thanks a lot.

Re: when will cassandra 0.7 be released?

2010-09-16 Thread Jonathan Ellis
I don't recommend using the betas for anything but testing.

We should see 0.7 final in October.

On Wed, Sep 15, 2010 at 10:27 PM, Chen Xinli wrote:
> Hi,
>
> We are going to use Cassandra in our production environment and want to use
> the feature for defining keyspaces on the fly.
> When will version 0.7 be released? Or is the beta OK to use?
>
> Thanks.
>
> --
> Best Regards,
> Chen Xinli
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Load balancing - Order preserving partition - New Keys prob.

2010-09-16 Thread Jonathan Ellis
If new keys don't follow roughly the same distribution as the ones you
already have, then yes, you're going to be load balancing a lot.
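
A small sketch of why this happens with an order-preserving layout. It 
assumes each node owns the key range ending at its token and that keys above 
the last token wrap to the first node. The tokens, keys, and function names 
are invented for illustration and are not Cassandra's code.

```python
import bisect
from collections import Counter


def owner(tokens, key):
    """Index of the node owning `key`; `tokens` must be sorted.

    Node i owns keys in (tokens[i-1], tokens[i]]; keys greater than the
    last token wrap around to node 0.
    """
    return bisect.bisect_left(tokens, key) % len(tokens)


def load_per_node(tokens, keys):
    """Number of keys each node stores, in token order."""
    counts = Counter(owner(tokens, k) for k in keys)
    return [counts.get(i, 0) for i in range(len(tokens))]


tokens = ["g", "n", "t", "z"]

# New keys that follow the existing distribution spread across the nodes.
spread = load_per_node(tokens, ["apple", "kiwi", "pear", "violet"])

# New keys with a date prefix all sort into a single node's range.
skewed = load_per_node(tokens, [f"20100916-{i:04d}" for i in range(100)])
```

Here `spread` puts one key on each node, while `skewed` puts all 100 keys on 
the first node, so tokens would have to move to rebalance.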



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: when will cassandra 0.7 be released?

2010-09-16 Thread Chen Xinli
Thanks, Jonathan. I can test with the beta, then upgrade to the final version
in October.



-- 
Best Regards,
Chen Xinli