Re: how RandomPartitioner calculate tokens

2013-01-30 Thread Manu Zhang
On Wed 30 Jan 2013 05:47:59 PM CST, Sylvain Lebresne wrote: I'll admit that this part of the DataStax documentation is a bit confusing (and I'll reach out to the doc writers to make sure this is improved). The partitioner (be it RandomPartitioner, Murmur3Partitioner or OrderPreservingP

Re: how RandomPartitioner calculate tokens

2013-01-30 Thread Sylvain Lebresne
I'll admit that this part of the DataStax documentation is a bit confusing (and I'll reach out to the doc writers to make sure this is improved). The partitioner (be it RandomPartitioner, Murmur3Partitioner or OrderPreservingPartitioner) is pretty much only a hash function that defi

how RandomPartitioner calculate tokens

2013-01-30 Thread Manu Zhang
ple data center deployments, tokens are calculated per data center so that the hash range is evenly divided for the nodes in each data center." *This is understandable, but when I go to the getToken method of RandomPartitioner, I can't find any datacenter-aware token calculation*

Re: RandomPartitioner to Murmur3Partitioner

2013-01-03 Thread Edward Capriolo
is value size in bytes) 4 times (to make it hot) RandomPartitioner: average op rate is 845. Murmur3Partitioner: average op rate is 721." Then later: "I have removed ThreadLocal declaration from the M3P (and cleaned whitespace errors) which was the bottleneck, after re-running tests with

Re: RandomPartitioner to Murmur3Partitioner

2013-01-03 Thread Sylvain Lebresne
On Thu, Jan 3, 2013 at 10:21 AM, Alain RODRIGUEZ wrote: > > Does this mean that there is absolutely no way to switch to the new > partitioner for people that are already using Cassandra? > Yes, that is what this means. -- Sylvain

Re: RandomPartitioner and the token limits

2012-10-08 Thread aaron morton
ading the wiki of operations > (http://wiki.apache.org/cassandra/Operations) I noticed something > strange. When using RandomPartitioner, tokens are integers in the > range [0,2**127] (both limits included) but keys are converted into > this range using MD5. MD5 has 128 bits, so, tokens s

RandomPartitioner and the token limits

2012-10-03 Thread Carlos Pérez Miguel
Hello, Reading the wiki of operations (http://wiki.apache.org/cassandra/Operations) I noticed something strange. When using RandomPartitioner, tokens are integers in the range [0,2**127] (both limits included) but keys are converted into this range using MD5. MD5 has 128 bits, so, tokens should
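
The arithmetic behind this question can be sketched in a few lines. This is a minimal Python sketch, not Cassandra's code: it assumes the token is the absolute value of the 128-bit MD5 digest read as a signed big-endian integer, which would explain a range of [0, 2**127] with both limits included. The helper name `random_partitioner_token` is hypothetical.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Hypothetical helper: map a row key to a token in [0, 2**127].

    Assumption: the 16-byte MD5 digest is read as a signed 128-bit
    big-endian integer and its absolute value is used as the token.
    """
    digest = hashlib.md5(key).digest()
    signed = int.from_bytes(digest, byteorder="big", signed=True)
    # A signed 128-bit value lies in [-2**127, 2**127 - 1]; taking the
    # absolute value folds it into [0, 2**127].
    return abs(signed)


if __name__ == "__main__":
    for key in (b"alice", b"bob", b"carol"):
        print(key, random_partitioner_token(key))
```

Under that assumption the sign bit is discarded, so the 128-bit digest yields 127 usable bits plus the single extra value 2**127.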

Re: Effect of rangequeries with RandomPartitioner

2012-07-10 Thread aaron morton
9/07/2012, at 7:24 PM, prasenjit mukherjee wrote: > Thanks for the response. Further questions inline.. > > On Mon, Jul 9, 2012 at 11:50 AM, samal wrote: >>> 1. With RandomPartitioner, on a given node, are the keys sorted by >>> their hash_values or original/unhashe

Re: Effect of rangequeries with RandomPartitioner

2012-07-09 Thread prasenjit mukherjee
Thanks for the response. Further questions inline.. On Mon, Jul 9, 2012 at 11:50 AM, samal wrote: >> 1. With RandomPartitioner, on a given node, are the keys sorted by >> their hash_values or original/unhashed keys ? > > hash value, 1. Based on the second answer in http:/

Re: Effect of rangequeries with RandomPartitioner

2012-07-08 Thread samal
inline resp. On Mon, Jul 9, 2012 at 10:18 AM, prasenjit mukherjee wrote: > Thanks Aaron for your response. Some follow up > questions/assumptions/clarifications : > > 1. With RandomPartitioner, on a given node, are the keys sorted by > their hash_values or original/unhashed

Re: Effect of rangequeries with RandomPartitioner

2012-07-08 Thread prasenjit mukherjee
Thanks Aaron for your response. Some follow up questions/assumptions/clarifications : 1. With RandomPartitioner, on a given node, are the keys sorted by their hash_values or original/unhashed keys ? 2. With RandomPartitioner, on a given node, are the columns (for a given key) always sorted by

Re: Effect of rangequeries with RandomPartitioner

2012-07-08 Thread aaron morton
for background http://wiki.apache.org/cassandra/FAQ#range_rp It maps the start key to a token, and then scans X rows from there on CL number of nodes. Rows are stored in token order. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/07/20
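
The behaviour described in this reply (map the start key to a token, then scan a fixed number of rows in token order) can be illustrated with a toy single-node model. This is a sketch under stated assumptions, not Cassandra's implementation: `token` assumes an MD5-based token as discussed elsewhere in this archive, `range_scan` is a hypothetical helper, and the scan does not wrap around the ring or involve replicas.

```python
import bisect
import hashlib


def token(key: bytes) -> int:
    # Assumption: token = absolute value of the MD5 digest read as a
    # signed 128-bit big-endian integer.
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))


def range_scan(stored_keys, start_key: bytes, count: int):
    """Hypothetical helper: return up to `count` keys in token order,
    starting at the position of `start_key`'s token."""
    # Rows are kept in token order, not in key order.
    ordered = sorted(stored_keys, key=lambda k: (token(k), k))
    tokens = [token(k) for k in ordered]
    # Find the first stored row whose token is >= the start token.
    start = bisect.bisect_left(tokens, token(start_key))
    return ordered[start:start + count]


if __name__ == "__main__":
    keys = [f"user{i}".encode() for i in range(10)]
    print(range_scan(keys, b"user3", 4))
```

The rows returned after the start key are its neighbours by hash, so they have no meaningful relationship to the start key in key order.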

Re: Effect of RangeQuery with RandomPartitioner

2012-07-07 Thread Edward Capriolo
On Sat, Jul 7, 2012 at 11:17 AM, prasenjit mukherjee wrote: > Have 2 questions : > > 1. In RP on a given node, are the rows ordered by hash(key) or key ? > If the rows on a node are ordered by hash(key) then essentially it has > to be implemented by a full-scan on that node. > > 2. In RP, How does

Re: Effect of RangeQuery with RandomPartitioner

2012-07-07 Thread prasenjit mukherjee
Have 2 questions : 1. In RP on a given node, are the rows ordered by hash(key) or key ? If the rows on a node are ordered by hash(key) then essentially it has to be implemented by a full-scan on that node. 2. In RP, How does a cassandra node route a client's range-query request ? The range is dis

Re: Effect of RangeQuery with RandomPartitioner

2012-07-07 Thread Edward Capriolo
On Sat, Jul 7, 2012 at 9:26 AM, prasenjit mukherjee wrote: > Wondering how a rangequery request is handled if RP is used. Will the > receiving node do a fan-out to all the nodes in the ring or it will > just execute the rangequery on its own local partition ? > > -Prasenjit With RP the data is s

Effect of RangeQuery with RandomPartitioner

2012-07-07 Thread prasenjit mukherjee
Wondering how a rangequery request is handled if RP is used. Will the receiving node do a fan-out to all the nodes in the ring or it will just execute the rangequery on its own local partition ? -Prasenjit

Effect of rangequeries with RandomPartitioner

2012-07-07 Thread prasenjit mukherjee
Wondering how a rangequery request is handled if RP is used. Will the receiving node do a fan-out to all the nodes in the ring or it will just execute the rangequery on its own local partition ?

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-25 Thread Safdar Kureishy
Got it. Thanks Jake. Will do. Safdar On Mon, Jun 25, 2012 at 4:16 PM, Jake Luciani wrote: > Hi Safdar, > > Yes you should make it a multiple. The issue is each shard 'sticks' to a > given node but there is no way to guarantee 5 random keys will equally > distribute across 5 nodes. The idea

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-25 Thread Jake Luciani
Hi Safdar, Yes you should make it a multiple. The issue is each shard 'sticks' to a given node but there is no way to guarantee 5 random keys will equally distribute across 5 nodes. The idea is eventually they will as you add more and more keys. So increasing shards at once can make that happe

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Safdar Kureishy
Hi Jake, Thanks. Yes, I forgot to mention also that I had raised the solandra.shards.at.once param from 4 to 5 (to match the # of nodes). Should I have raised it to 10 or 15 (multiple of 5)? I have added all the documents that I needed to the index now. It appears the distribution became more even

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Jake Luciani
Hi Safdar, If you want to get better utilization of the cluster raise the solandra.shards.at.once param in solandra.properties -Jake On Sun, Jun 24, 2012 at 11:00 AM, Safdar Kureishy wrote: > Hi, > > I've searched online but was unable to find any leads for the problem > below. This mailing

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Safdar Kureishy
Thanks. Oh, I forgot to mention that I'm using cassandra 1.1.0-beta2...in case that question comes up. Hoping someone can offer some more feedback on the likelihood of this behavior ... Thanks again, Safdar On Jun 24, 2012 9:22 PM, "Dave Brosius" wrote: > Well it sounds like this doesn't apply t

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Dave Brosius
Well it sounds like this doesn't apply to you. If you had set up your column family in CQL as PRIMARY KEY (domain_name, path) or something like that and were looking at lots and lots of URL pages (domain_name + path), but from a very small number of domain_names, then the partitioner be

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Safdar Kureishy
Hi Dave, Would you mind elaborating a bit more on that, preferably with an example? AFAIK, Solandra uses the unique id of the Solr document as the input for calculating the md5 hash for shard/node assignment. In this case the ids are just millions of varied web URLs that do *not* adhere to any reg

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Dave Brosius
If i read what you are saying, you are _not_ using composite keys? That's one thing that could do it, if the first part of the composite key had a very very low cardinality. On 06/24/2012 11:00 AM, Safdar Kureishy wrote: Hi, I've searched online but was unable to find any leads for the proble

Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster

2012-06-24 Thread Safdar Kureishy
An additional detail is that the CPU utilization on those nodes is proportional to the load below, so machines 9.9.9.1 and 9.9.9.3 experience a fraction of CPU load as compared to the remaining 3 nodes. This might further point to the possibility that the keys are hashing minimally to the token ran

Re: Row iteration using RandomPartitioner

2012-04-02 Thread Jake Luciani
Correct. Random partitioner order is md5 token order. If you make no changes you will get the same order On Apr 2, 2012, at 7:53 AM, wrote: > Hi, > > Bit of a silly question, is row iteration using the RandomPartitioner > deterministic? I don't particularly care

Row iteration using RandomPartitioner

2012-04-02 Thread christopher-t.ng
Hi, Bit of a silly question, is row iteration using the RandomPartitioner deterministic? I don't particularly care what the order is relative to the row keys (obviously there isn't one, it's the RandomPartitioner), but if I run a full iteration over all rows in a CF twice, assumin

Re: List all keys with RandomPartitioner

2012-02-22 Thread R. Verlangen
__ > > From: Franc Carter > >To: user@cassandra.apache.org > >Sent: Wednesday, February 22, 2012 9:24 AM > >Subject: Re: List all keys with RandomPartitioner > > > > > >On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti > wrote: > > > >I need to iter

Re: List all keys with RandomPartitioner

2012-02-22 Thread Rafael Almeida
> > From: Franc Carter >To: user@cassandra.apache.org >Sent: Wednesday, February 22, 2012 9:24 AM >Subject: Re: List all keys with RandomPartitioner > > >On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti >wrote: > >I need to iter

Re: List all keys with RandomPartitioner

2012-02-22 Thread Flavio Baronti
On 2/22/2012 12:24 PM, Franc Carter wrote: On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti <f.baro...@list-group.com> wrote: I need to iterate over all the rows in a column family stored with RandomPartitioner. When I reach the end of a key slice, I need to find the

Re: List all keys with RandomPartitioner

2012-02-22 Thread Franc Carter
On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti wrote: > I need to iterate over all the rows in a column family stored with > RandomPartitioner. > When I reach the end of a key slice, I need to find the token of the last > key in order to ask for the next slice. > I saw in an old

Re: List all keys with RandomPartitioner

2012-02-22 Thread Henrik Schröder
eads/trunk /Henrik On Wed, Feb 22, 2012 at 10:47, Flavio Baronti wrote: > I need to iterate over all the rows in a column family stored with > RandomPartitioner. > When I reach the end of a key slice, I need to find the token of the last > key in order to ask for the next slice. > I saw

List all keys with RandomPartitioner

2012-02-22 Thread Flavio Baronti
I need to iterate over all the rows in a column family stored with RandomPartitioner. When I reach the end of a key slice, I need to find the token of the last key in order to ask for the next slice. I saw in an old email that the token for a specific key can be recovered through

Re: Unbalanced cluster with RandomPartitioner

2012-01-23 Thread aaron morton
Setting a token outside of the partitioner range sounds like a bug. It's mostly an issue with the RP, but I guess a custom partitioner may also want to validate tokens are within a range. Can you report it to https://issues.apache.org/jira/browse/CASSANDRA Thanks - Aaron Morto

Re: Unbalanced cluster with RandomPartitioner

2012-01-21 Thread Marcel Steinbach
I thought about our issue again and was thinking, maybe describeOwnership should take into account whether a token is outside the partitioner's maximum token range? To recap our problem: we had tokens that were apart by 12.5% of the token range 2**127, however, we had an offset on each token, w

Re: Unbalanced cluster with RandomPartitioner

2012-01-20 Thread Marcel Steinbach
Thanks for all the responses! I found our problem: Using the Random Partitioner, the key range is from 0..2**127. When we added nodes, we generated the keys and out of convenience, we added an offset to the tokens because the move was easier like that. However, we did not execute the modulo 2**1
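
The fix described here (evenly spaced tokens, shifted by an offset, then reduced modulo the ring size) can be sketched as follows. This is a minimal Python sketch assuming a ring size of 2**127 as stated in the thread; `RING_SIZE` and `balanced_tokens` are hypothetical names, and the node count and offset in the usage example are made up for illustration.

```python
from typing import List

RING_SIZE = 2**127  # token range for RandomPartitioner, per the thread


def balanced_tokens(node_count: int, offset: int = 0) -> List[int]:
    """Hypothetical helper: evenly spaced tokens, shifted by `offset`.

    The modulo keeps every shifted token inside the ring; skipping it
    is the mistake described in the message above.
    """
    return [
        (i * RING_SIZE // node_count + offset) % RING_SIZE
        for i in range(node_count)
    ]


if __name__ == "__main__":
    # Illustrative values only: 8 nodes, so neighbours sit 12.5% apart.
    for t in balanced_tokens(8, offset=RING_SIZE - 1):
        print(t)
```

Without the modulo, any token whose shifted value exceeds the ring size falls outside the partitioner's range, and the reported ownership no longer reflects the intended even split.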

Re: Unbalanced cluster with RandomPartitioner

2012-01-20 Thread Marcel Steinbach
On 19.01.2012, at 20:15, Narendra Sharma wrote: > I believe you need to move the nodes on the ring. What was the load on the > nodes before you added 5 new nodes? It's just that you are getting data in > certain token range more than others. With three nodes, it was also imbalanced. What I don't

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread Narendra Sharma
I believe you need to move the nodes on the ring. What was the load on the nodes before you added 5 new nodes? It's just that you are getting data in certain token range more than others. -Naren On Thu, Jan 19, 2012 at 3:22 AM, Marcel Steinbach wrote: > On 18.01.2012, at 02:19, Maki Watanabe wro

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread aaron morton
Load reported from nodetool ring is the live load, which means SSTables that the server has open and will read from during a request. This will include tombstones, expired and overwritten data. nodetool cfstats also includes "dead" load, which is sstables that are not in use but still on disk.

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread Marcel Steinbach
2012/1/19 aaron morton : > If you have performed any token moves the data will not be deleted until you > run nodetool cleanup. We did that after adding nodes to the cluster. And then, the cluster wasn't balanced either. Also, does the "Load" really account for "dead" data, or is it just live data?

Re: Unbalanced cluster with RandomPartitioner

2012-01-19 Thread Marcel Steinbach
On 18.01.2012, at 02:19, Maki Watanabe wrote: > Is there any significant difference in the number of sstables on each node? No, no significant difference there. Actually, node 8 is among those with more sstables but with the least load (20GB) On 17.01.2012, at 20:14, Jeremiah Jordan wrote: > Are yo

Re: Unbalanced cluster with RandomPartitioner

2012-01-18 Thread aaron morton
If you have performed any token moves the data will not be deleted until you run nodetool cleanup. To get a baseline I would run nodetool compact to do major compaction and purge any tombstones as others have said. Cheers - Aaron Morton Freelance Developer @aaronmorton http:

Re: Unbalanced cluster with RandomPartitioner

2012-01-17 Thread Maki Watanabe
Is there any significant difference in the number of sstables on each node? 2012/1/18 Marcel Steinbach : > We are running regular repairs, so I don't think that's the problem. > And the data dir sizes match approx. the load from the nodetool. > Thanks for the advice, though. > > Our keys are digits

Re: Unbalanced cluster with RandomPartitioner

2012-01-17 Thread Jeremiah Jordan
Are you deleting data or using TTL's? Expired/deleted data won't go away until the sstable holding it is compacted. So if compaction has happened on some nodes, but not on others, you will see this. The disparity is pretty big 400Gb to 20GB, so this probably isn't the issue, but with our dat

Re: Unbalanced cluster with RandomPartitioner

2012-01-17 Thread Marcel Steinbach
We are running regular repairs, so I don't think that's the problem. And the data dir sizes match approx. the load from the nodetool. Thanks for the advice, though. Our keys are digits only, and all contain a few zeros at the same offsets. I'm not that familiar with the md5 algorithm, but I doub

Re: Unbalanced cluster with RandomPartitioner

2012-01-17 Thread Mohit Anchlia
Have you tried running repair first on each node? Also, verify using df -h on the data dirs On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach wrote: > Hi, > > we're using RP and have each node assigned the same amount of the token > space. The cluster looks like that: > > Address         Status

Unbalanced cluster with RandomPartitioner

2012-01-17 Thread Marcel Steinbach
Hi, we're using RP and have each node assigned the same amount of the token space. The cluster looks like that: Address Status State Load Owns Token 205648943402372032879374446

randompartitioner cluster unbalanced

2011-08-31 Thread David Hawthorne
a brand new cluster we just brought up and started loading data into a few days ago. It's using the RandomPartitioner, RF=3 on everything, and we're doing QUORUM writes. All keyspaces and CFs are for counter super columns. All keys are moderately sized ascii strings with good variati

Re: RandomPartitioner

2011-02-14 Thread Matthew Dennis
xt: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RandomPartitioner-tp6025203p6025659.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com. >

Re: RandomPartitioner

2011-02-14 Thread mcasandra
ill go in Node 2? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RandomPartitioner-tp6025203p6025659.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.

Re: RandomPartitioner

2011-02-14 Thread mcasandra
ove or change token to 0 if I started with InitialToken as default (unset). -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RandomPartitioner-tp6025203p6025380.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.

Re: RandomPartitioner

2011-02-14 Thread Dan Kuebrich
ned to nodes in circular fashion, for eg: hash ABC to FGH goes to node A and then hash IJKLM-OPQR goes to node B? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RandomPartitioner-tp6025203p6025203.html Sent from the cassandra-u...@incubator.apache.o

RandomPartitioner

2011-02-14 Thread mcasandra
icked randomly and assigned to nodes in circular fashion, for eg: hash ABC to FGH goes to node A and then hash IJKLM-OPQR goes to node B? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RandomPartitioner-tp6025203p6025203.html Sent from the cassan

Re: Can I retrieve specific key range from a table in RandomPartitioner?

2010-10-01 Thread ilndinesh
I came across the same problem. I set the end key to the same value as the start key, and it worked. (Kind of a temp fix...) -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-I-retrieve-specific-key-range-from-a-table-in-RandomPartitioner-tp5415347p5591681.html

Re: Can I retrieve specific key range from a table in RandomPartitioner?

2010-08-12 Thread Aaron Morton
at 4:44 PM, ChingShen <chingshenc...@gmail.com> wrote: Hi all, Can I retrieve specific key range from a table in RandomPartitioner? Because I always got below exception: Exception in thread "main" InvalidRequestException(why:start key's md5 sorts after end key's md5.

Re: Can I retrieve specific key range from a table in RandomPartitioner?

2010-08-12 Thread ChingShen
[1000]); List results = client.get_range_slices(keyspace, parent, predicate, k, ConsistencyLevel.ONE); On Thu, Aug 12, 2010 at 4:44 PM, ChingShen wrote: > Hi all, > >Can I retrieve specific key range from a table in RandomPartitioner? > Because I always got below exception: > Exce

Can I retrieve specific key range from a table in RandomPartitioner?

2010-08-12 Thread ChingShen
Hi all, Can I retrieve specific key range from a table in RandomPartitioner? Because I always got below exception: Exception in thread "main" InvalidRequestException(why:start key's md5 sorts after end key's md5. this is not allowed; you probably should not specify e

Re: Trying To Understand get_range_slices Results When Using RandomPartitioner

2010-04-26 Thread Schubert Zhang
RandomPartitioner is for row-keys. #1 no #2 yes #3 yes On Sat, Apr 24, 2010 at 4:33 AM, Larry Root wrote: > I am trying to better understand how using the RandomPartitioner will affect > my ability to select ranges of keys. Consider my simple example where we > have many online gam

Trying To Understand get_range_slices Results When Using RandomPartitioner

2010-04-23 Thread Larry Root
I am trying to better understand how using the RandomPartitioner will affect my ability to select ranges of keys. Consider my simple example where we have many online games across different game genres (GameType). These games need to store data for each one of their users. With that in mind consider

Re: RandomPartitioner doubts

2010-04-21 Thread Jonathan Ellis
library. I did some tests on my one-node > development installation about using get_range method to scan the whole CF. > > What I want to prove is if a CF with RandomPartitioner can be used with > get_range getting a fixed number of keys at a time, until all are requested. > I kno

RandomPartitioner doubts

2010-04-21 Thread Lucas Di Pentima
Hello, I'm using Cassandra 0.6.1 and ruby's library. I did some tests on my one-node development installation about using get_range method to scan the whole CF. What I want to prove is if a CF with RandomPartitioner can be used with get_range getting a fixed number of keys at a time,