On Wed 30 Jan 2013 05:47:59 PM CST, Sylvain Lebresne wrote:
I'll admit that this part of the DataStax documentation is a bit confusing
(and I'll reach out to the doc writers to make sure it is improved).
The partitioner (be it RandomPartitioner, Murmur3Partitioner or
OrderPreservingPartitioner) is pretty much only a hash function that defi
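A minimal Java sketch of what "only a hash function" means in practice: a RandomPartitioner-style partitioner maps each row key to a token and does nothing else. It assumes the key is UTF-8 text and that the token is the key's MD5 digest read as a signed 128-bit integer with the sign dropped. `RandomTokenSketch` and `token` are hypothetical names, not Cassandra's API. Murmur3Partitioner would use a different hash function and token range.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a RandomPartitioner-style key-to-token hash (not Cassandra's code). */
final class RandomTokenSketch {
    private RandomTokenSketch() {}

    /** Assumed behaviour: MD5 the key, read the 16-byte digest as a signed integer, drop the sign. */
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String key : new String[] {"alice", "bob", "carol"}) {
            System.out.println(key + " -> " + token(key));
        }
    }
}
```

The same key always yields the same token, which is the property data placement relies on; nothing in this function knows about nodes or data centers.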
ple data center deployments, tokens are calculated per data
center so that the hash range is evenly divided for the nodes in each
data center." *This is understandable, but when I go to the getToken
method of RandomPartitioner, I can't find any datacenter-aware token
calculation*
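One way to reconcile the quoted documentation with `getToken`: the partitioner itself is not datacenter-aware, so the per-data-center calculation would happen when choosing each node's initial token. The sketch below is a hypothetical helper; it assumes the RandomPartitioner range of 2**127 and a small per-DC offset (100 here, an arbitrary choice) to keep tokens unique across data centers.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: evenly spaced initial tokens for the nodes of one data center. */
final class DcTokenSketch {
    static final BigInteger RANGE = BigInteger.ONE.shiftLeft(127); // assumed RandomPartitioner range

    private DcTokenSketch() {}

    /** Token i = i * 2**127 / nodeCount, plus a per-DC offset so two DCs never share a token. */
    static List<BigInteger> tokensForDc(int nodeCount, int dcOffset) {
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            BigInteger t = RANGE.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(nodeCount))
                    .add(BigInteger.valueOf(dcOffset));
            tokens.add(t.mod(RANGE)); // keep the token inside [0, 2**127)
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println("DC1: " + tokensForDc(3, 0));
        System.out.println("DC2: " + tokensForDc(3, 100));
    }
}
```

With 3 nodes per data center, DC1 gets 0, 2**127/3 and 2*2**127/3 (integer division), and DC2 gets the same values plus 100, so each data center on its own splits the ring evenly.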
is value size in bytes) 4 times (to
make it hot)
RandomPartitioner: average op rate is 845.
Murmur3Partitioner: average op rate is 721."
Then later:
"I have removed ThreadLocal declaration from the M3P (and cleaned
whitespace errors) which was the bottleneck, after re-running tests with
On Thu, Jan 3, 2013 at 10:21 AM, Alain RODRIGUEZ wrote:
>
> Does this mean that there is absolutely no way to switch to the new
> partitioner for people who are already using Cassandra?
>
Yes, that is what this means.
--
Sylvain
Hello,
Reading the wiki of operations
(http://wiki.apache.org/cassandra/Operations) I noticed something
strange. When using RandomPartitioner, tokens are integers in the
range [0,2**127] (both limits included) but keys are converted into
this range using MD5. MD5 has 128 bits, so, tokens should
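The apparent mismatch resolves once the digest's sign is taken into account. Assuming RandomPartitioner reads the 16-byte digest as a signed two's-complement number and keeps its absolute value, the raw values span [-2**127, 2**127 - 1], so the absolute values land in [0, 2**127] with both ends reachable. The hypothetical `tokenFromDigest` below shows the extremes.

```java
import java.math.BigInteger;
import java.util.Arrays;

/** Hypothetical demo of why an MD5-based token can land anywhere in [0, 2**127]. */
final class Md5RangeSketch {
    private Md5RangeSketch() {}

    /** Assumed conversion: a 16-byte digest read as a signed integer, then abs(). */
    static BigInteger tokenFromDigest(byte[] digest16) {
        return new BigInteger(digest16).abs();
    }

    public static void main(String[] args) {
        byte[] smallest = new byte[16];
        smallest[0] = (byte) 0x80;              // 0x8000...0000 is -2**127 as a signed value
        byte[] zero = new byte[16];             // 0x0000...0000 is 0
        byte[] minusOne = new byte[16];
        Arrays.fill(minusOne, (byte) 0xFF);     // 0xFFFF...FFFF is -1
        System.out.println(tokenFromDigest(smallest)); // 2**127, the upper limit
        System.out.println(tokenFromDigest(zero));     // 0, the lower limit
        System.out.println(tokenFromDigest(minusOne)); // 1
    }
}
```

Because x and -x map to the same token, every token except 0 and 2**127 has two digest values behind it, which halves the 128-bit space while still covering the token range roughly evenly.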
9/07/2012, at 7:24 PM, prasenjit mukherjee wrote:
Thanks for the response. Further questions inline..
On Mon, Jul 9, 2012 at 11:50 AM, samal wrote:
>> 1. With RandomPartitioner, on a given node, are the keys sorted by
>> their hash_values or original/unhashed keys ?
>
> hash value,
1. Based on the second answer in
http:/
inline resp.
On Mon, Jul 9, 2012 at 10:18 AM, prasenjit mukherjee
wrote:
Thanks Aaron for your response. Some follow up
questions/assumptions/clarifications :
1. With RandomPartitioner, on a given node, are the keys sorted by
their hash_values or original/unhashed keys ?
2. With RandomPartitioner, on a given node, are the columns (for a
given key) always sorted by
for background
http://wiki.apache.org/cassandra/FAQ#range_rp
It maps the start key to a token, and then scans X rows from there on CL number
of nodes. Rows are stored in token order.
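To illustrate "rows are stored in token order": a scan visits rows sorted by token, not by key, so neighbouring keys such as "a" and "b" generally end up far apart. The sketch assumes a RandomPartitioner-style MD5 token; `TokenOrderSketch` is a hypothetical in-memory model that ignores token collisions.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical model of how a node orders rows under RandomPartitioner (by token, not key). */
final class TokenOrderSketch {
    private TokenOrderSketch() {}

    /** Assumed RandomPartitioner-style token: MD5 digest read as signed, sign dropped. */
    static BigInteger token(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(d).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the keys in the order a token-ordered scan would visit them. */
    static List<String> scanOrder(List<String> keys) {
        TreeMap<BigInteger, String> byToken = new TreeMap<>();
        for (String k : keys) {
            byToken.put(token(k), k); // distinct keys are assumed to have distinct tokens
        }
        return new ArrayList<>(byToken.values());
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d", "e");
        System.out.println("key order:   " + keys);
        System.out.println("token order: " + scanOrder(keys));
    }
}
```

This is also why a key range under RandomPartitioner cannot be answered by a lexical scan: the rows for "a" through "e" are not contiguous on disk.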
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On Sat, Jul 7, 2012 at 11:17 AM, prasenjit mukherjee
wrote:
Have 2 questions :
1. In RP on a given node, are the rows ordered by hash(key) or key ?
If the rows on a node are ordered by hash(key) then essentially it has
to be implemented by a full-scan on that node.
2. In RP, how does a Cassandra node route a client's range-query
request? The range is dis
On Sat, Jul 7, 2012 at 9:26 AM, prasenjit mukherjee
wrote:
> Wondering how a rangequery request is handled if RP is used. Will the
> receiving node do a fan-out to all the nodes in the ring or it will
> just execute the rangequery on its own local partition ?
>
> -Prasenjit
With RP the data is s
Wondering how a rangequery request is handled if RP is used. Will the
receiving node do a fan-out to all the nodes in the ring or it will
just execute the rangequery on its own local partition ?
-Prasenjit
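A simplified model of how a range maps onto the ring may help with this question: each node owns the tokens between its predecessor's token (exclusive) and its own (inclusive), so a coordinator serving a token range needs only the nodes whose ranges overlap it, not necessarily the whole ring. `RingSketch` is hypothetical; it assumes RF=1, uses toy tokens (10, 20, 30), and ignores replicas, consistency levels and row limits.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical ring model (RF=1): a node owns (previous node's token, its own token]. */
final class RingSketch {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(String name, long token) {
        ring.put(BigInteger.valueOf(token), name);
    }

    /** Owner of a token: the first node token >= t, wrapping around to the lowest one. */
    String ownerOf(BigInteger t) {
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(t);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    /** Nodes to contact, in ring order, for a non-wrapping token range (start, end], start < end. */
    List<String> nodesForRange(BigInteger start, BigInteger end) {
        // Every node whose own token lies inside (start, end] holds part of the range.
        List<String> nodes = new ArrayList<>(ring.subMap(start, false, end, true).values());
        // Tokens after the last such node (up to end) belong to the next node clockwise.
        String tailOwner = ownerOf(end);
        if (!nodes.contains(tailOwner)) {
            nodes.add(tailOwner);
        }
        return nodes;
    }

    public static void main(String[] args) {
        RingSketch r = new RingSketch();
        r.addNode("A", 10);
        r.addNode("B", 20);
        r.addNode("C", 30);
        System.out.println(r.ownerOf(BigInteger.valueOf(15)));                                // B
        System.out.println(r.ownerOf(BigInteger.valueOf(35)));                                // A (wraps)
        System.out.println(r.nodesForRange(BigInteger.valueOf(5), BigInteger.valueOf(25)));  // [A, B, C]
        System.out.println(r.nodesForRange(BigInteger.valueOf(11), BigInteger.valueOf(19))); // [B]
    }
}
```

A narrow token range touches one node; a wide one touches several, visited in ring order.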
Got it. Thanks Jake. Will do.
Safdar
On Mon, Jun 25, 2012 at 4:16 PM, Jake Luciani wrote:
Hi Safdar,
Yes you should make it a multiple. The issue is each shard 'sticks' to a
given node but there is no way to guarantee 5 random keys will equally
distribute across 5 nodes. The idea is eventually they will as you add
more and more keys. So increasing shards at once can make that happe
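Jake's point can be quantified with a back-of-the-envelope model, assuming each shard key hashes to a uniformly random node among equally sized token ranges (a simplification of what MD5 gives on a balanced ring). `ShardSpreadSketch` is hypothetical; it computes the expected number of nodes used and, by inclusion-exclusion, the chance that every node receives at least one shard.

```java
/** Hypothetical model: k shard keys hashed uniformly at random onto n equally sized nodes. */
final class ShardSpreadSketch {
    private ShardSpreadSketch() {}

    /** Expected number of distinct nodes that receive at least one of k keys. */
    static double expectedNodesUsed(int nodes, int keys) {
        return nodes * (1.0 - Math.pow(1.0 - 1.0 / nodes, keys));
    }

    /** Probability that all n nodes receive at least one of k keys (inclusion-exclusion). */
    static double probabilityAllNodesUsed(int nodes, int keys) {
        double p = 0.0;
        double binom = 1.0; // C(nodes, j), updated incrementally
        for (int j = 0; j <= nodes; j++) {
            double term = binom * Math.pow((double) (nodes - j) / nodes, keys);
            p += (j % 2 == 0) ? term : -term;
            binom = binom * (nodes - j) / (j + 1);
        }
        return p;
    }

    public static void main(String[] args) {
        for (int k : new int[] {5, 10, 15}) {
            System.out.printf("shards=%d: expected nodes used %.2f of 5, P(all 5 used) = %.3f%n",
                    k, expectedNodesUsed(5, k), probabilityAllNodesUsed(5, k));
        }
    }
}
```

For 5 nodes, 5 shards hit only about 3.4 nodes on average and reach all five just 3.84% of the time (5!/5^5); with 10 shards that chance rises to about 52%, which is consistent with the advice to use a multiple of the node count.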
Hi Jake,
Thanks. Yes, I forgot to mention also that I had raised the
solandra.shards.at.once param from 4 to 5 (to match the # of nodes). Should
I have raised it to 10 or 15 (multiple of 5)? I have added all the
documents that I needed to the index now. It appears the distribution
became more even
Hi Safdar,
If you want to get better utilization of the cluster raise the
solandra.shards.at.once param in solandra.properties
-Jake
On Sun, Jun 24, 2012 at 11:00 AM, Safdar Kureishy wrote:
> Hi,
>
> I've searched online but was unable to find any leads for the problem
> below. This mailing
Thanks.
Oh, I forgot to mention that I'm using Cassandra 1.1.0-beta2, in case that
question comes up.
Hoping someone can offer some more feedback on the likelihood of this
behavior...
Thanks again,
Safdar
On Jun 24, 2012 9:22 PM, "Dave Brosius" wrote:
Well, it sounds like this doesn't apply to you.
If you had set up your column family in CQL with PRIMARY KEY
(domain_name, path) or something like that, and were looking at lots
and lots of URL pages (domain_name + path) but from a very small number of
domain_names, then the partitioner be
Hi Dave,
Would you mind elaborating a bit more on that, preferably with an example?
AFAIK, Solandra uses the unique id of the Solr document as the input for
calculating the md5 hash for shard/node assignment. In this case the ids
are just millions of varied web URLs that do *not* adhere to any reg
If I read you correctly, you are _not_ using composite keys?
That's one thing that could do it, if the first part of the composite
key had a very very low cardinality.
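Dave's scenario can be sketched as follows, assuming a RandomPartitioner-style MD5 token and that, with `PRIMARY KEY (domain_name, path)`, only the first component (the partition key) is hashed. The domains and URL layout are made-up examples, and `CompositeKeySketch` is hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: only the partition key of a composite key decides row placement. */
final class CompositeKeySketch {
    private CompositeKeySketch() {}

    /** Assumed RandomPartitioner-style token: MD5 digest read as signed, sign dropped. */
    static BigInteger token(String partitionKey) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(d).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts distinct tokens for pathsPerDomain pages under each domain. */
    static int distinctTokens(List<String> domains, int pathsPerDomain, boolean hashWholeUrl) {
        Set<BigInteger> tokens = new HashSet<>();
        for (String domain : domains) {
            for (int i = 0; i < pathsPerDomain; i++) {
                String url = domain + "/page/" + i;
                // Composite key: only the domain is hashed; a single-column URL key hashes everything.
                tokens.add(token(hashWholeUrl ? url : domain));
            }
        }
        return tokens.size();
    }

    public static void main(String[] args) {
        List<String> domains = List.of("example.com", "example.org");
        System.out.println("composite key: " + distinctTokens(domains, 1000, false) + " tokens");
        System.out.println("whole URL key: " + distinctTokens(domains, 1000, true) + " tokens");
    }
}
```

With two domains, 2,000 pages still map to just two tokens (at most two primary nodes), whereas hashing the full URL spreads them over 2,000 tokens.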
On 06/24/2012 11:00 AM, Safdar Kureishy wrote:
Hi,
I've searched online but was unable to find any leads for the proble
An additional detail is that the CPU utilization on those nodes is
proportional to the load below, so machines 9.9.9.1 and 9.9.9.3 experience
a fraction of CPU load as compared to the remaining 3 nodes. This might
further point to the possibility that the keys are hashing minimally to the
token ran
Correct. Random partitioner order is md5 token order. If you make no changes
you will get the same order
On Apr 2, 2012, at 7:53 AM, wrote:
Hi,
Bit of a silly question, is row iteration using the RandomPartitioner
deterministic? I don't particularly care what the order is relative to
the row keys (obviously there isn't one, it's the RandomPartitioner),
but if I run a full iteration over all rows in a CF twice, assumin
>
> From: Franc Carter
>To: user@cassandra.apache.org
>Sent: Wednesday, February 22, 2012 9:24 AM
>Subject: Re: List all keys with RandomPartitioner
>
>
>On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti
>wrote:
>
>I need to iter
On 2/22/2012 12:24 PM, Franc Carter wrote:
On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti wrote:
I need to iterate over all the rows in a column family stored with
RandomPartitioner.
When I reach the end of a key slice, I need to find the
On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti wrote:
> I need to iterate over all the rows in a column family stored with
> RandomPartitioner.
> When I reach the end of a key slice, I need to find the token of the last
> key in order to ask for the next slice.
> I saw in an old
eads/trunk
/Henrik
On Wed, Feb 22, 2012 at 10:47, Flavio Baronti wrote:
I need to iterate over all the rows in a column family stored with
RandomPartitioner.
When I reach the end of a key slice, I need to find the token of the last key
in order to ask for the next slice.
I saw in an old email that the token for a specific key can be recovered through
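The paging loop being described can be modelled in a few lines: keep rows in token order, fetch a fixed-size slice, then start the next slice strictly after the token of the last key returned. This is a hypothetical in-memory stand-in (`PagingSketch`, `slice` and `scanAll` are not Thrift calls); the exclusive lower bound is an assumption chosen so that no row is returned twice, and the token is assumed to be RandomPartitioner-style MD5.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a column family stored in token order. */
final class PagingSketch {
    private final TreeMap<BigInteger, String> rows = new TreeMap<>();

    /** Assumed RandomPartitioner-style token: MD5 digest read as signed, sign dropped. */
    static BigInteger token(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(d).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void insert(String key) {
        rows.put(token(key), key);
    }

    /** One slice: up to pageSize keys whose token is strictly greater than afterToken (null = start). */
    List<String> slice(BigInteger afterToken, int pageSize) {
        NavigableMap<BigInteger, String> tail =
                afterToken == null ? rows : rows.tailMap(afterToken, false);
        List<String> page = new ArrayList<>();
        for (String key : tail.values()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    /** Pages through everything: each new slice starts after the last key's token. */
    List<String> scanAll(int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> all = new ArrayList<>();
        BigInteger lastToken = null;
        while (true) {
            List<String> page = slice(lastToken, pageSize);
            all.addAll(page);
            if (page.size() < pageSize) {
                return all; // a short (or empty) page means the end of the ring was reached
            }
            lastToken = token(page.get(page.size() - 1));
        }
    }

    public static void main(String[] args) {
        PagingSketch cf = new PagingSketch();
        for (int i = 0; i < 25; i++) {
            cf.insert("row-" + i);
        }
        System.out.println(cf.scanAll(10).size() + " rows visited"); // prints "25 rows visited"
    }
}
```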
Setting a token outside of the partitioner range sounds like a bug. It's mostly
an issue with the RP, but I guess a custom partitioner may also want to
validate tokens are within a range.
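A range check of the kind Aaron suggests is short. The sketch below is hypothetical (it is not an existing Cassandra method) and assumes RandomPartitioner's token range is [0, 2**127], inclusive at both ends.

```java
import java.math.BigInteger;

/** Hypothetical validation: reject tokens outside RandomPartitioner's assumed range. */
final class TokenValidationSketch {
    static final BigInteger MIN = BigInteger.ZERO;
    static final BigInteger MAX = BigInteger.ONE.shiftLeft(127); // 2**127, inclusive

    private TokenValidationSketch() {}

    static void validate(BigInteger token) {
        if (token.compareTo(MIN) < 0 || token.compareTo(MAX) > 0) {
            throw new IllegalArgumentException("token " + token + " is outside [0, 2**127]");
        }
    }

    /** Parses a token given as a decimal string (e.g. a configured initial token) and checks it. */
    static BigInteger parseAndValidate(String s) {
        BigInteger t = new BigInteger(s.trim());
        validate(t);
        return t;
    }

    public static void main(String[] args) {
        System.out.println(parseAndValidate("85070591730234615865843651857942052864")); // 2**126, accepted
        try {
            parseAndValidate(MAX.add(BigInteger.ONE).toString());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```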
Can you report it to https://issues.apache.org/jira/browse/CASSANDRA
Thanks
-
Aaron Morto
I thought about our issue again and was thinking: maybe describeOwnership
should take into account whether a token is outside the partitioner's maximum token
range?
To recap our problem: we had tokens that were spaced 12.5% of the token
range (2**127) apart; however, we had an offset on each token, w
Thanks for all the responses!
I found our problem:
Using the Random Partitioner, the token range is from 0..2**127. When we added
nodes, we generated the tokens and, out of convenience, added an offset to the
tokens because the move was easier like that.
However, we did not execute the modulo 2**1
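The missing step, and how ownership could then be computed, can be sketched as follows. Assumptions: 8 evenly spaced RandomPartitioner tokens, a made-up offset larger than one node's share so that the last token overflows past 2**127, and ownership measured as the gap back to the previous token on the ring (treating the ring size as 2**127). `OwnershipSketch` is hypothetical, not how `describeOwnership` is implemented.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helpers: offset tokens wrapped into range, and a simple ownership calculation. */
final class OwnershipSketch {
    static final BigInteger RANGE = BigInteger.ONE.shiftLeft(127); // ring size assumed to be 2**127

    private OwnershipSketch() {}

    /** Evenly spaced tokens plus a fixed offset, wrapped back into [0, 2**127) with mod. */
    static List<BigInteger> offsetTokens(int nodes, BigInteger offset) {
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            BigInteger raw = RANGE.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(nodes))
                    .add(offset);
            tokens.add(raw.mod(RANGE)); // without this step, raw can exceed 2**127
        }
        return tokens;
    }

    /** Share of the ring per token: the gap back to the previous token, wrapping around. */
    static Map<BigInteger, Double> ownership(List<BigInteger> tokens) {
        List<BigInteger> sorted = new ArrayList<>(tokens);
        sorted.sort(null); // natural BigInteger order
        Map<BigInteger, Double> shares = new TreeMap<>();
        int n = sorted.size();
        for (int i = 0; i < n; i++) {
            BigInteger prev = sorted.get((i + n - 1) % n);
            // With a single token the gap would be 0 mod RANGE; that node owns everything.
            BigInteger span = n == 1 ? RANGE : sorted.get(i).subtract(prev).mod(RANGE);
            double share = new BigDecimal(span)
                    .divide(new BigDecimal(RANGE), MathContext.DECIMAL64)
                    .doubleValue();
            shares.put(sorted.get(i), share);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Made-up offset: one node's 12.5% share plus 5, so the last token overflows.
        BigInteger offset = RANGE.shiftRight(3).add(BigInteger.valueOf(5));
        ownership(offsetTokens(8, offset)).forEach((t, share) -> System.out.println(t + " owns " + share));
    }
}
```

After the modulo, every node owns 0.125 of the ring again; without it, one token would sit above 2**127, where no MD5-based key can ever hash.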
On 19.01.2012, at 20:15, Narendra Sharma wrote:
> I believe you need to move the nodes on the ring. What was the load on the
> nodes before you added 5 new nodes? It's just that you are getting data in
> certain token range more than others.
With three nodes, it was also imbalanced.
What I don't
I believe you need to move the nodes on the ring. What was the load on the
nodes before you added 5 new nodes? It's just that you are getting data in
certain token range more than others.
-Naren
On Thu, Jan 19, 2012 at 3:22 AM, Marcel Steinbach wrote:
> On 18.01.2012, at 02:19, Maki Watanabe wro
Load reported from nodetool ring is the live load, which means SSTables that
the server has open and will read from during a request. This will include
tombstones, expired and overwritten data.
nodetool cfstats also includes "dead" load, which is SSTables that are no longer
in use but still on disk.
2012/1/19 aaron morton :
> If you have performed any token moves the data will not be deleted until you
> run nodetool cleanup.
We did that after adding nodes to the cluster. And then, the cluster
wasn't balanced either.
Also, does the "Load" really account for "dead" data, or is it just live data?
On 18.01.2012, at 02:19, Maki Watanabe wrote:
> Is there any significant difference in the number of sstables on each node?
No, no significant difference there. Actually, node 8 is among those with more
sstables but with the least load (20GB)
On 17.01.2012, at 20:14, Jeremiah Jordan wrote:
> Are yo
If you have performed any token moves the data will not be deleted until you
run nodetool cleanup.
To get a baseline I would run nodetool compact to do major compaction and purge
any tombstones as others have said.
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http:
Is there any significant difference in the number of sstables on each node?
2012/1/18 Marcel Steinbach :
> We are running regular repairs, so I don't think that's the problem.
> And the data dir sizes match approx. the load from the nodetool.
> Thanks for the advice, though.
>
> Our keys are digits
Are you deleting data or using TTLs? Expired/deleted data won't go
away until the sstable holding it is compacted. So if compaction has
happened on some nodes, but not on others, you will see this. The
disparity is pretty big (400GB to 20GB), so this probably isn't the issue,
but with our dat
We are running regular repairs, so I don't think that's the problem.
And the data dir sizes match approx. the load from the nodetool.
Thanks for the advice, though.
Our keys are digits only, and all contain a few zeros at the same offsets. I'm
not that familiar with the md5 algorithm, but I doub
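One way to test the suspicion that MD5 treats such keys unevenly is to hash a batch of similar keys and count how many fall into each of 8 equal token ranges, as an evenly split 8-node ring would assign them. The key format below is invented for illustration (digits only, with zeros at two fixed offsets), the token is assumed to be RandomPartitioner-style (MD5, sign dropped), and `DigitKeySketch` is hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical check: do digit-only keys with fixed zeros spread evenly over 8 token ranges? */
final class DigitKeySketch {
    static final BigInteger RANGE = BigInteger.ONE.shiftLeft(127);

    private DigitKeySketch() {}

    /** Made-up digit-only key with zeros at fixed offsets (positions 6 and 13). */
    static String key(int i) {
        return String.format("%06d0%06d0", i, (i * 7919) % 1_000_000);
    }

    /** Hashes `keys` sample keys and counts how many land in each of `buckets` equal token ranges. */
    static int[] bucketCounts(int keys, int buckets) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        int[] counts = new int[buckets];
        BigInteger n = BigInteger.valueOf(buckets);
        for (int i = 0; i < keys; i++) {
            byte[] d = md5.digest(key(i).getBytes(StandardCharsets.UTF_8)); // digest() also resets
            BigInteger token = new BigInteger(d).abs();                     // assumed token scheme
            int b = token.multiply(n).divide(RANGE).intValue();             // equal slice of [0, 2**127]
            counts[Math.min(b, buckets - 1)]++;                             // token == 2**127 goes last
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bucketCounts(80_000, 8)));
    }
}
```

If the fixed zeros skewed the hash, some slices would be visibly over- or under-filled; with a well-behaved hash each slice should get close to 10,000 of the 80,000 keys.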
Have you tried running repair first on each node? Also, verify using
df -h on the data dirs
On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach
wrote:
Hi,
we're using RP and have each node assigned the same amount of the token space.
The cluster looks like that:
Address  Status  State  Load  Owns  Token
205648943402372032879374446
a brand new cluster we just brought up and started loading data into a
few days ago. It's using the RandomPartitioner, RF=3 on everything, and we're
doing QUORUM writes. All keyspaces and CFs are for counter super columns. All
keys are moderately sized ascii strings with good variati
ill go in Node 2?
--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RandomPartitioner-tp6025203p6025659.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at
Nabble.com.
ove or change token to 0 if I started with
IntitialToken as default (unset).
--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RandomPartitioner-tp6025203p6025380.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at
Nabble.com.
icked randomly
and assigned to nodes in circular fashion, for eg: hash ABC to FGH goes to
node A and then hash IJKLM-OPQR goes to node B?
--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RandomPartitioner-tp6025203p6025203.html
Sent from the cassan
I came across the same problem. I set the end key to the same value as the start key, and it
worked. (Kinda temp fix...)
--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-I-retrieve-specific-key-range-from-a-table-in-RandomPartitioner-tp5415347p5591681.html
at 4:44 PM, ChingShen <chingshenc...@gmail.com> wrote:
Hi all, Can I retrieve specific key range from a table in RandomPartitioner? Because I always got below exception:
Exception in thread "main" InvalidRequestException(why:start key's md5 sorts after end key's md5.
[1000]);
List<KeySlice> results = client.get_range_slices(keyspace, parent,
        predicate, k, ConsistencyLevel.ONE);
On Thu, Aug 12, 2010 at 4:44 PM, ChingShen wrote:
Hi all,
Can I retrieve a specific key range from a table with RandomPartitioner?
I always get the exception below:
Exception in thread "main" InvalidRequestException(why:start key's md5 sorts
after end key's md5. this is not allowed; you probably should not specify
e
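The exception follows from the ordering: with RandomPartitioner a key range is really a token range, so a pair like "a".."z" is only accepted when MD5 happens to put "a" first. A hypothetical check mirroring the rule stated in the error message (assuming a RandomPartitioner-style MD5 token) also shows why the start-key-equals-end-key workaround in this thread always passes, and why it effectively turns the range into a single-key lookup.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical check mirroring "start key's md5 must not sort after end key's md5". */
final class KeyRangeCheckSketch {
    private KeyRangeCheckSketch() {}

    /** Assumed RandomPartitioner-style token: MD5 digest read as signed, sign dropped. */
    static BigInteger token(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(d).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the start key's token does not sort after the end key's token. */
    static boolean isAllowed(String startKey, String endKey) {
        return token(startKey).compareTo(token(endKey)) <= 0;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"a", "z"}, {"z", "a"}, {"user42", "user42"}};
        for (String[] p : pairs) {
            System.out.println(p[0] + " .. " + p[1] + " allowed? " + isAllowed(p[0], p[1]));
        }
    }
}
```

For any two distinct keys exactly one order passes, and which one depends only on their MD5 values, not on how the keys compare as strings.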
RandomPartitioner is for row keys.
#1 no
#2 yes
#3 yes
On Sat, Apr 24, 2010 at 4:33 AM, Larry Root wrote:
I'm trying to better understand how using the RandomPartitioner will affect my
ability to select ranges of keys. Consider my simple example where we have
many online games across different game genres (GameType). These games need
to store data for each one of their users. With that in mind consider
Hello,
I'm using Cassandra 0.6.1 and Ruby's library. I did some tests on my one-node
development installation using the get_range method to scan the whole CF.
What I want to prove is whether a CF with RandomPartitioner can be used with
get_range getting a fixed number of keys at a time,