Just an FYI: my benchmarking of the new Python driver, which uses the
asynchronous CQL native transport, indicates that you can largely overcome
client-to-node latency effects by employing a suitable level of
concurrency and non-blocking techniques.
Of course, response size and other factors come into play as well.
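For illustration, a minimal sketch of that pattern (many small single-key queries with a bounded number in flight) using the DataStax Java driver's asynchronous API; the driver choice, table name, and schema are assumptions, not what Peter benchmarked:

// Sketch only: fetch many rows by key with bounded concurrency.
// Assumes a table "instruments" with a text primary key column "key".
import com.datastax.driver.core.*;
import java.util.ArrayList;
import java.util.List;

public class ConcurrentFetch {
    private static final int WINDOW = 500; // max queries in flight at once (tuning guess)

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("instruments");
        PreparedStatement ps =
            session.prepare("SELECT * FROM instruments WHERE key = ?");

        List<ResultSetFuture> inFlight = new ArrayList<ResultSetFuture>();
        for (String key : loadKeys()) {                    // the ~20,000 row keys
            inFlight.add(session.executeAsync(ps.bind(key)));
            if (inFlight.size() >= WINDOW) {
                drain(inFlight);                           // wait before issuing more
            }
        }
        drain(inFlight);
        cluster.close();
    }

    private static void drain(List<ResultSetFuture> futures) {
        for (ResultSetFuture f : futures) {
            ResultSet rs = f.getUninterruptibly();         // block until this query completes
            // process rs here...
        }
        futures.clear();
    }

    private static List<String> loadKeys() {
        return new ArrayList<String>();                    // placeholder key source
    }
}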
Good to know, thanks Peter. I am worried about client-to-node latency if I
have to do 20,000 individual queries, but that makes it clearer that at
least batching in smaller sizes is a good idea.
On Wed, Jun 11, 2014 at 6:34 PM, Peter Sanford wrote:
On Wed, Jun 11, 2014 at 9:17 PM, Jack Krupansky wrote:
> Hmmm... that multiple-gets section is not present in the 2.0 doc:
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html
>
> Was that intentional – is that anti-pattern no longer relevant?
… "batches" as an anti-pattern:
http://www.slideshare.net/mattdennis
-- Jack Krupansky
From: Peter Sanford
Sent: Wednesday, June 11, 2014 7:34 PM
To: user@cassandra.apache.org
Subject: Re: Large number of row keys in query kills cluster
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma wrote:
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma wrote:
> The big problem seems to have been requesting a large number of row keys
> combined with a large number of named columns in a query. 20K rows with 20K
> columns destroyed my cluster. Splitting it into slices of 100 sequential
> queries fixed the performance issue.
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma wrote:
> Is there any documentation on this? Obviously these limits will vary by
> cluster capacity, but for new users it would be great to know that you can
> run into problems with large queries, and how they present themselves when
> you hit them.
The big problem seems to have been requesting a large number of row keys
combined with a large number of named columns in a query. 20K rows with 20K
columns destroyed my cluster. Splitting it into slices of 100 sequential
queries fixed the performance issue.
When updating 20K rows at a time, I saw …
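A minimal sketch of that slicing approach using Astyanax, in the style of the query that appears further down in the thread; the column family definition and helper names are assumptions:

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.OperationResult;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ConsistencyLevel;
import com.netflix.astyanax.model.Row;
import com.netflix.astyanax.model.Rows;
import com.netflix.astyanax.serializers.StringSerializer;
import java.util.List;

public class SlicedFetch {
    // Assumed definition of the column family used in the original query.
    static final ColumnFamily<String, String> INSTRUMENTS_CF =
        ColumnFamily.newColumnFamily("instruments", StringSerializer.get(), StringSerializer.get());

    static void fetchInSlices(Keyspace keyspace, List<String> allKeys) throws ConnectionException {
        final int SLICE = 100; // keys per query, as described above
        for (int i = 0; i < allKeys.size(); i += SLICE) {
            List<String> slice = allKeys.subList(i, Math.min(i + SLICE, allKeys.size()));
            OperationResult<Rows<String, String>> result = keyspace
                .prepareQuery(INSTRUMENTS_CF)
                .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
                .getKeySlice(slice.toArray(new String[0]))
                .execute();
            for (Row<String, String> row : result.getResult()) {
                // process row.getKey() / row.getColumns() here...
            }
        }
    }
}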
I'm using Astyanax with a query like this:
clusterContext
    .getClient()
    .getKeyspace("instruments")
    .prepareQuery(INSTRUMENTS_CF)
    .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
    .getKeySlice(new String[] {
        "ROW1",
        "ROW2",
        // 20,000 keys here...
        "ROW20000"
    })
    .execute();
Perhaps if you described both the schema and the query in more detail, we
could help... e.g. did the query have an IN clause with 20,000 keys? Or is
the key compound? More detail will help.
On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma wrote:
> I didn't explain clearly - I'm not requesting 20,000 unknown keys (resulting
> in a full scan), I'm requesting 20,000 specific rows by key.
I didn't explain clearly - I'm not requesting 20,000 unknown keys (resulting
in a full scan), I'm requesting 20,000 specific rows by key.
On Jun 10, 2014 6:02 PM, "DuyHai Doan" wrote:
> Hello Jeremy
>
> Basically what you are doing is asking Cassandra to do a distributed full
> scan on all the partitions across the cluster ...
Hello Jeremy
Basically what you are doing is asking Cassandra to do a distributed full
scan on all the partitions across the cluster, so it's normal that the nodes
are somewhat stressed.
How did you make the query? Are you using the Thrift or CQL3 API?
Please note that there is another way to get all …
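One way to read a large set of partitions without listing every key, with Cassandra 2.0-era clients, is the native protocol's automatic paging; a hypothetical sketch using the DataStax Java driver (table and column names are assumptions, and this may not be the alternative DuyHai had in mind):

import com.datastax.driver.core.*;

public class PagedScan {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("instruments");

        Statement stmt = new SimpleStatement("SELECT key, value FROM instruments");
        stmt.setFetchSize(1000); // driver pages results, roughly 1,000 rows per round trip
        for (Row row : session.execute(stmt)) {
            // process row here...
        }
        cluster.close();
    }
}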
I ran an application today that attempted to fetch 20,000+ unique row keys
in one query against a set of completely empty column families. On a 4-node
cluster (EC2 m1.large instances) with the recommended memory settings (2 GB
heap), every single node immediately ran out of memory and became
unresponsive.