Re: CQL 'IN' predicate

2013-11-07 Thread Nate McCall
Regardless of the size of 'k', each key gets turned into a ReadCommand internally and is (eventually) delegated to StorageProxy#fetchRows: https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/service/StorageProxy.java#L852 You still send the same number of ReadComma

Re: CQL 'IN' predicate

2013-11-07 Thread Dan Gould
Thanks--that's what I was wondering. So, if I understand you correctly, it sounds like a single SELECT ... WHERE foo IN (k items); can tie up k threads rather than 1 thread per node, which can starve other tasks on a cluster. AFAICT, there's no way to say "this query should be limited to

Re: CQL 'IN' predicate

2013-11-07 Thread Nate McCall
Sorry - I meant more that it would be worth experimenting with (much) smaller page sizes as opposed to getting back one giant page. This would cut down the read command overhead but you would still have the parsing to deal with. Aaron has a good point though. A big query is still a bad idea if it
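A rough sketch of one way to avoid sending a single giant IN query: split the key list into smaller batches and issue one statement per batch. This is not something the thread prescribes. The table name `my_table`, the helper `in_batches`, the batch size, and the `?` bind-marker style are all assumptions for illustration; only the column name `foo` comes from the discussion above.

```python
from typing import Iterator, List, Sequence, Tuple


def in_batches(keys: Sequence[str], batch_size: int) -> Iterator[Tuple[str, List[str]]]:
    """Yield (statement, parameters) pairs, one per batch of keys.

    `my_table` is a hypothetical table name; adapt the bind markers
    to whatever your driver expects.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(keys), batch_size):
        chunk = list(keys[start:start + batch_size])
        # One bind marker per key in this batch.
        markers = ", ".join(["?"] * len(chunk))
        yield f"SELECT * FROM my_table WHERE foo IN ({markers})", chunk


# Example: 1200 keys in batches of at most 500 gives three statements.
keys = [f"key-{i}" for i in range(1200)]
batches = list(in_batches(keys, batch_size=500))
print(len(batches))
print([len(params) for _, params in batches])
```

Each batch still fans out into one read per key on the server, so this limits how much work a single request queues at once rather than reducing the total work.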

Re: CQL 'IN' predicate

2013-11-06 Thread Aaron Morton
> If one big query doesn't cause problems
Every row you read becomes a (roughly) RF number of tasks in the cluster. If you ask for 100 rows in one query it will generate 300 tasks that are processed by the read thread pool, which has a default of 32 threads. If you ask for a lot of rows and the n
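The arithmetic above can be sketched in a few lines. This assumes a replication factor of 3 (implied by 100 rows becoming 300 tasks) and, as a simplification, treats the 32 read threads as one pool with equally sized tasks; the function names are hypothetical.

```python
import math


def read_tasks(rows: int, replication_factor: int) -> int:
    """Rough task count: each row read fans out to about RF tasks."""
    return rows * replication_factor


def min_rounds(tasks: int, pool_threads: int) -> int:
    """Lower bound on the rounds a pool needs if all tasks take equally long."""
    return math.ceil(tasks / pool_threads)


tasks = read_tasks(rows=100, replication_factor=3)
print(tasks)                  # 300
print(min_rounds(tasks, 32))  # 10
```

With these assumptions, a 100-row query keeps the read pool busy for about ten rounds, during which other reads have to wait their turn.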

Re: CQL 'IN' predicate

2013-11-06 Thread Dan Gould
Thanks Nate, I assume 10k is the return limit. I don't think I'll ever get close to 10k matches to the IN query. That said, you're right: to be safe I'll increase the limit to match the number of items on the IN. I didn't know CQL supported stored procedures, but I'll take a look. I suppo

Re: CQL 'IN' predicate

2013-11-06 Thread Nate McCall
Unless you explicitly set a page size (I'm pretty sure the query is converted to a paging query automatically under the hood) you will get capped at the default of 10k, which might get a little weird semantically. That said, you should experiment with explicit page sizes and see where it gets you (i

CQL 'IN' predicate

2013-11-06 Thread Dan Gould
I was wondering if anyone had a sense of performance/best practices around the 'IN' predicate. I have a list of up to potentially ~30k keys that I want to look up in a table (typically queries will have <500, but I worry about the long tail). Most of them will not exist in the table, but, say, a