Re: Read performance in map data type

2014-04-04 Thread Tyler Hobbs
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html On Fri, Apr 4, 2014 at 11:34 AM, Apoorva Gaurav wrote: > > > On Fri, Apr 4, 2014 at 9:37 PM, Tyler Hobbs wrote: > >> >> On Fri, Apr 4, 2014 at 12:41 AM, Apoorva Gaurav < >> apoorva.gau...@myntra.com> wrot

Re: Read performance in map data type

2014-04-04 Thread Apoorva Gaurav
On Fri, Apr 4, 2014 at 9:37 PM, Tyler Hobbs wrote: > > On Fri, Apr 4, 2014 at 12:41 AM, Apoorva Gaurav > wrote: > >> If we store the same data as a json using text data type i.e (studentID >> int, subjectMarksJson text) we are getting a latency of ~10ms from the same >> client for even bigger. I

Re: Read performance in map data type

2014-04-04 Thread Tyler Hobbs
On Fri, Apr 4, 2014 at 12:41 AM, Apoorva Gaurav wrote: > If we store the same data as a json using text data type i.e (studentID > int, subjectMarksJson text) we are getting a latency of ~10ms from the same > client for even bigger. I understand that json is not the preferred storage > for cassand

Re: Read performance in map data type

2014-04-03 Thread Apoorva Gaurav
On Fri, Apr 4, 2014 at 3:32 AM, Robert Coli wrote: > On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav > wrote: > >> At the client side we are getting a latency of ~350ms, we are using >> datastax driver 2.0.0 and have kept the fetch size as 500. And these are >> coming while reading rows having ~

Re: Read performance in map data type

2014-04-03 Thread Robert Coli
On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav wrote: > At the client side we are getting a latency of ~350ms, we are using > datastax driver 2.0.0 and have kept the fetch size as 500. And these are > coming while reading rows having ~200 columns. > And you're sure that the 300ms between what ca

Re: Read performance in map data type

2014-04-03 Thread Apoorva Gaurav
client side socket limit: 64K
client side maximum connections per host: 8
read consistency level: QUORUM
On Thu, Apr 3, 2014 at 12:59 PM, Shrikar archak wrote: > How about the client side socket limits? Cassandra client side maximum > connection per host and read consistency level? > > ~Shrik

Re: Read performance in map data type

2014-04-03 Thread Shrikar archak
How about the client side socket limits? Cassandra client side maximum connection per host and read consistency level? ~Shrikar On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav wrote: > At the client side we are getting a latency of ~350ms, we are using > datastax driver 2.0.0 and have kept the

Re: Read performance in map data type

2014-04-03 Thread Apoorva Gaurav
At the client side we are getting a latency of ~350ms. We are using DataStax driver 2.0.0 and have kept the fetch size at 500. These latencies occur while reading rows having ~200 columns. On Thu, Apr 3, 2014 at 12:45 PM, Shrikar archak wrote: > Hi Apoorva, > As per the cfhistogram there are som

Re: Read performance in map data type

2014-04-03 Thread Shrikar archak
Hi Apoorva, As per the cfhistogram there are some rows which have more than 75k columns, and around 150k reads hit 2 SSTables. Are you sure that you are seeing more than 500ms latency? The cfhistogram shows the worst read performance was around 51ms, which looks reasonable with many reads hitting

Re: Read performance in map data type

2014-04-02 Thread Apoorva Gaurav
Hello Shrikar, We are still facing the read latency issue; here is the histogram: http://pastebin.com/yEvMuHYh On Sat, Mar 29, 2014 at 8:11 AM, Apoorva Gaurav wrote: > Hello Shrikar, > > Yes primary key is (studentID, subjectID). I had dropped the test table, > recreating and populating it post whic

Re: Read performance in map data type

2014-04-01 Thread Apoorva Gaurav
I've observed that reducing fetch size results in better latency (isn't that obvious :-)). I tried fetch sizes varying from 100 to 1, and am seeing a lot of errors for 1. Haven't tried modifying the number of columns. Let me start a new thread focused on fetch size. On Wed, Apr 2, 2014 at 9:5

Re: Read performance in map data type

2014-04-01 Thread Sourabh Agrawal
From the doc: The fetch size controls how much resulting rows will be retrieved simultaneously. So, I guess it does not depend on the number of columns as such. As all the columns for a key reside on the same node, I think it wouldn't matter much whatever be the number of columns as long as we ha

Re: Read performance in map data type

2014-04-01 Thread Apoorva Gaurav
Thanks Sourabh, I've modelled my table as "studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID)" as primarily I'll be querying using studentID and sometimes using studentID and subjectID. I've tried driver 2.0.0 and it's giving good results. Also using its auto paging feature.
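For readers following the fetch size discussion, here is a rough sketch of how fetch size relates to the number of pages a paged read needs. It is plain arithmetic, not a driver API: the function name `estimated_pages` is hypothetical, and it assumes each (studentID, subjectID) pair comes back as one result row, so a student with ~200 subjects yields ~200 rows. The real driver may issue an extra request in some cases, so treat the result as an estimate.

```python
import math


def estimated_pages(row_count: int, fetch_size: int) -> int:
    """Estimate how many pages are needed to read row_count rows.

    Hypothetical helper for illustration only; it does not call any driver.
    """
    if fetch_size < 1:
        raise ValueError("fetch_size must be at least 1")
    if row_count <= 0:
        # An empty result still costs one request/response round trip.
        return 1
    return math.ceil(row_count / fetch_size)


# Numbers taken from the thread: ~200 rows per student, fetch sizes 500, 100, 1.
for size in (500, 100, 1):
    print(size, estimated_pages(200, size))
# prints:
# 500 1
# 100 2
# 1 200
```

Under these assumptions, a fetch size of 500 already returns a ~200-row student in a single page, while a fetch size of 1 needs roughly one round trip per row.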

Re: Read performance in map data type

2014-04-01 Thread Robert Coli
On Mon, Mar 31, 2014 at 9:13 PM, Apoorva Gaurav wrote: > Thanks Robert, Is there a workaround, as in our test setups we keep > dropping and recreating tables. > Use unique keyspace (or table) names for each test? That's the approach they're taking in 5202... =Rob
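A minimal sketch of the "unique keyspace name per test" idea, assuming a Python test harness. The helper names `unique_keyspace_name` and `create_keyspace_cql`, the `test_ks` prefix, and the SimpleStrategy replication settings are illustrative assumptions, not something from this thread; the functions only build strings and do not talk to a cluster.

```python
import uuid


def unique_keyspace_name(prefix: str = "test_ks") -> str:
    """Return a keyspace name that is very unlikely to repeat across test runs.

    uuid4().hex is 32 lowercase hex characters, so the result uses only
    letters, digits, and underscores and starts with the given prefix.
    """
    return f"{prefix}_{uuid.uuid4().hex}"


def create_keyspace_cql(name: str, replication_factor: int = 1) -> str:
    """Build a CREATE KEYSPACE statement for a throwaway test keyspace.

    SimpleStrategy with a small replication factor is an assumption that
    suits a single-datacenter test cluster.
    """
    return (
        f"CREATE KEYSPACE {name} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
    )


if __name__ == "__main__":
    ks = unique_keyspace_name()
    print(create_keyspace_cql(ks))
```

Each test run then creates its tables inside a fresh keyspace instead of dropping and recreating a table under the same name.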

Re: Read performance in map data type

2014-03-31 Thread Apoorva Gaurav
Thanks Robert, Is there a workaround, as in our test setups we keep dropping and recreating tables. On Mon, Mar 31, 2014 at 11:51 PM, Robert Coli wrote: > On Fri, Mar 28, 2014 at 7:41 PM, Apoorva Gaurav > wrote: > >> Yes primary key is (studentID, subjectID). I had dropped the test table, >> r

Re: Read performance in map data type

2014-03-31 Thread Robert Coli
On Fri, Mar 28, 2014 at 7:41 PM, Apoorva Gaurav wrote: > Yes primary key is (studentID, subjectID). I had dropped the test table, > recreating and populating it post which will share the cfhistogram. In such > case is there any practical limit on the rows I should fetch, for e.g. > should I do >

Re: Read performance in map data type

2014-03-29 Thread Sourabh Agrawal
Hi, I don't think there is a problem with the driver. Regarding the schema, you may want to choose between wide rows and skinny rows. http://stackoverflow.com/questions/19039123/cassandra-wide-vs-skinny-rows-for-large-columns http://thelastpickle.com/blog/2013/01/11/primary-keys-in-cql.html When

Re: Read performance in map data type

2014-03-29 Thread Apoorva Gaurav
Hello Sourabh, I'd prefer to do a query like select * from marks_table where studentID = ? and subjectID in (?, ?, ??) but if it's costly then I can happily delegate the responsibility to the application layer. Haven't tried 2.x java driver for this specific issue but tried it once earlier and fou

Re: Read performance in map data type

2014-03-29 Thread Sourabh Agrawal
Hi Apoorva, Do you always query on studentID only, or do you need to query on both studentID and subjectID? Also, I think using the latest driver (2.x) can make querying a large number of rows efficient. http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0 On Sat, Mar 29, 2

Re: Read performance in map data type

2014-03-28 Thread Apoorva Gaurav
Hello Shrikar, Yes, the primary key is (studentID, subjectID). I had dropped the test table; I am recreating and populating it, after which I will share the cfhistogram. In such a case is there any practical limit on the rows I should fetch, e.g. should I do select * from marks_table where studentID =

Re: Read performance in map data type

2014-03-28 Thread Shrikar archak
Hi Apoorva, I assume this is the table with studentId and subjectId as the primary key, and no other column (like marks) in it: create table marks_table(studentId int, subjectId int, marks int, PRIMARY KEY(studentId, subjectId)); Also, could you give the cfhistogram stats? nodetool cfhistograms mark

Read performance in map data type

2014-03-28 Thread Apoorva Gaurav
Hello All, We have a schema which can be modeled as (studentID, subjectID, marks), where the combination of studentID and subjectID is unique. The number of studentIDs can go up to 100 million, and for each studentID we can have up to 10k subjectIDs. We are using Apache Cassandra 2.0.4 and datastax java driv