Perfect! Thanks for the response, Sylvain!

On Friday, March 1, 2013, Sylvain Lebresne wrote:
> On Fri, Mar 1, 2013 at 5:16 PM, Adam Venturella <aventure...@gmail.com> wrote:
>
>> My ColumnFamily is defined as follows:
>>
>> CREATE TABLE UserProfileHistory(
>>     username text,
>>     timestamp bigint, -- millis since epoch
>>     data text, -- JSON
>>     PRIMARY KEY (username, timestamp)
>> ) WITH CLUSTERING ORDER BY (timestamp DESC);
>>
>> Each insert on the username adds to the wide row. The most recent profile
>> history can be retrieved with
>>
>> SELECT * FROM UserProfileHistory WHERE username=:username LIMIT 1;
>>
>> For some reporting needs I need to fetch the entire history, and I need
>> to do it in ASC order instead of DESC.
>>
>> One option is to do the sorting in code: collect N results, then sort on
>> the timestamps accordingly. Given the row is of N length, that could start
>> to put an undue memory burden on my application layer, and I would like to
>> avoid that if possible, opting instead for Cassandra to perform the work.
>>
>> So I am leaning towards this option:
>>
>> 2) min timestamp seek + ORDER BY
>>
>> To start the process my initial timestamp would be
>> 01-01-1970T00:00:00+0000 (assume that is in milliseconds, aka 0). I would
>> then issue my query:
>>
>> SELECT * FROM UserProfileHistory WHERE username=:username AND timestamp >
>> :milliseconds ORDER BY timestamp ASC LIMIT 100
>>
>> Once I have those initial results I would just pick my last timestamp
>> from the result set and + 1 on it and run the query again until I received
>> 0 results.
>>
>> The CQL works and returns my results as I expect. This will probably only
>> be run once every 24 hours, maybe every 12 hours; point being, not often.
>>
>> Am I setting myself up for a disaster down the line?
>
> No.
>
> Paging over a partition like you do, in the reverse of the clustering
> order, is slightly slower than paging in the clustering order, but not
> by a whole lot. It's slightly slower because 1) there will be backward
> seeks underneath between the on-disk index blocks (not a huge deal) and
> 2) there is some reversing of lists going on before returning each query
> (again, not a huge deal). You'll be totally fine, especially if that
> query is not the one on which latency is the most critical.
>
> --
> Sylvain
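
For the archive, here is roughly how the paging loop discussed above could look in application code. This is only a minimal sketch in Python and is not tied to any driver: `page_history`, `fetch_page` and `make_fake_fetch` are hypothetical names, and `fetch_page` stands in for running the "timestamp > :milliseconds ORDER BY timestamp ASC LIMIT 100" query for one username. The in-memory fake exists only so the loop can be run without a cluster.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Row = Tuple[int, str]  # (timestamp in millis since epoch, JSON data)


def page_history(
    fetch_page: Callable[[Optional[int], int], List[Row]],
    page_size: int = 100,
) -> Iterator[Row]:
    """Yield one user's history oldest-first, holding one page in memory at a time.

    fetch_page(after_ts, limit) is a hypothetical stand-in for:
        SELECT * FROM UserProfileHistory
        WHERE username=:username AND timestamp > :after_ts
        ORDER BY timestamp ASC LIMIT :limit
    after_ts=None means "no lower bound" (first page).
    """
    last_ts: Optional[int] = None
    while True:
        rows = fetch_page(last_ts, page_size)
        if not rows:
            return  # an empty page means the history is exhausted
        yield from rows
        # The comparison is a strict ">", so the last timestamp seen
        # is reused unchanged as the next lower bound.
        last_ts = rows[-1][0]


def make_fake_fetch(rows: List[Row]) -> Callable[[Optional[int], int], List[Row]]:
    """In-memory substitute for the query, for illustration only."""
    ordered = sorted(rows)  # ascending by timestamp

    def fetch_page(after_ts: Optional[int], limit: int) -> List[Row]:
        matching = [r for r in ordered if after_ts is None or r[0] > after_ts]
        return matching[:limit]

    return fetch_page


if __name__ == "__main__":
    history = [(ts, "{}") for ts in (5, 1, 2, 3, 4, 0, 6)]
    for ts, data in page_history(make_fake_fetch(history), page_size=3):
        print(ts, data)
```

One detail worth noting about the loop: because the query uses a strict `>`, the sketch passes the last timestamp through as-is. Adding 1 to it first would skip a row stamped exactly one millisecond later, and starting from `timestamp > 0` would likewise skip a row stamped exactly 0.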