Hi,

IMHO you can do this with date range queries and (date) facets. Solr's DateMathParser lets you round dates down to the minute, hour or day (e.g. NOW/HOUR). If you hit a limit there, just add a field holding an integer for the minute, hour or day. That way you lose the month information, which is sometimes exactly what you want.
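A rough sketch of both ideas, written as what a client might compute before querying and indexing (this is not Solr code itself): a range query whose start uses date-math rounding, and the extra integer fields that drop the month. The field names `date`, `log_minute`, `log_hour` and `log_day` are hypothetical.

```python
from datetime import datetime

# Solr date-math rounding units and the plural units used for offsets.
ROUND_UNIT = {"minutes": "MINUTE", "hours": "HOUR", "days": "DAY"}
OFFSET_UNIT = {"minutes": "MINUTES", "hours": "HOURS", "days": "DAYS"}


def last_x_range(field: str, amount: int, unit: str) -> str:
    """Range query for the last `amount` units, with the start rounded
    down to the unit boundary (e.g. NOW/HOUR-6HOURS)."""
    start = f"NOW/{ROUND_UNIT[unit]}-{amount}{OFFSET_UNIT[unit]}"
    return f"{field}:[{start} TO NOW]"


def bucket_fields(ts: datetime) -> dict:
    """Hypothetical integer fields for minute-of-hour, hour-of-day and
    day-of-month. Year and month are deliberately dropped, so a facet on
    log_hour counts searches per hour of the day across all months."""
    return {"log_minute": ts.minute, "log_hour": ts.hour, "log_day": ts.day}


if __name__ == "__main__":
    print(last_x_range("date", 6, "hours"))  # date:[NOW/HOUR-6HOURS TO NOW]
    print(bucket_fields(datetime(2010, 7, 27, 1, 43)))
    # {'log_minute': 43, 'log_hour': 1, 'log_day': 27}
```

Two log entries from 27 July and 27 August at 01:43 get identical integer fields, which is the "lose the month" trade-off.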
You probably want each document to be one logged query, with the fields:
- query
- user (an id, if you have one)
- sessionid
- date
- hits (needed for the hits = 0 question)

Is the most popular query within a date range simply the query that was logged most often? If so, search on the date range, q=date:[start TO end], with a facet on the query field. The facet gives you the counts, similar to a "GROUP BY ... COUNT" aggregation in an RDBMS. You can request several facets at the same time, but be careful what you query for: the query restricts the set of documents being counted, so it affects every facet count. There are also parameters that change the base document set of an individual facet, see:
http://wiki.apache.org/solr/SimpleFacetParameters

Cheers,
Chantal

On Tue, 2010-07-27 at 01:43 +0200, Mark wrote:
> We are thinking about using Cassandra to store our search logs. Can
> someone point me in the right direction/lend some guidance on design? I
> am new to Cassandra and I am having trouble wrapping my head around some
> of these new concepts. My brain keeps wanting to go back to an RDBMS design.
>
> We will be storing the user query, # of hits returned and their session
> id. We would like to be able to answer the following questions.
>
> - What are the n most popular queries and their counts within the last x
> (mins/hours/days/etc). Basically the most popular searches within a
> given time range.
> - What is the most popular query within the last x where hits = 0. Same
> as above but with an extra "where" clause
> - For session id x give me all their other queries
> - What are all the session ids that searched for 'foos'
>
> We accomplish the above functionality w/ MySQL using 2 tables. One for
> the raw search log information and the other to keep the
> aggregate/running counts of queries.
>
> Would this sort of ad-hoc querying be better implemented using Hadoop +
> Hive? If so, should I be storing all this information in Cassandra then
> using Hadoop to retrieve it?
>
> Thanks for your suggestions
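Mapping the questions above onto the date-range-plus-facet approach, a rough sketch of the Solr request parameters might look like this. It assumes one document per logged query with hypothetical fields `query`, `sessionid`, `date` and `hits`, and that `query` and `sessionid` are untokenized string fields (faceting on a tokenized text field would count individual words, not whole queries). It only builds the parameters; sending them to the select handler is left out.

```python
from urllib.parse import urlencode


# Wrap a value in double quotes for the Lucene query parser,
# escaping backslashes and double quotes inside it.
def _phrase(value: str) -> str:
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def top_queries(start: str, end: str, n: int, zero_hits_only: bool = False) -> list:
    """n most popular queries in [start, end]; optionally only zero-hit searches."""
    params = [
        ("q", f"date:[{start} TO {end}]"),
        ("rows", "0"),              # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", "query"),   # like GROUP BY query ... COUNT(*)
        ("facet.limit", str(n)),    # top n, sorted by count
        ("facet.mincount", "1"),
    ]
    if zero_hits_only:
        params.append(("fq", "hits:0"))  # the extra "where" clause
    return params


def queries_for_session(session_id: str, rows: int = 100) -> list:
    """Logged queries for one session, oldest first (rows is a hypothetical page size)."""
    return [
        ("q", "sessionid:" + _phrase(session_id)),
        ("fl", "query,date"),
        ("sort", "date asc"),
        ("rows", str(rows)),
    ]


def sessions_for_query(query: str) -> list:
    """Every session id that searched for `query`, via an unlimited facet."""
    return [
        ("q", "query:" + _phrase(query)),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", "sessionid"),
        ("facet.limit", "-1"),      # a negative limit means no limit
        ("facet.mincount", "1"),
    ]


if __name__ == "__main__":
    print(urlencode(top_queries("NOW/HOUR-24HOURS", "NOW", 10, zero_hits_only=True)))
    print(urlencode(sessions_for_query("foos")))
```

Using `fq` for the hits = 0 restriction keeps it separate from the date range in `q`, which is the kind of per-facet base set the wiki page above describes.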