Re: Design questions/Schema Help

Chantal Ackermann Tue, 27 Jul 2010 00:01:20 -0700

Hi,

IMHO you can do this with date range queries and (date) facets.
The DateMathParser will allow you to normalize dates on min/hours/days.
If you hit a limit there, then just add a field with an integer for
either min/hour/day. This way you'll loose the month information - which
is sometimes what you want.


You probably want the document entity to be a query with fields:
query
user (id? if you have that)
sessionid
date

the most popular query within a date range is the query that was logged
most times? Do a search on the date range:
q=date:[start TO end]
with facet on the query which gives you the count similar to "group by &
count" aggregation functionality in an RDBMS. You can do multiple facets
at the same time but be carefull what you are querying for - it will
impact the facet count. You can use functions to change the base of each
facet.

http://wiki.apache.org/solr/SimpleFacetParameters

Cheers,
Chantal

On Tue, 2010-07-27 at 01:43 +0200, Mark wrote:
> We are thinking about using Cassandra to store our search logs. Can 
> someone point me in the right direction/lend some guidance on design? I 
> am new to Cassandra and I am having trouble wrapping my head around some 
> of these new concepts. My brain keeps wanting to go back to a RDBMS design.
> 
> We will be storing the user query, # of hits returned and their session 
> id. We would like to be able to answer the following questions.
> 
> - What is the n most popular queries and their counts within the last x 
> (mins/hours/days/etc). Basically the most popular searches within a 
> given time range.
> - What is the most popular query within the last x where hits = 0. Same 
> as above but with an extra "where" clause
> - For session id x give me all their other queries
> - What are all the session ids that searched for 'foos'
> 
> We accomplish the above functionality w/ MySQL using 2 tables. One for 
> the raw search log information and the other to keep the 
> aggregate/running counts of queries.
> 
> Would this sort of ad-hoc querying be better implemented using Hadoop + 
> Hive? If so, should I be storing all this information in Cassandra then 
> using Hadoop to retrieve it?
> 
> Thanks for your suggestions

Re: Design questions/Schema Help

Reply via email to