text search and data aggregation, thoughts?

Esteban Donato Wed, 12 Oct 2011 10:22:31 -0700

Hello community,

 Let me explain my case.  I need to implement a feature that combines
text search and data aggregation.  That is, the app will let the users
search for products and set a date range.  As result, I need to show
them products that matched the search + some data aggregated for that
period of time (like number of visits, number of sells, etc). For
instance, if a user searches for "ipod" and sets dates from "01/01/11"
to "01/10/11", I have to show all the products that match with "ipod"
along with the visits, sells, etc for those products in those 10 days.


 For that purpose, I have 2 tables in a DB: one with the all the
products and the other with the visits and sells per day and product.

 What do you think is the best approach?.  These are my options so far:

Option 1: index both products and visits per day in the same index as
result of doing a join between 2 tables.  That way I can resolve
product search + stats (for aggregated data) in 1 query and with 1
index.  The drawback is having a huge index that grows in documents on
daily basis where just a few of them (one per product) are searchable.
 Also I can see other problems if, for instance, I want to implement
facet navigation over some attributes of the product, like product
category, since the count for each element won't reflect the number of
product for that category but rather the number of products x days
those products have been visited.

Option 2: have 2 cores, one with products, other with visits per day.
Here I can use the product core/index for product searching, pick the
product ids and request stats in the visits core for those ids, then
join both and return everything.  I was checking the "join" feature in
Solr 4.0 to implement this ( http://wiki.apache.org/solr/Join ) , but
as far as I know this feature doesn't return fields from both indexes
(I would need product attributes from product index + stats from
visits index), so I would have to perform both queries manually.  The
drawback here is code complexity (I need to maintain a code that
queries and joins to both indexes) and eventually performance (2
queries per request and eventually with a long list of ids depending
on product match result)

Option 3: 1 core with products and keep visits in the DB.  This option
would work exactly as option 2  with the alternative of calculating
the stats in the DB.  Same drawbacks as option 2.

  Do you guys see any other option or have faced this problem before?
If so what is your advise?.  I think the best approach is to separate
concerns and have Solr for product matching and something else
(high-scalable db?) for stats calculations, but on the other hand it
would be nice to have Solr performs both operations and joins results
behind the scenes.  To make things even more challenging, users could
eventually sort by calculated fields.

Thanks in advance
Esteban

text search and data aggregation, thoughts?

Reply via email to