I'm interested in using the new custom sharding features in the collections API to search a rolling window of event data. I'd appreciate a spot/sanity check of my plan and understanding.
Say I only care about the last 7 days of events, and I receive thousands per second (billions per week). Am I correct that I could create a new shard for each hour and index events that happen in that hour with an ID (uniqueKey) of `new_event_hour!event_id`, so that each hour block of events goes into one shard?

I *always* query these events by the time at which they occurred, which is a separate TrieInt field that I index with every document. So at query time I would calculate the range the user cared about and send something like `_route_=hour1&_route_=hour2` if I wanted to query only those two shards. (I *can* set multiple `_route_` arguments in one query, right? And Solr will handle merging results as it would with any other cores?) A scheduled task would then drop and delete shards once they were more than 7 days old.

Does all of that make sense? Do you see a smarter way to do large "time-oriented" search in SolrCloud? Thanks!
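To make the query-time step concrete, here's a small sketch of the "calculate the range the user cared about" part. The `hour_YYYYMMDD_HH` shard-naming scheme is my own invention for illustration, not anything Solr mandates:

```python
from datetime import datetime, timedelta

def hour_shard(ts: datetime) -> str:
    """Name of the hourly shard a timestamp falls in (assumed naming scheme)."""
    return ts.strftime("hour_%Y%m%d_%H")

def route_values(start: datetime, end: datetime) -> list[str]:
    """All hour-bucket shard names covering [start, end], to be sent as _route_ values."""
    cur = start.replace(minute=0, second=0, microsecond=0)  # floor to the hour
    shards = []
    while cur <= end:
        shards.append(hour_shard(cur))
        cur += timedelta(hours=1)
    return shards

# Example: a query window of 09:30-11:00 touches three hour buckets
print(route_values(datetime(2014, 1, 6, 9, 30), datetime(2014, 1, 6, 11, 0)))
# ['hour_20140106_09', 'hour_20140106_10', 'hour_20140106_11']
```

The resulting list would then be joined into the query string, however Solr expects multiple routes to be expressed (repeated `_route_` params or a comma-separated value; that's part of what I'm asking).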
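For reference, here's roughly how I picture the shard lifecycle via the Collections API. This is only a sketch: the host, collection name `events`, and hourly shard names are my assumptions, and it presumes the collection was created with `router.name=implicit`, which as I understand it is what allows shards to be added and removed by name:

```shell
# Ahead of each hour, create the shard that will receive its events
curl "http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=events&shard=hour_20140106_12"

# At query time, restrict the search to the hour buckets in the user's range
curl "http://localhost:8983/solr/events/select?q=*:*&_route_=hour_20140106_09&_route_=hour_20140106_10"

# Scheduled cleanup: drop a shard once it is more than 7 days old
curl "http://localhost:8983/solr/admin/collections?action=DELETESHARD&collection=events&shard=hour_20131230_09"
```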