Re: hot shard concept

Shawn Heisey Mon, 29 Oct 2012 10:18:24 -0700

On 10/29/2012 7:55 AM, Dmitry Kan wrote:

Hi everyone,


at this year's Berlin Buzz words conference someone (sematext?) have
described a technique of a hot shard. The idea is to have a slim shard to
maximize the update throughput during a day (when millions of docs need to
be posted) and make sure the indexed documents are immediately searchable.
In the end of the day the day's documents are moved to cold shards. If I'm
not mistaken, this was implemented for ElasticSearch. I'm currently
implementing something similar (but pretty tailored to our logical sharding
use case) for Solr (3.x). The feature set looks roughly like this:

1) front end solr (query router) is aware of the hot shard: it directs the
incoming queries to the hot and "cold" shards.
2) new incoming documents are directed first to the hot shard and then
periodically (like once a day or once a week) moved over to the closest in
time cold shard. And for that...
3) hot shard index is being partitioned low level using Lucene's
IndexReader / IndexWriter with the implementation based on [1], [2] and
customized to logical (time-based) sharding.


The question is: is doing index partitioning low-level a good way of
implementing the hot shard concept? That is, is there anything better
operationally-wise from the point of view of disaster recovery / search
cluster support? Am I missing some obvious SOLR-ish solution?
Doing instead the periodical hot shard cleaning and re-posting its source
documents to the closest cold shard is less modular and hence more
complicated operationally for us.

Please let me know, if you need more details or if the problem isn't clear
enough. Thanks.

[1]
http://blog.foofactory.fi/2008/01/regenerating-equally-sized-shards-from.html
[2] https://github.com/HON-Khresmoi/hash-based-index-splitter

This is exactly how I set up my indexing, been that way since early 2010when we first started using Solr 1.4.0. Now we are on 3.5 and anupgrade to 4.1 (branch_4x) is in the works. Coming from terminologyused in our previous search product, we call the hot shard an"incremental" shard. My SolrJ indexing application takes care of allmanagement of which documents are in the incremental and which documentsare in the large shards. We call the large ones "static" shards becausedeletes and the occasional reinsert are the only updates that theyreceive, except for the daily distribute process.

We don't do anything to send queries to the hot shard "first" ... it issimply listed first in the shards parameter on what we call the brokercore. Average response time on the incremental is single digit, theother shards average at about 30 to 40 milliseconds. Median numbers(SOLR-1972 patch) are much better.


Thanks,
Shawn

Re: hot shard concept

Reply via email to