Hi Shawn,

Thanks for sharing your story. Let me make sure I've got it right:

How do you keep the incremental shard slim enough over time? Do you
periodically redistribute its documents onto the cold shards? If so, how
do you do it technically: the low-level Lucene way or the Solr / SolrJ way?
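
To clarify what I mean by the "Solr / SolrJ way": roughly the untested
sketch below, i.e. query a day's worth of documents out of the incremental
shard, re-post them to a cold shard, and only then delete them from the
incremental (core URLs, the "indexed_at" field and the row limit are made
up; paging and error handling are omitted):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class DistributeSketch {
  public static void main(String[] args) throws Exception {
    // Core URLs are invented for the example.
    SolrServer inc = new CommonsHttpSolrServer("http://localhost:8983/solr/inc");
    SolrServer cold = new CommonsHttpSolrServer("http://localhost:8983/solr/cold1");

    // Yesterday's documents, selected on a made-up "indexed_at" date field.
    // A real job would freeze the boundaries instead of re-evaluating NOW.
    String slice = "indexed_at:[NOW/DAY-1DAY TO NOW/DAY]";
    QueryResponse rsp = inc.query(new SolrQuery("*:*")
        .addFilterQuery(slice).setRows(1000)); // paging omitted

    // Re-post to the closest cold shard (requires all fields to be stored).
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (SolrDocument d : rsp.getResults()) {
      batch.add(ClientUtils.toSolrInputDocument(d));
    }
    cold.add(batch);
    cold.commit();

    // Only then drop the slice from the incremental shard.
    inc.deleteByQuery(slice);
    inc.commit();
  }
}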

-dmitry

On Mon, Oct 29, 2012 at 7:17 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 10/29/2012 7:55 AM, Dmitry Kan wrote:
>
>> Hi everyone,
>>
>> At this year's Berlin Buzzwords conference someone (Sematext?) described
>> a hot shard technique. The idea is to have a slim shard that maximizes
>> update throughput during the day (when millions of docs need to be
>> posted) and makes sure the indexed documents are immediately searchable.
>> At the end of the day, the day's documents are moved to cold shards. If
>> I'm not mistaken, this was implemented for ElasticSearch. I'm currently
>> implementing something similar (but pretty tailored to our logical
>> sharding use case) for Solr (3.x). The feature set looks roughly like
>> this:
>>
>> 1) The front-end Solr (query router) is aware of the hot shard: it
>> directs incoming queries to both the hot and the "cold" shards.
>> 2) New incoming documents are directed to the hot shard first and then
>> periodically (say, once a day or once a week) moved over to the
>> closest-in-time cold shard. And for that...
>> 3) The hot shard index is partitioned at a low level using Lucene's
>> IndexReader / IndexWriter, with the implementation based on [1] and [2]
>> and customized to our logical (time-based) sharding (rough sketch below).
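>>
>> Roughly, for 3) we use the trick from [1]: wrap the hot index in a
>> reader that pretends every document outside the wanted time slice is
>> deleted, and addIndexes() that view into the cold shard's index. A
>> heavily simplified, untested sketch (computing the per-slice bitset
>> from the date field is not shown):
>>
>> import org.apache.lucene.analysis.Analyzer;
>> import org.apache.lucene.index.FilterIndexReader;
>> import org.apache.lucene.index.IndexReader;
>> import org.apache.lucene.index.IndexWriter;
>> import org.apache.lucene.index.IndexWriterConfig;
>> import org.apache.lucene.store.Directory;
>> import org.apache.lucene.util.OpenBitSet;
>> import org.apache.lucene.util.Version;
>>
>> // Pretends that docs outside the slice are deleted, so that
>> // IndexWriter.addIndexes() copies only the wanted slice.
>> class SliceReader extends FilterIndexReader {
>>   private final OpenBitSet keep; // docs belonging to the slice
>>
>>   SliceReader(IndexReader in, OpenBitSet keep) {
>>     super(in);
>>     this.keep = keep;
>>   }
>>
>>   @Override public boolean isDeleted(int n) {
>>     return !keep.get(n) || in.isDeleted(n);
>>   }
>>   @Override public boolean hasDeletions() { return true; }
>>   @Override public int numDocs() { return (int) keep.cardinality(); }
>> }
>>
>> class SliceMover {
>>   // Merge one time slice of the hot index into a cold shard's index.
>>   void moveSlice(Directory hotDir, Directory coldDir, OpenBitSet keep,
>>                  Analyzer analyzer) throws Exception {
>>     IndexReader hot = IndexReader.open(hotDir);
>>     IndexWriter cold = new IndexWriter(coldDir,
>>         new IndexWriterConfig(Version.LUCENE_35, analyzer));
>>     cold.addIndexes(new SliceReader(hot, keep));
>>     cold.commit();
>>     cold.close();
>>     hot.close();
>>   }
>> }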
>>
>>
>> The question is: is low-level index partitioning a good way of
>> implementing the hot shard concept? That is, is there anything better
>> operationally from the point of view of disaster recovery / search
>> cluster support? Am I missing some obvious Solr-ish solution? The
>> alternative, periodically cleaning the hot shard and re-posting its
>> source documents to the closest cold shard, is less modular and hence
>> operationally more complicated for us.
>>
>> Please let me know if you need more details or if the problem isn't
>> clear enough. Thanks.
>>
>> [1]
>> http://blog.foofactory.fi/2008/01/regenerating-equally-sized-shards-from.html
>> [2] https://github.com/HON-Khresmoi/hash-based-index-splitter
>>
>
> This is exactly how I set up my indexing; it has been that way since
> early 2010, when we first started using Solr 1.4.0.  Now we are on 3.5,
> and an upgrade to 4.1 (branch_4x) is in the works.  Borrowing the
> terminology from our previous search product, we call the hot shard an
> "incremental" shard.  My SolrJ indexing application takes care of all
> the management of which documents are in the incremental shard and which
> are in the large shards.  We call the large ones "static" shards because
> deletes and the occasional reinsert are the only updates they receive,
> apart from the daily distribute process.
>
> We don't do anything to send queries to the hot shard "first" ... it is
> simply listed first in the shards parameter on what we call the broker
> core.  Average response time on the incremental shard is in the single
> digits (milliseconds); the other shards average about 30 to 40
> milliseconds.  Median numbers (with the SOLR-1972 patch) are much better.
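>
> For example (host and core names made up for illustration), the shards
> parameter on the broker core is just a comma-separated list with the
> incremental core first, something like:
>
> shards=idx1:8983/solr/inc,idx1:8983/solr/s0,idx2:8983/solr/s1,idx2:8983/solr/s2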
>
> Thanks,
> Shawn
>
>
