Hi Shawn,

Thanks for sharing your story. Let me make sure I understand it correctly:
How do you keep the incremental shard slim enough over time? Do you periodically redistribute the documents from it onto the cold shards? If yes, how do you do it technically: the Lucene low-level way or the Solr / SolrJ way?

-dmitry

On Mon, Oct 29, 2012 at 7:17 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 10/29/2012 7:55 AM, Dmitry Kan wrote:
>
>> Hi everyone,
>>
>> At this year's Berlin Buzzwords conference someone (Sematext?) described
>> a technique of a hot shard. The idea is to have a slim shard to maximize
>> the update throughput during the day (when millions of docs need to be
>> posted) and to make sure the indexed documents are immediately
>> searchable. At the end of the day, the day's documents are moved to cold
>> shards. If I'm not mistaken, this was implemented for ElasticSearch. I'm
>> currently implementing something similar (but pretty tailored to our
>> logical sharding use case) for Solr (3.x). The feature set looks roughly
>> like this:
>>
>> 1) The front-end Solr (query router) is aware of the hot shard: it
>> directs the incoming queries to the hot and "cold" shards.
>> 2) New incoming documents are directed first to the hot shard and then
>> periodically (like once a day or once a week) moved over to the
>> closest-in-time cold shard. And for that...
>> 3) The hot shard index is partitioned at a low level using Lucene's
>> IndexReader / IndexWriter, with the implementation based on [1], [2] and
>> customized to logical (time-based) sharding.
>>
>> The question is: is doing index partitioning at a low level a good way of
>> implementing the hot shard concept? That is, is there anything better
>> operationally, from the point of view of disaster recovery / search
>> cluster support? Am I missing some obvious Solr-ish solution?
>> Doing the periodic hot shard cleaning and re-posting of its source
>> documents to the closest cold shard instead is less modular and hence
>> more complicated operationally for us.
>>
>> Please let me know if you need more details or if the problem isn't
>> clear enough. Thanks.
>>
>> [1] http://blog.foofactory.fi/2008/01/regenerating-equally-sized-shards-from.html
>> [2] https://github.com/HON-Khresmoi/hash-based-index-splitter
>
> This is exactly how I set up my indexing; it has been that way since early
> 2010, when we first started using Solr 1.4.0. Now we are on 3.5, and an
> upgrade to 4.1 (branch_4x) is in the works. Borrowing terminology from our
> previous search product, we call the hot shard an "incremental" shard. My
> SolrJ indexing application takes care of all management of which documents
> are in the incremental shard and which are in the large shards. We call
> the large ones "static" shards because deletes and the occasional reinsert
> are the only updates they receive, apart from the daily distribute
> process.
>
> We don't do anything to send queries to the hot shard "first" ... it is
> simply listed first in the shards parameter on what we call the broker
> core. Average response time on the incremental shard is in the single
> digits of milliseconds; the other shards average about 30 to 40
> milliseconds. Median numbers (with the SOLR-1972 patch) are much better.
>
> Thanks,
> Shawn
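
P.S. To make my question a bit more concrete, below is a rough sketch of what I mean by the "SolrJ way" of redistribution: query a day's worth of documents out of the hot (incremental) shard, re-post them to the target cold core, and only then delete them from the hot core. This is just an illustration, not your actual process: the core URLs, the timestamp field and the date range are made up, and it assumes all fields are stored so documents can be rebuilt from query results.

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class HotToColdMover {

    public static void main(String[] args) throws Exception {
        // Hypothetical core URLs, purely for illustration.
        CommonsHttpSolrServer hot  = new CommonsHttpSolrServer("http://localhost:8983/solr/hot");
        CommonsHttpSolrServer cold = new CommonsHttpSolrServer("http://localhost:8983/solr/cold_2012_10");

        // Select yesterday's documents from the hot shard.
        // Assumes a stored "timestamp" field; in reality all fields must be
        // stored for this copy-by-query approach to work.
        String range = "timestamp:[2012-10-28T00:00:00Z TO 2012-10-29T00:00:00Z]";
        SolrQuery query = new SolrQuery(range);
        query.setRows(1000); // a real run would page through the results in chunks

        QueryResponse rsp = hot.query(query);
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (SolrDocument doc : rsp.getResults()) {
            SolrInputDocument in = new SolrInputDocument();
            for (String field : doc.getFieldNames()) {
                in.addField(field, doc.getFieldValue(field));
            }
            batch.add(in);
        }

        // Re-post to the cold shard first, then remove from the hot shard,
        // so the documents stay searchable on at least one shard throughout.
        if (!batch.isEmpty()) {
            cold.add(batch);
            cold.commit();
            hot.deleteByQuery(range);
            hot.commit();
        }
    }
}

The alternative I referred to as the "Lucene low-level way" would instead split the hot index itself with IndexReader / IndexWriter, as in [1] and [2], without re-posting source documents.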