Michael,

Interesting. I'm still unfamiliar with the limitations (if any) of aliasing. Does the architecture utilize realtime get?

On Nov 18, 2014 11:49 AM, "Michael Della Bitta" <michael.della.bi...@appinions.com> wrote:
> We're achieving some success by treating aliases as collections and
> collections as shards.
>
> More specifically, there's a read alias that spans all the collections,
> and a write alias that points at the 'latest' collection. Every week, I
> create a new collection, add it to the read alias, and point the write
> alias at it.
>
> Michael
>
> On 11/14/14 07:06, Toke Eskildsen wrote:
>
>> Patrick Henry [patricktheawesomeg...@gmail.com] wrote:
>>
>>> I am working with a Solr collection that is several terabytes in size
>>> over several hundred million documents. Each document is very rich, and
>>> over the past few years we have consistently quadrupled the size of our
>>> collection annually. Unfortunately, this sits on a single node with
>>> only a few hundred megabytes of memory - so our performance is less
>>> than ideal.
>>
>> I assume you mean gigabytes of memory. If you have not already done so,
>> switching to SSDs for storage should buy you some more time.
>>
>>> [Going for SolrCloud] We are continuously adding documents and never
>>> change existing ones. Based on that, one individual recommended that I
>>> implement custom hashing and route the latest documents to the shard
>>> with the least documents, and when that shard fills up, add a new shard
>>> and index on the new shard, rinse and repeat.
>>
>> We have quite a similar setup, where we produce a never-changing shard
>> once every 8 days and add it to our cloud. One could also combine this
>> setup with a single live shard, for keeping the full index constantly up
>> to date. The memory overhead of running an immutable shard is smaller
>> than that of a mutable one, and it is easier to fine-tune. It also allows
>> you to optimize the index down to a single segment, which requires a bit
>> less processing power and saves memory when faceting. There's a
>> description of our setup at
>> http://sbdevel.wordpress.com/net-archive-search/
>>
>> From an administrative point of view, we like having complete control
>> over each shard. We keep track of what goes in it, and in case of schema
>> or analyzer chain changes, we can rebuild each shard one at a time and
>> deploy them continuously, instead of having to rebuild everything in one
>> go on a parallel setup. Of course, fundamental changes to the schema
>> would require a complete rebuild before deploy, so we hope to avoid that.
>>
>> - Toke Eskildsen
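A minimal sketch of the weekly alias rotation Michael describes, written against the SolrCloud Collections API. It only builds the request URLs rather than sending them. The host, collection names (docs_2014w45 and so on), alias names (docs_read, docs_write), shard count and config set name are all hypothetical placeholders. The one behavior it relies on is that issuing CREATEALIAS with a name that already exists replaces that alias, which is how both aliases get re-pointed each week.

```python
from urllib.parse import urlencode

# Hypothetical Collections API endpoint; adjust for a real cluster.
COLLECTIONS_API = "http://localhost:8983/solr/admin/collections"


def api_url(params):
    """Build one Collections API request URL from a dict of parameters."""
    return COLLECTIONS_API + "?" + urlencode(params)


def weekly_rotation(existing, new_collection, read_alias, write_alias,
                    num_shards=1, config_name="docs_conf"):
    """Return, in order, the Collections API calls for one weekly rotation.

    All names and defaults here are hypothetical placeholders.
    """
    all_collections = list(existing) + [new_collection]
    return [
        # 1. Create this week's collection.
        api_url({"action": "CREATE", "name": new_collection,
                 "numShards": num_shards,
                 "collection.configName": config_name}),
        # 2. Re-point the read alias at every collection, old and new
        #    (CREATEALIAS on an existing alias name replaces it).
        api_url({"action": "CREATEALIAS", "name": read_alias,
                 "collections": ",".join(all_collections)}),
        # 3. Re-point the write alias at the new collection only.
        api_url({"action": "CREATEALIAS", "name": write_alias,
                 "collections": new_collection}),
    ]


if __name__ == "__main__":
    for url in weekly_rotation(["docs_2014w45", "docs_2014w46"], "docs_2014w47",
                               read_alias="docs_read", write_alias="docs_write"):
        print(url)
```

Indexing clients only ever talk to the write alias and search clients only to the read alias, so neither needs reconfiguring when the week rolls over.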
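The routing policy recommended to Patrick (send new documents to the shard with the fewest documents; once that shard is full, add a new shard and continue there) can be sketched as a pure function. This illustrates the decision logic only. A plain least-document-count rule stands in for the custom hashing mentioned in the thread, and the capacity figure and shard names are hypothetical. In SolrCloud the "create" branch would presumably map to a CREATESHARD call on a collection that uses the implicit router, but that mapping is an assumption here.

```python
def pick_target_shard(doc_counts, capacity, batch_size, new_shard_name):
    """Decide where the next batch of new documents should be indexed.

    doc_counts:     dict mapping shard name -> current document count
    capacity:       hypothetical per-shard document limit
    batch_size:     number of documents about to be indexed
    new_shard_name: name to use if a new shard has to be added

    Returns (shard_name, needs_create). The batch goes to the shard with the
    fewest documents if it still has room; otherwise a new shard is needed.
    """
    if doc_counts:
        smallest = min(doc_counts, key=doc_counts.get)
        if doc_counts[smallest] + batch_size <= capacity:
            return smallest, False
    return new_shard_name, True


def simulate(batches, capacity):
    """Replay a stream of batch sizes and return the final per-shard counts."""
    counts = {}
    for batch_size in batches:
        target, needs_create = pick_target_shard(
            counts, capacity, batch_size, f"shard{len(counts) + 1}")
        if needs_create:
            counts[target] = 0  # in SolrCloud this step would add a shard
        counts[target] += batch_size
    return counts


if __name__ == "__main__":
    print(simulate([40, 40, 40, 40, 40], capacity=100))
    # -> {'shard1': 80, 'shard2': 80, 'shard3': 40}
```

With variable batch sizes an older shard that still has leftover room can be picked again, so shards are only mostly append-then-freeze. A strictly immutable scheme like Toke's would route only to the newest shard and optimize the older ones down to a single segment.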