Good eye, that should have been gigabytes.  When adding to the new shard,
is the shard already part of the collection?  What mechanism have you
found useful in accomplishing this (i.e. routing)?
On Nov 14, 2014 7:07 AM, "Toke Eskildsen" <t...@statsbiblioteket.dk> wrote:

> Patrick Henry [patricktheawesomeg...@gmail.com] wrote:
>
> > I am working with a Solr collection that is several terabytes in size
> > over several hundred million documents.  Each document is very rich,
> > and over the past few years we have consistently quadrupled the size
> > of our collection annually.  Unfortunately, this sits on a single node
> > with only a few hundred megabytes of memory - so our performance is
> > less than ideal.
>
> I assume you mean gigabytes of memory. If you have not already done so,
> switching to SSDs for storage should buy you some more time.
>
> > [Going for SolrCloud]  We are continuously adding documents and never
> > change existing ones.  Based on that, one individual recommended that I
> > implement custom hashing and route the latest documents to the shard
> > with the fewest documents, and when that shard fills up, add a new
> > shard and index on the new shard, rinse and repeat.
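
To make that concrete, the mechanism I had in mind is creating the
collection with router.name=implicit, growing it with CREATESHARD, and
naming the target shard via the _route_ parameter on each update.  A
rough, untested SolrJ sketch (the ZooKeeper address, collection and
shard names are invented):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class RouteToCurrentShard {
    public static void main(String[] args) throws Exception {
        // Hypothetical setup: the collection was created with
        //   /admin/collections?action=CREATE&name=bigcoll
        //       &router.name=implicit&shards=shard1&replicationFactor=1
        // and is grown later with
        //   /admin/collections?action=CREATESHARD&collection=bigcoll&shard=shard2
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("bigcoll");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-42");
        doc.addField("title", "example document");

        // With the implicit router, the _route_ request parameter names the
        // shard that receives the documents in this update request.
        UpdateRequest update = new UpdateRequest();
        update.add(doc);
        update.setParam("_route_", "shard2");  // the shard currently being filled
        update.process(solr);

        solr.commit();
        solr.shutdown();
    }
}

The appeal of the implicit router here is that nothing gets hashed or
re-balanced behind my back; each update goes exactly where we point it.
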
>
> We have quite a similar setup, where we produce a never-changing shard
> once every 8 days and add it to our cloud. One could also combine this
> setup with a single live shard, for keeping the full index constantly up to
> date. An immutable shard has a smaller memory overhead than a mutable one
> and is easier to fine-tune. It also allows you to optimize the
> index down to a single segment, which requires a bit less processing power
> and saves memory when faceting. There's a description of our setup at
> http://sbdevel.wordpress.com/net-archive-search/
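
If I am reading the single-segment part right, that is essentially a
forced merge on the core behind a shard once it is full and will no
longer change.  A sketch of what I would run (host and core name are
invented, so treat this as illustrative only):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class OptimizeFrozenShard {
    public static void main(String[] args) throws Exception {
        // Hypothetical URL of the core backing the shard that just filled up.
        HttpSolrServer core =
                new HttpSolrServer("http://indexer01:8983/solr/bigcoll_shard2_replica1");

        // waitFlush=true, waitSearcher=true, maxSegments=1:
        // merge the now-immutable shard down to a single segment.
        core.optimize(true, true, 1);

        core.shutdown();
    }
}
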
>
> From an administrative point of view, we like having complete control over
> each shard. We keep track of what goes in it and in case of schema or
> analyzer chain changes, we can re-build each shard one at a time and deploy
> them continuously, instead of having to re-build everything in one go on a
> parallel setup. Of course, fundamental changes to the schema would require
> a complete re-build before deploy, so we hope to avoid that.
>
> - Toke Eskildsen
>
