The collections we index under this multi-collection alias do not use
real-time get, no. We have other collections behind single-collection
aliases where get calls seem to work, but I'm not clear on whether those
calls are real-time. It seems like it would be easy for you to test, but
just be aware that there are multiple things you'd have to prove (see the
sketch after this list):
1. Whether get calls are real-time
2. Whether they work against multi-collection aliases as opposed to
single-collection aliases.
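Here's a rough modern-SolrJ sketch of such a test (the ZooKeeper address,
alias name, and document id are made up; note that real-time get also
requires updateLog to be enabled in solrconfig.xml):

  import java.util.Collections;
  import java.util.Optional;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrInputDocument;

  public class RtgAliasProbe {
      public static void main(String[] args) throws Exception {
          try (CloudSolrClient client = new CloudSolrClient.Builder(
                  Collections.singletonList("localhost:9983"),
                  Optional.empty()).build()) {
              // Index a document WITHOUT committing; only real-time get
              // (backed by the update log) should be able to see it.
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", "rtg-probe-1");
              client.add("read_alias", doc);

              // getById() uses the /get handler. A non-null result before
              // any commit proves both points at once: the lookup is
              // real-time AND it resolves through the multi-collection
              // alias.
              SolrDocument found = client.getById("read_alias", "rtg-probe-1");
              System.out.println(found != null ? "real-time get works" : "no hit");
          }
      }
  }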
Also be aware that there were some issues with alias visibility and
SolrJ clients prior to roughly 4.5, and I believe there were early issues
with writing to aliases before then as well. I'd suggest using a
relatively modern release.
Michael
On 11/19/14 19:56, Patrick Henry wrote:
Michael,
Interesting. I'm still unfamiliar with the limitations (if any) of
aliasing. Does your architecture utilize real-time get?
On Nov 18, 2014 11:49 AM, "Michael Della Bitta" <
michael.della.bi...@appinions.com> wrote:
We're achieving some success by treating aliases as collections and
collections as shards.
More specifically, there's a read alias that spans all the collections,
and a write alias that points at the 'latest' collection. Every week, I
create a new collection, add it to the read alias, and point the write
alias at it.
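If it's useful, the weekly rollover is scriptable against the Collections
API. A minimal modern-SolrJ sketch, with invented collection and configset
names; one wrinkle worth knowing is that CREATEALIAS replaces the alias
definition outright, so "adding" to the read alias means re-issuing it
with the full list:

  import java.util.Collections;
  import java.util.Optional;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.client.solrj.request.CollectionAdminRequest;

  public class WeeklyRollover {
      public static void main(String[] args) throws Exception {
          try (CloudSolrClient client = new CloudSolrClient.Builder(
                  Collections.singletonList("localhost:9983"),
                  Optional.empty()).build()) {
              String newColl = "coll_2014_47";  // hypothetical naming scheme
              // Create this week's collection from a shared configset.
              CollectionAdminRequest.createCollection(newColl, "myconf", 1, 1)
                      .process(client);
              // CREATEALIAS replaces the alias wholesale, so re-issue the
              // read alias with the full list of collections it spans.
              CollectionAdminRequest.createAlias("read_alias",
                      "coll_2014_45,coll_2014_46," + newColl).process(client);
              // The write alias points only at the 'latest' collection.
              CollectionAdminRequest.createAlias("write_alias", newColl)
                      .process(client);
          }
      }
  }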
Michael
On 11/14/14 07:06, Toke Eskildsen wrote:
Patrick Henry [patricktheawesomeg...@gmail.com] wrote:
I am working with a Solr collection that is several terabytes in size
over several hundred million documents. Each document is very rich, and
over the past few years we have consistently quadrupled the size of our
collection annually. Unfortunately, this sits on a single node with only
a few hundred megabytes of memory - so our performance is less than ideal.
I assume you mean gigabytes of memory. If you have not already done so,
switching to SSDs for storage should buy you some more time.
[Going for SolrCloud] We are continuously adding documents and never
change existing ones. Based on that, one individual recommended that I
implement custom hashing and route the latest documents to the shard with
the fewest documents, and when that shard fills up, add a new shard and
index on the new shard, rinse and repeat.
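As an aside, that suggestion maps more naturally onto Solr's implicit
router (which skips hashing entirely and leaves shard placement to the
client) than onto true custom hashing. A sketch under that assumption,
with invented names:

  import java.util.Collections;
  import java.util.Optional;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.client.solrj.request.CollectionAdminRequest;
  import org.apache.solr.client.solrj.request.UpdateRequest;
  import org.apache.solr.common.SolrInputDocument;

  public class ManualShardRouting {
      public static void main(String[] args) throws Exception {
          try (CloudSolrClient client = new CloudSolrClient.Builder(
                  Collections.singletonList("localhost:9983"),
                  Optional.empty()).build()) {
              // The implicit router leaves shard placement to the client.
              CollectionAdminRequest.createCollectionWithImplicitRouter(
                      "bigindex", "myconf", "shard1", 1).process(client);

              // When the current shard fills up, add another one.
              CollectionAdminRequest.createShard("bigindex", "shard2")
                      .process(client);

              // Route new documents to the newest (emptiest) shard.
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", "doc-1");
              UpdateRequest req = new UpdateRequest();
              req.add(doc);
              req.setParam("_route_", "shard2");
              req.process(client, "bigindex");
          }
      }
  }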
We have quite a similar setup, where we produce a never-changing shard
once every 8 days and add it to our cloud. One could also combine this
setup with a single live shard to keep the full index constantly up to
date. The memory overhead of running an immutable shard is smaller than
that of a mutable one, and it is easier to fine-tune. It also allows you
to optimize the
index down to a single segment, which requires a bit less processing power
and saves memory when faceting. There's a description of our setup at
http://sbdevel.wordpress.com/net-archive-search/
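The single-segment optimization itself is one SolrJ call once a
collection has gone read-only (invented collection name; optimize is a
forceMerge and is I/O-heavy, so we only run it after the final write):

  import java.util.Collections;
  import java.util.Optional;
  import org.apache.solr.client.solrj.impl.CloudSolrClient;

  public class MergeDown {
      public static void main(String[] args) throws Exception {
          try (CloudSolrClient client = new CloudSolrClient.Builder(
                  Collections.singletonList("localhost:9983"),
                  Optional.empty()).build()) {
              // forceMerge the now-immutable collection down to a single
              // segment: waitFlush=true, waitSearcher=true, maxSegments=1.
              client.optimize("coll_2014_46", true, true, 1);
          }
      }
  }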
From an administrative point of view, we like having complete control
over each shard. We keep track of what goes into it, and in case of
schema or analysis chain changes, we can re-build each shard one at a
time and deploy them continuously, instead of having to re-build
everything in one go on a parallel setup. Of course, fundamental changes
to the schema would require a complete re-build before deployment, so we
hope to avoid that.
- Toke Eskildsen