One other option is to index "somewhere else", then use the Collections API ADDREPLICA command to create replicas on your prod cluster. Then DELETEREPLICA the copies on the nodes that are "somewhere else".
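An untested sketch of that, via the Collections API (collection, shard, and
node names below are placeholders for your setup; repeat per shard):

  # Pull a copy of the shard onto one of your prod nodes
  curl 'http://prodhost:8983/solr/admin/collections?action=ADDREPLICA&collection=coll3&shard=shard1&node=prodhost:8983_solr'

  # Once the new replica is active, drop the one on the "somewhere else" node.
  # The replica name (e.g. core_node2) is visible via action=CLUSTERSTATUS.
  curl 'http://prodhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=coll3&shard=shard1&replica=core_node2'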
Best,
Erick

On Jun 21, 2016 4:27 PM, "Jeff Wartes" <jwar...@whitepages.com> wrote:

There’s no official way of doing #1, but there are some less official ways:

1. The Backup/Restore API provides some hooks into loading pre-existing data
   dirs into an existing collection. Lots of caveats.
2. If you don’t have many shards, there’s always rsync/reload.
3. There are some third-party tools that help with this kind of thing:
   a. https://github.com/whitepages/solrcloud_manager (primarily a command line tool)
   b. https://github.com/bloomreach/solrcloud-haft (primarily a library)

For #2, absolutely. Spin up some new nodes in your cluster, and then use the
“createNodeSet” parameter when creating the new collection to restrict it to
those new nodes (a sketch follows the quoted message below):
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api1

On 6/21/16, 12:33 PM, "Kelly, Frank" <frank.ke...@here.com> wrote:

>We have about 200 million documents (~70 GB) we need to keep indexed
>across 3 collections.
>
>Currently 2 of the 3 collections are already indexed (roughly 90M docs).
>
>We’d like to create the remaining collection (about 100M documents) while
>minimizing the performance impact on the existing collections on the Solr
>servers during that time.
>
>Is there some way to do this, either by
>
> 1. Creating the collection in another environment and shipping the
>    (underlying Lucene) index files
> 2. Creating the collection on (dedicated) new machines that we add to the
>    SolrCloud cluster?
>
>Thoughts, comments or suggestions appreciated,
>
>Best
>
>-Frank Kelly
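A rough sketch of the createNodeSet approach Jeff describes above, assuming
two freshly added nodes (host names, ports, shard counts, and the config name
are placeholders):

  # Create the new collection only on the new nodes, so the heavy initial
  # indexing stays off the machines serving the existing collections
  curl 'http://prodhost:8983/solr/admin/collections?action=CREATE&name=coll3&numShards=2&replicationFactor=1&collection.configName=coll3_conf&createNodeSet=newhost1:8983_solr,newhost2:8983_solr'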