Hello solr-user,

I keep forgetting to mention one thing in this discussion. Our data is
Chinese news articles, and we currently use the CJK tokenizer (i.e. 2-gram).
Indexing is quite slow compared to indexing English articles.
That's why I am so worried about indexing performance on 10M Chinese docs and
have turned to studying SolrCloud. It could also be the reason why indexing 1M
docs is rather slow for us. Frankly, we haven't delved into writing a
better-performing Chinese tokenizer in past years for policy reasons (however,
we do plan to write one next year using the MMSeg algorithm or 1-gram plus a
query preprocessor).
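
To illustrate why 2-gram tokenization tends to produce many tokens, here is a
minimal Python sketch of overlapping bigrams. It is a simplified illustration
only, not the actual Lucene/Solr CJK tokenizer, and the function name
cjk_bigrams is hypothetical.

```python
def cjk_bigrams(text):
    """Return overlapping 2-character tokens for a run of CJK text.

    Simplified illustration only; the real analyzer also handles
    mixed scripts, punctuation, and token offsets.
    """
    if len(text) < 2:
        # A lone character is emitted as-is; empty input yields no tokens.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


tokens = cjk_bigrams("中文新聞文章")
print(tokens)       # a run of 6 characters yields 5 overlapping bigrams
print(len(tokens))
```

A run of n characters yields n - 1 tokens, so the number of indexed terms grows
with the character count rather than with the word count.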

----- Original Message ----- 
From: Erick Erickson 
To: solr-user 
Date: 2015-09-04, 00:07:43
Subject: Re: Re: Re: concept and choice: custom sharding or auto sharding?


bq: If you switch to SolrCloud, will you still keep numShards parameter to 1

yes. Although if you want to add more replicas you might want to specify that.

For 10M documents, I wouldn't be very fancy. Indexing them shouldn't take
very long, and I think your time would be better spent on other things than
trying to get fancy with SPLITSHARD and the like. Just create a SolrCloud
cluster with as many replicas as you want and index from scratch unless it's
prohibitively expensive.
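
As a rough sketch of the "one shard, several replicas" setup described above,
the following Python builds a Collections API CREATE request URL. The base URL,
the collection name "news", and the helper name are assumptions for
illustration; depending on your setup you may also need to pass a config set
name.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, replication_factor):
    """Build a Collections API CREATE URL for a single-shard collection.

    base_url and name are placeholders; adjust them to your cluster.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": 1,  # keep a single shard, as discussed
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


url = create_collection_url("http://localhost:8983/solr", "news", 2)
print(url)
```

Requesting this URL (for example with curl or a browser) against a running
SolrCloud node would ask it to create the collection.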

I can index 200M docs on my local Mac Pro in a couple of hours. Is it really
worth trying to do something you'll probably never do again (i.e. SPLITSHARD)?

If you really don't want to re-index _and_ you have only one shard in the
master/slave setup, here's what I'd do to migrate
1> create a new SolrCloud cluster with exactly one node (i.e. the "leader").
2> shut it down
3> copy the index from your master/slave to the new node, completely
     replacing the data directory
4> bring the node back up and check it.
5> use the Collections API ADDREPLICA command to bring up as many
    replicas as you want; they'll pull down the index, and from that point on
    you should be good.
5a> In this case, it'll actually do a complete replication from the leader to
     the followers, but thereafter incremental updates will be sent to all
     the nodes in the cluster rather than the older style master/slave
     occasional replication.
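
Step 5 above could be sketched as follows: a small Python helper that builds
the Collections API ADDREPLICA request URL. The base URL, the collection name
"news", and the node name are assumptions for illustration; "shard1" is assumed
to be the name of the single shard.

```python
from urllib.parse import urlencode


def add_replica_url(base_url, collection, shard="shard1", node=None):
    """Build a Collections API ADDREPLICA URL.

    node is optional; when omitted, Solr chooses where to place the replica.
    All argument values used below are placeholders.
    """
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        params["node"] = node
    return f"{base_url}/admin/collections?{urlencode(params)}"


print(add_replica_url("http://localhost:8983/solr", "news"))
print(add_replica_url("http://localhost:8983/solr", "news", node="host2:8983_solr"))
```

Issue one such request per replica you want; each new replica then pulls the
full index from the leader as described in 5a.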

Best,
Erick

On Thu, Sep 3, 2015 at 8:54 AM, scott chu <scott....@udngroup.com> wrote:
>
> Hello solr-user,
>
> If you switch to SolrCloud, will you still keep the numShards parameter at 1?
> If you are migrating to SolrCloud and going to split that single shard into
> multiple shards, wouldn't you have to reindex the data? Is it possible to just
> put that single shard into SolrCloud and call the SPLITSHARD API to split it?
>
> I ask this because I'd like to first try the master-slave architecture, since
> Erick suggested that 10M is not a "vast" thing. Then later, I might migrate it
> to SolrCloud, possibly because I want to take advantage of the ZooKeeper
> functionality for HA/DR.
>
> ----- Original Message -----
> From: Toke Eskildsen
> To: solr-user
> Date: 2015-09-03, 18:33:39
> Subject: Re: Re: concept and choice: custom sharding or auto sharding?
>
> On Thu, 2015-09-03 at 18:24 +0800, Scott Chu wrote:
>> Do you use master-slave or SolrCloud for that single shard?
>
> Due to legacy reasons we are just using 2 fully independent Solrs, each
> indexing independently, with an Apache load balancer in front for the
> searches. It does give us the occasional hiccup, so we'll be switching
> to SolrCloud at some point.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>
>
> -----
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 2015.0.6086 / Virus Database: 4409/10567 - Release Date: 09/03/15
>
>
>
>