solr-user, hello

I keep forgetting to mention one thing during our discussion. Our data is Chinese news articles, and we currently use the CJK tokenizer (i.e. 2-gram). Indexing is quite slow compared to indexing English articles. That's why I'm so worried about indexing performance on 10M Chinese docs and turned to studying SolrCloud. It could also be the reason we index 1M docs rather slowly. Frankly, for policy reasons we haven't delved into writing a better-performing Chinese tokenizer in past years (however, we have made a plan to write one next year using the MMSeg algorithm or 1-gram + a query preprocessor).
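To illustrate what "2-gram" tokenization means here, a simplified Python sketch of overlapping bigrams. This is not Lucene's CJKTokenizer/CJKBigramFilter code, only the same basic idea, and it assumes a pure-CJK input string with no spaces or Latin letters. Every character except the last starts a token, so an n-character run yields n-1 index terms, many of which (such as 文新 below) straddle a word boundary. That volume of terms is one plausible contributor to the slower indexing described above, though it is not measured here.

```python
def cjk_bigrams(text):
    """Split a run of CJK characters into overlapping 2-grams.

    Simplified sketch: assumes `text` is pure CJK (no spaces or Latin
    letters); a lone character is emitted as a single unigram.
    """
    if len(text) < 2:
        return [text] if text else []
    # Every position except the last starts one bigram.
    return [text[i:i + 2] for i in range(len(text) - 1)]


tokens = cjk_bigrams("中文新聞文章")  # "Chinese news article"
print(tokens)  # ['中文', '文新', '新聞', '聞文', '文章']
```

A dictionary-based segmenter such as MMSeg would, given a suitable dictionary, produce something closer to 中文 / 新聞 / 文章 instead: fewer terms, each a real word.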
----- Original Message -----
From: Erick Erickson
To: solr-user
Date: 2015-09-04, 00:07:43
Subject: Re: Re: Re: concept and choice: custom sharding or auto sharding?

bq: If you switch to SolrCloud, will you still keep numShards parameter to 1

Yes. Although if you want to add more replicas you might want to specify that.

For 10M documents, I wouldn't be very fancy. Indexing them shouldn't take very long, and I think your time would be better spent on other things than trying to get fancy with SPLITSHARD and the like. Just create a SolrCloud cluster with as many replicas as you want and index from scratch unless it's prohibitively expensive. I can index 200M docs on my local Mac Pro in a couple of hours. Is it really worth trying to do something you'll probably never do again (i.e. SPLITSHARD)?

If you really don't want to re-index _and_ you have only one shard in the master/slave setup, here's what I'd do to migrate:

1> Create a new SolrCloud cluster with exactly one node (i.e. the "leader").
2> Shut it down.
3> Copy the index from your master/slave setup to the new node, completely replacing the data directory.
4> Bring the node back up and check it.
5> Use the Collections API ADDREPLICA command to bring up as many replicas as you want; they'll pull down the index, and from that point on you should be good.
5a> In this case, it'll actually do a complete replication from the leader to the followers, but thereafter incremental updates will be sent to all the nodes in the cluster rather than relying on the older-style occasional master/slave replication.

Best,
Erick

On Thu, Sep 3, 2015 at 8:54 AM, scott chu <scott....@udngroup.com> wrote:
>
> solr-user, hello
>
> If you switch to SolrCloud, will you still keep the numShards parameter at 1? If
> you are migrating to SolrCloud and going to split that single shard into
> multiple shards, wouldn't you have to reindex the data? Is it possible to just
> put that single shard into SolrCloud and call the SPLITSHARD API to split it?
>
> I ask because I'd like to start with a master-slave architecture, since
> Erick suggested that 10M is not a "vast" amount. Later, I might migrate it
> to SolrCloud, possibly because I want to take advantage of the ZooKeeper
> functionality for HA/DR.
>
> ----- Original Message -----
> From: Toke Eskildsen
> To: solr-user
> Date: 2015-09-03, 18:33:39
> Subject: Re: Re: concept and choice: custom sharding or auto sharding?
>
> On Thu, 2015-09-03 at 18:24 +0800, Scott Chu wrote:
>> Do you use master-slave or SolrCloud for that single shard?
>
> For legacy reasons we are just using 2 fully independent Solrs, each
> indexing independently, with an Apache load balancer in front for the
> searches. It does give us the occasional hiccup, so we'll be switching
> to SolrCloud at some point.
>
> - Toke Eskildsen, State and University Library, Denmark
>
> -----
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 2015.0.6086 / Virus Database: 4409/10567 - Release Date: 09/03/15
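Erick's migration recipe can be expressed as Solr Collections API calls. The Python sketch below only builds the request URLs (nothing is sent over the network). The node address `localhost:8983`, the second node `solr2:8983_solr`, the collection name `news`, and the config set `news_conf` are hypothetical, and it assumes the single shard carries Solr's usual default name `shard1`. `CREATE`, `ADDREPLICA`, and `SPLITSHARD` are standard Collections API actions, but check the reference guide for your Solr version before relying on the exact parameters.

```python
from urllib.parse import urlencode

# Hypothetical address of one node in the new SolrCloud cluster.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, **params):
    """Build (but do not send) a Solr Collections API request URL."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


# Step 1: a one-shard, one-replica collection; its only core is the leader.
create = collections_api_url(
    "CREATE",
    name="news",
    numShards=1,
    replicationFactor=1,
    **{"collection.configName": "news_conf"},
)

# Steps 2-4 (stop the node, replace its data directory with the
# master/slave index, restart and check) happen on the file system.

# Step 5: add a replica on another node; it pulls a full copy of the
# index from the leader, then receives updates as they are indexed.
add_replica = collections_api_url(
    "ADDREPLICA", collection="news", shard="shard1", node="solr2:8983_solr"
)

# The call Scott asked about; Erick's advice was to re-index instead.
split = collections_api_url("SPLITSHARD", collection="news", shard="shard1")

for url in (create, add_replica, split):
    print(url)
```

Keeping the URL building separate from sending makes it easy to review each call (or paste it into curl) before touching a live cluster.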