Re: Re: concept and choice: custom sharding or auto sharding?

scott chu Wed, 02 Sep 2015 08:22:26 -0700

 
Hello solr-user,

Do you mean I only need to put the 10M documents in one index and copy it to 
many slaves in a classic Solr master-slave architecture to serve queries on the 
internet, and that query performance won't noticeably degrade? I did add 1M 
documents to one index on the master, with 2 slaves serving queries on the 
internet, and the query performance was rather poor. Why do you say "at 10M 
documents there's rarely a need to shard at all"? Do I have too few slaves? At 
what number of documents does sharding become worthwhile in SolrCloud?


----- Original Message ----- 
From: Erick Erickson 
To: solr-user 
Date: 2015-09-02, 23:00:29
Subject: Re: concept and choice: custom sharding or auto sharding?


Frankly, at 10M documents there's rarely a need to shard at all.
Why do you think you need to? This seems like adding
complexity for no good reason. Sharding should only really
be used when you have too many documents to fit on a single
shard as it adds some overhead, restricts some
possibilities (cross-core join for instance, a couple of
grouping options don't work in distributed mode etc.).

You can still run SolrCloud and have it manage multiple
_replicas_ of a single shard for HA/DR.
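
A single-shard, multi-replica collection like this can be requested
through the Collections API. The snippet below is only a sketch that
builds the request URL: the host, the collection name "news" and the
replica count are assumptions made for illustration, and a real setup
may need further parameters (for example a config set name).

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, replicas):
    """Build a Collections API CREATE URL for one shard with N replicas."""
    params = {
        "action": "CREATE",
        "name": name,                   # hypothetical collection name
        "numShards": 1,                 # a single shard, so no distributed search
        "replicationFactor": replicas,  # copies of that shard, for HA/DR
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Assumed local node; adjust host and port for your cluster.
print(create_collection_url("http://localhost:8983/solr", "news", 3))
```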

So this seems like an XY problem: you're asking specific
questions about shard routing because you think it'll
solve some problem without telling us what the problem
is.

Best,
Erick

On Wed, Sep 2, 2015 at 7:47 AM, scott chu <scott....@udngroup.com> wrote:
> I posted a question on Stack Overflow:
> http://stackoverflow.com/questions/32343813/custom-sharding-or-auto-sharding-on-solrcloud
> However, since this is a mailing list, I am reposting the question below to 
> ask for suggestions and a finer understanding of SolrCloud's document 
> routing behavior.
>
> I want to set up a SolrCloud cluster for over 10 million news articles. 
> After reading the article "Shards and Indexing Data in SolrCloud" in the 
> Apache Solr Reference Guide, I have a plan as follows:
>
> 1. Add the prefix ED2001! to the document ID, where ED stands for a 
>    newspaper source and 2001 is the year of the article's publication 
>    date, i.e. I want to put all news articles from a specific newspaper 
>    source published in a specific year into one shard.
> 2. Create the collection with router.name set to compositeId.
> 3. Add documents?
> 4. Query the collection?
>
> In practice, I have some questions:
>
> 1. How do I add documents based on this plan? Do I have to specify special 
>    parameters when updating the collection/core?
> 2. Is this called "custom sharding"? If not, what is "custom sharding"?
> 3. Is auto sharding a better choice for my case, since it offers 
>    shard splitting for when a shard gets too big?
> 4. Can I query without the _route_ parameter?
>
> EDIT @ 2015/9/2:
> This is how I think SolrCloud will behave: "The number of news articles 
> from a specific newspaper source in a specific year tends to be around a 
> fixed number, e.g. ED has around 80,000 articles every year, so each 
> shard's size won't increase dramatically. For next year's ED articles, I 
> only have to add the prefix 'ED2016!' to the document ID; SolrCloud will 
> create a new shard for me (which contains all ED2016 articles), and later 
> the leader will distribute replicas of this new shard to other nodes (one 
> replica per node other than the leader?)". Am I right? If so, there seems 
> to be no need for shard splitting.
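
For reference, the plan in the quoted message can be sketched as below.
With the compositeId router, the part of the ID before "!" is hashed to
pick one of the collection's existing shards, so documents sharing a
prefix land together; as far as I know a new prefix does not by itself
create a new shard. The helper names and the "title" field are
hypothetical and only for illustration; the JSON produced would be
posted to the collection's /update handler.

```python
import json


def make_doc_id(source, year, article_id):
    """Build a compositeId-style ID such as 'ED2001!12345'."""
    return f"{source}{year}!{article_id}"


def to_update_json(articles):
    """Serialize articles as a JSON array of Solr documents."""
    docs = [
        {
            "id": make_doc_id(a["source"], a["year"], a["article_id"]),
            "title": a["title"],  # hypothetical field name
        }
        for a in articles
    ]
    return json.dumps(docs)


def routed_query_params(source, year, q="*:*"):
    """Query parameters limiting the search to the shard for one prefix.

    Leaving out _route_ still works; the query then goes to all shards.
    """
    return {"q": q, "_route_": f"{source}{year}!"}


articles = [{"source": "ED", "year": 2001, "article_id": "12345",
             "title": "Example headline"}]
print(to_update_json(articles))
print(routed_query_params("ED", 2001))
```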

