Yes, shard splitting only helps with managing large clusters and improving query performance. In my case the index has outgrown the existing shards across the collection (they have no capacity left), so adding a new shard would help, and for that I would have to reindex.
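To check my understanding of the arithmetic in the quoted example below, here is a minimal Python sketch. It assumes a split divides the documents evenly; as far as I understand, Solr actually splits by hash range, so real sub-shard counts would only be approximately equal. The function name `split_counts` is made up for illustration and is not part of any Solr API.

```python
def split_counts(doc_count, parts=2):
    """Divide doc_count as evenly as possible across `parts` sub-shards.

    Assumption: an even split. A real split follows hash ranges, so the
    actual counts would only be roughly equal.
    """
    base, extra = divmod(doc_count, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


# Two shards with 3m docs each, as in the quoted example.
before = {"shard1": 3_000_000, "shard2": 3_000_000}

# Split shard1 into two sub-shards; shard2 is left untouched.
after = {f"shard1_{i}": n for i, n in enumerate(split_counts(before["shard1"], 2))}
after["shard2"] = before["shard2"]

print(after)
# The total number of documents is unchanged by the split.
print(sum(before.values()) == sum(after.values()))
```

The point of the sketch is only that a split redistributes the documents a shard already holds; it does not multiply them.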
> On 01-Aug-2015, at 6:34 pm, Upayavira <u...@odoko.co.uk> wrote:
>
> Erm, that doesn't seem to make sense. Seems like you are talking about
> *merging* shards.
>
> Say you had two shards, 3m docs each:
>
> shard1: 3m docs
> shard2: 3m docs
>
> If you split shard1, you would have:
>
> shard1_0: 1.5m docs
> shard1_1: 1.5m docs
> shard2: 3m docs
>
> You could, of course, then split shard2. You could also split shard1
> into three parts instead, if you preferred:
>
> shard1_0: 1m docs
> shard1_1: 1m docs
> shard1_2: 1m docs
> shard2: 3m docs
>
> Upayavira
>
>> On Sun, Aug 2, 2015, at 12:25 AM, Nagasharath wrote:
>> If my current shard is holding 3 million documents will the new subshard
>> after splitting also be able to hold 3 million documents?
>> If that is the case After shard splitting the sub shards should hold 6
>> million documents if a shard is split in to two. Am I right?
>>
>>> On 01-Aug-2015, at 5:43 pm, Upayavira <u...@odoko.co.uk> wrote:
>>>
>>>> On Sat, Aug 1, 2015, at 11:29 PM, naga sharathrayapati wrote:
>>>> I am using solrj to index documents
>>>>
>>>> i agree with you regarding the index update but i should not see any
>>>> deleted documents as it is a fresh index. Can we actually identify what
>>>> are those deleted documents?
>>>
>>> If you post doc 1234, then you post doc 1234 a second time, you will see
>>> a deletion in your index. If you don't want deletions to show in your
>>> index, be sure NEVER to update a document, only add new ones with
>>> absolutely distinct document IDs.
>>>
>>> You cannot see (via Solr) which docs are deleted. You could, I suppose,
>>> introspect the Lucene index, but that would most definitely be an expert
>>> task.
>>>
>>>> if there is no option of adding shards to existing collection i do not
>>>> like the idea of re indexing the whole data (worth hours) and we have
>>>> gone with good number of shards but there is a rapid increase of size
>>>> in data over the past few days, do you think is it worth logging a
>>>> ticket?
>>>
>>> You can split a shard. See the collections API:
>>>
>>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
>>>
>>> What would you want to log a ticket for? I'm not sure that there's
>>> anything that would require that.
>>>
>>> Upayavira
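
For reference, the split request mentioned in the thread could be built roughly as follows. This is only a sketch: it constructs the Collections API URL with `action=SPLITSHARD` and does not send it. The host, port, and collection name are placeholders, not values from this thread, and `split_shard_url` is a hypothetical helper rather than anything from SolrJ.

```python
from urllib.parse import urlencode


def split_shard_url(base_url, collection, shard):
    """Build a Collections API SPLITSHARD request URL (nothing is sent)."""
    params = {
        "action": "SPLITSHARD",
        "collection": collection,  # name of the collection to operate on
        "shard": shard,            # shard to split, e.g. "shard1"
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Placeholder host and collection name, for illustration only.
print(split_shard_url("http://localhost:8983/solr", "mycollection", "shard1"))
```

Issuing a GET against the printed URL on a running SolrCloud node would ask it to split `shard1`; check the Collections API page linked above for the parameters your Solr version supports.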