Hi,

We recently noticed that one of our largest collections (6 shards, replication 
factor 3), which holds up to 1 TB of data and nearly 3.2 billion docs, is taking 
longer to index than it used to. To measure the difference, we created a second 
collection from the largest collection's configs (schema.xml and solrconfig.xml) 
and loaded it with up to 100 million docs, which is ~60 GB of data. We then 
indexed exactly the same 25-million-doc data file into each of the two 
collections, and the timings clearly differed. We are running Solr 7.7.1.

The original (largest) collection completed indexing in ~100 mins
The newly created collection (which has 100 million docs) completed in ~70 mins
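
For reference, here is a rough sketch of what those timings work out to as 
throughput, plus the approximate doc count per shard on the large collection. 
It assumes the ~100 and ~70 minute figures are wall-clock times for the whole 
25-million-doc file and that docs are spread evenly across the 6 shards; the 
function and variable names are only for illustration.

```python
# Back-of-the-envelope numbers derived from the figures in this email.
# Assumptions: timings are wall-clock for the full file, and documents
# are distributed evenly across shards.

DOCS_INDEXED = 25_000_000        # docs in the test data file
LARGE_MINUTES = 100              # ~time on the original collection
SMALL_MINUTES = 70               # ~time on the new 100M-doc collection
LARGE_TOTAL_DOCS = 3_200_000_000 # ~docs in the original collection
SHARDS = 6


def docs_per_second(docs: int, minutes: float) -> float:
    """Average indexing throughput over the whole run."""
    return docs / (minutes * 60)


large_rate = docs_per_second(DOCS_INDEXED, LARGE_MINUTES)
small_rate = docs_per_second(DOCS_INDEXED, SMALL_MINUTES)
extra_time_pct = (LARGE_MINUTES / SMALL_MINUTES - 1) * 100
docs_per_shard = LARGE_TOTAL_DOCS / SHARDS

print(f"large collection: {large_rate:,.0f} docs/s")
print(f"small collection: {small_rate:,.0f} docs/s")
print(f"large collection takes {extra_time_pct:.0f}% longer")
print(f"docs per shard (large): {docs_per_shard / 1e6:,.0f} million")
```

So the large collection indexes at roughly 4,200 docs/s versus roughly 6,000 
docs/s on the small one, and each of its shards holds on the order of 530 
million docs.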

Is this indexing time difference due to the amount of data that each collection 
holds? If so, how can we increase indexing performance on the larger 
collection? Would adding more shards help here?

Also, are there any threshold numbers for how much a single shard can hold, in 
terms of size and number of docs, before a new shard should be added?

Any answers would really help!!


Thanks & Regards,
Vinodh
