Hi,

We recently noticed that one of our largest collections (6 shards, replication factor 3), which holds up to 1 TB of data and nearly 3.2 billion docs, is taking longer to index than it used to. To measure the difference, we created another collection using the largest collection's configs (schema.xml and solrconfig.xml) and loaded it with up to 100 million docs, which is ~60 GB of data. We then indexed exactly the same 25-million-doc data file into each of the two collections, which clearly showed a timing difference. We are running Solr 7.7.1.
The original largest collection completed indexing in ~100 mins.
The newly created collection (which has 100 million docs) completed in ~70 mins.

Is this difference in indexing time due to the amount of data each collection holds? If so, how can we increase indexing performance on the larger collection? Would adding more shards help? Also, is there a threshold for how much a single shard can hold, in terms of size and number of docs, before a new shard should be added?

Any answers would really help!

Thanks & Regards,
Vinodh