I see that I do need to reindex my Solr index. The index consists of 20 million documents, with a few hundred new documents added per minute (social media data). Most documents are smaller than 1 KiB, but some may be as large as 10 KiB. All the data is text, and all indexed fields are stored.
To reindex, I am considering adding a 'last_indexed' field, and having a Python or Java application pull out N results every T seconds, sorted on "last_indexed asc". How might I determine good values for N and T?

I would like to know when the Solr index is 'overloaded', or whatever happens to Solr when it is being pushed beyond the limits of its hardware. What should I be looking at to know if Solr is overstressed? Is looking at CPU and memory good enough? Is there a way to measure I/O to the disk on which the Solr index is stored?

Bear in mind that while the reindex is happening, clients will be performing searches and a few hundred documents will be written per minute. Note that the machine running Solr is an EC2 instance on Amazon Web Services, and that the 'disk' on which the Solr index is stored is an EBS volume.

Thank you.

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
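
P.S. To make the N/T question concrete, here is a rough Python sketch of the kind of loop I have in mind. It is only an illustration under assumptions: the core URL, the 200 ms QTime target, the doubling/0.9 back-off factors, and the `push` callable (which would re-submit the documents with a fresh 'last_indexed') are all placeholders of mine, not tuned or recommended values. The idea is to use the QTime that Solr reports in each response as a crude load signal and lengthen T when queries get slow.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical core URL; replace with the real host and core name.
SOLR_URL = "http://localhost:8983/solr/mycore"


def build_select_url(base_url, n):
    """URL selecting the N documents with the oldest last_indexed value."""
    params = {"q": "*:*", "sort": "last_indexed asc", "rows": n, "wt": "json"}
    return base_url + "/select?" + urllib.parse.urlencode(params)


def fetch_batch(n, base_url=SOLR_URL):
    """Return (docs, qtime_ms); QTime is the query time reported by Solr."""
    with urllib.request.urlopen(build_select_url(base_url, n)) as resp:
        body = json.load(resp)
    return body["response"]["docs"], body["responseHeader"]["QTime"]


def next_delay(delay, qtime_ms, target_ms=200, min_delay=1.0, max_delay=60.0):
    """Double T when the query was slow, shrink it by 10% otherwise."""
    if qtime_ms > target_ms:
        delay *= 2.0
    else:
        delay *= 0.9
    # Keep T within the assumed bounds.
    return max(min_delay, min(max_delay, delay))


def reindex_loop(fetch, push, n=500, delay=5.0, max_batches=None,
                 sleep=time.sleep):
    """Pull N docs, hand them to push(), wait T seconds, adapt T, repeat."""
    batches = 0
    while max_batches is None or batches < max_batches:
        docs, qtime_ms = fetch(n)
        if not docs:
            break
        push(docs)  # placeholder: re-submit docs with a new last_indexed
        delay = next_delay(delay, qtime_ms)
        sleep(delay)
        batches += 1
    return batches, delay
```

It would be called as something like `reindex_loop(fetch_batch, my_push)`, where `my_push` is whatever ends up writing the documents back. A real run would also need a stop condition (for example, a filter on 'last_indexed' being older than the start of the run), since sorting on "last_indexed asc" alone always returns something.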