Number of rows in the SQL table (indexed so far using Solr): 1 million
Total size of data in the table: 4 GB
Total index size: 3.5 GB
Total number of rows I have to index: 20 million (approximately 100 GB of data), and growing.

What are the best practices for distributing the index? In other words, at what point should I start distributing, and is there a rule-of-thumb maximum index size per instance? Indexing just 1 million rows on a Solr instance running in a VM takes me roughly 2.5 hours, so 20 million rows would take roughly 60-70 hours, which is far too long. What would be the best distributed architecture for my case? (A sketch of the indexing loop I have in mind is at the end of this message.) It would be great if people could share their best practices and experience. Thanks!
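To make the indexing path concrete, here is a minimal SolrJ sketch of the kind of loop I mean: rows are pulled from the SQL table over JDBC and pushed to Solr in batches, with a single commit at the end. The Solr URL, JDBC connection string, table and field names, batch size, and the SolrJ 7+ ConcurrentUpdateSolrClient builder settings are all placeholder assumptions, not my actual setup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder Solr URL and core name.
        SolrClient solr = new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/mycore")
                .withQueueSize(10000)   // buffer documents client-side
                .withThreadCount(4)     // send updates on several threads in parallel
                .build();

        try (Connection conn = DriverManager.getConnection("jdbc:mysql://dbhost/mydb", "user", "pass");
             Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                   ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(5000);    // hint to the driver to stream rows instead of loading them all
            ResultSet rs = stmt.executeQuery("SELECT id, title, body FROM documents");

            List<SolrInputDocument> batch = new ArrayList<>();
            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", rs.getString("id"));
                doc.addField("title", rs.getString("title"));
                doc.addField("body", rs.getString("body"));
                batch.add(doc);

                if (batch.size() == 1000) { // send in batches, not one document at a time
                    solr.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                solr.add(batch);
            }
            solr.commit();              // one commit at the end, not per batch
        } finally {
            solr.close();
        }
    }
}

The idea is to avoid per-document commits and keep several update threads in flight. Whether that alone brings 20 million rows into an acceptable indexing window, or whether I should split the collection across shards first, is exactly what I am asking about.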