Number of rows in the SQL table (indexed so far using Solr): 1 million
Total size of data in the table: 4 GB
Total index size: 3.5 GB

Total number of rows I have to index: 20 million (approximately 100 GB of
data), and growing

What are the best practices for distributing the index? That is, at what
point should I distribute, and is there a magic number for index size per
instance?
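For what it's worth, a back-of-the-envelope sizing sketch based on the numbers above, assuming the index grows linearly with row count (the 15 GB per-shard ceiling is purely an assumption I picked for illustration, not a Solr rule; the right number depends on your RAM, query load, and latency targets):

```python
import math

# Figures from the question
rows_indexed = 1_000_000        # rows indexed so far
index_size_gb = 3.5             # index size for those rows
target_rows = 20_000_000        # total rows to index

# Hypothetical sizing target: keep each shard's index under ~15 GB
# so it fits comfortably in the OS page cache of a typical VM.
max_shard_size_gb = 15

# Linear extrapolation of total index size
gb_per_million = index_size_gb / (rows_indexed / 1_000_000)
projected_index_gb = gb_per_million * (target_rows / 1_000_000)

num_shards = math.ceil(projected_index_gb / max_shard_size_gb)
print(projected_index_gb, num_shards)  # 70.0 GB projected -> 5 shards
```

So by this rough model the 20 million rows yield a ~70 GB index, and the "magic number" question becomes choosing the per-shard ceiling rather than a fixed instance count.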

For 1 million rows alone, a Solr instance running on a VM takes me roughly
2.5 hours to index. So 20 million would extrapolate to 50+ hours. That would
be far too long.
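The extrapolation above can be sketched as follows, assuming indexing time scales linearly with row count and that parallel indexing across independent shards or client threads gives a near-linear speed-up (an idealized assumption; in practice commits, merges, and the database read side eat into it):

```python
# Figures from the question
hours_per_million = 2.5   # observed: 1 million rows in ~2.5 hours
target_millions = 20      # rows still to index, in millions

# Single-instance, single-threaded extrapolation
serial_hours = hours_per_million * target_millions
print(serial_hours)  # 50.0 hours on one instance

# Hypothetical: split the work across N parallel indexers/shards
for workers in (2, 4, 8):
    print(workers, serial_hours / workers)
# 2 workers -> 25.0 h, 4 -> 12.5 h, 8 -> 6.25 h (idealized)
```

This is why sharding (or at least parallel indexing clients feeding one index) is usually the first lever people reach for at this scale.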

What would be the best distributed architecture for my case? It would be
great if people could share their best practices and experience.

Thanks!!