Thanks for the info, Erick.

For indexing, I agree that most of the time is spent on data acquisition, which
in our case comes from a database. We currently index manually, i.e. through the
Data Import screen of the Solr dashboard, but we are now looking to automate
this. How do you suggest we automate the indexing step? Do you recommend using
SolrJ, or should we try to automate it with curl?
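
For context, the dashboard's Data Import screen issues an HTTP request to the Data Import Handler, so a script can issue the same request. Below is a minimal sketch, not a recommendation of one client over another. It assumes the handler is registered at /dataimport, and the host, port and core name (`localhost:8983`, `mycore`) are placeholders rather than details from this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dih_url(base_url, core, command, **params):
    """Build a Data Import Handler request URL.

    base_url and core are placeholders for your own setup. The handler
    path /dataimport is an assumption: it is whatever name the handler
    was registered under in solrconfig.xml.
    """
    query = {"command": command, "wt": "json"}
    query.update(params)
    return "{}/{}/dataimport?{}".format(base_url.rstrip("/"), core, urlencode(query))


def trigger_full_import(base_url="http://localhost:8983/solr", core="mycore"):
    """Start a full import and return the raw response body.

    Needs a running Solr instance; the same URL can be requested with curl.
    """
    url = dih_url(base_url, core, "full-import", clean="true", commit="true")
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

A scheduler (cron, Windows Task Scheduler) could call `trigger_full_import()` and then poll `dih_url(base_url, core, "status")` until the import reports that it has finished.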


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, January 24, 2014 2:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr server requirements for 100+ million documents

Can't be done with the information you provided, and can only be guessed at 
even with more comprehensive information.

Here's why:

http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Also, at a guess, your indexing speed is so slow due to data acquisition; I 
rather doubt you're being limited by raw Solr indexing. If you're using SolrJ, 
try commenting out the server.add() bit and running again. My guess is that 
your indexing speed will be almost unchanged, in which case the data 
acquisition process is where you should concentrate efforts. As a comparison, 
I can index 11M Wikipedia docs on my laptop in 45 minutes without any attempts 
at parallelization.
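
The comparison described above can be sketched as a small timing harness. This is an illustration only: `fetch_rows` and `send` are hypothetical stand-ins for the database read and for the call that ships a document to Solr (the role server.add() plays in SolrJ). Passing `send=None` mirrors commenting that call out.

```python
import time


def time_run(fetch_rows, send=None):
    """Return (row_count, elapsed_seconds) for one pass over the data.

    fetch_rows: callable returning an iterable of documents (database side).
    send: callable that ships one document to Solr, or None to skip sending.
    """
    start = time.perf_counter()
    count = 0
    for row in fetch_rows():
        if send is not None:
            send(row)
        count += 1
    return count, time.perf_counter() - start


def acquisition_share(fetch_rows, send):
    """Fraction of the full run's time taken by a fetch-only run."""
    _, fetch_only = time_run(fetch_rows)
    _, full = time_run(fetch_rows, send)
    return fetch_only / full if full > 0 else 0.0
```

If the two runs take nearly the same time, sending to Solr is not the bottleneck and the effort belongs on the acquisition side.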


Best,
Erick

On Fri, Jan 24, 2014 at 12:10 PM, Susheel Kumar 
<susheel.ku...@thedigitalgroup.net> wrote:
> Hi,
>
> Currently we are indexing 10 million documents from a database (10 db data 
> entities), and the index size is around 8 GB on a Windows virtual box. 
> Indexing in one shot takes 12+ hours, while indexing in parallel in separate 
> cores and merging them together takes 4+ hours.
>
> We are looking to scale to 100+ million documents and would like 
> recommendations on server requirements for a production environment, on the 
> parameters below. There can be 200+ users searching at the same time.
>
> - Number of physical servers (considering SolrCloud)
> - Memory requirement
> - Processor requirement (# cores)
> - Linux as the OS as opposed to Windows
>
> Thanks in advance.
> Susheel
>
