Can you post your complete solrconfig.xml and schema.xml files so we can review all of the settings that would affect your indexing performance?

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa

On Sat, Jan 25, 2014 at 12:56 AM, Susheel Kumar <susheel.ku...@thedigitalgroup.net> wrote:

> Thanks, Svante. Your indexing speed using db seems to be really fast. Can
> you please provide some more detail on how you are indexing db records? Is
> it through DataImportHandler? And what database? Is that a local db? We are
> indexing around 70 fields (60 multivalued), but data is not always populated
> in all fields. The average document size is 5-10 KB.
>
> -----Original Message-----
> From: saka.csi...@gmail.com [mailto:saka.csi...@gmail.com] On Behalf Of
> svante karlsson
> Sent: Friday, January 24, 2014 5:05 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr server requirements for 100+ million documents
>
> I just indexed 100 million db docs (records) with 22 fields (4
> multivalued) in 9524 sec using libcurl.
> 11 million took 763 seconds, so the speed drops somewhat with increasing
> db size.
>
> We write 1000 docs (just an arbitrary number) in each request from two
> threads. If you will be using SolrCloud you will want more writer threads.
>
> The hardware is a single cheap HP DL320E GEN8 V2 1P E3-1220V3 with one SSD
> and 32GB, and Solr runs on Ubuntu 13.10 inside an ESXi virtual machine.
>
> /svante
>
>
> 2014/1/24 Susheel Kumar <susheel.ku...@thedigitalgroup.net>
>
> > Thanks, Erick, for the info.
> >
> > For indexing, I agree that most of the time is consumed in data
> > acquisition, which in our case is from the database. Currently we index
> > using the manual process, i.e. the Solr dashboard Data Import, but we are
> > now looking to automate. How do you suggest we automate the indexing
> > part? Do you recommend using SolrJ, or should we try to automate using
> > curl?
> >
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Friday, January 24, 2014 2:59 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr server requirements for 100+ million documents
> >
> > Can't be done with the information you provided, and can only be
> > guessed at even with more comprehensive information.
> >
> > Here's why:
> >
> > http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >
> > Also, at a guess, your indexing speed is so slow due to data
> > acquisition; I rather doubt you're being limited by raw Solr indexing.
> > If you're using SolrJ, try commenting out the
> > server.add() bit and running again. My guess is that your indexing
> > speed will be almost unchanged, in which case the data acquisition
> > process is where you should concentrate your efforts. As a
> > comparison, I can index 11M Wikipedia docs on my laptop in 45 minutes
> > without any attempts at parallelization.
> >
> >
> > Best,
> > Erick
> >
> > On Fri, Jan 24, 2014 at 12:10 PM, Susheel Kumar
> > <susheel.ku...@thedigitalgroup.net> wrote:
> > > Hi,
> > >
> > > Currently we are indexing 10 million documents from a database (10 db
> > > data entities), and the index size is around 8 GB on a Windows virtual
> > > box. Indexing in one shot takes 12+ hours, while indexing in parallel
> > > in separate cores and merging them together takes 4+ hours.
> > >
> > > We are looking to scale to 100+ million documents and are looking for
> > > recommendations on server requirements for the parameters below for a
> > > production environment. There can be 200+ users performing searches at
> > > the same time.
> > >
> > > Number of physical servers (considering SolrCloud)
> > > Memory requirement
> > > Processor requirement (# cores)
> > > Linux as OS as opposed to Windows
> > >
> > > Thanks in advance.
> > > Susheel
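
To compare the indexing speeds reported in this thread, it may help to put them in the same unit. The Python snippet below is only a rough sketch: the function name `docs_per_second` and the run labels are made up for illustration, the inputs are the figures quoted in the messages above, and the "12+" and "4+" hour runs are assumed to be exactly 12 and 4 hours, so those two rates are upper bounds.

```python
def docs_per_second(doc_count: int, elapsed_seconds: float) -> float:
    """Average indexing throughput in documents per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return doc_count / elapsed_seconds


# Figures as reported in the thread: (label, documents, elapsed seconds).
# Assumption: "12+ hours" and "4+ hours" are taken as exactly 12 and 4 hours.
REPORTED_RUNS = [
    ("svante, 100M db docs via libcurl", 100_000_000, 9524),
    ("svante, 11M db docs via libcurl", 11_000_000, 763),
    ("Erick, 11M Wikipedia docs on a laptop", 11_000_000, 45 * 60),
    ("Susheel, 10M docs in one shot", 10_000_000, 12 * 3600),
    ("Susheel, 10M docs in parallel cores plus merge", 10_000_000, 4 * 3600),
]

if __name__ == "__main__":
    # Print each reported run as an average rate for side-by-side comparison.
    for label, docs, seconds in REPORTED_RUNS:
        rate = docs_per_second(docs, seconds)
        print(f"{label}: {rate:,.0f} docs/sec")
```

On these numbers the libcurl runs average on the order of 10,000 docs/sec, while the one-shot 12-hour run averages a few hundred docs/sec. That gap fits Erick's guess that data acquisition, rather than raw Solr indexing, is the limiting factor, although the runs use different hardware and documents, so the comparison is only indicative.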