Can you post your complete solrconfig.xml and schema.xml files so we can
review all of the settings that could affect your indexing performance?

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Sat, Jan 25, 2014 at 12:56 AM, Susheel Kumar <
susheel.ku...@thedigitalgroup.net> wrote:

> Thanks, Svante. Your indexing speed using the db seems to be really fast.
> Can you please provide some more detail on how you are indexing db records?
> Is it through DataImportHandler? And what database? Is that a local db? We
> are indexing around 70 fields (60 multivalued), but data is not always
> populated in all fields. The average document size is 5-10 KB.
>
> -----Original Message-----
> From: saka.csi...@gmail.com [mailto:saka.csi...@gmail.com] On Behalf Of
> svante karlsson
> Sent: Friday, January 24, 2014 5:05 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr server requirements for 100+ million documents
>
> I just indexed 100 million db docs (records) with 22 fields (4
> multivalued) in 9524 sec using libcurl.
> 11 million took 763 seconds so the speed drops somewhat with increasing
> dbsize.
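
To put numbers on that slowdown, here is a quick Python sketch of the
arithmetic. It uses only the two figures quoted above; the helper name
docs_per_second is made up for illustration.

```python
def docs_per_second(doc_count: int, elapsed_seconds: float) -> float:
    """Average indexing throughput over a whole run."""
    return doc_count / elapsed_seconds


small_run = docs_per_second(11_000_000, 763)     # the 11 million doc run
full_run = docs_per_second(100_000_000, 9524)    # the 100 million doc run

print(f"11M docs:  {small_run:,.0f} docs/sec")
print(f"100M docs: {full_run:,.0f} docs/sec")
print(f"average throughput drop: {1 - full_run / small_run:.0%}")
```

That works out to roughly 14,400 docs/sec for the smaller run against
roughly 10,500 docs/sec averaged over the full run.
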
>
> We write 1000 docs (just an arbitrary number) in each request from two
> threads. If you will be using SolrCloud, you will want more writer threads.
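
A minimal Python sketch of that batching pattern (the original used
libcurl, so this is not the actual code). It assumes Solr's JSON update
handler accepts an array of documents; the URL, core name, and all function
names here are placeholders.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Assumed URL; substitute your own host and core name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def batches(docs, size=1000):
    """Yield lists of at most `size` docs from any iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def post_batch(batch, url=SOLR_UPDATE_URL):
    """POST one batch to Solr as a JSON array of documents."""
    req = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return len(batch)


def index_all(docs, send=post_batch, batch_size=1000, writers=2):
    """Send batches from `writers` threads; return the number of docs sent."""
    with ThreadPoolExecutor(max_workers=writers) as pool:
        return sum(pool.map(send, batches(docs, batch_size)))
```

Note that pool.map queues every batch up front, so for very large inputs you
would want to bound the number of pending batches. The sketch also never
issues a commit; that is left to autoCommit or a separate request.
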
>
> The hardware is a single cheap HP DL320E GEN8 V2 1P E3-1220V3 with one SSD
> and 32 GB, and Solr runs on Ubuntu 13.10 inside an ESXi virtual machine.
>
> /svante
>
>
>
>
> 2014/1/24 Susheel Kumar <susheel.ku...@thedigitalgroup.net>
>
> > Thanks, Erick, for the info.
> >
> > I agree that most of the indexing time is consumed by data acquisition,
> > which in our case is from a database. Currently we index using a manual
> > process, i.e. the Solr dashboard Data Import, but we are now looking to
> > automate it. How do you suggest automating the indexing part? Do you
> > recommend using SolrJ, or should we try to automate using curl?
> >
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Friday, January 24, 2014 2:59 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr server requirements for 100+ million documents
> >
> > Can't be done with the information you provided, and can only be
> > guessed at even with more comprehensive information.
> >
> > Here's why:
> >
> >
> > http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >
> > Also, at a guess, your indexing speed is so slow due to data
> > acquisition; I rather doubt you're being limited by raw Solr indexing.
> > If you're using SolrJ, try commenting out the
> > server.add() bit and running again. My guess is that your indexing
> > speed will be almost unchanged, in which case the data
> > acquisition process is where you should concentrate your efforts. As a
> > comparison, I can index 11M Wikipedia docs on my laptop in 45 minutes
> > without any attempts at parallelization.
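
The same experiment can be sketched outside SolrJ. This Python version is
only an illustration: fetch_docs and real_send are hypothetical callables
standing in for your database reader and your Solr add call, and the no-op
sender plays the role of the commented-out server.add().

```python
import time


def timed_run(fetch_docs, send):
    """Pull every doc from fetch_docs(), pass it to send(); return (count, seconds)."""
    start = time.perf_counter()
    count = 0
    for doc in fetch_docs():
        send(doc)
        count += 1
    return count, time.perf_counter() - start


def skip_indexing(doc):
    """Stand-in for the commented-out add call: does nothing."""


def acquisition_share(fetch_docs, real_send):
    """Fraction of the full run time that is spent even with indexing disabled."""
    _, fetch_only = timed_run(fetch_docs, skip_indexing)
    _, full = timed_run(fetch_docs, real_send)
    return fetch_only / full if full else 0.0
```

A share close to 1.0 would suggest the time is going into data acquisition
rather than Solr. Because the data is read twice, the second pass may benefit
from warm database caches, so treat the result as a rough indication only.
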
> >
> >
> > Best,
> > Erick
> >
> > On Fri, Jan 24, 2014 at 12:10 PM, Susheel Kumar <
> > susheel.ku...@thedigitalgroup.net> wrote:
> > > Hi,
> > >
> > > > Currently we are indexing 10 million documents from a database (10 db
> > > > data entities), and the index size is around 8 GB on a Windows virtual
> > > > box. Indexing in one shot takes 12+ hours, while indexing in parallel
> > > > in separate cores and merging them together takes 4+ hours.
> > >
> > > > We are looking to scale to 100+ million documents and are looking for
> > > > recommendations on server requirements for a production environment,
> > > > on the parameters below. There can be 200+ users performing searches
> > > > at the same time.
> > >
> > > > Number of physical servers (considering SolrCloud)
> > > > Memory requirement
> > > > Processor requirement (# cores)
> > > > Linux as the OS as opposed to Windows
> > >
> > > Thanks in advance.
> > > Susheel
> > >
> >
>
