After removing both the StopFilterFactory and the SynonymFilterFactory, the indexing time has dropped drastically, from 150-200 ms to 2-5 ms per doc. Next I will try out the StreamingServer. Are there any distinct advantages to using the StreamingServer?

Thanks,
Kalyan Manepalli

-----Original Message-----
From: Manepalli, Kalyan [mailto:kalyan.manepa...@orbitz.com]
Sent: Wednesday, July 01, 2009 3:41 PM
To: solr-user@lucene.apache.org
Subject: RE: Tips on speeding the indexing process

Regarding the analysis, we do a couple of things during indexing. First, we use a dictionary text file for the stop filter factory (StopFilterFactory). Second, we use a synonym text file for the SynonymFilterFactory. I will test the indexing speed by temporarily removing both of them.

Thanks,
Kalyan Manepalli

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, July 01, 2009 3:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Tips on speeding the indexing process

Kalyan,

150-200 ms to index a single document seems too long, but it really depends on how much analysis is going on and on the size of the docs. 32 threads also seems too high, unless your Solr server really has 32 cores.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: "Manepalli, Kalyan" <kalyan.manepa...@orbitz.com>
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Sent: Wednesday, July 1, 2009 4:21:30 PM
> Subject: RE: Tips on speeding the indexing process
>
> Here are some specs for my indexer.
> The indexer is custom Java code that reads data from the DB and other services,
> builds the Solr document, and submits it using SolrJ over HTTP. The indexer does
> a bit of work to build each document; that overhead is around 30 to 40 ms.
> For every document addition, Solr takes around 150 to 200 ms.
> I tried the bulk addition approach with 1000 documents at a time, but found that
> Solr takes the same amount of time. I commit and optimize only once, at the end.
> I currently use 32 threads in the production environment to get that
> 2-hour indexing time.
>
> Thanks,
> Kalyan Manepalli
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Wednesday, July 01, 2009 3:11 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Tips on speeding the indexing process
>
> Kalyan,
>
> Using SolrJ? Use the StreamingServer; it's nice and fast.
> Alternatively, start multiple indexing threads (matching the number of Solr
> server CPU cores) and index from there.
> Send batches of docs, not one by one.
> Don't commit or optimize until you are done.
>
> Otis
>
> ----- Original Message ----
> > From: "Manepalli, Kalyan"
> > To: "solr-user@lucene.apache.org"
> > Sent: Wednesday, July 1, 2009 3:42:45 PM
> > Subject: Tips on speeding the indexing process
> >
> > Hi,
> > I have a very generic question regarding indexing. In my current app,
> > I have about 450,000 docs, each around 2k in size. The total indexing time
> > is around 2 hours.
> > Now, due to multi-language support, the number of documents is increasing to
> > 2.0 million, and the total indexing time exceeds 6 hours.
> > I wanted to know if there are any general tips to speed up the indexing process.
> >
> > Thanks,
> > Kalyan Manepalli
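Otis's checklist (send batches instead of single documents, use a bounded number of indexing threads, commit once at the end) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Kalyan's indexer: `BatchIndexer`, `BatchSink`, and `CountingSink` are hypothetical names, and `BatchSink` stands in for the SolrJ server object, whose batch calls in SolrJ 1.4 are `SolrServer.add(Collection<SolrInputDocument>)` and `SolrServer.commit()`. The batch size of 1000 comes from the thread; the thread count of 4 is an assumed Solr server core count.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class BatchIndexer {
    /** Hypothetical stand-in for the SolrJ server object; not a SolrJ type. */
    interface BatchSink<D> {
        void add(Collection<D> batch) throws Exception; // cf. SolrServer.add(Collection<SolrInputDocument>)
        void commit() throws Exception;                 // cf. SolrServer.commit()
    }

    private BatchIndexer() {}

    /**
     * Sends docs in batches of at most batchSize from threadCount worker threads,
     * then commits exactly once after every batch has been accepted.
     * Returns the number of batches sent.
     */
    static <D> int indexAll(List<D> docs, int batchSize, int threadCount, BatchSink<D> sink)
            throws Exception {
        if (batchSize < 1 || threadCount < 1) {
            throw new IllegalArgumentException("batchSize and threadCount must be at least 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        List<Future<Object>> pending = new ArrayList<>();
        try {
            for (int start = 0; start < docs.size(); start += batchSize) {
                // Copy the slice so each task owns an independent batch.
                List<D> batch = new ArrayList<>(docs.subList(start, Math.min(start + batchSize, docs.size())));
                pending.add(pool.submit(() -> {
                    sink.add(batch);
                    return null;
                }));
            }
            for (Future<Object> f : pending) {
                f.get(); // wait for the batch; rethrows a worker's failure as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
        sink.commit(); // one commit at the very end, never per document or per batch
        return pending.size();
    }

    /** Demo sink that only counts; a real indexer would send each batch to Solr. */
    static final class CountingSink implements BatchSink<String> {
        final AtomicInteger docs = new AtomicInteger();
        final AtomicInteger commits = new AtomicInteger();

        @Override
        public void add(Collection<String> batch) {
            docs.addAndGet(batch.size());
        }

        @Override
        public void commit() {
            commits.incrementAndGet();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc-" + i);
        }
        CountingSink sink = new CountingSink();
        // Assumed Solr server core count. Runtime.getRuntime().availableProcessors()
        // would report the cores of the machine running this indexer instead.
        int solrServerCores = 4;
        int batches = indexAll(docs, 1000, solrServerCores, sink);
        System.out.println(batches + " batches, " + sink.docs.get() + " docs, "
                + sink.commits.get() + " commit"); // prints "3 batches, 2500 docs, 1 commit"
    }
}
```

Batching pays the per-request cost once per 1000 documents instead of once per document, but whether that helps depends on where the time goes: in this thread, bulk adds alone did not change Solr's time per document, and the large drop came from removing the stop and synonym filters.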
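To picture what a streaming server offers over a plain batch call, here is a hypothetical `StreamingSender` built only on `java.util.concurrent`. The indexing loop's `add()` returns as soon as a document is queued, while a few background threads drain the queue and send whatever has accumulated as one batch, so building documents and sending them overlap. As far as we understand its javadoc, SolrJ 1.4's `StreamingUpdateSolrServer` follows a similar pattern (a buffer of `queueSize` documents emptied by `threadCount` background threads writing into open HTTP connections); the class below is a simplified illustration of that idea, not its implementation, and the `Consumer` passed in is an assumed stand-in for the HTTP update request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch: add() only enqueues; runner threads drain and send batches. */
final class StreamingSender<D> implements AutoCloseable {
    private final BlockingQueue<D> queue;
    private final int maxBatch;
    private final Consumer<List<D>> send; // hypothetical stand-in for one HTTP update request
    private final List<Thread> runners = new ArrayList<>();
    private volatile boolean closed;

    StreamingSender(int queueSize, int threadCount, int maxBatch, Consumer<List<D>> send) {
        if (threadCount < 1 || maxBatch < 1) {
            throw new IllegalArgumentException("threadCount and maxBatch must be at least 1");
        }
        this.queue = new ArrayBlockingQueue<>(queueSize); // bounded: add() blocks when full
        this.maxBatch = maxBatch;
        this.send = send;
        for (int i = 0; i < threadCount; i++) {
            Thread runner = new Thread(this::drainLoop, "sender-" + i);
            runners.add(runner);
            runner.start();
        }
    }

    /** Returns as soon as the doc is queued; blocks only while the queue is full. */
    void add(D doc) throws InterruptedException {
        if (closed) {
            throw new IllegalStateException("sender is closed");
        }
        queue.put(doc);
    }

    private void drainLoop() {
        try {
            while (true) {
                D first = queue.poll(50, TimeUnit.MILLISECONDS);
                if (first == null) {
                    if (closed && queue.isEmpty()) {
                        return; // producer is done and nothing is left to send
                    }
                    continue;
                }
                List<D> batch = new ArrayList<>(maxBatch);
                batch.add(first);
                queue.drainTo(batch, maxBatch - 1); // take whatever else is already waiting
                send.accept(batch);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops accepting docs and waits until every queued doc has been sent. */
    @Override
    public void close() throws InterruptedException {
        closed = true;
        for (Thread runner : runners) {
            runner.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger sent = new AtomicInteger();
        try (StreamingSender<String> sender =
                new StreamingSender<>(100, 2, 20, batch -> sent.addAndGet(batch.size()))) {
            for (int i = 0; i < 1000; i++) {
                sender.add("doc-" + i); // returns once queued, not once indexed
            }
        } // close() waits here until the runners have drained the queue
        System.out.println(sent.get() + " docs sent"); // prints "1000 docs sent"
    }
}
```

The trade-off is error reporting: because `add()` returns before anything is sent, a failed request surfaces in a runner thread rather than at the call site, so a real indexer built this way needs some way to collect those failures.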