Hi Robert,

SolrJ is sending data over a socket, so that might explain some of the lag. Are your SolrJ app and the Solr server running on the same physical machine?

I thought Mark M's idea sounded good.

One other idea: when initializing SolrJ's connection for normal searching you probably use HttpSolrServer. But when doing massive updates, you might consider using ConcurrentUpdateSolrServer instead.

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513

On Wed, Nov 28, 2012 at 10:02 AM, Robert Stewart <bstewart...@gmail.com> wrote:

> I have a project where I am porting existing application from direct
> Lucene API usage to using SOLR and SOLRJ client API.
>
> The problem I have is that indexing is 2-5x slower using SOLRJ+SOLR
> than using direct Lucene API.
>
> I am creating batches of documents between 200 and 500 documents per
> call to add() using SOLRJ.
>
> I tried adjusting SOLR parameters for indexing but did not make any
> difference.
>
> Documents are identical (same fields) in both cases.
>
> Nearly identical settings for tokenizing/analyzing/indexing/storing
> for each field with Lucene and SOLR.
>
> What could be the possible bottleneck in this case? Can there be
> significant over-head unpacking batch of documents in request? Is
> there some SOLR over-head in update handler?
>
> I have tried both SOLR 3.6 and 4.0 with very similar results.
>
> When using SOLR 4.0 I have transaction logging (for NRT search) turned off.
>
> I am also NOT using a unique ID field.
>
> Performance for indexing 200 documents is around 250ms on SOLR, about
> 60ms on Lucene.
>
> I see that response time wrapping call to SOLRJ API add() method, and
> response time logged in SOLR log is nearly the same, so there is very
> little network overhead in this case.
>
> Is this typical amount of overhead to use SOLRJ+SOLR vs local Lucene API?
>
> The reason it matters in this case is application needs to rebuild
> index once per day which currently takes about 45 minutes. Using
> SOLRJ+SOLR it will take several hours, which is a show stopper in this
> case.
>
> Thanks.
>
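
P.S. To make the ConcurrentUpdateSolrServer suggestion above a bit more concrete: that class buffers added documents in a queue and lets background threads write them to the server, so your indexing loop is not waiting on each add() round trip. Below is a rough, stdlib-only sketch of that general queue-plus-workers pattern. It is not SolrJ's implementation. The names QueuedIndexerSketch, BatchSender and indexAll are made up for illustration, documents are plain Strings, and the sender is a stand-in for whatever actually posts a batch to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class QueuedIndexerSketch {

    /** Hypothetical stand-in for the code that ships one batch to Solr. */
    public interface BatchSender {
        void send(List<String> batch);
    }

    /**
     * Pushes docs through a bounded queue to threadCount worker threads.
     * Each worker sends batches of at most batchSize docs (batchSize >= 1).
     * Returns the number of documents handed to the sender.
     * Sketch only: no retry or error handling, and the sender must be thread safe.
     */
    public static int indexAll(List<String> docs, int queueSize, int threadCount,
                               int batchSize, BatchSender sender)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueSize);
        AtomicBoolean producerDone = new AtomicBoolean(false);
        AtomicInteger sent = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(threadCount);

        for (int i = 0; i < threadCount; i++) {
            workers.submit(() -> {
                List<String> batch = new ArrayList<>(batchSize);
                try {
                    while (true) {
                        // Wait briefly for a document, then check whether we are finished.
                        String doc = queue.poll(50, TimeUnit.MILLISECONDS);
                        if (doc == null) {
                            if (producerDone.get() && queue.isEmpty()) {
                                break;
                            }
                            continue;
                        }
                        batch.add(doc);
                        // Top the batch up with whatever else is already queued.
                        queue.drainTo(batch, batchSize - 1);
                        sender.send(batch);
                        sent.addAndGet(batch.size());
                        batch.clear();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // The producer blocks only when the queue is full, not on every request.
        for (String doc : docs) {
            queue.put(doc);
        }
        producerDone.set(true);

        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        return sent.get();
    }
}
```

With SolrJ 4.0 you should not need to write any of this yourself. As far as I recall, the equivalent is roughly new ConcurrentUpdateSolrServer(url, queueSize, threadCount), then add() for each document or batch, then blockUntilFinished() and commit() at the end. The queue size and thread count are worth experimenting with for your hardware.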