One difference is that Solr will call update (Lucene's IndexWriter.updateDocument) rather than add (addDocument) by default. If you are willing to ensure unique IDs yourself, you can specify overwrite=false (I think that's the one) and it will use add instead.
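For what it's worth, here is a rough sketch of what an add-only batch could look like in Solr's XML update format, where overwrite is an attribute on the <add> element. The class name, helper names, and field names below are made up for illustration, and actually posting the message to your update handler is left out. I believe the SolrJ equivalent is calling setParam("overwrite", "false") on an UpdateRequest before processing it, but check that against your SolrJ version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds a Solr XML update message with overwrite disabled.
public class AddOnlyUpdate {

    // Escape the characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Build one <add> message for a whole batch of documents.
    // overwrite=false asks Solr to skip the unique-key overwrite check.
    static String buildAddXml(List<Map<String, String>> docs, boolean overwrite) {
        StringBuilder sb = new StringBuilder();
        sb.append("<add overwrite=\"").append(overwrite).append("\">");
        for (Map<String, String> doc : docs) {
            sb.append("<doc>");
            for (Map.Entry<String, String> field : doc.entrySet()) {
                sb.append("<field name=\"").append(escapeXml(field.getKey())).append("\">")
                  .append(escapeXml(field.getValue()))
                  .append("</field>");
            }
            sb.append("</doc>");
        }
        sb.append("</add>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Field names here are assumptions, not taken from any real schema.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Lucene & Solr");
        doc.put("body", "some text");
        System.out.println(buildAddXml(List.of(doc), false));
    }
}
```

With overwrite disabled nothing de-duplicates for you, so this only makes sense if the rebuild starts from an empty index or you guarantee uniqueness yourself.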
- Mark

On Wed, Nov 28, 2012 at 1:02 PM, Robert Stewart <bstewart...@gmail.com> wrote:
> I have a project where I am porting an existing application from direct
> Lucene API usage to using SOLR and the SOLRJ client API.
>
> The problem I have is that indexing is 2-5x slower using SOLRJ+SOLR
> than using the direct Lucene API.
>
> I am creating batches of between 200 and 500 documents per
> call to add() using SOLRJ.
>
> I tried adjusting SOLR parameters for indexing but it did not make any
> difference.
>
> Documents are identical (same fields) in both cases.
>
> Nearly identical settings for tokenizing/analyzing/indexing/storing
> for each field with Lucene and SOLR.
>
> What could be the possible bottleneck in this case? Can there be
> significant overhead unpacking a batch of documents in the request? Is
> there some SOLR overhead in the update handler?
>
> I have tried both SOLR 3.6 and 4.0 with very similar results.
>
> When using SOLR 4.0 I have transaction logging (for NRT search) turned off.
>
> I am also NOT using a unique ID field.
>
> Performance for indexing 200 documents is around 250ms on SOLR, about
> 60ms on Lucene.
>
> I see that the response time wrapping the call to the SOLRJ API add() method and
> the response time logged in the SOLR log are nearly the same, so there is very
> little network overhead in this case.
>
> Is this a typical amount of overhead for SOLRJ+SOLR vs the local Lucene API?
>
> The reason it matters in this case is that the application needs to rebuild the
> index once per day, which currently takes about 45 minutes. Using
> SOLRJ+SOLR it will take several hours, which is a show stopper in this
> case.
>
> Thanks.

--
- Mark