Re: Indexing performance with solrj vs. direct lucene API

Mark Miller Wed, 28 Nov 2012 10:55:30 -0800

One difference is that Solr will call update rather than add by
default. If you are willing to ensure unique id's, you can specify
overwrite=false (I think thats the one) and it will use add instead.


- Mark

On Wed, Nov 28, 2012 at 1:02 PM, Robert Stewart <bstewart...@gmail.com> wrote:
> I have a project where I am porting existing application from direct
> Lucene API usage to using SOLR and SOLRJ client API.
>
> The problem I have is that indexing is 2-5x slower using SOLRJ+SOLR
> than using direct Lucene API.
>
> I am creating batches of documents between 200 and 500 documents per
> call to add() using SOLRJ.
>
> I tried adjusting SOLR parameters for indexing but did not make any
> difference.
>
> Documents are identical (same fields) in both cases.
>
> Nearly identical settings for tokenizing/analyzing/indexing/storing
> for each field with Lucene and SOLR.
>
> What could be the possible bottleneck in this case?   Can there
> significant over-head unpacking batch of documents in request?  Is
> there some SOLR over-head in update handler?
>
> I have tried both SOLR 3.6 and 4.0 with very similar results.
>
> When using SOLR 4.0 I have transaction logging (for NRT search) turned off.
>
> I am also NOT using a unique ID field.
>
> Performance for indexing 200 documents is around 250ms on SOLR, about
> 60ms on Lucene.
>
> I see that response time wrapping call to SOLRJ API add() method, and
> response time logged in SOLR log is nearly the same, so there is very
> little network overhead in this case.
>
> Is this typical amount of overhead to use SOLRJ+SOLR vs local Lucene API?
>
> The reason it matters in this case is application needs to rebuilt
> index once per day which currently takes about 45 minutes.  Using
> SOLRJ+SOLR it will take several hours, which is a show stopper in this
> case.
>
> Thanks.



-- 
- Mark

Re: Indexing performance with solrj vs. direct lucene API

Reply via email to