Delta imports are likely to be far slower than full imports
because they make one DB call per changed row. If you can write the
"query" so that it returns only the changed rows, then write
a separate entity (directly under <document>) and just run a
full-import with that entity only.
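
A rough sketch of such an entity, based on the configuration quoted below.
The entity name "agencyChanged" is made up for illustration, and the
"from publicagency" clause is an assumption taken from your deltaQuery,
since the original query was abridged. The "..." parts are left as you
posted them.

```xml
<!-- Hypothetical second entity, placed directly under <document>
     alongside the existing one. The name is illustrative only. -->
<entity name="agencyChanged" transformer="ClobTransformer" pk="oid"
        query="select archwaypublic.getSolrIdentifier(oid, 'agency') as oid,
               oid as realoid,
               archwaypublic.getSolrIdentifier(oid, 'agency') as id,
               code, name, ...
               from publicagency with (nolock)
               where modifiedtime > '${dataimporter.last_index_time}'">
  <!-- Assumption: the table and the where clause are copied from the
       deltaQuery, so one query returns all changed rows at once. -->
  ...
</entity>
```

You would then run something like
command=full-import&entity=agencyChanged&clean=false against your
dataimport handler (the handler path depends on your solrconfig.xml).
clean=false matters here, because otherwise the full-import clears the
index first. As far as I know, deletedPkQuery is only used by
delta-import, so deletions would still need to be handled separately.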

On Tue, Aug 18, 2009 at 6:32 AM, Matthew
Painter<matthew.pain...@archives.govt.nz> wrote:
> Hi,
>
> We are using Solr's DataImportHandler to populate the Solr index from a
> SQL Server database of nearly 4,000,000 rows. Whereas the population
> itself is very fast (around 1000 rows per second), the delta import is
> only processing around one row a second.
>
> Is this a known performance issue? We are using Solr 1.3.
>
> For reference, the abridged entity configuration (cuts indicated by
> '...') is below:
>
>  <entity name="id" transformer="ClobTransformer" pk="oid"
>            query="select archwaypublic.getSolrIdentifier(oid, 'agency')
> as oid, oid as realoid, archwaypublic.getSolrIdentifier(oid, 'agency')
> as id, code, name, ..."
>   deltaQuery="select oid from publicagency with (nolock) where
> modifiedtime > '${dataimporter.last_index_time}'"
>   deletedPkQuery="select archwaypublic.getSolrIdentifier(entityoid,
> 'agency') as oid from pendingsolrdeletions with (nolock) where
> entitytype='agency'">
>
> ...
> </entity>
>
> Thanks,
> Matt



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
