I was wary of the potential maintenance issues and clutter involved with 
copying each entity block as suggested below (they're all large and there are 
around ten of them), so I just modified the main full import query to be of 
the syntax:

query="select x,y,z from table where modifiedtime > 
'${dataimporter.last_index_time}'"

It appears to work fine. I suspect this isn't the way that it's *supposed* to 
be used; however, it may be worth mentioning in the wiki as an alternative way 
to use the DataImportHandler for situations like mine, where the dataset is 
large, the data is reasonably volatile, and the current delta query code isn't 
appropriate for performance reasons.
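
For anyone wanting to try the same thing, here is a rough sketch of how the 
entity might look in the data-config file. The entity name, table, and column 
list are placeholders rather than my real configuration, and the deltaQuery 
attribute is simply left out:

```xml
<!-- Sketch only: entity name, table and columns are illustrative placeholders -->
<document>
  <entity name="agency" pk="oid"
          query="select oid, code, name from publicagency
                 where modifiedtime > '${dataimporter.last_index_time}'">
    <!-- field mappings and nested entities go here, unchanged -->
  </entity>
</document>
```

Bear in mind that full-import cleans the index by default (clean=true), so a 
run intended to pick up only the changed rows would presumably need to be 
invoked with command=full-import&clean=false; otherwise the existing documents 
would be removed first.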

M

PS. 22 hours later, and I killed the original delta import query ;)



-----Original Message-----
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble 
Paul
Sent: Tuesday, 18 August 2009 5:11 p.m.
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler - very slow delta import

delta imports are likely to be far slower than full imports
because they make one db call per changed row. if you can write the
"query" in such a way that it gives only the changed rows, then write
a separate entity (directly under <document>) and just run a
full-import with that entity only.

On Tue, Aug 18, 2009 at 6:32 AM, Matthew
Painter<matthew.pain...@archives.govt.nz> wrote:
> Hi,
>
> We are using Solr's DataImportHandler to populate the Solr index from a
> SQL Server database of nearly 4,000,000 rows. Whereas the population
> itself is very fast (around 1000 rows per second), the delta import is
> only processing around one row a second.
>
> Is this a known performance issue? We are using Solr 1.3.
>
> For reference, the abridged entity configuration (cuts indicated by
> '...') is below:
>
>  <entity name="id" transformer="ClobTransformer" pk="oid"
>            query="select archwaypublic.getSolrIdentifier(oid, 'agency')
> as oid, oid as realoid, archwaypublic.getSolrIdentifier(oid, 'agency')
> as id, code, name, ..."
>   deltaQuery="select oid from publicagency with (nolock) where
> modifiedtime > '${dataimporter.last_index_time}'"
>   deletedPkQuery="select archwaypublic.getSolrIdentifier(entityoid,
> 'agency') as oid from pendingsolrdeletions with (nolock) where
> entitytype='agency'">
>
> ...
> </entity>
>
> Thanks,
> Matt
>
> This e-mail message and any attachments are CONFIDENTIAL to the addressee(s) 
> and may also be LEGALLY PRIVILEGED.  If you are not the intended addressee, 
> please do not use, disclose, copy or distribute the message or the 
> information it contains.  Instead, please notify me as soon as possible and 
> delete the e-mail, including any attachments.  Thank you.
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
