Michael,

The SolrEntityProcessor looks very intriguing, but it won't work with the released 1.4 version. If that's OK with you and it looks like it'll do what you want, feel free to ignore the rest of this.

I'm also using MySQL as an import source for Solr. I was unable to use the last_index_time because my database doesn't have a field I can match against it. I believe you can use something similar to the method that I came up with. The point of this post is to show you how to inject values from outside Solr into a DIH request rather than have Solr provide the milestone that indicates new content.

Here's a simplified version of my URL template and entity configuration in data-config.xml. The did field in my database is an autoincrement BIGINT serving as my primary key, but something similar could likely be cooked up with timestamps too:

http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dataTable=DATATABLE&minDid=MINDID&maxDid=MAXDID

----

<entity name="dataTable" pk="did"
query="SELECT * FROM ${dataimporter.request.dataTable} WHERE did &gt; ${dataimporter.request.minDid} AND did &lt;= ${dataimporter.request.maxDid}"
deltaQuery="SELECT MAX(did) FROM ${dataimporter.request.dataTable}"
deltaImportQuery="SELECT * FROM ${dataimporter.request.dataTable} WHERE did &gt; ${dataimporter.request.minDid} AND did &lt;= ${dataimporter.request.maxDid}">
</entity>

----
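
For anyone scripting this, here is a minimal Python sketch of filling in the URL template. The function name and the example host, port, core, and table values are made up for illustration; only the parameter names (command, dataTable, minDid, maxDid) come from the template above.

```python
from urllib.parse import urlencode


def build_dih_url(host, port, core, command, data_table, min_did, max_did):
    """Build a DIH request URL that carries the custom request parameters.

    The extra parameters become available inside data-config.xml as
    ${dataimporter.request.<name>}.
    """
    params = urlencode({
        "command": command,
        "dataTable": data_table,
        # int() rejects anything that is not a plain number, since these
        # values end up interpolated into the SQL query.
        "minDid": int(min_did),
        "maxDid": int(max_did),
    })
    return f"http://{host}:{port}/solr/{core}/dataimport?{params}"


# Hypothetical values, for illustration only.
url = build_dih_url("localhost", 8983, "core0", "delta-import", "docs", 1000, 2500)
print(url)
```

The resulting URL can then be requested with whatever HTTP client you already use.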

If I am doing a full-import, I set minDid to zero and maxDid to the highest value in the database. For a delta-import, minDid comes from the maxDid value stored after the last successful import.
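
The bookkeeping for those two values might look roughly like the sketch below. It assumes the last successful maxDid is kept in a small JSON file and that the caller has already looked up the current highest did in the database; the file name and both function names are hypothetical, not part of Solr or of my actual build scripts.

```python
import json
import tempfile
from pathlib import Path


def next_window(state_file, current_max_did, full=False):
    """Return (min_did, max_did) for the next import.

    A full-import (or a missing state file) starts from zero; a delta-import
    starts from the maxDid recorded after the last successful import.
    """
    path = Path(state_file)
    if full or not path.exists():
        min_did = 0
    else:
        min_did = json.loads(path.read_text())["maxDid"]
    return min_did, current_max_did


def record_success(state_file, max_did):
    """Store max_did; call this only after the import has succeeded."""
    Path(state_file).write_text(json.dumps({"maxDid": max_did}))


# Demonstration in a temporary directory with made-up did values.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "did_state.json"
    first = next_window(state, 500)    # no state yet, behaves like a full-import
    record_success(state, 500)
    second = next_window(state, 800)   # delta window picks up where we left off
    print(first, second)
```

Recording the value only after success means a failed import is simply retried over the same range next time.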

The deltaQuery is required, but in my case it is a throwaway query whose only job is to tell Solr that the delta-import needs to run. My query and deltaImportQuery are identical, though yours may not be.

Good luck, no matter how you choose to approach this.

Shawn


On 4/18/2010 9:02 PM, Michael Tibben wrote:
I don't really understand how this will help. Can you elaborate?

Do you mean that the last_index_time can be imported from somewhere outside Solr? But I need to be able to *set* what last_index_time is stored in dataimport.properties, not get properties from somewhere else.



On 18/04/10 10:02, Lance Norskog wrote:
The SolrEntityProcessor allows you to query a Solr instance and use
the results as DIH properties. You would have to create your own
regular query to do the delta-import instead of using the delta-import
feature.
