On 9/4/2011 12:16 PM, Kissue Kissue wrote:
I was reading about DIH on this Wiki link:
http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config
The following was said about entity primary key: "is *optional* and only
needed when using delta-imports". Does this mean that the primary key is
mandatory for delta imports? I am asking because I am going to be importing
from a view with no primary key.

I believe what it means is that you have to specify a field to be the primary key, and that it must exist in the results of all three queries that you define: query, deltaQuery and deltaImportQuery. In my case, query and deltaImportQuery are identical, and deltaQuery is "SELECT 1 AS did". The only thing deltaQuery does here is tell the DIH that there is something to do for a delta-import; the DIH then runs deltaImportQuery to do the actual work. I keep track of which documents are new outside of Solr and pass the values the query needs via the dataimport URL.
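
To make that concrete, here is a minimal sketch of a data-config.xml along those lines. It is not the actual config described above: the data source details, the view name example_view, the title column, and the minDid/maxDid request parameters are all made up for illustration. Values passed on the dataimport URL can be referenced in the queries as ${dataimporter.request.<name>}.

```xml
<dataConfig>
  <!-- Connection details are placeholders, not taken from the original message -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/exampledb"
              user="example_user"
              password="example_password"/>
  <document>
    <!-- pk names the field the DIH treats as the primary key;
         "did" is returned by all three queries below -->
    <entity name="doc" pk="did"
            query="SELECT did, title FROM example_view
                   WHERE did &gt; ${dataimporter.request.minDid}
                   AND did &lt;= ${dataimporter.request.maxDid}"
            deltaQuery="SELECT 1 AS did"
            deltaImportQuery="SELECT did, title FROM example_view
                   WHERE did &gt; ${dataimporter.request.minDid}
                   AND did &lt;= ${dataimporter.request.maxDid}">
      <!-- deltaQuery only signals that there is work to do;
           deltaImportQuery (identical to query) fetches the rows -->
      <field column="did" name="did"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

With a config like this, a delta-import would be requested with something like /dataimport?command=delta-import&minDid=1000&maxDid=2000 on the core's URL. The handler path depends on how it is registered in solrconfig.xml, and the parameter names are whatever you choose to reference in the queries.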

As you might surmise, did is the primary key in my dataimport config file. I couldn't say what would happen if your query results have duplicate values in the primary key field. In my case, did actually is the primary key in the database, but I don't think that's required. I use different fields for primary key and uniqueKey. This allows us a little extra flexibility in the index.

Hopefully you do still have a field that is unique (even if it's not a primary key) that you can use as the primary key in your config file. It's a good idea to have such a thing available to serve as the uniqueKey in schema.xml, for automatic overwrites (delete and reinsert) of documents that change.
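
For the uniqueKey side, a schema.xml fragment might look roughly like the following. The field name id is an assumption, and the sketch assumes a "string" fieldType is defined as in the stock example schema; substitute whatever unique field your view provides.

```xml
<!-- Fragment of schema.xml; the field name "id" is hypothetical -->
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <!-- other fields ... -->
</fields>

<!-- A re-imported document with the same id overwrites the existing one -->
<uniqueKey>id</uniqueKey>
```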

Thanks,
Shawn
