Hi Paul,
yes, I did, and I have just verified it in the code. The deletedPkQuery is used
to collect all primary keys of the root entity that are to be deleted
from the index.
The deletion is done on the SOLR writer by unique ID:

  // DocBuilder
  writer.deleteDoc(deletedKey.get(root.pk));

  // SOLR Writer deleteDoc()
  delCmd.id = id.toString();
  delCmd.fromPending = true;
  delCmd.fromCommitted = true;
  processor.processDelete(delCmd);

  // RunUpdateProcessorFactory
  @Override
  public void processDelete(DeleteUpdateCommand cmd) throws IOException {
      if (cmd.id != null) {
          updateHandler.delete(cmd);        // writer.deleteDoc() uses that
      } else {
          updateHandler.deleteByQuery(cmd); // I would like to use that
      }
      super.processDelete(cmd);
  }

My problem is that the ids I have to delete are those that do not exist
in the database anymore. So, I have no means to return them by DB query.
That is why I would like to use a different field that a group of
documents has in common, and that would allow me to get hold of the
outdated documents in the index. (But I have to find out the value of
that other field by DB query.)
Cheers,
Chantal
Noble Paul നോബിള് नोब्ळ् wrote:
did you explore the deletedPkQuery ?
On Wed, Aug 5, 2009 at 11:46 AM, Chantal
Ackermann<chantal.ackerm...@btelligent.de> wrote:
Hi all,
the database from which I populate the SOLR index is refreshed
"partially": subsets of the data are deleted and re-added for a certain
group identifier. Is it possible to do something similar in a (delta) import
of the DataImportHandler?
Example:
SOLR-Index:
groupID: 1, PK: 1, refreshDate: [before last_index_time]
groupID: 1, PK: 2, refreshDate: [before last_index_time]
groupID: 1, PK: 3, refreshDate: [before last_index_time]
Refreshed DB:
groupID: 1, PK: 1, refreshDate: [after last_index_time]
groupID: 1, PK: 5, refreshDate: [after last_index_time]
groupID: 1, PK: 30, refreshDate: [after last_index_time]
(PKs 2 and 3 are no longer there. PK is unique across all groupIDs.)
deleteQuery="groupID:1"
(An attribute of the entity element that the DocBuilder (1.3) reads and
sends unchanged to the SOLR writer as a query, once, before the delta
import, to delete documents.)
After that, the delta import loads data with groupID=1 from the DB.
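To make the intended effect concrete, here is a minimal in-memory sketch of "delete the whole group, then re-import it", using the PKs from the example above. It does not touch any SOLR API: the map standing in for the index, the class and method names, and the extra document in group 2 are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for the index: maps PK -> groupID. Not a SOLR API. */
public class GroupRefreshSketch {

    /** Mimics a delete-by-query on "groupID:<id>": drops every document of that group. */
    public static void deleteByGroup(Map<Integer, Integer> index, int groupId) {
        index.values().removeIf(g -> g == groupId);
    }

    /** Mimics the delta import: (re-)adds every row currently in the database. */
    public static void importRows(Map<Integer, Integer> index, Map<Integer, Integer> dbRows) {
        index.putAll(dbRows);
    }

    public static void main(String[] args) {
        // Index state before the refresh (PK -> groupID).
        Map<Integer, Integer> index = new TreeMap<>();
        index.put(1, 1);
        index.put(2, 1);
        index.put(3, 1);
        index.put(7, 2); // hypothetical document of another group; it must survive

        // Database state after the refresh: PKs 2 and 3 are gone, 5 and 30 are new.
        Map<Integer, Integer> db = new TreeMap<>();
        db.put(1, 1);
        db.put(5, 1);
        db.put(30, 1);

        deleteByGroup(index, 1); // once, before the import
        importRows(index, db);   // then the delta import for groupID=1

        System.out.println(index.keySet()); // prints [1, 5, 7, 30]
    }
}
```

PKs 2 and 3 disappear although no DB query could ever return them, which is the point of deleting by the group field instead of by primary key.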
Could I plug into SOLR with maybe a custom processor to achieve
something in the direction of:
deleteInput="select FIELD_VALUE from TABLE where CHANGED_DATE >
'${dataimporter.last_index_time}' group by FIELD_VALUE"
deleteQuery="field:${my_entity.FIELD_VALUE}"
FIELD_VALUE is not the primary key, and the "deleteInput" query can
return multiple rows.
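A rough sketch of what such a hook would have to do with the rows returned by "deleteInput": substitute each distinct FIELD_VALUE into the "deleteQuery" template and produce one delete query per value. "deleteInput" is only the attribute proposed above, not an existing DataImportHandler feature, and the class and method names below are hypothetical. The sketch does not escape query syntax characters in the values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: expands a delete-query template once per distinct field value. */
public class DeleteQueryExpander {

    /**
     * Replaces the placeholder in the template with each distinct value,
     * keeping the order in which the values were first seen.
     */
    public static List<String> expand(String template, String placeholder, List<String> values) {
        Set<String> distinct = new LinkedHashSet<>(values); // drops duplicate rows
        List<String> queries = new ArrayList<>();
        for (String value : distinct) {
            queries.add(template.replace(placeholder, value)); // literal replacement, no regex
        }
        return queries;
    }

    public static void main(String[] args) {
        // Values as the "deleteInput" query might return them (made-up sample data).
        List<String> rows = Arrays.asList("1", "4", "1");

        List<String> queries = expand(
                "field:${my_entity.FIELD_VALUE}",
                "${my_entity.FIELD_VALUE}",
                rows);

        System.out.println(queries); // prints [field:1, field:4]
    }
}
```

Each resulting string would then be sent as a delete-by-query, once, before the (delta) import starts.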
I am aware of SOLR-1060 and SOLR-1059 but I am not sure that those will
help me. In those cases it looks like the delete is run per entity. I
want the delete to run before the (delta)import, once.
If that impression is wrong, I'll happily switch to 1.4, of course.
Cheers!
Chantal
--
Chantal Ackermann
--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com