: That's what I thought. I think I'll take the time to add something to the
: DIH to prevent such things. Maybe a parameter that will cause the import
: to bail out if the documents to index are less than X % of the total
: number of documents already in the index.

The devil's in the details though ... to do an efficient "full-import", DIH deletes the index before it starts indexing anything, and for an arbitrary datasource with an arbitrary set of entities and sub-entities and various layers of logic, it seems like it would be infeasible to know how many rows you are going to get before you actually start.

I think this sort of thing would pretty much have to be done post-import (w/o doing the initial delete): count the number of docs added, and if that number is above a percentage threshold, delete all of the docs older than the import (using a deleteQuery based on a timestamp field).

Of course, none of this helps you with the possibility that you have plenty of docs, but they all contain useless data (maybe some nested entity query failed, so you have no searchable text) ... logic for sanity checking an index tends to be fairly domain specific.

-Hoss
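
For what it's worth, the post-import check described above could be sketched roughly like this. It is only an illustration, not anything DIH provides: the function name, the `min_ratio` parameter and the `timestamp` field name are all made up for the example. It also assumes every document gets an index-time timestamp, and that your Solr version accepts a range query with an exclusive upper bound.

```python
from datetime import datetime, timezone


def stale_docs_delete_query(docs_added, docs_before, import_start,
                            min_ratio=0.5, timestamp_field="timestamp"):
    """Return a delete query for docs older than this import, or None.

    Hypothetical helper: docs_added is the number of docs this import
    added, docs_before is the number of docs in the index beforehand,
    and import_start is a timezone-aware datetime taken just before
    the import began.
    """
    # Bail out (delete nothing) if the import looks suspiciously small.
    if docs_before > 0 and docs_added < min_ratio * docs_before:
        return None

    # Solr date fields expect UTC in ISO 8601 form with a trailing Z.
    cutoff = import_start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # Exclusive upper bound, so docs stamped exactly at import_start are kept.
    return f"{timestamp_field}:[* TO {cutoff}}}"
```

If the function returns a query, you would send it to Solr as a delete-by-query and commit; if it returns None, the old documents stay in place and someone can go look at why the import came up short.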