: That's what I thought. I think I'll take the time to add something to the 
: DIH to prevent such things. Maybe a parameter that will cause the import 
: to bail out if the documents to index are fewer than X% of the total 
: number of documents already in the index.

The devil's in the details though ... to do an efficient "full-import", DIH 
deletes the index before it starts indexing anything, and for an 
arbitrary datasource with an arbitrary set of entities and sub-entities 
and various layers of logic, it seems like it would be infeasible to know 
how many rows you are going to get before you actually start.

I think this sort of thing would pretty much have to be done post-import 
(w/o doing the initial delete): count the number of docs added, and 
if that number is above a percentage threshold, delete all of the docs 
older than the import (using a deleteQuery based on a timestamp field).
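
To make that concrete, here is a rough sketch of the post-import decision, 
not anything DIH does today. The function name, its parameters, and the 
"timestamp" field name are all made up for illustration, and the 
"[* TO x}" exclusive upper bound assumes a Solr version that accepts 
mixed range brackets.

```python
from datetime import datetime, timezone


def stale_docs_delete_query(docs_added, docs_before, min_percent,
                            import_start, timestamp_field="timestamp"):
    """Return a delete query for docs older than this import, or None.

    Hypothetical helper: None means "bail out and keep the old docs"
    because the import added fewer than min_percent of the number of
    docs that were in the index before it started.
    """
    # Integer comparison avoids floating-point percentage rounding.
    if docs_before > 0 and docs_added * 100 < docs_before * min_percent:
        return None
    # Solr date fields use UTC in ISO-8601 form with a trailing "Z".
    cutoff = import_start.astimezone(timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ")
    # Match everything stamped strictly before the import began.
    # ASSUMPTION: every doc carries an index-time timestamp field.
    return f"{timestamp_field}:[* TO {cutoff}}}"
```

The returned string would go in the body of a delete-by-query request 
sent after the import commits; if the result is None, the freshly added 
docs are the ones that need cleaning up instead.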

Of course: none of this helps you with the possibility that you have 
plenty of docs, but they all contain useless data (maybe some nested 
entity query failed so you have no searchable text) ... logic for sanity 
checking an index tends to be fairly domain specific.



-Hoss
