: Suppose I wanted to use this log approach.  Does anyone have
: suggestions about the best way to do it?  The approach that first comes
: to mind is to store the log as a separate DB table, and to maintain

It largely depends on your DB schema and the mechanisms you use to update your data.

In most instances I deal with, we never actually "delete" rows from the authoritative database table that controls the ID space -- we just mark those logical objects as deleted (using an enumerated status field), and we have another field that records the lastModified time of each logical object. Each batch run just looks for any logical object whose lastModified time is greater than the timestamp of the last batch run -- for each object, either reindex or delete depending on the status. (In rare cases an object is modified after it is deleted, but sending a superfluous delete is almost inconsequential.)

We have some other, more complex data models that we deal with in more complex ways ... but the underlying theme is the same ... know when stuff changed, know what stuff is "live" and what stuff is "dead" ... keep a "lastRunTime" and compare everything to it.

-Hoss
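
For illustration only, here is a rough Python sketch of the batch run described above. It is not the actual code behind this setup; the table name "objects", its columns, the status values, and the dict standing in for the search index are all hypothetical. The upper bound on the query (so that rows touched while the batch is running get picked up by the next run) is an added assumption, not something stated above.

```python
import sqlite3

# Hypothetical values for the enumerated status field.
LIVE, DELETED = "live", "deleted"


def run_batch(conn, last_run_time, now, index):
    """Bring `index` up to date with rows modified after `last_run_time`.

    `index` is a plain dict standing in for a real search index.
    Returns the timestamp to store as the next lastRunTime.
    """
    rows = conn.execute(
        "SELECT id, status, body FROM objects "
        "WHERE last_modified > ? AND last_modified <= ?",
        (last_run_time, now),
    ).fetchall()
    for obj_id, status, body in rows:
        if status == DELETED:
            # A delete for something that is not indexed is harmless.
            index.pop(obj_id, None)
        else:
            # Reindex: add the object or overwrite the stale copy.
            index[obj_id] = body
    return now


# Tiny demo with an in-memory database and integer timestamps.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects ("
    "id INTEGER PRIMARY KEY, status TEXT, body TEXT, last_modified INTEGER)"
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [
        (1, LIVE, "first doc", 100),      # new since the last run
        (2, DELETED, "second doc", 150),  # marked deleted since the last run
        (3, LIVE, "third doc", 50),       # untouched since the last run
    ],
)
index = {2: "second doc", 3: "third doc"}  # state left by the previous run
last_run_time = run_batch(conn, last_run_time=90, now=200, index=index)
```

In this demo, object 1 gets indexed, object 2 gets removed, and object 3 is never looked at because its last_modified value is older than the previous run. The row for object 2 stays in the table with its status flipped, which is what lets the batch job find out about the delete at all.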