: That said, writing your own update request handler
: that detected this case isn't very difficult,
: extend UpdateRequestProcessorFactory/UpdateRequestProcessor
: and use it as a plugin.

i can't find the thread at the moment, but the general issue that has caused people headaches with this type of approach in the past is that doing a query on every update (to see if the doc is already in the index) can slow things down quite a bit -- in your use case it may not be a significant bottleneck, but that's the general issue that has come up in the past.

If you look at systems (like Nutch) that do large-scale crawling, they treat the crawl phase as independent from the indexing phase precisely for reasons like this -- so the crawler can dedup the documents (by unique URL) and eliminate duplication before ever adding them to the index.

: >> > I wonder why simple the overwrite parameter doesn't work here.
	...
: >> > 2. overwrite=false and uniqueID exists then newer doc must be skipped
: >> since
: >> > old exists.

that is not what overwrite=false does (or was ever designed to do). overwrite=false is a way to tell Solr that you are already certain that the documents being added do not exist in the index, so Solr can save time by not attempting to overwrite an existing document. It is intended for situations where you are bulk loading documents, e.g. doing an initial build of an index from a system of record (a single pass over a database that uses the same unique key) or importing documents from a new system of record with a completely different id space.

-Hoss
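
PS: to make the "dedup before indexing" idea concrete, here is a minimal sketch in plain Java. It is not Nutch or Solr code. The CrawlDedup class, the Page type and the dedupByUrl method are hypothetical names made up for illustration, and the "keep the first doc seen for each URL, skip later ones" policy is an assumption taken from the behaviour asked for in the quoted question. The point is only that the duplicate check happens in memory over the crawl output, so nothing has to be queried in the index per update.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CrawlDedup {

    /** Hypothetical crawl record: the unique key (URL) plus the fetched content. */
    static class Page {
        final String url;
        final String content;

        Page(String url, String content) {
            this.url = url;
            this.content = content;
        }
    }

    /**
     * Keeps the first page seen for each URL and drops later duplicates,
     * preserving crawl order. Assumption: "older doc wins".
     */
    static List<Page> dedupByUrl(List<Page> crawled) {
        Map<String, Page> firstSeen = new LinkedHashMap<>();
        for (Page p : crawled) {
            // putIfAbsent leaves the existing (older) entry untouched
            firstSeen.putIfAbsent(p.url, p);
        }
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        List<Page> crawled = Arrays.asList(
                new Page("http://example.com/a", "first fetch of a"),
                new Page("http://example.com/b", "fetch of b"),
                new Page("http://example.com/a", "second fetch of a"));

        List<Page> unique = dedupByUrl(crawled);
        System.out.println(unique.size() + " unique pages to index");
    }
}
```

Once the batch has been deduplicated like this, and only if you also know none of those keys are already in the index, the add can be sent with overwrite=false as described above.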