: Dedupe is completely the wrong word. Deduping is something else
: entirely - it is about trying not to index the same document twice.
Dedup can also certainly be used with field collapsing -- that was one of the initial use cases identified for the SignatureUpdateProcessorFactory ... you can compute an 'expensive' signature when adding a document, index it, and then FieldCollapse on that signature field.

This gives you "query time deduplication" based on a value computed when indexing. The canonical example is multiple URLs referencing the "same" content but with slightly different boilerplate markup. You can use a Signature class that recognizes the boilerplate and computes an identical signature value for each URL whose content is "the same", but still index all of the URLs and their content as distinct documents ... so use cases where people only want "distinct" URLs work using field collapse, but by default all matching documents can still be returned, and searches on text in the boilerplate markup also still work.

-Hoss
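To make the index-time signature plus query-time collapse idea concrete, here is a rough sketch in plain Python. It is not Solr code and does not use any Solr API: the `signature`, `index`, and `search` functions, the `<nav>`/`<footer>` boilerplate rule, and the example URLs are all made up for illustration, and an in-memory list stands in for the index.

```python
import hashlib
import re

# Hypothetical boilerplate rule: treat anything inside <nav> or <footer> as boilerplate.
BOILERPLATE = re.compile(r"<(nav|footer)>.*?</\1>", re.DOTALL)


def signature(content: str) -> str:
    """Hash the content with boilerplate stripped and whitespace/case normalized."""
    core = BOILERPLATE.sub(" ", content)
    core = " ".join(core.split()).lower()
    return hashlib.md5(core.encode("utf-8")).hexdigest()


def index(docs):
    """Keep every document as a distinct entry, adding a signature field to each."""
    return [dict(doc, signature=signature(doc["content"])) for doc in docs]


def search(indexed, term, collapse=False):
    """Substring match on the full content; optionally keep one hit per signature."""
    hits = [d for d in indexed if term.lower() in d["content"].lower()]
    if not collapse:
        return hits
    seen, collapsed = set(), []
    for d in hits:
        if d["signature"] not in seen:
            seen.add(d["signature"])
            collapsed.append(d)
    return collapsed


docs = [
    {"url": "http://a.example/page",
     "content": "<nav>Site A menu</nav> Solr field collapsing explained"},
    {"url": "http://b.example/page",
     "content": "<nav>Mirror B menu</nav> Solr field collapsing explained"},
    {"url": "http://c.example/other",
     "content": "<nav>Site A menu</nav> A different article about Solr"},
]
indexed = index(docs)

all_hits = search(indexed, "solr")                  # every matching document
distinct = search(indexed, "solr", collapse=True)   # one document per signature
boilerplate_hits = search(indexed, "mirror b")      # text that only occurs in boilerplate
```

The first two URLs get the same signature because they differ only in the stripped markup, so the collapsed search returns one of them plus the third document, while the uncollapsed search still returns all three. The boilerplate text stays searchable because it is only left out of the signature, not out of the indexed content.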