Hoss, would you suggest using dedup for my use case? If so, do you know of a working example I can reference?
I don't have an issue using the patched version of Solr, but I'd much rather use the GA version.

-Kelly


hossman wrote:
>
> : Dedupe is completely the wrong word. Deduping is something else
> : entirely - it is about trying not to index the same document twice.
>
> Dedup can also certainly be used with field collapsing -- that was one of
> the initial use cases identified for the SignatureUpdateProcessorFactory
> ... you can compute an 'expensive' signature when adding a document, index
> it, and then FieldCollapse on that signature field.
>
> This gives you "query time deduplication" based on a value computed when
> indexing (the canonical example is multiple urls referencing the "same"
> content but with slightly different boilerplate markup). You can use a
> Signature class that recognizes the boilerplate and computes an identical
> signature value for each URL whose content is "the same" but still index
> all of the URLs and their content as distinct documents ... so use cases
> where people only want "distinct" URLs work using field collapse, but by
> default all matching documents can still be returned and searches on text
> in the boilerplate markup also still work.
>
> -Hoss
>

--
View this message in context: http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27155115.html
Sent from the Solr - User mailing list archive at Nabble.com.
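To make Hoss's "query time deduplication" idea concrete, here is a minimal Python sketch of the two steps he describes: compute a signature at index time that ignores boilerplate, keep every URL as its own document, and only collapse on the signature when a query asks for distinct results. It does not use Solr's API at all; the boilerplate patterns, the function names (`signature`, `index`, `search`) and the sample URLs are hypothetical, and MD5 over normalized text is just a stand-in for whatever Signature class you would actually configure with SignatureUpdateProcessorFactory.

```python
import hashlib
import re

# Hypothetical boilerplate: site-specific header/footer markup that differs per URL.
BOILERPLATE_PATTERNS = [
    re.compile(r"<header>.*?</header>", re.S),
    re.compile(r"<footer>.*?</footer>", re.S),
]


def signature(html: str) -> str:
    """Index-time step (stand-in for a Signature class): drop boilerplate and tags,
    normalize case and whitespace, then hash what is left."""
    text = html
    for pattern in BOILERPLATE_PATTERNS:
        text = pattern.sub("", text)
    text = re.sub(r"<[^>]+>", " ", text)   # strip remaining markup
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def index(docs):
    """Every URL stays a distinct document; the signature is just an extra field."""
    return [dict(doc, signature=signature(doc["content"])) for doc in docs]


def search(indexed, term, collapse=False):
    """Query-time step: match against the full content (boilerplate included) and,
    only if asked, keep the first hit per signature, like collapsing on that field."""
    hits = [d for d in indexed if term.lower() in d["content"].lower()]
    if not collapse:
        return hits
    first_per_signature = {}
    for d in hits:
        first_per_signature.setdefault(d["signature"], d)
    return list(first_per_signature.values())


if __name__ == "__main__":
    docs = [
        {"url": "http://a.example/page",
         "content": "<header>Site A</header><p>Same article</p><footer>(c) A</footer>"},
        {"url": "http://b.example/page",
         "content": "<header>Site B</header><p>Same article</p><footer>(c) B</footer>"},
        {"url": "http://c.example/other",
         "content": "<header>Site A</header><p>Different article</p><footer>(c) A</footer>"},
    ]
    indexed = index(docs)
    print(len(search(indexed, "article")))                 # 3: every URL still matches
    print(len(search(indexed, "article", collapse=True)))  # 2: the two "same" pages collapse
    print(len(search(indexed, "Site B")))                  # 1: boilerplate text stays searchable
```

The point of the design is that nothing is thrown away at index time: the collapse happens per query, so a plain search still returns all three URLs, while a "distinct" search returns one document per signature.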