Hi all, I am the author of the article referenced in this thread. After reading it again, I can see where there might have been confusion, and I apologize for that. I have edited the article to indicate that a deduplication component is in the works and to reference SOLR-236. The article can still be found at http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Solr-and-RDBMS-design-basics

My only question after reading this thread is: what does a user purchase? A product identified by a SKU? If so, then indexing by SKU is certainly the way to go, and field collapsing (the query-time deduplication) should work. Also keep in mind that my example was about the *exact* same product stored in different locations, which could yield a bad user experience if every copy were shown on the same search result page. In your case, each SKU is a unique (purchasable) product, so collapsing by product ID is nice, but would skipping the collapse really degrade the user experience? If I searched for a green shirt and got S, M, and L (all product ID 3), is that bad?

Hope that helps some,
Amit

On Sat, Jan 16, 2010 at 3:43 PM, David MARTIN <dmartin....@gmail.com> wrote:

> I'm really interested in reading the answer to this thread as my problem is
> rather the same. Maybe my main difference is the huge SKU number per product
> I may have.
>
> David
>
> On Thu, Jan 14, 2010 at 2:35 AM, Kelly Taylor <wired...@hotmail.com> wrote:
>
> > Hoss,
> >
> > Would you suggest using dedup for my use case; and if so, do you know of a
> > working example I can reference?
> >
> > I don't have an issue using the patched version of Solr, but I'd much rather
> > use the GA version.
> >
> > -Kelly
> >
> > hossman wrote:
> > >
> > > : Dedupe is completely the wrong word. Deduping is something else
> > > : entirely - it is about trying not to index the same document twice.
> > >
> > > Dedup can also certainly be used with field collapsing -- that was one of
> > > the initial use cases identified for the SignatureUpdateProcessorFactory
> > > ... you can compute an 'expensive' signature when adding a document, index
> > > it, and then FieldCollapse on that signature field.
> > >
> > > This gives you "query time deduplication" based on a value computed when
> > > indexing (the canonical example is multiple urls referencing the "same"
> > > content but with slightly different boilerplate markup). You can use a
> > > Signature class that recognizes the boilerplate and computes an identical
> > > signature value for each URL whose content is "the same" but still index
> > > all of the URLs and their content as distinct documents ... so use cases
> > > where people only want "distinct" URLs work using field collapse but by
> > > default all matching documents can still be returned and searches on text
> > > in the boilerplate markup also still work.
> > >
> > > -Hoss
> >
> > --
> > View this message in context:
> > http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27155115.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
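
P.S. For anyone following along, here is a rough sketch of the idea Hoss describes in the quoted text: compute a signature at index time, keep every document, and collapse on the signature only at query time. It is plain Python, not Solr code. The names `signature`, `index`, and `search`, the tag-stripping rule, and the example URLs are all made up for illustration, and the hashing is only a stand-in for whatever a real Signature class would do.

```python
import hashlib
import re


def signature(content):
    """Hypothetical signature: hash the text with markup and extra whitespace removed."""
    text = re.sub(r"<[^>]+>", " ", content)   # drop tags (stand-in for boilerplate removal)
    text = " ".join(text.lower().split())     # normalize case and whitespace
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def index(raw_docs):
    """Index time: keep every document, but store a computed 'sig' field on each."""
    return [dict(doc, sig=signature(doc["content"])) for doc in raw_docs]


def search(indexed, term, collapse_field=None):
    """Query time: match on content, then optionally keep one hit per field value."""
    hits = [d for d in indexed if term.lower() in d["content"].lower()]
    if collapse_field is None:
        return hits                            # default: every matching document
    first_per_group = {}
    for d in hits:
        # Keep only the first hit seen for each distinct value of the field.
        first_per_group.setdefault(d[collapse_field], d)
    return list(first_per_group.values())


docs = index([
    {"url": "http://a.example/page", "content": "<div>Green shirt</div>"},
    {"url": "http://b.example/page", "content": "<p>green   shirt</p>"},
    {"url": "http://c.example/other", "content": "<p>Blue shirt</p>"},
])

all_hits = search(docs, "shirt")                          # all three documents
collapsed = search(docs, "shirt", collapse_field="sig")   # one hit per signature
```

The same `search` call with a different `collapse_field` (say, a product ID field on SKU documents, assuming every document carries one) would give the per-product collapse discussed earlier in the thread, while leaving each SKU individually searchable.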