Re: Encountering a roadblock with my Solr schema design...use dedupe?

Amit Nithian Fri, 12 Feb 2010 11:41:27 -0800

Hi all,

I am the author of the article referenced in this thread. After reading it
again, I can understand where there might have been confusion, and my
apologies for that. I have edited the article to indicate that a
deduplication component is in the works and to reference SOLR-236. The
article can still be found at
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Solr-and-RDBMS-design-basics


My only question after reading this thread is: what does a user purchase? A
product identified by a SKU? If so, then indexing by SKU is certainly the
way to go, and field collapsing (query-time deduplication) should then
work.
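
To make "indexing by SKU" concrete, here is a minimal Python sketch, assuming
a product with several sizes has been read from a relational database. It
flattens the product into one index document per SKU, each carrying the
shared product_id so that results can be grouped by product later. The record
layout and the field names (id, product_id, name, size) are hypothetical, not
taken from the article or from any particular Solr schema.

```python
# Hypothetical product record, shaped as it might come out of an RDBMS:
# one product row joined to its purchasable SKU rows.
product = {
    "product_id": "3",
    "name": "Green shirt",
    "skus": [
        {"sku": "3-S", "size": "S"},
        {"sku": "3-M", "size": "M"},
        {"sku": "3-L", "size": "L"},
    ],
}


def sku_documents(product):
    """Flatten one product into one index document per SKU.

    The SKU is the unique key (the thing a user purchases); the shared
    product fields, including product_id, are copied onto every document
    so that results can later be collapsed by product at query time.
    """
    return [
        {
            "id": sku["sku"],
            "product_id": product["product_id"],
            "name": product["name"],
            "size": sku["size"],
        }
        for sku in product["skus"]
    ]


for doc in sku_documents(product):
    print(doc)
```

Copying the product fields onto every SKU document is the usual denormalization
trade-off: more index size in exchange for single-document matches.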

Also keep in mind that in my example, I was talking about the *exact* same
product located in different locations, which could yield a bad user
experience if they were all shown on the same search results page. In your
case, each SKU is a unique (purchasable) product, so collapsing by product ID
is nice, but would skipping it really degrade the user experience? If I
searched for a green shirt and got S, M, L (all product ID 3), is that bad?
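
The green-shirt question can be illustrated with a plain-Python simulation of
query-time field collapsing. This is not Solr's SOLR-236 implementation, only
a sketch of its effect: given hits already ranked by relevance, keep the
best-ranked SKU per product_id and count how many were folded into it. The
result list and field names are hypothetical.

```python
# Hypothetical hits for "green shirt", one document per SKU,
# already sorted by relevance (best first).
results = [
    {"id": "3-M", "product_id": "3", "size": "M"},
    {"id": "7-L", "product_id": "7", "size": "L"},
    {"id": "3-S", "product_id": "3", "size": "S"},
    {"id": "3-L", "product_id": "3", "size": "L"},
]


def collapse(results, field):
    """Keep only the best-ranked document for each value of `field`.

    Returns one group per value, in the rank order of each group's best
    document, with a count of the lower-ranked documents folded into it.
    """
    groups = {}  # dicts preserve insertion order (Python 3.7+)
    for doc in results:
        key = doc[field]
        if key in groups:
            groups[key]["collapsed"] += 1
        else:
            groups[key] = {"doc": doc, "collapsed": 0}
    return list(groups.values())


print(len(results), "hits without collapsing")
for group in collapse(results, "product_id"):
    print(group["doc"]["id"], "plus", group["collapsed"], "other size(s)")
```

Whether to collapse is then a presentation choice: the uncollapsed list shows
every purchasable size, while the collapsed one shows each shirt once with a
hint that more sizes exist.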

Hope that helps some
Amit

On Sat, Jan 16, 2010 at 3:43 PM, David MARTIN <dmartin....@gmail.com> wrote:

> I'm really interested in reading the answer to this thread, as my problem is
> much the same. Perhaps the main difference is the huge number of SKUs per
> product I may have.
>
>
> David
>
> On Thu, Jan 14, 2010 at 2:35 AM, Kelly Taylor <wired...@hotmail.com> wrote:
>
> >
> > Hoss,
> >
> > Would you suggest using dedup for my use case? If so, do you know of a
> > working example I can reference?
> >
> > I don't have an issue using the patched version of Solr, but I'd much
> > rather use the GA version.
> >
> > -Kelly
> >
> >
> >
> > hossman wrote:
> > >
> > >
> > > : Dedupe is completely the wrong word. Deduping is something else
> > > : entirely - it is about trying not to index the same document twice.
> > >
> > > Dedup can also certainly be used with field collapsing -- that was one of
> > > the initial use cases identified for the SignatureUpdateProcessorFactory
> > > ... you can compute an 'expensive' signature when adding a document, index
> > > it, and then FieldCollapse on that signature field.
> > >
> > > This gives you "query time deduplication" based on a value computed when
> > > indexing (the canonical example is multiple URLs referencing the "same"
> > > content but with slightly different boilerplate markup). You can use a
> > > Signature class that recognizes the boilerplate and computes an identical
> > > signature value for each URL whose content is "the same", but still index
> > > all of the URLs and their content as distinct documents ... so use cases
> > > where people only want "distinct" URLs work using field collapse, but by
> > > default all matching documents can still be returned, and searches on
> > > text in the boilerplate markup also still work.
> > >
> > >
> > > -Hoss
> > >
> > >
> > >
> >
> > --
> > View this message in context:
> > http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27155115.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
>
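
Hoss's distinction between index-time dedupe and a signature used for
query-time collapsing can be sketched in plain Python. This illustrates the
idea only; it is not Solr's SignatureUpdateProcessorFactory or any of its
Signature classes. The boilerplate patterns, URLs, and field names (url,
content, sig) are hypothetical, and SHA-1 is an arbitrary choice of hash.

```python
import hashlib
import re

# Hypothetical boilerplate: site chrome that differs between pages
# carrying otherwise identical content.
BOILERPLATE = [
    re.compile(r"^nav:.*$", re.MULTILINE),
    re.compile(r"^footer:.*$", re.MULTILINE),
]


def signature(content):
    """Hash the content after removing boilerplate lines and extra whitespace."""
    for pattern in BOILERPLATE:
        content = pattern.sub("", content)
    normalized = " ".join(content.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


pages = [
    ("http://a.example/story", "nav: Home | News\nSolr 1.4 released\nfooter: (c) A"),
    ("http://b.example/story", "nav: Start\nSolr 1.4 released\nfooter: (c) B"),
    ("http://c.example/other", "nav: Home | News\nLucene news\nfooter: (c) A"),
]

# Signature + collapse: every URL is indexed, with its signature as a field.
index = [{"url": url, "content": text, "sig": signature(text)} for url, text in pages]

# Index-time dedupe, for contrast: a page is dropped if its signature was seen.
deduped, seen = [], set()
for doc in index:
    if doc["sig"] not in seen:
        seen.add(doc["sig"])
        deduped.append(doc)


def search(docs, term):
    """Toy full-text match over the stored content, boilerplate included."""
    return [doc for doc in docs if term in doc["content"]]


def collapse(docs, field):
    """Keep the first (best-ranked) document per value of `field`."""
    best = {}
    for doc in docs:
        best.setdefault(doc[field], doc)
    return list(best.values())


def urls(docs):
    return [doc["url"] for doc in docs]


print(urls(search(index, "Solr")))                   # both copies match
print(urls(collapse(search(index, "Solr"), "sig")))  # one per signature
print(urls(search(index, "Start")))                  # boilerplate still searchable
print(urls(search(deduped, "Start")))                # lost with index-time dedupe
```

Collapsing on sig shows each piece of content once, yet the b.example page
still matches a search on its own boilerplate ("Start"); index-time dedupe
would have dropped that page from the index entirely.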
