Re: how to remove duplicate data while facet?

Otis Gospodnetic Tue, 11 Dec 2012 19:04:09 -0800

Hi,

Sounds like you don't need to reindex. You need to find duplicates and
delete them.
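
One way to plan that cleanup, as a minimal sketch rather than a tested procedure: collect the unique keys held by each shard (for example by querying each shard directly with `distrib=false` and `fl` set to the unique key field), then work out which keys appear on more than one shard and which copies to delete. The function names, the shard labels, and the rule "keep the copy on the alphabetically first shard" are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_duplicate_keys(shard_ids: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each unique key found on more than one shard to the shards holding it."""
    holders = defaultdict(set)
    for shard, ids in shard_ids.items():
        for key in ids:
            holders[key].add(shard)
    return {key: sorted(shards) for key, shards in holders.items() if len(shards) > 1}


def plan_deletions(duplicates: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each duplicated key, keep one copy and list the others for deletion.

    Assumption: the surviving copy is the one on the alphabetically first
    shard. Substitute whatever rule fits your data (e.g. newest timestamp).
    """
    plan = defaultdict(list)
    for key, shards in duplicates.items():
        survivor = min(shards)
        for shard in shards:
            if shard != survivor:
                plan[shard].append(key)
    return {shard: sorted(keys) for shard, keys in plan.items()}


# The unique keys from the example in this thread.
shard_ids = {"A": ["1", "2", "3"], "B": ["1", "4", "5"]}
duplicates = find_duplicate_keys(shard_ids)
print(duplicates)                   # {'1': ['A', 'B']}
print(plan_deletions(duplicates))   # {'B': ['1']}
```

Each entry in the plan would then become a delete-by-id request sent to that shard's own update handler, so only the extra copy is removed. Try it against a copy of the index before touching production.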


Otis
--
SOLR Performance Monitoring - http://sematext.com/spm
On Dec 11, 2012 12:42 PM, "蒋明原" <[email protected]> wrote:

> Thank you, first of all.
> Yes, if no two documents shared a unique key, I would not have this problem.
> But right now I can't reindex my data: it's too big, and it's in a production
> environment.
> So, does anyone have a solution?
>
> Thank you.
>
> On Wednesday, December 12, 2012, Pawel wrote:
>
> > I think the solution is quite obvious: make sure that you don't have items
> > with the same unique key in multiple shards :)
> >
> > On Tue, Dec 11, 2012 at 5:24 PM, 蒋明原 <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I'm running a distributed facet query, and there is duplicate data across
> > > the cluster.
> > > For example:
> > >
> > > Server A holds these documents:
> > >
> > > Doc1: uniqueKey=1 userid=a
> > > Doc2: uniqueKey=2 userid=b
> > > Doc3: uniqueKey=3 userid=c
> > >
> > > Server B holds these documents:
> > >
> > > Doc1: uniqueKey=1 userid=a
> > > Doc2: uniqueKey=4 userid=b
> > > Doc3: uniqueKey=5 userid=c
> > >
> > > When I run a facet query on the field "userid", the expected result is:
> > > a:1
> > > b:2
> > > c:2
> > >
> > > But Solr gives me:
> > >
> > > a:2
> > > b:2
> > > c:2
> > >
> > > However, when I run a normal query, userid:a,
> > > Solr gives me a total of 1 result.
> > >
> > > It seems that when running a facet query, every document with a duplicate
> > > key still participates in the count, but when running a normal query,
> > > Solr chooses only one document among the duplicates.
> > >
> > > So my question is: how can I remove duplicate documents during a
> > > distributed facet search?
> > >
> > > Thanks!
> > >
> >
>
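
The discrepancy described in the original question can be reproduced with a toy model. This is only a sketch consistent with the behaviour reported in the thread, not Solr's actual code: it assumes facet counts are computed on each shard and summed, while the merged hit list is collapsed by unique key. The function names are hypothetical.

```python
from collections import Counter

# The documents from the original question, grouped by server.
shards = {
    "A": [
        {"uniqueKey": "1", "userid": "a"},
        {"uniqueKey": "2", "userid": "b"},
        {"uniqueKey": "3", "userid": "c"},
    ],
    "B": [
        {"uniqueKey": "1", "userid": "a"},
        {"uniqueKey": "4", "userid": "b"},
        {"uniqueKey": "5", "userid": "c"},
    ],
}


def merged_facet_counts(shards, field):
    """Sum per-shard facet counts; no shard knows about another shard's keys."""
    total = Counter()
    for docs in shards.values():
        total.update(doc[field] for doc in docs)
    return dict(total)


def merged_hits(shards, field, value):
    """Merge matching documents from all shards, keeping one per unique key."""
    seen = {}
    for docs in shards.values():
        for doc in docs:
            if doc[field] == value:
                seen.setdefault(doc["uniqueKey"], doc)
    return list(seen.values())


print(merged_facet_counts(shards, "userid"))   # {'a': 2, 'b': 2, 'c': 2}
print(len(merged_hits(shards, "userid", "a")))  # 1
```

Under this model the facet count for "a" is 2 because each shard reports 1 and the counts are added without looking at keys, whereas the normal query returns 1 hit because the two copies share uniqueKey=1.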
