Dedupe of document results at query-time

Peter S Sat, 23 Jan 2010 03:31:52 -0800

Hi,


I wonder if someone might be able to shed some light on this problem:

Is it possible to deduplicate documents by field at query time and, if so, 
what is the best/accepted way to do it?

 

For example:

Let's say an index contains:

 

Doc1
--------------------
host:Host1
time:1 Sept 09
appname:activePDF

Doc2
--------------------
host:Host1
time:2 Sept 09
appname:activePDF

Doc3
--------------------
host:Host1
time:3 Sept 09
appname:activePDF

 

Can a query be constructed that would return only one of these documents, 
deduplicated on appname (it doesn't really matter which one)?

 

i.e.:

   match on host:Host1

   ignore time

   dedupe on appname:activePDF
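
To make the steps above concrete, here is a minimal client-side sketch in Python of the behaviour being asked for. It is not Lucene or Solr code: the documents are plain dicts, the "id" key and the dedupe_by_field helper are hypothetical names used only for illustration, and the filtering is done after the fact rather than inside the query.

```python
def dedupe_by_field(docs, field):
    """Keep the first document seen for each distinct value of `field`.

    Hypothetical helper: documents missing `field` all share the key None,
    so only the first of them is kept.
    """
    seen = set()
    unique = []
    for doc in docs:
        key = doc.get(field)
        if key in seen:
            continue  # a document with this field value was already kept
        seen.add(key)
        unique.append(doc)
    return unique


# The three example documents; "id" is an assumed extra key for display only.
index = [
    {"id": "Doc1", "host": "Host1", "time": "1 Sept 09", "appname": "activePDF"},
    {"id": "Doc2", "host": "Host1", "time": "2 Sept 09", "appname": "activePDF"},
    {"id": "Doc3", "host": "Host1", "time": "3 Sept 09", "appname": "activePDF"},
]

# match on host:Host1, ignore time, dedupe on appname
matches = [doc for doc in index if doc["host"] == "Host1"]
result = dedupe_by_field(matches, "appname")

print([doc["id"] for doc in result])  # prints ['Doc1']
```

Doing this on the client means every matching document still has to be fetched first, which is why a query-time mechanism would be preferable.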

 

Is this possible? Would FunctionQuery be helpful here, maybe? Am I actually 
talking about field collapsing?

 

Many thanks,

Peter

 
                                          
