Re: unique values from a field in a result

Ian Holsman Tue, 29 Apr 2008 03:15:59 -0700

Hi Thijs.

If you are not concerned with an *EXACT* number, there is a paper published in 1990 that discusses this problem.


http://dblab.kaist.ac.kr/Publication/pdf/ACM90_TODS_v15n2.pdf

From the paper (if I understand it correctly):

For 120,000,000 records you can sample 10,112,529 records (roughly 8%) when the variance is low and get an answer with 95% confidence.
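
To illustrate the general idea of answering from a sample instead of the full set, here is a minimal sketch. It is not the estimator from the paper; it only counts the distinct values seen in a uniform random sample, which is a lower bound on the true count and gets close when each value occurs many times. The function name, the sample fraction and the test data are all assumptions made for the example.

```python
import random

def distinct_in_sample(values, fraction, seed=0):
    """Count the distinct values seen in a uniform random sample.

    This is only a lower bound on the true number of distinct values.
    It is close to the truth when every value is repeated many times.
    """
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    k = max(1, int(len(values) * fraction))
    sample = rng.sample(values, k)     # sample without replacement
    return len(set(sample))

# Hypothetical data: 10 products, each indexed as 1000 variations.
product_pks = [pk for pk in range(10) for _ in range(1000)]

estimate = distinct_in_sample(product_pks, fraction=0.10)
print(estimate)
```

With few distinct values and many repeats of each, even a small sample is very likely to see every value. With many rare values the sample count would undershoot, and a proper estimator would be needed.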



Regards
Ian

Thijs wrote:

It must be my English.
When I read your comment, I think you could compare it to the category example...
Maybe with an example I can explain my situation better:
The documents in the index contain variations of different products.
Say, for example, I have 10 different products. Every product is indexed 1000 times (1000 different variations per product); the product is not unique, the variation is unique. The first 10 results of a search only contain the best matching variations for all the products in the complete result. So let's say the result returns 1000 variations for 3 different products. What I need is some 'sidebar information' containing detailed information on all 3 unique products in the complete result.
My example is just simple; in real life the numbers are a lot bigger. However, the number of unique products vs. variations is such that it seems a lot of work to iterate over all variations in a DocSet just to get the few unique products. But what I understand from your answer is that the best way to get the 3 unique products is to iterate over the 1000 variations in the result DocSet? And if that is the case I'm happy with it.
Thanks
Thijs
But to get some extra information I need all the unique values for one of the fields in the index (being the pk of the product).
Chris Hostetter wrote:
: You are correct, I'm looking for the unique values for one field in a DocSet.
: The field is not multivalued, and it contains only 1 long value, the pk of a
: database table.
: But you said the counts are stored in the index, I don't see that. Because
there's something very confusing about your question ... if the value of the field is unique for every document (by "pk" you mean the primary key for these docs in your database, correct?) then why do you specifically need the "unique terms"? ... aren't they by definition unique?
usually when people ask questions like this, they are interested in the "unique values" for something like a "category" field, where lots of documents are in the same category, and they want to know what the full list of categories is for all of the documents that match their query.
if you want the list of all "primary keys" for all the documents that match your query, why not just make sure that field has stored="true" in the schema.xml and get the values that way?
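
As a rough sketch of what "get the values that way" could look like from a client, the snippet below only builds a select URL that asks Solr to return one stored field via the standard `q`, `fl` and `rows` parameters. The host, the handler path, the field name `product_pk` and the query string are assumptions for illustration; nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed location of the Solr select handler; adjust for a real install.
SOLR_SELECT = "http://localhost:8983/solr/select"

def stored_field_url(query, field, rows):
    """Build a select URL that returns only `field` for each match."""
    params = {
        "q": query,     # the user's query
        "fl": field,    # field list: return just this stored field
        "rows": rows,   # how many matching documents to return
    }
    return SOLR_SELECT + "?" + urlencode(params)

# Hypothetical field and query names.
url = stored_field_url("name:widget", "product_pk", 1000)
print(url)
```

This only works if the field is declared with stored="true", as noted above.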
I'm extra confused because of this comment...
: when I debug simplefacet. It always iterates over all the documents in the
: result docset (SimpleFacet.getFieldCacheCounts line 259).
it doesn't *seem* like faceting is necessary, but why do you think iterating over all the documents in your result set seems like a waste here? if you want to know what *all* the values are for every document in your doc set, then regardless of whether the values are distinct for each doc, how else could Solr get all the values than looking at each matching doc?
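
to make the point concrete, here is a minimal sketch of that single pass over the matching docs. it is plain Python over a list of dicts, not Solr's actual code; the field names and the sample data (1000 variations spread over 3 products) are assumptions for the example.

```python
from collections import Counter

def unique_field_values(docs, field):
    """Visit each matching doc once and count the values seen for `field`."""
    counts = Counter()
    for doc in docs:
        counts[doc[field]] += 1
    return counts

# Hypothetical result set: 1000 variations belonging to 3 products.
matching_docs = [
    {"variation_id": i, "product_pk": 100 + i % 3}
    for i in range(1000)
]

counts = unique_field_values(matching_docs, "product_pk")
print(sorted(counts))  # the unique product keys
```

the work is linear in the number of matching docs, and the result holds one entry per unique value, so the 3 products fall out of one pass over the 1000 variations.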
-Hoss
