Hi Lance,
Could you provide more details about implementing this using
SignatureUpdateProcessor?
An example would be helpful.
-
Rita
e. :)
> But I'd love to be wrong!
>
> Otis
>
> Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
>
>
>
>>
>> From: Yuval Dotan
>>To: solr-user
>>Sent: Wednesday, May 16, 2012 9:43 AM
>Subject: Question about sampling
>
>Hi Guys
>We have an environment containing billions of documents.
>Faceting over this large result set could take many seconds, and so we
>thought we might be able to use statistical sampling
Yes. The trick is to store a hash value for each document. The
SignatureUpdateProcessor provides a tool for this: it can compute the
hash at index time. Store the hash value in a hex string field.
Now, do wildcard (prefix) queries on the hash string. Each hex digit
takes one of 16 values with roughly equal probability, so hash:a* will
select a pseudo-random 1/16 of the documents, and hash:00* will pick
about 1/256 of the documents.
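
To illustrate why the prefix trick gives a roughly uniform sample, here is a
minimal sketch in Python. It does not talk to Solr at all. It assumes the
stored hash is an MD5 hex digest of a made-up document id ("doc-0", "doc-1",
...), and the helper names signature, sample and estimate_total are
hypothetical, chosen only for this example. The real field would be filled by
whatever signature class you configure.

```python
import hashlib


def signature(doc_id: str) -> str:
    """Hex MD5 of the id; stands in for the stored hash field (assumption)."""
    return hashlib.md5(doc_id.encode("utf-8")).hexdigest()


def sample(doc_ids, prefix: str):
    """Mimic the prefix query hash:<prefix>* by filtering on the hex string."""
    return [d for d in doc_ids if signature(d).startswith(prefix)]


def estimate_total(sample_count: int, prefix: str) -> int:
    """Scale a count from the sample back up: 16 ** len(prefix) buckets."""
    return sample_count * 16 ** len(prefix)


# Made-up ids, only for illustration.
doc_ids = [f"doc-{i}" for i in range(100_000)]

one_sixteenth = sample(doc_ids, "a")    # like hash:a*
one_256th = sample(doc_ids, "00")       # like hash:00*

print(len(one_sixteenth) / len(doc_ids))   # close to 1/16
print(len(one_256th) / len(doc_ids))       # close to 1/256
print(estimate_total(len(one_sixteenth), "a"))  # rough estimate of 100000
```

Facet counts computed over the sample can be scaled the same way (multiply by
16 for a one-digit prefix, 256 for a two-digit prefix) to approximate the
counts over the full result set, with the usual sampling error.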
On