This is harder than you'd think. You'd have to know
how many documents you're going to eventually have
in the result set to be able to return only a percentage,
which you can't know until you've scored the entire
result set.

Say you're seeing the 10th document you'd eventually return.
How do you know how many more there'll be? You could
have 1,000,000 more docs that fit your criteria, or 0. You just
don't know at that point.

What you could do, I suppose, is fire the query once with
rows=0 to find out the number of docs that match (numFound).
Then use "deep paging" (cursorMark) to cycle through all those
docs, choosing some %.
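
The two-pass idea above can be sketched roughly as follows. This is only
an illustration, not something Solr ships: `search` is a hypothetical
callable that sends the given parameters to Solr's select handler and
returns the parsed JSON response, and the uniqueKey field is assumed to
be named `id` (cursorMark requires the sort to include the uniqueKey).
Once the total is known, each doc is kept with probability
needed/remaining, so exactly round(total * fraction) docs come back,
provided the index does not change between the two passes.

```python
import random


def sample_fraction(search, query, fraction, page_size=1000, seed=None):
    """Return a uniform random sample of round(numFound * fraction) docs.

    `search` is a hypothetical helper: search(params) -> Solr JSON as a dict.
    """
    rng = random.Random(seed)

    # Pass 1: rows=0 returns no docs, only the hit count.
    total = search({"q": query, "rows": 0})["response"]["numFound"]
    needed = round(total * fraction)
    remaining = total

    sample = []
    cursor = "*"  # Solr's starting cursorMark value
    # Pass 2: walk the result set with a cursor instead of start/rows.
    while needed > 0:
        page = search({
            "q": query,
            "rows": page_size,
            "sort": "id asc",  # assumes the uniqueKey field is `id`
            "cursorMark": cursor,
        })
        for doc in page["response"]["docs"]:
            if needed == 0:
                break
            # Keep this doc with probability needed / remaining.
            if rng.random() * remaining < needed:
                sample.append(doc)
                needed -= 1
            remaining -= 1
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor: no more results
            break
        cursor = next_cursor
    return sample
```

Note that this still reads through most of the result set, so it costs
about as much as paging through everything; it only saves you from
holding all the docs in memory at once.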

You could also do some interesting things in some custom code.
Consider something that would add all the matching docs to a BitSet
(this code is already in Solr) and randomly choose N of them to
return.
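
To make the BitSet idea concrete, here is a small language-neutral
sketch in Python. It does not use Solr's or Lucene's classes: a plain
Python integer stands in for the bit set (bit i set means internal doc
i matched), and the function name `sample_bitset` is made up for this
example. Real custom code would live inside Solr, for instance in the
kind of post-filter mentioned earlier in the thread.

```python
import random


def sample_bitset(matches, n, seed=None):
    """Pick up to n set bits at random from `matches`.

    `matches` is an int used as a bit set; the result is an int bit set
    that is a subset of `matches`.
    """
    rng = random.Random(seed)
    # Collect the positions of all set bits (the matching doc ids).
    doc_ids = [i for i in range(matches.bit_length()) if (matches >> i) & 1]
    # Choose n of them without replacement (all of them if fewer than n).
    chosen = rng.sample(doc_ids, min(n, len(doc_ids)))
    result = 0
    for doc_id in chosen:
        result |= 1 << doc_id
    return result


# Example: docs 3, 5, 8, 13, 21 and 34 matched; keep 3 of them at random.
matched = sum(1 << i for i in (3, 5, 8, 13, 21, 34))
picked = sample_bitset(matched, 3, seed=42)
```

For a percentage rather than a fixed N, N can be computed from the
number of set bits once the whole result set has been collected.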

But there's nothing out of the box (OOB) that I know of that does this.

Best,
Erick




On Wed, Sep 28, 2016 at 8:00 AM, Yongtao Liu <y...@commvault.com> wrote:
> Alexandre,
>
> Thanks for the reply.
> The use case is that a customer wants to review documents based on a search result.
> But they do not want to review all of them, since that is costly.
> So they want to pick a portion (from 1% to 100%) of the documents to review.
> Users also ask for this function for statistics.
> It is a fairly common requirement.
> Do you know of any plan to implement this feature in the future?
>
> A post filter should work, like the collapsing query parser.
>
> Thanks,
> Yongtao
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Tuesday, September 27, 2016 9:25 PM
> To: solr-user
> Subject: Re: how to sampling search result
>
> I am not sure I understand what the business case is. However, you might be 
> able to do something with a custom post-filter.
>
> Regards,
>    Alex.
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 27 September 2016 at 22:29, Yongtao Liu <y...@commvault.com> wrote:
>> Mikhail,
>>
>> Thanks for your reply.
>>
>> The random field is assigned at index time.
>> We want to do sampling based on the search result.
>>
>> Say the random field has values 1 - 100.
>> The documents matched by the query may all be in the range 90 - 100.
>> So the random field will not help.
>>
>> Is it possible to sample based on the search result?
>>
>> Thanks,
>> Yongtao
>> -----Original Message-----
>> From: Mikhail Khludnev [mailto:m...@apache.org]
>> Sent: Tuesday, September 27, 2016 11:16 AM
>> To: solr-user
>> Subject: Re: how to sampling search result
>>
>> Perhaps, you can apply a filter on random field.
>>
>> On Tue, Sep 27, 2016 at 5:57 PM, googoo <liu...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is it possible to sample based on the "search result"?
>>> For example, run the query first, and the search result returns 1 million documents.
>>> With random sampling, 50% (500K) of the documents are returned for facets and stats.
>>>
>>> The sampling needs to be based on the "search result".
>>>
>>> Thanks,
>>> Yongtao
>>>
>>>
>>>
>>> --
>>> View this message in context: http://lucene.472066.n3.nabble.com/how-to-sampling-search-result-tp4298269.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
