We have increased the number of documents in the SolrCloud collection to
several million now and are seeing the "issue" again:

If there are 10 documents, each with exactly the same highest score, and we
run the query again and again, the order of the documents changes. Strictly
speaking, all of these documents are equally relevant, but it would be very
nice if the order stayed the same so that users can be confident in the query
results.

How can we make sure that the order does not change when the query is run
again and again for documents that are equally relevant (i.e. their score
is exactly the same)?
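
One thing we are considering trying is a secondary sort on the unique key, so
that ties on score are broken the same way every time. The sketch below only
shows the idea and is not tested against our setup: the field name "id" and
the query text are assumptions, and it only helps if the scores really are
identical on every replica.

```python
from urllib.parse import urlencode


def stable_sort_params(user_query, unique_key="id"):
    """Build Solr query parameters with a deterministic tie-breaker.

    unique_key is assumed to be the collection's uniqueKey field;
    "id" is a placeholder, not taken from our actual schema.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        # Sort by score first, then by the unique key so that documents
        # with exactly the same score always come back in the same order.
        "sort": f"score desc, {unique_key} asc",
    }


# Hypothetical query text, for illustration only.
params = stable_sort_params("example query")
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the collection's select URL in the
usual way.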

Thanks

On Fri, Jan 15, 2016 at 3:12 PM, Brian Narsi <bnars...@gmail.com> wrote:

> Data is indexed using Data Import Handler with clean=true, commit=true and
> optimize=true. After that there are no updates or deletes.
>
> The setup is SolrCloud with 2 shards and 2 replicas each.
>
> If the data and query have not changed, one expects to see the same results
> on repeated searches; so it is a matter of user confidence in search
> results.
>
> Thanks
>
> On Fri, Jan 15, 2016 at 10:12 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Probably the fact that information from deleted/updated
>> documents is still hanging around in the corpus until
>> merged away.
>>
>> The nub of the issue is that terms in deleted documents
>> (or the replaced doc if you update) still influence tf/idf
>> calculations. If you optimize as Binoy suggests, all of
>> the information relating to deleted docs is removed.
>>
>> If this is a SolrCloud setup, you can be getting
>> scores from different replicas of the same shard. Due to
>> the fact that merging (which purges deleted information)
>> can occur at different times on different replicas, the scores
>> calculated for a particular doc might be different depending
>> on which replica calculated it.
>>
>> In either setup (SolrCloud or not), background merging can
>> change the result order by removing information associated
>> with deleted docs.
>>
>> All that said, does this have _practical_ consequences or
>> is this mostly a curiosity question?
>>
>> Best,
>> Erick
>>
>> On Fri, Jan 15, 2016 at 5:40 AM, Binoy Dalal <binoydala...@gmail.com>
>> wrote:
>> > You should try debugging such queries to see how exactly they're being
>> > executed.
>> > That will give you an idea as to why you're seeing the results you see.
>> >
>> > On Fri, 15 Jan 2016, 19:05 Brian Narsi <bnars...@gmail.com> wrote:
>> >
>> >> We have an index of 25 fields. Currently the number of records in the
>> >> index is about 120,000. We are using
>> >>
>> >> parser: edismax
>> >>
>> >> qf: contains 8 fields
>> >>
>> >> fq: 1 field
>> >>
>> >> mm = 1
>> >>
>> >> qs = 6
>> >>
>> >> pf: containing 3 fields
>> >>
>> >> bf: containing 1 field
>> >>
>> >> We have noticed that sometimes results change between two searches even
>> >> if everything is constant.
>> >>
>> >> What we have identified is that if we reindex the data and optimize, it
>> >> remedies the situation.
>> >>
>> >> Is that expected behavior? Or should we also look into other factors?
>> >>
>> >> Thanks
>> >>
>> > --
>> > Regards,
>> > Binoy Dalal
>>
>
>
