subject:"Solr relevancy score different on replicated nodes"

Re: Solr relevancy score different on replicated nodes

2019-02-12 Thread Aman Tandon

Thanks Erick for your suggestions and time.

On Tue, Feb 12, 2019, 22:32 Erick Erickson wrote:
> You really only have four
> 1> use exactstats. This won't guarantee precise matches, but they'll be closer
> 2> optimize (not particularly recommended, but if you're willing to do it periodically it'll have

Re: Solr relevancy score different on replicated nodes

2019-02-12 Thread Erick Erickson

You really only have four
1> use exactstats. This won't guarantee precise matches, but they'll be closer
2> optimize (not particularly recommended, but if you're willing to do it periodically it'll have the stats match until the next updates).
3> use TLOG/PULL replicas and confine the requests to t

Re: Solr relevancy score different on replicated nodes

2019-02-12 Thread Aman Tandon

Hi Erick, Any suggestions on this? Regards, Aman

On Fri, Feb 8, 2019, 17:07 Aman Tandon wrote:
> Hi Erick,
>
> I find this thread very relevant to the people who are facing the same problem.
>
> In our case, we have a signals aggregation collection with a total of around 8 million records.

Re: Solr relevancy score different on replicated nodes

2019-02-08 Thread Aman Tandon

Hi Erick, I find this thread very relevant to the people who are facing the same problem. In our case, we have a signals aggregation collection with a total of around 8 million records. We have a SolrCloud architecture (3 shards and 4 replicas) and the whole size of the index is around 2.5

Re: Solr relevancy score different on replicated nodes

2019-02-07 Thread Erick Erickson

Optimization is safe. The large segment is irrelevant, you'll lose a little parallelization, but on an index with this few documents I doubt you'll notice. As of Solr 5, optimize will respect the max segment size which defaults to 5G, but you're well under that limit. Best, Erick On Sun, Feb 3,

Re: Solr relevancy score different on replicated nodes

2019-02-03 Thread Ashish Bisht

Thanks Erick and everyone. We are checking on the stats cache. I noticed stats skew again and optimized the index to correct it. As per the documents https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ and https://lucidworks.com/2018/06/20/solr-and-optimizing-y

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Walter Underwood

Is this a sharded Solr Cloud collection? If so, you can try using global IDF. That should make the scores more similar on different nodes. https://lucene.apache.org/solr/guide/6_6/distributed-requests.html#DistributedRequests-ConfiguringstatsCache_DistributedIDF_ wunder Walter Underwood wun...@wu

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread David Hastings

Maybe instead of using the Solr score in your metrics, find a way to use the document's position in the results? You can never trust the score to be consistent; it's constantly changing as the index changes.

On Tue, Jan 29, 2019 at 1:29 PM Ashish Bisht wrote:
> Hi Erick,
>
> Our business wanted
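One way to read the suggestion above is to derive the relevancy component from a document's position in the result list rather than from its raw score. The following is a minimal sketch under that assumption; the function name `rank_component`, the linear decay, and the default weight of 80 are illustrative choices, not something stated in the thread.

```python
def rank_component(position: int, total: int, weight: float = 80.0) -> float:
    """Map a 0-based result position to a relevancy share of `weight`.

    Assumption: a simple linear decay, so the first hit gets the full
    weight and the last hit gets weight / total.
    """
    if total <= 0 or not 0 <= position < total:
        raise ValueError("position must satisfy 0 <= position < total")
    return weight * (total - position) / total


# The top hit of ten results gets the full weight, the last hit a tenth of it.
first = rank_component(0, 10)
last = rank_component(9, 10)
```

Because the value depends only on ordering, two replicas that return the same order yield the same component even when their raw scores differ slightly.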

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Ashish Bisht

Hi Erick, Our business wanted the score not to be based entirely on the default relevancy algorithm, but on a mix of Solr relevancy + user metrics (80% + 20%). Each result doc's score is calculated against the max score as a fraction of 80. The remaining 20 comes from user metrics. Finally the sort happens on the new score. But say we g
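The blending described above can be sketched roughly as follows. This is only an illustration: the field names `score` and `user_metric`, the assumption that the user metric is already scaled to the range 0..1, and the function name `blend` are all hypothetical and not taken from the thread.

```python
def blend(results, relevancy_weight=80.0, metric_weight=20.0):
    """Combine Solr relevancy and a user metric into one sortable score.

    Each doc's Solr score is normalized against the max score of the
    result set and scaled to `relevancy_weight`; the user metric
    (assumed to be in 0..1) is scaled to `metric_weight`.
    """
    max_score = max((r["score"] for r in results), default=0.0)
    blended = []
    for r in results:
        relevancy = relevancy_weight * r["score"] / max_score if max_score > 0 else 0.0
        metric = metric_weight * r["user_metric"]
        blended.append({**r, "final": relevancy + metric})
    # Final ordering is by the blended score, highest first.
    return sorted(blended, key=lambda r: r["final"], reverse=True)


docs = [
    {"id": "a", "score": 4.0, "user_metric": 0.0},
    {"id": "b", "score": 3.5, "user_metric": 1.0},
]
ranked = blend(docs)
```

Note that the relevancy part depends on the max score of whichever replica answered, so any skew in that max score shifts every blended value.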

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Erick Erickson

No, this is not a bug but a consequence of the design. ExactStats can help, but there is no guarantee that different replicas will compute the exact same score. Scores should be very close however. You haven't explained why you need the scores to match. 99% of the time, worrying about scores at th

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Ashish Bisht

Hi Erick, To test this scenario I added the replica again and for a few days have been monitoring metrics like Num Docs, Max Doc, Deleted Docs from the *Overview* section of the core. Checked the *Segments Info* section too. Everything looks in sync. http://:8983/solr/#/MyTestCollection_*shard1_replica_n7*/ http://

Re: Solr relevancy score different on replicated nodes

2019-01-11 Thread Erick Erickson

What Elizabeth said. Really, this is an intractable problem. Even in the TLOG and PULL replica case, an index getting updates will still fire their replication requests at different wall-clock times. Even if that were coordinated, the vagaries of networks etc. would _still_ mean the various replica

Re: Solr relevancy score different on replicated nodes

2019-01-11 Thread Elizabeth Haubert

Hello, To a certain extent, I agree with Erick that this isn't a problem, but looks like one. The nature of TF*IDF is such that you will see different scores for the same query over time on the same replica, or different replicas for the same query with most replication schemes. This is mildly an

Re: Solr relevancy score different on replicated nodes

2019-01-11 Thread Ashish Bisht

Hi Erick, Your statement "*At best, I've seen UIs where they display, say, 1 to 5 stars that are just showing the percentile that the particular doc had _relative to the max score*" is something we are trying to achieve, but we are dealing in percentages rather than stars (ratings). Change in MaxScore p

Re: Solr relevancy score different on replicated nodes

2019-01-08 Thread Erick Erickson

bq. Shouldn't both replica and leader come to same state after this much long period. No. After that long, the docs will be the same, all the docs present on one replica will be present and searchable on the other. However, they will be in different segments so the "stats skew" will remain. But d

Re: Solr relevancy score different on replicated nodes

2019-01-08 Thread Ashish Bisht

Thank you Erick for explaining. In my scenario, I stopped indexing and updates too and waited for 1 day. Restarted Solr too. Shouldn't both replica and leader come to the same state after such a long period? As you said this gets corrected by segment merging, hope it is internal process itself and n

Re: Solr relevancy score different on replicated nodes

2019-01-07 Thread Erick Erickson

You misunderstand my point. The wall clock times _will_ be different on leader and follower. It follows that the documents contained in the individual segments on the leader and follower will _not_ be identical. This leads to _deleted_ documents being in different segments on the leader and follow
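The effect described above can be illustrated with a small calculation. This is a sketch, assuming the BM25 inverse document frequency formula used by Lucene's default similarity and assuming that deleted but not yet merged documents still count toward term statistics; the document counts below are made up for illustration.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25 idf: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical leader: 1000 documents, 50 contain the term, deletes already merged away.
leader_idf = bm25_idf(doc_freq=50, doc_count=1000)

# Hypothetical follower: same live documents, plus 200 deleted documents still
# sitting in unmerged segments, 20 of which contain the term.
follower_idf = bm25_idf(doc_freq=70, doc_count=1200)

print(f"leader idf   = {leader_idf:.4f}")
print(f"follower idf = {follower_idf:.4f}")
```

Both replicas return the same live documents, yet the idf values differ, and so do the scores built on them.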

Re: Solr relevancy score different on replicated nodes

2019-01-06 Thread Ashish Bisht

Hi Erick, Thank you for the details, but it doesn't look like a time difference in autocommit caused this issue. As I said, if I run a retrieve-all query or a keyword query on both servers, they return the correct number of docs; it's just the relevancy score that takes different values. I waited for a brief period, still disc

Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Erick Erickson

Ashish: Deleting and re-adding a replica is not a solution. Even if you did, that would then be identical only until you started indexing again, then the stats could skew a bit. When you index to NRT replicas, the wall clock times that cause the commits to trigger will be different due to network

Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Ashish Bisht

Hi Erick, I have updated that I am not facing this problem in a new collection. As per 3) I can try deleting a replica and adding it again, but the confusion is which one of the two I should delete (wondering which replica is giving the correct score for the query). Both replicas give same number of d

Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Mikhail Khludnev

Replicated segments might have different deleted documents by design. Precise numbers can be achieved via exact stats. See https://lucene.apache.org/solr/guide/6_6/distributed-requests.html#DistributedRequests-ConfiguringstatsCache_DistributedIDF_

On Fri, Jan 4, 2019 at 2:40 PM AshB wrote:
> Ve

Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Erick Erickson

See particularly point 3 here and to a lesser extent point 2. https://support.lucidworks.com/s/question/0D5803LRpijCAD/the-number-of-results-returned-is-not-constant-every-time-i-query-solr For point two (the internal Lucene doc IDs are different) you can easily correct it by adding sort=score

Solr relevancy score different on replicated nodes

2019-01-04 Thread AshB

Version: Solr 7.4.0, ZooKeeper 3.4.11. Architecture: two boxes, Machine-1 and Machine-2, each holding a single instance of Solr. We have a collection which was single shard and single replica, i.e. s=1 and rf=1. A few days back we tried to add a replica to it, but the score for the same query is coming out different from di

23 matches