Thanks Erick for your suggestions and time.
On Tue, Feb 12, 2019, 22:32 Erick Erickson wrote:
> You really only have four
> 1> use exactstats. This won't guarantee precise matches, but they'll be
> closer
> 2> optimize (not particularly recommended, but if you're willing to do
> it periodically it'll have
You really only have four options:
1> use exactstats. This won't guarantee precise matches, but they'll be closer
2> optimize (not particularly recommended, but if you're willing to do
it periodically it'll have the stats match until the next updates).
3> use TLOG/PULL replicas and confine the requests to t
Hi Erick,
Any suggestions on this?
Regards,
Aman
On Fri, Feb 8, 2019, 17:07 Aman Tandon wrote:
> Hi Erick,
>
> I find this thread very relevant to the people who are facing the same
> problem.
>
> In our case, we have a signals aggregation collection which is having
> total of around 8 million records.
Hi Erick,
I find this thread very relevant to the people who are facing the same
problem.
In our case, we have a signals aggregation collection with a total
of around 8 million records. We have a SolrCloud architecture (3 shards and 4
replicas), and the whole size of the index is around 2.5
Optimization is safe. The large segment is irrelevant, you'll
lose a little parallelization, but on an index with this few
documents I doubt you'll notice.
As of Solr 7.5, optimize will respect the max segment size,
which defaults to 5G, but you're well under that limit.
Best,
Erick
On Sun, Feb 3,
Thanks Erick and everyone. We are checking on the stats cache.
I noticed stats skew again and optimized the index to correct it. As
per the documents.
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
and
https://lucidworks.com/2018/06/20/solr-and-optimizing-y
Is this a sharded Solr Cloud collection? If so, you can try using global IDF.
That should make the scores more similar on different nodes.
https://lucene.apache.org/solr/guide/6_6/distributed-requests.html#DistributedRequests-ConfiguringstatsCache_DistributedIDF_
wunder
Walter Underwood
wun...@wu
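A rough illustration of what global IDF changes, offered as a sketch only. With Solr's default LocalStatsCache each shard scores with its own term statistics, while the exact stats cache implementations aggregate the statistics across shards first. The formula is the BM25 IDF used by Lucene; the shard numbers and helper names are made up for illustration, and this is not Solr's actual code path.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25 inverse document frequency, as in Lucene's BM25Similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def global_idf(shard_stats):
    """IDF computed from statistics summed over all shards (simplified)."""
    total_doc_freq = sum(s["doc_freq"] for s in shard_stats)
    total_doc_count = sum(s["doc_count"] for s in shard_stats)
    return bm25_idf(total_doc_freq, total_doc_count)


# Hypothetical per-shard statistics for a single query term.
shards = [
    {"doc_freq": 5, "doc_count": 1000},
    {"doc_freq": 200, "doc_count": 1000},
    {"doc_freq": 40, "doc_count": 1000},
]

# Local stats: every shard weighs the term differently.
local_idfs = [bm25_idf(s["doc_freq"], s["doc_count"]) for s in shards]
# Global stats: one shared weight for the term on every shard.
shared_idf = global_idf(shards)

for i, idf in enumerate(local_idfs, start=1):
    print(f"shard{i} local IDF: {idf:.3f}")
print(f"global IDF shared by all shards: {shared_idf:.3f}")
```

With local statistics the same term gets a different weight on each shard; with aggregated statistics every shard uses the same weight, which is why scores become more comparable.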
Maybe instead of using the Solr score in your metrics, find a way to use
the document's position in the results? You can never trust the score to
be consistent; it's constantly changing as the index changes.
On Tue, Jan 29, 2019 at 1:29 PM Ashish Bisht
wrote:
> Hi Erick,
>
> Our business wanted
Hi Erick,
Our business wanted the score not to be based entirely on the default relevancy
algorithm, but on a mix of Solr relevancy and user metrics (80% + 20%).
Each result doc's score is calculated against the max score as a fraction of
80. The remaining 20 comes from user metrics.
Finally, the sort happens on the new score.
But say we g
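The 80/20 blend described above could look roughly like the following. This is a minimal sketch, not the poster's implementation: the function names, the field names, and the assumption that the user metric is already normalized to the range 0 to 1 are all hypothetical.

```python
def blended_score(solr_score, max_score, user_metric,
                  relevancy_weight=80.0, metric_weight=20.0):
    """Blend relevancy (as a fraction of max_score) with a user metric.

    Assumes user_metric is already normalized to the range [0, 1].
    """
    if max_score <= 0:
        # No usable relevancy signal; fall back to the user metric alone.
        return metric_weight * user_metric
    return relevancy_weight * (solr_score / max_score) + metric_weight * user_metric


def rerank(docs, max_score):
    """Return copies of docs with a 'blended' field, sorted by it, best first."""
    scored = [
        dict(doc, blended=blended_score(doc["score"], max_score, doc["user_metric"]))
        for doc in docs
    ]
    return sorted(scored, key=lambda d: d["blended"], reverse=True)


# Hypothetical result documents.
docs = [
    {"id": "A", "score": 12.0, "user_metric": 0.1},
    {"id": "B", "score": 9.0, "user_metric": 0.9},
]
ranked = rerank(docs, max_score=12.0)
for doc in ranked:
    print(doc["id"], doc["blended"])
```

Because max_score sits in the denominator, any difference in maxScore between replicas shifts every blended value, which is how the skew discussed in this thread becomes visible in the final numbers.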
No, this is not a bug but a consequence of the design. ExactStats can help,
but there is no guarantee that different replicas will compute the exact same
score. Scores should be very close however.
You haven't explained why you need the scores to match. 99% of the time,
worrying about scores at th
Hi Erick,
To test this scenario I added the replica again, and for a few days I have been
monitoring metrics like Num Docs, Max Doc, and Deleted Docs from the *Overview*
section of the core. I checked the *Segments Info* section too. Everything looks in
sync.
http://:8983/solr/#/MyTestCollection_*shard1_replica_n7*/
http://
What Elizabeth said.
Really, this is an intractable problem. Even in the TLOG
and PULL replica case, the replicas of an index getting updates
will still fire their replication requests at different wall-clock
times. Even if that were coordinated, the vagaries of
networks etc. would _still_ mean the various replica
Hello,
To a certain extent, I agree with Erick that this isn't a problem, but
looks like one. The nature of TF*IDF is such that you will see different
scores for the same query over time on the same replica, or different
replicas for the same query with most replication schemes. This is mildly
an
Hi Erick,
Your statement "*At best, I've seen UIs where they display, say, 1 to 5
stars that are just showing the percentile that the particular doc had
_relative_ to the max score*" is something we are trying to achieve, but we
are dealing in percentages rather than stars (ratings).
Change in MaxScore p
bq. Shouldn't both replica and leader come to same state
after this much long period.
No. After that long, the docs will be the same, all the docs
present on one replica will be present and searchable on
the other. However, they will be in different segments so the
"stats skew" will remain.
But d
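The "stats skew" described above can be illustrated with a small sketch. It relies on the fact that deleted documents keep contributing to Lucene's term statistics until the segments holding them are merged. The numbers and the helper names are hypothetical, and the formula is the BM25 IDF only, not the full score.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25 inverse document frequency, as in Lucene's BM25Similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def replica_idf(live_docs, live_matches, deleted_docs, deleted_matches):
    """IDF as seen by a replica whose segments still hold deleted docs."""
    return bm25_idf(live_matches + deleted_matches, live_docs + deleted_docs)


# Both hypothetical replicas hold the same 1000 live docs, 50 of which match
# the term. They differ only in how many deleted docs are not yet merged away.
leader_idf = replica_idf(1000, 50, deleted_docs=20, deleted_matches=2)
follower_idf = replica_idf(1000, 50, deleted_docs=300, deleted_matches=40)
# After merging away all deletes, both replicas would agree on this value.
merged_idf = replica_idf(1000, 50, deleted_docs=0, deleted_matches=0)

print(f"leader:   {leader_idf:.4f}")
print(f"follower: {follower_idf:.4f}")
print(f"merged:   {merged_idf:.4f}")
```

The searchable documents are identical in both cases; only the leftover deletes differ, and that alone is enough to move the term weight.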
Thank you Erick for explaining.
In my scenario, I stopped indexing and updates too and waited for 1 day.
I restarted Solr too. Shouldn't both replica and leader come to same state
after this much long period? As you said this gets corrected by segment
merging; I hope it is an internal process itself and n
You misunderstand my point. The wall clock times _will_ be
different on leader and follower. It follows that the
documents contained in the individual segments on
the leader and follower will _not_ be identical.
This leads to _deleted_ documents being in different
segments on the leader and follow
Hi Erick,
Thank you for the details, but it doesn't look like a time difference in
autocommit caused this issue. As I said, if I run a retrieve-all query or a keyword
query on both servers, they return the correct number of docs; it's just that the
relevancy score is taking different values.
I waited for a brief period, still disc
Ashish:
Deleting and re-adding a replica is not a solution. Even if you did,
that would then be identical only until you started indexing again,
then the stats could skew a bit.
When you index to NRT replicas, the wall clock times that cause the
commits to trigger will be different due to network
Hi Erick,
I have already reported that I am not facing this problem in a new collection.
As per 3) I can try deleting a replica and adding it again, but the
confusion is which one of the two I should delete (wondering which replica
is giving the correct score for the query).
Both replicas give same number of d
Replicated segments might have different deleted documents by design.
Precise numbers can be achieved via exact stats. See
https://lucene.apache.org/solr/guide/6_6/distributed-requests.html#DistributedRequests-ConfiguringstatsCache_DistributedIDF_
On Fri, Jan 4, 2019 at 2:40 PM AshB wrote:
> Ve
See particularly point 3 here and to a lesser extent point 2.
https://support.lucidworks.com/s/question/0D5803LRpijCAD/the-number-of-results-returned-is-not-constant-every-time-i-query-solr
For point two (the internal Lucene doc IDs are different) you can
easily correct it by adding sort=score
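A sketch of what such a request might look like, assuming the truncated suggestion refers to sorting by score with the unique key as a secondary tiebreaker, so that documents with equal scores come back in the same order from every replica. The base URL, the unique key field name `id`, and the helper name are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, unique_key="id"):
    """Build a /select URL that breaks score ties on the unique key field."""
    params = {
        "q": query,
        # The secondary sort makes the order of equal-score docs deterministic.
        "sort": f"score desc,{unique_key} asc",
        "fl": f"{unique_key},score",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host; the collection name is the one mentioned in this thread.
url = build_select_url("http://localhost:8983/solr", "MyTestCollection", "*:*")
print(url)
```

This only stabilizes the ordering of ties; it does not make the scores themselves identical across replicas.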
Version: Solr 7.4.0, ZooKeeper 3.4.11
Architecture: two boxes (Machine-1, Machine-2), each holding a single instance of Solr
We have a collection which was single shard and single replica, i.e. s=1
and rf=1.
A few days back we tried to add a replica to it. But the score for the same query is
coming different from di