Here's Mike McCandless' blog on the topic:
https://www.elastic.co/blog/lucenes-handling-of-deleted-documents
The same options he mentions are available in Solr as both use Lucene
under the covers.
The long and short of it is that you can have a significant amount of
deleted documents in your ind
Thank you, Erick Erickson and Shawn Heisey for your excellent answers.
For some of our collections, it would seem that an occasional optimize
would be a good thing. However, we have some collections that are updated
constantly
Would using the commit expungeDeletes help mitigate the issue?
I also c
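For reference, a minimal sketch of what a commit with expungeDeletes looks like as an update request. The host, port, and collection name below are placeholders, not values from this thread; the `commit` and `expungeDeletes` request parameters are the standard Solr update handler ones. The snippet only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def expunge_deletes_url(base_url, collection):
    """Build an update URL that asks Solr to commit and expunge deletes.

    base_url and collection are hypothetical placeholders supplied by the caller.
    """
    params = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{base_url}/{collection}/update?{params}"


# Placeholder host and collection name, for illustration only.
url = expunge_deletes_url("http://localhost:8983/solr", "mycollection")
print(url)
```

Note that expungeDeletes only merges segments that contain deletions, so it is usually a lighter operation than a full optimize, but it can still be expensive on a large index.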
On 9/7/2017 8:54 AM, Webster Homer wrote:
> I am not concerned about deleted documents. I am concerned that the same
> search gives different results after each search. The top document seems to
> cycle between 3 different documents
>
> I have an enhanced collections info api call that calls the co
We have several cloud collections, but this one is updated once a day with
a partial load, and once a week with a full load, followed by a delete
which is based upon an index_date field (timestamp of the solr record).
For this and related collections optimizing once per day is probably
acceptable.
bq: So apparently it IS essential to run optimize after a data load
Don't do this if you can avoid it; you run the risk of excessive
amounts of your index consisting of deleted documents unless you are
following a process whereby you periodically (and I'm talking at least
hours, if not once per da
We have several Solr clouds; a couple of them have only 1 replica per
shard. We have never observed the problem with a single replica,
only when there are multiple replicas per shard.
On Thu, Sep 7, 2017 at 10:08 AM, Webster Homer
wrote:
> the scores are not the same
> Doc
> 305340 432.44
the scores are not the same
Doc
305340 432.44238
C2646 428.24185
12837 430.61722
One other thing. I just ran optimize and now document 305340 is
consistently the top score.
So apparently it IS essential to run optimize after a data load
Note we see this behavior fairly commonly on our sol
the scores are not the same
Doc
305340 432.44238
On Thu, Sep 7, 2017 at 10:02 AM, David Hastings <
hastings.recurs...@gmail.com> wrote:
> "I am concerned that the same
> search gives different results after each search. The top document seems to
> cycle between 3 different documents"
>
>
> if you
"I am concerned that the same
search gives different results after each search. The top document seems to
cycle between 3 different documents"
If you run a debug query on the search, are the scores for the top 3 documents
the same or not? You can easily have three documents with the same score,
so
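A rough sketch of the kind of request being suggested here: the same search with `debugQuery=true`, so that the score explanation for each hit is returned. The host, collection name, and query string are placeholders rather than the ones from this thread; `q`, `fl`, and `debugQuery` are standard Solr query parameters. The snippet only builds the URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def debug_query_url(base_url, collection, query):
    """Build a /select URL that returns id, score, and debug explanations.

    base_url, collection, and query are hypothetical placeholders.
    """
    params = urlencode({"q": query, "fl": "id,score", "debugQuery": "true"})
    return f"{base_url}/{collection}/select?{params}"


# Placeholder values, for illustration only.
url = debug_query_url("http://localhost:8983/solr", "mycollection", "example")
parsed = parse_qs(urlsplit(url).query)
print(url)
```

Running this a few times and comparing the explain output for the top hits should show whether the scores really are identical or differ slightly between requests.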
I am not concerned about deleted documents. I am concerned that the same
search gives different results after each search. The top document seems to
cycle between 3 different documents
I have an enhanced collections info api call that calls the core admin api
to get the index information for the r
Whew! I haven't been lying to people for _years_..
On Thu, Sep 7, 2017 at 5:58 AM, Yonik Seeley wrote:
> On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson
> wrote:
>> bq: and deleted documents are irrelevant to term statistics...
>>
>> Did you mean "relevant"? Or do I have to adjust my thinki
On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson wrote:
> bq: and deleted documents are irrelevant to term statistics...
>
> Did you mean "relevant"? Or do I have to adjust my thinking _again_?
One can make it work either way ;-)
Whether a document is marked as deleted or not has no effect on term
bq: and deleted documents are irrelevant to term statistics...
Did you mean "relevant"? Or do I have to adjust my thinking _again_?
Erick
On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley wrote:
> Different replicas of the same shard can have different numbers of
> deleted documents (really just mar
Different replicas of the same shard can have different numbers of
deleted documents (really just marked as deleted), and deleted
documents are irrelevant to term statistics (like the number of
documents a term appears in). Documents marked for deletion stop
contributing to corpus statistics when
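To make the effect concrete, here is a small sketch of how two replicas with the same live documents can compute different idf values when one of them still carries documents that are marked as deleted but not yet merged away. It assumes a BM25-style idf, ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), and the document counts are invented for illustration; they are not taken from this thread.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25-style inverse document frequency (assumed similarity)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def replica_idf(live_docs, live_with_term, deleted_docs=0, deleted_with_term=0):
    """idf as seen by one replica.

    Documents marked as deleted still count toward the statistics
    until the segments holding them are merged.
    """
    doc_count = live_docs + deleted_docs
    doc_freq = live_with_term + deleted_with_term
    return bm25_idf(doc_count, doc_freq)


# Invented numbers: both replicas hold the same 1000 live documents,
# 10 of which contain the term.
idf_a = replica_idf(1000, 10)                 # no deleted documents left
idf_b = replica_idf(1000, 10, 200, 40)        # 200 deleted, 40 with the term
idf_b_merged = replica_idf(1000, 10)          # replica B after merging

print(idf_a, idf_b, idf_b_merged)
```

With these numbers replica B reports a lower idf for the term than replica A, so the same query can score documents differently depending on which replica answers it. Once the deleted documents are merged out, the two replicas agree again.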
I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4
replicas (total of 4 nodes).
If I run the query multiple times, I see three different top-scoring
results.
No data load is running, and all data has been committed.
I get these three different hits with their scores:
copperiinitrat