Yes, those numbers are different, and that should explain the difference in size. You should be able to find some information in the Alfresco or Solr logs: there must be a reason for the missing content. For example, do those numbers come from two comparable snapshots? In other words, I imagine that at a given moment X you rsync-ed the two servers:

 * 5.365.213 is the numDocs you got just after the sync, isn't it?
 * 4.537.651 is the numDocs you got on the staging server after the
   reindexing, isn't it? Are you sure the whole reindexing has completed?

MaxDocs is the number of documents you have in the index, including the deleted docs not yet cleared by a merge. In the console you should also see the "Deleted docs" count, which should be equal to (maxDocs - numDocs).
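
As a quick check of that relation, here is a minimal sketch that applies (maxDocs - numDocs) to the figures from the table quoted below. The dictionary layout and the helper name `deleted_docs` are only illustrative; the numbers are the ones reported for the two environments.

```python
# Counts copied from the quoted table (dots there are thousands separators).
counts = {
    "PRODUCTION": {"numDocs": 5_365_213, "maxDoc": 5_845_469},
    "STAGING": {"numDocs": 4_537_651, "maxDoc": 5_129_556},
}


def deleted_docs(num_docs: int, max_doc: int) -> int:
    """Documents marked as deleted but not yet cleared by a merge."""
    return max_doc - num_docs


for env, c in counts.items():
    print(env, "deleted docs:", deleted_docs(c["numDocs"], c["maxDoc"]))

# Difference in live documents between the two environments.
gap = counts["PRODUCTION"]["numDocs"] - counts["STAGING"]["numDocs"]
print("numDocs gap:", gap)
```

With these figures the sketch gives 480256 deleted docs for PRODUCTION, 591905 for STAGING, and a numDocs gap of 827562.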

Ciao

Andrea

On 08/02/2019 15:53, Mathieu Menard wrote:

Hi Andrea,

I’ve checked this information and here is the result:

             PRODUCTION   STAGING
*numDocs*    5.365.213    4.537.651
*MaxDoc*     5.845.469    5.129.556

It seems that there are over 800.000 more docs in PRODUCTION, which would explain the larger index size. But there is one thing I don’t understand: since we copied the DB and the contentstore, shouldn’t numDocs be the same in the two environments?

Could you also explain the meaning of the maxDocs value, please?

Thanks

Matthieu

*From:*Andrea Gazzarini [mailto:a.gazzar...@sease.io]
*Sent:* Friday, 8 February 2019 14:54
*To:* solr-user@lucene.apache.org
*Subject:* Re: Solr Index Size after reindex

Hi Mathieu,
what about the docs in the two infrastructures? Do they have the same numbers (numdocs / maxdocs)? Any meaningful message (error or not) in log files?
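
One way to collect those numbers from both servers is Solr's Luke request handler, which reports `numDocs` and `maxDoc` per core. The following is only a sketch: the base URL, the core name `alfresco`, and the helper names are assumptions, and an Alfresco installation that protects Solr with SSL client certificates would need extra setup that is not shown here.

```python
import json
from urllib.request import urlopen


def luke_url(base_url: str, core: str) -> str:
    """Build the Luke handler URL for one core (numTerms=0 skips term stats)."""
    return f"{base_url.rstrip('/')}/{core}/admin/luke?numTerms=0&wt=json"


def index_counts(luke_response: dict) -> dict:
    """Extract numDocs / maxDoc from a parsed Luke response and derive deleted docs."""
    index = luke_response["index"]
    num_docs, max_doc = index["numDocs"], index["maxDoc"]
    return {"numDocs": num_docs, "maxDoc": max_doc, "deletedDocs": max_doc - num_docs}


def fetch_counts(base_url: str, core: str, timeout: float = 30.0) -> dict:
    """Query one Solr core over plain HTTP (assumed reachable without client certs)."""
    with urlopen(luke_url(base_url, core), timeout=timeout) as resp:
        return index_counts(json.load(resp))


# Hypothetical usage; host names and core name are placeholders:
# for host in ("http://production:8983/solr", "http://staging:8983/solr"):
#     print(host, fetch_counts(host, "alfresco"))
```

Running it against both environments at the same moment gives two comparable snapshots.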

Andrea

On 08/02/2019 14:19, Mathieu Menard wrote:

    Hello,

    I would like to have your point of view on an observation we
    have made on our two Alfresco installations (Production and Staging
    environments), and more specifically on the size of our Solr indexes
    in these two environments.

    We regularly do an rsync between the Production and the Staging
    environments: we make a copy of the Alfresco DB and a copy of the
    entire contentstore, and after that we reindex all the Alfresco content.

    We have noticed that in the production environment we have 19 GB
    of indexes, while in staging we have “only” 11 GB of indexes.
    We have some difficulty understanding this difference, because we
    assume that index optimization is the same for a full
    reindex as for normal use of Solr.

    I’ve compared the configuration of the two Solr instances and
    I don’t see any differences. Could you help me better understand
    this phenomenon?

    Here is some information about our two environments; if
    you need more details, I will provide them as soon as possible:

                        PRODUCTION                      STAGING
    Alfresco version    5.1.1.4                         5.1.1.4
    Solr Version
    Java version
    Linux Machine       See Staging_caracteristics.txt  See Staging_caracteristics.txt
                        file in attachment              file in attachment

    Please let me know if you need any other information; I will send
    it to you rapidly.

    Kind Regards

    Matthieu
