Yes, those numbers are different and that should explain the different
size. I think you should be able to find some information in the
Alfresco or Solr log. There must be a reason for the missing content.
For example, are those numbers coming from two comparable snapshots? In
other words, I imagine that at a given moment X you rsync-ed the two servers:
* 5.365.213 is the numDocs you got just after the sync, isn't it?
* 4.537.651 is the numDocs you got in the staging server after the
reindexing, isn't it? Are you sure the whole reindexing is completed?
MaxDocs is the number of documents you have in the index, including the
deleted docs not yet cleared by a merge. In the console you should also
see the "Deleted docs" count, which should be equal to (maxdocs - numdocs).
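As a small worked example of that relation, here is a sketch in Python that applies (maxdocs - numdocs) to the four counts reported in this thread. The function and variable names are only illustrative, and the dots in the quoted figures are read as thousands separators.

```python
# Counts as reported in this thread (dots there are thousands separators).
counts = {
    "PRODUCTION": {"numDocs": 5_365_213, "maxDoc": 5_845_469},
    "STAGING": {"numDocs": 4_537_651, "maxDoc": 5_129_556},
}


def deleted_docs(num_docs: int, max_doc: int) -> int:
    """Documents marked as deleted but not yet cleared by a merge."""
    return max_doc - num_docs


for env, c in counts.items():
    print(env, "deleted docs:", deleted_docs(c["numDocs"], c["maxDoc"]))

# Difference in live documents between the two environments.
num_docs_gap = counts["PRODUCTION"]["numDocs"] - counts["STAGING"]["numDocs"]
print("numDocs gap:", num_docs_gap)
```

With these figures the deleted-docs counts come out at 480256 for production and 591905 for staging, and the numDocs gap at 827562.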
Ciao
Andrea
On 08/02/2019 15:53, Mathieu Menard wrote:
Hi Andrea,
I’ve checked this information and here is the result:
            PRODUCTION    STAGING
numDocs     5.365.213     4.537.651
MaxDoc      5.845.469     5.129.556
It seems that there are more than 800.000 additional docs in PRODUCTION,
which would explain the larger index size. But there is one thing that
I don’t understand: we have copied the DB and the contentstore, so the
numDocs for the two environments should be the same, no?
Could you also explain the meaning of the maxDocs value, please?
Thanks
Matthieu
*From:* Andrea Gazzarini [mailto:a.gazzar...@sease.io]
*Sent:* Friday, 8 February 2019 14:54
*To:* solr-user@lucene.apache.org
*Subject:* Re: Solr Index Size after reindex
Hi Mathieu,
what about the docs in the two infrastructures? Do they have the same
numbers (numdocs / maxdocs)? Any meaningful message (error or not) in
log files?
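One way to read those counters outside the admin console is Solr's Luke request handler, which reports numDocs and maxDoc for a core. The Python sketch below assumes the handler is reachable at `<base_url>/<core>/admin/luke`. The base URL and core name in the usage comment are hypothetical, and an Alfresco-managed Solr may additionally require SSL client authentication that this sketch does not handle.

```python
import json
import urllib.request


def luke_url(base_url: str, core: str) -> str:
    """Build the Luke handler URL for one core (numTerms=0 skips per-field term stats)."""
    return f"{base_url.rstrip('/')}/{core}/admin/luke?numTerms=0&wt=json"


def index_counts(luke_response: dict) -> dict:
    """Extract the document counters from a parsed Luke JSON response."""
    index = luke_response["index"]
    num_docs, max_doc = index["numDocs"], index["maxDoc"]
    return {"numDocs": num_docs, "maxDoc": max_doc, "deletedDocs": max_doc - num_docs}


def fetch_counts(base_url: str, core: str) -> dict:
    """Query a live Solr core; assumes plain HTTP access without authentication."""
    with urllib.request.urlopen(luke_url(base_url, core)) as resp:
        return index_counts(json.load(resp))


# Hypothetical usage, one call per environment:
# print(fetch_counts("http://staging-host:8983/solr", "alfresco"))
```

Running it against both servers at roughly the same time gives two comparable snapshots of the counters.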
Andrea
On 08/02/2019 14:19, Mathieu Menard wrote:
Hello,
I would like to have your point of view about an observation we
have made on our two Alfresco installations (Production and Staging
environments), and more specifically on the size of our Solr indexes
on these two environments.
Regularly we do an rsync between the Production and the Staging
environments: we make a copy of Alfresco’s DB and a copy of the
entire contentstore, and after that we reindex all the Alfresco content.
We have noticed that for the production environment we have 19 GB
of indexes, while in staging we have “only” 11 GB of indexes.
We have some difficulty understanding this difference because we
assume that the index optimization is the same for a full
reindex as for the normal use of Solr.
I’ve verified the configuration of the two Solr instances and
I don’t see any differences. Could you help me to better understand
this phenomenon?
Here you can find some information about our two environments; if
you need more details, I will provide them as soon as possible:
                  PRODUCTION                                          STAGING
Alfresco version  5.1.1.4                                             5.1.1.4
Solr Version
Java version
Linux Machine     See Staging_caracteristics.txt file in attachment   See Staging_caracteristics.txt file in attachment
Please let me know if you need any other information; I will send it to
you promptly.
Kind Regards
Matthieu