The other thing I would be curious about: in your reindexing process, do
you clear out the entire index beforehand? If so, perhaps there is content
missing/moved.

On Thu, Feb 14, 2019 at 11:07 AM Erick Erickson <erickerick...@gmail.com>
wrote:

> Basically, this is not possible ;). Therefore there's something I
> don't understand....
>
> There's nothing anywhere except what's in the index. By that I mean that
> _if_ you copy an index (the data directory and children) from one place
> to another, that's all there is. No information about what's in the index
> is stored anywhere else. So there are a couple of possibilities I see:
>
> 1> Your rsync isn't doing what you think. By that I mean that "somehow"
> it isn't copying segments (perhaps with the same name, although the size
> and time checks would make it extremely unlikely to skip one). What
> happens if you _delete_ the data index on your target system first?
>
> 2> I'm not entirely sure what happens if there are multiple "segments_n"
> files in the index. That file "points" to all the current segments. From
> a strictly theoretical standpoint, my _guess_ is that Lucene chooses the
> one with the highest "_n" value. So if you have multiple ones of those,
> it would be interesting to know.
>
> 3> Has Solr been restarted (or at least the core reloaded) on the target?
>
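For point 2, here is a minimal sketch of how one could check whether an index directory holds more than one "segments_N" file. The function names and the idea of running it against the core's data/index directory are my own assumptions, not something from Erick's message. It also assumes the suffix after "segments_" is a base-36 generation number, which is how I understand Lucene names these files.

```python
import re
from pathlib import Path

# Assumption: commit files are named "segments_<generation>" and the
# generation suffix is written in base 36 (digits 0-9, then a-z).
SEGMENTS_RE = re.compile(r"^segments_([0-9a-z]+)$")


def list_commit_files(index_dir):
    """Return (generation, file name) pairs for each segments_N file, lowest first."""
    found = []
    for entry in Path(index_dir).iterdir():
        match = SEGMENTS_RE.match(entry.name)
        if match and entry.is_file():
            found.append((int(match.group(1), 36), entry.name))
    return sorted(found)


def newest_commit(index_dir):
    """Return the segments_N file name with the highest generation, or None."""
    commits = list_commit_files(index_dir)
    return commits[-1][1] if commits else None
```

If list_commit_files() returns more than one entry for the target's index directory after the rsync, that would be worth reporting back to the list.
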
> So here's the experiment I'd run:
> 1> shut down the Solr running on the target
> 2> delete the data dir.
> 3> restart Solr and verify that you have zero docs. This will recreate
> the data dir and verify that that Solr instance is pointing where you
> think it is as a sanity check.
> 4> stop Solr again on the target.
> 5> do a hard commit on the source.
> 6> get a long listing "ls -l" of your source index. This should be a
> lot of files like _0.tim, _0.fdt...., _1.tim, _1.fdt.... etc.
> 7> do your rsync. You should _not_ be indexing to the source at this time.
> 8> start Solr on the target.
> 9> check the target again. Assuming that you have _not_ been adding
> any documents to the source system during the rsync, I'd be stunned if
> there were any differences.
> 10> If there are incorrect counts or other anomalies:
> 10.1> double-check your rsync. Is it really getting the files from your
> source?
> 10.2> compare the long listing from your index you took in <6> with
> the target. Are all files identical size-wise? Are there any files on
> the target that are not on the source and vice-versa? If there are
> differences, that would explain your issues and would point to your
> rsync process being messed up.
>
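The comparison in step 10.2 can be sketched roughly as follows. This is only an illustration: listing() and compare_listings() are hypothetical helper names, and since source and target live on different machines you would run listing() on each host and bring the two results together (or feed in parsed "ls -l" output instead).

```python
from pathlib import Path


def listing(index_dir):
    """Map file name -> size in bytes for the regular files directly in index_dir."""
    return {p.name: p.stat().st_size for p in Path(index_dir).iterdir() if p.is_file()}


def compare_listings(source, target):
    """Compare two name -> size maps.

    Returns (only_in_source, only_in_target, size_mismatches); the first two
    are sorted lists of names, the third a sorted list of
    (name, source_size, target_size) tuples.
    """
    only_in_source = sorted(set(source) - set(target))
    only_in_target = sorted(set(target) - set(source))
    size_mismatches = sorted(
        (name, source[name], target[name])
        for name in set(source) & set(target)
        if source[name] != target[name]
    )
    return only_in_source, only_in_target, size_mismatches
```

Three empty lists would mean the two directories agree on file names and sizes. Anything else points back at the rsync.
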
> If the index directories are identical on the source and target and
> you _still_ see differences then there's an alternate reality that we
> occupy ;).
>
> And the Alfresco folks would probably be the ones to contact.
>
> Best,
> Erick
>
>
>
> On Wed, Feb 13, 2019 at 11:28 PM Mathieu Menard
> <mathieu.men...@realdolmen.com> wrote:
> >
> > Hello Andrea,
> >
> > I'm really sorry for the delay in answering, but I needed more
> > information before replying to you.
> >
> > Yes, 5.365.213 is the numDocs we got just after the sync, and yes,
> > 4.537.651 is the numDocs we got on the staging server after the
> > reindexing. The colleague who performed the rsync confirms that it
> > completed entirely.
> >
> > I don't see any incomplete transactions, which normally means that the
> > indexing is complete. That's why I don't understand the difference.
> >
> > Kind Regards
> >
> > Matthieu
> >
> > -----Original Message-----
> > From: Andrea Gazzarini [mailto:a.gazzar...@sease.io]
> > Sent: Saturday, February 9, 2019 16:56
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr Index Size after reindex
> >
> > Yes, those numbers are different and that should explain the different
> > size. I think you should be able to find some information in the
> > Alfresco or Solr log. There must be a reason for the missing content.
> > For example, are those numbers coming from two comparable snapshots? In
> > other words, I imagine that at a given moment X you rsync-ed the two
> > servers:
> >
> >   * 5.365.213 is the numDocs you got just after the sync, isn't it?
> >   * 4.537.651 is the numDocs you got in the staging server after the
> >     reindexing isn't it? Are you sure the whole reindexing is completed?
> >
> > MaxDocs is the number of documents you have in the index, including the
> > deleted docs not yet cleared by a merge. In the console you should also
> > see the "Deleted docs" count, which should be equal to (maxdocs - numdocs).
> >
> > Ciao
> >
> > Andrea
> >
> > On 08/02/2019 15:53, Mathieu Menard wrote:
> > >
> > > Hi Andrea,
> > >
> > > I've checked this information and here is the result:
> > >
> > >            | PRODUCTION | STAGING
> > >   numDocs  | 5.365.213  | 4.537.651
> > >   MaxDoc   | 5.845.469  | 5.129.556
> > >
> > > It seems that there are more than 800.000 additional docs in
> > > PRODUCTION, which would explain the larger index size. But there is
> > > one thing I don't understand: we copied the DB and the contentstore,
> > > so the numDocs for the two environments should be the same, no?
> > >
> > > Could you also explain the meaning of the maxDocs value, please?
> > >
> > > Thanks
> > >
> > > Matthieu
> > >
> > > *From:*Andrea Gazzarini [mailto:a.gazzar...@sease.io]
> > > *Sent:* Friday, February 8, 2019 14:54
> > > *To:* solr-user@lucene.apache.org
> > > *Subject:* Re: Solr Index Size after reindex
> > >
> > > Hi Mathieu,
> > > what about the docs in the two infrastructures? Do they have the same
> > > numbers (numdocs / maxdocs)? Any meaningful message (error or not) in
> > > log files?
> > >
> > > Andrea
> > >
> > > On 08/02/2019 14:19, Mathieu Menard wrote:
> > >
> > >     Hello,
> > >
> > >     I would like to have your point of view about an observation we
> > >     have made on our two Alfresco installations (Production and
> > >     Staging environments), and more specifically on the size of our
> > >     Solr indexes on these two environments.
> > >
> > >     Regularly we do an rsync between the Production and the Staging
> > >     environment: we make a copy of Alfresco's DB and a copy of the
> > >     entire contentstore, and after that we reindex all the Alfresco
> > >     content.
> > >
> > >     We have noticed that for the production environment we have 19 Gb
> > >     of indexes while in the staging we have "only" 11. Gb of indexes.
> > >     We have some difficulty understanding this difference because we
> > >     assume that index optimization is the same for a full reindex
> > >     as for the normal use of Solr.
> > >
> > >     I've verified the configuration between the two Solr instances and
> > >     I don't see any differences. Could you help me to better understand
> > >     this phenomenon?
> > >
> > >     Here you can find some information about our two environments; if
> > >     you need more details, I will provide them as soon as possible:
> > >
> > >                       | PRODUCTION                      | STAGING
> > >     Alfresco version  | 5.1.1.4                         | 5.1.1.4
> > >     Solr Version      |                                 |
> > >     Java version      |                                 |
> > >     Linux Machine     | See Staging_caracteristics.txt  | See Staging_caracteristics.txt
> > >                       | file in attachment              | file in attachment
> > >
> > >     Please let me know if you need any other information; I will send
> > >     it to you promptly.
> > >
> > >     Kind Regards
> > >
> > >     Matthieu
> > >
>
