On 2-Apr-08, at 11:29 AM, Vinci wrote:
Hi,
I am trying to update the index with a two-stage post: part of each
entry is posted in stage 1 by 1.xml, then after a while the rest of
the entry is posted by 2.xml. Assume both 1.xml and 2.xml contain 3
documents and id is used as the unique field. What I see in the admin
panel confuses me:
numDocs : 3
maxDoc : 6
Which number is the count of documents that exist in the system? Is
maxDoc just a statistic that is not involved in any calculation?
If maxDoc is the true number of documents in the system, is the
optimization tool the only way to compress the index?
When you add a document that has the same unique id as a document
currently in the index, the previous document is marked as "deleted"
and the new one is added. This results in 6 documents physically on
disk (BUT when searching you will never see the deleted docs), so
numDocs (3) counts the live documents and maxDoc (6) also counts the
deleted ones.
Deleted documents are purged during segment merging, which will occur
for the whole index during optimization and will happen naturally as
you add more documents to the system without optimization. Normally
it isn't something to worry about.
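
To make the two counters concrete, here is a rough toy model in Python. It is only a sketch of the behaviour described above, not Solr or Lucene code: the ToyIndex class, its add and optimize methods, and the ids "a", "b", "c" are all made up for illustration, and real segment merging is far more involved than a single list filter.

```python
class ToyIndex:
    """Toy model of an append-only index with delete markers (hypothetical, not Solr)."""

    def __init__(self):
        # Each slot is [doc_id, fields, deleted_flag].
        self.slots = []

    def add(self, doc_id, fields):
        # Mark any live document with the same unique id as deleted...
        for slot in self.slots:
            if slot[0] == doc_id and not slot[2]:
                slot[2] = True
        # ...then append the new version as a separate slot.
        self.slots.append([doc_id, fields, False])

    @property
    def num_docs(self):
        # Live (searchable) documents only.
        return sum(1 for slot in self.slots if not slot[2])

    @property
    def max_doc(self):
        # Every slot still held, deleted or not.
        return len(self.slots)

    def optimize(self):
        # Stand-in for a full merge: drop the slots marked as deleted.
        self.slots = [slot for slot in self.slots if not slot[2]]


index = ToyIndex()
for doc_id in ("a", "b", "c"):      # stage 1, like posting 1.xml
    index.add(doc_id, {"title": "stage 1"})
for doc_id in ("a", "b", "c"):      # stage 2, like posting 2.xml
    index.add(doc_id, {"body": "stage 2"})

print(index.num_docs, index.max_doc)  # 3 6
index.optimize()
print(index.num_docs, index.max_doc)  # 3 3
```

Note that in this model, as in the explanation above, the stage 2 post replaces each stage 1 document rather than adding fields to it, so each surviving document holds only what was in the second post.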
-Mike