Hi Claudio,

When you re-index documents, Solr/Lucene implements each update as a delete 
followed by an add of the new version. Because of the nature of inverted 
indexes, physically removing a document would require rewriting the index 
segment that contains it. To avoid that rewrite every time one document is 
deleted, deletes are implemented as a list of deleted internal Lucene ids. 
Documents aren't actually removed from the index until the segments that 
contain them are merged or an optimize occurs.

maxDoc is the total number of documents in the index, including those that 
are marked as deleted.
numDocs is the actual number of undeleted documents.
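
To make the bookkeeping concrete, here is a toy Python sketch of the 
delete-plus-add behaviour. It is only an illustration: the ToyIndex class and 
its method names are made up for this example and are not Lucene's API or its 
on-disk format. It just shows why re-indexing the same documents doubles 
maxDoc while numDocs stays the same.

```python
class ToyIndex:
    """Toy model of delete-marking (hypothetical; not Lucene's real structures)."""

    def __init__(self):
        self.docs = []        # (unique_key, content), position = internal id
        self.deleted = set()  # internal ids that are marked as deleted
        self.live = {}        # unique_key -> internal id of the live copy

    def update(self, key, content):
        """An update is a delete of the old copy plus an add of the new one."""
        old_id = self.live.get(key)
        if old_id is not None:
            self.deleted.add(old_id)  # only marked, not physically removed
        self.docs.append((key, content))
        self.live[key] = len(self.docs) - 1

    @property
    def max_doc(self):
        # Every document ever added, including the ones marked as deleted.
        return len(self.docs)

    @property
    def num_docs(self):
        # Only the documents that are not marked as deleted.
        return len(self.docs) - len(self.deleted)

    def optimize(self):
        """Rewrite the index, keeping only the undeleted documents."""
        kept = [doc for i, doc in enumerate(self.docs) if i not in self.deleted]
        self.docs = kept
        self.deleted = set()
        self.live = {key: i for i, (key, _) in enumerate(kept)}


index = ToyIndex()
for n in range(5):
    index.update(f"doc{n}", "first version")
print(index.num_docs, index.max_doc)   # 5 5

for n in range(5):                     # re-index the same five documents
    index.update(f"doc{n}", "second version")
print(index.num_docs, index.max_doc)   # 5 10

index.optimize()
print(index.num_docs, index.max_doc)   # 5 5
```

The middle state is the one you are seeing: the same number of searchable 
documents, twice the maxDoc, and roughly twice the space on disk.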

If you run an optimize, the index will be rewritten, the index size will go 
down, and numDocs will equal maxDoc.
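
One way to trigger an optimize is to POST an <optimize/> message to Solr's 
XML update handler. Below is a minimal Python sketch. The URL is an 
assumption based on the stats URL in your message (a default single-core 
install on localhost:8983), and the function names are made up for this 
example, so adjust both to your setup.

```python
import urllib.request

# Assumed location of the update handler; change it to match your install.
DEFAULT_UPDATE_URL = "http://localhost:8983/solr/update"


def build_optimize_request(update_url=DEFAULT_UPDATE_URL):
    """Build (but do not send) a POST request carrying an <optimize/> message."""
    return urllib.request.Request(
        update_url,
        data=b"<optimize/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send_optimize(update_url=DEFAULT_UPDATE_URL):
    """Send the optimize request and return the HTTP status code."""
    with urllib.request.urlopen(build_optimize_request(update_url)) as response:
        return response.status


# With Solr running, call send_optimize() and then reload stats.jsp
# to compare numDocs and maxDoc.
```

Keep in mind that an optimize rewrites the index, so on a 6 GB index it can 
take a while and needs extra free disk space while it runs.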

Tom Burton-West

-----Original Message-----
From: Claudio Devecchi [mailto:cdevec...@gmail.com] 
Sent: Friday, November 12, 2010 10:50 AM
To: Lista Solr
Subject: Doubt about index size

Hi everybody,

I'm doing some indexing tests on Solr 1.4.1 and there is one thing I don't
understand. Let me try to explain.

I have 1.2 million XML files and I'm indexing them. When I do it for the
first time, my index size is around 3 GB, and in my statistics on
http://localhost:8983/solr/admin/stats.jsp I have these two entries:

numDocs : 1120171
maxDoc : 1120171

Up to here everything is fine, but if I update the index by reindexing the
same 1120171 documents, I get the stats below:

numDocs : 1120171
maxDoc : 2240342

... and my index size grows to around 6 GB.

Why does this happen? Why does the index size change if I have the same
number of searchable docs?

Does anybody know?

Thanks
