Hi Claudio,

When you re-index the documents, Solr/Lucene implements each update as a delete of the old document plus an add of the new one. Because of the nature of inverted indexes, physically removing a document would require rewriting the index. To avoid that rewrite each time one document is deleted, deletes are implemented as a list of deleted internal Lucene ids. Documents aren't actually removed from the index until the segment that contains them is merged or an optimize occurs.
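
To make the delete-plus-add behaviour concrete, here is a minimal Python sketch. It is a toy model only, not Lucene's actual data structures or API: the class ToySegmentIndex and its fields are hypothetical names made up for illustration, and max_doc / num_docs simply mirror the maxDoc / numDocs entries on the Solr stats page.

```python
class ToySegmentIndex:
    """Toy model of delete-plus-add updates (hypothetical, not Lucene code)."""

    def __init__(self):
        self.slots = []       # internal id -> (unique key, content); append-only
        self.deleted = set()  # internal ids that are only *marked* as deleted
        self.live = {}        # unique key -> internal id of the live copy

    def update(self, key, content):
        # An update marks the old copy (if any) as deleted and appends a new one.
        old_id = self.live.get(key)
        if old_id is not None:
            self.deleted.add(old_id)
        self.slots.append((key, content))
        self.live[key] = len(self.slots) - 1

    @property
    def max_doc(self):
        # Every slot ever written, including those marked as deleted.
        return len(self.slots)

    @property
    def num_docs(self):
        # Only the documents that are not marked as deleted.
        return len(self.slots) - len(self.deleted)

    def optimize(self):
        # Rewrite the index keeping only live documents, then renumber them.
        kept = [doc for i, doc in enumerate(self.slots) if i not in self.deleted]
        self.slots = kept
        self.deleted = set()
        self.live = {key: i for i, (key, _) in enumerate(kept)}


index = ToySegmentIndex()
for n in range(5):
    index.update(f"doc-{n}", "first version")
print(index.num_docs, index.max_doc)  # 5 5

for n in range(5):
    index.update(f"doc-{n}", "second version")
print(index.num_docs, index.max_doc)  # 5 10

index.optimize()
print(index.num_docs, index.max_doc)  # 5 5
```

Re-indexing the same five keys doubles max_doc while num_docs stays put, and the space held by the old copies is only reclaimed by the rewrite in optimize().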

maxDoc is the total number of documents indexed, without taking into consideration that some of them are marked as deleted.
numDocs is the actual number of undeleted documents.

If you run an optimize, the index will be rewritten, the index size will go down, and numDocs will equal maxDoc.

Tom Burton-West

-----Original Message-----
From: Claudio Devecchi [mailto:cdevec...@gmail.com]
Sent: Friday, November 12, 2010 10:50 AM
To: Lista Solr
Subject: Doubt about index size

Hi everybody,

I'm doing some indexing tests on Solr 1.4.1 and there is one thing I don't understand. Let me try to explain.

I have 1.2 million XML files and I'm indexing them. When I do it for the first time, my index size is around 3 GB, and in my statistics at http://localhost:8983/solr/admin/stats.jsp I have two entries:

numDocs : 1120171
maxDoc : 1120171

So far so good, but if I update the index by reindexing the same 1120171 documents, I get the stats below:

numDocs : 1120171
maxDoc : 2240342

... and my index size grows to around 6 GB.

Why does this happen? What is happening to the index size if I have the same number of searchable docs?

Does anybody know?

Thanks