It's probably a good idea to optimize. How are you re-indexing anyway? DIH? custom code? post.jar?
Manual optimizing is just issuing the appropriate curl command, see:
http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22

Best
Erick

On Fri, Nov 12, 2010 at 12:13 PM, Claudio Devecchi <[email protected]> wrote:

> Hi Tom, thanks for your explanation,
>
> Do you recommend the index continues this way? Or can I configure it to
> optimize automatically?
>
> tks
>
> On Fri, Nov 12, 2010 at 2:39 PM, Burton-West, Tom <[email protected]> wrote:
>
> > Hi Claudio,
> >
> > What's happening when you re-index the documents is that Solr/Lucene
> > implements an update as a delete plus a new index. Because of the nature of
> > inverted indexes, deleting documents requires a rewrite of the entire index.
> > In order to avoid rewriting the entire index each time one document is
> > deleted, deletes are implemented as a list of deleted internal lucene ids.
> > Documents aren't actually removed from the indexes until the index segment
> > is merged or an optimize occurs.
> >
> > maxDoc is the total number of documents indexed without taking into
> > consideration that some of them are marked as deleted
> > numDocs is the actual number of undeleted documents
> >
> > If you run an optimize the index will be rewritten, the index size will go
> > down and numDocs will equal maxDoc
> >
> > Tom Burton-West
> >
> > -----Original Message-----
> > From: Claudio Devecchi [mailto:[email protected]]
> > Sent: Friday, November 12, 2010 10:50 AM
> > To: Lista Solr
> > Subject: Doubt about index size
> >
> > Hi everybody,
> >
> > I'm doing some indexing testing on solr 1.4.1 and I'm not understanding one
> > thing, let me try to explain.
> >
> > I have 1.2 million xml files and I'm indexing them. When I do it for the
> > first time my index size is around 3 GB and in my statistics on
> > http://localhost:8983/solr/admin/stats.jsp I have two entries:
> >
> > numDocs : 1120171
> > maxDoc : 1120171
> >
> > Up to here everything is fine, but if I update the index by reindexing all
> > the same 1120171 documents I get the stats below:
> >
> > numDocs : 1120171
> > maxDoc : 2240342
> >
> > ... and my index size goes to around 6 GB.
> >
> > Why does this happen? Why does the index size change if I have the same
> > number of searchable docs?
> >
> > Does anybody know?
> >
> > Tks
>
> --
> Claudio Devecchi
> flickr.com/cdevecchi
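
For reference, here is a rough sketch of the two ideas in this thread: reading the gap between maxDoc and numDocs as the number of deleted-but-not-yet-purged documents, and sending the optimize message by hand. It is an illustration, not something taken from the Solr docs. The update URL (http://localhost:8983/solr/update) is an assumption based on the localhost:8983 address in the stats URL above, and the function names are made up for this example. The script only builds the request; the line that would actually send it is left commented out.

```python
import urllib.request

# Assumption: default single-core update handler on the same host/port
# as the stats page quoted in the thread. Adjust for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def deleted_docs(num_docs: int, max_doc: int) -> int:
    """Documents marked as deleted that still occupy space in the index."""
    if num_docs > max_doc:
        raise ValueError("numDocs cannot exceed maxDoc")
    return max_doc - num_docs


def build_optimize_request(url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the <optimize/> update message."""
    return urllib.request.Request(
        url,
        data=b"<optimize/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    # Figures from the stats quoted above, after the second indexing run.
    stale = deleted_docs(num_docs=1120171, max_doc=2240342)
    print(f"{stale} deleted docs still in the index")

    req = build_optimize_request()
    print(req.get_method(), req.full_url, req.data.decode())

    # To run the optimize against a live Solr instance, uncomment:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

With the numbers from this thread, half of the documents counted by maxDoc are stale copies, which matches the index roughly doubling in size. After a successful optimize, maxDoc should drop back to numDocs.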
