On 12/19/2014 11:22 AM, Tang, Rebecca wrote:
> I have an index that has a field called collection_facet.
>
> There was a value 'Ness Motley Law Firm Documents' that we wanted to update
> to 'Ness Motley Law Firm'. There were 36,132 records with this value. So I
> re-indexed just the 36,132 records. After the update, I ran a facet query
> (q=*:*&facet=true&facet.field=collection_facet) to see if the value got
> updated and I saw
> Ness Motley Law Firm 36,132 -- as expected
> Ness Motley Law Firm Documents 0 -- Why is this value still here even though
> clearly there are no records with this value anymore? I thought maybe it was
> cached, so I restarted solr, but I still got the same results.
>
> "facet_fields": { "collection_facet": [
> … "Ness Motley Law Firm", 36132,
> … "Ness Motley Law Firm Documents", 0 ]

Updating a document in Solr is actually a delete of the old document followed by indexing a new version. When a document is deleted from an index, Lucene (the search API that Solr uses) does not actually remove that document from the index segment; it just records the document's ID in a file that tracks deletes. The document is still in the index, and its terms are still present, but the software can filter it out of any results when it sees that ID in the delete tracking file(s). Only a segment merge can eliminate the document and remove its terms from the inverted index.

When you facet on that field, Lucene still sees "Ness Motley Law Firm Documents" in the inverted index, because nothing has actually removed it. The upper layers of Solr's faceting code are aware that all the documents containing that term have been deleted, so they report a correct document count of zero.

To eliminate it from the results, you have two choices. One is to set facet.mincount=1 as a parameter on your query; the other is to run an optimize (also known as a forceMerge down to one segment) on the index.

Thanks,
Shawn
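
As a rough illustration of the mechanism described above, here is a toy Python model of a single index segment. It is only a sketch: `ToySegment`, its methods, and the document IDs are all hypothetical names invented for this example, and Lucene's real on-disk structures are far more involved. The model keeps a term-to-documents map plus a separate set of deleted IDs, so a delete never touches the term map; only `merged()` rebuilds it from the live documents.

```python
# Toy model only: NOT Lucene's real data structures or API.
from collections import defaultdict


class ToySegment:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc IDs
        self.deleted = set()              # doc IDs marked as deleted

    def add(self, doc_id, term):
        self.postings[term].add(doc_id)

    def delete(self, doc_id):
        # Only records the ID; the postings are left untouched.
        self.deleted.add(doc_id)

    def facet(self, mincount=0):
        # Count live (non-deleted) documents per term, similar in
        # spirit to a facet query with facet.mincount=<mincount>.
        counts = {}
        for term, docs in self.postings.items():
            live = len(docs - self.deleted)
            if live >= mincount:
                counts[term] = live
        return counts

    def merged(self):
        # Stand-in for a segment merge: copy only live documents,
        # so terms with no live documents disappear.
        new = ToySegment()
        for term, docs in self.postings.items():
            for doc_id in docs - self.deleted:
                new.add(doc_id, term)
        return new


def demo():
    old, new = "Ness Motley Law Firm Documents", "Ness Motley Law Firm"
    seg = ToySegment()
    for doc_id in range(3):
        seg.add(doc_id, old)
    # An "update" is a delete plus a newly indexed document
    # (the +100 offset is an arbitrary choice for new internal IDs).
    for doc_id in range(3):
        seg.delete(doc_id)
        seg.add(doc_id + 100, new)
    return seg


if __name__ == "__main__":
    seg = demo()
    print(seg.facet())            # old term still listed, with count 0
    print(seg.facet(mincount=1))  # old term filtered out
    print(seg.merged().facet())   # old term gone after the "merge"
```

In this model, `facet(mincount=1)` corresponds to the first choice above (hiding zero-count terms at query time), while `merged()` corresponds to the second (physically dropping the stale term).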