On 12/19/2014 11:22 AM, Tang, Rebecca wrote:
> I have an index that has a field called collection_facet.
>
> There was a value 'Ness Motley Law Firm Documents' that we wanted to update 
> to 'Ness Motley Law Firm'.  There were 36,132 records with this value.  So I 
> re-indexed just the 36,132 records.  After the update, I ran a facet query 
> (q=*:*&facet=true&facet.field=collection_facet) to see if the value got 
> updated and I saw
> Ness Motley Law Firm 36,132  -- as expected
> Ness Motley Law Firm Documents 0 — Why is this value still here even though 
> clearly there are no records with this value anymore?  I thought maybe it was 
> cached, so I restarted solr, but I still got the same results.
>
> "facet_fields": { "collection_facet": [
> … "Ness Motley Law Firm", 36132,
> … "Ness Motley Law Firm Documents", 0 ]

Updating a document in Solr is actually a delete of the old document
followed by indexing a new version.

When a document is deleted from an index, Lucene (the search API that
Solr uses) does not actually remove that document from the index
segment; it just writes an ID value to a file that tracks deletes.  That
document is still in the index, and its terms are still present, but the
software can remove it from any results when it sees that ID value in
the delete tracking file(s).  Only a segment merge can eliminate the
document and remove its terms from the inverted index.
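
As a rough illustration of that behavior, here is a toy Python model.  It
is not Lucene's actual data structure or API; the Segment class and the
merge() function are hypothetical names made up for this sketch, and it
assumes one term per document.

```python
class Segment:
    """Toy stand-in for an index segment: postings plus a set of deleted IDs."""

    def __init__(self, docs):
        # docs maps doc ID -> field value (one term per document).
        self.postings = {}
        for doc_id, term in docs.items():
            self.postings.setdefault(term, set()).add(doc_id)
        self.deleted = set()

    def delete(self, doc_id):
        # A delete only records the ID; the postings are left untouched.
        self.deleted.add(doc_id)

    def terms(self):
        return sorted(self.postings)

    def live_docs(self, term):
        return self.postings.get(term, set()) - self.deleted


def merge(segments):
    """Rewrite segments into one, copying only documents that are not deleted."""
    live = {}
    for seg in segments:
        for term, doc_ids in seg.postings.items():
            for doc_id in doc_ids - seg.deleted:
                live[doc_id] = term
    return Segment(live)


# "Update" two documents: delete the old versions, index new ones elsewhere.
old = Segment({1: "Ness Motley Law Firm Documents",
               2: "Ness Motley Law Firm Documents"})
old.delete(1)
old.delete(2)
new = Segment({3: "Ness Motley Law Firm", 4: "Ness Motley Law Firm"})

before_merge = old.terms()           # the old term is still listed
after_merge = merge([old, new]).terms()  # the old term is gone
```

In this sketch the old term survives in the first segment until merge()
rewrites the data without the deleted documents.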

When you do a facet on that field, Lucene still sees "Ness Motley Law
Firm Documents" in the inverted index, because nothing has actually
removed it. The upper layers of Solr faceting code are aware that all
the documents containing that term have been deleted, so they report a
correct document count of zero.
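
A simplified sketch of why the count comes out as zero while the term
still shows up.  This is plain Python, not Solr's faceting code; the
facet_counts() function and the small document IDs are assumptions for
illustration only.

```python
def facet_counts(postings, deleted):
    """Count live documents per term; every term in the postings is reported."""
    counts = {}
    for term, doc_ids in postings.items():
        # Deleted documents are filtered out, but the term itself remains.
        counts[term] = len(doc_ids - deleted)
    return counts


# Three old documents were deleted and re-indexed under the new value.
postings = {
    "Ness Motley Law Firm Documents": {1, 2, 3},
    "Ness Motley Law Firm": {4, 5, 6},
}
deleted = {1, 2, 3}

counts = facet_counts(postings, deleted)
```

The term list comes from the inverted index, while the counts come from
the documents that are still live, which is how a term can appear with a
count of zero.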

To eliminate it from the results, you have two choices.  One is to set
facet.mincount=1 as a parameter on your query.  The other is to run an
optimize (also known as a forceMerge down to one segment) on the index.
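
Both options can be expressed as request URLs.  The sketch below only
builds the URLs and does not contact a server.  The host, port, and core
name ("mycore") are assumptions, as are the helper function names;
substitute your own.  The optimize request assumes the standard update
handler is available at /update.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port, and core name are assumptions.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"


def facet_query_url(base_url, field, mincount=1):
    """Build the facet query from the original message, plus facet.mincount."""
    params = {
        "q": "*:*",
        "facet": "true",
        "facet.field": field,
        "facet.mincount": mincount,  # hide terms with fewer live documents
    }
    return f"{base_url}/select?{urlencode(params)}"


def optimize_url(base_url):
    """Build a request asking the update handler to optimize the index."""
    return f"{base_url}/update?{urlencode({'optimize': 'true'})}"


query_url = facet_query_url(SOLR_CORE_URL, "collection_facet")
merge_url = optimize_url(SOLR_CORE_URL)
print(query_url)
print(merge_url)
```

Setting facet.mincount=1 only changes what the query returns, so it is the
cheaper option.  An optimize rewrites the index, which can take a while on
a large index.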

Thanks,
Shawn
