Thanks @Toke, for pointing out these options. I'll read up on
expungeDeletes.

That makes it sound even more like having Solr filter out 0-counts is a good
idea, and that I should handle my use case outside of Solr.
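
A minimal sketch of both halves of that idea, assuming the host and core name
("wemi") from the search URL quoted below and Solr's default flat
[value, count, value, count, ...] layout for facet_fields in JSON responses.
The helper names facet_url and nonzero_counts are made up for illustration:

```python
from urllib.parse import urlencode

# Host and core name are taken from the search URL in this thread (assumption).
SOLR_SELECT = "http://localhost:8983/solr/wemi/select"


def facet_url(field, mincount=1):
    """Build a facet request that asks Solr to drop values below mincount."""
    params = [
        ("q", "*:*"),
        ("rows", 0),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", mincount),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


def nonzero_counts(flat):
    """Turn a flat [value, count, value, count, ...] list into a dict,
    dropping 0-counts on the client side."""
    return {value: count for value, count in zip(flat[::2], flat[1::2]) if count > 0}


print(facet_url("m_mediaType_s"))
print(nonzero_counts(["1", 0, "2", 5]))
```

With facet.mincount=1 the filtering happens inside Solr; nonzero_counts is only
needed if the request has to keep mincount=0 for some other reason.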

Thanks again,
Sebastian

On Fri, 2017-01-13 at 14:19 +0000, Sebastian Riemer wrote:
> the second search should have been this:
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22&indent=on&q=*:*&rows=0&start=0&wt=json
> (or in other words, give me all documents having value "1" for field
> "m_mediaType_s")
> 
> Since this search gives zero results, why is it included in the 
> facet.fields result-count list?

Qualified guess (I don't know the JSON faceting code in detail):
The list of possible facet values is extracted from the DocValues structure in 
the segment files, without respect to documents marked as deleted. At some 
point you had one or more documents with m_mediaType_s:1, which were later 
deleted.

If your index is not too large, you can verify this by optimizing down to 1 
segment, which will remove all traces of deleted documents (unless the index is 
already 1 segment).
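
As a rough sketch, the optimize request can be built like this. It assumes the
core URL http://localhost:8983/solr/wemi from the search above and uses the
optimize and maxSegments parameters of Solr's update handler; optimize_url is
a made-up helper name:

```python
from urllib.parse import urlencode


def optimize_url(core_url, max_segments=1):
    """Build an update-handler URL that asks Solr to merge down to max_segments."""
    params = {"optimize": "true", "maxSegments": max_segments}
    return core_url.rstrip("/") + "/update?" + urlencode(params)


# Core URL assumed from the search URL quoted in this thread.
print(optimize_url("http://localhost:8983/solr/wemi"))
```

Requesting the printed URL (with a browser or curl) triggers the merge. On a
large index this rewrites every segment, so it is better tried on a test copy
first.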

If you cannot live with the false terms, committing with expungeDeletes=true 
should do the trick, although it is likely to make your indexing process a lot 
heavier.
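
For illustration, such a commit could be requested as sketched here, again
assuming the core URL from the search above; expunge_deletes_url is a made-up
helper name. Depending on the merge policy settings, segments with only a
small share of deleted documents may still be left untouched:

```python
from urllib.parse import urlencode


def expunge_deletes_url(core_url):
    """Build an update-handler URL for a commit that also merges away deletes."""
    params = {"commit": "true", "expungeDeletes": "true"}
    return core_url.rstrip("/") + "/update?" + urlencode(params)


# Core URL assumed from the search URL quoted in this thread.
print(expunge_deletes_url("http://localhost:8983/solr/wemi"))
```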

The reason for this inaccuracy is that it is quite heavy to verify whether a 
docvalue is referenced by a document: Each time one or more documents in a 
segment are deleted, all references from all documents in that segment would 
have to be checked to create a correct mapping.
As this only affects mincount=0 combined with your use case where _all_
documents with a certain docvalue are deleted, my guess is that it is seen as
too much of an edge case to handle.
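
The effect can be mimicked with a toy model. This is only a sketch of the
guess above, not Lucene's actual data structures: ToySegment and its methods
are invented for illustration. The value dictionary is written once per
segment, a delete only flips a liveness flag, and facet candidates come from
the dictionary:

```python
class ToySegment:
    """Toy stand-in for an index segment (not Lucene's real structures)."""

    def __init__(self, docs):
        # docs: one field value per document
        self.values = sorted(set(docs))                    # value dictionary, written once
        self.ords = [self.values.index(v) for v in docs]   # document -> ordinal
        self.live = [True] * len(docs)                     # deletes only flip flags here

    def delete(self, doc_id):
        self.live[doc_id] = False

    def facet(self, mincount=0):
        # Candidate values come from the dictionary, deleted or not.
        counts = {v: 0 for v in self.values}
        for doc_id, ord_ in enumerate(self.ords):
            if self.live[doc_id]:
                counts[self.values[ord_]] += 1
        return {v: c for v, c in counts.items() if c >= mincount}

    def merged(self):
        # Rewriting the segment keeps live documents only.
        kept = [self.values[o] for o, alive in zip(self.ords, self.live) if alive]
        return ToySegment(kept)


seg = ToySegment(["1", "2", "2"])
seg.delete(0)                  # the only document with value "1"
print(seg.facet())             # {'1': 0, '2': 2}
print(seg.facet(mincount=1))   # {'2': 2}
print(seg.merged().facet())    # {'2': 2}
```

In this model, keeping the dictionary exact would mean rescanning every
document of the segment on each delete, which is the cost described above.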
--
Toke Eskildsen, Royal Danish Library
