This is indeed strange. First of all, forget about explanations that involve the transaction log, etc. When Lucene opens a searcher, it only sees closed segments; the tlog has nothing to do with that.
Have you ever merged indexes? The MapReduceIndexerTool, if you ever used it, does not de-duplicate. Ditto if you ever changed the <uniqueKey>. The fact that you say that this clears up when you re-index the document leads me to wonder whether you have manipulated the index outside the normal Solr framework.

IOW, I've never seen this before, so I suspect there's something you did in your setup that seemed innocent at the time that led to this (temporary) situation.

Best,
Erick

> On May 14, 2019, at 5:43 PM, Adam Walz <a...@adamwalz.net> wrote:
>
> In my solr schema I have set a uniqueKey of "id" where the id field is a
> solr.StrField. When querying with this field as a filter I would expect to
> always get 1 or 0 documents as a result. However I am getting back multiple
> documents with the same "id" field, but different internal `docid`s. This
> problem is intermittent and seems to resolve itself when the document is
> updated. This is happening on solr 7.0.1 without SolrCloud and while only
> querying a single shard without routing.
>
> Any thoughts on what could be causing this behavior? This is a very large
> single shard with 300 million documents and an index size of 750GB. I know
> that is not recommended for a single shard, but could it explain these
> duplicate results possibly because of the time it takes to commit, merge,
> or something with tlogs?
>
> -- Query --
> http://solr:8983/solr/filesearch/select?fl=id,[docid],score&fq=id:file_382506116&q=*:*
> -- Response --
>
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":0,
>     "params":{
>       "mm":" 1<-0% ",
>       "q.alt":"*:*",
>       "ps":"100",
>       "echoParams":"all",
>       "fl":"id,[docid],score",
>       "fq":"id:file_413041895994",
>       "sort":"score desc",
>       "rows":"35",
>       "version":"2.2",
>       "q":"*:*",
>       "tie":"0.01",
>       "defType":"edismax",
>       "qf":"id name_combined^10 name_zh-cn^10 name_shingle
> name_shingle_zh-cn name_token^60 description file_content_en
> file_content_fr file_content_de file_content_it file_content_es
> file_content_zh-cn user_name user_email comments tags",
>       "pf":"description name_shingle^100 name_shingle_zh-cn^100 comments tags",
>       "wt":"json",
>       "debugQuery":"off"}},
>   "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
>       {
>         "id":"file_382506116",
>         "[docid]":346266675,
>         "score":1.0},
>       {
>         "id":"file_382506116",
>         "[docid]":170442733,
>         "score":1.0}]
>   }}
>
>
> -- Schema snippet --
> <fields>
>   <field name="id" type="string" indexed="true" stored="true" required="true"/>
> </fields>
> <uniqueKey>id</uniqueKey>
>
> --
> Adam Walz
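
For anyone who wants to check for this condition programmatically, here is a minimal sketch in Python, not something taken from the thread. It assumes a JSON response shaped like the one quoted above, requested with fl=id,[docid]. The helper names build_id_query_url and find_duplicate_ids, and the localhost base URL, are hypothetical and used only for illustration.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode


def build_id_query_url(base_url, doc_id):
    """Build a select URL that filters on the uniqueKey and asks for the internal docid.

    base_url is assumed to point at the core, e.g. http://localhost:8983/solr/filesearch
    """
    params = {"q": "*:*", "fq": f"id:{doc_id}", "fl": "id,[docid]", "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


def find_duplicate_ids(response):
    """Return {id: [docid, ...]} for every id that appears more than once in the docs list."""
    docids_by_id = defaultdict(list)
    for doc in response["response"]["docs"]:
        docids_by_id[doc["id"]].append(doc["[docid]"])
    return {doc_id: docids for doc_id, docids in docids_by_id.items() if len(docids) > 1}


# Sample data copied from the two documents in the quoted response.
sample = json.loads("""
{
  "response": {"numFound": 2, "start": 0, "docs": [
    {"id": "file_382506116", "[docid]": 346266675},
    {"id": "file_382506116", "[docid]": 170442733}
  ]}
}
""")

duplicates = find_duplicate_ids(sample)
print(duplicates)  # {'file_382506116': [346266675, 170442733]}
```

Running a check like this against a list of ids suspected to be affected, before and after re-indexing, would show whether the duplicates really disappear once the document is updated.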