This is indeed strange. First of all, forget about explanations that involve
the transaction log etc. When Lucene opens a searcher, it only sees closed
segments; the tlog has nothing to do with that.

Have you ever merged indexes? The MapReduceIndexerTool, if you ever used it,
does not de-duplicate. Ditto if you ever changed the <uniqueKey>. The fact that
this clears up when you re-index the document makes me wonder whether you
have manipulated the index outside the normal Solr framework.
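A toy Python model of the point above may help. `ToyIndex` is a hypothetical class, not Solr or Lucene code. It assumes only that a normal Solr add overwrites by uniqueKey (every existing document with that id is deleted before the new one is added), while a low-level index merge simply concatenates documents without checking keys. Under that assumption, a merge can leave two live documents with one id, and re-adding the document removes every copy. That would be consistent with the duplicate disappearing when the document is updated.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    # Hypothetical toy model: each entry is a uniqueKey value and its list
    # position stands in for the internal docid. Not how Lucene stores docs.
    docs: list[str] = field(default_factory=list)

    def add(self, key: str) -> None:
        # Normal update path (overwrite by uniqueKey): delete every doc with
        # this key, then append the new one, so adds alone never duplicate.
        self.docs = [d for d in self.docs if d != key]
        self.docs.append(key)

    def merge_from(self, other: "ToyIndex") -> None:
        # Low-level merge: documents are copied over verbatim, no key check.
        self.docs.extend(other.docs)

    def query(self, key: str) -> list[int]:
        # Like fq=id:<key>: return the "docids" (positions) of matching docs.
        return [i for i, d in enumerate(self.docs) if d == key]


if __name__ == "__main__":
    a, b = ToyIndex(), ToyIndex()
    a.add("file_382506116")
    b.add("file_382506116")
    a.merge_from(b)
    print(a.query("file_382506116"))  # [0, 1]: two hits for one uniqueKey
    a.add("file_382506116")           # re-index the document
    print(a.query("file_382506116"))  # [0]: the duplicate is gone
```

The model deliberately ignores segments, deletes-by-bitset and docid renumbering; it only illustrates where the uniqueKey check happens and where it does not.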

IOW, I’ve never seen this before, so I suspect something you did in your
setup that seemed innocent at the time led to this (temporary) situation.

Best,
Erick

> On May 14, 2019, at 5:43 PM, Adam Walz <a...@adamwalz.net> wrote:
> 
> In my Solr schema I have set a uniqueKey of "id", where the id field is a
> solr.StrField. When querying with this field as a filter, I would expect to
> always get 1 or 0 documents as a result. However, I am getting back multiple
> documents with the same "id" field but different internal `docid`s. This
> problem is intermittent and seems to resolve itself when the document is
> updated. This is happening on Solr 7.0.1 without SolrCloud, while querying
> only a single shard without routing.
> 
> Any thoughts on what could be causing this behavior? This is a very large
> single shard with 300 million documents and an index size of 750GB. I know
> that is not recommended for a single shard, but could it explain these
> duplicate results, perhaps because of the time it takes to commit or merge,
> or something to do with tlogs?
> 
> -- Query --
> http://solr:8983/solr/filesearch/select?fl=id,[docid],score&fq=id:file_382506116&q=*:*
> -- Response --
> 
> {
>  "responseHeader":{
>    "status":0,
>    "QTime":0,
>    "params":{
>      "mm":" 1<-0% ",
>      "q.alt":"*:*",
>      "ps":"100",
>      "echoParams":"all",
>      "fl":"id,[docid],score",
>      "fq":"id:file_413041895994",
>      "sort":"score desc",
>      "rows":"35",
>      "version":"2.2",
>      "q":"*:*",
>      "tie":"0.01",
>      "defType":"edismax",
>      "qf":"id name_combined^10 name_zh-cn^10 name_shingle name_shingle_zh-cn name_token^60 description file_content_en file_content_fr file_content_de file_content_it file_content_es file_content_zh-cn user_name user_email comments tags",
>      "pf":"description name_shingle^100 name_shingle_zh-cn^100 comments tags",
>      "wt":"json",
>      "debugQuery":"off"}},
>  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
>      {
>        "id":"file_382506116",
>        "[docid]":346266675,
>        "score":1.0},
>      {
>        "id":"file_382506116",
>        "[docid]":170442733,
>        "score":1.0}]
>  }}
> 
> 
> -- Schema snippet --
> <fields>
>  <field name="id" type="string" indexed="true" stored="true"
> required="true"/>
> </fields>
> <uniqueKey>id</uniqueKey>
> 
> -- 
> Adam Walz
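To check for this symptom programmatically, here is a minimal Python sketch. `build_select_url` and `duplicate_ids` are hypothetical helpers; the host `solr:8983` and core `filesearch` are taken from the query in the quoted message, and the sample response is the one from the message with its JSON repaired. The sketch only builds the URL and inspects an already-parsed response. Fetching it (for example with `urllib.request.urlopen`) is left out.

```python
import json
from collections import defaultdict
from urllib.parse import urlencode


def build_select_url(base: str, core: str, doc_id: str) -> str:
    # Same shape as the query in the message: filter on the uniqueKey and
    # return the internal Lucene docid via the [docid] transformer.
    params = {
        "q": "*:*",
        "fq": f"id:{doc_id}",
        "fl": "id,[docid],score",
        "wt": "json",
    }
    return f"{base}/solr/{core}/select?{urlencode(params)}"


def duplicate_ids(response: dict) -> dict:
    # Group returned docs by id; any id that maps to more than one internal
    # docid means the uniqueKey invariant is broken in this index.
    seen = defaultdict(list)
    for doc in response["response"]["docs"]:
        seen[doc["id"]].append(doc["[docid]"])
    return {k: v for k, v in seen.items() if len(v) > 1}


sample = json.loads("""{"response": {"numFound": 2, "docs": [
  {"id": "file_382506116", "[docid]": 346266675, "score": 1.0},
  {"id": "file_382506116", "[docid]": 170442733, "score": 1.0}]}}""")

if __name__ == "__main__":
    print(build_select_url("http://solr:8983", "filesearch", "file_382506116"))
    print(duplicate_ids(sample))
    # {'file_382506116': [346266675, 170442733]}
```

Note that `duplicate_ids` only sees the rows actually returned, so `rows` must be at least `numFound` for the check to be complete.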
