Thanks Erick,
We've never merged indexes. We don't use the MapReduceIndexerTool, but we do use an external MapReduce process to reindex. To reindex from an empty state, we run a MapReduce job on a separate HBase cluster that indexes into this shard. During this job the mappers concurrently make HTTP update requests to the shard, but only 1 mapper should post a document for any given unique "id".
Reindexing from scratch is done roughly every 3 months. Between reindexes, a worker external to Solr reads from an event stream and posts HTTP updates to the Solr cluster. To my knowledge the <uniqueKey> has never been updated, and if it has, it definitely wasn't updated in the 3 months since the last reindexing. Since the last reindexing, nothing in solrconfig.xml or managed-schema has been changed either, nor has the index been manipulated outside of the Solr framework.

On Tue, May 14, 2019 at 5:24 PM Erick Erickson <erickerick...@gmail.com> wrote:

> This is indeed strange. First of all, forget about explanations that
> involve the transaction log etc. When Lucene opens a searcher, it is only
> for closed segments, the tlog has nothing to do with that.
>
> Have you ever merged indexes? The MapReduceIndexerTool, if you ever used
> it, does not de-duplicate. Ditto if you ever changed the <uniqueKey>. The
> fact that you say that this clears up when you re-index the document leads
> me to wonder whether you have manipulated the index outside the normal Solr
> framework.
>
> IOW, I’ve never seen this before, so I suspect there’s something you did
> in your setup that seemed innocent at the time that led to this
> (temporary) situation.
>
> Best,
> Erick
>
> > On May 14, 2019, at 5:43 PM, Adam Walz <a...@adamwalz.net> wrote:
> >
> > In my solr schema I have set a uniqueKey of "id" where the id field is a
> > solr.StrField. When querying with this field as a filter I would expect to
> > always get 1 or 0 documents as a result. However I am getting back multiple
> > documents with the same "id" field, but different internal `docid`s. This
> > problem is intermittent and seems to resolve itself when the document is
> > updated. This is happening on solr 7.0.1 without SolrCloud and while only
> > querying a single shard without routing.
> >
> > Any thoughts on what could be causing this behavior?
> > This is a very large single shard with 300 million documents and an
> > index size of 750GB. I know that is not recommended for a single shard,
> > but could it explain these duplicate results, possibly because of the time
> > it takes to commit, merge, or something with tlogs?
> >
> > -- Query --
> > http://solr:8983/solr/filesearch/select?fl=id,[docid],score&fq=id:file_382506116&q=*:*
> > <http://solr1128.ve.box.net:8985/solr/filesearch/select?fl=id,[docid],score&fq=id:file_413041895994&q=*:*>
> >
> > -- Response --
> >
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":0,
> >     "params":{
> >       "mm":" 1<-0% ",
> >       "q.alt":"*:*",
> >       "ps":"100",
> >       "echoParams":"all",
> >       "fl":"id,[docid],score",
> >       "fq":"id:file_413041895994",
> >       "sort":"score desc",
> >       "rows":"35",
> >       "version":"2.2",
> >       "q":"*:*",
> >       "tie":"0.01",
> >       "defType":"edismax",
> >       "qf":"id name_combined^10 name_zh-cn^10 name_shingle name_shingle_zh-cn name_token^60 description file_content_en file_content_fr file_content_de file_content_it file_content_es file_content_zh-cn user_name user_email comments tags",
> >       "pf":"description name_shingle^100 name_shingle_zh-cn^100 comments tags",
> >       "wt":"json",
> >       "debugQuery":"off"}},
> >   "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
> >     {
> >       "id":"file_382506116",
> >       "[docid]":346266675,
> >       "score":1.0},
> >     {
> >       "id":"file_382506116",
> >       "[docid]":170442733,
> >       "score":1.0}]
> >   }}
> >
> > -- Schema snippet --
> > <fields>
> >   <field name="id" type="string" indexed="true" stored="true" required="true"/>
> > </fields>
> > <uniqueKey>id</uniqueKey>
> >
> > --
> > Adam Walz
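
For anyone trying to spot the same symptom, here is a minimal sketch (not from the thread) of one way to flag duplicate ids in a select response shaped like the one quoted above. It assumes the response was requested with wt=json and fl=id,[docid], and that it has already been parsed into a Python dict. The function name find_duplicate_ids and the sample dict are hypothetical; the sample only mirrors the two documents shown in the quoted response.

```python
from collections import defaultdict


def find_duplicate_ids(solr_response):
    """Return {id: [docid, ...]} for every id found on more than one document.

    Assumes `solr_response` is a parsed Solr JSON select response whose
    documents carry "id" and "[docid]" (i.e. fl=id,[docid] was requested).
    """
    docids_by_id = defaultdict(list)
    for doc in solr_response["response"]["docs"]:
        docids_by_id[doc["id"]].append(doc.get("[docid]"))
    # Keep only ids that map to more than one internal Lucene docid.
    return {doc_id: docids for doc_id, docids in docids_by_id.items()
            if len(docids) > 1}


# Hypothetical sample mirroring the two documents in the quoted response.
sample = {
    "response": {
        "numFound": 2,
        "start": 0,
        "maxScore": 1.0,
        "docs": [
            {"id": "file_382506116", "[docid]": 346266675, "score": 1.0},
            {"id": "file_382506116", "[docid]": 170442733, "score": 1.0},
        ],
    }
}

duplicates = find_duplicate_ids(sample)
print(duplicates)  # {'file_382506116': [346266675, 170442733]}
```

This only inspects the documents in a single response page, so it would have to be run over the relevant fq=id:... queries (or over paged results) to be useful against a large index.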