Doesn't look like it. We do run rsync, but only as a backup for this
index; these queries are hitting the live index.
Also, the results we get back are not exact duplicates, even though
the ID is the same. For example, if we update a document (replace an
existing document) with new information, the index will sometimes end
up with two copies: one with the old data and one with the new
content. If I update again (with the same content), the duplicate goes
away.
I have optimized and committed the index, with no change to the
pre-existing duplicates.
Where within Solr is uniqueness enforced? I'd like to at least put
some debug checking in there.
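
In the meantime, a client-side check is one way to spot the doubled IDs
from outside Solr. The following is only a rough sketch in Python, not
anything from Solr itself: the function name duplicated_ids and the
sample string are made up for illustration, and it assumes the standard
XML response format with the uniqueKey returned as a <str name="id">
field, as in the query quoted below.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicated_ids(response_xml, key_field="id"):
    """Return {id: count} for each key value that appears in more than one <doc>.

    Hypothetical helper: assumes the key field is returned as
    <str name="id">...</str> inside each <doc> element.
    """
    root = ET.fromstring(response_xml)
    counts = Counter()
    for doc in root.iter("doc"):
        for field in doc.findall("str"):
            if field.get("name") == key_field:
                counts[field.text] += 1
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


# Made-up sample shaped like the response to q=id:abc123&fl=id
sample = """<response><result name="response" numFound="2" start="0">
<doc><str name="id">abc123</str></doc>
<doc><str name="id">abc123</str></doc>
</result></response>"""

print(duplicated_ids(sample))
```

Feeding it the body of a /select response fetched with fl=id would list
which IDs are doubled, though paging through all ca. 1.5m docs this way
would be slow.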
On Mar 13, 2008, at 4:47 PM, Ryan McKinley wrote:
Check this thread:
http://www.nabble.com/duplicate-entries-being-returned%2C-possible-caching-issue--td15237016.html
perhaps it is related?
Brian Whitman wrote:
On a Solr instance with
<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey>
This is happening:
http://solr..../select?q=id:abc123&fl=id
<doc>
<str name="id">abc123</str>
</doc>
<doc>
<str name="id">abc123</str>
</doc>
Lots of weird stuff is writing to this index: SolrJ code, Python
solr.py, curl, etc., with many things writing at the same time.
Autocommit is at 30m.
4500 of the ca. 1.5m docs in this index are doubled like this.
What can get past the doc uniqueness constraint? Has anyone seen
this before?