Doesn't look like it. We do rsync, but only as a backup of this index; these queries are hitting the live index.

Also, the results we get back are not exact duplicates, even though the ID is the same. For example, if we update a document (replace an existing document) with new information, the index will sometimes store two copies: one with the old data and one with the new content. If I update again (with the same content), the duplicate goes away.

I have optimized and committed the index, with no change to the pre-existing duplicates.

Where within Solr is uniqueness enforced? I'd like to at least put some debug checking in there.
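
Until that is answered, a check can at least be run from outside Solr. The following is a minimal sketch, assuming the select response has the XML shape shown in the quoted message below (one `<doc>` per hit, with the key in `<str name="id">`). The function name `duplicate_ids` and the sample response (including the id `xyz789`) are made up for illustration; this does not touch Solr's own uniqueness handling, it only counts ids in a response that has already been fetched.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(response_xml, key="id"):
    """Return {id: count} for ids that occur in more than one <doc>.

    `response_xml` is the text of a Solr XML select response.
    `key` is the uniqueKey field name (assumed to be a <str> field).
    """
    root = ET.fromstring(response_xml)
    counts = Counter()
    for doc in root.iter("doc"):
        for field in doc.findall("str"):
            if field.get("name") == key:
                counts[field.text] += 1
    # Keep only ids seen more than once.
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


# Hypothetical response, modeled on the output quoted in this thread.
sample = """<response><result name="response" numFound="3" start="0">
<doc><str name="id">abc123</str></doc>
<doc><str name="id">abc123</str></doc>
<doc><str name="id">xyz789</str></doc>
</result></response>"""

print(duplicate_ids(sample))
```

Run against responses fetched with fl=id, this would show which ids are doubled at a given moment, which may help correlate the duplicates with particular writers or commit times.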



On Mar 13, 2008, at 4:47 PM, Ryan McKinley wrote:

Check this thread:
http://www.nabble.com/duplicate-entries-being-returned%2C-possible-caching-issue--td15237016.html

perhaps it is related?


Brian Whitman wrote:
On a solr instance with
<!-- field to use to determine and enforce document uniqueness. -->
<uniqueKey>id</uniqueKey>
This is happening:
http://solr..../select?q=id:abc123&fl=id
<doc>
<str name="id">abc123</str>
</doc>
<doc>
<str name="id">abc123</str>
</doc>
Lots of weird stuff is writing to this index: solrj code, python solr.py, curl, etc. -- many things at the same time. Autocommit is at 30m.
4500 of the ca. 1.5m docs in this index are doubled like this.
What can get past the doc uniqueness constraint? Has anyone seen this before?

