Re: What can get past document uniqueness?

Brian Whitman Fri, 14 Mar 2008 08:31:36 -0700

Doesn't look like it. We do rsyncing, but only as a backup for this index; these queries are hitting the live index.

Also, the results we get back are not exact duplicates, even though the ID is the same. For example, if we update a document (replace an existing document) with new information, the index sometimes ends up storing two copies: one with the old data and one with the new content. If I update again (with the same content), the duplicate goes away.

I have optimized & committed the index with no change to the pre-existing duplicates.

Where within Solr is uniqueness enforced? I'd like to at least put some debug checking in there.




On Mar 13, 2008, at 4:47 PM, Ryan McKinley wrote:

Check this thread:
http://www.nabble.com/duplicate-entries-being-returned%2C-possible-caching-issue--td15237016.html

Perhaps it is related?


Brian Whitman wrote:

On a Solr instance with

<uniqueKey>id</uniqueKey>
This is happening:
http://solr..../select?q=id:abc123&fl=id
<doc>
<str name="id">abc123</str>
</doc>
<doc>
<str name="id">abc123</str>
</doc>
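For anyone wanting to spot this programmatically, here is a minimal Python sketch that pulls the id values out of an XML select response so that repeated keys stand out. It assumes the standard XML response layout (docs wrapped in <response><result>), and the function name ids_in_response is hypothetical; it only inspects a response you already have and does not talk to Solr.

```python
import xml.etree.ElementTree as ET


def ids_in_response(xml_text: str, key_field: str = "id") -> list[str]:
    """Collect the uniqueKey value of every <doc> in a Solr XML response.

    Hypothetical helper: assumes each <doc> holds its fields as direct
    children carrying a name="..." attribute, e.g. <str name="id">.
    """
    root = ET.fromstring(xml_text)
    ids = []
    for doc in root.iter("doc"):
        for field in doc:  # direct child elements of this <doc>
            if field.get("name") == key_field:
                ids.append(field.text or "")
    return ids


# Usage with a response shaped like the one in this thread
sample = """<response><result name="response" numFound="2" start="0">
<doc><str name="id">abc123</str></doc>
<doc><str name="id">abc123</str></doc>
</result></response>"""

ids = ids_in_response(sample)
print(ids)                       # ['abc123', 'abc123']
print(len(ids) != len(set(ids)))  # True: the same key came back twice
```

Since the query is q=id:abc123, any result list longer than one already means the uniqueKey constraint was violated for that key.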
Lots of weird stuff is writing to this index: solrj code, python solr.py, curl, etc., many of them at the same time. Autocommit is at 30m.
About 4500 of the roughly 1.5 million docs in this index are doubled like this.
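To arrive at a count like that, the id values can be tallied across the whole index. A minimal sketch, assuming the ids have already been collected (for instance by paging through a q=*:* query with fl=id, which is an assumed approach rather than one taken from this thread); doubled_keys is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable


def doubled_keys(ids: Iterable[str]) -> dict[str, int]:
    """Map each uniqueKey value seen more than once to its occurrence count.

    Hypothetical helper: the input is every id value returned by the index,
    so a healthy index (one doc per key) yields an empty dict.
    """
    counts = Counter(ids)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up sample ids, not data from the thread
sample_ids = ["abc123", "def456", "abc123", "ghi789", "def456", "abc123"]
dups = doubled_keys(sample_ids)
print(dups)       # {'abc123': 3, 'def456': 2}
print(len(dups))  # 2 keys affected
```

Counting keys rather than raw documents keeps the number comparable to "N docs are doubled", even if some keys show up more than twice.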
What can get past the doc uniqueness constraint? Has anyone seen this before?
