On 12/4/2012 5:33 PM, Shawn Heisey wrote:
> I am doing a DIH full import on a very recent checkout from
> branch_4x. Something I've recently done differently is enabling
> autocommit. I am seeing that there are deleted documents in some of
> the indexes. See "Development Build Indexes" at the bottom of the
> following screenshot. When the import is complete, the numbered
> shards will contain 13 million documents.
>
> http://dl.dropbox.com/u/97770508/statuspage-deletes-import.png
>
> The MySQL database that this imports from has a unique index on the
> field that Solr is using for its UniqueKey, so it's not possible to
> have duplicates. Each import uses one SELECT statement for the entire
> 13-million-document import. What might be leading to these deleted docs?
Interesting development: the imports are now past 11 million
documents, and the number of deleted documents on every shard is now zero.
On my stats page, I calculate deleted documents by subtracting numDocs
from maxDoc. Both values come from /admin/mbeans?stats=true.
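For anyone who wants to reproduce that number, here is a minimal sketch of the calculation. It assumes the mbeans request was made with wt=json and that the response uses the Solr 4.x layout, where "solr-mbeans" is a flat list alternating category names and bean dicts, with the searcher's numDocs and maxDoc under CORE -> searcher -> stats. The function names and the sample response are hypothetical, not part of Solr.

```python
import json
from typing import Any


def deleted_docs(num_docs: int, max_doc: int) -> int:
    """Deleted documents still present in the index.

    maxDoc counts every document slot (live or deleted) and numDocs counts
    only live documents, so the difference is the deleted count.
    """
    if num_docs > max_doc:
        raise ValueError("numDocs cannot exceed maxDoc")
    return max_doc - num_docs


def searcher_stats(mbeans_json: str) -> dict[str, Any]:
    """Return the searcher stats from a JSON /admin/mbeans?stats=true response.

    ASSUMPTION: Solr 4.x JSON layout, e.g.
    {"solr-mbeans": ["CORE", {"searcher": {"stats": {...}}}, ...]}
    """
    beans = json.loads(mbeans_json)["solr-mbeans"]
    # The list alternates category name, then a dict of beans in that category.
    for category, entries in zip(beans[0::2], beans[1::2]):
        if category == "CORE" and "searcher" in entries:
            return entries["searcher"]["stats"]
    raise KeyError("no CORE/searcher bean in response")


if __name__ == "__main__":
    # Hypothetical, trimmed-down response used only for illustration.
    sample = ('{"solr-mbeans": ["CORE", {"searcher": '
              '{"stats": {"numDocs": 1000, "maxDoc": 1012}}}]}')
    stats = searcher_stats(sample)
    print(deleted_docs(stats["numDocs"], stats["maxDoc"]))  # prints 12
```

Because deleted documents are only purged when the segments holding them are merged, this difference can rise and later fall back to zero during a long import without any documents being lost.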
Thanks,
Shawn