" You almost certainly have a non-unique ID field." 
Yes, it is not absolutely unique, but I do not think the duplication is at this 1 to 6 ratio.
 
"Try it with a clean index, and then review the number of deleted documents 
(updates are a delete then insert action) "
I tried on a new instance, with the same effect. I do not see any deletions. Is 
there a way to confirm from the logs that this behavior is due to 
non-uniqueness? That would give me some assurance.
Thanks 

<str name="Total Rows Fetched">6843469</str>
<str name="Total Documents Processed">6843469</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2015-08-16 21:22:24</str>
<str name="">
Indexing completed. Added/Updated: 6843469 documents. Deleted 0 documents.
</str>
<str name="Committed">2015-08-16 22:31:47</str>

Whereas a '*:*' query returns:
    "params":{
      "q":"*:*"}},
  "response":{"numFound":1143108,"start":0,"docs":[

-----Original Message-----
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Sunday, August 16, 2015 3:18 PM
To: solr-user@lucene.apache.org
Subject: Re: No. of records mismatch

You almost certainly have a non-unique ID field. Some documents are overwritten 
during indexing. Try it with a clean index, and then review the number of 
deleted documents (updates are a delete then insert action). Deletes are 
calculated as maxDoc minus numDocs.
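
As a rough illustration of the point above, here is a minimal Python sketch. It is not Solr code: simulate_index and duplicate_keys are hypothetical helpers, the field name "id" is an assumed unique key, and the dictionary keys numDocs and maxDoc only mirror the names of Solr's index statistics. It assumes no segment merge has purged the overwritten documents yet, since after a merge maxDoc shrinks back towards numDocs.

```python
from collections import Counter


def simulate_index(rows, key="id"):
    """Model overwrite-by-unique-key indexing (an illustration, not Solr itself).

    Every row counts as "processed", but a row whose key was already seen
    replaces the earlier document, so only one live document remains per key.
    """
    live = {}      # unique key -> most recent row indexed under that key
    max_doc = 0    # every document ever written, live or overwritten
    for row in rows:
        live[row[key]] = row
        max_doc += 1
    num_docs = len(live)
    return {
        "processed": max_doc,
        "maxDoc": max_doc,
        "numDocs": num_docs,
        "deleted": max_doc - num_docs,  # deletes = maxDoc minus numDocs
    }


def duplicate_keys(rows, key="id"):
    """Return {key value: occurrences} for key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


if __name__ == "__main__":
    sample = [{"id": 1}, {"id": 2}, {"id": 1}, {"id": 1}]
    print(simulate_index(sample))
    print(duplicate_keys(sample))
```

Running the same kind of duplicate count against the source query (for example, comparing the total row count with the count of distinct key values) would show directly whether the number of distinct keys matches numFound.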

Upayavira

On Sun, Aug 16, 2015, at 07:18 PM, Pattabiraman, Meenakshisundaram
wrote:
> I did a dataimport with 'clean' set to false.
> The DIH status upon completion was:
> 
> <str name="status">idle</str>
> <str name="importResponse"/>
> <lst name="statusMessages">
> <str name="Total Requests made to DataSource">1</str>
> <str name="Total Rows Fetched">6843427</str>
> <str name="Total Documents Processed">6843427</str>
> <str name="Total Documents Skipped">0</str>
> <str name="Full Dump Started">2015-08-16 16:50:54</str>
> <str name="">
> Indexing completed. Added/Updated: 6843427 documents. Deleted 0 documents.
> </str>
> Whereas when I query using 'query?q=*:*&rows=0', I get the following
> count:
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":1,
>     "params":{
>       "q":"*:*",
>       "rows":"0"}},
>   "response":{"numFound":1616376,"start":0,"docs":[]
>   }}
> 
> There is a difference of about 5.2 million records. Can anyone help me 
> understand the behavior? The logs look fine.
> Thanks
