" You almost certainly have a non-unique ID field." Yes it is not absolutely unique but do not think it is at this 1 to 6 ratio. "Try it with a clean index, and then review the number of deleted documents (updates are a delete then insert action) " I tried on a new instance - same effect. I do not see any deletions. Is there a way to determine this from the logs to confirm that the behavior is due to non-uniqueness? This will serve as an assurance. Thanks
<str name="Total Rows Fetched">6843469</str>
<str name="Total Documents Processed">6843469</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2015-08-16 21:22:24</str>
<str name="">
Indexing completed. Added/Updated: 6843469 documents. Deleted 0 documents.
</str>
<str name="Committed">2015-08-16 22:31:47</str>

Whereas '*:*'

"params":{
  "q":"*:*"}},
"response":{"numFound":1143108,"start":0,"docs":[

-----Original Message-----
From: Upayavira [mailto:u...@odoko.co.uk]
Sent: Sunday, August 16, 2015 3:18 PM
To: solr-user@lucene.apache.org
Subject: Re: No. of records mismatch

You almost certainly have a non-unique ID field. Some documents are
overwritten during indexing. Try it with a clean index, and then review
the number of deleted documents (updates are a delete then insert
action). Deletes are calculated with maxDocs minus numDocs.

Upayavira

On Sun, Aug 16, 2015, at 07:18 PM, Pattabiraman, Meenakshisundaram wrote:
> I did a dataimport with 'clean' set to false.
> The DIH status upon completion was:
>
> <str name="status">idle</str>
> <str name="importResponse"/>
> <lst name="statusMessages">
> <str name="Total Requests made to DataSource">1</str>
> <str name="Total Rows Fetched">6843427</str>
> <str name="Total Documents Processed">6843427</str>
> <str name="Total Documents Skipped">0</str>
> <str name="Full Dump Started">2015-08-16 16:50:54</str>
> <str name="">
> Indexing completed. Added/Updated: 6843427 documents. Deleted 0 documents.
> </str>
> Whereas when I query using 'query?q=*:*&rows=0', I get the following count
> {
>   "responseHeader":{
>     "status":0,
>     "QTime":1,
>     "params":{
>       "q":"*:*",
>       "rows":"0"}},
>   "response":{"numFound":1616376,"start":0,"docs":[]
> }}
>
> There is a difference of 5 million records. Can anyone help me
> understand the behavior? The logs look fine.
> Thanks
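
The arithmetic discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Solr code: `simulate_overwrite` and `deleted_docs` are hypothetical helper names, and the dictionary stands in for the behavior described above, where a later document with the same ID replaces the earlier one. The counts are the ones reported for the clean-index run.

```python
def simulate_overwrite(ids):
    """Model an index keyed on ID: a repeated ID replaces the earlier document.

    Returns (documents left in the index, number of overwrites).
    Hypothetical helper for illustration only.
    """
    index = {}
    overwrites = 0
    for doc_id in ids:
        if doc_id in index:
            overwrites += 1  # same ID seen before: delete-then-insert
        index[doc_id] = True
    return len(index), overwrites


def deleted_docs(max_doc, num_docs):
    """Deletes as described in the thread: maxDoc minus numDocs."""
    return max_doc - num_docs


# Counts reported in the thread for the clean-index run.
rows_fetched = 6843469
num_found = 1143108

print(rows_fetched - num_found)            # 5700361 rows unaccounted for
print(round(rows_fetched / num_found, 2))  # 5.99 rows per surviving document

# Toy input: five rows, three distinct IDs, so two overwrites.
print(simulate_overwrite(["a", "b", "a", "c", "a"]))  # (3, 2)
```

If the source rows really do carry about six rows per ID, the gap between rows fetched and numFound is what this model predicts. Note that segment merges purge deleted documents, so maxDoc minus numDocs can understate the number of overwrites that actually happened.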