You almost certainly have a non-unique ID field, so some documents are overwritten during indexing. Try the import with a clean index, then check the number of deleted documents (an update is a delete followed by an insert). The deleted count is maxDoc minus numDocs.
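As a rough sketch of that arithmetic, the snippet below works through the figures from the quoted message and shows how repeated IDs turn into overwrites. The function names and the sample ID list are hypothetical, made up for illustration; the real maxDoc and numDocs values would have to be read from your own core's statistics.

```python
from collections import Counter


def deleted_docs(max_doc: int, num_docs: int) -> int:
    """Deleted documents still held in the index: maxDoc minus numDocs."""
    return max_doc - num_docs


def overwritten_count(ids) -> int:
    """How many rows replace an earlier row because they share the same ID."""
    counts = Counter(ids)
    # Every occurrence of an ID after the first overwrites the previous one.
    return sum(n - 1 for n in counts.values())


# Figures taken from the quoted DIH status and query response.
rows_processed = 6843427
num_found = 1616376
missing = rows_processed - num_found  # rows that did not survive as documents

# Hypothetical sample: five rows, but only three distinct IDs.
sample_ids = ["a", "b", "a", "c", "a"]

if __name__ == "__main__":
    print("rows not visible in the index:", missing)
    print("overwrites in sample:", overwritten_count(sample_ids))
    print("distinct documents in sample:", len(set(sample_ids)))
```

Segment merges purge deleted documents over time, so maxDoc minus numDocs on a live index is best read as a lower bound on the number of overwrites rather than an exact count.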
Upayavira

On Sun, Aug 16, 2015, at 07:18 PM, Pattabiraman, Meenakshisundaram wrote:
> I did a dataimport with 'clean' set to false.
> The DIH status upon completion was:
>
> <str name="status">idle</str>
> <str name="importResponse"/>
> <lst name="statusMessages">
> <str name="Total Requests made to DataSource">1</str>
> <str name="Total Rows Fetched">6843427</str>
> <str name="Total Documents Processed">6843427</str>
> <str name="Total Documents Skipped">0</str>
> <str name="Full Dump Started">2015-08-16 16:50:54</str>
> <str name="">
> Indexing completed. Added/Updated: 6843427 documents. Deleted 0
> documents.
> </str>
> Whereas when I query using 'query?q=*:*&rows=0', I get the following
> count
> {
> "responseHeader":{
> "status":0,
> "QTime":1,
> "params":{
> "q":"*:*",
> "rows":"0"}},
> "response":{"numFound":1616376,"start":0,"docs":[]
> }}
>
> There is a difference of 5 million records. Can anyone help me understand
> the behavior? The logs look fine.
> Thanks