A couple of things:
1> Be a little careful looking at deletedDocs, maxDocs and numDocs when you're done. Deleted (or updated) docs are "merged away" as segments merge. deletedDocs isn't a count of all docs that _have_ been deleted; it's just a count of the docs that have been deleted/updated but not yet merged away.
2> "I do not see any deletions." The "Deleted 0 documents" figure isn't a count of unique IDs replaced, but the number of explicit deletions. Having that as 0 doesn't indicate that docs haven't been updated.
3> bq: "Yes it is not absolutely unique but do not think it is at this 1 to 6 ratio". Check your assumption here. Assuming this is a database, select the count of distinct values in whatever field maps to your <uniqueKey>.
4> Is this a sharded setup? It shouldn't matter; you should get a full count unless you're explicitly adding &distrib=false. Just checking.
5> If none of that is the problem, let's see your config, etc.

Best,
Erick

On Sun, Aug 16, 2015 at 11:57 PM, davidphilip cherian <davidphilipcher...@gmail.com> wrote:

> Hi,
>
> You should check whether there were deletions by navigating to the Solr admin
> core admin page. Example URL:
> http://localhost:8983/solr/#/~cores/test_shard1_replica1. Check
> numDocs, maxDocs and deletedDocs. If numDocs remains equal to maxDocs, then
> you can confirm that there were no updates (as recommended by Upayavira).
>
> HTH
>
> On Mon, Aug 17, 2015 at 4:41 AM, Pattabiraman, Meenakshisundaram <
> pattabiraman.meenakshisunda...@aig.com> wrote:
>
>> " You almost certainly have a non-unique ID field."
>> Yes it is not absolutely unique but do not think it is at this 1 to 6
>> ratio.
>>
>> "Try it with a clean index, and then review the number of deleted
>> documents (updates are a delete then insert action) "
>> I tried on a new instance - same effect. I do not see any deletions. Is
>> there a way to determine this from the logs to confirm that the behavior is
>> due to non-uniqueness? This will serve as an assurance.
>> Thanks
>>
>> <str name="Total Rows Fetched">6843469</str>
>> <str name="Total Documents Processed">6843469</str>
>> <str name="Total Documents Skipped">0</str>
>> <str name="Full Dump Started">2015-08-16 21:22:24</str>
>> <str name="">
>> Indexing completed. Added/Updated: 6843469 documents. Deleted 0 documents.
>> </str>
>> <str name="Committed">2015-08-16 22:31:47</str>
>>
>> Whereas '*:*'
>> "params":{
>> "q":"*:*"}},
>> "response":{"numFound":1143108,"start":0,"docs":[
>>
>> -----Original Message-----
>> From: Upayavira [mailto:u...@odoko.co.uk]
>> Sent: Sunday, August 16, 2015 3:18 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: No. of records mismatch
>>
>> You almost certainly have a non-unique ID field. Some documents are
>> overwritten during indexing. Try it with a clean index, and then review the
>> number of deleted documents (updates are a delete then insert action).
>> Deletes are calculated with maxDocs minus numDocs.
>>
>> Upayavira
>>
>> On Sun, Aug 16, 2015, at 07:18 PM, Pattabiraman, Meenakshisundaram
>> wrote:
>> > I did a dataimport with 'clean' set to false.
>> > The DIH status upon completion was:
>> >
>> > <str name="status">idle</str>
>> > <str name="importResponse"/>
>> > <lst name="statusMessages">
>> > <str name="Total Requests made to DataSource">1</str>
>> > <str name="Total Rows Fetched">6843427</str>
>> > <str name="Total Documents Processed">6843427</str>
>> > <str name="Total Documents Skipped">0</str>
>> > <str name="Full Dump Started">2015-08-16 16:50:54</str>
>> > <str name="">
>> > Indexing completed. Added/Updated: 6843427 documents. Deleted 0
>> > documents.
>> > </str>
>> > Whereas when I query using 'query?q=*:*&rows=0', I get the following
>> > count:
>> > {
>> > "responseHeader":{
>> > "status":0,
>> > "QTime":1,
>> > "params":{
>> > "q":"*:*",
>> > "rows":"0"}},
>> > "response":{"numFound":1616376,"start":0,"docs":[]
>> > }}
>> >
>> > There is a difference of 5 million records. Can anyone help me
>> > understand the behavior? The logs look fine.
>> > Thanks
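For anyone landing on this thread later, here is a minimal sketch of the check Erick suggests in point 3: compare the total row count with the number of distinct values in the column mapped to <uniqueKey>, then compare that with numFound. Solr overwrites a document whose uniqueKey already exists, so the distinct key count, not the DIH "Added/Updated" figure, is what numFound should approach. This sidesteps deletedDocs, which (per point 1) only counts deletions not yet merged away. The table name source_rows, the column doc_id and the helper function names are hypothetical, and sqlite3 only stands in for the real database; adapt the SQL to your own driver.

```python
import json
import sqlite3


def count_rows_and_keys(conn, table="source_rows", key="doc_id"):
    """Return (total_rows, distinct_keys) for the column mapped to <uniqueKey>.

    table and key are hypothetical names. They are interpolated directly into
    the SQL, so only pass trusted identifiers. COUNT(DISTINCT ...) ignores NULLs.
    """
    total, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {key}) FROM {table}"
    ).fetchone()
    return total, distinct


def num_found(solr_json_text):
    """Pull response.numFound out of a Solr JSON response (e.g. q=*:*&rows=0)."""
    return json.loads(solr_json_text)["response"]["numFound"]


def diagnose(total_rows, distinct_keys, solr_num_found):
    """Split the rows-vs-documents gap into duplicate-key overwrites and the rest."""
    return {
        # Rows that replaced an earlier row carrying the same key.
        "overwritten_by_duplicate_keys": total_rows - distinct_keys,
        # Whatever remains is not explained by duplicate keys.
        "unexplained_gap": distinct_keys - solr_num_found,
    }


if __name__ == "__main__":
    # Toy data: 6 rows but only 3 distinct keys.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_rows (doc_id TEXT)")
    conn.executemany(
        "INSERT INTO source_rows VALUES (?)",
        [(k,) for k in ["a", "b", "a", "c", "a", "b"]],
    )
    total, distinct = count_rows_and_keys(conn)
    found = num_found('{"response":{"numFound":3,"start":0,"docs":[]}}')
    print(diagnose(total, distinct, found))
```

If unexplained_gap comes out near zero against the real data, duplicate keys account for the whole mismatch; if not, points 4 and 5 (sharding/&distrib=false, then config) are the next things to look at.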