Hi,

You can check whether there were deletions on the Core Admin page of the
Solr admin UI, for example
http://localhost:8983/solr/#/~cores/test_shard1_replica1. Look at numDocs,
maxDocs and deletedDocs. If numDocs is still equal to maxDocs, no documents
were overwritten, i.e. there were no updates (this is the check Upayavira
recommended).
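
If you would rather script the check than read it off the UI, here is a rough sketch. It assumes the CoreAdmin STATUS API is reachable at the same host as the example URL above and that the JSON response reports the counts under status -> <core> -> index as numDocs and maxDoc. The function names are mine, not part of Solr.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_core_status(base_url, core):
    """Fetch the CoreAdmin STATUS response for one core as a dict.

    base_url is assumed to look like http://localhost:8983/solr
    """
    query = urlencode({"action": "STATUS", "core": core, "wt": "json"})
    with urlopen(f"{base_url}/admin/cores?{query}") as resp:
        return json.load(resp)


def index_counts(status, core):
    """Return (numDocs, maxDoc, deleted) from a STATUS response dict.

    Deleted documents are computed as maxDoc minus numDocs.
    """
    index = status["status"][core]["index"]
    num_docs = index["numDocs"]
    max_doc = index["maxDoc"]
    return num_docs, max_doc, max_doc - num_docs
```

Something like index_counts(fetch_core_status("http://localhost:8983/solr", "test_shard1_replica1"), "test_shard1_replica1") should then give the three numbers. Keep in mind that segment merges purge deleted documents, so after a large import the deleted count can be lower than the number of documents that were actually overwritten.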

HTH

On Mon, Aug 17, 2015 at 4:41 AM, Pattabiraman, Meenakshisundaram <
pattabiraman.meenakshisunda...@aig.com> wrote:

> " You almost certainly have a non-unique ID field."
> Yes, it is not absolutely unique, but I do not think it is at this 1 to 6
> ratio.
>
> "Try it with a clean index, and then review the number of deleted
> documents (updates are a delete then insert action) "
> I tried on a new instance - same effect. I do not see any deletions. Is
> there a way to determine this from the logs to confirm that the behavior is
> due to non-uniqueness? This will serve as an assurance.
> Thanks
>
> <str name="Total Rows Fetched">6843469</str>
> <str name="Total Documents Processed">6843469</str>
> <str name="Total Documents Skipped">0</str>
> <str name="Full Dump Started">2015-08-16 21:22:24</str>
> <str name="">
> Indexing completed. Added/Updated: 6843469 documents. Deleted 0 documents.
> </str>
> <str name="Committed">2015-08-16 22:31:47</str>
>
> Whereas '*:*'
>     "params":{
>       "q":"*:*"}},
>   "response":{"numFound":1143108,"start":0,"docs":[
>
> -----Original Message-----
> From: Upayavira [mailto:u...@odoko.co.uk]
> Sent: Sunday, August 16, 2015 3:18 PM
> To: solr-user@lucene.apache.org
> Subject: Re: No. of records mismatch
>
> You almost certainly have a non-unique ID field. Some documents are
> overwritten during indexing. Try it with a clean index, and then review the
> number of deleted documents (updates are a delete then insert action).
> Deletes are calculated with maxDocs minus numDocs.
>
> Upayavira
>
> On Sun, Aug 16, 2015, at 07:18 PM, Pattabiraman, Meenakshisundaram
> wrote:
> > I did a dataimport with 'clean' set to false.
> > The DIH status upon completion was:
> >
> > <str name="status">idle</str>
> > <str name="importResponse"/>
> > <lst name="statusMessages">
> > <str name="Total Requests made to DataSource">1</str>
> > <str name="Total Rows Fetched">6843427</str>
> > <str name="Total Documents Processed">6843427</str>
> > <str name="Total Documents Skipped">0</str>
> > <str name="Full Dump Started">2015-08-16 16:50:54</str>
> > <str name="">
> > Indexing completed. Added/Updated: 6843427 documents. Deleted 0 documents.
> > </str>
> > Whereas when I query using 'query?q=*:*&rows=0', I get the following
> > count:
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":1,
> >     "params":{
> >       "q":"*:*",
> >       "rows":"0"}},
> >   "response":{"numFound":1616376,"start":0,"docs":[]
> >   }}
> >
> > There is a difference of 5 million records. Can anyone help me
> > understand the behavior? The logs look fine.
> > Thanks
>
