Any document with the same value in the "id" field (or whatever field
you've defined in <uniqueKey> in your schema) will replace any older
document that has that value. So my guess is that your data has some
duplicate keys.
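
If the source rows are accessible from your Java loader, a quick way
to confirm is to count key occurrences before sending anything to
Solr. A minimal sketch (the ids list is a stand-in for however your
program actually reads its key column):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DupKeyCheck {

    // Returns only the keys that occur more than once in the input;
    // each repeat becomes an overwrite (a "lost" document) in Solr.
    static Map<String, Integer> duplicates(List<String> ids) {
        Map<String, Integer> counts = new HashMap<>();
        for (String id : ids) {
            counts.merge(id, 1, Integer::sum);
        }
        counts.values().removeIf(c -> c == 1);  // keep duplicates only
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in data; feed it your real keys.
        System.out.println(duplicates(List.of("a", "b", "a", "c", "b")));
        // prints something like {a=2, b=2}
    }
}

If that map comes back non-empty, the size of it plus the sum of the
extra occurrences should line up with your missing-row count.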

A simple way to check is to watch maxDoc vs. numDocs in the admin UI
for a particular replica. If those numbers diverge as your indexing
job is running, then you're overwriting documents.
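
If you'd rather poll than watch the UI, the Luke handler exposes the
same numbers per core. A minimal sketch with Java 11's HttpClient
(host, port, and the core name "mycore" are placeholders for your
setup):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class IndexStats {
    public static void main(String[] args) throws Exception {
        // The Luke handler reports numDocs and maxDoc for one core;
        // both appear in the "index" section of the JSON response.
        String url = "http://localhost:8983/solr/mycore/admin/luke?numTerms=0&wt=json";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).build();
        HttpResponse<String> resp =
                client.send(req, HttpResponse.BodyHandlers.ofString());
        // maxDoc - numDocs = deleted (overwritten) docs awaiting a merge.
        System.out.println(resp.body());
    }
}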

Do note that background merging will purge deleted docs from the
segments being merged, so it's not enough to just look once.
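
To see the overwrite behavior in isolation, index two documents with
the same key and query for it. A minimal SolrJ sketch (again, "mycore"
is a placeholder, and "title_s" assumes the default dynamic-field
rules):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class OverwriteDemo {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build();

        // Two adds with the same uniqueKey: the second replaces the first.
        for (int i = 0; i < 2; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "42");
            doc.addField("title_s", "version " + i);
            client.add(doc);
        }
        client.commit();

        // numFound is 1, not 2; until a merge runs, maxDoc on that
        // core stays one higher than numDocs.
        long found = client.query(new SolrQuery("id:42"))
                           .getResults().getNumFound();
        System.out.println(found);  // 1
        client.close();
    }
}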

Best,
Erick

On Fri, Apr 27, 2018 at 5:22 AM, LOPEZ-CORTES Mariano-ext
<mariano.lopez-cortes-...@pole-emploi.fr> wrote:
> Hi
>
> We've finished importing 40 million documents into a 3-node Solr cluster.
>
> After injecting all the data via a Java program, we noticed that the number of
> documents was about 100,000 lower than expected.
> No exception, no error.
>
> Some config details:
>
>     <autoCommit>
>         <maxTime>15000</maxTime>
>         <openSearcher>false</openSearcher>
>     </autoCommit>
>     <autoSoftCommit>
>         <maxTime>15000</maxTime>
>     </autoSoftCommit>
>
> We have no commits in the client application.
>
> Also, when checking via the admin UI, we noticed that the total number of
> documents in Solr (numFound) increases slowly.
>
> Is this normal behaviour? What could the problem be?
>
> Thanks!
