Re: Re: solr 5.2.1, data import issue, shown processed rows doesn't match actually indexed doc quantity.

Erick Erickson Fri, 15 Apr 2016 09:59:49 -0700

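The simplest test to see if there are duplicates is to
check the maxDoc and numDocs in the admin UI. If
they're different then some docs have been overwritten
(or deleted), which is what duplicates look like. NOTE:
this is not definitive, since merges can purge the
overwritten docs, and you MUST NOT run optimize
before you look, since that purges them too. But it's
quick. I'd delete all docs and re-run the import before
trying this, though.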
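
As a rough sketch of the arithmetic behind that check (plain Java, no Solr client; the class name, method names and the sample numbers are made up for illustration and are not readings from the poster's cluster):

```java
// Plain arithmetic on the two numbers shown in the admin UI.
// This is an illustration only; it does not talk to Solr.
class DocCountCheck {

    // maxDoc still counts docs that were overwritten or deleted but not yet
    // merged away; numDocs counts only live docs. The difference is the
    // number of such not-yet-purged docs.
    static long overwrittenOrDeleted(long maxDoc, long numDocs) {
        if (numDocs > maxDoc) {
            throw new IllegalArgumentException("numDocs cannot exceed maxDoc");
        }
        return maxDoc - numDocs;
    }

    // A non-zero difference right after a fresh import hints at duplicate ids.
    // A zero difference proves nothing, because merges may already have
    // purged the overwritten docs.
    static boolean suggestsDuplicates(long maxDoc, long numDocs) {
        return overwrittenOrDeleted(maxDoc, numDocs) > 0;
    }

    public static void main(String[] args) {
        long maxDoc = 1000;  // hypothetical reading
        long numDocs = 990;  // hypothetical reading
        System.out.println("overwritten or deleted: " + overwrittenOrDeleted(maxDoc, numDocs));
        System.out.println("suggests duplicates: " + suggestsDuplicates(maxDoc, numDocs));
    }
}
```

The check is only meaningful on an index that was emptied first, so that the difference can only come from this import.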


Second, you say "there are no errors in any logs". Are
you completely sure that some of the docs didn't have errors?
Just double checking here since there can be a bunch
of logs as they're rolled over. And I'm thinking of the Solr
logs here.

And do note that the UUID field (assuming we're talking
about the UUIDUpdateProcessorFactory here) only adds a UUID
if the field is _not_ present in the doc. A field that is present
but empty still counts as present. The test is something like
if (inputDoc.get(field) == null) {
    // field is absent from the doc, so add the UUID field
    inputDoc.addField(field, UUID.randomUUID().toString());
}

So even if the doc has the empty string as the UUID field,
no new UUID field will be added...
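
To make the consequence concrete, here is a minimal sketch in plain Java, assuming the "add only if absent" test above and the usual overwrite-on-same-id behaviour. It is not Solr code: a Map stands in for the input doc, a HashMap stands in for the index, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Stand-in simulation only; none of these types or methods come from Solr.
class UuidDefaultSketch {

    // Add a UUID only when the field is absent (the lookup returns null).
    // An empty string is a non-null value, so it is left untouched.
    static void addUuidIfAbsent(Map<String, Object> doc, String field) {
        if (doc.get(field) == null) {
            doc.put(field, UUID.randomUUID().toString());
        }
    }

    // "Index" docs by unique key: a later doc with the same key replaces
    // the earlier one, which is how duplicates silently disappear.
    static Map<Object, Map<String, Object>> index(List<Map<String, Object>> docs, String keyField) {
        Map<Object, Map<String, Object>> index = new HashMap<>();
        for (Map<String, Object> doc : docs) {
            addUuidIfAbsent(doc, keyField);
            index.put(doc.get(keyField), doc);
        }
        return index;
    }

    // Three docs: one with no id at all, two with an empty-string id.
    static List<Map<String, Object>> sampleDocs() {
        List<Map<String, Object>> docs = new ArrayList<>();
        Map<String, Object> a = new HashMap<>();
        a.put("title", "a");
        Map<String, Object> b = new HashMap<>();
        b.put("id", "");
        b.put("title", "b");
        Map<String, Object> c = new HashMap<>();
        c.put("id", "");
        c.put("title", "c");
        docs.add(a);
        docs.add(b);
        docs.add(c);
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = sampleDocs();
        Map<Object, Map<String, Object>> idx = index(docs, "id");
        // Three docs processed, but the two empty-id docs collapse into one.
        System.out.println("processed=" + docs.size() + " indexed=" + idx.size());
    }
}
```

If the import produces rows whose id column is empty rather than missing, this is one way "rows processed" could end up larger than numFound; whether that is what happens here is something only the data can tell.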

Best,
Erick

On Thu, Apr 14, 2016 at 11:51 PM, cqlangyi <cqlan...@163.com> wrote:
> hi guys,
>
>
> thank you very much for the help. sorry to be so late replying.
>
>
> 1. "commit" didn't help.
>     after commit, the 'numFound' of "*:*" query is still the same.
>
>
> 2. "id" field in every doc is generated by solr using UUID, i have
>     no idea how to check if there is a duplicated one. but i am assuming
>     there shouldn't be, unless solr cloud has some known bug when
>     using UUID in a distributed environment.
>
>
> the environment is
>
>
> solr cloud with:
> 3 linux boxes, use zookeeper 3.4.6  + solr 5.2.1, oracle JDK 1.7.80
>
>
> any ideas?
>
>
> thank you very much.
>
>
>
>
>
>
> At 2016-04-05 12:09:14, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
>>Both of us implied it, but to be completely clear - if you have a duplicate
>>ID in your data set, SOLR will throw away previous documents with that ID
>>and index the new one.  That's fine if your duplicates really are
>>duplicates - it's not OK if there's a problem in the data set and the
>>duplicate IDs are on documents that are actually unique.
>>
>>On Mon, Apr 4, 2016 at 9:51 PM, John Bickerstaff <j...@johnbickerstaff.com>
>>wrote:
>>
>>> Sweet - that's a good point - I ran into that too - I had not run the
>>> commit for the last "batch" (I was using SolrJ) and so numbers didn't match
>>> until I did.
>>>
>>> On Mon, Apr 4, 2016 at 9:50 PM, Binoy Dalal <binoydala...@gmail.com>
>>> wrote:
>>>
>>>> 1) Are you sure you don't have duplicates?
>>>> 2) All of your records might have been indexed but a new searcher may not
>>>> have opened on the updated index yet. Try issuing a commit and see if that
>>>> works.
>>>>
>>>> On Tue, 5 Apr 2016, 08:56 cqlangyi, <cqlan...@163.com> wrote:
>>>>
>>>> > hi there,
>>>> >
>>>> >
>>>> > i have a solr 5.2.1, when i do data import, after the job is done, it
>>>> > shows 165,191 rows processed successfully.
>>>> >
>>>> >
>>>> > but when i query with *:*, the "numFound" shows only 163,349 docs in the index.
>>>> >
>>>> >
>>>> > when i tried to do it again, it showed 165,191 rows processed
>>>> > successfully. but the *:* query result now is 162,390.
>>>> >
>>>> >
>>>> > no errors in any log,
>>>> >
>>>> >
>>>> > any idea?
>>>> >
>>>> >
>>>> > thank you very much!
>>>> >
>>>> >
>>>> > cq
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > At 2016-04-05 09:19:48, "Chris Hostetter" <hossman_luc...@fucit.org>
>>>> > wrote:
>>>> > >
>>>> > >: I am not sure how to use "Sort By Function" for Case.
>>>> > >:
>>>> > >:
>>>> > >: |10#40|14#19|33#17|27#6|15#6|19#5|7#2|6#1|29#1|5#1|30#1|28#1|12#0|20#0|
>>>> > >:
>>>> > >: Can you tell how to fetch 40 when input is 10.
>>>> > >
>>>> > >Something like...
>>>> > >
>>>> > >if(termfreq(f,10),40,if(termfreq(f,14),19,if(termfreq(f,33),17,....)))))))))))
>>>> > >
>>>> > >But i suspect there may be a much better way to achieve your ultimate goal
>>>> > >if you tell us what it is.  what do these fields represent? what makes
>>>> > >these numeric values significant? do you know which values are significant
>>>> > >when indexing, or do they vary for every query?
>>>> > >
>>>> > >https://people.apache.org/~hossman/#xyproblem
>>>> > >XY Problem
>>>> > >
>>>> > >Your question appears to be an "XY Problem" ... that is: you are dealing
>>>> > >with "X", you are assuming "Y" will help you, and you are asking about "Y"
>>>> > >without giving more details about the "X" so that we can understand the
>>>> > >full issue.  Perhaps the best solution doesn't involve "Y" at all?
>>>> > >See Also: http://www.perlmonks.org/index.pl?node_id=542341
>>>> > >
>>>> > >
>>>> > >
>>>> > >
>>>> > >-Hoss
>>>> > >http://www.lucidworks.com/
>>>> >
>>>> --
>>>> Regards,
>>>> Binoy Dalal
>>>>
>>>
>>>
