It's really impossible to say; assuming you're generating correctly-formed documents, I don't see how this would fail. So, here's how I'd approach it:
You're assuming that
1> you're getting all the docs back from server A that you have in there, and
2> you're correctly sending them all to server B.

So my guess is that one of these assumptions is somehow wrong, which leaves checking them as "an exercise for the reader". I should think you'd need to put some instrumentation in your SolrJ program.

> Simplest is to just record the number of docs you read from server A. Is it the correct number?
> Record the number of docs you send to server B. Does it match the number read from server A?
> Record all the IDs (whatever your uniqueKey is) in a Set and report the size of the Set at the end. Does it match the count of docs you read from server A? If not, somehow you're getting duplicate docs.

Something I've done repeatedly is a silly mistake like

    while (more docs) {
        add the doc to the doc list
        if (doclist.size() >= 500) {
            server.add(doclist);
            doclist.clear();
        }
    }

then fail to do the following outside the while loop to catch the docs I've added to the doclist but not sent because size < 500:

    if (doclist.size() > 0) {
        server.add(doclist);
    }

Although why running your SolrJ program repeatedly would "catch up" server B is hard to reconcile with an error like this.

This isn't germane to your problem, but generally it's poor practice to have the SolrJ program commit after sending each batch of docs to the server. Let your autocommit settings handle that, with (possibly) a single commit at the very end of the run before you exit.

Best,
Erick

On Fri, Jan 2, 2015 at 6:43 AM, Yashveer Rana <captainjackr...@gmail.com> wrote:

> I have a solr cloud setup with two collections A & B with different schemas (
> although majority of fields are identical ).
> Collection A has ~ 3.6 million documents
> Using *solrj 4.7.0 *
>
> As per a requirement, my application
> - reads documents from collection A in batches of 10k
> - Creates docs of type B, populates fields from the A type docs
> - Calls addBeans() on collection A in batches of 500 and invokes commit
>
> However, this operation does not add all documents to collection B and falls
> short by about 80-90k. On re-executing the operation, there is an increment
> in the doc count, but it still does not reach the desired number. On
> multiple executions, eventually the count reaches the 3.6 figure
>
> Just wondering if anyone has encountered such a behaviour before. Havent
> seen any errors in solr logs generated either.
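
To make the instrumentation suggested in the reply concrete, here is a minimal sketch of a batching loop that counts docs read, docs sent, and distinct IDs, and that flushes the last partial batch after the loop. It is not the poster's actual program: BatchSink, CopyStats and BatchCopier are hypothetical names, each doc is reduced to its uniqueKey value as a String, and the sink is only a stand-in for whatever call sends a batch to server B (in SolrJ that would presumably be an add() call on the server object, wrapped to handle its checked exceptions).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for "send this batch to server B".
interface BatchSink {
    void add(List<String> docIds);
}

// Counters to compare at the end of the run.
class CopyStats {
    long read;      // docs read from the source (server A)
    long sent;      // docs handed to the sink (server B)
    long uniqueIds; // distinct uniqueKey values seen
}

class BatchCopier {
    // Copies every id from source to sink in batches of batchSize,
    // recording the counts suggested in the reply.
    static CopyStats copy(Iterator<String> source, BatchSink sink, int batchSize) {
        CopyStats stats = new CopyStats();
        Set<String> seenIds = new HashSet<>();
        List<String> batch = new ArrayList<>();

        while (source.hasNext()) {
            String id = source.next();
            stats.read++;
            seenIds.add(id); // a Set silently drops duplicates
            batch.add(id);

            if (batch.size() >= batchSize) {
                sink.add(new ArrayList<>(batch)); // copy, since batch is reused
                stats.sent += batch.size();
                batch.clear();
            }
        }

        // The step that is easy to forget: send the final partial batch.
        if (!batch.isEmpty()) {
            sink.add(new ArrayList<>(batch));
            stats.sent += batch.size();
        }

        stats.uniqueIds = seenIds.size();
        return stats;
    }
}
```

If read and sent agree but uniqueIds is smaller, the source is returning duplicate IDs, and later docs with the same uniqueKey would overwrite earlier ones on server B. If sent is smaller than read, docs are being dropped inside the copy loop itself.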