Thank you very much for responding, Mr. Høydahl. I removed the recursion, which eliminated the stack overflow exception. However, I am still encountering my main problem: the docs are not getting indexed in Solr 4.x, as I mentioned in my original email. The reason I am reindexing is that EnglishPorterFilterFactory has been removed in Solr 4.x, and I also wanted to add a copyField that copies all field values into the destination field "allfields".
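For reference, this is roughly what that change looks like in schema.xml (a sketch; the "text_en" type name is a placeholder rather than my exact field type):

    <!-- catch-all destination; multiValued because many source fields are copied into it -->
    <field name="allfields" type="text_en" indexed="true" stored="false" multiValued="true"/>
    <copyField source="*" dest="allfields"/>

    <!-- in the field type, the removed EnglishPorterFilterFactory is replaced with -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>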
As per your suggestion I removed the softCommit and set autoCommit to maxDocs=100 and maxTime=120000. I was printing out each indexing call, and you can clearly see that it still indexes only around 10 docs at a time (test code and results shown below; notice how numFound in the output drops by roughly 10 at each commit). My code ran fully to completion, and for good measure I committed manually after 10 minutes; still, when I query, I see that only "13513" docs got indexed. There must be something else I am missing.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="q">allfields:[* TO *]</str>
      <str name="wt">xml</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="13513" start="0"/>
</response>

TEST INDEXER CODE
-------------------------------
Long total = null;
Integer start = 0;
Integer rows = 100;

while (total == null || total >= (start + rows)) {
    SolrQuery query = new SolrQuery();
    query.setQuery("*:*");
    query.setSort("displaydatetime", ORDER.desc);
    query.addFilterQuery("-allfields:[* TO *]");
    QueryResponse resp = server.query(query);
    SolrDocumentList list = resp.getResults();
    total = list.getNumFound();

    if (list != null && !list.isEmpty()) {
        for (SolrDocument doc : list) {
            SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
            // To index the full doc again
            iDoc.removeField("_version_");
            server.add(iDoc);
        }
        System.out.println("Indexed " + (start + rows) + "/" + total);
        start = (start + rows);
    }
}
System.out.println("COMPLETELY DONE");

System.out output
-------------------------
Indexed 1252100/1256575
Indexed 1252200/1256575
Indexed 1252300/1256575
Indexed 1252400/1256575
Indexed 1252500/1256575
Indexed 1252600/1256575
Indexed 1252700/1256575
Indexed 1252800/1256575
Indexed 1252900/1256575
Indexed 1253000/1256575
Indexed 1253100/1256566
Indexed 1253200/1256566
Indexed 1253300/1256566
Indexed 1253400/1256566
Indexed 1253500/1256566
Indexed 1253600/1256566
Indexed 1253700/1256566
Indexed 1253800/1256566
Indexed 1253900/1256566
Indexed 1254000/1256566
Indexed 1254100/1256566
Indexed 1254200/1256566
Indexed 1254300/1256566
Indexed 1254400/1256566
Indexed 1254500/1256566
Indexed 1254600/1256566
Indexed 1254700/1256566
Indexed 1254800/1256566
Indexed 1254900/1256566
Indexed 1255000/1256566
Indexed 1255100/1256566
Indexed 1255200/1256566
Indexed 1255300/1256566
Indexed 1255400/1256566
Indexed 1255500/1256566
Indexed 1255600/1256566
Indexed 1255700/1256557
Indexed 1255800/1256557
Indexed 1255900/1256557
Indexed 1256000/1256557
Indexed 1256100/1256557
Indexed 1256200/1256557
Indexed 1256300/1256557
Indexed 1256400/1256557
Indexed 1256500/1256557
COMPLETELY DONE
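Incidentally, while pasting the test code I notice that I never call setRows() or setStart() on the SolrQuery, so every request presumably comes back with Solr's default of 10 rows even though my progress counter advances by 100 - which may be exactly the "10 at a time" effect. A sketch of the corrected query setup (untested; the rest of the loop is unchanged):

    SolrQuery query = new SolrQuery();
    query.setQuery("*:*");
    query.setSort("displaydatetime", ORDER.desc);
    query.addFilterQuery("-allfields:[* TO *]");
    query.setRows(rows);   // without this, Solr returns only its default of 10 docs
    query.setStart(start); // advance the offset the loop already tracks

One caveat: because the filter query drops documents as soon as they are reindexed and committed with allfields populated, a moving start offset can skip documents; always querying from start=0 until numFound reaches zero might be the safer variant.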
Thanks,
Ravi Kiran Bhaskar

On Tue, Mar 25, 2014 at 7:13 AM, Jan Høydahl <jan....@cominvent.com> wrote:

> Hi,
>
> It seems you are trying to reindex from one server to the other.
>
> Be aware that it could be easier for you to simply copy the whole index
> folder over to your 4.6.1 server and start Solr, as it will be able to
> read your 3.x index. This is unless you also want to do major upgrades of
> your schema or update processors, so that you need a re-index anyway.
>
> If you believe you really need a re-index, then please try to batch index
> without triggering commits every few seconds - this is really heavy on the
> system and completely unnecessary. You won't get the benefit of SoftCommit
> if you're not running SolrCloud, so there is no need to configure that.
>
> I would change your <autoCommit> to maxDocs=10000 and maxTime=120000
> (every 2 min). Further, please index without the 1s commitWithin, i.e.
> instead of
>
>     server.add(iDoc, 1000);
>
> use
>
>     server.add(iDoc);
>
> This will make sure the server gets room to breathe instead of constantly
> generating new indices.
>
> Finally, it's probably not a good idea to use recursion here. You really
> don't need to, and it fills up your stack. You can instead refactor the
> method into a loop that does the whole indexing. And a hint: it is
> generally better to ask for ALL documents in one go and stream to the end,
> rather than issuing new queries with increasing offsets all the time,
> because high offsets/start values can be time consuming, especially with
> multiple shards. If you increase the timeout enough, you should be able to
> retrieve all documents in one go!
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 24 March 2014 at 22:36, Ravi Solr <ravis...@gmail.com> wrote:
>
>> Hello,
>>         We are trying to reindex as part of our move from 3.6.2 to 4.6.1
>> and have faced various issues reindexing 1.5 million docs. We don't use
>> SolrCloud; it is still a Master/Slave config. For testing this I am using
>> a single test server, reading from it and putting the docs back into the
>> same index.
>>
>> We send docs in batches of 100, but only 10 out of every 100 are getting
>> indexed. Is this related to the hard-coded maxBufferedAddsPerServer
>> setting? I also tried to play with the autoCommit and softCommit
>> settings, but in vain.
>>
>> <autoCommit>
>>     <maxDocs>5</maxDocs>
>>     <maxTime>5000</maxTime>
>>     <openSearcher>true</openSearcher>
>> </autoCommit>
>>
>> <autoSoftCommit>
>>     <maxTime>1000</maxTime>
>> </autoSoftCommit>
>>
>> I use these settings on the test system just to check whether docs are
>> being indexed, but even with a batch size of 5 my SolrJ client code runs
>> faster than the indexing, causing some docs to not get indexed. The
>> function that does the indexing is a recursive method (shown below) which
>> fails after some time with a stack overflow (I did not have this issue
>> with 3.6.2 and the same code):
>>
>> private static void processDocs(HttpSolrServer server, Integer start,
>>         Integer rows) throws Exception {
>>     SolrQuery query = new SolrQuery();
>>     query.setQuery("*:*");
>>     query.addFilterQuery("-allfields:[* TO *]");
>>     QueryResponse resp = server.query(query);
>>     SolrDocumentList list = resp.getResults();
>>     Long total = list.getNumFound();
>>
>>     if (list != null && !list.isEmpty()) {
>>         for (SolrDocument doc : list) {
>>             SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
>>             // To index the full doc again
>>             iDoc.removeField("_version_");
>>             server.add(iDoc, 1000);
>>         }
>>
>>         System.out.println("Indexed " + (start + rows) + "/" + total);
>>         if (total >= (start + rows)) {
>>             processDocs(server, (start + rows), rows);
>>         }
>>     }
>> }
>>
>> I also tried turning on the updateLog, but it was filling up so fast as
>> to be useless.
>>
>> How do we do bulk updates in a Solr 4.x environment? Is there any setting
>> that I am missing?
>>
>> Thanks
>>
>> Ravi Kiran Bhaskar
>> Technical Architect
>> The Washington Post
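P.S. For completeness, a minimal sketch of the "all documents in one go" approach Jan describes above (untested; the URL and timeout values are placeholders, and it assumes the client heap can hold the entire result set):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class BulkReindex {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr"); // placeholder URL
        server.setSoTimeout(600000); // generous read timeout so the single large query can finish

        // One query for every doc that still lacks the allfields copy
        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("-allfields:[* TO *]");
        query.setRows(Integer.MAX_VALUE); // fetch everything in one request
        QueryResponse resp = server.query(query);

        for (SolrDocument doc : resp.getResults()) {
            SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
            iDoc.removeField("_version_"); // re-add as a brand-new version of the doc
            server.add(iDoc); // no commitWithin; let autoCommit do the committing
        }

        server.commit(); // one explicit commit at the very end
        server.shutdown();
    }
}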