I am also seeing the following in the log. Is it really committing? Now I am totally confused about how Solr 4.x indexes. My relevant update config is shown below.
<updateHandler class="solr.DirectUpdateHandler2">
  <maxPendingDeletes>1</maxPendingDeletes>
  <autoCommit>
    <maxDocs>100</maxDocs>
    <maxTime>120000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>

[#|2014-03-25T13:44:03.765-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820509 [commitScheduler-6-thread-1] INFO org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} |#]

[#|2014-03-25T13:44:03.766-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=83;_ThreadName=http-thread-pool-8080(4);|820510 [http-thread-pool-8080(4)] INFO org.apache.solr.update.processor.LogUpdateProcessor - [sitesearchcore] webapp=/solr-admin path=/update params={wt=javabin&version=2} {add=[09f693e6-9a6f-11e3-9900-dd917233cf9c]} 0 13 |#]

[#|2014-03-25T13:44:03.898-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820642 [commitScheduler-6-thread-1] INFO org.apache.solr.core.SolrCore - SolrDeletionPolicy.onCommit: commits: num=3
  commit{dir=/data/solr/core/sitesearch-data/index,segFN=segments_9y68,generation=464192}
  commit{dir=/data/solr/core/sitesearch-data/index,segFN=segments_9yjf,generation=464667}
  commit{dir=/data/solr/core/sitesearch-data/index,segFN=segments_9yjg,generation=464668} |#]

[#|2014-03-25T13:44:03.898-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820642 [commitScheduler-6-thread-1] INFO org.apache.solr.core.SolrCore - newest commit generation = 464668 |#]

[#|2014-03-25T13:44:03.908-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820652 [commitScheduler-6-thread-1] INFO org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1e2ca86e[sitesearchcore] realtime |#]

[#|2014-03-25T13:44:03.909-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820653 [commitScheduler-6-thread-1] INFO org.apache.solr.update.UpdateHandler - end_commit_flush

Thanks

Ravi Kiran Bhaskar

On Tue, Mar 25, 2014 at 1:10 PM, Ravi Solr <ravis...@gmail.com> wrote:

> Thank you very much for responding, Mr. Høydahl. I removed the recursion,
> which eliminated the stack overflow exception. However, I am still
> encountering my main problem of docs not getting indexed in Solr 4.x, as
> I mentioned in my original email. The reason I am reindexing is that with
> Solr 4.x the EnglishPorterFilterFactory has been removed, and I also
> wanted to add another copyField of all field values into the destination
> "allfields".
>
> As per your suggestion I removed softCommit and set autoCommit to maxDocs
> 100 and maxTime 120000. I was printing out the indexing call... You can
> clearly see it still indexes around 10 at a time (testing code and
> results shown below). Again, my code finished fully, and just for good
> measure I committed manually after 10 minutes; still, when I query I only
> see 13513 docs indexed.
>
> There must be something else I am missing.
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1</int>
>     <lst name="params">
>       <str name="q">allfields:[* TO *]</str>
>       <str name="wt">xml</str>
>       <str name="rows">0</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="13513" start="0"/>
> </response>
>
> TEST INDEXER CODE
> -------------------------------
> Long total = null;
> Integer start = 0;
> Integer rows = 100;
> while(total == null || total >= (start+rows)) {
>
>     SolrQuery query = new SolrQuery();
>     query.setQuery("*:*");
>     query.setSort("displaydatetime", ORDER.desc);
>
>     query.addFilterQuery("-allfields:[* TO *]");
>     QueryResponse resp = server.query(query);
>     SolrDocumentList list = resp.getResults();
>     total = list.getNumFound();
>
>     if(list != null && !list.isEmpty()) {
>         for(SolrDocument doc : list) {
>             SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
>             //To index full doc again
>             iDoc.removeField("_version_");
>             server.add(iDoc);
>         }
>
>         System.out.println("Indexed " + (start+rows) + "/" + total);
>         start = (start+rows);
>     }
> }
>
> System.out.println("COMPLETELY DONE");
>
> System.out output
> -------------------------
> Indexed 1252100/1256575
> Indexed 1252200/1256575
> Indexed 1252300/1256575
> Indexed 1252400/1256575
> Indexed 1252500/1256575
> Indexed 1252600/1256575
> Indexed 1252700/1256575
> Indexed 1252800/1256575
> Indexed 1252900/1256575
> Indexed 1253000/1256575
> Indexed 1253100/1256566
> Indexed 1253200/1256566
> Indexed 1253300/1256566
> Indexed 1253400/1256566
> Indexed 1253500/1256566
> Indexed 1253600/1256566
> Indexed 1253700/1256566
> Indexed 1253800/1256566
> Indexed 1253900/1256566
> Indexed 1254000/1256566
> Indexed 1254100/1256566
> Indexed 1254200/1256566
> Indexed 1254300/1256566
> Indexed 1254400/1256566
> Indexed 1254500/1256566
> Indexed 1254600/1256566
> Indexed 1254700/1256566
> Indexed 1254800/1256566
> Indexed 1254900/1256566
> Indexed 1255000/1256566
> Indexed 1255100/1256566
> Indexed 1255200/1256566
> Indexed 1255300/1256566
> Indexed 1255400/1256566
> Indexed 1255500/1256566
> Indexed 1255600/1256566
> Indexed 1255700/1256557
> Indexed 1255800/1256557
> Indexed 1255900/1256557
> Indexed 1256000/1256557
> Indexed 1256100/1256557
> Indexed 1256200/1256557
> Indexed 1256300/1256557
> Indexed 1256400/1256557
> Indexed 1256500/1256557
> COMPLETELY DONE
>
> Thanks,
> Ravi Kiran Bhaskar
>
> On Tue, Mar 25, 2014 at 7:13 AM, Jan Høydahl <jan....@cominvent.com> wrote:
>
>> Hi,
>>
>> It seems you are trying to reindex from one server to the other.
>>
>> Be aware that it could be easier for you to simply copy the whole index
>> folder over to your 4.6.1 server and start Solr, as it will be able to
>> read your 3.x index. This is unless you also want to make major upgrades
>> to your schema or update processors, so that you would need a re-index
>> anyway.
>>
>> If you believe you really need a re-index, then please try to batch
>> index without triggering commits every few seconds - this is really heavy
>> on the system and completely unnecessary. You won't get the benefit of
>> soft commits if you're not running SolrCloud, so there is no need to
>> configure that.
>>
>> I would change your <autoCommit> to maxDocs=10000 and maxTime=120000
>> (every 2 min).
>> Further, please index without the 1s commitWithin, i.e. instead of
>> > server.add(iDoc, 1000);
>> use
>> > server.add(iDoc);
>>
>> This will make sure the server gets room to breathe and is not
>> constantly generating new indices.
>>
>> Finally, it's probably not a good idea to use recursion here. You really
>> don't need to, and it fills up your stack. You can instead refactor the
>> method to do the whole indexing in a loop. And a hint: it is generally
>> better to ask for ALL documents in one go and stream to the end rather
>> than increasing offsets with new queries all the time, because high
>> offsets/start can be time consuming, especially with multiple shards.
>> If you increase the timeout enough you should be able to retrieve all
>> documents in one go!
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> On 24 March 2014 at 22:36, Ravi Solr <ravis...@gmail.com> wrote:
>>
>> > Hello,
>> > We are trying to reindex as part of our move from 3.6.2 to 4.6.1 and
>> > have faced various issues reindexing 1.5 million docs. We don't use
>> > SolrCloud; it's still a master/slave config. For testing this I am
>> > using a single test server, reading from it and putting docs back into
>> > the same index.
>> >
>> > We send docs in batches of 100, but only 10/100 are getting indexed.
>> > Is this related to the maxBufferedAddsPerServer setting that is
>> > hard-coded? I also tried to play with autoCommit and softCommit
>> > settings, but in vain.
>> >
>> > <autoCommit>
>> >   <maxDocs>5</maxDocs>
>> >   <maxTime>5000</maxTime>
>> >   <openSearcher>true</openSearcher>
>> > </autoCommit>
>> >
>> > <autoSoftCommit>
>> >   <maxTime>1000</maxTime>
>> > </autoSoftCommit>
>> >
>> > I use these on the test system just to check whether docs are being
>> > indexed, but even with a batch of 5 my SolrJ client code runs faster
>> > than indexing, causing some docs to not get indexed.
>> > The function that's indexing is a recursive method call (shown
>> > below), which fails after some time with a stack overflow (I did not
>> > have this issue with 3.6.2 with the same code).
>> >
>> > private static void processDocs(HttpSolrServer server, Integer start,
>> >         Integer rows) throws Exception {
>> >     SolrQuery query = new SolrQuery();
>> >     query.setQuery("*:*");
>> >     query.addFilterQuery("-allfields:[* TO *]");
>> >     QueryResponse resp = server.query(query);
>> >     SolrDocumentList list = resp.getResults();
>> >     Long total = list.getNumFound();
>> >
>> >     if(list != null && !list.isEmpty()) {
>> >         for(SolrDocument doc : list) {
>> >             SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
>> >             //To index full doc again
>> >             iDoc.removeField("_version_");
>> >             server.add(iDoc, 1000);
>> >         }
>> >
>> >         System.out.println("Indexed " + (start+rows) + "/" + total);
>> >         if (total >= (start + rows)) {
>> >             processDocs(server, (start + rows), rows);
>> >         }
>> >     }
>> > }
>> >
>> > I also tried turning on the updateLog, but it was filling up so fast
>> > that it was useless.
>> >
>> > How do we do bulk updates in a Solr 4.x environment? Is there any
>> > setting I am missing?
>> >
>> > Thanks
>> >
>> > Ravi Kiran Bhaskar
>> > Technical Architect
>> > The Washington Post
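
For reference, Jan's suggested autoCommit change might look like the following solrconfig.xml fragment. The maxDocs/maxTime values are the ones from his email; setting openSearcher to false during a bulk reindex is an assumption here, consistent with the config shown at the top of this thread:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard-commit every 10,000 docs or every 2 minutes, whichever comes first -->
    <maxDocs>10000</maxDocs>
    <maxTime>120000</maxTime>
    <!-- don't open a new searcher on each autoCommit while bulk indexing -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- no <autoSoftCommit>: soft commits only buy near-real-time search visibility,
       which Jan notes is unnecessary outside SolrCloud -->
</updateHandler>
```

Note that with openSearcher=false the commits still flush and fsync segments (durability), but query results will not change until an explicit commit or a searcher-opening commit runs, which matches the "start commit{...openSearcher=false...}" lines in the log above.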