I am also seeing the following in the log. Is it really committing? Now I
am totally confused about how Solr 4.x indexes. My relevant update config
is shown below:
<updateHandler class="solr.DirectUpdateHandler2">
  <maxPendingDeletes>1</maxPendingDeletes>
  <autoCommit>
    <maxDocs>100</maxDocs>
    <maxTime>120000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
[#|2014-03-25T13:44:03.765-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820509
[commitScheduler-6-thread-1] INFO org.apache.solr.update.UpdateHandler -
start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
|#]
[#|2014-03-25T13:44:03.766-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=83;_ThreadName=http-thread-pool-8080(4);|820510
[http-thread-pool-8080(4)] INFO
org.apache.solr.update.processor.LogUpdateProcessor - [sitesearchcore]
webapp=/solr-admin path=/update params={wt=javabin&version=2}
{add=[09f693e6-9a6f-11e3-9900-dd917233cf9c]} 0 13
|#]
[#|2014-03-25T13:44:03.898-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820642
[commitScheduler-6-thread-1] INFO org.apache.solr.core.SolrCore -
SolrDeletionPolicy.onCommit: commits: num=3
commit{dir=/data/solr/core/sitesearch-data/index,segFN=segments_9y68,generation=464192}
commit{dir=/data/solr/core/sitesearch-data/index,segFN=segments_9yjf,generation=464667}
commit{dir=/data/solr/core/sitesearch-data/index,segFN=segments_9yjg,generation=464668}
|#]
[#|2014-03-25T13:44:03.898-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820642
[commitScheduler-6-thread-1] INFO org.apache.solr.core.SolrCore - newest
commit generation = 464668
|#]
[#|2014-03-25T13:44:03.908-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820652
[commitScheduler-6-thread-1] INFO
org.apache.solr.search.SolrIndexSearcher - Opening
Searcher@1e2ca86e[sitesearchcore]
realtime
|#]
[#|2014-03-25T13:44:03.909-0400|INFO|glassfish3.1.2|javax.enterprise.system.std.com.sun.enterprise.server.logging|_ThreadID=86;_ThreadName=commitScheduler-6-thread-1;|820653
[commitScheduler-6-thread-1] INFO org.apache.solr.update.UpdateHandler -
end_commit_flush
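Reading the log above: the autoCommit is firing (start commit{...openSearcher=false...} followed by end_commit_flush), but with openSearcher=false a hard commit only flushes segments to disk; no new query searcher is opened, so the committed docs stay invisible to searches until something else opens one (the "Opening Searcher ... realtime" line is the realtime searcher used for NRT gets, not for queries). If the goal is for auto-committed docs to become searchable, one variant of the config would be the following (a sketch, not a tuned recommendation):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>100</maxDocs>
    <maxTime>120000</maxTime>
    <!-- open a new searcher on each hard commit so committed docs become visible -->
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>
```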
Thanks
Ravi Kiran Bhaskar
On Tue, Mar 25, 2014 at 1:10 PM, Ravi Solr <[email protected]> wrote:
> Thank you very much for responding, Mr. Høydahl. I removed the recursion,
> which eliminated the stack overflow exception. However, I am still
> encountering my main problem of docs not getting indexed in Solr 4.x, as I
> mentioned in my original email. The reason I am reindexing is that in Solr
> 4.x the EnglishPorterFilterFactory has been removed, and I also wanted to
> add another copyField of all field values into the destination "allfields".
>
> As per your suggestion I removed softCommit and set autoCommit to
> maxDocs=100 and maxTime=120000. I was printing out the indexing calls; you
> can clearly see it still indexes around 10 at a time (test code and results
> shown below). Again, my code ran to completion, and for good measure I
> committed manually after 10 minutes; still, when I query I only see 13513
> docs indexed.
>
> There must be something else I am missing
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">1</int>
>     <lst name="params">
>       <str name="q">allfields:[* TO *]</str>
>       <str name="wt">xml</str>
>       <str name="rows">0</str>
>     </lst>
>   </lst>
>   <result name="response" numFound="13513" start="0"/>
> </response>
>
> TEST INDEXER CODE
> -------------------------------
> Long total = null;
> Integer start = 0;
> Integer rows = 100;
> while (total == null || total >= (start + rows)) {
>
>     SolrQuery query = new SolrQuery();
>     query.setQuery("*:*");
>     query.setSort("displaydatetime", ORDER.desc);
>     query.addFilterQuery("-allfields:[* TO *]");
>
>     QueryResponse resp = server.query(query);
>     SolrDocumentList list = resp.getResults();
>     total = list.getNumFound();
>
>     if (list != null && !list.isEmpty()) {
>         for (SolrDocument doc : list) {
>             SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
>             // To index the full doc again
>             iDoc.removeField("_version_");
>             server.add(iDoc);
>         }
>
>         System.out.println("Indexed " + (start + rows) + "/" + total);
>         start = (start + rows);
>     }
> }
>
> System.out.println("COMPLETELY DONE");
>
> System.out output
> -------------------------
> Indexed 1252100/1256575
> Indexed 1252200/1256575
> Indexed 1252300/1256575
> Indexed 1252400/1256575
> Indexed 1252500/1256575
> Indexed 1252600/1256575
> Indexed 1252700/1256575
> Indexed 1252800/1256575
> Indexed 1252900/1256575
> Indexed 1253000/1256575
> Indexed 1253100/1256566
> Indexed 1253200/1256566
> Indexed 1253300/1256566
> Indexed 1253400/1256566
> Indexed 1253500/1256566
> Indexed 1253600/1256566
> Indexed 1253700/1256566
> Indexed 1253800/1256566
> Indexed 1253900/1256566
> Indexed 1254000/1256566
> Indexed 1254100/1256566
> Indexed 1254200/1256566
> Indexed 1254300/1256566
> Indexed 1254400/1256566
> Indexed 1254500/1256566
> Indexed 1254600/1256566
> Indexed 1254700/1256566
> Indexed 1254800/1256566
> Indexed 1254900/1256566
> Indexed 1255000/1256566
> Indexed 1255100/1256566
> Indexed 1255200/1256566
> Indexed 1255300/1256566
> Indexed 1255400/1256566
> Indexed 1255500/1256566
> Indexed 1255600/1256566
> Indexed 1255700/1256557
> Indexed 1255800/1256557
> Indexed 1255900/1256557
> Indexed 1256000/1256557
> Indexed 1256100/1256557
> Indexed 1256200/1256557
> Indexed 1256300/1256557
> Indexed 1256400/1256557
> Indexed 1256500/1256557
> COMPLETELY DONE
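Two things are worth noting in the run above. First, numFound drops as the run progresses (1256575 to 1256566 to 1256557): each reindexed doc gains "allfields" and falls out of the -allfields:[* TO *] filter. Second, the loop tracks start and rows but never applies them to the query (no query.setStart(start) / query.setRows(rows)), so each request returns SolrJ's default page of 10 docs, which matches the ~10-at-a-time behavior. And if paging offsets *were* applied against a result set that shrinks like this, whole pages would be skipped. A self-contained sketch of that skipping effect (plain Java, no Solr involved; the numbers are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class ShrinkingPageDemo {
    public static void main(String[] args) {
        // 30 docs still missing "allfields"; reindexing a doc removes it
        // from the filtered result set on the next query.
        List<Integer> pending = new ArrayList<>();
        for (int i = 0; i < 30; i++) pending.add(i);

        int rows = 10;
        int start = 0;
        List<Integer> processed = new ArrayList<>();

        while (start < pending.size()) {
            // take page [start, start+rows) of the *current* result set
            int end = Math.min(start + rows, pending.size());
            List<Integer> page = new ArrayList<>(pending.subList(start, end));
            processed.addAll(page);
            pending.removeAll(page); // reindexed docs leave the filter
            start += rows;           // but the offset still advances
        }

        // docs 10-19 were never visited: they shifted into positions
        // the advancing offset had already skipped past
        System.out.println("processed " + processed.size() + " of 30");
    }
}
```

The safe patterns are either to always re-query page 0 of the shrinking set (with an explicit rows value) until it is empty, or to stream everything in one request, as suggested later in the thread.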
>
>
> Thanks,
> Ravi Kiran Bhaskar
>
>
>
> On Tue, Mar 25, 2014 at 7:13 AM, Jan Høydahl <[email protected]> wrote:
>
>> Hi,
>>
>> It seems you are trying to reindex from one server to the other.
>>
>> Be aware that it could be easier for you to simply copy the whole index
>> folder over to your 4.6.1 server and start Solr, as it will be able to
>> read your 3.x index. This holds unless you also want to make major
>> upgrades to your schema or update processors, in which case you'll need a
>> re-index anyway.
>>
>> If you believe you really need a re-index, then please try to batch index
>> without triggering commits every few seconds - this is really heavy on the
>> system and completely unnecessary. You won't get the benefit of SoftCommit
>> if you're not running SolrCloud, so no need to configure that.
>>
>> I would change your <autoCommit> into maxDocs=10000 and maxTime=120000
>> (every 2min).
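In solrconfig.xml, that suggestion would look something like this (a sketch; openSearcher kept as in the original config):

```xml
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>120000</maxTime> <!-- every 2 minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>
```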
>> Further please index without 1s commitWithin, i.e. instead of
>> > server.add(iDoc, 1000);
>> use
>> > server.add(iDoc);
>>
>> This will make sure the server gets room to breathe instead of constantly
>> generating new index generations.
>>
>> Finally, it's probably not a good idea to use recursion here. You really
>> don't need it, and it fills up your stack. You can instead refactor the
>> method to do the whole indexing iteratively. A hint: it is generally
>> better to ask for ALL documents in one go and stream to the end, rather
>> than issuing new queries with ever-increasing offsets, because high
>> offsets/start values can be time-consuming, especially with multiple
>> shards. If you increase the timeout enough, you should be able to
>> retrieve all documents in one go!
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> On 24 March 2014 at 22:36, Ravi Solr <[email protected]> wrote:
>>
>> > Hello,
>> > We are trying to reindex as part of our move from 3.6.2 to 4.6.1 and
>> > have faced various issues reindexing 1.5 million docs. We don't use
>> > SolrCloud; it's still a Master/Slave config. For testing this, I am
>> > using a single test server, reading from it and putting docs back into
>> > the same index.
>> >
>> > We send docs in batches of 100, but only 10/100 are getting indexed. Is
>> > this related to the hard-coded maxBufferedAddsPerServer setting? I also
>> > tried playing with the autoCommit and softCommit settings, but in vain.
>> >
>> > <autoCommit>
>> >   <maxDocs>5</maxDocs>
>> >   <maxTime>5000</maxTime>
>> >   <openSearcher>true</openSearcher>
>> > </autoCommit>
>> >
>> > <autoSoftCommit>
>> >   <maxTime>1000</maxTime>
>> > </autoSoftCommit>
>> >
>> > I use these on the test system just to check whether docs are being
>> > indexed, but even with a batch of 5 my SolrJ client code runs faster
>> > than the indexing, causing some docs not to get indexed. The indexing
>> > function is a recursive method (shown below) that fails after some time
>> > with a stack overflow (I did not have this issue with 3.6.2 using the
>> > same code).
>> >
>> > private static void processDocs(HttpSolrServer server, Integer start,
>> >         Integer rows) throws Exception {
>> >     SolrQuery query = new SolrQuery();
>> >     query.setQuery("*:*");
>> >     query.addFilterQuery("-allfields:[* TO *]");
>> >     QueryResponse resp = server.query(query);
>> >     SolrDocumentList list = resp.getResults();
>> >     Long total = list.getNumFound();
>> >
>> >     if (list != null && !list.isEmpty()) {
>> >         for (SolrDocument doc : list) {
>> >             SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
>> >             // To index the full doc again
>> >             iDoc.removeField("_version_");
>> >             server.add(iDoc, 1000);
>> >         }
>> >
>> >         System.out.println("Indexed " + (start + rows) + "/" + total);
>> >         if (total >= (start + rows)) {
>> >             processDocs(server, (start + rows), rows);
>> >         }
>> >     }
>> > }
>> >
>> > I also tried turning on the updateLog, but it was filling up so fast
>> > that it became useless.
>> >
>> > How do we do bulk updates in a Solr 4.x environment? Is there any
>> > setting I am missing?
>> >
>> > Thanks
>> >
>> > Ravi Kiran Bhaskar
>> > Technical Architect
>> > The Washington Post
>>
>>
>