Thank you very much for responding, Mr. Høydahl. I removed the recursion,
which eliminated the stack overflow exception. However, I am still
encountering my main problem: the docs are not getting indexed in Solr 4.x,
as I mentioned in my original email. The reason I am reindexing is that
EnglishPorterFilterFactory has been removed in Solr 4.x, and I also wanted
to add another copyField of all field values into the destination "allfields".

As per your suggestion I removed the soft commit and set autoCommit to
maxDocs=100 and maxTime=120000. I was printing out each indexing call, and
you can clearly see it still indexes only around 10 at a time (test code and
results shown below). Again, my code ran to completion, and for good measure
I committed manually after 10 minutes; still, when I query, I see only
"13513" docs indexed.

There must be something else I am missing.
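For reference, this is roughly what my <autoCommit> block looks like now (a
sketch; keeping openSearcher=true from my earlier config is an assumption):

```xml
<!-- hard commit every 100 docs or every 2 minutes; autoSoftCommit removed -->
<autoCommit>
  <maxDocs>100</maxDocs>
  <maxTime>120000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
```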
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="q">allfields:[* TO *]</str>
<str name="wt">xml</str>
<str name="rows">0</str>
</lst>
</lst>
<result name="response" numFound="13513" start="0"/></response>
TEST INDEXER CODE
-------------------------------
Long total = null;
Integer start = 0;
Integer rows = 100;
while (total == null || total >= (start + rows)) {
    SolrQuery query = new SolrQuery();
    query.setQuery("*:*");
    query.setSort("displaydatetime", ORDER.desc);
    query.addFilterQuery("-allfields:[* TO *]");
    QueryResponse resp = server.query(query);
    SolrDocumentList list = resp.getResults();
    total = list.getNumFound();
    if (list != null && !list.isEmpty()) {
        for (SolrDocument doc : list) {
            // To index the full doc again
            SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
            iDoc.removeField("_version_");
            server.add(iDoc);
        }
        System.out.println("Indexed " + (start + rows) + "/" + total);
        start = start + rows;
    }
}
System.out.println("COMPLETELY DONE");
System.out output
-------------------------
Indexed 1252100/1256575
Indexed 1252200/1256575
Indexed 1252300/1256575
Indexed 1252400/1256575
Indexed 1252500/1256575
Indexed 1252600/1256575
Indexed 1252700/1256575
Indexed 1252800/1256575
Indexed 1252900/1256575
Indexed 1253000/1256575
Indexed 1253100/1256566
Indexed 1253200/1256566
Indexed 1253300/1256566
Indexed 1253400/1256566
Indexed 1253500/1256566
Indexed 1253600/1256566
Indexed 1253700/1256566
Indexed 1253800/1256566
Indexed 1253900/1256566
Indexed 1254000/1256566
Indexed 1254100/1256566
Indexed 1254200/1256566
Indexed 1254300/1256566
Indexed 1254400/1256566
Indexed 1254500/1256566
Indexed 1254600/1256566
Indexed 1254700/1256566
Indexed 1254800/1256566
Indexed 1254900/1256566
Indexed 1255000/1256566
Indexed 1255100/1256566
Indexed 1255200/1256566
Indexed 1255300/1256566
Indexed 1255400/1256566
Indexed 1255500/1256566
Indexed 1255600/1256566
Indexed 1255700/1256557
Indexed 1255800/1256557
Indexed 1255900/1256557
Indexed 1256000/1256557
Indexed 1256100/1256557
Indexed 1256200/1256557
Indexed 1256300/1256557
Indexed 1256400/1256557
Indexed 1256500/1256557
COMPLETELY DONE
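One thing I noticed while re-checking the loop bound itself: with the
condition total >= start + rows, a final partial batch is never requested,
which matches the output above stopping at "Indexed 1256500/1256557". This
can be checked in isolation with plain Java, no Solr involved (the class and
helper names here are mine, just for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class PagingCheck {

    // Models the loop bound used above: keep fetching while
    // total >= start + rows. Returns the start offsets the loop visits.
    static List<Integer> startOffsets(long total, int rows) {
        List<Integer> starts = new ArrayList<>();
        int start = 0;
        while (total >= (long) start + rows) {
            starts.add(start);
            start += rows;
        }
        return starts;
    }

    public static void main(String[] args) {
        // 250 docs with rows=100: only offsets 0 and 100 are visited,
        // so docs 200..249 would never be fetched by this bound.
        System.out.println(startOffsets(250, 100)); // prints [0, 100]
    }
}
```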
Thanks,
Ravi Kiran Bhaskar
On Tue, Mar 25, 2014 at 7:13 AM, Jan Høydahl <[email protected]> wrote:
> Hi,
>
> It seems you are trying to reindex from one server to another.
>
> Be aware that it may be easier for you to simply copy the whole index
> folder over to your 4.6.1 server and start Solr, as it will be able to read
> your 3.x index. That is, unless you also want to make major upgrades to
> your schema or update processors, in which case you'll need a re-index
> anyway.
>
> If you believe you really need a re-index, then please try to batch index
> without triggering commits every few seconds - this is really heavy on the
> system and completely unnecessary. You won't get the benefit of SoftCommit
> if you're not running SolrCloud, so no need to configure that.
>
> I would change your <autoCommit> into maxDocs=10000 and maxTime=120000
> (every 2min).
> Further, please index without the 1s commitWithin, i.e. instead of
> > server.add(iDoc, 1000);
> use
> > server.add(iDoc);
>
> This will make sure the server gets room to breathe and is not constantly
> generating new indices.
>
> Finally, it's probably not a good idea to use recursion here. You really
> don't need it, and it fills up your stack. You can instead refactor the
> method into a loop that does the whole indexing. A further hint: it is
> generally better to ask for ALL documents in one go and stream to the end,
> rather than issuing new queries with ever-increasing offsets - high
> offsets/start values can be time-consuming, especially with multiple
> shards. If you increase the timeout enough, you should be able to retrieve
> all documents in one go!
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> 24. mars 2014 kl. 22:36 skrev Ravi Solr <[email protected]>:
>
> > Hello,
> > We are trying to reindex as part of our move from 3.6.2 to 4.6.1
> > and have faced various issues reindexing 1.5 million docs. We don't use
> > SolrCloud; it's still a Master/Slave config. For this test I am using a
> > single test server, reading from it and putting docs back into the same
> > index.
> >
> > We send docs in batches of 100, but only 10/100 are getting indexed. Is
> > this related to the hard-coded maxBufferedAddsPerServer setting? Also, I
> > tried to play with the autoCommit and softCommit settings, but in vain.
> >
> > <autoCommit>
> > <maxDocs>5</maxDocs>
> > <maxTime>5000</maxTime>
> > <openSearcher>true</openSearcher>
> > </autoCommit>
> >
> > <autoSoftCommit>
> > <maxTime>1000</maxTime>
> > </autoSoftCommit>
> >
> > I use these on the test system just to check whether docs are being
> > indexed, but even with a batch size of 5 my SolrJ client code runs faster
> > than the indexing, causing some docs to not get indexed. The function
> > that does the indexing is a recursive method (shown below) which fails
> > after some time with a stack overflow (I did not have this issue with
> > 3.6.2 and the same code):
> >
> > private static void processDocs(HttpSolrServer server, Integer start,
> >         Integer rows) throws Exception {
> >     SolrQuery query = new SolrQuery();
> >     query.setQuery("*:*");
> >     query.addFilterQuery("-allfields:[* TO *]");
> >     QueryResponse resp = server.query(query);
> >     SolrDocumentList list = resp.getResults();
> >     Long total = list.getNumFound();
> >
> >     if (list != null && !list.isEmpty()) {
> >         for (SolrDocument doc : list) {
> >             // To index the full doc again
> >             SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
> >             iDoc.removeField("_version_");
> >             server.add(iDoc, 1000);
> >         }
> >
> >         System.out.println("Indexed " + (start + rows) + "/" + total);
> >         if (total >= (start + rows)) {
> >             processDocs(server, start + rows, rows);
> >         }
> >     }
> > }
> >
> > I also tried turning on the updateLog, but it was filling up so fast
> > that it became useless.
> >
> > How do we do bulk updates in a Solr 4.x environment? Is there any
> > setting that I am missing?
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
> > Technical Architect
> > The Washington Post
>
>