Hello,
We are trying to reindex as part of our move from 3.6.2 to 4.6.1 and have
run into various issues reindexing 1.5 million docs. We don't use
SolrCloud; it is still a master/slave config. For testing, I am using a
single test server, reading from it and writing back into the same index.

We send docs in batches of 100, but only 10 out of every 100 are getting
indexed. Is this related to the maxBufferedAddsPerServer setting that is
hard-coded? I also tried playing with the autoCommit and autoSoftCommit
settings, but in vain.

    <autoCommit>
       <maxDocs>5</maxDocs>
       <maxTime>5000</maxTime>
       <openSearcher>true</openSearcher>
    </autoCommit>

    <autoSoftCommit>
        <maxTime>1000</maxTime>
    </autoSoftCommit>

I use these on the test system just to check whether docs are being
indexed, but even with a batch of 5 my SolrJ client code runs faster than
the indexing, causing some docs to not get indexed. The function that does
the indexing is a recursive method (shown below), which fails after some
time with a stack overflow (I did not have this issue on 3.6.2 with the
same code).

    private static void processDocs(HttpSolrServer server, Integer start, Integer rows) throws Exception {
        SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        query.addFilterQuery("-allfields:[* TO *]");
        QueryResponse resp = server.query(query);
        SolrDocumentList list =  resp.getResults();
        Long total = list.getNumFound();

        if(list != null && !list.isEmpty()) {
            for(SolrDocument doc : list) {
                SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
                //To index full doc again
                iDoc.removeField("_version_");
                server.add(iDoc, 1000);
            }

            System.out.println("Indexed " + (start+rows) + "/" + total);
            if (total >= (start + rows)) {
                processDocs(server, (start + rows), rows);
            }
        }
    }
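
For comparison, the same traversal can be written as a plain loop, which
would at least rule out the recursion as the source of the stack overflow.
The sketch below is only an illustration, not tested against Solr:
`DocStore`, `fetchUnprocessed`, `addBatch` and `reindexAll` are hypothetical
names standing in for the SolrJ calls (`server.query`, `server.add` with a
collection of documents, `server.commit`). It assumes that re-adding a
document populates `allfields`, so the document drops out of the filter
query once the commit is visible.

```java
import java.util.ArrayList;
import java.util.List;

final class ReindexLoopSketch {

    /** Hypothetical stand-in for the Solr calls used by processDocs. */
    public interface DocStore {
        // Returns up to `rows` docs that still match the filter query
        // (empty when nothing is left to reindex).
        List<String> fetchUnprocessed(int rows);

        // Sends the whole page in one request instead of one add per doc.
        void addBatch(List<String> batch);

        // Makes the batch visible so the next query no longer returns it.
        void commit();
    }

    /** Iterative replacement for the recursive call; returns docs processed. */
    public static int reindexAll(DocStore store, int rows) {
        int indexed = 0;
        while (true) {
            List<String> page = store.fetchUnprocessed(rows);
            if (page == null || page.isEmpty()) {
                break; // nothing left that matches the filter
            }
            store.addBatch(new ArrayList<>(page));
            // Assumption: without a visible commit here, the next query
            // would return the same docs again and the loop would not end.
            store.commit();
            indexed += page.size();
        }
        return indexed;
    }
}
```

Note that the sketch passes `rows` through to the fetch explicitly. In SolrJ
that would correspond to calling `query.setRows(rows)`; a query that does not
set it falls back to Solr's default page size (10 unless configured
otherwise).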

I also tried turning on the updateLog, but it filled up so fast that it
was useless.

How do we do bulk updates in a Solr 4.x environment? Is there any setting
that I am missing?

Thanks

Ravi Kiran Bhaskar
Technical Architect
The Washington Post
