Thank you very much for responding, Mr. Høydahl. I removed the recursion,
which eliminated the stack overflow exception. However, I am still
encountering my main problem: the docs are not getting indexed in Solr 4.x,
as I mentioned in my original email. The reason I am reindexing is that
EnglishPorterFilterFactory has been removed in Solr 4.x, and I also wanted
to add another copyField of all field values into the destination "allfields".

As per your suggestion I removed the soft commit and set autoCommit to
maxDocs=100 and maxTime=120000. I was printing out each indexing call, and
you can clearly see it still indexes only around 10 at a time (test code and
results shown below). Again, my code ran to completion, and for good measure
I committed manually after 10 minutes; still, when I query, I see only
"13513" docs indexed.

There must be something else I am missing.
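For reference, this is roughly what my <autoCommit> block looks like now (a
sketch; keeping openSearcher=true from my earlier config is an assumption):

```xml
<!-- hard commit every 100 docs or every 2 minutes; autoSoftCommit removed -->
<autoCommit>
  <maxDocs>100</maxDocs>
  <maxTime>120000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
```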
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="q">allfields:[* TO *]</str>
<str name="wt">xml</str>
<str name="rows">0</str>
</lst>
</lst>
<result name="response" numFound="13513" start="0"/></response>
TEST INDEXER CODE
-------------------------------
Long total = null;
Integer start = 0;
Integer rows = 100;
while (total == null || total >= (start + rows)) {
    SolrQuery query = new SolrQuery();
    query.setQuery("*:*");
    query.setSort("displaydatetime", ORDER.desc);
    query.addFilterQuery("-allfields:[* TO *]");
    QueryResponse resp = server.query(query);
    SolrDocumentList list = resp.getResults();
    total = list.getNumFound();
    if (list != null && !list.isEmpty()) {
        for (SolrDocument doc : list) {
            // To index the full doc again
            SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
            iDoc.removeField("_version_");
            server.add(iDoc);
        }
        System.out.println("Indexed " + (start + rows) + "/" + total);
        start = start + rows;
    }
}
System.out.println("COMPLETELY DONE");
System.out output
-------------------------
Indexed 1252100/1256575
Indexed 1252200/1256575
Indexed 1252300/1256575
Indexed 1252400/1256575
Indexed 1252500/1256575
Indexed 1252600/1256575
Indexed 1252700/1256575
Indexed 1252800/1256575
Indexed 1252900/1256575
Indexed 1253000/1256575
Indexed 1253100/1256566
Indexed 1253200/1256566
Indexed 1253300/1256566
Indexed 1253400/1256566
Indexed 1253500/1256566
Indexed 1253600/1256566
Indexed 1253700/1256566
Indexed 1253800/1256566
Indexed 1253900/1256566
Indexed 1254000/1256566
Indexed 1254100/1256566
Indexed 1254200/1256566
Indexed 1254300/1256566
Indexed 1254400/1256566
Indexed 1254500/1256566
Indexed 1254600/1256566
Indexed 1254700/1256566
Indexed 1254800/1256566
Indexed 1254900/1256566
Indexed 1255000/1256566
Indexed 1255100/1256566
Indexed 1255200/1256566
Indexed 1255300/1256566
Indexed 1255400/1256566
Indexed 1255500/1256566
Indexed 1255600/1256566
Indexed 1255700/1256557
Indexed 1255800/1256557
Indexed 1255900/1256557
Indexed 1256000/1256557
Indexed 1256100/1256557
Indexed 1256200/1256557
Indexed 1256300/1256557
Indexed 1256400/1256557
Indexed 1256500/1256557
COMPLETELY DONE
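One thing I noticed while re-checking the loop bound itself: with the
condition total >= start + rows, a final partial batch is never requested,
which matches the output above stopping at "Indexed 1256500/1256557". This
can be checked in isolation with plain Java, no Solr involved (the class and
helper names here are mine, just for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class PagingCheck {

    // Models the loop bound used above: keep fetching while
    // total >= start + rows. Returns the start offsets the loop visits.
    static List<Integer> startOffsets(long total, int rows) {
        List<Integer> starts = new ArrayList<>();
        int start = 0;
        while (total >= (long) start + rows) {
            starts.add(start);
            start += rows;
        }
        return starts;
    }

    public static void main(String[] args) {
        // 250 docs with rows=100: only offsets 0 and 100 are visited,
        // so docs 200..249 would never be fetched by this bound.
        System.out.println(startOffsets(250, 100)); // prints [0, 100]
    }
}
```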
Thanks,
Ravi Kiran Bhaskar
On Tue, Mar 25, 2014 at 7:13 AM, Jan Høydahl <[email protected]> wrote:
> Hi,
>
> It seems you are trying to reindex from one server to another.
>
> Be aware that it may be easier for you to simply copy the whole index
> folder over to your 4.6.1 server and start Solr, as it will be able to read
> your 3.x index. That is, unless you also want to make major upgrades to
> your schema or update processors, in which case you'll need a re-index
> anyway.
>
> If you believe you really need a re-index, then please try to batch index
> without triggering commits every few seconds - this is really heavy on the
> system and completely unnecessary. You won't get the benefit of SoftCommit
> if you're not running SolrCloud, so no need to configure that.
>
> I would change your <autoCommit> into maxDocs=10000 and maxTime=120000
> (every 2min).
> Further, please index without the 1s commitWithin, i.e. instead of
> > server.add(iDoc, 1000);
> use
> > server.add(iDoc);
>
> This will make sure the server gets room to breathe and is not constantly
> generating new indices.
>
> Finally, it's probably not a good idea to use recursion here. You really
> don't need it, and it fills up your stack. You can instead refactor the
> method into a loop that does the whole indexing. A further hint: it is
> generally better to ask for ALL documents in one go and stream to the end,
> rather than issuing new queries with ever-increasing offsets - high
> offsets/start values can be time-consuming, especially with multiple
> shards. If you increase the timeout enough, you should be able to retrieve
> all documents in one go!
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> 24. mars 2014 kl. 22:36 skrev Ravi Solr <[email protected]>:
>
> > Hello,
> > We are trying to reindex as part of our move from 3.6.2 to 4.6.1
> > and have faced various issues reindexing 1.5 million docs. We don't use
> > SolrCloud; it's still a Master/Slave config. For this test I am using a
> > single test server, reading from it and putting docs back into the same
> > index.
> >
> > We send docs in batches of 100, but only 10/100 are getting indexed. Is
> > this related to the hard-coded maxBufferedAddsPerServer setting? Also, I
> > tried to play with the autoCommit and softCommit settings, but in vain.
> >
> > <autoCommit>
> > <maxDocs>5</maxDocs>
> > <maxTime>5000</maxTime>
> > <openSearcher>true</openSearcher>
> > </autoCommit>
> >
> > <autoSoftCommit>
> > <maxTime>1000</maxTime>
> > </autoSoftCommit>
> >
> > I use these on the test system just to check whether docs are being
> > indexed, but even with a batch size of 5 my SolrJ client code runs faster
> > than the indexing, causing some docs to not get indexed. The function
> > that does the indexing is a recursive method (shown below) which fails
> > after some time with a stack overflow (I did not have this issue with
> > 3.6.2 and the same code):
> >
> > private static void processDocs(HttpSolrServer server, Integer start,
> >         Integer rows) throws Exception {
> >     SolrQuery query = new SolrQuery();
> >     query.setQuery("*:*");
> >     query.addFilterQuery("-allfields:[* TO *]");
> >     QueryResponse resp = server.query(query);
> >     SolrDocumentList list = resp.getResults();
> >     Long total = list.getNumFound();
> >
> >     if (list != null && !list.isEmpty()) {
> >         for (SolrDocument doc : list) {
> >             // To index the full doc again
> >             SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
> >             iDoc.removeField("_version_");
> >             server.add(iDoc, 1000);
> >         }
> >
> >         System.out.println("Indexed " + (start + rows) + "/" + total);
> >         if (total >= (start + rows)) {
> >             processDocs(server, start + rows, rows);
> >         }
> >     }
> > }
> >
> > I also tried turning on the updateLog, but it was filling up so fast
> > that it became useless.
> >
> > How do we do bulk updates in a Solr 4.x environment? Is there any
> > setting that I am missing?
> >
> > Thanks
> >
> > Ravi Kiran Bhaskar
> > Technical Architect
> > The Washington Post
>
>