Dear Erick,

Thank you for your reply. I initialize the ArrayList variable with a new
ArrayList after I add and commit the solrDocumentList through the
solrClient, so I don't think I have the problem of an ever-increasing
ArrayList. (I hope the add method in solrClient flushes the previously
added documents.) But as you said, I do a hard commit inside the loop. I
can change that by using commitWithin. What value would you recommend
for this type of scenario?
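
For reference, this is roughly the change I have in mind (the 60-second
value below is just a guess on my part):

    // instead of: solrClient.add(solrDocumentList); solrClient.commit();
    solrClient.add(solrDocumentList, 60_000);  // commitWithin 60s, no hard commit in the loop
    solrDocumentList = new ArrayList<>();      // start a fresh batch, as before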

Thank you,
Arunan

Sugunakumar Arunan
Undergraduate - CSE | UOM

Email : arunans...@cse.mrt.ac.lk
Mobile : 0094 766016272
LinkedIn : https://www.linkedin.com/in/arunans23/

On 20 July 2018 at 23:21, Erick Erickson <erickerick...@gmail.com> wrote:

> I do this all the time with batches of 1,000 and don't see this problem.
>
> One thing that sometimes bites people is failing to clear the doc list
> after every call to add, so you send ever-increasing batches to Solr.
> Assuming that by batch size you mean the size of the solrDocumentList,
> increasing it would make the broken-pipe problem worse if anything...
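>
> In code, the pattern I mean looks like this (a sketch; rows and
> toSolrDoc() are placeholders for your own fetch and document-building
> code):
>
>     List<SolrInputDocument> docs = new ArrayList<>();
>     for (Row row : rows) {
>         docs.add(toSolrDoc(row));
>         if (docs.size() >= 1000) {
>             solrClient.add(docs);
>             docs.clear();   // forget this and every batch re-sends the old docs
>         }
>     }
>     if (!docs.isEmpty()) {
>         solrClient.add(docs);   // send the final partial batch
>     }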
>
> Also, it's generally bad practice to commit after every batch. That's not
> your problem here, just something to note. Let your autocommit
> settings in solrconfig handle it or specify commitWithin in your
> add call.
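>
> For example (the 15-second value here is arbitrary, tune it to your
> visibility needs):
>
>     solrClient.add(docs, 15_000);   // commitWithin 15s; no explicit commit()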
>
> I'd also look in your Solr logs and see if there's a problem there.
>
> Net-net is this is a perfectly reasonable pattern, I suspect some
> innocent-seeming problem with your indexing code.
>
> Best,
> Erick
>
>
>
> On Fri, Jul 20, 2018 at 9:32 AM, Arunan Sugunakumar
> <arunans...@cse.mrt.ac.lk> wrote:
> > Hi,
> >
> > I have around 12 million objects in my PostgreSQL database to be
> > indexed. I'm running a thread to fetch the rows from the database. The
> > thread also creates the documents and puts them in an indexing queue.
> > While this is happening, my main process retrieves the documents from
> > the queue and indexes them in batches of 1000. For some time the
> > process runs as expected, but after a while I get an exception.
> >
> > [corePostProcess] org.apache.solr.client.solrj.SolrServerException:
> > IOException occured when talking to server at:
> > http://localhost:8983/solr/mine-search
> > ...
> > [corePostProcess] Caused by: java.net.SocketException: Broken pipe
> > (Write failed)
> > [corePostProcess]    at java.net.SocketOutputStream.socketWrite0(Native Method)
> >
> >
> > I tried increasing the batch size up to 30000. Then I got a different
> > exception.
> >
> > [corePostProcess] org.apache.solr.client.solrj.SolrServerException:
> > IOException occured when talking to server at:
> > http://localhost:8983/solr/mine-search
> > ...
> > [corePostProcess] Caused by: org.apache.http.NoHttpResponseException:
> > localhost:8983 failed to respond
> >
> >
> > I would like to know whether there are any good practices for handling
> > such situations, such as the maximum number of documents to index in
> > one request, etc.
> >
> > My environment:
> >
> > Version : solr 7.2, solrj 7.2
> > Ubuntu 16.04
> > RAM 20GB
> > I started Solr in standalone mode.
> > Number of replicas and shards : 1
> >
> > The method I used:
> >
> >     UpdateResponse response = solrClient.add(solrDocumentList);
> >     solrClient.commit();
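> >
> > In context, the consuming loop is roughly this (simplified; the queue
> > wrapper and its methods below are not the real names):
> >
> >     List<SolrInputDocument> batch = new ArrayList<>(1000);
> >     while (indexingQueue.hasMore()) {
> >         batch.add(indexingQueue.take());
> >         if (batch.size() == 1000) {
> >             UpdateResponse response = solrClient.add(batch);
> >             solrClient.commit();            // the hard commit per batch
> >             batch = new ArrayList<>(1000);  // re-initialized, as mentioned above
> >         }
> >     }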
> >
> >
> > Thanks in advance.
> >
> > Arunan
>
