Dear Erick,

Thank you for your reply. I initialize the ArrayList variable with a new ArrayList after I add and commit the solrDocumentList through the solrClient, so I don't think I have the problem of an ever-increasing ArrayList. (I hope the add method in solrClient flushes the previously added documents.) But, as you said, I do a hard commit inside the loop. I can change that by adding commitWithin. What value would you recommend for this type of scenario?
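For concreteness, here is a minimal sketch of the change I have in mind, assuming SolrJ 7.2. The core URL, the batch size, and the 60-second commitWithin value are illustrative placeholders of mine, not settings anyone in this thread recommended:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            SolrClient solrClient = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/mine-search").build();

            List<SolrInputDocument> solrDocumentList = new ArrayList<>();
            // ... drain up to 1000 documents from the indexing queue
            // into solrDocumentList ...

            // Ask Solr to commit within 60 s instead of issuing a hard
            // commit from the client after every batch.
            solrClient.add(solrDocumentList, 60_000);

            // Re-initialize the list so the next batch does not resend
            // the previous documents.
            solrDocumentList = new ArrayList<>();

            solrClient.close();
        }
    }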
Thank you,
Arunan

Sugunakumar Arunan
Undergraduate - CSE | UOM
Email : arunans...@cse.mrt.ac.lk
Mobile : 0094 766016272
LinkedIn : https://www.linkedin.com/in/arunans23/

On 20 July 2018 at 23:21, Erick Erickson <erickerick...@gmail.com> wrote:

> I do this all the time with batches of 1,000 and don't see this problem.
>
> One thing that sometimes bites people is failing to clear the doclist
> after every call to add, so you send ever-increasing batches to Solr.
> Assuming that by batch size you mean the size of the solrDocumentList,
> increasing it would make the broken-pipe problem worse if anything...
>
> Also, it's generally bad practice to commit after every batch. That's not
> your problem here, just something to note. Let your autocommit
> settings in solrconfig handle it, or specify commitWithin in your
> add call.
>
> I'd also look in your Solr logs and see if there's a problem there.
>
> Net-net is this is a perfectly reasonable pattern; I suspect some
> innocent-seeming problem with your indexing code.
>
> Best,
> Erick
>
>
> On Fri, Jul 20, 2018 at 9:32 AM, Arunan Sugunakumar
> <arunans...@cse.mrt.ac.lk> wrote:
> > Hi,
> >
> > I have around 12 million objects in my PostgreSQL database to be indexed.
> > I'm running a thread to fetch the rows from the database. The thread will
> > also create the documents and put them into an indexing queue. While this
> > is happening, my main process retrieves the documents from the queue and
> > indexes them in batches of 1000. For some time the process runs as
> > expected, but after a while I get an exception:
> >
> > [corePostProcess] org.apache.solr.client.solrj.SolrServerException:
> > IOException occured when talking to server at:
> > http://localhost:8983/solr/mine-search
> > ...
> > [corePostProcess] Caused by: java.net.SocketException: Broken pipe
> > (Write failed)
> > [corePostProcess] at java.net.SocketOutputStream.socketWrite0(Native Method)
> >
> > I tried increasing the batch size up to 30000. Then I got a different
> > exception:
> >
> > [corePostProcess] org.apache.solr.client.solrj.SolrServerException:
> > IOException occured when talking to server at:
> > http://localhost:8983/solr/mine-search
> > ...
> > [corePostProcess] Caused by: org.apache.http.NoHttpResponseException:
> > localhost:8983 failed to respond
> >
> > I would like to know whether there are any good practices for handling
> > such a situation, such as a maximum number of documents to index in one
> > attempt, etc.
> >
> > My environment:
> >
> > Version : Solr 7.2, SolrJ 7.2
> > Ubuntu 16.04
> > RAM 20GB
> > I started Solr in standalone mode.
> > Number of replicas and shards : 1
> >
> > The method I used:
> >
> >     UpdateResponse response = solrClient.add(solrDocumentList);
> >     solrClient.commit();
> >
> > Thanks in advance.
> >
> > Arunan
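P.S. For the archives: the autocommit settings Erick mentions live in the updateHandler section of solrconfig.xml. A typical shape is sketched below; the 15-second hard commit and disabled soft commit are my own illustrative assumptions to tune, not values recommended in this thread:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- Hard commit: flush index changes to disk at most every 15 s
           without opening a new searcher, so documents become durable
           but not yet visible. -->
      <autoCommit>
        <maxTime>15000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <!-- Soft commit: -1 disables it; a positive value (in ms) controls
           how quickly newly added documents become visible to searches. -->
      <autoSoftCommit>
        <maxTime>-1</maxTime>
      </autoSoftCommit>
    </updateHandler>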