StreamingUpdateSolrServer logs "starting runner: ...", sends a POST with <stream>...</stream> and, I guess, also opens a new HTTP connection every time it has managed to empty its queue. In StreamingUpdateSolrServer.java it says this:

    // info is ok since this should only happen once for each thread
    log.info( "starting runner: {}" , this );

But the comment is not correct. It logs every time the queue has been emptied. I get "starting runner: ..." lots and lots of times in the log, but I have only 4 threads.

Let's say I have this code:

    SolrServer server = new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 4);
    for (String id : ids) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("text", "something");
        server.add(doc);
        // Simulating that lots of stuff happens, like getting stuff from the database and what not
        Thread.sleep(300);
    }
    server.commit();

Because there is a small delay after server.add(doc), the StreamingUpdateSolrServer's runners quickly empty the internal queue of documents. As a result, the next call to server.add(doc) opens a new HTTP connection, makes a new POST and sends a <stream> with only one document. This is very inefficient. Would it be possible to hold an HTTP connection open and keep a <stream> open until commit is called?

I realize that the way to make this more efficient is to put all documents in a list and call server.add(allDocs). I browsed the web for StreamingUpdateSolrServer examples and found one saying that nowadays it's just fine to call server.add(doc), and that you don't have to put everything in a list first. But that's obviously not entirely correct.

Before hitting the send button I realized that the runner checks the queue for new documents for 250 milliseconds before it gives up. In my application, where documents arrive about 300 milliseconds apart, this timeout isn't long enough. Maybe we could modify the class so the timeout can be changed with setTimeout() instead of being hardcoded to 250?

What is a good number of documents for server.add(docs)? Is there an upper limit, or is it OK to have a million documents?

/Tim
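
For reference, here is a minimal sketch of the "collect documents in a list first" workaround mentioned above. It is not part of SolrJ: the class name BatchingBuffer, the batch size and the sink callback are all hypothetical, and it assumes Java 8 or later. It collects items and hands them over in fixed-size lists, so that a slow producer sends one request per batch instead of one per document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not a SolrJ class): buffers items and passes them
// to a sink in lists of at most batchSize elements.
final class BatchingBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchingBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Adds one item and flushes automatically once the buffer is full.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered; call this once more before commit.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Hand over a copy so the sink never sees later modifications.
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

In the loop above, the sink would wrap server.add(batch), catching or rethrowing its checked exceptions, and flush() would be called right before server.commit(). The batch size is left as a parameter because the right value depends on document size and available memory.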