Hello,

To be brief: I want to read documents and index them concurrently from several threads. My main concern was how many documents to add before calling commit. I cannot just keep adding millions of documents one after another to StreamingUpdateSolrServer and sit idle, trusting it to take care of everything, because that is not possible with the memory available. So, if I can split the document set into optimally sized batches, I can also use multiple threads for the updates.
Most of my doubts are resolved. Thanks for your responses.

"The beauty of StreamingUpdateSolrServer is that you don't have to worry about batch sizes"

So now I can forget about batch sizes and just keep adding as many documents as I want.

There is one more issue, point 4 in my first mail:

4) The queuesize parameter of the StreamingUpdateSolrServer constructor: what would be a rough value for a real-time application with a million+ documents to index? And what exactly is "queuesize" for, if we can keep adding as many documents as we like?

Thanks a lot.

2010/1/12 Yonik Seeley <ysee...@gmail.com>:
> On Tue, Jan 12, 2010 at 1:09 PM, Smiley, David W. <dsmi...@mitre.org> wrote:
>> The beauty of StreamingUpdateSolrServer is that you don't have to worry
>> about batch sizes; it streams them all. Just keep calling add() with one
>> document and it'll get enqueued. You can pass a collection but there's no
>> performance benefit.
>
> Right - and the problem with building your own collection and passing
> it is that it's not being streamed (if it takes any time to build
> those docs - like reading from a DB - then that thread may be idle for
> some amount of time). If you separate and make document production
> asynchronous from document sending, then you've just re-invented
> StreamingUpdateSolrServer.
>
> I'd really recommend just starting with StreamingUpdateSolrServer for
> any amount of indexing.
>
> -Yonik
> http://www.lucidimagination.com
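
On the queuesize question, one rough way to picture it is as the capacity of a bounded buffer between the thread calling add() and the threads sending to Solr. The sketch below is a toy model in plain Java, not SolrJ code. It assumes that add() waits when the buffer is full and that a fixed number of sender threads drain it. The names BoundedQueueDemo, run and STOP are made up for illustration, and the strings stand in for real documents.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only (hypothetical names, no SolrJ involved): a bounded queue
// filled by one producer and drained by a fixed number of sender threads.
final class BoundedQueueDemo {

    // Sentinel value that tells a sender thread to stop.
    private static final String STOP = "__STOP__";

    /**
     * Pushes docCount fake documents through a queue of capacity queueSize,
     * drained by threadCount sender threads.
     * Returns {documents sent, highest queue depth seen by the producer}.
     */
    static int[] run(int docCount, int queueSize, int threadCount) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueSize);
        AtomicInteger sent = new AtomicInteger();
        int maxDepth = 0;

        Thread[] senders = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            senders[i] = new Thread(() -> {
                try {
                    while (true) {
                        String doc = queue.take();   // waits while the queue is empty
                        if (STOP.equals(doc)) {
                            return;
                        }
                        sent.incrementAndGet();      // stand-in for the HTTP send
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            senders[i].start();
        }

        try {
            for (int i = 0; i < docCount; i++) {
                queue.put("doc-" + i);               // waits while the queue is full
                maxDepth = Math.max(maxDepth, queue.size());
            }
            for (int i = 0; i < threadCount; i++) {
                queue.put(STOP);                     // one stop marker per sender
            }
            for (Thread t : senders) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return new int[] { sent.get(), maxDepth };
    }

    public static void main(String[] args) {
        int[] r = run(100_000, 20, 3);
        System.out.println("sent=" + r[0] + " maxQueueDepth=" + r[1]);
    }
}
```

In this model, memory use is bounded by the queue capacity no matter how many documents are added, because the producer simply waits whenever the senders fall behind. For comparison, the SolrJ 1.4 constructor is StreamingUpdateSolrServer(String solrServerUrl, int queueSize, int threadCount). Whether its add() behaves exactly like put() here is an assumption worth checking against the SolrJ source.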