Hi Erick,

We have a requirement to index almost 100,000 documents, each with at least 20 fields; no field is longer than 10 KB. We also run searches against the same index in parallel.

We found that indexing all of the documents takes almost 3 minutes. Our current strategy is to commit after every 15,000 docs, which we send as one large XML update streamed to Solr with curl from PHP.
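For reference, the indexing step boils down to something like the following (host, port, and file name here are just placeholders, not our exact setup):

  # post one batch of documents (<add><doc>...</doc>...</add>)
  curl 'http://localhost:8983/solr/update' \
       -H 'Content-Type: text/xml' \
       --data-binary @batch.xml

  # explicit commit once the batch is in
  curl 'http://localhost:8983/solr/update' \
       -H 'Content-Type: text/xml' \
       --data-binary '<commit/>'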
Our mergeFactor is currently 10. I am wondering whether increasing it to 25 or 50 would improve indexing performance. What about ramBufferSizeMB (the default is 32 MB)? Which other factors should we consider, and when should we run optimize? Any other deviation from the defaults that would help us reach the target is welcome.

For the JVM we allocate a max heap of 512 MB and use the default concurrent mark sweep collector for garbage collection. One more observation: CPU utilization is only 20-25% across all 4 cores (measured with htop). I have summarized our current solrconfig and JVM settings at the end of this mail.

Thanks
Naveen

On Thu, Aug 4, 2011 at 7:05 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> What version of Solr are you using? If it's a recent version, then
> optimizing is not that essential; you can do it during off hours, perhaps
> nightly or weekly.
>
> As far as indexing speed, have you profiled your application to see
> whether it's Solr or your indexing process that's the bottleneck? A quick
> check would be to monitor the CPU utilization on the server and see if it
> is high.
>
> As far as multithreading, one option is to simply have multiple clients
> indexing simultaneously. But you haven't indicated how the indexing is
> being done. Are you using DIH? SolrJ? Streaming documents to Solr? You
> have to provide those kinds of details to get meaningful help.
>
> Best
> Erick
>
> On Aug 2, 2011 8:06 AM, "Naveen Gupta" <nkgiit...@gmail.com> wrote:
> > Hi
> >
> > We have a requirement where we index all the messages of a thread; a
> > thread may have attachments too. We add them to Solr for indexing and
> > searching in order to apply a few business rules.
> >
> > For one user we have almost 100k threads, and each thread may have
> > 10-20 messages.
> >
> > What we are finding is that it takes 30 minutes to index all the
> > threads.
> >
> > When we run optimize, it gets faster afterwards.
> >
> > The question is how frequently this optimize should be called, and when?
> >
> > Please note that our commit strategy is to commit after every 10k
> > threads; we are not calling commit after every doc.
> >
> > Secondly, how can we use multi-threading from the Solr perspective in
> > order to improve JVM and other resource utilization?
> >
> > Thanks
> > Naveen
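For completeness, these are the knobs I am referring to. In our solrconfig.xml they sit under <indexDefaults> (the exact layout may differ slightly depending on the Solr version):

  <indexDefaults>
    <!-- currently 10; considering 25 or 50 -->
    <mergeFactor>10</mergeFactor>
    <!-- currently the 32 MB default -->
    <ramBufferSizeMB>32</ramBufferSizeMB>
  </indexDefaults>

And the JVM is started roughly like this (start.jar is just the stock Jetty launcher; our real command line has a few more options):

  java -Xmx512m -XX:+UseConcMarkSweepGC -jar start.jar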