Hi,

We have a distributed Solr system (2-3 boxes, each running 2 instances of Solr, and each Solr instance can write to multiple cores). Our use case is high index volume: we can get up to 100 million records (1 record = 500 bytes) per day, but very low query traffic (only administrators may need to search the data, about once an hour). So we need very fast indexing. Here are the things I'm trying to find out in order to optimize our indexing process:

1) What's the optimum index size? I've noticed that as the index grows, indexing time increases. In our tests, with an index under 10G we could index over 2K records/sec, but once it grows past 20G the rate drops to 1400/sec and keeps dropping as the index grows. I'm trying to see whether we can partition (create a new SolrCore) after 10G. Related question: is there a way to find the size of a SolrCore (any web service for that)? Based on that information I could create a new core and freeze the one that has reached 10G.

2) In our tests, we noticed that after about 8 hours of indexing there is a 3-4 hour period where indexing is very slow (around 500 records/sec); after that period, indexing returns to the normal rate (1500/sec). Does Solr run any optimize command on its own? How can we find that out? I'm not issuing any optimize command. Should I be doing that after a certain time?

3) Every time I add new documents (10K at once) to the index, I see the searcher closing and then re-opening/re-warming (in Catalina.out) after the commit is done. I'm not sure whether this is an expensive operation. Since our search volume is very low, can I configure Solr not to do this? Would it make indexing any faster?

Mar 26, 2009 11:59:45 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing searc...@33d9337c main
Mar 26, 2009 11:59:52 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening searc...@46ba6905 main
Mar 26, 2009 11:59:52 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@46ba6905 main from searc...@5c5ffecd main

4) Is there anything else (any other configuration in Solr) that could help optimize my indexing process? I'm currently using all the default settings in solrconfig.xml and the default handlers.

Thanks,
-vivek
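
P.S. To make the numbers above concrete, here is a rough Python sketch of the arithmetic and of the rollover check I have in mind for question 1. It is only a sketch under assumptions: the function names are made up for illustration, and measuring the index by walking its directory on disk is my own stand-in, since I don't know of a Solr web service that reports core size (hence the question).

```python
import os

RECORDS_PER_DAY = 100_000_000    # peak daily volume quoted above
RECORD_BYTES = 500               # size of one record quoted above
SECONDS_PER_DAY = 24 * 60 * 60
ROLLOVER_BYTES = 10 * 1024 ** 3  # the 10G threshold proposed in question 1


def required_rate(records_per_day=RECORDS_PER_DAY):
    """Average records/sec needed to absorb one day's volume within 24 hours."""
    return records_per_day / SECONDS_PER_DAY


def raw_bytes_per_day(records_per_day=RECORDS_PER_DAY, record_bytes=RECORD_BYTES):
    """Raw input volume per day; the on-disk index size will differ."""
    return records_per_day * record_bytes


def dir_size_bytes(path):
    """Sum the sizes of all files under path (assumed to be a core's index directory)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # a file can disappear between listing and stat
    return total


def should_roll_over(index_bytes, limit_bytes=ROLLOVER_BYTES):
    """True once the measured index size has reached the rollover threshold."""
    return index_bytes >= limit_bytes
```

With the figures above, required_rate() comes to roughly 1,157 records/sec averaged across the whole cluster, and raw_bytes_per_day() to 50 GB of raw input. The idea would be to call dir_size_bytes() periodically on each core's index directory and create a new core once should_roll_over() returns true.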