It's hard to guess, but I might start by looking at what the new UpdateLog is costing you. Take its definition out of solrconfig.xml and try your test again. Then let's take it from there.

- Mark

On Jan 23, 2013, at 11:00 AM, Kevin Stone <kevin.st...@jax.org> wrote:

> I am having some difficulty migrating our solr indexing scripts from using
> 3.5 to solr 4.0. Notably, I am trying to track down why our performance in
> solr 4.0 is about 5-10 times slower when indexing documents. Querying is
> still quite fast.
>
> The code adds documents in groups of 1000, and adds each group to solr in
> a thread. The documents are somewhat large, including maybe 30-40
> different field types, mostly multivalued. Here are some snippets of the
> code we used in 3.5.
>
> MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
> HttpClient client = new HttpClient(mgr);
> CommonsHttpSolrServer server = new CommonsHttpSolrServer( "some url for our index",client );
> server.setRequestWriter(new BinaryRequestWriter());
>
> Then, we delete the index, and proceed to generate documents and load the
> groups in a thread that looks kind of like this. I've omitted some overhead
> for handling exceptions and retry attempts.
>
> class DocWriterThread implements Runnable
> {
>     CommonsHttpSolrServer server;
>     Collection<SolrInputDocument> docs;
>     private int commitWithin = 50000; // 50 seconds
>
>     public DocWriterThread(CommonsHttpSolrServer server,Collection<SolrInputDocument> docs)
>     {
>         this.server=server;
>         this.docs=docs;
>     }
>
>     public void run()
>     {
>         // set the commitWithin feature
>         server.add(docs,commitWithin);
>     }
> }
>
> Now, I've had to change some things to get this to compile with the Solr
> 4.0 libraries. Here is what I tried to convert the above code to. I don't
> know if these are the correct equivalents, as I am not familiar with apache
> httpcomponents.
>
> ThreadSafeClientConnManager mgr = new ThreadSafeClientConnManager();
> DefaultHttpClient client = new DefaultHttpClient(mgr);
> HttpSolrServer server = new HttpSolrServer( "some url for our solr index",client );
> server.setRequestWriter(new BinaryRequestWriter());
>
> The thread method is the same, but uses HttpSolrServer instead of
> CommonsHttpSolrServer.
>
> We also had an old solrconfig (not sure what version, but it is pre 3.x and
> had mostly default values) that I had to replace with a 4.0 style
> solrconfig.xml. I don't want to post the entire file (as it is large), but I
> copied one from the solr 4.0 examples, and made a couple of changes. First, I
> wanted to turn off transaction logging. So essentially I have a line like
> this (everything inside is commented out):
>
> <updateHandler class="solr.DirectUpdateHandler2"></updateHandler>
>
> And I added a handler for javabin:
>
> <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler">
>   <lst name="defaults">
>     <str name="stream.contentType">application/javabin</str>
>   </lst>
> </requestHandler>
>
> I'm not sure what other configurations I should look at. I would think that
> there should be a big obvious reason why the indexing performance would drop
> nearly 10 fold.
>
> Against our 3.5 instance I timed our index load, and it adds roughly 40,000
> documents every 3-8 seconds.
>
> Against our 4.0 instance it adds 40,000 documents every 70-75 seconds.
>
> This isn't the end of the world, and I would love to use the new join feature
> in solr 4.0. However, we have many different indexes with millions of
> documents, and this kind of increase in load time is troubling.
>
> Thanks for your help.
>
> -Kevin
>
> The information in this email, including attachments, may be confidential and
> is intended solely for the addressee(s). If you believe you received this
> email by mistake, please notify the sender by return email as soon as
> possible.
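
For comparing the 3.5 and 4.0 loads on equal terms, a small timing harness may help isolate the add calls from the rest of the loader. The following is only a sketch and not the code from the original message: `BatchSink` and `IndexTimer` are hypothetical names, and the sink stands in for the `server.add(docs, commitWithin)` call in the quoted thread class. It splits a list into groups, submits each group on a pool thread, and reports documents per second for the groups that were sent without an exception.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a call such as server.add(docs, commitWithin). */
interface BatchSink<T> {
    void add(Collection<T> docs) throws Exception;
}

final class IndexTimer {
    /**
     * Splits docs into groups of batchSize, sends each group on a pool thread,
     * waits for all groups to finish, and returns successfully sent docs per second.
     */
    static <T> double measureDocsPerSecond(List<T> docs, int batchSize, int threads,
                                           BatchSink<T> sink) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong sent = new AtomicLong();
        long start = System.nanoTime();

        for (int i = 0; i < docs.size(); i += batchSize) {
            // Copy the slice so each task owns its own group of documents.
            List<T> batch = new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size())));
            pool.execute(() -> {
                try {
                    sink.add(batch);
                    sent.addAndGet(batch.size());
                } catch (Exception e) {
                    // Failed groups are simply not counted; a real loader would retry or log.
                }
            });
        }

        pool.shutdown();                             // accept no new tasks
        pool.awaitTermination(1, TimeUnit.HOURS);    // block until queued groups are done

        double seconds = (System.nanoTime() - start) / 1e9;
        return sent.get() / seconds;
    }
}
```

With SolrJ on the classpath, the sink could be something like `batch -> server.add(batch, 50000)`, assuming `server` is the `CommonsHttpSolrServer` or `HttpSolrServer` instance from the quoted snippets. Running the same document list against both versions, once with and once without the update log configured, should give directly comparable figures.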