Cross-posting my answer from SO:

According to this wiki: https://wiki.apache.org/solr/NearRealtimeSearch
commitWithin is a soft commit by default. Soft commits are very efficient at
making added documents immediately searchable. But! They are not on disk yet:
the documents are committed into RAM. In this setup you would use the
updateLog to make the Solr instance crash-tolerant.

What you do in point 2 is a hard commit, i.e. flushing the added documents to
disk. Doing this after each document add is very expensive. So instead, post
a batch of documents and issue one hard commit, or have your autoCommit set
to some reasonable value, like 10 min or 1 hour (depending on your users'
expectations).

On Wed, Feb 12, 2014 at 5:28 PM, Pisarev, Vitaliy <vitaliy.pisa...@hp.com> wrote:

> I absolutely agree, and I even read the NRT page before posting this
> question.
>
> The thing that baffles me is this: doing a commit after each add kills
> the performance. On the other hand, when I use commitWithin and specify
> an (absurd) 1 ms delay, I expect this behavior to be equivalent to making
> a commit, from a functional perspective.
>
> Seeing that there is no magic in the world, I am trying to understand what
> price I am actually paying when using the commitWithin feature: on the one
> hand it commits almost immediately, on the other hand it performs
> wonderfully. Where is the catch?
>
>
> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Wednesday, 12 February 2014 17:00
> To: solr-user
> Subject: Re: Solr performance with commitWithin seems too good to be true.
> I am afraid I am missing something
>
> Doing a standard commit after every document is a Solr anti-pattern.
>
> commitWithin is a "near-realtime" commit in recent versions of Solr and
> not a standard commit.
>
> https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
>
> - Mark
>
> http://about.me/markrmiller
>
> On Feb 12, 2014, at 9:52 AM, Pisarev, Vitaliy <vitaliy.pisa...@hp.com> wrote:
>
> > I am running a very simple performance experiment where I post 2000
> > documents to my application, which in turn persists them to a relational
> > DB and sends them to Solr for indexing (synchronously, in the same
> > request). I am testing 3 use cases:
> >
> > 1. No indexing at all - ~45 sec to post 2000 documents
> > 2. Indexing included - commit after each add - ~8 minutes (!) to post
> >    and index 2000 documents
> > 3. Indexing included - commitWithin 1 ms - ~55 seconds (!) to post and
> >    index 2000 documents
> >
> > The 3rd result does not make any sense; I would expect the behavior to
> > be similar to the one in point 2. At first I thought that the documents
> > were not really committed, but I could actually see them being added by
> > executing some queries during the experiment (via the Solr web UI).
> > I am worried that I am missing something very big. The code I use for
> > point 2:
> >
> > SolrInputDocument doc = // get doc
> > SolrServer solrConnection = // get connection
> > solrConnection.add(doc);
> > solrConnection.commit();
> >
> > Whereas the code for point 3:
> >
> > SolrInputDocument doc = // get doc
> > SolrServer solrConnection = // get connection
> > solrConnection.add(doc, 1); // According to API documentation I
> > understand there is no need to explicitly call commit with this API
> >
> > Is it possible that committing after each add will degrade performance
> > by a factor of 40?

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan
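P.S. The autoCommit/autoSoftCommit advice above maps onto the updateHandler
section of solrconfig.xml. A minimal sketch (the intervals are illustrative,
not recommendations; tune them to your own visibility and durability needs):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush segments to disk every 10 minutes, without
       opening a new searcher (visibility is handled by soft commits). -->
  <autoCommit>
    <maxTime>600000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: make newly added documents searchable within 1 second. -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>

  <!-- Transaction log, so soft-committed (RAM-only) documents can be
       replayed after a crash. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
```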
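And a rough SolrJ sketch of the "post a batch, commit once" pattern, using
the same 4.x-era SolrServer API as in the thread. This is not runnable
without a live Solr instance; the URL, core name, and field names are
placeholders, not anything from the original setup:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and core name for this sketch.
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 2000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));   // placeholder fields
            doc.addField("title", "document " + i);
            batch.add(doc);
        }

        // One request for the whole batch...
        solr.add(batch);
        // ...and a single hard commit at the end, instead of 2000 of them.
        solr.commit();

        // Alternatively, per-document adds with commitWithin: Solr schedules
        // the (soft) commit itself, so no explicit commit() call is needed:
        //   solr.add(doc, 1000); // searchable within ~1 second
    }
}
```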