Here's some additional background that may shed light on the performance:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Wed, Feb 12, 2014 at 7:40 AM, Dmitry Kan <solrexp...@gmail.com> wrote:

> Cross-posting my answer from SO:
>
> According to this wiki:
>
> https://wiki.apache.org/solr/NearRealtimeSearch
>
> the commitWithin is a soft-commit by default. Soft-commits are very
> efficient in terms of making the added documents immediately searchable.
> But! They are not on the disk yet. That means the documents are being
> committed into RAM. In this setup you would use updateLog to make the
> Solr instance crash tolerant.
>
> What you do in point 2 is a hard-commit, i.e. flushing the added
> documents to disk. Doing this after each document add is very expensive.
> So instead, post a bunch of documents and issue a hard commit, or even
> have your autoCommit set to some reasonable value, like 10 min or 1 hour
> (depends on your user expectations).
>
> On Wed, Feb 12, 2014 at 5:28 PM, Pisarev, Vitaliy <vitaliy.pisa...@hp.com>
> wrote:
>
> > I absolutely agree and I even read the NRT page before posting this
> > question.
> >
> > The thing that baffles me is this:
> >
> > Doing a commit after each add kills the performance.
> > On the other hand, when I use commitWithin and specify an (absurd) 1ms
> > delay, I expect that this behavior will be equivalent to making a
> > commit, from a functional perspective.
> >
> > Seeing that there is no magic in the world, I am trying to understand
> > what price I am actually paying when using the commitWithin feature.
> > On the one hand it commits almost immediately; on the other hand, it
> > performs wonderfully. Where is the catch?
> >
> > -----Original Message-----
> > From: Mark Miller [mailto:markrmil...@gmail.com]
> > Sent: Wednesday, 12 February 2014 17:00
> > To: solr-user
> > Subject: Re: Solr performance with commitWithin seems too good to be
> > true. I am afraid I am missing something
> >
> > Doing a standard commit after every document is a Solr anti-pattern.
> >
> > commitWithin is a “near-realtime” commit in recent versions of Solr and
> > not a standard commit.
> >
> > https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
> >
> > - Mark
> >
> > http://about.me/markrmiller
> >
> > On Feb 12, 2014, at 9:52 AM, Pisarev, Vitaliy <vitaliy.pisa...@hp.com>
> > wrote:
> >
> > > I am running a very simple performance experiment where I post 2000
> > > documents to my application, which in turn persists them to a
> > > relational DB and sends them to Solr for indexing (synchronously, in
> > > the same request).
> > >
> > > I am testing 3 use cases:
> > >
> > > 1. No indexing at all - ~45 sec to post 2000 documents
> > > 2. Indexing included - commit after each add. ~8 minutes (!) to post
> > >    and index 2000 documents
> > > 3. Indexing included - commitWithin 1ms. ~55 seconds (!) to post and
> > >    index 2000 documents
> > >
> > > The 3rd result does not make any sense; I would expect the behavior
> > > to be similar to the one in point 2. At first I thought that the
> > > documents were not really committed, but I could actually see them
> > > being added by executing some queries during the experiment (via the
> > > Solr web UI).
> > >
> > > I am worried that I am missing something very big. The code I use for
> > > point 2:
> > >
> > > SolrInputDocument doc = // get doc
> > > SolrServer solrConnection = // get connection
> > > solrConnection.add(doc);
> > > solrConnection.commit();
> > >
> > > Whereas the code for point 3:
> > >
> > > SolrInputDocument doc = // get doc
> > > SolrServer solrConnection = // get connection
> > > solrConnection.add(doc, 1); // According to the API documentation,
> > > // I understand there is no need to explicitly call commit with
> > > // this API
> > >
> > > Is it possible that committing after each add will degrade
> > > performance by a factor of 40?
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: twitter.com/dmitrykan
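
A minimal sketch of the "post a bunch of documents and issue a hard commit" advice from the thread, in case it helps to see the shape of it. It deliberately does not use SolrJ: `Indexer` is a hypothetical stand-in for the `add`/`commit` calls of a Solr client, and `CountingIndexer`, `BatchingIndexer`, `BatchCommitDemo`, and the batch size of 500 are illustrative assumptions, not anything from the thread. It only counts calls, so it says nothing about real timings.

```java
import java.util.Objects;

/** Hypothetical stand-in for the add/commit calls of a Solr client (not the SolrJ API). */
interface Indexer {
    void add(String doc);
    void commit();
}

/** Test double that only counts calls, so the effect of batching is visible. */
class CountingIndexer implements Indexer {
    int adds = 0;
    int commits = 0;

    @Override
    public void add(String doc) { adds++; }

    @Override
    public void commit() { commits++; }
}

/** Forwards every add immediately but issues one commit per batchSize documents. */
class BatchingIndexer {
    private final Indexer delegate;
    private final int batchSize;
    private int pending = 0; // documents added since the last commit

    BatchingIndexer(Indexer delegate, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.delegate = Objects.requireNonNull(delegate);
        this.batchSize = batchSize;
    }

    void add(String doc) {
        delegate.add(doc);
        pending++;
        if (pending >= batchSize) {
            flush();
        }
    }

    /** Commits any documents added since the last commit; call once after the last add. */
    void flush() {
        if (pending > 0) {
            delegate.commit();
            pending = 0;
        }
    }
}

class BatchCommitDemo {
    public static void main(String[] args) {
        CountingIndexer counter = new CountingIndexer();
        BatchingIndexer batcher = new BatchingIndexer(counter, 500); // assumed batch size
        for (int i = 0; i < 2000; i++) {
            batcher.add("doc-" + i);
        }
        batcher.flush();
        // Prints "2000 adds, 4 commits"
        System.out.println(counter.adds + " adds, " + counter.commits + " commits");
    }
}
```

With 2000 documents this issues 4 commits instead of the 2000 that "commit after each add" would issue. In a real setup the same effect can come from the server side via autoCommit or commitWithin, as discussed in the thread.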