Here's some additional background that may shed light on the
performance:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick


On Wed, Feb 12, 2014 at 7:40 AM, Dmitry Kan <solrexp...@gmail.com> wrote:

> Cross-posting my answer from SO:
>
> According to this wiki:
>
> https://wiki.apache.org/solr/NearRealtimeSearch
>
> commitWithin is a soft commit by default. Soft commits are very
> efficient at making newly added documents immediately searchable,
> but the documents are not on disk yet: they are committed into RAM.
> In this setup you would use the updateLog to make the Solr instance
> crash-tolerant.
>
> What you do in point 2 is a hard commit, i.e. flushing the added documents
> to disk. Doing this after each document add is very expensive. Instead,
> post a batch of documents and then issue a hard commit, or set autoCommit
> to some reasonable value, like 10 minutes or 1 hour (depending on your
> users' expectations).
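The batch-then-commit approach can be sketched in Java as follows. This is a minimal sketch under stated assumptions: `Indexer`, `CountingIndexer` and `BatchingIndexer` are hypothetical names, and the `Indexer` interface only stands in for a SolrJ client (with SolrJ 4.x the corresponding calls would be `SolrServer.add(Collection<SolrInputDocument>)` and `SolrServer.commit()`). The counting fake lets it run without a Solr server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a SolrJ client; not a real SolrJ type.
interface Indexer {
    void add(List<String> docs);
    void commit();
}

// In-memory fake that only counts calls, so the sketch runs without Solr.
class CountingIndexer implements Indexer {
    int docsAdded = 0;
    int addCalls = 0;
    int commits = 0;

    @Override public void add(List<String> docs) { addCalls++; docsAdded += docs.size(); }
    @Override public void commit() { commits++; }
}

// Buffers documents and hard-commits once per batch instead of once per document.
class BatchingIndexer {
    private final Indexer target;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    BatchingIndexer(Indexer target, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.target = target;
        this.batchSize = batchSize;
    }

    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush();
    }

    // Sends whatever is buffered (a copy, so the buffer can be reused) and commits once.
    void flush() {
        if (buffer.isEmpty()) return;
        target.add(new ArrayList<>(buffer));
        buffer.clear();
        target.commit();
    }

    public static void main(String[] args) {
        CountingIndexer fake = new CountingIndexer();
        BatchingIndexer indexer = new BatchingIndexer(fake, 500);
        for (int i = 0; i < 2000; i++) indexer.add("doc-" + i);
        indexer.flush(); // commit any remainder (none here: 2000 is a multiple of 500)
        System.out.println(fake.docsAdded + " docs, " + fake.commits + " commits"); // 2000 docs, 4 commits
    }
}
```

With a batch size of 500, the 2000 documents from the experiment described further down the thread would trigger 4 hard commits instead of 2000. The batch size is illustrative, not a recommendation; setting autoCommit on the server, as suggested above, moves the same decision out of the client.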
>
>
>
> On Wed, Feb 12, 2014 at 5:28 PM, Pisarev, Vitaliy <vitaliy.pisa...@hp.com> wrote:
>
> > I absolutely agree, and I even read the NRT page before posting this
> > question.
> >
> > The thing that baffles me is this:
> >
> > Doing a commit after each add kills the performance.
> > On the other hand, when I use commitWithin and specify an (absurd) 1 ms
> > delay, I expect the behavior to be equivalent to making a commit, from a
> > functional perspective.
> >
> > Since there is no magic in the world, I am trying to understand what
> > price I am actually paying when using the commitWithin feature. On the
> > one hand, it commits almost immediately; on the other hand, it performs
> > wonderfully. Where is the catch?
> >
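One way to look for the catch is to separate how many commits happen from what each commit costs. The Java sketch below is a toy model, not Solr's implementation: `CommitCountModel` and its methods are hypothetical, it assumes that an add arriving while a commitWithin commit is already scheduled is simply covered by that pending commit, and it assumes evenly spaced arrivals of one document every 27 ms (roughly 2000 documents in ~55 seconds).

```java
// Toy model (NOT Solr's implementation) counting commits under two client strategies.
// Arrival times are in milliseconds.
class CommitCountModel {
    // Explicit commit after every add: exactly one commit per document.
    static int commitAfterEachAdd(long[] arrivalsMs) {
        return arrivalsMs.length;
    }

    // Deferred commit: an add with no commit pending schedules one at arrival + windowMs;
    // adds arriving before that deadline are covered by the same pending commit.
    static int commitWithin(long[] arrivalsMs, long windowMs) {
        int commits = 0;
        boolean pending = false;
        long deadline = 0;
        for (long t : arrivalsMs) {
            if (pending && t < deadline) {
                continue; // rides along with the commit already scheduled
            }
            if (pending) {
                commits++; // the earlier scheduled commit has fired by time t
            }
            pending = true;
            deadline = t + windowMs;
        }
        return pending ? commits + 1 : commits; // count the last scheduled commit
    }

    // Evenly spaced arrivals starting at t = 0.
    static long[] evenlySpaced(int n, long gapMs) {
        long[] times = new long[n];
        for (int i = 0; i < n; i++) times[i] = i * gapMs;
        return times;
    }

    public static void main(String[] args) {
        long[] arrivals = evenlySpaced(2000, 27);
        System.out.println(commitAfterEachAdd(arrivals)); // 2000
        System.out.println(commitWithin(arrivals, 1));    // 2000
        System.out.println(commitWithin(arrivals, 1000)); // 53
    }
}
```

Under these assumptions, a 1 ms window yields as many commits as committing after every add, so the commit count alone would not explain the speedup. That points back to the cost of each commit and, presumably, whether the client waits for it: per Dmitry's and Mark's replies in this thread, commitWithin issues a soft commit that makes documents searchable from RAM, while commit() is a hard commit that flushes them to disk.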
> >
> > -----Original Message-----
> > From: Mark Miller [mailto:markrmil...@gmail.com]
> > Sent: Wednesday, February 12, 2014 17:00
> > To: solr-user
> > Subject: Re: Solr performance with commitWithin seems too good to be
> > true. I am afraid I am missing something
> >
> > Doing a standard commit after every document is a Solr anti-pattern.
> >
> > commitWithin is a “near-realtime” commit in recent versions of Solr and
> > not a standard commit.
> >
> > https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
> >
> > - Mark
> >
> > http://about.me/markrmiller
> >
> > On Feb 12, 2014, at 9:52 AM, Pisarev, Vitaliy <vitaliy.pisa...@hp.com> wrote:
> >
> > > I am running a very simple performance experiment where I post 2000
> > > documents to my application, which in turn persists them to a
> > > relational DB and sends them to Solr for indexing (synchronously, in
> > > the same request).
> > >
> > > I am testing 3 use cases:
> > >
> > >  1. No indexing at all: ~45 seconds to post 2000 documents.
> > >  2. Indexing included, commit after each add: ~8 minutes (!) to post
> > >     and index 2000 documents.
> > >  3. Indexing included, commitWithin 1 ms: ~55 seconds (!) to post and
> > >     index 2000 documents.
> > >
> > > The 3rd result does not make any sense; I would expect the behavior to
> > > be similar to the one in point 2. At first I thought that the documents
> > > were not really committed, but I could actually see them being added by
> > > executing some queries during the experiment (via the Solr web UI).
> > >
> > > I am worried that I am missing something very big. The code I use for
> > > point 2:
> > >
> > >     SolrInputDocument doc = // get doc
> > >     SolrServer solrConnection = // get connection
> > >     solrConnection.add(doc);
> > >     solrConnection.commit();
> > >
> > > Whereas the code for point 3:
> > >
> > >     SolrInputDocument doc = // get doc
> > >     SolrServer solrConnection = // get connection
> > >     // commitWithin = 1 ms; according to the API documentation,
> > >     // there is no need to call commit() explicitly with this API
> > >     solrConnection.add(doc, 1);
> > >
> > > Is it possible that committing after each add will degrade performance
> > > by a factor of 40?
> > >
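The factor of 40 in the question matches the indexing overhead rather than the total times: subtracting the ~45-second no-indexing baseline from each case leaves ~435 seconds versus ~10 seconds. A short Java sketch of that arithmetic, using only the approximate timings reported in the question (the class name `IndexingOverhead` is illustrative):

```java
import java.util.Locale;

// Back-of-the-envelope arithmetic on the approximate timings from the question.
class IndexingOverhead {
    static final int DOCS = 2000;
    static final double BASELINE_S = 45;        // case 1: no indexing
    static final double COMMIT_EACH_S = 8 * 60; // case 2: commit after each add (~8 min)
    static final double COMMIT_WITHIN_S = 55;   // case 3: commitWithin 1 ms

    // Time attributable to indexing: total minus the no-indexing baseline.
    static double overheadSeconds(double totalSeconds) {
        return totalSeconds - BASELINE_S;
    }

    public static void main(String[] args) {
        double hard = overheadSeconds(COMMIT_EACH_S);     // 435 s
        double within = overheadSeconds(COMMIT_WITHIN_S); // 10 s
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.printf(Locale.ROOT, "per doc: %.1f ms vs %.1f ms, ratio %.1f%n",
                1000 * hard / DOCS, 1000 * within / DOCS, hard / within);
        // prints: per doc: 217.5 ms vs 5.0 ms, ratio 43.5
    }
}
```

By contrast, the total times differ by a factor of only about 8.7 (480 s / 55 s).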
> >
> >
>
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: twitter.com/dmitrykan
>
