Re: Embedded about 50% faster for indexing

climbingrose Mon, 27 Aug 2007 21:18:43 -0700

Agree. I was actually thinking of developing the embedded version early this
year for one of my projects. I'm sure it will be needed in cases where
running another web server is overkill.
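
To illustrate the points raised in the quoted messages below (more than one
doc per HTTP request, and a persistent connection), here is a minimal Python
sketch. It is not taken from anyone's actual setup in this thread: the names
build_add_xml, batches and post_batches are made up for illustration, and the
host, port and update path are whatever your own Solr instance uses.

```python
import http.client
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(docs):
    """Build one Solr-style <add> message holding every document in docs.

    Each document is a dict mapping a field name to a value.
    """
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # quoteattr() returns the name already wrapped in quotes.
            parts.append(
                "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
            )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_batches(host, port, path, payloads):
    """POST each payload over a single, reused HTTP connection.

    Returns the list of HTTP status codes, one per payload.
    """
    conn = http.client.HTTPConnection(host, port)
    statuses = []
    try:
        for body in payloads:
            conn.request(
                "POST",
                path,
                body=body.encode("utf-8"),
                headers={"Content-Type": "text/xml; charset=utf-8"},
            )
            resp = conn.getresponse()
            resp.read()  # drain the response so the connection can be reused
            statuses.append(resp.status)
    finally:
        conn.close()
    return statuses
```

Usage would look something like
post_batches(host, port, update_path, (build_add_xml(b) for b in batches(docs, 200))),
with the batch size tuned by measurement, since larger batches also mean
larger XML strings held in memory.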


On 8/28/07, Jonathan Woods <[EMAIL PROTECTED]> wrote:
>
> I don't think you should apologise for highlighting embedded usage.  For
> circumstances in which you're at liberty to run a Solr instance in the same
> JVM as an app which uses it, I find it very strange that you should have to
> use anything _other_ than embedded, and jump through all the unnecessary
> hoops (XML conversion, HTTP transport) that this implies.  It's a bit like
> suggesting you should throw away Java method invocations altogether, and
> write everything in XML-RPC.
>
> Bit of a pet issue of mine!  I'll be creating a JIRA issue on the subject
> soon.
>
> Jon
>
> > -----Original Message-----
> > From: Sundling, Paul [mailto:[EMAIL PROTECTED]
> > Sent: 28 August 2007 03:24
> > To: solr-user@lucene.apache.org
> > Subject: RE: Embedded about 50% faster for indexing
> >
> > At this point I think I'm going to recommend against embedded,
> > regardless of any performance advantage.  The level of
> > documentation is just too low, while the XML API is clearly
> > documented.  It's clear that XML is preferred.
> >
> > The embedded example on the wiki is pretty good, but until
> > multiple core support comes out in the next version, you have
> > to use multiple SolrCore instances.  If they are accessed in the same
> > webapp, then you can't just set JNDI (since you can only have
> > one value).  So you have to use a Config object as alluded to
> > in the example.  However, you look at the code and there is
> > no javadoc for the constructor.  The constructor args are
> > (String name, InputStream is, String prefix).  I think name
> > is a unique name for the solr core, but that is a guess.
> > InputStream may be a stream to the solr home, but it could be
> > anything.  Prefix may be a URI prefix.  These are all guesses
> > without trying to read through the code.
> >
> > When I look at SolrCore, it looks like it's a singleton, so
> > maybe I can't even access more than one SolrCore using
> > embedded anyway.  :(  So I apologize for highlighting Embedded.
> >
> > Anyway it's clear how to do multiple solr cores using XML.
> > You just have a different post URI for each of the different cores.
> > You can easily inject that with Spring and externalize the
> > config.  Simple and easy.  So I concede XML is the way to go. :)
> >
> > Paul Sundling
> >
> > -----Original Message-----
> > From: Mike Klaas [mailto:[EMAIL PROTECTED]
> > Sent: Monday, August 27, 2007 5:50 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Embedded about 50% faster for indexing
> >
> >
> > On 27-Aug-07, at 12:44 PM, Sundling, Paul wrote:
> >
> > > Whether embedded solr should give me a performance boost or not, it
> > > did.  :)  I'm not surprised, since it skips XML parsing.  Although you
> > > never know where cycles are used for sure until you profile.
> >
> > It certainly is possible that XML parsing dwarfs indexing, but I'd
> > expect that only to occur under very light analysis and field storage
> > workloads.
> >
> > > I tried doing more records per post (200) and it was actually slightly
> > > slower and seemed to require more memory.  This makes sense because
> > > you have to take up more memory for the StringBuilder to store the much
> > > larger XML.  For 10,000 it was much slower.  For that size I would
> > > need to use XML streaming or something to make it work.
> > >
> > > The solr war was on the same machine, so network overhead was only
> > > from using loopback.
> >
> > The big question is still your connection handling strategy: are you
> > using persistent http connections?  Are you indexing from multiple threads?
> >
> > cheers,
> > -Mike
> >
> > > Paul Sundling
> > >
> > > -----Original Message-----
> > > From: climbingrose [mailto:[EMAIL PROTECTED]
> > > Sent: Monday, August 27, 2007 12:22 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Embedded about 50% faster for indexing
> > >
> > >
> > > Haven't tried the embedded server but I think I have to agree with
> > > Mike.  We're currently sending 2000 job batches to the SOLR server and
> > > the amount of time required to transfer documents over http is
> > > insignificant compared with the time required to index them. So I do
> > > think unless you are sending documents one by one, embedded SOLR
> > > shouldn't give you much more of a performance boost.
> > >
> > > On 8/25/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> > >>
> > >> On 24-Aug-07, at 2:29 PM, Wu, Daniel wrote:
> > >>
> > >>>> -----Original Message-----
> > >>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
> > >>>> Yonik Seeley
> > >>>> Sent: Friday, August 24, 2007 2:07 PM
> > >>>> To: solr-user@lucene.apache.org
> > >>>> Subject: Re: Embedded about 50% faster for indexing
> > >>>>
> > >>>> One thing I'd like to avoid is everyone trying to embed just for
> > >>>> performance gains. If there is really that much difference, then we
> > >>>> need a better way for people to get that without resorting to Java
> > >>>> code.
> > >>>>
> > >>>> -Yonik
> > >>>>
> > >>>
> > >>> Theoretically and practically, an embedded solution will be faster than
> > >>> going through http/xml.
> > >>
> > >> This is only true if the http interface adds significant overhead to
> > >> the cost of indexing a document, and I don't see why this should be
> > >> so, as indexing is relatively heavyweight.  Setting up the connection
> > >> could be expensive, but this can be greatly mitigated by sending more
> > >> than one doc per http request, using persistent connections, and
> > >> threading.
> > >>
> > >> -Mike
> > >>
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Cuong Hoang
> >
> >
> >
> >
> >
>
>


-- 
Regards,

Cuong Hoang
