RE: Embedded about 50% faster for indexing

Jonathan Woods Mon, 27 Aug 2007 20:41:44 -0700

I don't think you should apologise for highlighting embedded usage.  For
circumstances in which you're at liberty to run a Solr instance in the same
JVM as an app which uses it, I find it very strange that you should have to
use anything _other_ than embedded, and jump through all the unnecessary
hoops (XML conversion, HTTP transport) that this implies.  It's a bit like
suggesting you should throw away Java method invocations altogether, and
write everything in XML-RPC.


Bit of a pet issue of mine!  I'll be creating a JIRA issue on the subject
soon.

Jon

> -----Original Message-----
> From: Sundling, Paul [mailto:[EMAIL PROTECTED] 
> Sent: 28 August 2007 03:24
> To: solr-user@lucene.apache.org
> Subject: RE: Embedded about 50% faster for indexing
> 
> At this point I think I'm going to recommend against embedded,
> regardless of any performance advantage.  The level of 
> documentation is just too low, while the XML API is clearly 
> documented.  It's clear that XML is preferred.
> 
> The embedded example on the wiki is pretty good, but until
> multiple core support comes out in the next version, you have
> to use multiple SolrCore instances.  If they are accessed in the same
> webapp, then you can't just set JNDI (since you can only have
> one value).  So you have to use a Config object as alluded to
> in the example.  However, if you look at the code, there is
> no javadoc for the constructor.  The constructor args are
> (String name, InputStream is, String prefix).  I think name
> is a unique name for the Solr core, but that is a guess.
> The InputStream may be a stream to the Solr home, but it could be
> anything.  Prefix may be a URI prefix.  These are all guesses
> short of reading through the code.
> 
> When I look at SolrCore, it looks like it's a singleton, so 
> maybe I can't even access more than one SolrCore using 
> embedded anyway.  :(  So I apologize for highlighting Embedded.  
> 
> Anyway, it's clear how to do multiple Solr cores using XML.
> You just use a different POST URI for each core.
> You can easily inject that with Spring and externalize the
> config.  Simple and easy.  So I concede XML is the way to go. :)
> 
> Paul Sundling
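A minimal Java sketch of the per-core XML approach Paul describes, under stated assumptions: the map from core name to that core's XML update URL is handed in from outside (for example through Spring constructor injection), so adding a core becomes a config change. The class name MultiCoreXmlPoster and the shape of the map are hypothetical; only the JDK's HttpURLConnection is used, and nothing here is a Solr client API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Posts Solr XML update messages to whichever core the caller names. */
class MultiCoreXmlPoster {
    // Hypothetical config: core name -> that core's XML update URL.
    private final Map<String, String> updateUrls;

    MultiCoreXmlPoster(Map<String, String> updateUrls) {
        this.updateUrls = Collections.unmodifiableMap(new HashMap<>(updateUrls));
    }

    /** Looks up a core's update URL, failing fast on an unknown core name. */
    String urlFor(String core) {
        String url = updateUrls.get(core);
        if (url == null) {
            throw new IllegalArgumentException("No update URL configured for core: " + core);
        }
        return url;
    }

    /** POSTs one XML message (e.g. an <add> or <commit/>) and returns the HTTP status. */
    int post(String core, String xml) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(urlFor(core)).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        // Drain the body fully so the JDK can reuse the connection (keep-alive).
        try (InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream()) {
            if (in != null) {
                byte[] buf = new byte[8192];
                while (in.read(buf) != -1) {
                    // discard the response bytes
                }
            }
        }
        return status;
    }
}
```

Reading each response to the end before closing it lets HttpURLConnection return the socket to its keep-alive cache, which bears on the persistent-connection question raised further down in this thread.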
> 
> -----Original Message-----
> From: Mike Klaas [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 27, 2007 5:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Embedded about 50% faster for indexing
> 
> 
> On 27-Aug-07, at 12:44 PM, Sundling, Paul wrote:
> 
> > Whether embedded solr should give me a performance boost or not, it
> > did.  :)  I'm not surprised, since it skips XML parsing.  Although you
> > never know where cycles are used for sure until you profile.
> 
> It certainly is possible that XML parsing dwarfs indexing, but I'd
> expect that only to occur under very light analysis and field
> storage workloads.
> 
> > I tried doing more records per post (200) and it was actually slightly
> > slower and seemed to require more memory.  This makes sense because
> > you have to take up more memory for the StringBuilder to store the much
> > larger XML.  For 10,000 it was much slower.  For that size I would
> > need to use XML streaming or something to make it work.
> >
> > The solr war was on the same machine, so network overhead was only
> > from using loopback.
> 
> The big question is still your connection handling strategy:  are you
> using persistent HTTP connections?  Are you indexing from multiple
> threads?
> 
> cheers,
> -Mike
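Paul's memory growth with large batches comes from building the whole <add> message in a StringBuilder before sending it. One way around that, sketched here as an assumption rather than anything Solr prescribes, is to write the message incrementally with the JDK's StAX XMLStreamWriter straight into the request's OutputStream. The class name SolrAddWriter is hypothetical, and each document is modeled as a plain field-name-to-value map (single-valued fields only); the element names follow Solr's XML update format (<add>, <doc>, <field name="...">).

```java
import java.io.OutputStream;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Streams a batch of documents as one Solr XML <add> message. */
final class SolrAddWriter {
    private SolrAddWriter() {}

    /**
     * Writes docs to out element by element, so the full message never has to
     * be held in memory as one String. Each doc maps field name -> value.
     */
    static void writeAdd(List<Map<String, String>> docs, OutputStream out)
            throws XMLStreamException {
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        w.writeStartDocument("UTF-8", "1.0");
        w.writeStartElement("add");
        for (Map<String, String> doc : docs) {
            w.writeStartElement("doc");
            for (Map.Entry<String, String> field : doc.entrySet()) {
                w.writeStartElement("field");
                w.writeAttribute("name", field.getKey());
                w.writeCharacters(field.getValue()); // escapes markup characters such as & and <
                w.writeEndElement(); // </field>
            }
            w.writeEndElement(); // </doc>
        }
        w.writeEndElement(); // </add>
        w.writeEndDocument();
        w.flush();
        w.close(); // releases the writer but leaves out open for the caller
    }
}
```

Over HTTP, out would be the stream from HttpURLConnection.getOutputStream(). Unless setChunkedStreamingMode (or setFixedLengthStreamingMode) is called on the connection first, the JDK buffers the entire request body internally, which would bring the memory cost back; chunked mode assumes the servlet container accepts chunked request bodies.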
> 
> > Paul Sundling
> >
> > -----Original Message-----
> > From: climbingrose [mailto:[EMAIL PROTECTED]
> > Sent: Monday, August 27, 2007 12:22 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Embedded about 50% faster for indexing
> >
> >
> > Haven't tried the embedded server, but I think I have to agree with
> > Mike.  We're currently sending 2000-job batches to the Solr server,
> > and the amount of time required to transfer documents over HTTP is
> > insignificant compared with the time required to index them.  So I
> > think that unless you are sending documents one by one, embedded Solr
> > shouldn't give you much of a performance boost.
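The batching argument can be made concrete with a deliberately simplified cost model (an illustrative assumption, not a measurement from this thread). Let N be the number of documents, c the per-document cost of analysis and indexing, p the per-document cost of serializing and parsing its XML, o the fixed cost of one HTTP round trip, and b the number of documents sent per request:

```latex
T_{\text{http}}(b) \approx N\,(c + p) + \left\lceil \frac{N}{b} \right\rceil o,
\qquad
T_{\text{embedded}} \approx N\,c,
\qquad
\frac{T_{\text{http}}(b) - T_{\text{embedded}}}{T_{\text{http}}(b)} \approx \frac{p + o/b}{c + p + o/b}
```

Batching only shrinks the o/b term; the per-document XML cost p remains, so under this model the embedded advantage stays small whenever c dominates p, which is the "very light analysis and field storage" caveat made earlier in the thread.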
> >
> > On 8/25/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> >>
> >> On 24-Aug-07, at 2:29 PM, Wu, Daniel wrote:
> >>
> >>>> -----Original Message-----
> >>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of 
> >>>> Yonik Seeley
> >>>> Sent: Friday, August 24, 2007 2:07 PM
> >>>> To: solr-user@lucene.apache.org
> >>>> Subject: Re: Embedded about 50% faster for indexing
> >>>>
> >>>> One thing I'd like to avoid is everyone trying to embed just for
> >>>> performance gains.  If there is really that much difference, then we
> >>>> need a better way for people to get that without resorting to Java
> >>>> code.
> >>>>
> >>>> -Yonik
> >>>>
> >>>
> >>> Theoretically and practically, an embedded solution will be faster
> >>> than going through HTTP/XML.
> >>
> >> This is only true if the HTTP interface adds significant overhead to
> >> the cost of indexing a document, and I don't see why this should be
> >> so, as indexing is relatively heavyweight.  Setting up the connection
> >> could be expensive, but this can be greatly mitigated by sending more
> >> than one doc per HTTP request, using persistent connections, and
> >> threading.
> >>
> >> -Mike
> >>
> >
> >
> >
> > --
> > Regards,
> >
> > Cuong Hoang
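Mike's three suggestions (several documents per request, persistent connections, threading) can be combined in a small driver. This is a sketch under assumptions: ParallelBatchIndexer and BatchSender are hypothetical names, and the actual sending (for instance one streamed <add> POST per batch over HttpURLConnection, which reuses keep-alive connections when each response is read to the end) is left to the caller, so the batching and threading logic does not depend on any Solr client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Splits documents into batches and sends them from a fixed pool of threads. */
final class ParallelBatchIndexer {
    /** Hypothetical hook that sends one batch, e.g. as a single <add> POST. */
    interface BatchSender<T> {
        void send(List<T> batch) throws Exception;
    }

    private ParallelBatchIndexer() {}

    /** Sends docs in batches of batchSize using `threads` workers; returns the batch count. */
    static <T> int indexAll(List<T> docs, int batchSize, int threads, BatchSender<T> sender)
            throws Exception {
        if (batchSize < 1 || threads < 1) {
            throw new IllegalArgumentException("batchSize and threads must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                // Copy the slice so each task owns its batch independently of docs.
                List<T> batch =
                        new ArrayList<>(docs.subList(start, Math.min(start + batchSize, docs.size())));
                pending.add(pool.submit(() -> {
                    sender.send(batch);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for the batch and rethrows any failure, wrapped
            }
            return pending.size();
        } finally {
            pool.shutdown(); // stop accepting work; already-submitted batches still finish
        }
    }
}
```

The pool size caps how many batches are in flight at once, so together with the batch size it is the main knob for trading client memory against indexing throughput.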
