I don't think you should apologise for highlighting embedded usage. For circumstances in which you're at liberty to run a Solr instance in the same JVM as an app which uses it, I find it very strange that you should have to use anything _other_ than embedded, and jump through all the unnecessary hoops (XML conversion, HTTP transport) that this implies. It's a bit like suggesting you should throw away Java method invocations altogether, and write everything in XML-RPC.
Bit of a pet issue of mine! I'll be creating a JIRA issue on the subject soon.

Jon

> -----Original Message-----
> From: Sundling, Paul [mailto:[EMAIL PROTECTED]
> Sent: 28 August 2007 03:24
> To: solr-user@lucene.apache.org
> Subject: RE: Embedded about 50% faster for indexing
>
> At this point I think I'm going to recommend against embedded,
> regardless of any performance advantage. The level of
> documentation is just too low, while the XML API is clearly
> documented. It's clear that XML is preferred.
>
> The embedded example on the wiki is pretty good, but until
> multiple core support comes out in the next version, you have
> to use multiple SolrCore. If they are accessed in the same
> webapp, then you can't just set JNDI (since you can only have
> one value). So you have to use a Config object as alluded to
> in the example. However, you look at the code and there is
> no javadoc for the constructor. The constructor args are
> (String name, InputStream is, String prefix). I think name
> is a unique name for the solr core, but that is a guess.
> InputStream may be a stream to the solr home, but it could be
> anything. Prefix may be a URI prefix. These are all guesses
> without trying to read through the code.
>
> When I look at SolrCore, it looks like it's a singleton, so
> maybe I can't even access more than one SolrCore using
> embedded anyway. :( So I apologize for highlighting Embedded.
>
> Anyway it's clear how to do multiple solr cores using XML.
> You just have different post URI for the different cores.
> You can easily inject that with Spring and externalize the
> config. Simple and easy. So I concede XML is the way to go.
> :)
>
> Paul Sundling
>
> -----Original Message-----
> From: Mike Klaas [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 27, 2007 5:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Embedded about 50% faster for indexing
>
>
> On 27-Aug-07, at 12:44 PM, Sundling, Paul wrote:
>
> > Whether embedded solr should give me a performance boost or not, it
> > did. :) I'm not surprised, since it skips XML parsing. Although you
> > never know where cycles are used for sure until you profile.
>
> It certainly is possible that XML parsing dwarfs indexing, but I'd
> expect that only to occur under very light analysis and field storage
> workloads.
>
> > I tried doing more records per post (200) and it was actually
> > slightly slower and seemed to require more memory. This makes sense
> > because you have to take up more memory for the StringBuilder to
> > store the much larger XML. For 10,000 it was much slower. For that
> > size I would need to use XML streaming or something to make it work.
> >
> > The solr war was on the same machine, so network overhead was only
> > from using loopback.
>
> The big question is still your connection handling strategy: are you
> using persistent http connections? Are you threadedly indexing?
>
> cheers,
> -Mike
>
> > Paul Sundling
> >
> > -----Original Message-----
> > From: climbingrose [mailto:[EMAIL PROTECTED]
> > Sent: Monday, August 27, 2007 12:22 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Embedded about 50% faster for indexing
> >
> >
> > Haven't tried the embedded server but I think I have to agree with
> > Mike. We're currently sending 2000 job batches to SOLR server and
> > the amount of time required to transfer documents over http is
> > insignificant compared with the time required to index them. So I do
> > think unless you are sending documents one by one, embedded SOLR
> > shouldn't give you much more performance boost.
> >
> > On 8/25/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> >>
> >> On 24-Aug-07, at 2:29 PM, Wu, Daniel wrote:
> >>
> >>>> -----Original Message-----
> >>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
> >>>> Yonik Seeley
> >>>> Sent: Friday, August 24, 2007 2:07 PM
> >>>> To: solr-user@lucene.apache.org
> >>>> Subject: Re: Embedded about 50% faster for indexing
> >>>>
> >>>> One thing I'd like to avoid is everyone trying to embed just for
> >>>> performance gains. If there is really that much difference, then
> >>>> we need a better way for people to get that without resorting to
> >>>> Java code.
> >>>>
> >>>> -Yonik
> >>>>
> >>>
> >>> Theoretically and practically, embedded solution will be faster
> >>> than going through http/xml.
> >>
> >> This is only true if the http interface adds significant overhead to
> >> the cost of indexing a document, and I don't see why this should be
> >> so, as indexing is relatively heavyweight. setting up the connection
> >> could be expensive, but this can be greatly mitigated by sending more
> >> than one doc per http request, using persistent connections, and
> >> threading.
> >>
> >> -Mike
> >>
> >
> >
> >
> > --
> > Regards,
> >
> > Cuong Hoang
> >
>
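
P.S. For anyone weighing Mike's suggestion above of sending more than one doc per HTTP request, here is a minimal sketch of what the batching side might look like in Java. It only builds the XML <add> messages. The class and method names (SolrBatchXml, buildAddMessage, buildBatches) are made up for illustration and are not part of Solr, and actually posting each message over a persistent connection is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of Solr: groups documents into batches and
// renders each batch as one XML <add> update message.
class SolrBatchXml {

    // Escape the characters that are special in XML text and attribute values.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Render one <add> message containing every document in the list.
    // Each document is a map from field name to field value.
    static String buildAddMessage(List<Map<String, String>> docs) {
        StringBuilder xml = new StringBuilder("<add>");
        for (Map<String, String> doc : docs) {
            xml.append("<doc>");
            for (Map.Entry<String, String> field : doc.entrySet()) {
                xml.append("<field name=\"").append(escape(field.getKey())).append("\">")
                   .append(escape(field.getValue()))
                   .append("</field>");
            }
            xml.append("</doc>");
        }
        return xml.append("</add>").toString();
    }

    // Split the documents into batches of at most batchSize and build one
    // message per batch, so each HTTP request can carry several documents.
    static List<String> buildBatches(List<Map<String, String>> docs, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<String> messages = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            messages.add(buildAddMessage(docs.subList(start, end)));
        }
        return messages;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", "doc" + i);
            doc.put("title", "Job posting " + i);
            docs.add(doc);
        }
        // Five documents in batches of two give three messages to post.
        for (String message : buildBatches(docs, 2)) {
            System.out.println(message);
        }
    }
}
```

The right batch size is something to measure rather than assume: in the thread above, 200 records per post was slightly slower for Paul, while 2000 per batch works well for Cuong.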