Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-12 Thread Mikhail Khludnev
Tom, note about https://issues.apache.org/jira/browse/SOLR-6559 and https://issues.apache.org/jira/browse/SOLR-3585. They seem relevant. On Fri, Dec 12, 2014 at 7:31 PM, Tom Burton-West wrote: > Thanks everybody for the information. > > Shawn, thanks for bringing up the issues around making sure

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-12 Thread Tom Burton-West
Thanks everybody for the information. Shawn, thanks for bringing up the issues around making sure each document is indexed ok. With our current architecture, that is important for us. Yonik's clarification about streaming really helped me to understand one of the main advantages of CUSS: >>When

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-12 Thread Shawn Heisey
On 12/12/2014 3:49 AM, Michael Della Bitta wrote: > I seem to remember being able to do something about errors with the > handleError method, but I must have had to do it in a custom subclass to > actually have visibility into what exactly went wrong. Although it may be possible to override the ha

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-12 Thread Michael Della Bitta
Shawn: I seem to remember being able to do something about errors with the handleError method, but I must have had to do it in a custom subclass to actually have visibility into what exactly went wrong. On Dec 11, 2014 9:28 PM, "Shawn Heisey" wrote: > On 12/11/2014 9:19 AM, Michael Della Bitta w

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-12 Thread Mikhail Khludnev
On Wed, Dec 10, 2014 at 10:12 PM, Tom Burton-West wrote: > I have very large XML documents, and the examples I see all build documents > by adding fields in Java code. Is there an example that actually reads XML > files from the file system? > Tom, What's the possible architecture, can you let S

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Shawn Heisey
On 12/11/2014 9:19 AM, Michael Della Bitta wrote: > Only thing you have to worry about (in both the CUSS and the home grown > case) is a single bad document in a batch fails the whole batch. It's up > to you to fall back to writing them individually so the rest of the > batch makes it in. With CUS
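
The fallback described in the quoted text (resend a failed batch one document at a time so the good documents still get in) could be sketched roughly as below. This is only an illustration: `BatchFallback`, `Sender`, and `sendWithFallback` are hypothetical names, not part of SolrJ, and the sketch assumes that resending a document that was already indexed is harmless (for example, because each document carries a unique key and a re-add simply overwrites it).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of SolrJ.
final class BatchFallback {
    private BatchFallback() {}

    /** Hypothetical stand-in for whatever sends documents to Solr; throws if the request is rejected. */
    interface Sender<T> {
        void send(List<T> docs) throws Exception;
    }

    /**
     * Tries to send the whole batch in one request. If that fails, resends
     * each document on its own and returns the documents that still failed.
     */
    static <T> List<T> sendWithFallback(List<T> batch, Sender<T> sender) {
        List<T> failed = new ArrayList<>();
        try {
            sender.send(batch);
            return failed; // whole batch accepted, nothing failed
        } catch (Exception batchError) {
            // The batch was rejected as a unit; retry document by document below.
        }
        for (T doc : batch) {
            try {
                sender.send(Collections.singletonList(doc));
            } catch (Exception docError) {
                failed.add(doc); // the caller can log or retry these
            }
        }
        return failed;
    }
}
```

The returned list gives the caller the per-document visibility that a single batch request does not.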

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Yonik Seeley
On Thu, Dec 11, 2014 at 11:52 AM, Alexandre Rafalovitch wrote: > On 11 December 2014 at 11:40, Yonik Seeley wrote: >> So to Solr (server side), it looks like a single update request >> (assuming 1 thread) with a batch of multiple documents... but it was >> never actually "batched" on the client s

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Alexandre Rafalovitch
On 11 December 2014 at 11:40, Yonik Seeley wrote: > So to Solr (server side), it looks like a single update request > (assuming 1 thread) with a batch of multiple documents... but it was > never actually "batched" on the client side. Does Solr also index them one-by-one as it parses them off th

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Yonik Seeley
On Wed, Dec 10, 2014 at 6:09 PM, Erick Erickson wrote: > So CUSS will do something like this: > 1> assemble a packet for Solr > 2> pass off the actual transmission > to Solr to a thread and immediately > go back to <1>. > > Basically, CUSS is doing async processing. The more important p
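
The two quoted steps (assemble a packet, hand the transmission to a thread, go straight back to assembling) can be illustrated with a plain thread pool. This is a simplified sketch and not how CUSS is actually implemented: `AsyncBatchSender` and `sendAll` are hypothetical names, the `transmit` callback stands in for the real HTTP request to Solr, and the pool's queue is unbounded here for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper, not part of SolrJ.
final class AsyncBatchSender {
    private AsyncBatchSender() {}

    /**
     * Splits docs into packets and hands each packet to a worker thread, so the
     * caller keeps assembling while earlier packets are still being transmitted.
     * Returns the number of packets submitted.
     */
    static <T> int sendAll(List<T> docs, int packetSize, int threads, Consumer<List<T>> transmit) {
        if (packetSize <= 0) {
            throw new IllegalArgumentException("packetSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int packets = 0;
        for (int start = 0; start < docs.size(); start += packetSize) {
            // 1> assemble a packet
            int end = Math.min(start + packetSize, docs.size());
            List<T> packet = new ArrayList<>(docs.subList(start, end));
            // 2> pass the transmission to a worker and immediately loop back to 1>
            pool.execute(() -> transmit.accept(packet));
            packets++;
        }
        pool.shutdown();
        try {
            // Wait for in-flight packets before returning.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return packets;
    }
}
```

Note that an exception thrown inside `transmit` surfaces on the worker thread rather than in the caller, which is the same visibility trade-off discussed elsewhere in this thread.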

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Mikhail Khludnev
Agree with Erick. However, I suppose you can try to provide your own RequestWriter, and let it stream XML. btw, what's in them? How does Solr handle them right now? Why don't you want to start from the test? On Thu, Dec 11, 2014 at 7:04 PM, Erick Erickson wrote: > I don't think so, it uses SolrInpu

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Michael Della Bitta
Tom: ConcurrentUpdateSolrServer isn't magic or anything. You could pretty trivially write something that takes batches of your XML documents and combines them into a single document (multiple <doc> tags in the <add> section) and sends them up to Solr and achieves some of the same speed benefits. If yo
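
In Solr's XML update format, each document is a `<doc>` element and an update request wraps one or more of them in a single `<add>` element. Combining documents that are already in XML form might therefore look something like the sketch below. `AddBatchBuilder` and `wrapInAdd` are hypothetical names, not part of SolrJ, and the sketch assumes each input string is already a well-formed `<doc>...</doc>` fragment with no XML declaration of its own.

```java
import java.util.List;

// Hypothetical helper, not part of SolrJ.
final class AddBatchBuilder {
    private AddBatchBuilder() {}

    /** Joins already well-formed <doc>...</doc> fragments into one <add> request body. */
    static String wrapInAdd(List<String> docFragments) {
        StringBuilder body = new StringBuilder("<add>");
        for (String doc : docFragments) {
            body.append(doc); // assumed to be a complete <doc> element
        }
        return body.append("</add>").toString();
    }
}
```

The resulting string can then be posted to the update handler as one request instead of one request per document.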

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Erick Erickson
I don't think so, it uses SolrInputDocuments and lists thereof. So if you parse the XML and then put things in SolrInputDocuments... Or something like that. Erick On Thu, Dec 11, 2014 at 9:43 AM, Tom Burton-West wrote: > Thanks Erick, > > That is helpful. We already have a process that works

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-11 Thread Tom Burton-West
Thanks Erick, That is helpful. We already have a process that works similarly. Each thread/process that sends a document to Solr waits until it gets a response in order to make sure that the document was indexed successfully (we log errors and retry docs that don't get indexed successfully), howe

Re: Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-10 Thread Erick Erickson
The process if you don't use CUSS is this: 1> assemble the packet of docs 2> send it to Solr 3> wait until Solr is done indexing it 4> start assembling the second doc. So, several things are going on here. 1> the client is sitting idle while Solr is indexing and 2> Solr is sitting idle when t

Details on why ConccurentUpdateSolrServer is reccommended for maximum index performance

2014-12-10 Thread Tom Burton-West
Hello all, In the example schema.xml for Solr 4.10.2 this comment is listed under the "PERFORMANCE NOTE" "For maximum indexing performance, use the ConcurrentUpdateSolrServer java client." Is there some documentation somewhere that explains why this will maximize indexing performance? In par