Re: ignoring bad documents during index

2015-02-22 Thread SolrUser1543
What I tried is to make an update processor with try/catch inside processAdd. This update processor was the last one in the update chain. In the catch statement I tried to add the id of the failed item to the response. This information (about failed items) is lost somewhere when the request is redirected
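The try/catch-in-processAdd approach described above can be sketched without a running Solr core. This is a minimal stand-alone model of the same mechanics; `Downstream`, `FailureCollectingProcessor`, and the map-based document are hypothetical stand-ins for Solr's `UpdateRequestProcessor` and `AddUpdateCommand`, not real Solr API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-ins for Solr's UpdateRequestProcessor chain; only the try/catch
// pattern mirrors the approach described in the thread.
public class FailureCollectingProcessor {
    /** Models the next processor in the chain, which may reject a doc. */
    interface Downstream {
        void processAdd(Map<String, Object> doc) throws Exception;
    }

    private final Downstream next;
    private final List<String> failedIds = new ArrayList<>();

    FailureCollectingProcessor(Downstream next) { this.next = next; }

    /** Catch failures per document and record the id instead of aborting. */
    void processAdd(Map<String, Object> doc) {
        try {
            next.processAdd(doc);
        } catch (Exception e) {
            // In a real UpdateRequestProcessor this id would be attached to
            // the SolrQueryResponse; here we just collect it.
            failedIds.add(String.valueOf(doc.get("id")));
        }
    }

    List<String> getFailedIds() { return failedIds; }

    public static void main(String[] args) {
        Downstream strict = doc -> {
            if (doc.get("price") instanceof String)
                throw new Exception("wrong field value");
        };
        FailureCollectingProcessor p = new FailureCollectingProcessor(strict);
        Map<String, Object> good = new LinkedHashMap<>();
        good.put("id", "D1"); good.put("price", 10);
        Map<String, Object> bad = new LinkedHashMap<>();
        bad.put("id", "D2"); bad.put("price", "oops");
        p.processAdd(good);
        p.processAdd(bad);
        System.out.println(p.getFailedIds()); // [D2]
    }
}
```

As the thread notes, the hard part in SolrCloud is not this local bookkeeping but getting the collected ids back to the request initiator once the update has been distributed across shards.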

Re: ignoring bad documents during index

2015-02-22 Thread Mikhail Khludnev
Can you use CloudSolrServer to submit the XMLs? That omits the intermediate relay, and might make it simpler to return additional info in the response. Regarding experimenting with DistributedUpdateProcessor: you can copy it, make its factory implement DistributingUpdateProcessorFactory, and add your processor factory

Re: ignoring bad documents during index

2015-02-22 Thread SolrUser1543
I'm not using replicas. Is this class relevant anyway? Is there any way to not change this class, but inherit from it and do try/catch in processAdd? -- View this message in context: http://lucene.472066.n3.nabble.com/ignoring-bad-documents-during-index-tp4176947p4188008.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: ignoring bad documents during index

2015-02-22 Thread SolrUser1543
We are working with the following configuration: there is an indexer service that prepares a bulk of XMLs. Those XMLs are received by a shard which is used only for distributing the request among the shards (let's call it GW). Some of the shards could return OK, some 400 (wrong field value), some 500 (becau

Re: ignoring bad documents during index

2015-02-22 Thread Mikhail Khludnev
On Sun, Feb 22, 2015 at 12:20 PM, SolrUser1543 wrote: > Does anyone know where it is? The local update on the leader happens first (assuming you use CloudSolrServer): https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java#L70

Re: ignoring bad documents during index

2015-02-22 Thread SolrUser1543
I think you did not understand the question. The problem is indexing via the cloud: when one shard gets a request, it distributes it among the others, and in case of an error on one of them that information is not passed back to the request initiator. Does anyone know where it is?

Re: ignoring bad documents during index

2015-02-20 Thread Michael Della Bitta
At the layer right before you send that XML out, add a fallback option on error: if there's a failure with the batch, send each document one at a time.
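The fallback idea above can be sketched generically. The sender is abstracted as a `Consumer` that throws on a bad batch; `indexWithFallback` and the string "documents" are hypothetical names chosen for the sketch, not a particular Solr client API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of "try the batch; on failure, fall back to one-at-a-time". */
public class BatchFallback {
    /** Sends docs via `send`; returns the docs that still fail individually. */
    static List<String> indexWithFallback(List<String> batch, Consumer<List<String>> send) {
        List<String> failed = new ArrayList<>();
        try {
            send.accept(batch);           // fast path: whole batch at once
        } catch (RuntimeException batchError) {
            for (String doc : batch) {    // slow path: isolate the bad docs
                try {
                    send.accept(List.of(doc));
                } catch (RuntimeException docError) {
                    failed.add(doc);
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Simulated sender that rejects any batch containing "bad".
        Consumer<List<String>> sender = docs -> {
            if (docs.contains("bad")) throw new RuntimeException("400: wrong field value");
        };
        System.out.println(indexWithFallback(List.of("d1", "bad", "d3"), sender)); // [bad]
    }
}
```

The design keeps the common case (a clean batch) at one request, and pays the per-document cost only when a batch actually fails.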

Re: ignoring bad documents during index

2015-02-20 Thread SolrUser1543
I am sending a bulk of XML via an HTTP request, the same way as indexing via "Documents" in the Solr admin interface.

Re: ignoring bad documents during index

2015-02-20 Thread Gora Mohanty
On 20 February 2015 at 15:31, SolrUser1543 wrote: > I want to experiment with this issue; where exactly should I take a look? > I want to try to fix this missing aggregation. > What class is responsible for that? Are you indexing through SolrJ, DIH, or something else? Regards,

Re: ignoring bad documents during index

2015-02-20 Thread SolrUser1543
I want to experiment with this issue; where exactly should I take a look? I want to try to fix this missing aggregation. What class is responsible for that?

Re: ignoring bad documents during index

2015-01-10 Thread Erick Erickson
There are some significant throughput improvements when you batch up a bunch of docs to Solr (assuming SolrJ). You can go ahead and send, say, 1,000 docs in a batch, and if the batch fails, re-process the list to find the bad doc. But as Jack says, Solr could do better here. Best, Erick
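The "send 1,000, then re-process on failure" advice above can be made cheaper than re-sending one document at a time by bisecting the failed batch. This is a generic sketch (the class and method names are made up, and any sender that throws on a bad batch will do); it isolates k bad documents in roughly O(k log n) sends:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Binary-search a failed batch to isolate the bad documents. */
public class BatchBisect {
    static List<String> findBadDocs(List<String> batch, Consumer<List<String>> send) {
        List<String> bad = new ArrayList<>();
        try {
            send.accept(batch);                 // whole batch OK: nothing bad here
        } catch (RuntimeException e) {
            if (batch.size() == 1) {
                bad.add(batch.get(0));          // narrowed down to a single doc
            } else {
                int mid = batch.size() / 2;     // split and recurse on each half
                bad.addAll(findBadDocs(batch.subList(0, mid), send));
                bad.addAll(findBadDocs(batch.subList(mid, batch.size()), send));
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Simulated sender that rejects any batch containing a bad doc.
        Consumer<List<String>> sender = docs -> {
            if (docs.contains("bad1") || docs.contains("bad2"))
                throw new RuntimeException("simulated 400 from Solr");
        };
        List<String> docs = List.of("d1", "bad1", "d3", "d4", "bad2", "d6");
        System.out.println(findBadDocs(docs, sender)); // [bad1, bad2]
    }
}
```

Note that each "send" here would be a real indexing request, so documents in passing halves get indexed more than once; with Solr's uniqueKey-based overwrite semantics that is normally harmless, but it is worth knowing.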

Re: ignoring bad documents during index

2015-01-10 Thread Jack Krupansky
Sending individual documents will give you absolute control; just make sure not to "commit" on each document sent, since that would really slow down indexing. You could also send smaller batches, like 5 to 20 documents, to balance between fine control and performance. It also depends on your docume

Re: ignoring bad documents during index

2015-01-10 Thread SolrUser1543
Would it be a good solution to index single documents instead of a bulk? In this case I would know the status of each message. What is the recommendation in this case: bulk vs. single?

Re: ignoring bad documents during index

2015-01-10 Thread Jack Krupansky
Correct, Solr clearly needs improvement in this area. Feel free to comment on the Jira about what options you would like to see supported. -- Jack Krupansky

Re: ignoring bad documents during index

2015-01-10 Thread SolrUser1543
From reading this (https://issues.apache.org/jira/browse/SOLR-445) I see that no solution is available for the issue of aggregating responses from several Solr instances. Is Solr not able to do that?

Re: ignoring bad documents during index

2015-01-08 Thread Chris Hostetter
: I have implemented an update processor as described above. : On a single Solr instance it works fine. : When I test it on

Re: ignoring bad documents during index

2015-01-07 Thread SolrUser1543
I have implemented an update processor as described above. On a single Solr instance it works fine. When I test it on SolrCloud with several nodes and try to index a few documents, some of which are incorrect, each instance creates its own response, but it is not aggregated by the ins

Re: ignoring bad documents during index

2015-01-01 Thread Mikhail Khludnev
Hello, please find answers inline. On Thu, Jan 1, 2015 at 11:59 PM, SolrUser1543 wrote: > 1. Is it possible to ignore such an error and continue to index D4? This can be done by catching and swallowing an exception in a custom UpdateRequestProcessor. > 2. What will be the best way to add an information