Re: Tracking down the input that hits an analysis chain bug

2014-01-16 Thread Benson Margulies
I think that https://issues.apache.org/jira/browse/SOLR-5623 should be ready to go. Would someone please commit from the PR? If there's a preference, I can attach a patch as well.
On Fri, Jan 10, 2014 at 1:37 PM, Benson Margulies wrote:
> Thanks, that's the recipe that I need.
>
> On Fri, Jan 10,

Re: Tracking down the input that hits an analysis chain bug

2014-01-10 Thread Benson Margulies
Thanks, that's the recipe that I need.
On Fri, Jan 10, 2014 at 11:40 AM, Chris Hostetter wrote:
>
> : Is there a neighborhood of existing tests I should be visiting here?
>
> You'll need a custom schema that refers to your new
> MockFailOnCertainTokensFilterFactory, so I would create a completely

Re: Tracking down the input that hits an analysis chain bug

2014-01-10 Thread Chris Hostetter
: Is there a neighborhood of existing tests I should be visiting here?
You'll need a custom schema that refers to your new MockFailOnCertainTokensFilterFactory, so I would create a completely new test class somewhere in ...solr.update (you're testing that an update fails with a clean error)

Re: Tracking down the input that hits an analysis chain bug

2014-01-10 Thread Benson Margulies
Is there a neighborhood of existing tests I should be visiting here?
On Fri, Jan 10, 2014 at 11:27 AM, Benson Margulies wrote:
> OK, patch forthcoming.
>
> On Fri, Jan 10, 2014 at 11:23 AM, Chris Hostetter
> wrote:
>>
>> : The problem manifests as this sort of thing:
>> :
>> : Jan 3, 2014 6:05:

Re: Tracking down the input that hits an analysis chain bug

2014-01-10 Thread Benson Margulies
OK, patch forthcoming.
On Fri, Jan 10, 2014 at 11:23 AM, Chris Hostetter wrote:
>
> : The problem manifests as this sort of thing:
> :
> : Jan 3, 2014 6:05:33 PM org.apache.solr.common.SolrException log
> : SEVERE: java.lang.IllegalArgumentException: startOffset must be
> : non-negative, and endO

Re: Tracking down the input that hits an analysis chain bug

2014-01-10 Thread Chris Hostetter
: The problem manifests as this sort of thing:
:
: Jan 3, 2014 6:05:33 PM org.apache.solr.common.SolrException log
: SEVERE: java.lang.IllegalArgumentException: startOffset must be
: non-negative, and endOffset must be >= startOffset,
: startOffset=-1811581632,endOffset=-1811581632
Is there a st

Re: Tracking down the input that hits an analysis chain bug

2014-01-05 Thread Michael Sokolov
I think you do (or can) get a log message for each document insert? If that's all you need, I think logging configuration will get you there. I use log4j and turn Solr's pretty verbose logging off using:
log4j.logger.org.apache.lucene.solr = WARN
assuming the rest of log4j is set up OK, I th

Re: Tracking down the input that hits an analysis chain bug

2014-01-04 Thread Benson Margulies
I rather assumed that there was some log4j-ish config to be set that would do this for me. Lacking one, I guess I'll end up there.
On Fri, Jan 3, 2014 at 8:23 PM, Michael Sokolov wrote:
> Have you considered using a custom UpdateProcessor to catch the exception
> and provide more context in the l

Re: Tracking down the input that hits an analysis chain bug

2014-01-03 Thread Michael Sokolov
Have you considered using a custom UpdateProcessor to catch the exception and provide more context in the logs?
-Mike
On 01/03/2014 03:33 PM, Benson Margulies wrote:
Robert, Yes, if the problem was not data-dependent, indeed I wouldn't need to index anything. However, I've run a small mountai
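[Editorial sketch, not part of the original message.] The suggestion above, catching the failure at the point where a document is added and rethrowing it with enough context to identify the offending input, might look roughly like the following. This is a minimal, Solr-free illustration: `IndexingContextSketch`, `Doc`, `withContext`, and `demo` are hypothetical names invented here, and a plain `Consumer` stands in for the next step of the update chain. In Solr itself the corresponding hook would presumably be a subclass of `UpdateRequestProcessor` that overrides `processAdd`, which this sketch does not attempt to reproduce.

```java
import java.util.function.Consumer;

// Hypothetical stand-in types: none of these names come from Solr.
final class IndexingContextSketch {

    /** Minimal document model: a unique id plus the raw text sent to analysis. */
    static final class Doc {
        final String id;
        final String text;

        Doc(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    /**
     * Wraps the next indexing step so that any runtime failure is rethrown
     * with the document id and a bounded preview of the input attached.
     * The original exception is kept as the cause.
     */
    static Consumer<Doc> withContext(Consumer<Doc> next) {
        return doc -> {
            try {
                next.accept(doc);
            } catch (RuntimeException e) {
                // Cap the preview so one huge document does not flood the log.
                String preview = doc.text.length() > 80
                        ? doc.text.substring(0, 80) + "..."
                        : doc.text;
                throw new IllegalStateException(
                        "Indexing failed for doc id=" + doc.id
                                + ", text starts with: \"" + preview + "\"", e);
            }
        };
    }

    /** Example run: a stand-in analysis step that fails on one specific input. */
    static String demo() {
        Consumer<Doc> failing = doc -> {
            if (doc.text.contains("bad")) {
                throw new IllegalArgumentException("startOffset must be non-negative");
            }
        };
        try {
            withContext(failing).accept(new Doc("doc-42", "some bad input"));
            return "no failure";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }
}
```

Keeping the original exception as the cause preserves the analysis-chain stack trace, while the wrapper message names the document that triggered it, which is the piece missing from the log excerpt quoted in this thread.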

Re: Tracking down the input that hits an analysis chain bug

2014-01-03 Thread Benson Margulies
Robert, Yes, if the problem was not data-dependent, indeed I wouldn't need to index anything. However, I've run a small mountain of data through our tokenizer on my machine, and never seen the error, but my customer gets these errors in the middle of a giant spew of data. As it happens, I _was_ mi

Re: Tracking down the input that hits an analysis chain bug

2014-01-03 Thread Robert Muir
This exception comes from OffsetAttributeImpl (i.e. you don't need to index anything to reproduce it). Maybe you have a missing clearAttributes() call (your tokenizer 'returns true' without calling that first)? This could explain it, if something like a StopFilter is also present in the chain: basi

Tracking down the input that hits an analysis chain bug

2014-01-03 Thread Benson Margulies
Using SolrCloud with 4.3.1. We've got a problem with a tokenizer that manifests as calling OffsetAtt.setOffsets() with invalid inputs. OK, so, we want to figure out what input provokes our code into getting into this pickle. The problem happens on SolrCloud nodes. The problem manifests as this