Thanks for your helpful suggestions. I have considered other analyzers, but WDF has great strengths. I will first experiment with maintaining the transitions and then consider modifying the code.
F. Knudson

Mike Klaas wrote:
>
> On 30-Sep-07, at 12:47 PM, F Knudson wrote:
>
>> Is there a flag to disable the letter-number transition in the
>> solr.WordDelimiterFilterFactory? We are indexing category codes,
>> thesaurus codes for which this letter number transition makes no
>> sense. It is bloating the indexing (which is already large).
>
> Have you considered using a different analyzer?
>
> If you want to continue using WDF, you could make a quick change
> around line 320:
>
>     if (splitOnCaseChange == 0 &&
>         (lastType & ALPHA) != 0 && (type & ALPHA) != 0) {
>       // ALPHA->ALPHA: always ignore if case isn't considered.
>
>     } else if ((lastType & UPPER)!=0 && (type & LOWER)!=0) {
>       // UPPER->LOWER: Don't split
>     } else {
>
>     ...
>
> by adding a clause that catches ALPHA -> NUMERIC (and vice versa) and
> ignores it.
>
> Another approach that I am using locally is to maintain the
> transitions, but force tokens to be a minimum size (so r2d2 doesn't
> tokenize to four tokens but arrr2222deee2222 does).
>
> There is a patch here: http://issues.apache.org/jira/browse/SOLR-293
>
> If you vote for it, I promise to get it in for 1.3 <g>
>
> -Mike

--
View this message in context: http://www.nabble.com/Letter-number-transitions---can-this-be-turned-off-tf4544769.html#a13003019
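For anyone following the thread, here is a rough, standalone sketch of what "catching ALPHA -> NUMERIC (and vice versa) and ignoring it" could look like. It is not the WordDelimiterFilter source and does not use any Solr or Lucene API. The class name `TransitionSplitter`, the flag `splitOnLetterNumber`, and the simplified character types are all made up for illustration. Case-change handling is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a toy splitter, NOT Solr's WordDelimiterFilter. */
final class TransitionSplitter {
    // Hypothetical character classes, loosely echoing ALPHA / NUMERIC in the quoted snippet.
    private static final int ALPHA = 1;
    private static final int DIGIT = 2;
    private static final int OTHER = 4; // anything else is treated as a delimiter

    private static int typeOf(char c) {
        if (Character.isLetter(c)) return ALPHA;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    /**
     * Splits input on delimiter characters and, only when splitOnLetterNumber
     * is true, also on letter->digit and digit->letter transitions.
     */
    static List<String> split(String input, boolean splitOnLetterNumber) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int lastType = 0; // 0 means "no previous character in the current token"

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int type = typeOf(c);

            if (type == OTHER) {
                // Delimiters always end the current token and are dropped.
                flush(tokens, current);
                lastType = 0;
                continue;
            }

            // Only ALPHA and DIGIT reach this point, so a type change here
            // is exactly a letter<->number transition.
            boolean letterNumberTransition = lastType != 0 && type != lastType;
            if (letterNumberTransition && splitOnLetterNumber) {
                flush(tokens, current);
            }
            // When the flag is off, the transition is ignored and the token keeps growing.

            current.append(c);
            lastType = type;
        }
        flush(tokens, current);
        return tokens;
    }

    private static void flush(List<String> tokens, StringBuilder current) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }
}
```

With the flag on, `TransitionSplitter.split("r2d2", true)` gives the four tokens r, 2, d, 2. With it off, `split("r2d2", false)` keeps r2d2 whole, and a code such as "A61K-31" still splits on the hyphen into A61K and 31. The real filter tracks more state (case changes, concatenation options, positions), so treat this only as a picture of where the extra clause would sit.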