Hi Devon,

Something like this should work for you (untested!):

<analyzer>
  <!-- Remove non-"word" characters; only underscores, ASCII letters & digits are kept -->
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\W+" replacement=""/>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
</analyzer>
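If it helps to see what each stage does, here is a rough Python approximation of that chain. It is not Solr itself. analyze() is a made-up helper name. re.ASCII is there on the assumption that Solr compiles the pattern without flags, so \W behaves like Java's default, ASCII-only class (anything outside [A-Za-z0-9_]):

```python
import re


def analyze(text: str, gram_size: int = 2) -> list:
    """Approximate the charFilter -> tokenizer -> filters chain above."""
    # charFilter: drop every run of non-word characters (spaces, dots, ...).
    # re.ASCII mimics Java's default \W, which treats only [A-Za-z0-9_] as word chars.
    stripped = re.sub(r"\W+", "", text, flags=re.ASCII)
    # KeywordTokenizer: the whole (now whitespace-free) input becomes one token.
    token = stripped
    # LowerCaseFilter
    token = token.lower()
    # NGramFilter with minGramSize == maxGramSize == gram_size
    return [token[i:i + gram_size] for i in range(len(token) - gram_size + 1)]


if __name__ == "__main__":
    for phrase in ["bobdole", "Bob Dole", "Bo B. Dole", "Bobdo"]:
        print(f"{phrase!r:14} -> {analyze(phrase)}")
```

If the same analyzer is also used at query time, "bobdole", "Bob Dole" and "Bo B. Dole" all reduce to the same six bigrams (bo, ob, bd, do, ol, le). "Bobdo" shares four of them, so whether it matches depends on how the query parser combines the grams (e.g. OR versus a minimum-should-match setting).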

Steve

> -----Original Message-----
> From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com]
> Sent: Monday, December 12, 2011 4:52 PM
> To: 'solr-user@lucene.apache.org'
> Subject: Removing whitespace
> 
> Hello,
> 
> I am having trouble finding how to remove/ignore whitespace when indexing.
> The only answer I have found suggested that it is necessary to write my
> own tokenizer. Is this true? I want to remove whitespace and special
> characters from the phrase and create N-grams from the result.
> 
> Ultimately, the effect I am after is that searching "bobdole" would match
> "Bob Dole", "Bo B. Dole", and maybe "Bobdo". Maybe there is a better
> way... can anyone lend some assistance?
> 
> Thanks!
> 
> Dev B
