Hi Devon,

Something like this should work for you (untested!):
<analyzer>
  <!-- Remove non-"word" characters; only underscores, letters & numbers allowed -->
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\W+" replacement=""/>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
</analyzer>

Steve

> -----Original Message-----
> From: Devon Baumgarten [mailto:dbaumgar...@nationalcorp.com]
> Sent: Monday, December 12, 2011 4:52 PM
> To: 'solr-user@lucene.apache.org'
> Subject: Removing whitespace
>
> Hello,
>
> I am having trouble finding how to remove/ignore whitespace when indexing.
> The only answer I have found suggested that it is necessary to write my
> own tokenizer. Is this true? I want to remove whitespace and special
> characters from the phrase and create N-grams from the result.
>
> Ultimately, the effect I am after is that searching "bobdole" would match
> "Bob Dole", "Bo B. Dole", and maybe "Bobdo". Maybe there is a better
> way... can anyone lend some assistance?
>
> Thanks!
>
> Dev B
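
For context, an analyzer like the one suggested in the reply normally sits inside a fieldType in schema.xml, and a field then refers to that type. The following is only a sketch of how that might look, equally untested; the names "text_nospace_bigram" and "name_bigram" are made up for illustration and do not come from the thread.

```xml
<!-- Hypothetical type name; pick whatever fits your schema -->
<fieldType name="text_nospace_bigram" class="solr.TextField">
  <!-- No type="index"/"query" attribute, so the same chain runs at index and query time -->
  <analyzer>
    <!-- Strip everything that is not a letter, digit, or underscore -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\W+" replacement=""/>
    <!-- Keep the whole (stripped) value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Break that single token into 2-character grams -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field name using the type above -->
<field name="name_bigram" type="text_nospace_bigram" indexed="true" stored="false"/>
```

With this chain, "Bob Dole" would be reduced to "bobdole" before the 2-grams are generated, so a query for "bobdole" should produce the same grams. Whether partial overlaps such as "Bobdo" count as a match depends on how the query parser combines the grams (for example, the default operator), which is worth checking on the Analysis page of the Solr admin UI.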