Hi Jack,

That seems like the solution I am looking for. Thanks so much!
//Can't find this "types" for WDF anywhere.

Ming-

On Tue, Jun 18, 2013 at 4:52 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> The WDF has a "types" attribute which can specify one or more character
> type mapping files. You could create a file like:
>
> @ => ALPHA
> _ => ALPHA
>
> For example (from the book!):
>
> Example - Treat at-sign and underscores as text
>
> <fieldType name="text_at_under" class="solr.TextField"
>            positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             types="at-under-alpha.txt"/>
>   </analyzer>
> </fieldType>
>
> The file at-under-alpha.txt would contain:
>
> @ => ALPHA
> _ => ALPHA
>
> The analysis results:
>
> Source: Hello @World_bar, r@end.
> Tokens: 1: Hello 2: @World_bar 3: r@end
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Mingfeng Yang
> Sent: Tuesday, June 18, 2013 6:58 PM
> To: solr-user@lucene.apache.org
> Subject: preserve special characters
>
> We need to index and search lots of tweets which can look like "@solr: solr
> is great" or "@solr_lucene, good combination".
>
> And we want to search with "@solr" or "@solr_lucene". How can we preserve
> "@" and "_" in the index?
>
> If using WhitespaceTokenizer followed by WordDelimiterFilter, @solr_lucene
> will be broken down into "solr" and "lucene", which makes the search results
> contain lots of non-relevant docs.
>
> If using StandardTokenizer, the "@" symbol is stripped.
>
> Thanks,
> Ming-
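
For anyone who wants to see why the type mapping changes the tokens, here is a rough Python sketch of the idea. It is only a simulation, not Solr or Lucene code: the function names (parse_types, split_token, analyze) are made up for illustration, and it assumes the filter's only job is to split on characters that are not letters or digits, ignoring the real filter's other options such as case-change and letter/digit splitting.

```python
def parse_types(text):
    """Parse lines like '@ => ALPHA' into a {char: type} dict (hypothetical helper)."""
    types = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        char, _, kind = line.partition("=>")
        types[char.strip()] = kind.strip().upper()
    return types


def split_token(token, types):
    """Split a token on every character that is neither alphanumeric
    nor remapped to ALPHA/DIGIT by the type map (simplified model)."""
    parts, current = [], ""
    for ch in token:
        if ch.isalnum() or types.get(ch) in ("ALPHA", "DIGIT"):
            current += ch
        else:
            if current:
                parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts


def analyze(text, types):
    """Whitespace-tokenize, then apply the simplified delimiter split."""
    return [part for tok in text.split() for part in split_token(tok, types)]


mapping = parse_types("@ => ALPHA\n_ => ALPHA")
print(analyze("Hello @World_bar, r@end.", {}))       # no mapping: @ and _ split
print(analyze("Hello @World_bar, r@end.", mapping))  # with mapping: kept in tokens
```

With the empty map, "@" and "_" act as delimiters and the tweet handles fall apart. With the two mappings they are treated as letters and stay inside the token, which matches the analysis results quoted above.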