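The WordDelimiterFilter (WDF) has a "types" attribute that names one or more
character type mapping files, which override how the filter classifies
individual characters. To have "@" and "_" treated as letters rather than as
delimiters, you could create a file like: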
@ => ALPHA
_ => ALPHA
For example (from the book!):
Example - Treat at-sign and underscores as text
<fieldType name="text_at_under" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            types="at-under-alpha.txt"/>
  </analyzer>
</fieldType>
The file at-under-alpha.txt would contain:
@ => ALPHA
_ => ALPHA
The analysis results:
Source: Hello @World_bar, r@end.
Tokens: 1: Hello 2: @World_bar 3: r@end
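
To see why the mapping has this effect, here is a toy Python model of the
split step. It is only a sketch, not the Lucene implementation: the real
filter also handles case changes, digit transitions and the catenate options,
and the function names (parse_types, analyze) are made up for illustration.

```python
def parse_types(text):
    """Parse 'char => TYPE' lines into a {char: type} dict."""
    types = {}
    for line in text.splitlines():
        if "=>" not in line:
            continue  # skip blank or unrelated lines
        char, _, type_name = line.partition("=>")
        types[char.strip()] = type_name.strip()
    return types


def analyze(text, types):
    """Whitespace-tokenize, then split each token on delimiter characters.

    A character counts as part of a word if it is alphanumeric or if the
    types mapping reclassifies it as ALPHA; anything else is a delimiter.
    """
    tokens = []
    for raw in text.split():
        current = ""
        for ch in raw:
            if ch.isalnum() or types.get(ch) == "ALPHA":
                current += ch
            elif current:
                # Hit a delimiter: emit the word collected so far.
                tokens.append(current)
                current = ""
        if current:
            tokens.append(current)
    return tokens


source = "Hello @World_bar, r@end."
print(analyze(source, {}))
# ['Hello', 'World', 'bar', 'r', 'end']
print(analyze(source, parse_types("@ => ALPHA\n_ => ALPHA")))
# ['Hello', '@World_bar', 'r@end']
```

With an empty mapping the toy model splits on "@" and "_", which is the
behavior described in the question; with the two ALPHA entries it yields the
same three tokens as the analysis results above.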
-- Jack Krupansky
-----Original Message-----
From: Mingfeng Yang
Sent: Tuesday, June 18, 2013 6:58 PM
To: solr-user@lucene.apache.org
Subject: preserve special characters
We need to index and search lots of tweets, which can look like "@solr: solr
is great" or "@solr_lucene, good combination".
And we want to search with "@solr" or "@solr_lucene". How can we preserve
"@" and "_" in the index?
If we use WhitespaceTokenizer followed by WordDelimiterFilter, "@solr_lucene"
is broken down into "solr" and "lucene", which makes the search results
contain lots of irrelevant docs.
If we use StandardTokenizer, the "@" symbol is stripped.
Thanks,
Ming-