Re: preserve special characters

Mingfeng Yang Tue, 18 Jun 2013 17:09:11 -0700

Hi Jack,

That seems like the solution I am looking for. Thanks so much!


// I couldn't find this "types" attribute of the WDF documented anywhere.
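The "types" file in the reply below has one mapping per line, in the form `character => TYPE`. Here is a minimal Python sketch of reading such a file into a dictionary. The function name `parse_types` and the set of accepted type names are assumptions, not Solr's own code. The sketch also ignores any escape syntax the real factory may support.

```python
# Hypothetical parser for a WDF-style "types" file (lines like "@ => ALPHA").
# A sketch only: this is not Lucene's parser, and escape sequences are not handled.

# Assumed set of type names; check the WordDelimiterFilterFactory docs.
VALID_TYPES = {"LOWER", "UPPER", "ALPHA", "DIGIT", "ALPHANUM", "SUBWORD_DELIM"}


def parse_types(text: str) -> dict[str, str]:
    """Map each single character to its declared type name."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        char, sep, type_name = line.partition("=>")
        if not sep:
            raise ValueError(f"missing '=>' in line: {raw!r}")
        char, type_name = char.strip(), type_name.strip()
        if len(char) != 1:
            raise ValueError(f"expected a single character, got {char!r}")
        if type_name not in VALID_TYPES:
            raise ValueError(f"unknown type {type_name!r}")
        mapping[char] = type_name
    return mapping
```

For the file in this thread, `parse_types("@ => ALPHA\n_ => ALPHA")` gives `{'@': 'ALPHA', '_': 'ALPHA'}`.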

Ming-


On Tue, Jun 18, 2013 at 4:52 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> The WordDelimiterFilter (WDF) has a "types" attribute that can specify one
> or more character type mapping files. You could create a file like:
>
> @ => ALPHA
> _ => ALPHA
>
> For example (from the book!):
>
> Example - Treat at-sign and underscores as text
>
>  <fieldType name="text_at_under" class="solr.TextField"
>             positionIncrementGap="100" autoGeneratePhraseQueries="true">
>    <analyzer>
>      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>      <filter class="solr.WordDelimiterFilterFactory"
>              types="at-under-alpha.txt"/>
>    </analyzer>
>  </fieldType>
>
> The file at-under-alpha.txt would contain:
>
>  @ => ALPHA
>  _ => ALPHA
>
> The analysis results:
>
>    Source: Hello @World_bar, r@end.
>    Tokens: 1: Hello 2: @World_bar 3: r@end
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Mingfeng Yang
> Sent: Tuesday, June 18, 2013 6:58 PM
> To: solr-user@lucene.apache.org
> Subject: preserve special characters
>
>
> We need to index and search lots of tweets, which can look like "@solr:  solr
> is great" or "@solr_lucene, good combination".
>
> We want to search with "@solr" or "@solr_lucene". How can we preserve
> "@" and "_" in the index?
>
> If we use WhitespaceTokenizer followed by WordDelimiterFilter, @solr_lucene
> is broken into "solr" and "lucene", which makes the search results
> contain lots of irrelevant docs.
>
> If we use StandardTokenizer, the "@" symbol is stripped.
>
> Thanks,
> Ming-
>
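Here is a simplified Python model of WhitespaceTokenizer followed by WordDelimiterFilter. It shows why mapping "@" and "_" to ALPHA changes the analysis results in the thread above. This is an assumption-laden sketch, not Lucene code. It models only word-part splitting and ignores case-change splitting, number handling, catenation, and possessive stripping. The hypothetical `extra_alpha` parameter stands in for characters mapped to ALPHA in the types file.

```python
# Simplified model of WhitespaceTokenizer + WordDelimiterFilter (word parts only).
# Not Lucene code: case-change splits, numerics, catenation, etc. are ignored.


def analyze(text: str, extra_alpha: str = "") -> list[str]:
    """Split on whitespace, then split each word on any character that is
    not a letter, a digit, or listed in extra_alpha (treated as ALPHA)."""
    tokens = []
    for word in text.split():  # whitespace tokenization
        part = []
        for ch in word:
            if ch.isalnum() or ch in extra_alpha:
                part.append(ch)  # character belongs to the current subword
            elif part:  # delimiter: close off the current subword
                tokens.append("".join(part))
                part = []
        if part:  # flush the last subword of this word
            tokens.append("".join(part))
    return tokens
```

`analyze("Hello @World_bar, r@end.", "@_")` returns `['Hello', '@World_bar', 'r@end']`, which matches the book's analysis output. Without the extra characters, `analyze("@solr_lucene")` returns `['solr', 'lucene']`, which is the split described in the original question.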

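A toy matcher can illustrate the relevance problem from the original question, using the two sample tweets. This hypothetical Python sketch approximates the analysis with a regex over ASCII letters and digits, plus any `keep` characters treated as ALPHA. It uses a naive all-terms-must-match check in place of Solr's real query parsing. The names `tokens`, `matches`, and `keep` are illustrative only.

```python
import re

# Toy approximation of whitespace + word-delimiter analysis: runs of ASCII
# letters/digits, plus any characters in `keep`, form tokens; all else splits.


def tokens(text: str, keep: str = "") -> list[str]:
    return re.findall(r"[A-Za-z0-9" + re.escape(keep) + r"]+", text)


def matches(query: str, doc: str, keep: str = "") -> bool:
    """True if every query token occurs in the doc (a naive AND match)."""
    doc_tokens = set(tokens(doc, keep))
    return all(t in doc_tokens for t in tokens(query, keep))


docs = ["@solr:  solr is great", "@solr_lucene, good combination"]
for keep in ("", "@_"):
    hits = [d for d in docs if matches("@solr", d, keep)]
    print(repr(keep), hits)
```

With `keep=""`, both tweets match the query "@solr" because both contain the token `solr`. With `keep="@_"`, only the first tweet matches, because its "@solr:" is indexed as `@solr` while the second keeps `@solr_lucene` whole.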