Re: Special characters not indexed

Timothy Potter Tue, 12 Mar 2013 12:54:49 -0700

Just to add to Jack's points, you can also use the term query parser to
avoid all the escaping for special characters, e.g.


fq={!term f=some_field}<crazy&term#value%>

See Erik's preso from Apache Eurocon 2012 around 25:50 -
http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55822628
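
One caveat if you send that filter over plain HTTP rather than through a client library: the term parser saves you from query-parser escaping, but characters like &, # and % still have meaning in a URL, so the parameter value has to be URL-encoded. Here is a rough Python sketch of what I mean. The field name and term value are just the ones from the example above, the helper name is made up for illustration, and q=*:* is only a placeholder main query:

```python
from urllib.parse import urlencode, parse_qs


def build_term_filter(field, value):
    """Build a {!term} filter query; the value needs no query-parser escaping."""
    # Hypothetical helper, not part of any Solr client API.
    return "{!term f=" + field + "}" + value


raw_value = "<crazy&term#value%>"
params = {
    "q": "*:*",  # placeholder main query (assumption)
    "fq": build_term_filter("some_field", raw_value),
}

# urlencode percent-encodes &, #, %, < and > so the value survives the HTTP layer.
query_string = urlencode(params)

# Decoding the query string should give back the original, unescaped filter.
decoded = parse_qs(query_string)
print(query_string)
print(decoded["fq"][0])
```

Also keep in mind that, as far as I know, the term parser does not run the value through the query analyzer, so it has to match the indexed token exactly.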

On Tue, Mar 12, 2013 at 12:33 PM, Jack Krupansky <[email protected]> wrote:

> Use the white space tokenizer and be sure to escape a lot of them in
> queries since a number of them have meaning to the query parser. Or,
> enclose query terms in quotes.
>
> -- Jack Krupansky
>
> -----Original Message----- From: vsl
> Sent: Tuesday, March 12, 2013 11:16 AM
> To: [email protected]
> Subject: Special characters not indexed
>
>
> Hi,
> I am trying to index special characters and make them searchable.
>
> User Story:
> 1. Index document with content: §$ %&/( )=? +*#'-<>
> 2. Find indexed document using search term: &
>
> Additionally, I have several other fields that are copied to the textAll
> field. The search is performed on this field.
>
> Does anybody know how to deal with such cases?
>
> Field definition:
>
> <fieldType name="text_general" class="solr.TextField"
>            positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>                words="stopwords.txt" enablePositionIncrements="true" />
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.SnowballPorterFilterFactory"
>                language="English"/>
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
>                preserveOriginal="1" types="characters.txt" />
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>                words="stopwords.txt" enablePositionIncrements="true" />
>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                ignoreCase="true" expand="true"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.SnowballPorterFilterFactory"
>                language="English"/>
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
>                preserveOriginal="1" types="characters.txt" />
>      </analyzer>
>    </fieldType>
>
> where: characters.txt
>
> § => ALPHA
> $ => ALPHA
> % => ALPHA
> & => ALPHA
> / => ALPHA
> ( => ALPHA
> ) => ALPHA
> = => ALPHA
> ? => ALPHA
> + => ALPHA
> * => ALPHA
> # => ALPHA
> ' => ALPHA
> - => ALPHA
> < => ALPHA
> > => ALPHA
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Special-characters-not-indexed-tp4046630.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
