Use the whitespace tokenizer, and be sure to escape these characters in
queries, since a number of them have meaning to the query parser.
Alternatively, enclose the query terms in quotes.
-- Jack Krupansky
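For illustration, the suggestion above might look roughly like the following field type. This is only a sketch: the name text_ws_special is hypothetical, and the filter chain is deliberately reduced to a minimum. The likely reason the change helps is that StandardTokenizer discards most punctuation before any later filter (including WordDelimiterFilterFactory) gets to see it, whereas WhitespaceTokenizerFactory splits on whitespace only and passes characters such as & through as tokens.

```xml
<!-- Sketch only: "text_ws_special" is a made-up name, not from the original schema. -->
<fieldType name="text_ws_special" class="solr.TextField"
           positionIncrementGap="100">
  <!-- One analyzer without a type attribute applies to both index and query time. -->
  <analyzer>
    <!-- Splits on whitespace only, so punctuation-only tokens such as & are kept. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a field type along these lines, a query such as textAll:\& or textAll:"&" should reach the analyzer as the single token &. When the query is sent as a URL parameter, the & itself also has to be percent-encoded as %26, otherwise it is read as a parameter separator.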
-----Original Message-----
From: vsl
Sent: Tuesday, March 12, 2013 11:16 AM
To: solr-user@lucene.apache.org
Subject: Special characters not indexed
Hi,
I am trying to index special characters and make them searchable.
User Story:
1. Index document with content: §$ %&/( )=? +*#'-<>
2. Find indexed document using search term: &
Additionally, I have several other fields that are copied to the textAll field.
The search is performed on this field.
Does anybody know how to deal with such cases?
Field definition:
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="English"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            preserveOriginal="1" types="characters.txt" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="English"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            preserveOriginal="1" types="characters.txt" />
  </analyzer>
</fieldType>
where: characters.txt
§ => ALPHA
$ => ALPHA
% => ALPHA
& => ALPHA
/ => ALPHA
( => ALPHA
) => ALPHA
= => ALPHA
? => ALPHA
+ => ALPHA
* => ALPHA
# => ALPHA
' => ALPHA
- => ALPHA
< => ALPHA
> => ALPHA
--
View this message in context:
http://lucene.472066.n3.nabble.com/Special-characters-not-indexed-tp4046630.html