Keep two or three copies of the text. One field could be a raw string,
boosted heavily for exact matches. A second could be a text field that uses
the keyword tokenizer plus a lowercase filter, also heavily boosted. The
third would be general, tokenized text with a lower boost. You could also
add a copy that uses the keyword tokenizer to keep the value as a single
token, applies a regex filter to strip special characters, and then applies
a lowercase filter; give that one an intermediate boost.
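
A rough sketch of what that could look like in schema.xml. The field and
type names (body, body_raw, body_lc, body_stripped, text_exact_lc,
text_exact_stripped), the regex, and the boost values are all made up for
illustration, and text_en stands in for whatever type holds your existing
analyzer. Treat it as a starting point, not a tested configuration.

```xml
<!-- Sketch only: all names below are hypothetical. -->

<!-- In the <types> section -->

<!-- Raw string, no analysis (skip if your schema already defines it) -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Whole value kept as one token, lowercased -->
<fieldType name="text_exact_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Whole value kept as one token, special characters stripped, lowercased -->
<fieldType name="text_exact_stripped" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Assumed pattern: drop everything except letters, digits and spaces -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^A-Za-z0-9 ]" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- In the <fields> section -->
<field name="body"          type="text_en"             indexed="true" stored="true"/>
<field name="body_raw"      type="string"              indexed="true" stored="false"/>
<field name="body_lc"       type="text_exact_lc"       indexed="true" stored="false"/>
<field name="body_stripped" type="text_exact_stripped" indexed="true" stored="false"/>

<!-- After </fields>: index the same text into every copy -->
<copyField source="body" dest="body_raw"/>
<copyField source="body" dest="body_lc"/>
<copyField source="body" dest="body_stripped"/>

<!-- Query side, for example with the dismax parser (boosts are guesses):
     qf=body_raw^10 body_lc^8 body_stripped^4 body^1 -->
```

Note that the keyword-tokenized copies only match when the query equals the
entire field value, so they suit short values like the "(Solr)" example. With
this layout, a query for "(solr)" that reaches the analyzer intact should
match body_lc for document #1 only, while "solr" should match body_stripped
for both documents.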
-- Jack Krupansky
-----Original Message-----
From: johnmu...@aol.com
Sent: Thursday, October 24, 2013 9:20 AM
To: solr-user@lucene.apache.org
Subject: Searching on special characters
Hi,
How should I set up Solr so I can search for, and get hits on, special characters
such as: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
My need is, if a user has text like so:
Doc-#1: "(Solr)"
Doc-#2: "Solr"
and they type "(solr)", I want a hit on "(solr)" only in document #1, with
the brackets matching. If they type "solr", they should get a hit in
document #2 only.
An additional nice-to-have is, if they type "solr", I want a hit in both
document #1 and #2.
Here is what my current schema.xml looks like:
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
          splitOnNumerics="1" stemEnglishPossessive="1" preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeywordMarkerFilterFactory"
          protected="protwords.txt"/>
  <filter class="solr.PorterStemFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
Currently, special characters are being stripped.
Any idea how I can configure Solr to do this? I'm using Solr 3.6.
Thanks !!
-MJ