Keep two or three copies of the text in separate fields. One field could be a raw string, boosted heavily for exact matches. A second could be a text field that uses the keyword tokenizer with a lowercase filter, also heavily boosted. The third would be general, tokenized text with a lower boost. You could also add a copy that uses the keyword tokenizer to keep a single token but applies a regex filter to strip special characters, plus a lowercase filter, and give that an intermediate boost.
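
As a rough, untested sketch of that layout, the fragments below could be merged into schema.xml and solrconfig.xml. All field names, type names, the handler name, the regex, and the boost values are hypothetical placeholders chosen for illustration. "text_general_en" stands in for whatever type already holds the general analyzer, and the boosts are assumed to be applied through the qf parameter of the (e)dismax parser.

```xml
<!-- schema.xml fragments (names are illustrative, not from the original post) -->

<!-- 1. Raw string: matches only the exact, case-sensitive field value -->
<fieldType name="string_exact" class="solr.StrField" sortMissingLast="true"/>

<!-- 2. Single token, lowercased: "(Solr)" is indexed as "(solr)" -->
<fieldType name="keyword_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- 3. Single token, special characters stripped, lowercased: "(Solr)" becomes "solr" -->
<fieldType name="keyword_lc_stripped" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- assumed pattern: drop everything except ASCII letters, digits and spaces -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^A-Za-z0-9 ]" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- One source field plus its copies; "text_general_en" is the existing general type -->
<field name="body"          type="text_general_en"     indexed="true" stored="true"/>
<field name="body_exact"    type="string_exact"        indexed="true" stored="false"/>
<field name="body_lc"       type="keyword_lc"          indexed="true" stored="false"/>
<field name="body_stripped" type="keyword_lc_stripped" indexed="true" stored="false"/>

<copyField source="body" dest="body_exact"/>
<copyField source="body" dest="body_lc"/>
<copyField source="body" dest="body_stripped"/>

<!-- solrconfig.xml fragment: boosts expressed through qf (values are arbitrary) -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">body_exact^10 body_lc^8 body_stripped^4 body^1</str>
  </lst>
</requestHandler>
```

Two caveats on this sketch: the string and keyword-tokenized fields match only when the query equals the entire field value, and characters such as ( and ) are query-parser syntax, so they would need to be escaped with a backslash in the query, for example \(solr\).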

-- Jack Krupansky

-----Original Message-----
From: johnmu...@aol.com
Sent: Thursday, October 24, 2013 9:20 AM
To: solr-user@lucene.apache.org
Subject: Searching on special characters

Hi,


How should I set up Solr so that I can search and get hits on special characters such as: + - && || ! ( ) { } [ ] ^ " ~ * ? : \


My need is, if a user has text like so:


Doc-#1: "(Solr)"
Doc-#2: "Solr"


If they type "(solr)", I want a hit only in document #1, with the brackets matching. And if they type "solr", they should get a hit in document #2 only.


An additional nice-to-have is, if they type "solr", I want a hit in both document #1 and #2.


Here is what my current schema.xml looks like:



     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true"/>
       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" splitOnNumerics="1" stemEnglishPossessive="1" preserveOriginal="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
       <filter class="solr.PorterStemFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>



Currently, special characters are being stripped.



Any idea how I can configure Solr to do this?  I'm using Solr 3.6.



Thanks !!


-MJ
