Re: Special Characters search in solr

dabboo Tue, 17 Mar 2009 07:05:06 -0700

Yes, I did, and below is my debugQuery result:

<?xml version="1.0" encoding="UTF-8" ?> 
<response>
<lst name="responseHeader">
  <int name="status">0</int> 
  <int name="QTime">47</int> 
<lst name="params">
  <str name="rows">10</str> 
  <str name="start">0</str> 
  <str name="indent">on</str> 
  <str name="q">Colo�</str> 
  <str name="qt">dismaxrequest</str> 
  <str name="debugQuery">true</str> 
  <str name="version">2.2</str> 
  </lst>
  </lst>
  <result name="response" numFound="0" start="0" maxScore="0.0" /> 
<lst name="debug">
  <str name="rawquerystring">Colo�</str> 
  <str name="querystring">Colo�</str> 
  <str name="parsedquery">+DisjunctionMaxQuery((programJacketImage_program_s:colo |
courseCodeSeq_course_s:colo | authorLastName_product_s:colo |
era_product_s:colo | Index_Type_s:colo | prdMainTitle_s:colo |
discCode_course_s:colo | sourceGroupName_course_s:colo |
indexType_course_s:colo | prdMainTitle_product_s:colo |
isbn10_product_s:colo | displayName_course_s:colo | groupNm_program_s:colo |
discipline_product_s:colo | courseJacketImage_course_s:colo |
imprint_product_s:colo | introText_program_s:colo |
productType_product_s:colo | isbn13_product_s:colo |
copyrightYear_product_s:colo | prdPubDate_product_s:colo |
programType_program_s:colo | editor_product_s:colo |
courseType_course_s:colo | courseId_course_s:colo |
categoryIds_product_s:colo | contentType_product_s:colo |
indexType_program_s:colo | strapline_product_s:colo |
subCompany_course_s:colo | aluminator_product_s:colo | readBy_product_s:colo
| subject_product_s:colo | edition_product_s:colo | IndexId_s:colo |
programId_program_s:colo)~0.01) () all:english^90.0 all:hindi^123.0
all:glorious^2000.0 all:highlight^1.0E7 all:math^100.0 all:ab^12.0
all:erer^4545.0</str> 
  <str name="parsedquery_toString">+(programJacketImage_program_s:colo |
courseCodeSeq_course_s:colo | authorLastName_product_s:colo |
era_product_s:colo | Index_Type_s:colo | prdMainTitle_s:colo |
discCode_course_s:colo | sourceGroupName_course_s:colo |
indexType_course_s:colo | prdMainTitle_product_s:colo |
isbn10_product_s:colo | displayName_course_s:colo | groupNm_program_s:colo |
discipline_product_s:colo | courseJacketImage_course_s:colo |
imprint_product_s:colo | introText_program_s:colo |
productType_product_s:colo | isbn13_product_s:colo |
copyrightYear_product_s:colo | prdPubDate_product_s:colo |
programType_program_s:colo | editor_product_s:colo |
courseType_course_s:colo | courseId_course_s:colo |
categoryIds_product_s:colo | contentType_product_s:colo |
indexType_program_s:colo | strapline_product_s:colo |
subCompany_course_s:colo | aluminator_product_s:colo | readBy_product_s:colo
| subject_product_s:colo | edition_product_s:colo | IndexId_s:colo |
programId_program_s:colo)~0.01 () all:english^90.0 all:hindi^123.0
all:glorious^2000.0 all:highlight^1.0E7 all:math^100.0 all:ab^12.0
all:erer^4545.0</str> 
  <lst name="explain" /> 
  <str name="QParser">DismaxQParser</str>



It is actually converting "Coloèr" to "Colo�", and hence the search finds
nothing. It behaved the same way before I added the ISOLatin1AccentFilter.
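
One possible cause, offered as a hedged suggestion: rawquerystring already
shows "Colo�", so the character was lost while the HTTP request was decoded,
before any analyzer (ISOLatin1AccentFilter included) ran. Tomcat versions of
this era decode GET query strings as ISO-8859-1 unless the URIEncoding
attribute is set, and that attribute belongs on the Connector element of
conf/server.xml. Below is a sketch that assumes the default HTTP connector on
port 8080. The port and timeout are placeholders, so keep the values from your
existing Connector entry:

    <!-- conf/server.xml: decode %XX escapes in the query string as UTF-8 -->
    <!-- port and connectionTimeout are assumed placeholder values -->
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               URIEncoding="UTF-8"/>

Tomcat must be restarted after editing server.xml. The client side has to
match: è should be sent percent-encoded as UTF-8 (q=Colo%C3%A8r), not as
ISO-8859-1 (q=Colo%E8r).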

Please suggest.

Thanks,
Amit Garg

Erick Erickson wrote:
> 
> Did you reindex after you incorporated the ISOLatin... filter?
> 
> On Tue, Mar 17, 2009 at 8:40 AM, dabboo <ag...@sapient.com> wrote:
> 
>>
>> This is the entry in my schema.xml:
>>
>>    <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
>>               omitNorms="true">
>>      <analyzer type="index">
>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>        <!--tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/-->
>>        <!-- in this example, we will only use synonyms at query time
>>        <filter class="solr.SynonymFilterFactory"
>>                synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>        -->
>>        <!-- Case insensitive stop word removal.
>>             enablePositionIncrements=true ensures that a 'gap' is left to
>>             allow for accurate phrase queries.
>>        -->
>>        <filter class="solr.StopFilterFactory"
>>                ignoreCase="true"
>>                words="stopwords.txt"
>>                enablePositionIncrements="true"
>>                />
>>        <filter class="solr.ISOLatin1AccentFilterFactory"/>
>>        <filter class="solr.WordDelimiterFilterFactory"
>>                generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>        <filter class="solr.LowerCaseFilterFactory"/>
>>        <filter class="solr.EnglishPorterFilterFactory"
>>                protected="protwords.txt"/>
>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>        <!--analyzer class="org.apache.lucene.analysis.ru.RussianAnalyzer"/-->
>>
>>      </analyzer>
>>      <analyzer type="query">
>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>        <filter class="solr.ISOLatin1AccentFilterFactory"/>
>>        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>                ignoreCase="true" expand="true"/>
>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>                words="stopwords.txt"/>
>>        <filter class="solr.WordDelimiterFilterFactory"
>>                generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>        <filter class="solr.LowerCaseFilterFactory"/>
>>        <filter class="solr.EnglishPorterFilterFactory"
>>                protected="protwords.txt"/>
>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>        <!--analyzer class="org.apache.lucene.analysis.ru.RussianAnalyzer"/-->
>>        <filter class="solr.ShingleFilterFactory" outputUnigrams="true"
>>                outputUnigramIfNoNgram="true" maxShingleSize="99"/>
>>
>>
>>      </analyzer>
>>    </fieldType>
>>
>>
>>
>> dabboo wrote:
>> >
>> > I have also added this filter factory to my schema.xml, but it is still
>> > not working. Sorry, but I didn't understand how to create a field that
>> > handles the accents.
>> >
>> > Please help.
>> >
>> >
>> > Grant Ingersoll-6 wrote:
>> >>
>> >> You will need to create a field that handles the accents in order to
>> >> do this.  Start by looking at the ISOLatin1AccentFilter.
>> >>
>> >> -Grant
>> >>
>> >> On Mar 17, 2009, at 7:31 AM, dabboo wrote:
>> >>
>> >>>
>> >>> Hi,
>> >>>
>> >>> I am searching with query strings that contain special characters
>> >>> such as è. For example, if I search for tèst, it should return all
>> >>> results that contain tèst, test, etc. There are other special
>> >>> characters as well.
>> >>>
>> >>> I have updated the server.xml file of my Tomcat server to include
>> >>> UTF-8 as the encoding type in the server entry, but it is still not
>> >>> working.
>> >>>
>> >>> Please suggest.
>> >>>
>> >>> Thanks,
>> >>> Amit Garg
>> >>> --
>> >>> View this message in context:
>> >>> http://www.nabble.com/Special-Characters-search-in-solr-tp22557230p22557230.html
>> >>> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>>
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>>
> 
> 
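
On the question of how to create a field that handles accents, here is a
minimal sketch that assumes Solr 1.3-style schema.xml syntax. The names
text_accent and title_accent are hypothetical. Copying from prdMainTitle_s (a
field seen in the debug output) is only an illustration. An <analyzer> with no
type attribute applies at both index and query time, so è is folded to e on
both sides:

    <!-- inside <types>: hypothetical accent-folding field type -->
    <fieldType name="text_accent" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- folds ISO Latin-1 accented characters, e.g. è to e -->
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- inside <fields>: hypothetical field using that type -->
    <field name="title_accent" type="text_accent" indexed="true" stored="false"/>

    <!-- after </fields>: copy an existing field into it at index time -->
    <copyField source="prdMainTitle_s" dest="title_accent"/>

As Erick asked, existing documents must be reindexed after a schema change
like this, because they were analyzed without the filter. The new field would
also have to be listed in the qf parameter of the dismaxrequest handler to be
searched. None of this helps if the query arrives already mangled, as it does
in the debug output.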

-- 
View this message in context: 
http://www.nabble.com/Special-Characters-search-in-solr-tp22557230p22559419.html
Sent from the Solr - User mailing list archive at Nabble.com.
