Another problem I see in the Solr analysis screen is that a query term that matches the tokenized field does not match the case-insensitive field. For example, when I search for 'coast to coast', the tokenized series title (pg_series_title) matches, but the case-insensitive field (pg_series_title_ci) does not.
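As a rough illustration of the mismatch, here is a minimal Python sketch (not Solr code) of the two analysis chains used in this message. It assumes text_wc behaves like a whitespace tokenizer plus lowercasing (WordDelimiterFilterFactory is deliberately left out) and string_ci like a keyword tokenizer plus lowercasing. The function names and the sample titles are hypothetical. The last helper assumes the query parser splits the query on whitespace before analysis, which is one possible explanation for the behaviour and not a confirmed diagnosis.

```python
def analyze_text_wc(value):
    """Simplified stand-in for text_wc: split on whitespace, then lowercase.
    The WordDelimiterFilterFactory step is omitted (assumption)."""
    return [tok.lower() for tok in value.split()]


def analyze_string_ci(value):
    """Simplified stand-in for string_ci: the keyword tokenizer keeps the
    whole value as one token, which is then lowercased."""
    return [value.lower()]


def matches(indexed_tokens, query_tokens):
    """True if at least one query token equals an indexed token."""
    return bool(set(indexed_tokens) & set(query_tokens))


def per_term_matches(stored, query, analyze):
    """Query terms that match when the query is split on whitespace
    before each term is analyzed on its own (assumed parser behaviour)."""
    indexed = analyze(stored)
    return [t for t in query.split() if matches(indexed, analyze(t))]


title = "Coast to Coast Live"  # hypothetical stored series title
query = "coast to coast"

# Whole query string passed through each chain, as on the analysis screen.
print(matches(analyze_text_wc(title), analyze_text_wc(query)))      # True
print(matches(analyze_string_ci(title), analyze_string_ci(query)))  # False

# The keyword chain only matches when the entire value is equal.
print(matches(analyze_string_ci("COAST TO COAST"), analyze_string_ci(query)))  # True

# If the query is split into terms first, a one-word title can match a
# two-word query, while a multi-word title matches none of the terms.
print(per_term_matches("Funny", "funny games", analyze_string_ci))              # ['funny']
print(per_term_matches("Coast to Coast", "coast to coast", analyze_string_ci))  # []
```

Under these assumptions the case-insensitive field can only match a query that equals the complete stored value, so a title with extra words, or a query that is broken into terms before analysis, would match on pg_series_title but not on pg_series_title_ci.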
The definitions of both fields are below:

<fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="0" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="1" splitOnCaseChange="1" splitOnNumerics="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true"
           omitNorms="true" compressThreshold="10">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="pg_series_title" type="text_wc" indexed="true" stored="true" multiValued="false"/>
<field name="pg_series_title_ci" type="string_ci" indexed="true" stored="true" multiValued="false"/>

<copyField source="pg_series_title" dest="pg_series_title_ci"/>

Can this copyField directive be an issue? Should it be the other way round, or does it not matter?

Thanks,
Sandeep

On 4 April 2013 10:38, Sandeep Mestry <sanmes...@gmail.com> wrote:

> Hi Jan,
>
> Thanks for your reply.
> I have defined string_ci like below:
>
> <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true"
>            omitNorms="true" compressThreshold="10">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> When I analyse the query in solr, I saw that document containing
> pg_series_title_ci:"funny" matches when I do a search for
> pg_series_title_ci:"funny games" and is ranked higher than the document
> containing the exact matches. I can use the default string data type but
> then the match will be on exact casing.
>
> Thanks,
> Sandeep
>
>
> On 3 April 2013 22:20, Jan Høydahl <jan....@cominvent.com> wrote:
>
>> Can you show us your *_ci field type? Solr does not really have a way to
>> tell whether a match is "exact" or only partial, but you could hack around
>> it with the fieldType. See https://github.com/cominvent/exactmatch for a
>> possible solution.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>>
>> 3. apr. 2013 kl. 15:55 skrev Sandeep Mestry <sanmes...@gmail.com>:
>>
>> > Hi All,
>> >
>> > I have a requirement where in exact matches for 2 fields (Series Title,
>> > Title) should be ranked higher than the partial matches.
>> > The configuration looks like below:
>> >
>> > <requestHandler name="assetdismax" class="solr.SearchHandler">
>> >   <lst name="defaults">
>> >     <str name="defType">edismax</str>
>> >     <str name="echoParams">explicit</str>
>> >     <float name="tie">0.01</float>
>> >     <str name="qf">pg_series_title_ci^500 title_ci^300
>> >       pg_series_title^200 title^25 classifications^15
>> >       classifications_texts^15 parent_classifications^10
>> >       synonym_classifications^5 pg_brand_title^5
>> >       pg_series_working_title^5 p_programme_title^5 p_item_title^5
>> >       p_interstitial_title^5 description^15 pg_series_description
>> >       annotations^0.1 classification_notes^0.05
>> >       pv_program_version_number^2 pv_program_version_number_ci^2
>> >       pv_program_number^2 pv_program_number_ci^2 p_program_number^2
>> >       ma_version_number^2 ma_recording_location ma_contributions^0.001
>> >       rel_pg_series_title rel_programme_title rel_programme_number
>> >       rel_programme_number_ci pg_uuid^0.5 p_uuid^0.5 pv_uuid^0.5
>> >       ma_uuid^0.5</str>
>> >     <str name="pf">pg_series_title_ci^500 title_ci^500</str>
>> >     <int name="ps">0</int>
>> >     <str name="q.alt">*:*</str>
>> >     <str name="mm">100%</str>
>> >     <str name="q.op">AND</str>
>> >     <str name="facet">true</str>
>> >     <str name="facet.limit">-1</str>
>> >     <str name="facet.mincount">1</str>
>> >   </lst>
>> > </requestHandler>
>> >
>> > As you can see above, the search is against many fields. What I'd want is
>> > the documents that have exact matches for series title and title fields
>> > should rank higher than the rest.
>> >
>> > I have added 2 case insensitive (*pg_series_title_ci, title_ci*) fields for
>> > series title and title and have boosted them higher over the tokenized and
>> > rest of the fields. I have also implemented a similarity class to override
>> > idf however I still get documents having partial matches in title and other
>> > fields ranking higher than exact match in pg_series_title_ci.
>> >
>> > Many Thanks,
>> > Sandeep