Re: Multi-word exact keyword case-insensitive search suggestions

Adam Estrada Thu, 13 Jan 2011 07:31:30 -0800

Hi,

The following seems to work pretty well:


    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.ShingleFilterFactory"
          maxShingleSize="4" outputUnigrams="true"
          outputUnigramIfNoNgram="false" />
      </analyzer>
    </fieldType>

    <!-- A text field that uses WordDelimiterFilter to enable splitting and
        matching of words on case-change, alpha numeric boundaries, and
        non-alphanumeric chars, so that a query of "wifi" or "wi fi" could
        match a document containing "Wi-Fi".
        Synonyms and stopwords are customized by external files, and
        stemming is enabled.
        The attribute autoGeneratePhraseQueries="true" (the default) causes
        words that get split to form phrase queries. For example,
        WordDelimiterFilter splitting text:pdp-11 will cause the parser
        to generate text:"pdp 11" rather than (text:PDP OR text:11).
        NOTE: autoGeneratePhraseQueries="true" tends to not work well for
        non whitespace delimited languages.
        -->
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
               autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <copyField source="cat" dest="text"/>
    <copyField source="subject" dest="text"/>
    <copyField source="summary" dest="text"/>
    <copyField source="cause" dest="text"/>
    <copyField source="status" dest="text"/>
    <copyField source="urgency" dest="text"/>

I index the source fields as text_ws (I know I've changed it a bit) and
then copy them into the text field. This seems to do what you are asking for.
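
For the strict requirement in the original question (a whole-value,
case-insensitive match and nothing else), a minimal sketch might look like
the following. The names text_exact and keyword_exact are hypothetical and
are not part of the schema above; treat this as an untested starting point
rather than a definitive setup.

```xml
<!-- Sketch only: "text_exact" and "keyword_exact" are hypothetical names. -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Emit the entire field value as a single token, so a plain term
         query only matches the whole value, never one word of it. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Strip leading and trailing whitespace from that token. -->
    <filter class="solr.TrimFilterFactory"/>
    <!-- Lowercase at both index and query time for case-insensitive matching. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above. -->
<field name="keyword_exact" type="text_exact" indexed="true" stored="false" multiValued="true"/>
```

Because the query parser splits on whitespace before analysis, a multi-word
value would need to be quoted in the query, for example
keyword_exact:"wireless router" (a made-up value).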

Adam

On Thu, Jan 13, 2011 at 12:05 AM, Chamnap Chhorn <chamnapchh...@gmail.com> wrote:

> Hi all,
>
> I've been stuck on exact keyword matching for several days. I hope you
> can help me. Here is the scenario:
>
>   1. The field needs to match multi-word keywords, case-insensitively
>   2. Partial-word or single-word matching against this field is not allowed
>
> I want to know the field type definition for this field and a sample Solr
> query. I need to combine this search with my full-text search, which uses
> a dismax query.
>
> Thanks
> --
> Chhorn Chamnap
> http://chamnapchhorn.blogspot.com/
>
