Re: Multi-word exact keyword case-insensitive search suggestions

Chamnap Chhorn Mon, 17 Jan 2011 17:26:56 -0800

Is there no other way to meet this requirement?

On Sat, Jan 15, 2011 at 10:01 AM, Chamnap Chhorn <chamnapchh...@gmail.com> wrote:


> Ahh, thanks guys for helping me!
>
> Adam's solution doesn't work for me. Here are my field, fieldType, and
> Solr query:
>
> <fieldType name="text_keyword" class="solr.TextField"
> positionIncrementGap="100">
>
>        <analyzer>
>        <tokenizer class="solr.KeywordTokenizerFactory" />
>        <filter class="solr.ShingleFilterFactory"
>          maxShingleSize="4" outputUnigrams="true"
> outputUnigramIfNoNgram="false" />
>      </analyzer>
> </fieldType>
>
> <field name="keyphrase" type="text_keyword" indexed="true" stored="false"
> multiValued="true"/>
>
>
> http://localhost:8081/solr/select?q=printing%20house&qf=keyphrase&debugQuery=on&defType=dismax
>
>
> <str name="parsedquery">
> +((DisjunctionMaxQuery((keyphrase:smart))
> DisjunctionMaxQuery((keyphrase:mobile)))~2) ()
> </str>
> <str name="parsedquery_toString">+(((keyphrase:smart)
> (keyphrase:mobile))~2) ()</str>
>  <lst name="explain"/>
>
> No results are found.
>
> Erick's solution works for me. However, I can't use a filter query, because
> this field is part of a full-text search. With an fq, Solr would return only
> the documents that match the query exactly. I want the exact matches at the
> top, followed by the documents that match partially.
>
> The problem is that when the user searches for a single word (e.g. "printing"
> from the keyword "printing house"), that document is also included in the
> search results. The other problem is that if the user searches in reverse
> order (e.g. "house printing"), the document is also found.
>
> Cheers
>
>
> On Sat, Jan 15, 2011 at 3:31 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> This might work:
>>
>> Define your field to use WhitespaceTokenizer and LowerCaseFilterFactory
>>
>> Use a filter query referencing this field.
>>
>> If you wanted the words to appear in their exact order, you could just
>> define
>> the "pf" field in your dismax.
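>>
>> As a rough sketch of the above, not a tested configuration: the type name
>> text_ws_lower is an assumption made up for illustration, and keyphrase is
>> the field name used elsewhere in this thread.
>>
>> ```xml
>> <!-- Hypothetical field type: split on whitespace, then lowercase each
>>      token so that matching is case-insensitive. -->
>> <fieldType name="text_ws_lower" class="solr.TextField"
>>            positionIncrementGap="100">
>>   <analyzer>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> <field name="keyphrase" type="text_ws_lower" indexed="true" stored="false"
>>        multiValued="true"/>
>>
>> <!-- Example dismax request parameters (values are illustrative):
>>        q=printing house
>>        defType=dismax
>>        qf=keyphrase
>>        pf=keyphrase
>> -->
>> ```
>>
>> Note that pf only boosts documents in which the query words occur together
>> as a phrase; it does not exclude documents that match just some of the
>> words.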
>>
>> Best
>> Erick
>>
>> On Thu, Jan 13, 2011 at 8:01 PM, Estrada Groups <
>> estrada.adam.gro...@gmail.com> wrote:
>>
>> > Ahhh...the fun of open source software ;-). Requires a ton of trial and
>> > error! I found what worked for me and figured it was worth passing it
>> along.
>> > If you don't mind...when you sort everything out on your end, please
>> post
>> > results for the rest of us to take a gander at.
>> >
>> > Cheers,
>> > Adam
>> >
>> > On Jan 13, 2011, at 9:08 PM, Chamnap Chhorn <chamnapchh...@gmail.com>
>> > wrote:
>> >
>> > > Thanks for your reply. However, it doesn't work for my case at all. I
>> > > think the problem is with the query parser or something else. I have to
>> > > put double quotes around the search query in order to get any results.
>> > >
>> > > <str name="rawquerystring">"sim 010"</str>
>> > > <str name="querystring">"sim 010"</str>
>> > > <str name="parsedquery">+DisjunctionMaxQuery((keyphrase:sim 010))
>> > ()</str>
>> > > <str name="parsedquery_toString">+(keyphrase:sim 010) ()</str>
>> > >
>> > > <str name="rawquerystring">smart mobile</str>
>> > > <str name="querystring">smart mobile</str>
>> > > <str name="parsedquery">
>> > > +((DisjunctionMaxQuery((keyphrase:smart))
>> > > DisjunctionMaxQuery((keyphrase:mobile)))~2) ()
>> > > </str>
>> > > <str name="parsedquery_toString">+(((keyphrase:smart)
>> > (keyphrase:mobile))~2)
>> > > ()</str>
>> > >
>> > > The intent here is to do a full-text search, part of which searches the
>> > > keyword field, so I can't put quotes around the query.
>> > >
>> > > On Thu, Jan 13, 2011 at 10:30 PM, Adam Estrada <
>> > > estrada.adam.gro...@gmail.com> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> the following seems to work pretty well.
>> > >>
>> > >>   <fieldType name="text_ws" class="solr.TextField"
>> > >> positionIncrementGap="100">
>> > >>     <analyzer>
>> > >>       <tokenizer class="solr.KeywordTokenizerFactory" />
>> > >>       <filter class="solr.ShingleFilterFactory"
>> > >>         maxShingleSize="4" outputUnigrams="true"
>> > >> outputUnigramIfNoNgram="false" />
>> > >>     </analyzer>
>> > >>   </fieldType>
>> > >>
>> > >>   <!-- A text field that uses WordDelimiterFilter to enable splitting
>> > and
>> > >> matching of
>> > >>       words on case-change, alpha numeric boundaries, and
>> > non-alphanumeric
>> > >> chars,
>> > >>       so that a query of "wifi" or "wi fi" could match a document
>> > >> containing "Wi-Fi".
>> > >>       Synonyms and stopwords are customized by external files, and
>> > >> stemming is enabled.
>> > >>       The attribute autoGeneratePhraseQueries="true" (the default)
>> > causes
>> > >> words that get split to
>> > >>       form phrase queries. For example, WordDelimiterFilter splitting
>> > >> text:pdp-11 will cause the parser
>> > >>       to generate text:"pdp 11" rather than (text:PDP OR text:11).
>> > >>       NOTE: autoGeneratePhraseQueries="true" tends to not work well
>> for
>> > >> non whitespace delimited languages.
>> > >>       -->
>> > >>   <fieldType name="text" class="solr.TextField"
>> > positionIncrementGap="100"
>> > >> autoGeneratePhraseQueries="true">
>> > >>     <analyzer type="index">
>> > >>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> > >>       <!-- in this example, we will only use synonyms at query time
>> > >>       <filter class="solr.SynonymFilterFactory"
>> > >> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>> > >>       -->
>> > >>       <!-- Case insensitive stop word removal.
>> > >>         add enablePositionIncrements=true in both the index and query
>> > >>         analyzers to leave a 'gap' for more accurate phrase queries.
>> > >>       -->
>> > >>       <filter class="solr.StopFilterFactory"
>> > >>               ignoreCase="true"
>> > >>               words="stopwords.txt"
>> > >>               enablePositionIncrements="true"
>> > >>               />
>> > >>       <filter class="solr.WordDelimiterFilterFactory"
>> > >> generateWordParts="1" generateNumberParts="1" catenateWords="1"
>> > >> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>> > >>       <filter class="solr.LowerCaseFilterFactory"/>
>> > >>       <filter class="solr.KeywordMarkerFilterFactory"
>> > >> protected="protwords.txt"/>
>> > >>       <filter class="solr.PorterStemFilterFactory"/>
>> > >>     </analyzer>
>> > >>     <analyzer type="query">
>> > >>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> > >>       <filter class="solr.SynonymFilterFactory"
>> synonyms="synonyms.txt"
>> > >> ignoreCase="true" expand="true"/>
>> > >>       <filter class="solr.StopFilterFactory"
>> > >>               ignoreCase="true"
>> > >>               words="stopwords.txt"
>> > >>               enablePositionIncrements="true"
>> > >>               />
>> > >>       <filter class="solr.WordDelimiterFilterFactory"
>> > >> generateWordParts="1" generateNumberParts="1" catenateWords="0"
>> > >> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>> > >>       <filter class="solr.LowerCaseFilterFactory"/>
>> > >>       <filter class="solr.KeywordMarkerFilterFactory"
>> > >> protected="protwords.txt"/>
>> > >>       <filter class="solr.PorterStemFilterFactory"/>
>> > >>     </analyzer>
>> > >>   </fieldType>
>> > >>
>> > >>   <copyField source="cat" dest="text"/>
>> > >>   <copyField source="subject" dest="text"/>
>> > >>   <copyField source="summary" dest="text"/>
>> > >>   <copyField source="cause" dest="text"/>
>> > >>   <copyField source="status" dest="text"/>
>> > >>   <copyField source="urgency" dest="text"/>
>> > >>
>> > >> I ingest the source fields as text_ws (I know I've changed it a bit)
>> and
>> > >> then copy the field to text. This seems to do what you are asking
>> for.
>> > >>
>> > >> Adam
>> > >>
>> > >> On Thu, Jan 13, 2011 at 12:05 AM, Chamnap Chhorn <
>> > chamnapchh...@gmail.com
>> > >>> wrote:
>> > >>
>> > >>> Hi all,
>> > >>>
>> > >>> I've been stuck on exact keyword matching for several days. I hope you
>> > >>> can help me. Here is the scenario:
>> > >>>
>> > >>>  1. The field must match multi-word keywords, case-insensitively.
>> > >>>  2. Partial-word or single-word matches against this field are not
>> > >>>     allowed.
>> > >>>
>> > >>> I'd like to know the field type definition for this field and a sample
>> > >>> Solr query. I need to combine this search with my full-text search, which
>> > >>> uses a dismax query.
>> > >>>
>> > >>> Thanks
>> > >>> --
>> > >>> Chhorn Chamnap
>> > >>> http://chamnapchhorn.blogspot.com/
>> > >>>
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > Chhorn Chamnap
>> > > http://chamnapchhorn.blogspot.com/
>> >
>>
>
>
>
> --
> Chhorn Chamnap
> http://chamnapchhorn.blogspot.com/
>



-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/
