Re: Need tokenization that finds part of stringvalue

Erick Erickson Thu, 01 Mar 2012 05:53:59 -0800

Right, there's nothing in Solr that I know of that'll help here. How would
a tokenizer know that "smartphone" should be split into "smart" and "phone"?
There's no general solution for this.


You can build a domain-specific solution, for instance with synonyms or
some other word list that contains the terms you're interested in, with
entries like
smartphone => smart phone
but that has the obvious drawback of requiring that you know in advance all
the terms that might be smashed together.
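
A rough sketch of the synonym approach, untested against your index: if the
mapping is applied at index time, "great smartphone" is indexed with the extra
tokens "smart" and "phone", so a query for "phone" can match. The fieldType
name text_split and the file name synonyms_split.txt are made up for
illustration; the filter classes and attributes are the ones already in your
schema. You would need to reindex after changing the index analyzer.

```xml
<!-- Hypothetical synonyms_split.txt, one mapping per line. This entry keeps
     the original term and adds its parts:
       smartphone => smartphone, smart phone
-->
<fieldType name="text_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand the smashed-together terms while indexing -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_split.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The drawback stays the same: every compound you care about needs its own line
in the file.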

You *might* be able to do something with shingles, but I'm a little unclear
on how.
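
If shingles help at all, it is probably only in the opposite direction:
joining adjacent tokens, so that a document containing "smart phone" also gets
a token "smartphone". That does not split an existing "smartphone", so it
would not fix your example. A sketch, assuming your Solr version's
ShingleFilterFactory supports the tokenSeparator attribute (the fieldType name
text_shingle is made up):

```xml
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit each token plus each adjacent pair joined with no separator -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="true" tokenSeparator=""/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```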

Best
Erick

On Tue, Feb 28, 2012 at 4:05 PM, PeterKerk <vettepa...@hotmail.com> wrote:
> I have the following in my schema.xml
>
> <field name="title" type="text_ws" indexed="true" stored="true"/>
> <field name="title_search" type="text" indexed="true" stored="true"/>
>
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_dutch.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords_dutch.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
> I want to search on field "title".
> Now my field title holds the value "great smartphone".
> If I search on "smartphone" the item is found. But I also want the item to
> be found when I search on "great" or "phone", and that doesn't work.
> I have been playing around with the tokenizer test function, but have failed
> to find the definition for the "text" fieldtype that I need.
> Help? :)
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Need-tokenization-that-finds-part-of-stringvalue-tp3785366p3785366.html
> Sent from the Solr - User mailing list archive at Nabble.com.
