Tokenizers, filters and the like have no real way to
figure out that some words in the query are to be
ignored. In your example, how would one algorithmically
determine that "this kind of winter" is important and that
"Hi", "likes" and "weather" aren't? What's different
about like/likes that indicates that the stemmed version
of "like" shouldn't be important? Once stemmed, the
query's "likes this kind of winter" and the document's
"like this kind of winter" would match each other.
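Here is a rough way to see this. The Python sketch below mimics the quoted settings minShingleSize="2", maxShingleSize="5" and outputUnigrams="false". It is not Lucene's ShingleFilter. It lists the shingles the query and the document have in common, longest first. The tokenizer is a crude stand-in for StandardTokenizer plus LowerCaseFilter. TOY_STEMS is a hypothetical lookup table standing in for a real stemmer.

```python
import re

# Hypothetical lookup table standing in for a real stemmer (e.g. Porter);
# it only folds the two plural forms that occur in this example.
TOY_STEMS = {"likes": "like", "days": "day"}


def tokenize(text: str) -> list[str]:
    """Crude stand-in for StandardTokenizer + LowerCaseFilter."""
    return re.findall(r"\w+", text.lower())


def shingles(tokens: list[str], min_size: int = 2, max_size: int = 5) -> set[str]:
    """Word n-grams of min_size..max_size tokens joined by a single space.

    Mimics minShingleSize="2", maxShingleSize="5", outputUnigrams="false",
    tokenSeparator=" ". outputUnigramsIfNoShingles is not modelled.
    """
    out = set()
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.add(" ".join(tokens[start:start + size]))
    return out


def shared_shingles(query: str, doc: str, stem: bool = False) -> list[str]:
    """Shingles produced for both texts, longest first, then alphabetical."""
    q, d = tokenize(query), tokenize(doc)
    if stem:
        q = [TOY_STEMS.get(t, t) for t in q]
        d = [TOY_STEMS.get(t, t) for t in d]
    common = shingles(q) & shingles(d)
    return sorted(common, key=lambda s: (-len(s.split()), s))


query = "Hi likes this kind of winter weather"
doc = "They like this kind of winter with many sunny days"

print(shared_shingles(query, doc))             # longest: 'this kind of winter'
print(shared_shingles(query, doc, stem=True))  # longest: 'like this kind of winter'
```

Without stemming, six shingles overlap and the longest is "this kind of winter". With the toy stemmer, ten overlap and the longest becomes "like this kind of winter". The analysis chain produces all of them, and nothing in it marks one of them as the important one.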

This feels like an XY problem. What use case are you
trying to solve?

Best,
Erick



On Tue, Nov 25, 2014 at 5:20 AM, vit <bulgako...@yahoo.com> wrote:
> Here is an example of what I need:
> Query:
> Hi likes *this kind of winter* weather
> Document shingle field:
> They like *this kind of winter* with many sunny days
>
> So I need to match *this kind of winter*.
>
> Which tokenisers and filters (and maybe something else) should be used for
> this kind of match?
>
> For example, I tried this one, but it matches the entire query to a shingle:
> <fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5"
>             outputUnigrams="false" outputUnigramsIfNoShingles="true" tokenSeparator=" "/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5"
>             outputUnigrams="false" outputUnigramsIfNoShingles="true" tokenSeparator=" "/>
>   </analyzer>
> </fieldType>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Help-on-matching-a-shingle-in-a-query-to-a-shingle-in-the-document-tp4170852.html
> Sent from the Solr - User mailing list archive at Nabble.com.
