This sounds like an attempt to identify stable multi-word units, a concept
sometimes used in Natural Language Processing.

In that case, a ShingleFilterFactory on the field, plus faceting on that
same field, might do the trick.

The shingle filter will emit "this kind of winter" as a single token, and
faceting on the field will return a count for it. The query then does not
matter, or it can run against a different field.
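
To make the idea concrete, here is a rough Python sketch of what shingling
plus faceting amounts to. It is not Solr code: the regex tokenizer is only a
stand-in for StandardTokenizerFactory, and the names shingles() and
facet_counts() are hypothetical helpers made up for this illustration. It
assumes the query text is indexed as one more document in the shingle field.

```python
import re
from collections import Counter


def shingles(text, min_size=2, max_size=5):
    """Lowercase, split on non-word characters, and emit word n-grams.

    Approximates tokenizer + lowercase + shingle filter with
    outputUnigrams="false" (assumption: simple regex tokenization).
    """
    tokens = re.findall(r"\w+", text.lower())
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


def facet_counts(docs, min_count=2):
    """Count how many documents contain each shingle.

    Mirrors what faceting on the shingle field with a minimum count
    would report: one count per document, not per occurrence.
    """
    counts = Counter()
    for doc in docs:
        counts.update(set(shingles(doc)))
    return {s: c for s, c in counts.items() if c >= min_count}


docs = [
    "Hi likes this kind of winter weather",
    "They like this kind of winter with many sunny days",
]
common = facet_counts(docs)
longest = max(common, key=lambda s: len(s.split()))
print(longest)  # this kind of winter
```

The shingles shared by both texts are the ones with a count of 2, and the
longest of them is the phrase you are after. In Solr the equivalent would be
a facet on the shingle field with facet.mincount set, rather than a query
against it.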

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 25 November 2014 at 10:28, Erick Erickson <erickerick...@gmail.com> wrote:
> Tokenizers, filters and the like have no real way to
> figure out that some words in the query are to be
> ignored. In your example, how would one algorithmically
> determine that "this kind of winter" is important and that
> "Hi", "likes" and "weather" aren't? What's different
> about like/likes that indicates that the stemmed version
> of "like" shouldn't be important? Both the query
> and text could match "likes this kind of winter".
>
> This feels like an XY problem. What use case are you
> trying to solve?
>
> Best,
> Erick
>
>
>
> On Tue, Nov 25, 2014 at 5:20 AM, vit <bulgako...@yahoo.com> wrote:
>> Example of what I need:
>> Query:
>> Hi likes *this kind of winter *weather
>> Document shingle field:
>> They like *this kind of winter *with many sunny days
>>
>> So I need to match *this kind of winter *.
>>
>> What tokenizers and filters, and maybe something else, should be used for
>> this kind of match?
>>
>> I tried for example this one, but it matches the entire query to a shingle:
>> <fieldType name="text_shingle" class="solr.TextField"
>>            positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.ShingleFilterFactory"
>>             minShingleSize="2" maxShingleSize="5"
>>             outputUnigrams="false" outputUnigramsIfNoShingles="true"
>>             tokenSeparator=" "/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.ShingleFilterFactory"
>>             minShingleSize="2" maxShingleSize="5"
>>             outputUnigrams="false" outputUnigramsIfNoShingles="true"
>>             tokenSeparator=" "/>
>>   </analyzer>
>> </fieldType>
>>
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Help-on-matching-a-shingle-in-a-query-to-a-shingle-in-the-document-tp4170852.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
