Sounds like an attempt to identify stable Multi Word Units, sometimes used in Natural Language Processing.
In that case, a shingle filter (ShingleFilterFactory) plus faceting on the shingled field might do the trick. The shingle filter will generate a single "token" that is "this kind of winter", and the facet will give back a count for it. The query then does not matter, or can run against a different field.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 25 November 2014 at 10:28, Erick Erickson <erickerick...@gmail.com> wrote:
> Tokenizers, filters and the like have no real way to
> figure out that some words in the query are to be
> ignored. In your example, how would one algorithmically
> determine that "this kind of winter" is important and that
> "Hi", "likes" and "weather" aren't? What's different
> about like/likes that indicates that the stemmed version
> of "like" shouldn't be important? Both the query
> and text could match "likes this kind of winter".
>
> This feels like an XY problem, what use-case are you
> trying to solve?
>
> Best,
> Erick
>
> On Tue, Nov 25, 2014 at 5:20 AM, vit <bulgako...@yahoo.com> wrote:
>> Example what I need:
>> Query:
>> Hi likes *this kind of winter *weather
>> Document shingle field:
>> They like *this kind of winter *with many sunny days
>>
>> So I need to match *this kind of winter *.
>>
>> What tokenisers and filters and maybe something else should be used for this
>> kind of match.
>>
>> I tried for example this one, but it matches the entire query to a shingle:
>> <fieldType name="text_shingle" class="solr.TextField"
>> positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory" />
>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>> maxShingleSize="5"
>>             outputUnigrams="false" outputUnigramsIfNoShingles="true"
>> tokenSeparator=" "/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory" />
>>     <filter class="solr.ShingleFilterFactory" minShingleSize="2"
>> maxShingleSize="5"
>>             outputUnigrams="false" outputUnigramsIfNoShingles="true"
>> tokenSeparator=" "/>
>>   </analyzer>
>> </fieldType>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Help-on-matching-a-shingle-in-a-query-to-a-shingle-in-the-document-tp4170852.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
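
To make the shingle-plus-count suggestion at the top of this reply a bit more concrete, here is a rough sketch in plain Python, outside of Solr. It is only an illustration of the idea, not how Solr implements shingling or faceting: it splits on whitespace instead of using StandardTokenizer, and the helper names (shingles, shared_shingles, longest_shared) are made up for this example. The sizes 2 to 5 mirror the minShingleSize/maxShingleSize values from the quoted config.

```python
from collections import Counter


def shingles(text, min_size=2, max_size=5):
    """Return lowercase word n-grams (shingles) of min_size..max_size words."""
    words = text.lower().split()  # assumption: plain whitespace tokenization
    result = []
    for size in range(min_size, max_size + 1):
        for start in range(len(words) - size + 1):
            result.append(" ".join(words[start:start + size]))
    return result


def shared_shingles(query, documents, min_size=2, max_size=5):
    """For each query shingle, count how many documents contain it."""
    counts = Counter()
    query_set = set(shingles(query, min_size, max_size))
    for doc in documents:
        doc_set = set(shingles(doc, min_size, max_size))
        counts.update(query_set & doc_set)
    return counts


def longest_shared(counts):
    """Pick the shared shingle with the most words, or None if nothing is shared."""
    return max(counts, key=lambda s: len(s.split()), default=None)


query = "Hi likes this kind of winter weather"
docs = ["They like this kind of winter with many sunny days"]
counts = shared_shingles(query, docs)
print(longest_shared(counts))  # this kind of winter
```

The per-shingle document counts play the role the facet counts would play in Solr; picking the longest shared shingle is just one possible way to choose among the overlapping candidates ("this kind", "kind of winter", and so on).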