Tokenizers, filters and the like have no real way to figure out that some words in the query are to be ignored. In your example, how would one algorithmically determine that "this kind of winter" is important and that "Hi", "likes" and "weather" aren't? And what is different about like/likes that says the stemmed form "like" shouldn't count? With stemming, both the query and the document text could match "likes this kind of winter".
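To make that concrete, here is a rough Python sketch of which shingles the two example texts share. It is not Solr's analysis chain: whitespace splitting stands in for StandardTokenizer, there is no stemming, and the function name `shingles` is made up for illustration. The size limits just mirror minShingleSize="2" and maxShingleSize="5" from the config.

```python
def shingles(text, min_size=2, max_size=5):
    """Return the set of lowercased word shingles of min_size..max_size tokens."""
    # Assumption: plain whitespace splitting, a stand-in for a real tokenizer.
    tokens = text.lower().split()
    out = set()
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.add(" ".join(tokens[start:start + size]))
    return out


query = "Hi likes this kind of winter weather"
doc = "They like this kind of winter with many sunny days"

# Shingles that appear in both the query and the document.
common = shingles(query) & shingles(doc)

# Picking the longest shared shingle is a choice made here, outside analysis.
longest = max(common, key=lambda s: len(s.split()))

for s in sorted(common, key=lambda s: (len(s.split()), s)):
    print(s)
print("longest:", longest)
```

The shared set includes "this kind of winter" but also shorter pieces such as "kind of" and "of winter". Nothing at the token level marks the four-word shingle as the one that matters; preferring it (here, by length) is a decision that has to be made somewhere other than the tokenizer and filters.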
This feels like an XY problem, what use-case are you trying to solve?

Best,
Erick

On Tue, Nov 25, 2014 at 5:20 AM, vit <bulgako...@yahoo.com> wrote:
> Example what I need:
> Query:
> Hi likes *this kind of winter *weather
> Document shingle field:
> They like *this kind of winter *with many sunny days
>
> So I need to match *this kind of winter *.
>
> What tokenisers and filters and maybe something else should be used for this
> kind of match.
>
> I tried for example this one, but it matches the entire query to a shingle:
> <fieldType name="text_shingle" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer type="index">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory" />
> <filter class="solr.ShingleFilterFactory" minShingleSize="2"
> maxShingleSize="5"
> outputUnigrams="false" outputUnigramsIfNoShingles="true"
> tokenSeparator=" "/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.StandardTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory" />
> <filter class="solr.ShingleFilterFactory" minShingleSize="2"
> maxShingleSize="5"
> outputUnigrams="false" outputUnigramsIfNoShingles="true"
> tokenSeparator=" "/>
> </analyzer>
> </fieldType>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Help-on-matching-a-shingle-in-a-query-to-a-shingle-in-the-document-tp4170852.html
> Sent from the Solr - User mailing list archive at Nabble.com.