<fieldType name="text_shingle4" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="4" maxShingleSize="4" outputUnigrams="false"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
I'm using the field type above for a field, indexing some documents, and then looking at the indexed terms with the terms component. I'm seeing shingles that consist of only 2 terms, whereas I expect every shingle to contain at least 4 terms. What's going on? Thanks.
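
P.S. To make the expectation concrete, here is a rough Python sketch of the output I would expect for `minShingleSize="4"`, `maxShingleSize="4"`, and `outputUnigrams="false"`. This is not Solr code: the `shingles` function and the sample sentence are made up for illustration, and it assumes the tokens have already been whitespace-split, lowercased, and stop-filtered.

```python
def shingles(tokens, size=4, sep=" "):
    """Return every run of `size` consecutive tokens, joined by `sep`.

    Hypothetical helper showing the expected output only; it does not
    reproduce how Solr's ShingleFilterFactory treats position gaps.
    """
    return [sep.join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


if __name__ == "__main__":
    tokens = "quick brown fox jumps over lazy dog".split()
    for shingle in shingles(tokens):
        print(shingle)
```

In this sketch every emitted term has exactly four words, and an input with fewer than four tokens yields nothing. That is the behavior I was expecting to see in the terms component output.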