To answer my own question (and this sucks :) ): the ShingleFilterFactory doesn't set minShingleSize, at least in 1.4.2, so the attribute has no effect. I'm guessing a later version supports it?
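
A minimal Python sketch of the symptom, assuming the filter falls back to a minimum shingle size of 2 when minShingleSize is not applied. The `shingles` helper and the sample tokens are hypothetical illustrations, not Solr code, and the sketch ignores stop-word position gaps.

```python
def shingles(tokens, min_size, max_size, sep=" "):
    """Return word n-grams (shingles) of min_size..max_size tokens.

    Hypothetical helper for illustration only; it is not the Solr/Lucene
    implementation and does not model position increments or stop-word gaps.
    """
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            # Only emit a shingle if enough tokens remain from this position.
            if start + size <= len(tokens):
                out.append(sep.join(tokens[start:start + size]))
    return out


tokens = ["please", "divide", "this", "sentence", "into", "shingles"]

# What the fieldType asks for: shingles of exactly 4 terms.
expected = shingles(tokens, min_size=4, max_size=4)

# What the terms component seems to show if the minimum stays at 2
# (assumed fallback when minShingleSize is not applied).
observed = shingles(tokens, min_size=2, max_size=4)

print(expected)
print(observed)
```

If the index looks like `observed` rather than `expected`, the 2-term and 3-term entries are consistent with the minimum size never being passed to the filter.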

On Tue, Sep 14, 2010 at 5:49 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:
> <fieldType name="text_shingle4" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.ShingleFilterFactory" minShingleSize="4" maxShingleSize="4" outputUnigrams="false"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I'm using this for a field, indexing, then looking at the terms component.
> I'm seeing shingles that consist of only 2 terms, whereas I'm
> expecting all the shingles to contain at least 4 terms... What's up? Thanks.