You could try adding a LengthFilterFactory to your analyzer chain to drop the empty tokens:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
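
A minimal sketch of how that might look in your fieldType, assuming the length filter goes right after PatternReplaceFilterFactory so that the emptied tokens are dropped before the later filters see them. min="1" removes zero-length tokens; the max value is only a placeholder I picked, not something taken from your schema, so adjust it as needed:

```xml
<fieldType name="text_terms" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <!-- Numbers-only tokens are rewritten to empty strings here -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9]+$" replacement="" />
    <!-- Assumed addition: keep only tokens of 1 to 255 characters (max is an arbitrary placeholder) -->
    <filter class="solr.LengthFilterFactory" min="1" max="255" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_terms.txt" ignoreCase="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_terms.txt" />
  </analyzer>
</fieldType>
```

You would need to reindex after changing the analyzer, since terms already in the index are not affected by the new filter.
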
Regards,

Tomás

On Mon, Dec 5, 2011 at 6:01 AM, Marian Steinbach <marian.steinb...@gmail.com> wrote:

> Hi!
>
> I am surprised to find an empty string as the most frequent index term in
> one of my fields. Until now I didn't even know that empty strings would be
> indexed.
>
> Here is the schema.xml excerpt for that field:
>
> <fieldType name="text_terms" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory" />
>     <filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9]+$" replacement="" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms_terms.txt" ignoreCase="true" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_terms.txt" />
>   </analyzer>
> </fieldType>
>
> <field name="terms" type="text_terms" indexed="true" stored="false" multiValued="true"/>
>
> I have the suspicion that PatternReplaceFilterFactory with
> pattern="^[0-9]+$" is causing the empty strings. I introduced that
> filter to prevent numbers-only strings from being added to the index.
>
> Any hint on how I can get rid of numbers AND empty strings?
>
> Thanks!
>
> Marian