Just to add: things are not going as expected even before the keepword filter. The synonym list is not being expanded for shingles. I think I don't understand term position....

On 5 February 2011 16:08, lee carroll <lee.a.carr...@googlemail.com> wrote:
> Hi List
> I'm trying to achieve the following
>
> text in "this aisle contains preserves and savoury spreads"
>
> desired index entry for a field to be used for faceting (ie strict set of
> normalised terms)
> is "jam" "savoury spreads" ie two facet terms
>
> current set up for the field is
>
> <fieldType name="facet" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="goodForSynonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.KeepWordFilterFactory" words="goodForKeepWords.txt" ignoreCase="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="goodForSynonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.KeepWordFilterFactory" words="goodForKeepWords.txt" ignoreCase="true"/>
>   </analyzer>
> </fieldType>
>
> The thinking here is
>
> get rid of any mark up nonsense
>
> split into tokens based on whitespace => "this" "aisle" "contains"
> "preserves" "and" "savoury" "spreads"
>
> produce shingles of 1 or 2 tokens => "this", "this aisle", "aisle",
> "aisle contains", "contains", "contains preserves", "preserves",
> "preserves and", "and", "and savoury", "savoury", "savoury spreads",
> "spreads"
>
> expand synonyms using a synonym file (preserves -> jam) =>
>
> "this", "this aisle", "aisle", "aisle contains", "contains",
> "contains preserves", "preserves", "jam", "preserves and", "and",
> "and savoury", "savoury", "savoury spreads", "spreads"
>
> produce a normalised term list using a keepword file with jam and
> "savoury spreads" in it
>
> which should place "jam" "savoury spreads" into the index field facet.....
>
> However I don't get savoury spreads in the index. From the analysis.jsp,
> everything goes to plan up to the last step, where the keepword file does
> not like keeping the phrase "savoury spreads". I've tried naively quoting
> the phrase in the keepword file :-)
>
> What is the best way to achieve the above? Is this the correct approach or
> is there a better way?
>
> thanks in advance lee
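
For anyone following the thread, the intended chain described in the quoted message can be sketched in plain Python. This is only a toy model of the expected behaviour, not Solr's implementation: the function names are made up for illustration, it ignores term positions entirely, and it assumes each synonym and keepword entry is matched against a whole token (so a shingle such as "savoury spreads" is treated as one term). Whether the real filters behave this way is exactly what is in question above.

```python
def whitespace_tokenize(text):
    """Split on whitespace, as the whitespace tokenizer step is expected to."""
    return text.split()


def shingles(tokens, max_size=2):
    """Emit each unigram followed by the shingles that start at that token."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        for n in range(2, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out


def expand_synonyms(tokens, synonyms):
    """Keep every token and add its synonyms right after it (expand-style)."""
    out = []
    for tok in tokens:
        out.append(tok)
        out.extend(synonyms.get(tok.lower(), []))
    return out


def keep_words(tokens, keep):
    """Drop every token that is not in the keep list (case-insensitive)."""
    keep_lower = {w.lower() for w in keep}
    return [t for t in tokens if t.lower() in keep_lower]


text = "this aisle contains preserves and savoury spreads"
synonyms = {"preserves": ["jam"]}    # assumed content of goodForSynonyms.txt
keep = {"jam", "savoury spreads"}    # assumed content of goodForKeepWords.txt

tokens = whitespace_tokenize(text)
tokens = shingles(tokens, max_size=2)
tokens = expand_synonyms(tokens, synonyms)
facets = keep_words(tokens, keep)
print(facets)  # ['jam', 'savoury spreads']
```

If the analysis.jsp output differs from this at the synonym or keepword step, that step is where the real filter treats multi-word entries differently from the whole-token matching assumed here.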