Just to add: things are not going as expected even before the keepword
filter. The synonym list is not being expanded for shingles. I think I don't
understand term position....
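
To spell out what I was expecting from the chain, here is a rough sketch in
plain Python (not Solr code). The function names shingles, expand_synonyms
and keep_words are made up for illustration, and the position numbering is
an assumption: I give each shingle the position of its first word, which may
not be how the real filters assign positions. It only models the intended
behaviour described in my original mail below.

```python
def shingles(tokens, max_size=2):
    """Return (position, term) pairs for every shingle of 1..max_size words.

    Assumption: a shingle is given the position of its first word.
    """
    out = []
    for pos in range(len(tokens)):
        for size in range(1, max_size + 1):
            if pos + size <= len(tokens):
                out.append((pos, " ".join(tokens[pos:pos + size])))
    return out


def expand_synonyms(terms, synonyms):
    """Keep every term and add its synonyms at the same position."""
    out = []
    for pos, term in terms:
        out.append((pos, term))
        for alt in synonyms.get(term.lower(), []):
            out.append((pos, alt))
    return out


def keep_words(terms, keep):
    """Drop every term that is not in the keep set (case-insensitive)."""
    return [(pos, term) for pos, term in terms if term.lower() in keep]


text = "this aisle contains preserves and savoury spreads"
tokens = text.split()                                     # whitespace split
stage1 = shingles(tokens)                                 # unigrams + bigrams
stage2 = expand_synonyms(stage1, {"preserves": ["jam"]})  # preserves -> jam
stage3 = keep_words(stage2, {"jam", "savoury spreads"})   # normalised terms
facets = [term for _, term in stage3]
print(facets)
```

With the input from my mail this prints ['jam', 'savoury spreads'], which is
the facet list I'm after. It says nothing about why the real chain behaves
differently.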

On 5 February 2011 16:08, lee carroll <lee.a.carr...@googlemail.com> wrote:

> Hi List
> I'm trying to achieve the following
>
> text in "this aisle contains preserves and savoury spreads"
>
> desired index entry for a field to be used for faceting (ie strict set of
> normalised terms)
> is "jam" "savoury spreads" ie two facet terms
>
> current set up for the field is
>
> <fieldType name="facet" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>             outputUnigrams="true"/>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="goodForSynonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.KeepWordFilterFactory"
>             words="goodForKeepWords.txt" ignoreCase="true"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
>             outputUnigrams="true"/>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="goodForSynonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.KeepWordFilterFactory"
>             words="goodForKeepWords.txt" ignoreCase="true"/>
>   </analyzer>
> </fieldType>
>
> The thinking here is
> get rid of any mark up nonsense
> split into tokens based on whitespace => "this" "aisle" "contains"
> "preserves" "and" "savoury" "spreads"
> produce shingles of 1 or 2 tokens => "this", "this aisle", "aisle",
> "aisle contains", "contains", "contains preserves", "preserves",
> "preserves and", "and", "and savoury", "savoury", "savoury spreads",
> "spreads"
>
> expand synonyms using a synonym file (preserves -> jam) =>
>
> "this", "this aisle", "aisle", "aisle contains", "contains",
> "contains preserves", "preserves", "jam", "preserves and", "and",
> "and savoury", "savoury", "savoury spreads", "spreads"
>
> produce a normalised term list using a keepword file with "jam" and
> "savoury spreads" in it
>
> which should place "jam" "savoury spreads" into the index field facet.....
>
> However I don't get "savoury spreads" in the index. From analysis.jsp,
> everything goes to plan up to the last step, where the keepword filter does
> not keep the phrase "savoury spreads". I've tried naively quoting the
> phrase in the keepword file :-)
>
> What is the best way to achieve the above? Is this the correct approach or
> is there a better way?
>
> thanks in advance lee
>
