Re: Facets, termvectors, relevancy and Multi word tokenizing

epnRui Mon, 03 Mar 2014 10:07:55 -0800

Hi guys,

I'm on my way to solving it properly.


This is what my field looks like now:


<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Swap "#" (or its URL-encoded form "%23") for a placeholder the tokenizer will keep -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(#)|(%23)"
                replacement="79f20724d6985c5b857d2fa06a3ff8c6"/>
    <!-- Collapse the multi-word term into a single placeholder token -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(((?i)((european parliament)|(parlament europeenne)))|(EP)|(PE))"
                replacement="0ee062d61f44ae0a2aee145076ca6a69european_parliament"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="blacklist.txt" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" words="en" ignoreCase="true"/>
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="en_GB.dic" affix="en_GB.aff" ignoreCase="true"/>
    <!-- Restore the readable forms after tokenizing and stemming -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="0ee062d61f44ae0a2aee145076ca6a69european_parliament"
            replacement="european parliament" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="79f20724d6985c5b857d2fa06a3ff8c6"
            replacement="#" replace="all"/>
  </analyzer>
</fieldType>

I still have one case where I'm facing issues, because I actually want to
preserve the #:
 - "#European Parliament" ends up as one token instead of two ("#European"
   and "Parliament"). Anyway, I have some ideas on how to fix it.
I'll let you know what the final solution is.
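
To make the remaining problem easier to see, here is a rough Python simulation of the analyzer chain above. It is only a sketch under several assumptions: `\w+` stands in for StandardTokenizer, the stop-word and Hunspell filters are skipped, and the inline `(?i)` from the Java pattern is written as the scoped `(?i:...)` because Python rejects a global flag in the middle of a pattern. The function name `analyze` and the constant names are made up for this illustration.

```python
import re

# Placeholder strings copied from the field type definition.
HASH_PLACEHOLDER = "79f20724d6985c5b857d2fa06a3ff8c6"
EP_PLACEHOLDER = "0ee062d61f44ae0a2aee145076ca6a69european_parliament"

# Approximations of the two charFilter patterns.
HASH_PATTERN = re.compile(r"#|%23")
EP_PATTERN = re.compile(r"(?i:european parliament|parlament europeenne)|EP|PE")


def analyze(text):
    """Approximate the index-time chain (stop words and stemming skipped)."""
    # Char filters rewrite the raw text before it is tokenized.
    text = HASH_PATTERN.sub(HASH_PLACEHOLDER, text)
    text = EP_PATTERN.sub(EP_PLACEHOLDER, text)

    # Assumption: \w+ behaves like StandardTokenizer for this input,
    # i.e. letters, digits and underscores stay in one token.
    tokens = [t.lower() for t in re.findall(r"\w+", text)]

    # Token filters restore the readable forms inside each token.
    tokens = [t.replace(EP_PLACEHOLDER, "european parliament") for t in tokens]
    tokens = [t.replace(HASH_PLACEHOLDER, "#") for t in tokens]
    return tokens


print(analyze("#European Parliament votes"))
print(analyze("the European Parliament"))
```

In this simulation the two placeholders end up directly next to each other with nothing for the tokenizer to split on, so "#European Parliament" comes out as the single token "#european parliament". Real Solr output may differ once the stop and stem filters are applied.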



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facets-termvectors-relevancy-and-Multi-word-tokenizing-tp4120101p4120948.html
Sent from the Solr - User mailing list archive at Nabble.com.
