Hi everyone! I have a problem that I have searched on without finding a solution, and I am rather confused at the moment.
I have an application that stores human-readable text in my Solr index. It finds the most relevant terms in that text (I think using term vectors and facets) and stores them as facet terms. All of this works fine, but I now need the most relevant terms to include terms of two or more words, like "European Union", which is quite a frequent term in my system. At the moment the facets still contain "European" and "Union" as two separate terms.

So, my questions are:

- Is it possible to have facets of two or more words?
- Can I tokenize a phrase into words so that, when the tokenizer comes across "European Union", it generates one token "European Union" rather than two tokens "European" and "Union"?
- Can term vectors be used to find the relevancy of multi-word terms like "European Union"?
- Can I use SynonymFilterFactory to transform "EU, UE, European Union, Union Europeene" into "European Union"?

At index time I have the following analyzer for English:

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" words="blacklist.txt" ignoreCase="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" words="en" ignoreCase="true"/>
        <filter class="solr.HunspellStemFilterFactory" dictionary="en_GB.dic" affix="en_GB.aff" ignoreCase="true"/>
      </analyzer>
    </fieldType>

Thank you for the help!
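
P.S. To make the last question concrete, here is roughly the kind of setup I have in mind. It is an untested sketch, not something I know to work: the field type name text_en_phrases and the underscore-joined target european_union are my own assumptions (the underscore is there so the replacement contains no whitespace and, as far as I understand, would stay a single token), and the stop-word and stemming filters from my current chain are left out for brevity. The comment shows the explicit "=>" mapping line I would put in synonyms.txt.

```xml
<!-- Hypothetical field type for illustration only; name and mapping are assumptions. -->
<!-- Assumed line in synonyms.txt (explicit mapping, lowercased to match the chain):
       eu, ue, european union, union europeene => european_union -->
<fieldType name="text_en_phrases" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split the text into word tokens first -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase before the synonym lookup so "European Union" matches -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Replace any of the listed variants with the single mapped term -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```

If this is the right direction, I would expect the facet on such a field to show european_union as one term instead of "european" and "union" separately, but I would appreciate confirmation.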