Hi epnRui,
I don't fully follow your e-mail (I think you need to describe your use case), but here are some answers:

- Is it possible to have facets of two or more words?

Yes. For example, if you use ShingleFilterFactory at index time, you will see terms of two or more words in facets.

- Can I tokenize a phrase into words, but when it comes across "European Union", it generates one token for "European Union" and not two tokens, "European" and "Union"?

Yes. For example, you can use mappingCharFilter (executed before the tokenizer) with this mapping:

"European Union" => "European_Union"

Regarding the synonym filter, please see:
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/

Ahmet

On Thursday, February 27, 2014 1:10 PM, epnRui <rui_banda...@hotmail.com> wrote:

Hi everyone!

I'm having a problem. I have searched and haven't found a solution yet, and I am rather confused at the moment.

I have an application that stores human-readable texts in my Solr index. It finds the most relevant terms in that text (I think using termvectors and facets) and stores the facet terms.

All works fine, but now I need the most relevant terms to also include terms of at least two words, like "European Union", which is quite a frequent term in my system. The system still puts "European" and "Union" into the facets as two separate terms.

So, the questions are:

- Is it possible to have facets of two or more words?
- Can I tokenize a phrase into words, but when it comes across "European Union", it generates one token for "European Union" and not two tokens, "European" and "Union"?
- Can termvectors be used to find the relevancy of multi-word terms like "European Union"?
- Can I use a SynonymFilterFactory that would transform "EU, UE, European Union, Union Europeene" into "European Union"?

At indexing time I have the following analyzer for the English language:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="blacklist.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" words="en" ignoreCase="true"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="en_GB.dic" affix="en_GB.aff" ignoreCase="true" />
  </analyzer>
</fieldType>

Thank you for the help!

--
View this message in context: http://lucene.472066.n3.nabble.com/Facets-termvectors-relevancy-and-Multi-word-tokenizing-tp4120101.html
Sent from the Solr - User mailing list archive at Nabble.com.
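
For illustration, the mappingCharFilter suggestion from the reply could be wired into the text_en field type roughly as follows. This is only a sketch under assumptions: the file name mapping-multiword.txt is hypothetical, the rest of the filter chain is copied unchanged from the original message, and the interaction with the synonym and Hunspell filters has not been checked.

```xml
<!-- Sketch only. mapping-multiword.txt is a hypothetical file name; it would
     live in the core's conf directory and contain one rule per line, e.g.:
       "European Union" => "European_Union"
     The mapping is case-sensitive, so other spellings such as
     "european union" would need their own rules. -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Char filters rewrite the raw text before the tokenizer sees it. -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-multiword.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="blacklist.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- After lowercasing, the joined term reaches this filter as
         "european_union", so synonyms.txt would have to use that form. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" words="en" ignoreCase="true"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="en_GB.dic" affix="en_GB.aff" ignoreCase="true" />
  </analyzer>
</fieldType>
```

StandardTokenizerFactory follows the Unicode word-break rules, under which an underscore between letters should not split the word, so "European_Union" is expected to come out as a single token and therefore as a single facet value. It is worth confirming this on the Analysis screen of the Solr admin UI before reindexing.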