: The question asked, in good faith, was does solr support or extend to
: implementing a thesaurus. It looks like it does not which is fine. It does

Well, my point was that "thesaurus" is not a feature description. It's a
data structure, and depending on your goals, the existing SynonymFilter may
be perfectly usable out of the box.

: Use case 1: improve facets
:
: Motivation
: Unstructured lists of labels in facets offer very poor user experience.
: Similar to tag clouds users find them arbitrary, with out focus and often
: overwhelming. Labels in facets which are grouped in meaningful ways relevant
: to the user increase engagement, perceived relevance and user satisfaction.

SynonymFilter could definitely be used to help in this situation -- if you
create a synonyms.txt file mapping all of the terms in your thesaurus to
your Preferred Term, you could then use SynonymFilter at index time to get a
clean list of facet constraints (if you want a simple list of only the
Preferred Terms). Alternately...

: Solution
: A thesaurus of term relationships could be used to group facet labels
:
: Implementation
: (er completely out of my depth at this point)
: Thesaurus relationships defined in a simple text file
: term, bt=>term,term nt=> term, term rt=>term, term, pt=>term
: if a search specifies a facet to be returned the field terms are identified
: by reading the thesaurus into groups, broader terms, narrower terms, related
: terms etc
: These groups are returned as part of the response for the UI to display
: faceted labels as broader, narrower, related terms etc

...what you're describing is a hierarchical faceting model. With a properly
structured synonyms.txt used at indexing time and the "hierarchy" trick i
describe on slide #32-25 of this presentation...

http://people.apache.org/~hossman/apachecon2010/facets/

...that should also be possible.

: Implementation
: (again completely out of depth here)
: Allow terms in the index to be identified as bt , nt, .. terms of the search
: term.
: Allow query parser to boost terms differentially based on these
: thesaurus relationships

see my earlier reply to Péter Király, what you are describing is only
slightly more complicated than what i describe there ... this is definitely
something that would require a custom QParser, but the heavy lifting could
still be done by SynonymFilter (in the case you describe, you'd just need to
split your thesaurus up into distinct mapping files for BT, NT, etc., then
have one SynonymFilter for each, and apply the appropriate boost to the
queries generated from them).

: Again though just to repeat this is hardly a killer for us. We've looked at
: solr for a project; created a proto type; generated tons of questions, had
: them answered in the main by the docs, some on this list and been amazed at
: the fantastic results solr has given us. In fact with a combination of
: keepwords and synonyms we have got a pretty nice simple set of facet labels
: anyway (my motivation for the original question), so our corpus at the
: moment does not really need a thesaurus! :-)

glad to hear it -- just didn't want you to think that something wasn't
available just because you couldn't find a feature with a specific name --
what you get "out of the box" can be used in a lot of interesting ways if
you think "out of the box".

-Hoss
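
PS: to make the "split your thesaurus up into distinct mapping files" idea
a bit more concrete, here is a rough sketch in Python. It is only an
illustration: the tab-separated input format (term, relation, targets), the
sample data, and the function names parse_thesaurus and synonym_rules are
all made up for this example and are not part of Solr. The only Solr-specific
piece is the explicit "lhs => rhs" rule syntax that synonyms.txt accepts.

```python
from collections import defaultdict

# Hypothetical thesaurus format (an assumption, not something Solr defines):
#   term <TAB> relation <TAB> comma-separated targets
SAMPLE = (
    "car\tpt\tautomobile\n"
    "car\tbt\tvehicle\n"
    "car\tnt\tsedan, hatchback\n"
    "automobile\tbt\tvehicle\n"
)


def parse_thesaurus(text):
    """Group the thesaurus by relation: {relation: {term: [targets]}}."""
    relations = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        term, relation, targets = line.split("\t")
        cleaned = [t.strip() for t in targets.split(",") if t.strip()]
        relations[relation.strip().lower()].setdefault(term.strip(), []).extend(cleaned)
    return relations


def synonym_rules(mapping):
    """Render one relation as explicit 'lhs => rhs' synonyms.txt rules."""
    return [f"{term} => {', '.join(targets)}" for term, targets in sorted(mapping.items())]


# One list of rules per relation; each list would become its own mapping file.
rules_by_relation = {rel: synonym_rules(m) for rel, m in parse_thesaurus(SAMPLE).items()}

if __name__ == "__main__":
    for rel, rules in sorted(rules_by_relation.items()):
        print(f"# synonyms_{rel}.txt (hypothetical file name)")
        print("\n".join(rules))
```

The "pt" rules are the ones you would load into a SynonymFilter at index
time to collapse everything onto the Preferred Term; the "bt", "nt", etc.
rules would each feed a separate SynonymFilter, with the per-relationship
boosts applied by the custom QParser (not shown here).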