Re: SOLR Thesaurus

Chris Hostetter Fri, 10 Dec 2010 15:07:26 -0800

: The question asked, in good faith, was does solr support or extend to
: implementing a thesaurus. It looks like it does not which is fine. It does


Well, my point was that "thesaurus" is not a feature description.  it's a 
data structure, and depending on your goals, the existing SynonymFilter 
may be perfectly usable out of the box.

: Use case 1: improve facets
: 
: Motivation
: Unstructured lists of labels in facets offer very poor user experience.
: Similar to tag clouds, users find them arbitrary, without focus and often
: overwhelming. Labels in facets which are grouped in meaningful ways relevant
: to the user increase engagement, perceived relevance and user satisfaction.

SynonymFilter could definitely be used to help in this situation -- if you 
create a synonyms.txt file mapping all of the terms in your thesaurus to 
your Preferred Term you could then use SynonymFilter at index time to get a 
clean list of facet constraints. (if you want a simple list of only the 
Preferred Terms)
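
A minimal sketch of generating such a synonyms.txt, assuming the thesaurus is already available as a mapping from each Preferred Term to its variants. The function name and the sample terms are made up for illustration; only the "a, b => c" explicit-mapping syntax comes from the synonyms.txt format itself.

```python
def build_synonym_rules(thesaurus):
    """Return synonyms.txt lines that map every variant to its Preferred Term.

    `thesaurus` is a hypothetical structure: {preferred_term: [variant, ...]}.
    """
    rules = []
    for preferred, variants in sorted(thesaurus.items()):
        # Explicit mapping: terms left of "=>" are replaced by the right side.
        left = ", ".join(sorted(variants))
        rules.append(left + " => " + preferred)
    return rules


# Illustrative data only, not taken from any real thesaurus.
thesaurus = {
    "automobile": ["car", "motorcar", "auto"],
    "aircraft": ["airplane", "aeroplane"],
}
rules = build_synonym_rules(thesaurus)
print("\n".join(rules))
```

For whole facet labels to match, the field's analyzer would presumably need to keep each label as a single token (KeywordTokenizer is one option), so treat the output as a starting point rather than a drop-in file.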

Alternately...

: Solution
: A thesaurus of term relationships could be used to group facet labels
: 
: Implementation
: (er completely out of my depth at this point)
: Thesaurus relationships defined in a simple text file
: term, bt=>term,term nt=> term, term rt=>term, term, pt=>term
: if a search specifies a facet to be returned the field terms are identified
: by reading the thesaurus into groups, broader terms, narrower terms, related
: terms etc
: These groups are returned as part of the response for the UI to display
: faceted labels as broader, narrower, related terms etc

...what you're describing is a hierarchical faceting model.  with a 
properly structured synonyms.txt used at indexing time and the 
"hierarchy" trick i describe on slide #32-25 of this presentation...

http://people.apache.org/~hossman/apachecon2010/facets/

...that should also be possible.
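
A rough sketch of one way the two pieces could fit together. It assumes the "hierarchy" trick means indexing depth-prefixed path tokens and drilling down with facet.prefix; the slides are the authority on the actual encoding. The function names and sample terms are hypothetical.

```python
def hierarchy_tokens(path):
    """Encode a broadest-to-narrowest term path as depth-prefixed tokens."""
    tokens = []
    for depth in range(len(path)):
        # e.g. depth 1 of ["Vehicles", "Cars"] becomes "1/Vehicles/Cars"
        tokens.append(str(depth) + "/" + "/".join(path[: depth + 1]))
    return tokens


def hierarchy_rule(path):
    """Build a synonyms.txt line expanding the narrowest term to all tokens."""
    return path[-1] + " => " + ", ".join(hierarchy_tokens(path))


def child_prefix(selected_token):
    """Return the facet.prefix value that lists children of a selected token."""
    depth, _, rest = selected_token.partition("/")
    return str(int(depth) + 1) + "/" + rest + "/"


path = ["Vehicles", "Cars", "Sedans"]  # illustrative broader-term chain
print(hierarchy_rule(path))
print(child_prefix("0/Vehicles"))
```

The UI would facet on the field with facet.prefix=0/ for the top level, then pass the value from child_prefix() once the user picks a constraint.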


: Implementation
: (again completely  out of depth here)
: Allow terms in the index to be identified as bt , nt, .. terms of the search
: term. Allow query parser to boost terms differentially based on these
: thesaurus relationships

see my earlier reply to Péter Király, what you are describing is only 
slightly more complicated than what i describe there ... this is 
definitely something that would require a custom QParser, but the heavy 
lifting could still be done by SynonymFilter (in the case you describe, 
you'd just need to split your thesaurus up into distinct mapping files for 
BT, NT, etc.. and then have one SynonymFilter for each, and apply the 
appropriate boost to the queries generated from them).
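
To make the idea concrete, here is a small sketch that splits a thesaurus into one set of synonyms.txt rules per relationship and shows the kind of boosted query the pieces would add up to. This is not a custom QParser; it only builds a query string by hand. The boost values, the field name "text", and the data layout are all assumptions for illustration.

```python
# Hypothetical boosts per relationship type; tune for your own corpus.
BOOSTS = {"bt": 0.5, "nt": 0.8, "rt": 0.3}


def split_by_relation(thesaurus):
    """Group rules by relationship: {relation: [synonyms.txt lines]}."""
    files = {}
    for term, relations in sorted(thesaurus.items()):
        for rel, targets in sorted(relations.items()):
            # One mapping file per relationship, one SynonymFilter per file.
            files.setdefault(rel, []).append(term + " => " + ", ".join(targets))
    return files


def boosted_query(term, thesaurus, field="text"):
    """Build a query string boosting related terms by relationship type."""
    clauses = [field + ":" + term]
    for rel, targets in sorted(thesaurus.get(term, {}).items()):
        for target in targets:
            clauses.append(field + ":" + target + "^" + str(BOOSTS[rel]))
    return " OR ".join(clauses)


# Illustrative data: {term: {relation: [related terms]}}
thesaurus = {"car": {"bt": ["vehicle"], "nt": ["sedan", "hatchback"]}}
files = split_by_relation(thesaurus)
query = boosted_query("car", thesaurus)
print(query)
```

In a real setup the expansion would come from running the user's input through each relationship's analyzer inside the QParser, rather than from a dictionary lookup like this.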

: Again though just to repeat this is hardly a killer for us. We've looked at
: solr for a project; created a prototype; generated tons of questions, had
: them answered in the main by the docs, some on this list and been amazed at
: the fantastic results solr has given us. In fact with a combination of
: keepwords and synonyms we have got a pretty nice simple set of facet labels
: anyway (my motivation for the original question), so our corpus at the
: moment does not really need a thesaurus! :-)

glad to hear it -- just didn't want you to think that something wasn't 
available just because you couldn't find a feature with a specific name -- 
what you get "out of the box" can be used in a lot of interesting ways if 
you think "out of the box".

-Hoss
