Hi Chris,

It's all a bit early in the morning for this mind :-)

The question, asked in good faith, was whether Solr supports, or can be
extended to implement, a thesaurus. It looks like it does not, which is fine.
It does support synonyms and synonym rings, which again is fine. The ski
example was an illustration in response to a follow-up question asking for
more explanation of what a thesaurus is.

Below is an attempt at answering why a thesaurus would be useful.

Use case 1: Improve facets

Motivation
Unstructured lists of facet labels offer a very poor user experience. As
with tag clouds, users find them arbitrary, unfocused and often
overwhelming. Facet labels grouped in ways that are meaningful to the user
increase engagement, perceived relevance and user satisfaction.

Solution
A thesaurus of term relationships could be used to group facet labels.

Implementation
(er, completely out of my depth at this point)
Thesaurus relationships are defined in a simple text file, one line per
term:
    term, bt=>term,term nt=>term,term rt=>term,term pt=>term
If a search requests a facet, the field's terms are classified by reading
the thesaurus into groups: broader terms, narrower terms, related terms,
etc. These groups are returned as part of the response so that the UI can
display the facet labels as broader, narrower, related terms and so on.
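To make the idea a bit more concrete, here is a rough Python sketch of the grouping step, done outside Solr. It assumes the one-line-per-term file format above, with the terms inside each relationship separated by commas. The names parse_thesaurus and group_facet_labels are hypothetical; they are not part of Solr, and where such logic would actually live (a Solr component or the UI layer) is an open question.

```python
import re
from collections import defaultdict

# Relationship codes in the proposed file format: preferred, broader,
# narrower and related terms.
RELATIONS = ("pt", "bt", "nt", "rt")

# Matches a relationship marker such as "bt=>". The capturing group makes
# re.split() keep the relationship code in its output.
_REL_RE = re.compile(r"\b(pt|bt|nt|rt)=>")


def parse_thesaurus(lines):
    """Parse lines of the form 'term, bt=>a,b nt=>c,d rt=>e pt=>f' into
    {head_term_lowercased: {relation_code: [related terms]}}."""
    thesaurus = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        head, _, rest = line.partition(",")
        # re.split returns [prefix, rel1, value1, rel2, value2, ...]
        parts = _REL_RE.split(rest)
        entry = defaultdict(list)
        for rel, value in zip(parts[1::2], parts[2::2]):
            entry[rel].extend(t.strip() for t in value.split(",") if t.strip())
        thesaurus[head.strip().lower()] = dict(entry)
    return thesaurus


def group_facet_labels(search_term, facet_labels, thesaurus):
    """Group facet labels by their thesaurus relationship to search_term.
    Labels with no recorded relationship are put in 'other'."""
    entry = thesaurus.get(search_term.strip().lower(), {})
    # Map each related term (lowercased) to its relationship code; if a
    # term appears under several relations, the first in RELATIONS wins.
    lookup = {}
    for rel in RELATIONS:
        for term in entry.get(rel, []):
            lookup.setdefault(term.lower(), rel)
    groups = {rel: [] for rel in RELATIONS}
    groups["other"] = []
    for label in facet_labels:
        groups[lookup.get(label.lower(), "other")].append(label)
    return groups


if __name__ == "__main__":
    sample = [
        "Ski, pt=>Skiing bt=>Ski and Snow Boarding,Mountain Sports,Sports "
        "nt=>downhill skiing,telemark,cross country "
        "rt=>boarding,snow boarding,winter holidays",
    ]
    th = parse_thesaurus(sample)
    print(group_facet_labels("ski", ["telemark", "Sports", "sailing"], th))
```

With the ski example, "telemark" lands under nt, "Sports" under bt and "sailing" under other, which is the kind of grouping the UI could then display.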

Use case 2: Increase synonym search precision

Motivation
Synonym rings treat every synonym in a ring as equivalent, so differences
between synonyms cannot be expressed. Synonyms are rarely exactly
equivalent, which leads to a decrease in search precision.

Solution
Boost queries based on the search term's thesaurus relationships.

Implementation
(again, completely out of my depth here)
Allow terms in the index to be identified as BT, NT, etc. terms of the
search term, and allow the query parser to boost those terms differently
depending on their thesaurus relationship.



As for the X and Y stuff, I'm not sure; like I say, it's quite early in the
morning for me. I'm sure there may well be a different way of achieving the
above (but note that a thesaurus is more than a hierarchy). However,
librarians have been doing this for 50 years now.

Again, just to repeat: this is hardly a showstopper for us. We've looked at
Solr for a project, created a prototype, generated tons of questions, had
them answered mainly by the docs (and some on this list), and been amazed at
the fantastic results Solr has given us. In fact, with a combination of
keepwords and synonyms we have a pretty nice, simple set of facet labels
anyway (my motivation for the original question), so our corpus does not
really need a thesaurus at the moment! :-)

Thanks,
Lee


On 9 December 2010 23:38, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
>
> : a term can have a Preferred Term (PT), many Broader Terms (BT), many
> : Narrower Terms (NT), Related Terms (RT) etc
>         ...
> : User-supplied term is, say: Ski
> :
> : Preferred term: Skiing
> : Broader terms could be: Ski and Snow Boarding, Mountain Sports, Sports
> : Narrower terms: downhill skiing, telemark, cross country
> : Related terms: boarding, snow boarding, winter holidays
>
> I'm still lost.
>
> You've described a black box with some sample input ("Ski") and some
> corresponding sample output (PT=..., BT=..., NT=..., RT=...), but you
> haven't explained what you want to do with that black box.  Assuming such a
> black box existed in Solr, what are you expecting/hoping to do with it?
> How would such a black box modify Solr's user experience?  What is your
> goal?
>
> Smells like an XY Problem...
> http://people.apache.org/~hossman/#xyproblem
>
> Your question appears to be an "XY Problem" ... that is: you are dealing
> with "X", you are assuming "Y" will help you, and you are asking about "Y"
> without giving more details about the "X" so that we can understand the
> full issue.  Perhaps the best solution doesn't involve "Y" at all?
> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>
>
> -Hoss
>
