Hi Lee,

Solr's clustering component might be helpful for your use case:
http://wiki.apache.org/solr/ClusteringComponent




On Fri, Dec 10, 2010 at 9:17 AM, lee carroll
<lee.a.carr...@googlemail.com> wrote:
> Hi Chris,
>
> It's all a bit early in the morning for this mind :-)
>
> The question asked, in good faith, was whether Solr supports, or can be
> extended to implement, a thesaurus. It looks like it does not, which is fine.
> It does support synonyms and synonym rings, which is again fine. The ski
> example was an illustration in response to a follow-up question asking for
> more explanation of what a thesaurus is.
>
> Below is an attempt at answering why a thesaurus would help.
>
> Use case 1: improve facets
>
> Motivation
> Unstructured lists of labels in facets offer a very poor user experience.
> As with tag clouds, users find them arbitrary, unfocused and often
> overwhelming. Facet labels grouped in ways that are meaningful to the user
> increase engagement, perceived relevance and user satisfaction.
>
> Solution
> A thesaurus of term relationships could be used to group facet labels
>
> Implementation
> (er, completely out of my depth at this point)
> Thesaurus relationships are defined in a simple text file:
> term, bt=>term,term nt=> term, term rt=>term, term, pt=>term
> If a search asks for a facet to be returned, the facet field's terms are
> sorted into groups (broader terms, narrower terms, related terms, etc.) by
> looking them up in the thesaurus.
> These groups are returned as part of the response so the UI can display
> facet labels as broader, narrower, related terms, etc.
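A minimal Python sketch of this grouping idea, assuming the thesaurus line format above (head term first, then bt=>, nt=>, rt=>, pt=> markers, each followed by comma-separated terms) and assuming the facet counts have already been pulled out of the Solr response into a plain label-to-count dict. The names parse_line and group_facets are hypothetical helpers, not part of Solr.

```python
import re

# Relation markers assumed from the format above:
# bt = broader, nt = narrower, rt = related, pt = preferred term.
RELATION = re.compile(r"\b(bt|nt|rt|pt)\s*=>")
GROUP_NAMES = {"bt": "broader", "nt": "narrower", "rt": "related"}


def parse_line(line):
    """Split 'term, bt=>a,b nt=>c ...' into (term, {relation: [terms]})."""
    parts = RELATION.split(line)  # [head, rel, terms, rel, terms, ...]
    head = parts[0].strip().rstrip(",").strip().lower()
    relations = {}
    for rel, terms in zip(parts[1::2], parts[2::2]):
        # Drop empty pieces left by trailing commas such as "term, pt=>".
        cleaned = [t.strip().lower() for t in terms.split(",") if t.strip()]
        relations.setdefault(rel, []).extend(cleaned)
    return head, relations


def group_facets(facet_counts, relations):
    """Bucket facet labels as broader/narrower/related/other."""
    lookup = {}
    for rel, terms in relations.items():
        for term in terms:
            if rel in GROUP_NAMES:
                lookup.setdefault(term, GROUP_NAMES[rel])
    groups = {"broader": [], "narrower": [], "related": [], "other": []}
    for label, count in facet_counts.items():
        groups[lookup.get(label.lower(), "other")].append((label, count))
    return groups


line = "ski, bt=>mountain sports, sports nt=>telemark, cross country rt=>winter holidays pt=>skiing"
term, relations = parse_line(line)
print(group_facets({"Sports": 12, "Telemark": 3, "Winter Holidays": 5, "Snow": 2}, relations))
```

In this sketch the grouping happens after Solr has computed the facet counts, so faceting itself is unchanged and only the presentation step consults the thesaurus.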
>
> Use case 2: Increase synonym search precision
>
> Motivation
> Synonym rings do not allow differences between synonyms to be identified.
> Synonyms are rarely exactly equivalent, so treating them as such decreases
> search precision.
>
> Solution
> Boost queries based on search term thesaurus relationships
>
> Implementation
> (again, completely out of my depth here)
> Allow terms in the index to be identified as bt, nt, etc. terms of the
> search term, and allow the query parser to boost terms differentially based
> on these thesaurus relationships.
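A rough Python sketch of the query side, assuming a thesaurus entry has already been parsed into a relation-to-terms dict (bt, nt, rt, pt). It builds a query string in the standard Lucene query syntax, where term^weight sets a boost and quoted text is matched as a phrase. The field name "text", the weight values and the helper expand_query are hypothetical placeholders, not a Solr API, and the weights would need tuning per corpus.

```python
# Hypothetical weights per thesaurus relation, relative to the user's own
# term (1.0). Preferred terms count most; broader terms count least.
DEFAULT_WEIGHTS = {"pt": 2.0, "nt": 0.8, "rt": 0.5, "bt": 0.3}


def phrase(term):
    """Quote a term as a Lucene phrase, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def expand_query(user_term, relations, field="text", weights=DEFAULT_WEIGHTS):
    """Build an OR query boosting each related term by its relation weight."""
    clauses = [f"{field}:{phrase(user_term)}^1.0"]
    for rel, weight in weights.items():
        for term in relations.get(rel, []):
            clauses.append(f"{field}:{phrase(term)}^{weight}")
    return " OR ".join(clauses)


relations = {
    "bt": ["mountain sports", "sports"],
    "nt": ["telemark", "cross country"],
    "rt": ["winter holidays"],
    "pt": ["skiing"],
}
print(expand_query("ski", relations))
```

Unlike a synonym ring, which treats every equivalent term the same, this keeps the relationship type visible to scoring, so a document matching only a broader term ranks below one matching the preferred term.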
>
>
>
> As for the X and Y stuff, I'm not sure; like I say, it's quite early in the
> morning for me. I'm sure there may well be a different way of achieving the
> above (but note that it is more than a hierarchy). However, librarians have
> been doing this for 50 years now.
>
> Again, though, just to repeat: this is hardly a killer for us. We've looked
> at Solr for a project, created a prototype, generated tons of questions and
> had them answered, mainly by the docs and some on this list, and we have been
> amazed at the fantastic results Solr has given us. In fact, with a
> combination of keepwords and synonyms, we have got a pretty nice, simple set
> of facet labels anyway (my motivation for the original question), so our
> corpus does not really need a thesaurus at the moment! :-)
>
> Thanks,
> Lee
>
>
> On 9 December 2010 23:38, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
>>
>>
>> : a term can have a Preferred Term (PT), many Broader Terms (BT), many
>> : Narrower Terms (NT), Related Terms (RT), etc.
>>         ...
>> : User supplied Term is say : Ski
>> :
>> : Preferred term: Skiing
>> : Broader terms could be : Ski and Snow Boarding, Mountain Sports, Sports
>> : Narrower terms: down hill skiing, telemark, cross country
>> : Related terms: boarding, snow boarding, winter holidays
>>
>> I'm still lost.
>>
>> You've described a black box with some sample input ("Ski") and some
>> corresponding sample output (PT=..., BT=..., NT=..., RT=....) -- but you
>> haven't explained what you want to do with that black box.  Assuming such a
>> black box existed in Solr, what are you expecting/hoping to do with it?
>> How would such a black box modify Solr's user experience?  What is your
>> goal?
>>
>> Smells like an XY Problem...
>> http://people.apache.org/~hossman/#xyproblem
>>
>> Your question appears to be an "XY Problem" ... that is: you are dealing
>> with "X", you are assuming "Y" will help you, and you are asking about "Y"
>> without giving more details about the "X" so that we can understand the
>> full issue.  Perhaps the best solution doesn't involve "Y" at all?
>> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>>
>>
>> -Hoss
>>
>
