Hi Lee,

As I envision it, the user could decide which relationship types he
would like to attach to his search, and the application would call
his attention to other possibilities. So no heuristic method would be
applied, because e.g. broader terms would cause lots of misleading
results.
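
A minimal sketch of that idea, assuming a small in-memory thesaurus. The
`THESAURUS` dict and the `expand_term` helper are hypothetical names, not
part of Solr; the sample entries reuse the ski example quoted further down.

```python
# Hypothetical in-memory thesaurus: term -> relationship type -> related terms.
# The entries reuse the "ski" example from this thread.
THESAURUS = {
    "skiing": {
        "bt": ["mountain sports", "sports"],
        "nt": ["down hill skiing", "telemark", "cross country"],
        "rt": ["snow boarding", "winter holidays"],
    },
}


def expand_term(term, selected_types):
    """Expand `term` using only the relationship types the user selected.

    Returns (search_terms, other_types). other_types lists the relationship
    types that exist for the term but were not selected, so the application
    can point them out to the user instead of applying them silently.
    """
    relations = THESAURUS.get(term, {})
    search_terms = [term]
    for rel_type in selected_types:
        search_terms.extend(relations.get(rel_type, []))
    other_types = sorted(t for t in relations if t not in selected_types)
    return search_terms, other_types


terms, suggestions = expand_term("skiing", ["nt"])
print(terms)        # the term plus its narrower terms only
print(suggestions)  # relationship types the user could still opt into
```

Nothing is expanded unless the user asked for it, so broader terms never
leak into the search by default.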

Péter

2010/12/10 lee carroll <lee.a.carr...@googlemail.com>:
> Hi Peter,
>
> That's way too clever for me :-)
> Discovering thesaurus relationships would be fantastic, but it's not clear
> what heuristics you would need to use to discover broader, narrower, related
> documents etc. Although I might be doing the clustering down, I'm sceptical
> about the accuracy.
>
> cheers Lee c
>
> On 10 December 2010 09:38, Peter Sturge <peter.stu...@gmail.com> wrote:
>
>> Hi Lee,
>>
>> Perhaps Solr's clustering component might be helpful for your use case?
>> http://wiki.apache.org/solr/ClusteringComponent
>>
>>
>>
>>
>> On Fri, Dec 10, 2010 at 9:17 AM, lee carroll
>> <lee.a.carr...@googlemail.com> wrote:
>> > Hi Chris,
>> >
>> > It's all a bit early in the morning for this mind :-)
>> >
>> > The question asked, in good faith, was whether Solr supports or extends to
>> > implementing a thesaurus. It looks like it does not, which is fine. It does
>> > support synonyms and synonym rings, which is again fine. The ski example was
>> > an illustration in response to a follow-up question asking for more
>> > explanation of what a thesaurus is.
>> >
>> > An attempt at an answer to the question "why a thesaurus?" is below.
>> >
>> > Use case 1: improve facets
>> >
>> > Motivation
>> > Unstructured lists of labels in facets offer a very poor user experience.
>> > As with tag clouds, users find them arbitrary, unfocused and often
>> > overwhelming. Labels in facets which are grouped in ways that are meaningful
>> > and relevant to the user increase engagement, perceived relevance and user
>> > satisfaction.
>> >
>> > Solution
>> > A thesaurus of term relationships could be used to group facet labels
>> >
>> > Implementation
>> > (er completely out of my depth at this point)
>> > Thesaurus relationships defined in a simple text file
>> > term, bt=>term,term nt=> term, term rt=>term, term, pt=>term
>> > If a search specifies a facet to be returned, the field terms are sorted
>> > into groups (broader terms, narrower terms, related terms etc.) by
>> > reading the thesaurus.
>> > These groups are returned as part of the response so the UI can display
>> > facet labels as broader, narrower, related terms etc.
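
A rough sketch of the facet grouping described above. It assumes a
simplified variant of the text format, one term per line with relationship
groups separated by semicolons, since the exact syntax is not pinned down in
the proposal. `parse_thesaurus` and `group_facet_labels` are hypothetical
helpers, not Solr APIs.

```python
def parse_thesaurus(lines):
    """Parse lines like 'skiing; bt=>sports; nt=>telemark, cross country'.

    Assumed format: the term first, then ';'-separated groups of the form
    'type=>target, target'. Blank lines and '#' comments are skipped.
    """
    thesaurus = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        term, *groups = [part.strip() for part in line.split(";")]
        relations = {}
        for group in groups:
            if not group:
                continue
            rel_type, _, targets = group.partition("=>")
            relations[rel_type.strip()] = [
                t.strip() for t in targets.split(",") if t.strip()
            ]
        thesaurus[term] = relations
    return thesaurus


def group_facet_labels(thesaurus, term, labels):
    """Group facet labels by their relationship to `term`.

    Labels that have no relationship to the term land in the 'other' group.
    """
    relations = thesaurus.get(term, {})
    grouped = {}
    for label in labels:
        rel_type = next(
            (r for r, targets in relations.items() if label in targets),
            "other",
        )
        grouped.setdefault(rel_type, []).append(label)
    return grouped


sample = [
    "skiing; bt=>mountain sports, sports; nt=>telemark, cross country; rt=>snow boarding",
]
thesaurus = parse_thesaurus(sample)
facets = group_facet_labels(
    thesaurus, "skiing", ["sports", "telemark", "snow boarding", "hotels"]
)
print(facets)
```

In this sketch the grouping happens outside Solr, on the facet labels that
come back in the response; doing it inside a search component would be a
separate design decision.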
>> >
>> > Use case 2: Increase synonym search precision
>> >
>> > Motivation
>> > Synonym rings do not allow differences between synonyms to be identified.
>> > Synonyms are rarely exactly equivalent, so treating them as
>> > interchangeable leads to a decrease in search precision.
>> >
>> > Solution
>> > Boost queries based on search term thesaurus relationships
>> >
>> > Implementation
>> > (again completely out of my depth here)
>> > Allow terms in the index to be identified as bt, nt, ... terms of the
>> > search term. Allow the query parser to boost terms differentially based
>> > on these thesaurus relationships.
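
A small sketch of differential boosting, with one swapped-in simplification:
it expands the query at query time rather than marking terms in the index.
The `BOOSTS` values and the `boosted_query` helper are hypothetical; the
only real syntax relied on is the `term^boost` form of the Lucene query
parser.

```python
# Hypothetical boost per relationship type; the values are illustrative only.
BOOSTS = {"pt": 2.0, "nt": 1.5, "rt": 0.8, "bt": 0.5}


def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def boosted_query(term, relations, boosts=BOOSTS):
    """Build a query string: the user's term plus boosted related terms.

    `relations` maps a relationship type (pt, bt, nt, rt) to a list of terms.
    Relationship types without a configured boost are left out of the query.
    """
    clauses = [quote(term)]
    for rel_type, targets in relations.items():
        boost = boosts.get(rel_type)
        if boost is None:
            continue
        clauses.extend(f"{quote(t)}^{boost}" for t in targets)
    return " OR ".join(clauses)


relations = {"pt": ["skiing"], "nt": ["telemark"], "bt": ["mountain sports"]}
query = boosted_query("ski", relations)
print(query)
```

Giving broader terms a boost below 1 keeps them in the result set while
ranking them under closer matches, which is the precision gain a plain
synonym ring cannot offer.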
>> >
>> >
>> >
>> > As for the X and Y stuff, I'm not sure; like I say, it's quite early in the
>> > morning for me. I'm sure there may well be a different way of achieving the
>> > above (but note it is more than a hierarchy). However, librarians have
>> > been doing this for 50 years now.
>> >
>> > Again though, just to repeat, this is hardly a killer for us. We've looked at
>> > Solr for a project; created a prototype; generated tons of questions, had
>> > them answered in the main by the docs, some on this list; and been amazed at
>> > the fantastic results Solr has given us. In fact, with a combination of
>> > keepwords and synonyms we have got a pretty nice, simple set of facet labels
>> > anyway (my motivation for the original question), so our corpus at the
>> > moment does not really need a thesaurus! :-)
>> >
>> > Thanks Lee
>> >
>> >
>> > On 9 December 2010 23:38, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>> >
>> >>
>> >>
>> >> : a term can have a Preferred Term (PT), many Broader Terms (BT), many
>> >> : Narrower Terms (NT), Related Terms (RT) etc
>> >>         ...
>> >> : User supplied Term is say : Ski
>> >> :
>> >> : Preferred term: Skiing
>> >> : Broader terms could be: Ski and Snow Boarding, Mountain Sports, Sports
>> >> : Narrower terms: down hill skiing, telemark, cross country
>> >> : Related terms: boarding, snow boarding, winter holidays
>> >>
>> >> I'm still lost.
>> >>
>> >> You've described a black box with some sample input ("Ski") and some
>> >> corresponding sample output (PT=..., BT=..., NT=..., RT=....) -- but you
>> >> haven't explained what you want to do with that black box.  Assuming such a
>> >> black box existed in solr what are you expecting/hoping to do with it?
>> >> how would such a black box modify solr's user experience?  what is your
>> >> goal?
>> >>
>> >> Smells like an XY Problem...
>> >> http://people.apache.org/~hossman/#xyproblem
>> >>
>> >> Your question appears to be an "XY Problem" ... that is: you are dealing
>> >> with "X", you are assuming "Y" will help you, and you are asking about "Y"
>> >> without giving more details about the "X" so that we can understand the
>> >> full issue.  Perhaps the best solution doesn't involve "Y" at all?
>> >> See Also: http://www.perlmonks.org/index.pl?node_id=542341
>> >>
>> >>
>> >> -Hoss
>> >>
>> >
>>
>
