Hi Peter,

That's way too clever for me :-) Discovering thesaurus relationships would be fantastic, but it's not clear what heuristics you would need to use to discover broader, narrower, related documents etc. Although I might be doing the clustering down, I'm sceptical about the accuracy.
cheers
Lee c

On 10 December 2010 09:38, Peter Sturge <peter.stu...@gmail.com> wrote:

> Hi Lee,
>
> Perhaps Solr's clustering component might be helpful for your use case?
> http://wiki.apache.org/solr/ClusteringComponent
>
> On Fri, Dec 10, 2010 at 9:17 AM, lee carroll
> <lee.a.carr...@googlemail.com> wrote:
> > Hi Chris,
> >
> > It's all a bit early in the morning for this mind :-)
> >
> > The question asked, in good faith, was: does Solr support or extend to
> > implementing a thesaurus? It looks like it does not, which is fine. It
> > does support synonyms and synonym rings, which is again fine. The ski
> > example was an illustration in response to a follow-up question for
> > more explanation on what a thesaurus is.
> >
> > An attempt at an answer of why a thesaurus is below.
> >
> > Use case 1: improve facets
> >
> > Motivation
> > Unstructured lists of labels in facets offer very poor user experience.
> > Similar to tag clouds, users find them arbitrary, without focus and
> > often overwhelming. Labels in facets which are grouped in meaningful
> > ways relevant to the user increase engagement, perceived relevance and
> > user satisfaction.
> >
> > Solution
> > A thesaurus of term relationships could be used to group facet labels
> >
> > Implementation
> > (er completely out of my depth at this point)
> > Thesaurus relationships defined in a simple text file
> > term, bt=>term,term nt=> term, term rt=>term, term, pt=>term
> > If a search specifies a facet to be returned, the field terms are
> > identified by reading the thesaurus into groups: broader terms,
> > narrower terms, related terms etc.
> > These groups are returned as part of the response for the UI to display
> > faceted labels as broader, narrower, related terms etc.
> >
> > Use case 2: Increase synonym search precision
> >
> > Motivation
> > Synonym rings do not allow differences between synonyms to be
> > identified. Rarely are synonyms exactly equivalent. This leads to a
> > decrease in search precision.
> >
> > Solution
> > Boost queries based on search term thesaurus relationships
> >
> > Implementation
> > (again completely out of depth here)
> > Allow terms in the index to be identified as bt, nt, ... terms of the
> > search term. Allow the query parser to boost terms differentially based
> > on these thesaurus relationships.
> >
> > As for the x and y stuff I'm not sure; like I say, it's quite early in
> > the morning for me. I'm sure there may well be a different way of
> > achieving the above (but note it is more than a hierarchy). However,
> > the librarians have been doing this for 50 years now.
> >
> > Again though, just to repeat, this is hardly a killer for us. We've
> > looked at Solr for a project; created a prototype; generated tons of
> > questions, had them answered in the main by the docs, some on this
> > list, and been amazed at the fantastic results Solr has given us. In
> > fact, with a combination of keepwords and synonyms we have got a pretty
> > nice simple set of facet labels anyway (my motivation for the original
> > question), so our corpus at the moment does not really need a
> > thesaurus! :-)
> >
> > Thanks Lee
> >
> > On 9 December 2010 23:38, Chris Hostetter <hossman_luc...@fucit.org>
> > wrote:
> >
> >> : a term can have a Preferred Term (PT), many Broader Terms (BT), Many
> >> : Narrower Terms (NT) Related Terms (RT) etc
> >> ...
> >> : User supplied Term is say : Ski
> >> :
> >> : Preferred term: Skiing
> >> : Broader terms could be : Ski and Snow Boarding, Mountain Sports,
> >> : Sports
> >> : Narrower terms: down hill skiing, telemark, cross country
> >> : Related terms: boarding, snow boarding, winter holidays
> >>
> >> I'm still lost.
> >>
> >> You've described a black box with some sample input ("Ski") and some
> >> corresponding sample output (PT=..., BT=..., NT=..., RT=....) -- but
> >> you haven't explained what you want to do with that black box.
> >> Assuming such a black box existed in Solr, what are you
> >> expecting/hoping to do with it? How would such a black box modify
> >> Solr's user experience? What is your goal?
> >>
> >> Smells like an XY Problem...
> >> http://people.apache.org/~hossman/#xyproblem
> >>
> >> Your question appears to be an "XY Problem" ... that is: you are
> >> dealing with "X", you are assuming "Y" will help you, and you are
> >> asking about "Y" without giving more details about the "X" so that we
> >> can understand the full issue. Perhaps the best solution doesn't
> >> involve "Y" at all?
> >> See Also: http://www.perlmonks.org/index.pl?node_id=542341
> >>
> >> -Hoss
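
A rough sketch of the two use cases quoted above (grouping facet labels, and boosting by relationship), done client-side in Python rather than inside Solr. Everything here is an assumption for illustration: the exact one-line-per-term layout of the thesaurus file, the function names parse_thesaurus, group_facet_labels and boosted_query, and the boost values. The only real syntax relied on is the Lucene/Solr query parser's "term"^boost notation. This is not a Solr feature or plugin.

```python
import re

# Relationship codes taken from the thread: preferred, broader, narrower, related.
RELATIONS = ("pt", "bt", "nt", "rt")
_MARKER = re.compile(r"\b(pt|bt|nt|rt)=>")

# Hypothetical thesaurus line, loosely following "term, bt=>term,term nt=>term ...".
SAMPLE = (
    "ski, bt=>ski and snow boarding, mountain sports, sports "
    "nt=>down hill skiing, telemark, cross country "
    "rt=>boarding, snow boarding, winter holidays, pt=>skiing"
)


def parse_thesaurus(text):
    """Parse one entry per line into {term: {relation: [terms]}} (all lowercased)."""
    thesaurus = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Splitting on a capturing group yields [head, rel, chunk, rel, chunk, ...].
        parts = _MARKER.split(line)
        term = parts[0].strip(" ,").lower()
        entry = {rel: [] for rel in RELATIONS}
        for rel, chunk in zip(parts[1::2], parts[2::2]):
            entry[rel].extend(t.strip().lower() for t in chunk.split(",") if t.strip())
        thesaurus[term] = entry
    return thesaurus


def group_facet_labels(term, labels, thesaurus):
    """Use case 1: bucket facet labels by their relationship to the search term."""
    entry = thesaurus.get(term.lower(), {})
    groups = {rel: [] for rel in RELATIONS}
    groups["other"] = []
    for label in labels:
        for rel in RELATIONS:
            if label.lower() in entry.get(rel, []):
                groups[rel].append(label)
                break
        else:
            # Label has no thesaurus relationship to the search term.
            groups["other"].append(label)
    return groups


def boosted_query(term, thesaurus, boosts):
    """Use case 2: build an OR query with a different boost per relationship."""
    entry = thesaurus.get(term.lower(), {})
    clauses = ['"%s"' % term]
    for rel in RELATIONS:
        for related in entry.get(rel, []):
            clauses.append('"%s"^%s' % (related, boosts[rel]))
    return " OR ".join(clauses)


if __name__ == "__main__":
    th = parse_thesaurus(SAMPLE)
    print(group_facet_labels("Ski", ["Telemark", "Sports", "Snow Boarding", "Golf"], th))
    print(boosted_query("ski", th, {"pt": 2.0, "bt": 0.5, "nt": 0.8, "rt": 0.3}))
```

The facet grouping would run over the labels Solr already returns for a facet field, and the boosted string would be sent as an ordinary query, so neither part needs changes to Solr itself. Whether the boost values help precision would have to be tested against a real corpus.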