Just to be more explicit about using synonyms, our thinking was something like:
1. Analyse the texts for patterns such as "not x" and list these out.
2. In a synonyms txt file, list what are in effect antonyms, e.g.
   not pretty -> ugly
   not ugly -> pretty
   not lively -> quiet
   not very nice -> ugly
   etc.
3. Use a synonym filter referencing the antonyms at index time only.

However, the language in the text is probably more complex than the simple phrases above, and NLP seems to promise a lot :-) Should we venture down that route instead?

cheers lee c

On 10 January 2011 22:04, lee carroll <lee.a.carr...@googlemail.com> wrote:
> Hi Grant,
>
> It's a search relevancy problem. For example, a document about London
> reads like:
>
> London is not very good for a peaceful break.
>
> We analyse this at the (I can't remember the technical term; is it
> lexical?) level (bloody hell, I think you may have written the book!).
> Anyway, this produces tokens in our index of, say,
>
> "London good peaceful holiday"
>
> Users search for cities which would be nice for them to take a holiday
> in. Say the search is
>
> "good for a peaceful break"
>
> and bang, London is top. Talk about a relevancy problem :-)
>
> Now I was thinking of using phrase matches in the synonyms file, but is
> that the best approach, or could NLP help here?
>
> cheers lee
>
> On 10 January 2011 18:21, Grant Ingersoll <gsing...@apache.org> wrote:
>
>> On Jan 10, 2011, at 12:42 PM, lee carroll wrote:
>>
>> > Hi
>> >
>> > I'm indexing a set of documents which have a conversational writing
>> > style. In particular, the authors are very fond of listing facts in a
>> > variety of ways (this is to keep a human reader interested), but it's
>> > causing my index trouble.
>> >
>> > For example, instead of listing facts like "the house is white, the
>> > castle is pretty", we get "the house is the complete opposite of
>> > black, and the castle is not ugly".
>> >
>> > What are the best approaches to resolve these sorts of issues? Even
>> > if it's just handling "not" correctly, that would be a good start.
>>
>> Hmm, good problem. I guess I'd start by stepping back and asking what
>> problem you are trying to solve. You've stated, I think, one half of the
>> problem, namely that your authors have a conversational style, but you
>> haven't stated what your users are expecting to do with this
>> information. Is this a pure search app? Is it something else that is
>> just backed by Solr, where the user would never do a search?
>>
>> Do you have a relevance problem? Also, what is your notion of handling
>> "not" correctly? In other words, more details are welcome!
>>
>> -Grant
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
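For illustration only, here is a minimal Python sketch of what the index-time antonym rewrite in steps 1 to 3 at the top of this message would do to a token stream. It is not Solr code. `NEGATION_MAP`, `tokenize` and `rewrite_negations` are hypothetical names, the regex tokenizer is a crude stand-in for a real analyzer, and the "not very good -> poor" entry is an assumed addition beyond the examples in the list. In Solr itself, the same entries would presumably go into the synonyms file as explicit mappings (e.g. `not pretty => ugly`).

```python
import re

# Hypothetical antonym table, keyed by lowercased token tuples. The first four
# entries come from the list at the top of the message; ("not", "very", "good")
# is an assumed extra entry for the London example.
NEGATION_MAP = {
    ("not", "pretty"): "ugly",
    ("not", "ugly"): "pretty",
    ("not", "lively"): "quiet",
    ("not", "very", "nice"): "ugly",
    ("not", "very", "good"): "poor",
}

# Longest phrase in the table, so we know how far ahead to look.
MAX_PHRASE = max(len(phrase) for phrase in NEGATION_MAP)


def tokenize(text):
    """Lowercase and split into word tokens (a crude stand-in for an analyzer)."""
    return re.findall(r"[a-z']+", text.lower())


def rewrite_negations(tokens):
    """Replace negated phrases with their antonym, longest match first.

    This must see the raw tokens: if a stop filter runs first and drops
    "not", none of the entries can match.
    """
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest possible phrase starting at i, down to two tokens.
        for n in range(min(MAX_PHRASE, len(tokens) - i), 1, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in NEGATION_MAP:
                out.append(NEGATION_MAP[phrase])
                i += n
                break
        else:
            # No phrase matched at this position: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    print(rewrite_negations(tokenize("The castle is not ugly.")))
    # ['the', 'castle', 'is', 'pretty']
    print(rewrite_negations(tokenize("London is not very good for a peaceful break.")))
    # ['london', 'is', 'poor', 'for', 'a', 'peaceful', 'break']
```

Two things show up even in this toy. First, the rewrite has to happen before stopword removal, because "not" appears on common English stopword lists. Second, "peaceful" survives in the London example, so a query for "peaceful break" still matches: a phrase list only catches wording someone thought to enumerate, which is exactly the limitation that makes the NLP route tempting.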