To be more explicit about how we would use synonyms, our thinking was
something like:

1. Analyse the texts for patterns such as "not x" and list these out.
2. In a synonyms.txt file, list what are in effect antonyms, e.g.
      not pretty => ugly
      not ugly => pretty
      not lively => quiet
      not very nice => ugly
      etc.
3. Use a synonym filter that references these antonyms at index time only.
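
A minimal Python sketch of the three steps above, assuming plain-text input and a hand-curated antonym table. The `ANTONYMS` entries (including the extra "not very good => bad") and every function name here are hypothetical. Step 3 is only emulated in plain Python; in Solr it would be a synonym filter in the index-time analyzer, and if "not" is on your stopword list that filter has to run before the stop filter, otherwise the "not x" phrases are gone before they can match.

```python
import re
from collections import Counter

# Step 1: "not X" and "not very X" patterns (case-insensitive, whole words only).
NEGATION = re.compile(r"\bnot\s+(?:very\s+)?[a-z]+\b", re.IGNORECASE)


def find_negations(texts):
    """Count every negated phrase so a person can review the most common ones."""
    counts = Counter()
    for text in texts:
        for match in NEGATION.finditer(text):
            # Normalise case and whitespace so "Not  ugly" and "not ugly" merge.
            counts[" ".join(match.group(0).lower().split())] += 1
    return counts


# Step 2: hypothetical hand-curated antonym table, written out in the
# explicit-mapping "a => b" form used in Solr synonym files.
ANTONYMS = {
    "not pretty": "ugly",
    "not ugly": "pretty",
    "not lively": "quiet",
    "not very nice": "ugly",
    "not very good": "bad",
}


def to_synonyms_lines(mapping):
    """Render the table as sorted synonyms.txt lines."""
    return [f"{phrase} => {word}" for phrase, word in sorted(mapping.items())]


# Step 3: index-time rewrite, emulated as a longest-match replacement over
# lowercased word tokens (an illustration, not Solr's own implementation).
def rewrite(text, mapping):
    tokens = re.findall(r"[a-z]+", text.lower())
    phrases = {tuple(k.split()): v for k, v in mapping.items()}
    longest = max((len(p) for p in phrases), default=0)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            replacement = phrases.get(tuple(tokens[i:i + n]))
            if replacement is not None:
                out.append(replacement)
                i += n
                break
        else:
            # No phrase starts at this token: keep it unchanged.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    docs = ["London is not very good for a peaceful break.",
            "The castle is not ugly."]
    print(find_negations(docs).most_common())
    print("\n".join(to_synonyms_lines(ANTONYMS)))
    print(rewrite(docs[0], ANTONYMS))
```

Preferring the longest match means "not very good" is replaced as a whole, so the London sentence from the earlier message is indexed as `bad` rather than as a bare `good`. Only phrases someone has listed get rewritten; something like "the complete opposite of black" still slips through.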

However, the language in the texts is probably more complex than the simple
phrases above, and NLP seems to promise a lot :-) Should we venture down
that route instead?

cheers lee c


On 10 January 2011 22:04, lee carroll <lee.a.carr...@googlemail.com> wrote:

> Hi Grant,
>
> It's a search relevancy problem. For example:
>
> A document about London reads like:
>
> London is not very good for a peaceful break.
>
> We analyse this at the (I can't remember the technical term) lexical level?
> (Bloody hell, I think you may have written the book!) Anyway, this produces
> tokens in our index of, say:
>
> "London good peaceful holiday"
>
> Users search for cities that would be nice to take a holiday in. Say the
> search is:
> "good for a peaceful break"
>
> And bang, London is top. Talk about a relevancy problem :-)
>
> Now, I was thinking of using phrase matches in the synonyms file, but is
> that the best approach, or could NLP help here?
>
> cheers lee
>
> On 10 January 2011 18:21, Grant Ingersoll <gsing...@apache.org> wrote:
>
>>
>> On Jan 10, 2011, at 12:42 PM, lee carroll wrote:
>>
>> > Hi
>> >
>> > I'm indexing a set of documents that have a conversational writing style.
>> > In particular, the authors are very fond of listing facts in a variety of
>> > ways (to keep a human reader interested), but it's causing my index
>> > trouble.
>> >
>> > For example, instead of listing facts like "the house is white" and "the
>> > castle is pretty",
>> >
>> > we get "the house is the complete opposite of black" and "the castle is
>> > not ugly".
>> >
>> > What are the best approaches to resolving these sorts of issues? Even
>> > just handling "not" correctly would be a good start.
>> >
>>
>> Hmm, good problem. I guess I'd start by stepping back and asking: what
>> problem are you trying to solve? You've stated, I think, one half of the
>> problem, namely that your authors have a conversational style, but you
>> haven't stated what your users expect to do with this information. Is this
>> a pure search app? Or is it something else that is just backed by Solr,
>> where the user would never actually do a search?
>>
>> Do you have a relevance problem?  Also, what is your notion of handling
>> "not" correctly?  In other words, more details are welcome!
>>
>> -Grant
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com
>>
>>
>
