On Jan 10, 2011, at 12:42 PM, lee carroll wrote:

> Hi
>
> I'm indexing a set of documents which have a conversational writing style.
> In particular, the authors are very fond of listing facts in a variety of
> ways (this is to keep a human reader interested), but it's causing my index
> trouble.
>
> For example, instead of listing facts like: the house is white, the castle
> is pretty.
>
> We get: the house is the complete opposite of black, and the castle is not
> ugly.
>
> What are the best approaches to resolve these sorts of issues? Even just
> handling "not" correctly would be a good start.
Hmm, good problem. I guess I'd start by stepping back and asking: what problem are you trying to solve? You've stated, I think, one half of the problem, namely that your authors have a conversational style, but you haven't said what your users expect to do with this information. Is this a pure search app? Is it something else that is just backed by Solr, where the user would never actually run a search? Do you have a relevance problem?

Also, what is your notion of handling "not" correctly?

In other words, more details are welcome!

-Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com