On Wed, Mar 17, 2010 at 11:48 AM, Grant Ingersoll <gsing...@apache.org> wrote:
> Yes and no. Putting our historian hat on, stop words were often seen as
> contributing very little to scores and also taking up a lot of room on disk
> back in the days when disk was very precious. Times, as they say, have
> changed. Disk is cheap, so that is no longer a concern.
>

Yes, and the take-away from the Dolamic and Savoy paper is that,
performance aside, removing stopwords is still a necessary evil for good
relevance, at least for some languages.

Ideally we wouldn't have to remove information to get good relevance. A
good step forward would be to support relevance-ranking algorithms, such as
the BM25* mentioned in the paper, that provide good relevance without the
need to remove stopwords.

For now, at least, the CommonGrams solution available in Solr provides an
alternative that can address both concerns (performance and relevance) to
some degree.

--
Robert Muir
rcm...@gmail.com