Don’t remove stopwords. That was a useful hack when we were running search engines on 16-bit machines. These days, it causes more problems than it solves.
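To make the problem with a score threshold concrete, here is a minimal sketch in Python. The three-document corpus and the `idf_scores` helper are made up for illustration, and it computes only the IDF part of TF-IDF with the plain log(N / df) formula, not any particular library's variant. In a football corpus, "football" appears in every document, so it gets exactly the same score as "the", and no cutoff on that score can separate them.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "the football match was played in the rain",
    "the football club signed the striker",
    "fans of the football team filled the stadium",
]

def idf_scores(documents):
    """Return {term: idf} using the plain formula idf = log(N / df)."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        # Use a set so each term is counted once per document.
        df.update(set(doc.split()))
    return {term: math.log(n / count) for term, count in df.items()}

scores = idf_scores(docs)

# "the" and "football" both occur in all three documents,
# so both get log(3 / 3) = 0.0.
print(scores["the"], scores["football"])

# A term that occurs in only one document scores higher.
print(scores["striker"])
```

Any list built this way would still need a person to review it and pull out the domain terms by hand.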

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jun 22, 2019, at 8:14 PM, akash jayaweera <akash.jayawe...@gmail.com> wrote:
>
> Hello All,
> I'm trying to identify stopwords for a non-English corpus using TF-IDF
> score. I calculated the score for each unique term in the corpus. But my
> question is how can I select stopwords using the score.
> For example if we have a corpus of football, term "football" get the lowest
> TF-IDF score. But for my requirement I don't want to identify "football" as
> a stopword.
> How can I clearly Identify stopword. Is there any other simple method to
> identify stopwords than TF-IDF score.
>
> Regards,
> *Akash Jayaweera.*
>
> E akash.jayawe...@gmail.com
> M + 94 77 2472635