Don’t remove stopwords. That was a useful hack when we were running search 
engines on 16-bit machines. These days, it causes more problems than it solves.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 22, 2019, at 8:14 PM, akash jayaweera <akash.jayawe...@gmail.com> 
> wrote:
> 
> Hello All,
> I'm trying to identify stopwords for a non-English corpus using TF-IDF
> scores. I calculated the score for each unique term in the corpus. My
> question is: how can I select stopwords using the score?
> For example, in a corpus about football, the term "football" gets the lowest
> TF-IDF score. But for my requirement, I don't want to identify "football" as
> a stopword.
> How can I reliably identify stopwords? Is there a simpler method than the
> TF-IDF score for identifying them?
> 
> Regards,
> *Akash Jayaweera.*
> 
> 
> E akash.jayawe...@gmail.com
> M + 94 77 2472635
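A minimal sketch of why the score behaves this way, assuming a tiny hypothetical corpus and the plain textbook idf = log(N / df) formula (search engines typically use smoothed variants). Any term that appears in every document gets the same lowest score, so a domain term like "football" lands right next to a function word like "the", and the score alone cannot tell them apart.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for a domain-specific collection.
docs = [
    "the match ended and football fans cheered",
    "football is the most popular sport in the country",
    "the coach praised the football team after the match",
]


def document_frequencies(docs):
    """Count how many documents each term appears in (whitespace tokens)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))
    return df


def idf_scores(docs):
    """Plain inverse document frequency: log(N / df) for each term."""
    n = len(docs)
    return {term: math.log(n / count)
            for term, count in document_frequencies(docs).items()}


scores = idf_scores(docs)
# Terms present in every document get idf == 0, the lowest possible score.
lowest = sorted(term for term, s in scores.items() if s == 0.0)
print(lowest)
```

Here "football" and "the" both occur in all three documents, so both get a score of zero, which matches the behavior described in the question.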
