Re: Identify stopwords using TF-IDF

2019-06-22 Thread Walter Underwood
I haven’t removed stopwords since 1996, when I joined Infoseek. What is your special case where you must remove them?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jun 22, 2019, at 9:51 PM, akash jayaweera > wrote:
>
> Hello Walter,
>
> Thank y

Re: Identify stopwords using TF-IDF

2019-06-22 Thread akash jayaweera
Hello Walter,

Thank you for the reply. For some of my use cases, though, I need to identify stopwords, so I need a better way to find domain-specific ones. I used TF-IDF to identify them, but it has the issue I mentioned above.

Regards,
*Akash Jayaweera.*
E akash.jayawe...@gmail.com
M +

Re: Identify stopwords using TF-IDF

2019-06-22 Thread Walter Underwood
Don’t remove stopwords. That was a useful hack when we were running search engines on 16-bit machines. These days, it causes more problems than it solves.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Jun 22, 2019, at 8:14 PM, akash jayaweera > w

Identify stopwords using TF-IDF

2019-06-22 Thread akash jayaweera
Hello All,

I'm trying to identify stopwords for a non-English corpus using TF-IDF scores. I calculated the score for each unique term in the corpus, but my question is how to select stopwords using the score. For example, if we have a corpus about football, the term "football" gets the lowest TF-IDF score
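
To make the football example concrete, here is a minimal sketch of one possible way to turn the scores into a stopword list. It is not a method proposed by anyone in this thread. It assumes the corpus is already tokenized, looks only at the IDF part of the score, and flags every term whose IDF falls at or below a cutoff. The function name `candidate_stopwords`, the `max_idf` parameter, and the toy documents are all hypothetical, and the cutoff value would have to be tuned per corpus.

```python
import math
from collections import Counter


def candidate_stopwords(docs, max_idf=0.1):
    """Return terms whose IDF is at or below max_idf, sorted alphabetically.

    docs is a list of token lists (tokenization is assumed to be done already).
    IDF here is the plain log(N / df) variant, so a term that occurs in every
    document gets an IDF of exactly 0. max_idf is an illustrative cutoff,
    not a recommended value.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        # Count each term once per document, not once per occurrence.
        doc_freq.update(set(tokens))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    return sorted(term for term, score in idf.items() if score <= max_idf)


# Toy corpus (made up for illustration): "football" occurs in every document.
docs = [
    ["football", "is", "fun"],
    ["football", "match", "is", "on"],
    ["football", "is", "life"],
    ["the", "football", "club"],
]

print(candidate_stopwords(docs, max_idf=0.1))
print(candidate_stopwords(docs, max_idf=0.3))
```

With the strict cutoff only "football" is flagged, because it is the only term present in all four documents. Raising the cutoff also picks up "is", which appears in three of the four. Whether a topic term like "football" should count as a stopword at all is a judgment about the use case, which the score alone does not settle.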