Hello Walter,

Thank you for the reply. But for some of my use cases I need to identify stopwords, so I need a better way to identify domain-specific stopwords. I used TF-IDF to identify stopwords, but it has the issue I mentioned above.
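The issue can be reproduced in a few lines. This is a minimal Python sketch, assuming the per-term "score" is the average TF-IDF over the documents that contain the term (the exact aggregation is not stated in the thread). Any term that occurs in every document, whether a function word like "the" or a topic word like "football" in a football corpus, gets IDF = log(N/N) = 0 and therefore a score of 0. The score alone cannot tell the two apart. The toy corpus and the whitespace tokenization are hypothetical placeholders for a real non-English corpus and tokenizer.

```python
import math
from collections import Counter


def corpus_idf(docs):
    """Unsmoothed IDF, log(N / df), for every term in the tokenized docs."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log(n / count) for term, count in df.items()}


def mean_tfidf(docs):
    """Average TF-IDF of each term over the documents that contain it."""
    idf = corpus_idf(docs)
    totals = Counter()
    seen = Counter()
    for tokens in docs:
        tf = Counter(tokens)
        length = len(tokens)
        for term, count in tf.items():
            totals[term] += (count / length) * idf[term]
            seen[term] += 1
    return {term: totals[term] / seen[term] for term in totals}


# Hypothetical toy "football" corpus; whitespace split stands in for a real tokenizer.
docs = [
    "the football match was played in the rain".split(),
    "the coach praised the football team".split(),
    "a football fan bought the ticket".split(),
]

scores = mean_tfidf(docs)
lowest = sorted(scores, key=scores.get)[:2]
print(lowest)  # ['the', 'football']: both occur in every document, so IDF = 0
```

Lowering the cut-off does not help: "the" and "football" are tied at exactly 0, so TF-IDF by itself cannot separate them.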
Regards,
*Akash Jayaweera.*

E akash.jayawe...@gmail.com
M + 94 77 2472635

On Sun, Jun 23, 2019 at 10:13 AM Walter Underwood <wun...@wunderwood.org> wrote:

> Don’t remove stopwords. That was a useful hack when we were running search
> engines on 16-bit machines. These days, it causes more problems than it
> solves.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Jun 22, 2019, at 8:14 PM, akash jayaweera <akash.jayawe...@gmail.com> wrote:
> >
> > Hello All,
> > I'm trying to identify stopwords for a non-English corpus using TF-IDF
> > scores. I calculated the score for each unique term in the corpus, but my
> > question is how I can select stopwords using the score.
> > For example, if we have a corpus about football, the term "football" gets
> > the lowest TF-IDF score. But for my requirement I don't want to identify
> > "football" as a stopword.
> > How can I clearly identify stopwords? Is there any other simple method to
> > identify stopwords than the TF-IDF score?
> >
> > Regards,
> > *Akash Jayaweera.*
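One simple alternative, not proposed in this thread, is to contrast the domain corpus with an unrelated background corpus in the same language. Function words are frequent in both, while a topic word such as "football" is frequent only in the domain corpus. The sketch below is a minimal illustration under stated assumptions. The helper names, the document-frequency ratio as the measure, and `threshold=0.5` are hypothetical choices, and both toy corpora and the whitespace tokenization are placeholders.

```python
from collections import Counter


def doc_freq_ratio(docs):
    """Fraction of documents in which each term appears at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: count / len(docs) for term, count in df.items()}


def stopword_candidates(domain_docs, background_docs, threshold=0.5):
    """Return terms that are common in BOTH corpora (hypothetical heuristic).

    A topic term such as "football" is frequent in the domain corpus but rare
    in an unrelated background corpus, so it is not returned.
    """
    domain = doc_freq_ratio(domain_docs)
    background = doc_freq_ratio(background_docs)
    return sorted(
        term
        for term, ratio in domain.items()
        if ratio >= threshold and background.get(term, 0.0) >= threshold
    )


# Hypothetical toy corpora; real use would need a proper tokenizer for the language.
domain_docs = [
    "the football match was played in the rain".split(),
    "the coach praised the football team".split(),
    "a football fan bought the ticket".split(),
]
background_docs = [
    "the bank raised the interest rate".split(),
    "a storm hit the coast".split(),
    "the new phone was released in march".split(),
]

print(stopword_candidates(domain_docs, background_docs))  # ['the']
```

The background corpus does the work that the IDF score cannot: it supplies evidence of which frequent terms are frequent regardless of topic. The threshold would need tuning on real data, and the result is only a candidate list to review, not a definitive stopword list.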