Hello Walter,

Thank you for the reply.
However, some of my use cases require me to identify stopwords, so I need a
better way to find domain-specific ones. I tried using TF-IDF scores for
this, but that approach has the issue I mentioned above.
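
A minimal sketch of that issue, assuming a toy English corpus in place of
the real non-English one and a hypothetical helper named idf_scores (neither
is from the actual project). It only illustrates that a domain term that
appears in every document gets the same minimal IDF as a true stopword, so
the score alone cannot separate them:

```python
import math
from collections import Counter


def idf_scores(docs):
    """Return {term: idf} with idf = log(N / df).

    df is the number of documents that contain the term at least once.
    Hypothetical helper for illustration only.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        # Count each term once per document (whitespace tokenization).
        df.update(set(doc.lower().split()))
    return {term: math.log(n_docs / count) for term, count in df.items()}


# Toy corpus about football (an assumption, not real data).
corpus = [
    "the football match was the best",
    "the striker loves football",
    "football fans filled the stadium",
]

scores = idf_scores(corpus)

# Two lowest-scoring terms; ties are broken alphabetically.
lowest = sorted(scores, key=lambda t: (scores[t], t))[:2]
print(lowest)  # ['football', 'the'], both with an IDF of 0.0
```

Here "the" and "football" both occur in all three documents, so both get
log(3/3) = 0, and a plain score threshold would flag both as stopwords.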

Regards,
*Akash Jayaweera.*


E akash.jayawe...@gmail.com
M +94 77 2472635


On Sun, Jun 23, 2019 at 10:13 AM Walter Underwood <wun...@wunderwood.org>
wrote:

> Don’t remove stopwords. That was a useful hack when we were running search
> engines on 16-bit machines. These days, it causes more problems than it
> solves.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Jun 22, 2019, at 8:14 PM, akash jayaweera <akash.jayawe...@gmail.com> wrote:
> >
> > Hello All,
> > I'm trying to identify stopwords for a non-English corpus using TF-IDF
> > score. I calculated the score for each unique term in the corpus. But my
> > question is how can I select stopwords using the score.
> > For example if we have a corpus of football, term "football" get the
> > lowest TF-IDF score. But for my requirement I don't want to identify
> > "football" as a stopword.
> > How can I clearly Identify stopword. Is there any other simple method to
> > identify stopwords than TF-IDF score.
> >
> > Regards,
> > *Akash Jayaweera.*
> >
> >
> > E akash.jayawe...@gmail.com
> > M +94 77 2472635
>
>
