Thanks for the reply.

Currently I have 20 GB of Bengali newspaper data (for corpus building).
I don't have a manually stemmed corpus, but I will build one if needed.

Basically, I need guidance on how to do this.
If there are standard approaches to building a stemmer and a stopword list
for use with Solr, please share them.

Thank you Upayavira for your kind help.

Imtiaz Shakil Siddique


On 10 September 2015 at 13:23, Upayavira <u...@odoko.co.uk> wrote:

>
>
> On Thu, Sep 10, 2015, at 04:45 AM, Imtiaz Shakil Siddique wrote:
> > Hi,
> >
> > I am trying to develop a stemmer and a stopword list for the Bengali
> > language, which is not shipped with Solr.
> >
> > I am trying to do this with a machine learning approach, but I couldn't
> > find any good documents to study. It would be very helpful if you could
> > shed some light on this matter.
>
> How are you going to do this with machine learning? What corpus are you
> going to use to learn from? Do you have some documents that have been
> manually stemmed for which you also have the originals?
>
> Upayavira
>
