Thanks for the reply. I currently have 20 GB of Bengali newspaper data (for corpus building). I don't have a manually stemmed corpus, but I will build one if needed.
Basically, I need guidance on how to do this. If there are standard approaches to building a stemmer and a stopword list for use with Solr, please share them.

Thank you, Upayavira, for your kind help.

Imtiaz Shakil Siddique

On 10 September 2015 at 13:23, Upayavira <u...@odoko.co.uk> wrote:

> On Thu, Sep 10, 2015, at 04:45 AM, Imtiaz Shakil Siddique wrote:
> > Hi,
> >
> > I am trying to develop a stemmer and a stopword list for the Bengali
> > language, which is not shipped with Solr.
> >
> > I am trying to do this with a machine learning approach, but I couldn't
> > find any good documents to study. It would be very helpful if you could
> > shed some light on this matter.
>
> How are you going to do this with machine learning? What corpus are you
> going to use to learn from? Do you have some documents that have been
> manually stemmed for which you also have the originals?
>
> Upayavira