On 14 May 2012 12:44, "C. Boemann" <c...@boemann.dk> wrote:
>
> On Monday 14 May 2012 12:29:01 matus.u...@gmail.com wrote:
> > Hi,
> >
> > I don't think that a grammar checker based entirely on a Bayes
> > classifier is logically sound.
> >
> > Simplified:
> >
> > In order to detect textual spam, the Bayes classifier is first trained
> > on examples of spam (training set).
> > The classifier quality depends on the training set being
> > representative enough, the textual data representation (input to the
> > classifier)
> > and parameters of the training algorithm. The trained classifier is then a
> > set S of (mean value, variance) pairs in input space which represent
> > known spam.
> > If a previously unknown input falls into the variance range of any of
> > the members of S, then it's labeled as spam.
> >
> > A grammar checker should have the language grammar represented
> > exactly, by a formal grammar usually. Again a feasible representation
> > of the textual data is required. Then you check if a sentence can be
> > generated by the formal grammar. The answer is in {yes, no}.
> >
> > Lightproof seems to be rule based. And rule based systems have strong
> > maintainability drawbacks.
> >
> > A combination of a rule based system with Bayes sounds promising. That
> > would enable something like context based grammar checking.
> >
> > br,
> >
> > -matus uzak
> The trouble with rules is that it's hard to codify a language grammar in a way
> that won't give false warnings. Also as you said it is hard to maintain, not to
> say it requires manual work to define new languages.
>
> Bayes may not be the best match, but something that is adaptive and can learn
> by giving it a corpus, sounds very promising to me.
>
> From Elvis Stansvik I got the following link which I've passed on to garima
> already:
>
> http://doras.dcu.ie/16776/1/jw_binder_2012-01-10.pdf
>
> Now this may be what link grammar does already. I just read back and the link
> grammar page on the Abisource site does mention tree-bank and statistical
> which is what the paper talks about too. There may be differences, but not sure
> it's worth it to make us do something on our own.
>
> So I've maybe changed my mind again and would favour the link grammar
>
> Right now I'm just waiting for garima to reply with some analysis. He was
> going to read the paper.

That's quite ambitious of him; it's not just a paper but a 200+ page dissertation :)

>
> Boemann
> _______________________________________________
> calligra-devel mailing list
> calligra-devel@kde.org
> https://mail.kde.org/mailman/listinfo/calligra-devel
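
For reference, the contrast matus draws in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not how Lightproof, link grammar, or any real Bayes filter is implemented. It follows the simplified description in the thread (a trained model as a set of (mean, variance) pairs) rather than a full naive Bayes classifier. The single numeric feature, the values in SPAM_MODEL, the "within one standard deviation" reading of "variance range", and the toy grammar are all made up for the example.

```python
import math

# Hypothetical trained model: (mean, variance) pairs over one made-up
# numeric feature, standing in for the set S described in the thread.
SPAM_MODEL = [(2.0, 0.25), (10.0, 4.0)]

def is_flagged(x, model=SPAM_MODEL):
    """Statistical check: True if x lies within one standard deviation
    (an assumed reading of "variance range") of any member of the model."""
    return any(abs(x - mean) <= math.sqrt(var) for mean, var in model)

# Toy formal grammar (an assumption for illustration only):
#   SENTENCE -> NOUN VERB | NOUN VERB NOUN
NOUNS = {"dogs", "cats"}
VERBS = {"chase", "see"}

def is_grammatical(sentence):
    """Rule-based check: True only if the toy grammar can generate the sentence."""
    words = sentence.lower().split()
    if len(words) == 2:
        return words[0] in NOUNS and words[1] in VERBS
    if len(words) == 3:
        return words[0] in NOUNS and words[1] in VERBS and words[2] in NOUNS
    return False
```

The first check depends entirely on what the model was trained on, while the second gives an exact yes/no answer but only for sentences the hand-written rules cover, which is the maintainability trade-off discussed above.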