Hello everybody, I am trying to work with the tm package, but I have a problem with some special words in French. I think this is due to the way the PDF was converted to text, but I am not completely sure. Here is an example:
findFreqTerms(tdm1, 30)
[33] "<U+F0A3>" "<U+FB01>n" "<U+FB01>nancement" "<U+FB01>nancier" "<U+FB01>nancière" "<U+FB01>nancières" "<U+FB01>nanciers" "<U+FB01>xe"

Some French words are not read correctly by tm with the readPlain reader. I tried to use reader = readPDF, but it did not work, so I had to convert the PDF to text myself. Some words are then not recognized, so when I use TermDocumentMatrix a word like "inflation" disappears. This is a big problem for me, and I have spent a lot of time on it. Any ideas?

Thanks for your time.

Best regards,
Mickaël
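
For what it is worth, U+FB01 is the single-character "fi" ligature, which PDF-to-text conversion often emits in place of the two letters "f" and "i". A word like "inflation" may be affected the same way through the "fl" ligature (U+FB02), though that is a guess from the output shown. A minimal sketch of one possible workaround is to map the ligatures back to plain letters before building the term-document matrix. The helper name replace_ligatures and the sample vector are hypothetical, not part of tm, and the private-use character U+F0A3 is not handled here because its intended meaning depends on the font in the original PDF.

```r
# Hypothetical helper (not part of tm): replace common typographic
# ligatures (Unicode block "Alphabetic Presentation Forms") with the
# plain letters they stand for.
replace_ligatures <- function(x) {
  from <- c("\ufb00", "\ufb01", "\ufb02", "\ufb03", "\ufb04")
  to   <- c("ff",     "fi",     "fl",     "ffi",    "ffl")
  x <- enc2utf8(x)  # make sure the input is treated as UTF-8
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

# Made-up sample input mimicking the terms in the output above.
txt <- c("\ufb01nancement", "in\ufb02ation", "\ufb01xe")
cleaned <- replace_ligatures(txt)
print(cleaned)

# Assumption: with a recent version of tm, the same function could be
# applied to a corpus before calling TermDocumentMatrix, e.g.
#   corpus <- tm_map(corpus, content_transformer(replace_ligatures))
```

Applying the replacement to the raw text (or to the corpus) before TermDocumentMatrix should let the affected words be counted under their normal spelling.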