Re: [R] TM reader with text

2012-03-04 Thread Mickael R problem
"Try this before running removePunctuation(): corpus <- tm_map(corpus, function(x) gsub("[\'\U2019«»]", " ", x))" It will replace quotation marks with a space, and that's enough to separate them from the rest of the word. I tried your solution. It works only on character vectors, not on a Corpus,
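
For readers following the thread, here is a minimal sketch of the quoted suggestion. The helper name `split_quotes` and the sample strings are hypothetical. The corpus step assumes a version of tm that provides `content_transformer()`, which may be newer than the one used in this thread, so it only runs if tm is installed.

```r
# Replace straight and curly apostrophes and French guillemets with a space.
# \u escapes keep the source ASCII-only: U+2019 is the curly apostrophe,
# U+00AB and U+00BB are the guillemets.
split_quotes <- function(x) {
  gsub("['\u2019\u00ab\u00bb]", " ", x)
}

# Hypothetical sample input, not taken from the original corpus.
x <- c("l\u2019\u00e9tat", "\u00abfinancier\u00bb", "d'abord")
cleaned <- split_quotes(x)
print(cleaned)

# Assumption: tm is installed and provides content_transformer().
# Wrapping the function lets tm_map() apply it to each document's text
# while keeping the documents as corpus objects.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::VectorSource(x))
  corpus <- tm::tm_map(corpus, tm::content_transformer(split_quotes))
}
```

Running this step before punctuation removal keeps "l" and "état" as separate tokens instead of fusing them into one word.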

Re: [R] TM reader with text

2012-03-03 Thread Mickael R problem
Hello everybody, I am not giving up the fight, but it's hard. I found a solution for the ligatures with a better converter, which translates PDF to plain text more precisely. But a new problem has occurred. In French particularly, but it should be the case in English too, I have a big problem ' " bracke

Re: [R] TM reader with text

2012-03-01 Thread Mickael R problem
Hi Richard, clearly there is a problem with Latin ligatures, because my call to findFreqTerms returns words such as "n" "nancement" "nancier" "nancière" "nancières" "nanciers" "xe", where U+FB01 is the code for the Latin ligature "fi". The problem
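
One way to work around this, sketched here under assumptions, is to expand the ligature code points back to plain letters before building the term-document matrix. The function name `expand_ligatures` and the sample words are hypothetical. The mapping covers the Unicode ligatures U+FB00 to U+FB04.

```r
# Ligature code points and their plain-letter expansions.
# The three-letter ligatures are listed first for readability; the order
# does not affect the result because each code point is distinct.
lig_from <- c("\ufb03", "\ufb04", "\ufb00", "\ufb01", "\ufb02")
lig_to   <- c("ffi",    "ffl",    "ff",     "fi",     "fl")

expand_ligatures <- function(x) {
  for (i in seq_along(lig_from)) {
    # fixed = TRUE treats the pattern as a literal string, not a regex.
    x <- gsub(lig_from[i], lig_to[i], x, fixed = TRUE)
  }
  x
}

# Hypothetical sample words containing U+FB01 (the "fi" ligature).
words <- c("\ufb01nancement", "\ufb01nancier", "\ufb01xe")
print(expand_ligatures(words))
```

Applied to the raw text before tokenization, this should keep words such as "financement" whole instead of losing their first two letters.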

Re: [R] TM reader with text

2012-02-29 Thread Mickael R problem
My computer runs Windows Vista 64-bit SP2. As for the question about encoding, I don't understand it, sorry. -- View this message in context: http://r.789695.n4.nabble.com/TM-reader-with-text-tp4433394p4433526.html

[R] TM reader with text

2012-02-29 Thread Mickael R problem
Hello everybody, I am working, or trying to work, with tm, but I have a problem with some special words in French. I think this is due to the way the PDF was converted to text, but I'm not entirely sure. Here is an example: findFreqTerms(tdm1,30) [33] """n" "nancement" "nancier"
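
To check whether the PDF-to-text conversion is the cause, one option is to test the extracted text for ligature code points before it reaches tm. This is a minimal sketch: the function name `has_ligature` and the sample strings are hypothetical, and it only looks for the Unicode block U+FB00 to U+FB06.

```r
# TRUE for each element that contains a Latin ligature code point
# (U+FB00 to U+FB06), which PDF converters sometimes emit in place of
# letter pairs such as "fi" or "fl".
has_ligature <- function(x) {
  grepl("[\ufb00-\ufb06]", x, perl = TRUE)
}

# Hypothetical lines of converted text: the first contains U+FB01.
lines <- c("le \ufb01nancement public", "le financement public")
flags <- has_ligature(lines)
print(flags)

# Listing the code points of a suspicious word makes the culprit visible.
print(utf8ToInt("\ufb01n"))
```

If any line is flagged, the truncated terms most likely come from the conversion step and not from tm itself.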