Re: [R] TM reader with text

2012-03-04 Thread Mickael R problem
"Try this before running removePunctuation(): corpus <- tm_map(corpus, function(x) gsub("[\'\U2019«»]", " ", x))" It will replace quotation marks with a space, and that's enough to separate them from the rest of the word. I tried your solution. It works only for character vectors, not for a Corpus,

Re: [R] TM reader with text

2012-03-04 Thread Milan Bouchet-Valat
On Saturday 3 March 2012 at 16:56 -0800, Mickael R problem wrote: > Hello everybody, > I'm not giving up the fight, but it's hard. I have found a solution for the > ligatures with a better converter which translates PDF to plain > text more precisely. But a new problem has occurred. In French particularly,

Re: [R] TM reader with text

2012-03-03 Thread Mickael R problem
Hello everybody, I'm not giving up the fight, but it's hard. I have found a solution for the ligatures with a better converter which translates PDF to plain text more precisely. But a new problem has occurred. In French particularly, but it should be the case in English too, I have a big problem ' " bracke
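
A minimal sketch of the gsub approach suggested elsewhere in this thread, applied to a plain character vector. It assumes the troublesome characters are the straight apostrophe, the typographic apostrophe (U+2019) and the French guillemets; `quote_pattern`, `strip_quotes` and `sample_text` are hypothetical names used only for illustration.

```r
# Character class: straight apostrophe, typographic apostrophe (U+2019),
# and the French guillemets (U+00AB, U+00BB). This set is an assumption;
# extend it if other quotation marks appear in the converted text.
quote_pattern <- "['\u2019\u00ab\u00bb]"

# strip_quotes is a hypothetical helper name, not part of tm.
# It replaces each matched character with a space so that the
# neighbouring letters end up in separate tokens.
strip_quotes <- function(x) gsub(quote_pattern, " ", x)

sample_text <- c("l\u2019analyse", "\u00abbanque\u00bb", "d'abord")
print(strip_quotes(sample_text))
```

To apply this to a Corpus rather than a character vector, recent versions of tm provide content_transformer(), so something like tm_map(corpus, content_transformer(strip_quotes)) before removePunctuation() may work. That has not been checked against the tm version used in this thread.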

Re: [R] TM reader with text

2012-03-01 Thread Milan Bouchet-Valat
On Thursday 1 March 2012 at 07:07 -0800, Mickael R problem wrote: > Hi Richard, > clearly there is a problem with Latin ligatures, because the result > of my call to findFreqTerms gives me words such as > "n" > > "nancement" > >> "nancier" "nancière" "nancières" >

Re: [R] TM reader with text

2012-03-01 Thread Mickael R problem
Hi Richard, clearly there is a problem with Latin ligatures, because the result of my call to findFreqTerms gives me words such as > "n" "nancement" >> "nancier" "nancière" "nancières" >> "nanciers" "xe" where U+FB01 is the code for a Latin ligature. The problem

Re: [R] TM reader with text

2012-02-29 Thread Mickael R problem
My computer runs Windows Vista 64-bit SP2. As for the question about encoding, I don't understand it, sorry.

Re: [R] TM reader with text

2012-02-29 Thread Richard M. Heiberger
Most, maybe all, of the example words you posted include ligatures. With "financier", for example, the leading "fi" is rendered in PDF and in most typesetting situations as a ligature, with a single complex character representing the "fi" combination. fi fl I pasted the "fi" and "fl" ligature
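
A minimal sketch of one way to undo such ligatures once the PDF has been converted to text, assuming the text is read as UTF-8 and that only the five Latin ligatures U+FB00 to U+FB04 occur. `lig_from`, `lig_to` and `expand_ligatures` are hypothetical names, not part of tm or base R.

```r
# Latin ligature code points (U+FB00..U+FB04) and their plain-letter
# equivalents: ff, fi, fl, ffi, ffl.
lig_from <- c("\ufb00", "\ufb01", "\ufb02", "\ufb03", "\ufb04")
lig_to   <- c("ff", "fi", "fl", "ffi", "ffl")

# expand_ligatures is a hypothetical helper: it replaces each ligature
# character with the letters it stands for, one ligature at a time.
expand_ligatures <- function(x) {
  for (i in seq_along(lig_from)) {
    x <- gsub(lig_from[i], lig_to[i], x, fixed = TRUE)
  }
  x
}

words <- c("\ufb01nancier", "\ufb01nancement", "con\ufb02it", "e\ufb03cace")
print(expand_ligatures(words))
```

Running this over the text before building the term-document matrix should keep words like "financier" whole instead of leaving fragments such as "nancier".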

Re: [R] TM reader with text

2012-02-29 Thread David Winsemius
On Feb 29, 2012, at 6:00 PM, Mickael R problem wrote: Hello everybody, I am trying to work with TM, but I have a problem with some special words in French. I think this is due to the way the PDF was converted to text, but I'm not entirely sure. Here is an example: findFreqTerms(tdm1,30)