Most, maybe all, of the example words you posted include ligatures. With "financier", for example, the leading "fi" is rendered in PDF, and in most typesetting situations, as a ligature: a single complex character representing the "fi" combination. That is the <U+FB01> in your output: U+FB01 is the Unicode code point for the "fi" ligature.
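Here is a minimal sketch, in base R, of expanding the ligatures back to plain letters before the text goes into the term-document matrix. The function name expand_ligatures and the sample words are made up for illustration, and I am assuming the text has been read in as UTF-8. This is a workaround, not something tm provides.

```r
# Latin ligature code points (Unicode "Alphabetic Presentation Forms")
# and the plain letters each one stands for.
lig_from <- c("\uFB00", "\uFB01", "\uFB02", "\uFB03", "\uFB04")
lig_to   <- c("ff",     "fi",     "fl",     "ffi",    "ffl")

# Hypothetical helper: replace every ligature character in a
# character vector with its plain-letter expansion.
expand_ligatures <- function(x) {
  for (i in seq_along(lig_from)) {
    x <- gsub(lig_from[i], lig_to[i], x, fixed = TRUE)
  }
  x
}

# Sample words written with the ligature code points, as they
# might come out of a PDF-to-text conversion.
words <- c("\uFB01nancier", "in\uFB02ation", "\uFB01xe")
print(expand_ligatures(words))
# "financier" "inflation" "fixe"
```

Running something like this over the extracted text before building the corpus should make "inflation" and "financier" show up as ordinary terms.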
ﬁ ﬂ

I pasted the "fi" and "fl" ligatures in this email. I hope they get through. I don't know the package you are using, but I hope it has arguments that tell it about ligatures.

Rich

On Wed, Feb 29, 2012 at 6:49 PM, David Winsemius <dwinsem...@comcast.net> wrote:

> On Feb 29, 2012, at 6:00 PM, Mickael R problem wrote:
>
>> Hello everybody,
>> I am trying to work with tm, but I have a problem with some special words in
>> French. I think this is due to the way the PDF is converted to text, but I'm
>> not perfectly sure.
>> Here is an example:
>>
>> findFreqTerms(tdm1,30)
>> [33] "<U+F0A3>" "<U+FB01>n" "<U+FB01>nancement"
>> "<U+FB01>nancier" "<U+FB01>nancière" "<U+FB01>nancières"
>> "<U+FB01>nanciers" "<U+FB01>xe"
>>
>> Some French words are not read correctly by tm with the reader readPlain. I
>> tried to use reader = readPDF, but it doesn't work, so I had to convert the
>> PDF to text. Some words are not understood, so when I use
>> TermDocumentMatrix a word like "inflation" disappears. It's a big problem for
>> me. I have spent a lot of time on this problem. Any ideas? Thanks for your time.
>
> You included no information about your platform, locale settings, or
> encoding of the text.
>
> ?Encoding
> ?sessionInfo
>
> --
> David Winsemius, MD
> West Hartford, CT
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.