On 6 Sep, [EMAIL PROTECTED] wrote:
>>>>>> "D" == Derek B Noonburg <[EMAIL PROTECTED]> writes:
>
> D> On 5 Sep, [EMAIL PROTECTED] wrote:
>>> Perhaps you can look at
>>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=440747
>>> even thought it might not be your fault :-)
>
> D> Looks like font subsets without any useful encoding info.
>
> I see, no matter what -enc one guesses (Big5 or UTF8* surely), there
> is no hope of extracting any chars, and one can only read the file
> with xpdf, like it was just a big image blob?

Right.  The text encoding ("-enc ...") only affects the final output.
Internally, Xpdf takes two steps: first it converts all text to
Unicode, then it converts Unicode to the selected text encoding (Big5,
UTF-8, etc.).

The terminology is a little confusing -- "encoding" means a couple
different things.  You're familiar with text encodings, as mentioned
above.  Fonts also have encodings, which map character codes (used in
PDF text drawing operations) to either glyph names or glyph IDs or CIDs
(depending on the font type - but basically it's some sort of ID used
internally by the font to select the glyph to draw).

If a PDF font does not have a "ToUnicode" map, and does not have usable
encoding information (standard glyph names), then the first step
(converting text to Unicode) doesn't work, and your choice of output
text encoding doesn't matter at all.

- Derek
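
For illustration only, the two steps described above can be sketched in Python. This is not Xpdf's actual code: the function names, the dict-based "ToUnicode" map and font encoding, and the tiny glyph-name table are all hypothetical stand-ins chosen to show why the output encoding stops mattering once step 1 fails.

```python
# Hypothetical sketch of the two-step text extraction; NOT Xpdf's real code.

# Tiny stand-in for a standard glyph-name table (real tables are far larger).
GLYPH_NAME_TO_UNICODE = {"A": "A", "B": "B", "space": " ", "eacute": "\u00e9"}


def char_code_to_unicode(code, to_unicode=None, font_encoding=None):
    """Step 1: map one character code to Unicode, or None if that is impossible."""
    # Prefer an explicit ToUnicode map when the font has one.
    if to_unicode and code in to_unicode:
        return to_unicode[code]
    # Otherwise fall back to the font encoding, if it yields a standard glyph name.
    if font_encoding and code in font_encoding:
        return GLYPH_NAME_TO_UNICODE.get(font_encoding[code])
    # No usable information: the code only selects a glyph to draw.
    return None


def extract_text(codes, output_encoding, to_unicode=None, font_encoding=None):
    """Run step 1 on every code, then step 2 (Unicode -> output text encoding)."""
    chars = [char_code_to_unicode(c, to_unicode, font_encoding) for c in codes]
    text = "".join(ch for ch in chars if ch is not None)
    return text.encode(output_encoding)


# A font with a ToUnicode map: the output encoding changes the resulting bytes.
mapped = {1: "\u4e2d", 2: "\u6587"}
big5_bytes = extract_text([1, 2], "big5", to_unicode=mapped)
utf8_bytes = extract_text([1, 2], "utf-8", to_unicode=mapped)

# A subset font with neither a ToUnicode map nor standard glyph names:
# step 1 yields nothing, so every choice of output encoding gives empty output.
lost_big5 = extract_text([1, 2], "big5")
lost_utf8 = extract_text([1, 2], "utf-8")
```

In this sketch the last two calls both return empty bytes, which mirrors the situation in the bug report: the glyphs can still be drawn, but there is nothing for "-enc" to act on.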