D> (converting text to Unicode) doesn't work, and your choice of output
D> text encoding doesn't matter at all.

I see. The reason I can still read the electric bill using xpdf must be
that it is stored in some wasteful image-like format, though not the
kind of image that pdfimages extracts. OK.

And if characters are to be extracted, there must be a line with three
yeses (emb, sub, uni) like the one below, I suppose.

$ pdffonts -upw xxxxx phone_bill.pdf
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
IKKPHJ+DFKaiShu-SB-Estd-BF           CID TrueType yes yes yes      2  0  <-- pdftotext can use this
MingLiU                              CID TrueType no  no  no      12  0
DFKaiShu-SB-Estd-BF                  CID TrueType no  no  no      13  0

$ pdffonts -opw '' pdfbug.pdf   # electric_bill.pdf
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
10E58a5CourierNew                    CID TrueType yes no  no      10  0
6E5578TimesNewRoman                  CID TrueType yes no  no      11  0
7E5591DFKaiShuW7-B5                  CID TrueType yes no  no      12  0
9E57c1Batang                         CID TrueType yes no  no      13  0
8E56ecDFKai-SB                       CID TrueType yes no  no      14  0
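
For what it's worth, the "three yeses" check could be done mechanically.
Below is a minimal Python sketch that parses pdffonts output in the column
layout pasted above and lists the fonts with emb, sub and uni all set to
yes. The names parse_pdffonts and triple_yes are made up for illustration,
the layout is assumed to be exactly the one shown here (other pdffonts
versions may print different columns), and the idea that three yeses means
"extractable" is only my guess from above, not something documented.

```python
import re

# Matches one data row of pdffonts output in the layout pasted above:
# name, type (may contain spaces), emb, sub, uni, object number, generation.
# Assumption: font names contain no whitespace.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<type>.+?)\s+"
    r"(?P<emb>yes|no)\s+(?P<sub>yes|no)\s+(?P<uni>yes|no)\s+"
    r"(?P<obj>\d+)\s+(?P<gen>\d+)\s*$"
)


def parse_pdffonts(text):
    """Return one dict per font row; header and rule lines do not match."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(m.groupdict())
    return rows


def triple_yes(rows):
    """Names of fonts that are embedded, subset, and have a Unicode map."""
    return [r["name"] for r in rows
            if r["emb"] == r["sub"] == r["uni"] == "yes"]


# A few rows copied from the two listings above.
SAMPLE = """\
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
IKKPHJ+DFKaiShu-SB-Estd-BF           CID TrueType yes yes yes      2  0
MingLiU                              CID TrueType no  no  no      12  0
10E58a5CourierNew                    CID TrueType yes no  no      10  0
"""

if __name__ == "__main__":
    print(triple_yes(parse_pdffonts(SAMPLE)))
```

Matching the yes/no columns from the right keeps the two-word type
"CID TrueType" from confusing the split.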

Thanks.
(By the way there's also
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=440746 but that is a
pdftohtml bug, so never mind.)

