D> (converting text to Unicode) doesn't work, and your choice of output
D> text encoding doesn't matter at all.
I see. So the reason I can still read the electric bill in xpdf must be that it is stored in some wasteful image-like format, though not the kind of image that pdfimages extracts. OK. And if characters are to be extracted, I suppose there must be a triple-"yes" line like the marked one below.

$ pdffonts -upw xxxxx phone_bill.pdf
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
IKKPHJ+DFKaiShu-SB-Estd-BF           CID TrueType yes yes yes      2  0   <-- pdftotext can use this
MingLiU                              CID TrueType no  no  no      12  0
DFKaiShu-SB-Estd-BF                  CID TrueType no  no  no      13  0

$ pdffonts -opw '' pdfbug.pdf    # electric_bill.pdf
name                                 type         emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
10E58a5CourierNew                    CID TrueType yes no  no      10  0
6E5578TimesNewRoman                  CID TrueType yes no  no      11  0
7E5591DFKaiShuW7-B5                  CID TrueType yes no  no      12  0
9E57c1Batang                         CID TrueType yes no  no      13  0
8E56ecDFKai-SB                       CID TrueType yes no  no      14  0

Thanks.

(By the way, there's also http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=440746, but that is a pdftohtml bug, so never mind.)
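To run the same check over a pile of PDFs, here is a rough shell sketch. The function name triple_yes_fonts is made up for this example, and it encodes only the guessed "emb, sub and uni all yes" rule from this message, not a documented rule of pdftotext. It reads pdffonts output on stdin and counts columns from the right, because the type column can contain a space ("CID TrueType") and newer pdffonts versions add an encoding column.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): print the names of
# fonts in pdffonts output whose emb, sub and uni columns are all "yes".
# The "uni" column reports whether the font has an explicit ToUnicode map.
# Fields are counted from the right: the last two are the object ID
# (number and generation), preceded by uni, sub and emb.
triple_yes_fonts() {
  awk 'NR > 2 && NF >= 6 {
         # skip the two header lines, then test emb, sub and uni
         if ($(NF-4) == "yes" && $(NF-3) == "yes" && $(NF-2) == "yes")
           print $1
       }'
}

# Usage (needs pdffonts from xpdf or poppler-utils):
#   pdffonts -upw xxxxx phone_bill.pdf | triple_yes_fonts
```

For the two files above it would print only IKKPHJ+DFKaiShu-SB-Estd-BF for the phone bill and nothing for the electric bill.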