FYI, the encoding problems still exist in the master branch today. I am very interested in this patch by mpsuzuki. What can we do to move this forward?
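To make the to_latin1() concern from the quoted message below concrete, here is a minimal sketch. It does not use poppler at all: the names kBom, naive_to_latin1 and bom_aware_to_latin1 are hypothetical stand-ins, and the cast-only loop is only my assumption of what "just cast the types" amounts to, not a copy of the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for illustration; not poppler's actual code.
constexpr char16_t kBom = 0xFEFF; // UTF-16 byte order mark

// Cast-only conversion: every 16-bit unit is truncated to its low 8 bits.
// A leading BOM (0xFEFF) therefore ends up as a stray 0xFF byte.
std::string naive_to_latin1(const std::u16string &s)
{
    std::string out;
    out.reserve(s.size());
    for (char16_t u : s) {
        out.push_back(static_cast<char>(static_cast<unsigned char>(u & 0xFF)));
    }
    return out;
}

// BOM-aware variant: skip a leading BOM and replace anything outside
// Latin-1 (above 0xFF) with '?' instead of silently truncating it.
std::string bom_aware_to_latin1(const std::u16string &s)
{
    const std::size_t start = (!s.empty() && s[0] == kBom) ? 1 : 0;
    std::string out;
    out.reserve(s.size() - start);
    for (std::size_t i = start; i < s.size(); ++i) {
        const char16_t u = s[i];
        out.push_back(u <= 0xFF
                          ? static_cast<char>(static_cast<unsigned char>(u))
                          : '?');
    }
    return out;
}

// Small self-check showing the difference on a string that starts with a BOM.
void check_examples()
{
    const std::u16string with_bom{kBom, u'a', u'b', u'c'};
    const std::string naive = naive_to_latin1(with_bom);
    assert(naive.size() == 4);                            // BOM became an extra byte
    assert(static_cast<unsigned char>(naive[0]) == 0xFF); // the "broken" byte
    assert(bom_aware_to_latin1(with_bom) == "abc");       // BOM skipped
}
```

Whether the real fix should skip the BOM on output or guarantee that ustring never stores one is exactly the open question in the quoted message; the sketch only shows why the current mix is a problem.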
On Wed, Mar 28, 2018 at 2:26 PM, suzuki toshiya <[email protected]> wrote:
> Dear Adam,
>
> Adam Reichold wrote:
>>> I see. Where is the appropriate place to add documentation for
>>> the poppler::ustring class itself?
>>
>> Personally, I would suggest Doxygen comments in the public header.
>
> Thanks! Now I'm trying to write them... I also found that the Doxygen
> comments for text_list need improvement.
>
> While checking the existing functions (to add documentation),
> I found a few inconsistencies regarding the BOM.
>
> * ustring::to_latin1(): this function does not use iconv();
>   it just casts between unsigned short and char. A BOM cannot
>   be converted to Latin-1, but the existence of a BOM is not
>   checked. If the stored UTF-16 has a BOM, a broken 8-bit
>   character would be inserted at the beginning of the result.
>
> * ustring::from_latin1(): this function does not use iconv()
>   either. No BOM is inserted at the beginning; a UTF-16 string
>   without a BOM is created.
>
> * ustring::to_utf8(): whether a BOM is emitted is decided by iconv().
>
> * ustring::from_utf8(): assumes iconv() returns UTF-16 with a BOM.
>
> I will collect the Debian software packages that depend on
> libpoppler-cpp and check how they use the ustring object. In my
> rough check there are fewer than 10, so checking all of them
> should not be too time-consuming. If there is software that
> always skips the first character of the UTF-16 (on the assumption
> that a ustring is always UTF-16 with a BOM), some discussion is
> needed.
>
> Regards,
> mpsuzuki
>
> _______________________________________________
> poppler mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/poppler
