Well, 0xFFFF fits in an unsigned short, which is two bytes. An unsigned int is 4 bytes on the usual current platforms, and 0xFFFFFFFF fits in it, so 0x1F0A1 fits with room to spare. I don't think the size of the int is the problem.

On Wed, Aug 5, 2020 at 12:52 AM John Gardner <[email protected]> wrote:

> Going from the fact that unimap.c
> <https://github.com/n-t-roff/heirloom-doctools/blob/19c8adab5f59a2c8eba9f546fb36bdbbff86937d/troff/troff.d/unimap.c>
> uses an int
> <https://github.com/n-t-roff/heirloom-doctools/blob/f3a16e2ba0c411441fd5de5340be73674bd51307/troff/troff.d/unimap.h#L29>
> for storing Unicode codepoints, it might be that Heirloom uses a data type of
> insufficient size (an unsigned int is limited to values between 0–0xFFFF,
> meaning astral codepoints get truncated in memory).
>
> In other words, it's a bug in Heirloom Doctools.
>
> On Wed, 5 Aug 2020 at 14:26, T. Kurt Bond <[email protected]> wrote:
>
>> Thanks for the tip! As it turns out, I am using the OTF.
>>
>> On Wed, Aug 5, 2020 at 12:24 AM John Gardner <[email protected]>
>> wrote:
>>
>>> > The version I got was .ttf, not .otf
>>>
>>> I opened both the original OTF <https://dn-works.com/ufas/> and the
>>> FontLibrary.org TTF <https://fontlibrary.org/en/font/symbola> in Glyphs
>>> <https://glyphsapp.com/>; the OTF has 12,589 glyphs, whereas the TTF
>>> only has 7,956 glyphs.
>>>
>>> Try the OTF version of Symbola. In fact, *always* prefer an OTF over
>>> TTF when possible.
>>>
>>> On Wed, 5 Aug 2020 at 13:10, Richard Morse <[email protected]> wrote:
>>>
>>>> Hm. Just for my edification, I tried a few things.
>>>>
>>>> I’m on a Mac, and I don’t know when I compiled Heirloom troff, but it
>>>> was a year or two ago, so something things may be different.
>>>>
>>>> I downloaded the Symbola font from fontlibrary.org. The version I got
>>>> was .ttf, not .otf.
>>>>
>>>> The various things that you tried did not work for me either. \[u1F0A1]
>>>> did work, but that’s because (according to fret, at least), that’s the
>>>> font’s internal name for the symbol, which is not guaranteed to be true
>>>> across all fonts, so you can’t really use that for a “fallback” system.
>>>>
>>>> Looking at the output of troff without going through dpost, it looks
>>>> like it is completely ignoring the character. I tried explicitly setting
>>>> LC_CTYPE to ‘en_US.UTF-8’ and ‘UTF-8’ (both in the terminal, and using the
>>>> .lc_ctype command), but that had no effect.
>>>>
>>>> I wonder if troff has a compiled in list of unicode characters that it
>>>> understands, and if you try to use one it deems invalid it just ignores it?
>>>> (This may be borne out by
>>>> https://github.com/n-t-roff/heirloom-doctools/blob/master/troff/troff.d/unimap.c
>>>> , but I don’t really know enough about the code to be certain.)
>>>>
>>>> Ricky
>>>>
>>>> > On Aug 4, 2020, at 10:14 PM, T. Kurt Bond <[email protected]> wrote:
>>>> >
>>>> > In Emacs M-x describe-coding-system tells me the coding system for
>>>> > saving the buffer is utf-8-unix. I don't have any LC_* environment
>>>> > variables set, but LANG=en_US.UTF-8.
>>>> >
>>>> > I'm not very knowledgeable about the insides of Unicode fonts,
>>>> > unfortunately.
>>>> >
>>>> > On Tue, Aug 4, 2020 at 4:27 PM Richard Morse <[email protected]> wrote:
>>>> > Huh. I’m afraid I’m out of my depth then; you might check and see if
>>>> > your LC_* environment variables are set to something incompatible with
>>>> > utf-8 (or, maybe, check and make sure the file in UTF-8, not UCS-16 or
>>>> > something if you’re on Windows), but hopefully someone with more
>>>> > experience and knowledge will speak up…
>>>> >
>>>> > Ricky
>>>> >
>>>> > > On Aug 4, 2020, at 3:59 PM, T. Kurt Bond <[email protected]> wrote:
>>>> > >
>>>> > > And if I add "and explicit unicode character reference \U'1F0A1'"
>>>> > > to the file, that character doesn't show up either.
>>>> > >
>>>> > > On Tue, Aug 4, 2020 at 2:47 PM Richard Morse <[email protected]> wrote:
>>>> > >
>>>> > >> According to the Heirloom Troff manual, I think that you cannot just
>>>> > >> insert Unicode characters (although maybe if your LC* environment
>>>> > >> variables are set correctly, you can?). It says:
>>>> > >>
>>>> > >>> Both nroff and troff allow references to specific Unicode
>>>> > >>> characters with the \U'X' escape sequence;
>>>> > >>> it causes the character at position U+X to be printed (X is a
>>>> > >>> hexadecimal number). For troff,
>>>> > >>> it is required that this character is available in one of the
>>>> > >>> fonts mounted at this point.
>>>> > >>> As an example, \U'20AC' prints the Euro character €. When
>>>> > >>> register .g is set to 1 Unicode
>>>> > >>> characters can also be accessed with \[uXXXX] where XXXX is a
>>>> > >>> four digit hexadecimal number.
>>>> > >>
>>>> > >> So I think you would need to use `\U'1F0A1'` for the character to
>>>> > >> show up?
>>>> > >>
>>>> > >> Ricky
>>>> > >>
>>>> > >>
>>>> > >>> On Aug 4, 2020, at 12:28 PM, T. Kurt Bond <[email protected]> wrote:
>>>> > >>>
>>>> > >>> (The heirloom-doctools README.md
>>>> > >>> <https://github.com/n-t-roff/heirloom-doctools/blob/master/README.md>
>>>> > >>> says to ask Heirloom doctools questions on this list.)
>>>> > >>>
>>>> > >>> I'd like to use the Symbola font in Heirloom troff. I tried the
>>>> > >>> following:
>>>> > >>>
>>>> > >>> .do xflag 3
>>>> > >>> .\" fp 5 Optima Optima-Regular ttf
>>>> > >>> .fp 5 Symbola Symbola otf
>>>> > >>> .LP
>>>> > >>> Here is some normal text.
>>>> > >>> .\" PLAYING CARD ACE OF SPACES is Unicode 0x1F0A1
>>>> > >>> .ft Symbola
>>>> > >>> 🂡 And some normal text. ❊
>>>> > >>> .ft P
>>>> > >>> More normal text.
>>>> > >>>
>>>> > >>> That's a literal PLAYING CARD ACE OF SPADES Unicode character at the
>>>> > >>> start of the line between the two .ft requests. That character does
>>>> > >>> not show up in the troff output, even through the EIGHT
>>>> > >>> TEARDROP-SPOKED PROPELLER ASTERISK Unicode character at the end of
>>>> > >>> the line *does* show up, as CPSuni274A where the CPS<name> outputs
>>>> > >>> the character of that name. The Symbola font is embedded in the PDF
>>>> > >>> output (created from the PostScript output), and the text "And some
>>>> > >>> normal text" and the EIGHT TEARDROP-SPOKED PROPELLER ASTERISK
>>>> > >>> Unicode character are in the Symbola font in the troff output.
>>>> > >>>
>>>> > >>> However, if I manually add a CPSuni1F0A1 to the troff output, *that*
>>>> > >>> character *does* show up.
>>>> > >>>
>>>> > >>> Any ideas as to why the literal PLAYING CARD ACE OF SPADES Unicode
>>>> > >>> character in the document source is being ignored and not written
>>>> > >>> to the troff output?
>>>> > >>>
>>>> > >>> I actually have a document that needs to use the PLAYING CARD ACE OF
>>>> > >>> SPADES Unicode character. The ultimate goal is to have the Symbola
>>>> > >>> font used as a fallback font, which should happen automatically in
>>>> > >>> Heirloom troff, since it searches all the fonts when a font is
>>>> > >>> missing a character, but I made the example use the Symbola font
>>>> > >>> directly because that shows the problem directly.
>>>> > >>>
>>>> > >>> --
>>>> > >>> T. Kurt Bond, [email protected], https://tkurtbond.github.io
>>>> > >>
>>>> > >>
>>>> > >
>>>> > > --
>>>> > > T. Kurt Bond, [email protected], https://tkurtbond.github.io
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > T. Kurt Bond, [email protected], https://tkurtbond.github.io
>>>>
>>>>
>>>>
>>
>> --
>> T. Kurt Bond, [email protected], https://tkurtbond.github.io
>>
>

--
T. Kurt Bond, [email protected], https://tkurtbond.github.io
