Hi Team,
PDF viewers do not render all Tamil letters as expected in PDFs
generated using PDFBox. It seems I have to perform the required
substitutions myself while generating the PDF to get it rendered as expected.
I am attempting these substitutions now; any help would be appreciated.
Ligature Substitutions - Tamil Use Cases
Below are the five possible cases for a base character joining with a vowel
sign. There are 18 base characters; the examples use க, and the same cases
apply to the remaining seventeen.
Case 1 - vowel follows the base character - No change required. PDF
viewers render as expected.
க + ா = கா
Case 2 - Vowel on top of base character - No change required. PDF
viewers render as expected.
க + ி = கி
க + ீ = கீ
க + ் = க்
Case 3 - base character follows the vowel - Need to reverse the
glyphs
க + ெ = கெ -> ெ + க = கெ
க + ே = கே -> ே + க = கே
க + ை = கை -> ை + க = கை
Case 4 - base character follows the composite vowel - Need to split and
reorder the glyphs
க + ொ = கொ -> க + ெ + ா -> ெ + க + ா = கொ
க + ோ = கோ -> க + ே + ா -> ே + க + ா = கோ
க + ௌ = கௌ -> க + ெ + ள -> ெ + க + ள = கௌ
Case 5 - Base character and vowel need to map to a new glyph id - The
resulting glyph has no Unicode character - Substitute the new glyph for the
series of glyphs
க + ு = கு -> கு
க + ூ = கூ -> கூ
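
Cases 3 and 4 amount to a small reordering pass over the input string before
it is passed to showText. Below is a minimal sketch, assuming every vowel sign
directly follows a single base character (conjuncts are not handled). The
class and method names are hypothetical. The right-hand part of ௌ follows the
mapping in the list above (ள, U+0BB3); Unicode's canonical decomposition of
U+0BCC uses the AU length mark U+0BD7 instead, so that constant may need to be
swapped depending on the font.

```java
import java.util.Map;

/** Sketch only: moves Tamil vowel signs into visual (left-to-right) order. */
final class TamilVisualOrder {
    // Case 3: vowel signs E, EE, AI are drawn to the left of the base character.
    private static final String PREFIX_SIGNS = "\u0BC6\u0BC7\u0BC8";

    // Case 4: two-part vowel signs, mapped to {left part, right part}.
    private static final Map<Character, char[]> SPLIT_SIGNS = Map.of(
        '\u0BCA', new char[] {'\u0BC6', '\u0BBE'},  // O  -> E  + AA
        '\u0BCB', new char[] {'\u0BC7', '\u0BBE'},  // OO -> EE + AA
        '\u0BCC', new char[] {'\u0BC6', '\u0BB3'}   // AU -> E  + LLA (as in the mail)
    );

    static String toVisualOrder(String logical) {
        StringBuilder out = new StringBuilder(logical.length() + 4);
        for (int i = 0; i < logical.length(); i++) {
            char c = logical.charAt(i);
            char[] parts = SPLIT_SIGNS.get(c);
            if (parts != null && out.length() > 0) {
                // Split: left part goes before the base, right part after it.
                out.insert(out.length() - 1, parts[0]);
                out.append(parts[1]);
            } else if (PREFIX_SIGNS.indexOf(c) >= 0 && out.length() > 0) {
                // Reverse: the sign goes before the preceding base character.
                out.insert(out.length() - 1, c);
            } else {
                // Cases 1 and 2 (and everything else) pass through unchanged.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the code points of the reordered form of KA + VOWEL SIGN O.
        toVisualOrder("\u0B95\u0BCA").codePoints()
            .forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

For example, toVisualOrder applied to க + ொ (U+0B95 U+0BCA) yields
U+0BC6 U+0B95 U+0BBE, that is ெ + க + ா.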
Below in table representation. Column groups: JDK = char sequence and code
points; TTF = gid; PDFBox-generated PDF = Actual* and Expected.

Input text | Char sequence | Code points (decimal / Unicode) | gid | Actual* | Expected | Remark
-----------------------------------------------------------------------------------------------
க் | க + ் | 2965 (U+0B95), 3021 (U+0BCD) | 1828, 1862 | க் | க் | All good
கா | க + ா | 2965 (U+0B95), 3006 (U+0BBE) | 1828, 1851 | கா | கா | All good
கி | க + ி | 2965 (U+0B95), 3007 (U+0BBF) | 1828, 1852 | கி | கி | All good
கீ | க + ீ | 2965 (U+0B95), 3008 (U+0BC0) | 1828, 1853 | கீ | கீ | All good
கு | க + ு | 2965 (U+0B95), 3009 (U+0BC1) | 1828, 1854 | கு | கு (gid = 6698) | New glyph expected.
கூ | க + ூ | 2965 (U+0B95), 3010 (U+0BC2) | 1828, 1855 | கூ | கூ (gid = 6716) | New glyph expected.
கெ | க + ெ | 2965 (U+0B95), 3014 (U+0BC6) | 1828, 1856 | கெ | ெ + க = கெ | Reversing the glyphs expected.
கே | க + ே | 2965 (U+0B95), 3015 (U+0BC7) | 1828, 1857 | கே | ே + க = கே | Reversing the glyphs expected.
கை | க + ை | 2965 (U+0B95), 3016 (U+0BC8) | 1828, 1858 | கை | ை + க = கை | Reversing the glyphs expected.
கொ | க + ொ | 2965 (U+0B95), 3018 (U+0BCA) | 1828, 1859 | கொ | க + ெ + ா -> ெ + க + ா = கொ | Split and reorder expected.
கோ | க + ோ | 2965 (U+0B95), 3019 (U+0BCB) | 1828, 1860 | கோ | க + ே + ா -> ே + க + ா = கோ | Split and reorder expected.
கௌ | க + ௌ | 2965 (U+0B95), 3020 (U+0BCC) | 1828, 1861 | கௌ | க + ெ + ள -> ெ + க + ள = கௌ | Split and reorder expected.

* Actual - the dotted circle will be invisible.
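
The JDK part of the table (character, decimal code point, Unicode value) can
be reproduced with plain Java. This is a minimal sketch; the class and method
names are hypothetical and the output format only approximates the one used in
the table.

```java
/** Sketch only: lists the code points of a string, one per line. */
final class CodePointDump {
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        // codePoints() iterates by code point, so surrogate pairs stay intact.
        text.codePoints().forEach(cp -> sb
            .append("Character : ").append(new String(Character.toChars(cp)))
            .append("  Codepoint : ").append(cp)
            .append("  unicode : ").append(String.format("U+%04X", cp))
            .append(System.lineSeparator()));
        return sb.toString();
    }

    public static void main(String[] args) {
        // KA followed by the virama (pulli), the first row of the table.
        System.out.print(describe("\u0B95\u0BCD"));
    }
}
```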
I have attached the actual output and the expected output. To obtain the
expected output I used hard-coded substitutions: for the glyph ids that have
no Unicode character, I hard-coded the mapping in
PDCIDFontType2#public byte[] encode(int unicode); I reverse, split and
reorder the input text char sequence before calling showText; and I added the
glyph ids that have no Unicode character to the TrueTypeEmbedder subsetter so
that those glyphs are embedded in the generated PDF.
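
The hard-coded glyph substitution for Case 5 can be sketched as a lookup from
a gid sequence to a single gid. This is only an illustration: the class and
method names are hypothetical, the gid values are the ones from the table
above and are specific to the font I used, and a real implementation would
presumably read such pairs from the font rather than hard-code them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch only: replaces known gid pairs with a single ligature gid. */
final class LigatureTable {
    // (base gid, vowel-sign gid) -> ligature gid, values taken from the table.
    private static final Map<List<Integer>, Integer> LIGATURES = Map.of(
        List.of(1828, 1854), 6698,   // KA + VOWEL SIGN U
        List.of(1828, 1855), 6716    // KA + VOWEL SIGN UU
    );

    static List<Integer> substitute(List<Integer> gids) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < gids.size()) {
            if (i + 1 < gids.size()) {
                Integer lig = LIGATURES.get(List.of(gids.get(i), gids.get(i + 1)));
                if (lig != null) {
                    out.add(lig);   // pair matched: emit one glyph, skip both
                    i += 2;
                    continue;
                }
            }
            out.add(gids.get(i));   // no match: keep the glyph as it is
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(substitute(List.of(1828, 1854, 1828, 1851)));
    }
}
```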
How can these substitutions be handled in an efficient way? I have been
looking at GlyphSubstitutionTable, fontbox.cmap.Identity-H and
fontbox.unicode.Scripts.txt, but could not work it out so far. Any help would
be appreciated.
Thank you,
Jeyan