As per my other post, I retried with Version 2008 (1.71), as shipped in the latest Ubuntu/Debian package. Still the same issue: no combination of options seems to sort out em- and en-dashes :-(
Stuart

2009/12/10 Ross Moore <[email protected]>:
> Hello Stuart,
>
> On 10/12/2009, at 2:50 AM, Stuart Rossiter wrote:
>
>> Hi,
>>
>> This revisits issues raised (but not resolved) in a 2003 post:
>> http://tug.org/mailman/htdig/latex2html/2003-August/002400.html
>>
>> It appears that latex2html is (still) converting em- and en-dashes to
>> -- and - respectively. Since hyphens are also left as -, there is then
>> no way to distinguish (in the HTML) between things that were en-dashes
>> and normal hyphens (so you can't do the conversions to &ndash; etc.
>> manually, even if you want to).
>>
>> Also, the main script has do_cmd_textemdash and do_cmd_textendash
>> routines (to convert to --- and -- respectively), but these don't seem
>> to get used when you explicitly use \textemdash and \textendash
>> commands, which I thought would be a way round this problem (it still
>> does the conversions to -- and -).
>
> No, that is not entirely correct.
> The coding has:
>
> # these can be overridden in charset (.pl) extension files:
> sub do_cmd_textemdash { join('','---', $_[0]);}
> sub do_cmd_textendash { join('','--', $_[0]);}
>
> So if you set the charset then you can get other results.
>
> Alternatively, you can override these in a configuration file,
> as that gets read after the main script has been loaded.
>
>>
>> So it appears that:
>>
>> -- latex2html can't distinguish these dashes properly (I assume that,
>> as for quotes, this is an issue with being able to definitively
>> identify them), although it's distinguishing *something* in doing the
>> conversions to -- and - ! (so maybe this *can* be fixed?)
>
> It is also a matter of output encodings.
>
> By default, LaTeX2HTML was written to produce Latin 1 output,
> that is, ISO-8859-1 encoding.
> This does not include single characters for endash and emdash.
>
> If you want single characters, and HTML coding that validates,
> then you must either use entities, or expand the charset, or both.
> There are switches -unicode and -entities for this.
>
> With the -unicode switch you should get – and —
> respectively, for -- and --- within normal paragraphs.
>
> With switches -unicode -entities then the parameter entities
> are supposed to be translated into named entities:
> – and &emdash;
>
> Or with switches -unicode -utf8 then you should get
> the correct single characters in UTF8 encoding.
>
>>
>> -- there is also no way to "preserve" the dashes from the original in
>> a way which would allow for accurate manual adjustments afterwards.
>
> This statement is true when you do not specify -unicode.
> It is not true when you do include this switch.
>
> LaTeX2HTML was written at a time when browser support for Unicode
> was very flaky indeed. That is why the defaults are what they are.
> Since then web technologies have advanced considerably, and other
> tools do quite a good job of translating LaTeX coding into HTML,
> or XHTML or XML.
>
> On the other hand, customising LaTeX2HTML is not that hard,
> **provided** you can use Perl, and have a good understanding
> of just what it is that you really want to do.
>
>>
>> Am I missing something, or is there any advice people can offer?
>
> Hopefully the above helps.
>
>>
>> Thanks in advance,
>> Stuart
>
> Cheers,
>
> Ross
>
> ------------------------------------------------------------------------
> Ross Moore                                       [email protected]
> Mathematics Department                           office: E7A-419
> Macquarie University                             tel: +61 (0)2 9850 8955
> Sydney, Australia 2109                           fax: +61 (0)2 9850 8114
> ------------------------------------------------------------------------

_______________________________________________
latex2html mailing list
[email protected]
http://tug.org/mailman/listinfo/latex2html
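For reference, here is a minimal sketch of the configuration-file override Ross describes for \textemdash and \textendash. It makes several assumptions. First, it assumes LaTeX2HTML picks up a Perl init file named .latex2html-init (in the working or home directory) after the main script. Second, it assumes no charset (.pl) extension file loaded later redefines these subroutines again. Third, emitting the numeric character references &#8212; (U+2014, em dash) and &#8211; (U+2013, en dash) is my own choice of output, not something LaTeX2HTML does by default. The subroutines keep the convention visible in the quoted code: the rest of the text arrives in $_[0] and is appended unchanged.

```perl
# .latex2html-init: hypothetical override, assuming this file is read after
# the main latex2html script so these definitions replace the built-in ones.

# \textemdash: emit U+2014 as a numeric character reference, then pass the
# remaining text ($_[0]) through untouched, as the built-in version does.
sub do_cmd_textemdash { join('', '&#8212;', $_[0]); }

# \textendash: same idea for U+2013.
sub do_cmd_textendash { join('', '&#8211;', $_[0]); }

1;  # return true, as a Perl file loaded with require must
```

With such a file in place, running something like `latex2html mydoc.tex` (file name hypothetical) should produce the references wherever the source uses \textemdash or \textendash explicitly. These subroutines handle only the explicit commands. Plain -- and --- in running text presumably follow a different code path, which is the case the -unicode switch is meant to cover. Because the references differ from a bare hyphen, the dashes can still be told apart in the HTML for later manual adjustment.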
