[emacs-bidi] Re: RTL support

Benjamin Riefenstahl Mon, 28 Nov 2005 06:42:16 -0800

Hi Gregg,

Sorry, this is much too long again :-(

Gregg Reynolds writes:
> This might be better expressed by saying the choice of number
> polarity - MSD or LSD first in strings - is a design choice.  One
> could model English text with LSD-first digit strings if one wanted.

At the encoding level you could.  At the input level you have to
emulate how the users would do it on paper.  If you don't, users will
find that awkward and look for something else.  We don't design
scripts, we just design how they are implemented in computers.
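
To make the split between input and encoding concrete, here is a
minimal Python sketch.  The function names and the LSD-first storage
format are made up for illustration; nobody is proposing this for
Emacs.  The user types the number the way it is written on paper, and
only the storage layer flips the order.

```python
def store_lsd_first(typed_digits: str) -> str:
    """Hypothetical storage format: least significant digit first."""
    if not typed_digits.isdigit():
        raise ValueError("expected digits only")
    # The user typed MSD first, as on paper; storage reverses the order.
    return typed_digits[::-1]


def value_of(stored: str) -> int:
    """Read the numeric value back from the LSD-first storage form."""
    return sum(int(digit) * 10 ** place for place, digit in enumerate(stored))


typed = "1905"                   # what the user types and sees
stored = store_lsd_first(typed)  # what this toy encoding keeps
```

The input side never changes here; only the code that reads and
writes the stored form has to know about the polarity.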

> RTL/LTR refers solely to graphical syntax, not to an encoding model.

RTL refers to how I read, write and line-break a written phrase as a
human being.  Computers have to map this to their notions of graphics
and coordinate systems.  And then they have to store it in a
convenient form.

> I guess the point is that we can get there in stages.  First you
> implement RTL layout, then shaping, then bidi.

It *is* done in stages.  We have had Hebrew modules in Elisp for ages.
We have emacs-bidi now.  The only thing is that this is not in the
stock Emacs.  The same as support for FE scripts was not in stock
Emacs until the developers thought it was mature enough and somebody
had (or made) the necessary time to integrate it.

This development process takes time.  The developers do it in their
free time, after all, and the issues are complicated.  Doing it in the
stages that you indicated would probably take more time (and require
more overhead in discussing it :-() than just doing it well enough in
fewer steps.

>> A large part (maybe still a majority) of the people that write
>> Arabic and Hebrew on computers write in more than just one
>> language.  This is even if you discount numbers and trademarks.
>
> Yes, I've heard this claimed many times, but I've never seen any
> evidence to back it up.  My personal experience is that it is simply
> not true.  In the Arab world, at least, *most* people do *not*
> operate in multiple languages (just like in the US),

Hm.  In your opinion, native Arabic speakers who only work in their own
language are how much of the whole user base for Arabic?  50%?  70%?
90%?

There are westerners studying Arabic, business types (native Arabs and
westerners) doing multi-language word-processing (think of any larger
company), there are programmers, scripters, folks building HTML pages,
even graphical designers that need to incorporate correct Western
elements into their designs, ... I've probably forgotten some
important groups.  All of these work with mixed content.  I'd say
these are probably much more than 30% of the computer users, and even
just 10% would IMO be more than enough of a market to justify
implementing mixed content rather than starting a separate code branch
for RTL only.

> Even scholarly articles written in English about Arabic generally
> use transliteration.

There were times when they didn't.  Now authors regularly apologize to
their readers that they can't do it.  So there is demand, but
publishers currently think it's too expensive.

> Things are no different in the Arab world.  When newspapers need to
> write "CNN" or "FBI", they transliterate it.

As far as I have seen (and you probably have more experience),
newspapers and magazines sometimes do and sometimes don't.  But
advertisements seem to be more likely to use at least their brand
names in Latin script.  And those advertisements targeting the wealthy
probably use English marketing phrases, as they do over here in
Germany, right?

>> A large part of the user base right now does need mixed content.  So
>
> That may be true for the *current* emacs user base.

The current base and the base of the *very* near future is what Emacs
is written for.  Like most software, Emacs is not written for an ideal
world.  That would waste too many resources.

Emacs is actually an exception in that it is so easily adapted to
users' needs in most respects that it is much more flexible than a lot
of other software.  This is more difficult with some things
than with others, as this example shows.  And even here we actually
have emacs-bidi now.

> Besides, to me the user base is everybody in the world.  Whoever
> wants to use it, should be able to use it.

In the way that you are framing it, I think that is unrealistic.

benny

PS: Clarifications of my previous post:

>> [...] Unicode put the complicated parts into the IO model (for
>> human IO) with BIDI reordering, while any software module that
>> doesn't have human IO can completely ignore the issue.  The same
>
> I don't understand what you say here.  Unicode as I understand it
> doesn't have anything at all to say about IO; it just defines
> character semantics and syntax (accent after base char, etc.)

Yes, Unicode specifies the encoding level.  But by exclusion, those
things that are required by the task at hand but are not specified in
Unicode itself have to be done at other levels.  In this way Unicode
works on the basis of an implied architecture.

>>Not to mention that it makes it even more complicated for more
>>advanced - read: user-friendly - versions of IO.
>
> I don't see how.  Can you provide an example of how this would make
> things more complicated?

I spend too much time on these posts already to also think of exciting
examples.  Sorry ;-)

What I mean is that with a given visual encoding, if your IO model is
not exactly the same as the encoding model for whatever reason, then
the encoding gets in your way in a big way and makes for even more
complicated code (== more bugs, fewer features).

So mandating a visual (or not-quite-logical) encoding was no realistic
choice for Unicode and is also not realistic for a text-processing
platform such as Emacs.

>> Every module in Emacs that needs to look at the logical order would
>> have to make the reordering anyway.  And as Emacs is about text
>> processing that would probably be a lot of modules.
>
> I don't see why.  Example?

E.g. search and replace.  I have a number of functions that do that to
fix up text automatically.  This works on Emacs' internal text model.
Sometimes I need to find stuff in a certain order, which is of course
the logical order.  If I were processing bidi text (say an XML file
containing Syriac content, that's something that I actually have) and
the text was not in logical order, I'd have to think about it.
Every time I code a search I'd have to see if the logical/visual
dichotomy has an impact in the particular case.
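
A toy Python sketch of the kind of trouble I mean.  It assumes the
usual write-up convention that UPPERCASE stands in for RTL letters,
and `to_visual` is a made-up helper that just reverses each RTL run;
it is nowhere near the real Unicode bidi reordering, and it is not how
emacs-bidi works.

```python
import re

# Toy convention: UPPERCASE words stand in for RTL text.
RTL_RUN = re.compile(r"[A-Z]+(?: [A-Z]+)*")


def to_visual(logical: str) -> str:
    """Reverse each RTL run; a crude stand-in for real bidi reordering."""
    return RTL_RUN.sub(lambda match: match.group(0)[::-1], logical)


logical = "name: SHALOM OLAM end"   # buffer text in logical order
visual = to_visual(logical)         # the same text stored in visual order

pattern = "name: SHALOM"            # what I would naturally search for

found_logical = logical.find(pattern)
found_visual_naive = visual.find(pattern)
found_visual_reordered = visual.find(to_visual(pattern))
```

On the logical text the search simply matches at the start.  On the
visually stored text it fails, and it still fails after reordering the
pattern, because the first RTL word is no longer adjacent to the Latin
prefix once the whole run has been reversed.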

Let's say only half of the approximately one thousand Elisp modules in
stock Emacs have a need for a similar review at a superficial level,
to see if they are compatible with the chosen visual or
not-quite-logical ordering.  That is a lot of work.  With a plain
logical ordering, such a review is probably still needed in some
places, but changes should be necessary much more rarely.

_______________________________________________
emacs-bidi mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/emacs-bidi
