Hi Gregg,

Sorry, this is much too long again :-(

Gregg Reynolds writes:

> This might be better expressed by saying the choice of number
> polarity - MSD or LSD first in strings - is a design choice. One
> could model English text with LSD-first digit strings if one wanted.

At the encoding level you could. At the input level you have to
emulate how the users would do it on paper. If you don't, users will
find that awkward and look for something else. We don't design
scripts, we just design how they are implemented in computers.

> RTL/LTR refers solely to graphical syntax, not to an encoding model.

RTL refers to how I read, write and line-break a written phrase as a
human being. Computers have to map this to their notions of graphics
and coordinate systems. And then they have to store it in a
convenient form.

> I guess the point is that we can get there in stages. First you
> implement RTL layout, then shaping, then bidi.

It *is* done in stages. We have had Hebrew modules in Elisp for ages.
We have emacs-bidi now. The only thing is that this is not in the
stock Emacs. In the same way, support for FE scripts was not in stock
Emacs until the developers thought it was mature enough and somebody
had (or made) the necessary time to integrate it. This development
process takes time. The developers do it in their free time, after
all, and the issues are complicated. Doing it in the stages that you
indicated would probably take more time (and require more overhead in
discussing it :-() than just doing it well enough in fewer steps.

>> A large part (maybe still a majority) of the people that write
>> Arabic and Hebrew on computers write in more than just one
>> language. This is even if you discount numbers and trademarks.
>
> Yes, I've heard this claimed many times, but I've never seen any
> evidence to back it up. My personal experience is that it is simply
> not true. In the Arab world, at least, *most* people do *not*
> operate in multiple languages (just like in the US),

Hm.
In your opinion, native Arabic speakers who only work in their own
language are how much of the whole user base for Arabic? 50%? 70%?
90%? There are westerners studying Arabic, business types (native
Arabs and westerners) doing multi-language word-processing (think of
any larger company), there are programmers, scripters, folks building
HTML pages, even graphical designers that need to incorporate correct
Western elements into their designs, ... I've probably forgotten some
important groups. All of these work with mixed content. I'd say these
are probably much more than 30% of the computer users, and even just
10% would IMO be more than enough of a market to justify implementing
mixed content and not starting a separate code branch for RTL only.

> Even scholarly articles written in English about Arabic generally
> use transliteration.

There were times when they didn't. Now authors regularly apologize to
their readers that they can't do it. So there is demand, but
publishers currently think it's too expensive.

> Things are no different in the Arab world. When newspapers need to
> write "CNN" or "FBI", they transliterate it.

As far as I have seen (and you probably have more experience),
newspapers and magazines sometimes do and sometimes don't. But
advertisements seem to be more likely to use at least their brand
names in Latin script. And those advertisements targeting the wealthy
probably use English marketing phrases, as they do over here in
Germany, right?

>> A large part of the user base right now does need mixed content. So
>
> That may be true for the *current* emacs user base.

The current base and the base of the *very* near future is what Emacs
is written for. Like most software, Emacs is not written for an ideal
world. That would waste too many resources. Emacs is actually an
exception in that it is so easily adapted to users' needs in most
respects that it is much more flexible than a lot of other software.
This is more difficult with some things than with others, as this
example shows. And even here we actually have emacs-bidi now.

> Besides, to me the user base is everybody in the world. Whoever
> wants to use it, should be able to use it.

In the way that you are framing it, I think that is unrealistic.

benny

PS: Clarifications of my previous post:

>> [...] Unicode put the complicated parts into the IO model (for
>> human IO) with BIDI reordering, while any software module that
>> doesn't have human IO can completely ignore the issue. The same
>
> I don't understand what you say here. Unicode as I understand it
> doesn't have anything at all to say about IO; it just defines
> character semantics and syntax (accent after base char, etc.)

Yes, Unicode specifies the encoding level. But by exclusion, those
things that are required by the task at hand but are not specified in
Unicode itself have to be done at other levels. In this way Unicode
works on the basis of an implied architecture.

>> Not to mention that it makes it even more complicated for more
>> advanced - read: user-friendly - versions of IO.
>
> I don't see how. Can you provide an example of how this would make
> things more complicated?

I spend too much time on these posts already to also think of
exciting examples. Sorry ;-) What I mean is that with a given visual
encoding, if your IO model is not exactly the same as the encoding
model for whatever reason, then the encoding gets in your way in a big
way and makes for even more complicated code (== more bugs, fewer
features). So mandating a visual (or not-quite-logical) encoding was
no realistic choice for Unicode and is also not realistic for a
text-processing platform such as Emacs.

>> Every module in Emacs that needs to look at the logical order would
>> have to make the reordering anyway. And as Emacs is about text
>> processing that would probably be a lot of modules.
>
> I don't see why. Example?

E.g. search and replace.
I have a number of functions that do that to fix up text
automatically. This works on Emacs' internal text model. Sometimes I
need to find stuff in a certain order, which is of course the logical
order. If I were processing bidi text (say an XML file containing
Syriac content, that's something that I actually have) and the text
was not in logical order, I'd have to think about it. Every time I
code a search I'd have to see whether the logical/visual dichotomy has
an impact in the particular case.

Let's say only half of the approximately thousand Elisp modules in
stock Emacs need a similar review at a superficial level, to see if
they are compatible with the chosen visual or not-quite-logical
ordering. That is a lot of work. With a plain logical ordering, such
a review is probably still needed in some places, but changes should
be necessary much more rarely.

_______________________________________________
emacs-bidi mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/emacs-bidi
