Benjamin Riefenstahl wrote:
> Hi Gregg,

Hi Benny,
>
> I am not an Emacs developer, and I don't plan to work on this issue
> right now. I also don't believe that you have brought up new
> arguments to change the decisions about how this is to be done in Emacs
> in the future. I do think that an occasional check of these ideas is
> a good thing, though.

Fair enough. However, my original post wasn't intended to be
argumentative, but to ask about some specific design options. So to be
clear, I don't mean to advocate any particular option at this point,
since I don't know enough about Emacs internals. The general
observation that graphical layout, text reordering (via bidi or any
other algo), and shaping are mutually orthogonal applies to any notion
of text processing.

> So this exchange is mostly about: "Would I personally find useful
> software that worked along the lines that you suggest."

Yep; this is my itch. I happen to think scratching it would benefit
many others, but of course that is (informed) speculation.

> Gregg Reynolds writes:
>
>>Two things. One is, directionality is a design choice, not a
>>reflection of some kind of objective reality.
>
> The fact that Arabic and other scripts are written and read from right
> to left is a design feature of the script that we can't just ignore
> when we implement it in computers, we have to deal with it at some
> level. The question is at *which* level.

Sorry, I wasn't clear. I mean that modeling a single script/language
as mono- or bi-directional is a design choice, not a statement of a Law
of Nature. This might be better expressed by saying that the choice of
number polarity - most or least significant digit (MSD or LSD) first in
strings - is a design choice. One could model English text with
LSD-first digit strings if one wanted. This means, among other things,
that "RTL" does not imply "bidi", any more than "LTR" does. RTL/LTR
refers solely to graphical syntax, not to an encoding model.
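
To make the number-polarity point concrete, here is a minimal sketch in Python. It is only an illustration: the function names are hypothetical, the Latin letters "ab" and "cd" stand in for Arabic words, and it assumes one character per screen cell (no combining marks, no shaping). It shows that if digit strings are stored LSD first, a purely monodirectional RTL layout puts the most significant digit on the left without any reordering algorithm.

```python
def encode_number(n: int, lsd_first: bool) -> str:
    """Return the digit string for a non-negative n in stored order."""
    digits = str(n)  # str() yields the most significant digit first
    return digits[::-1] if lsd_first else digits


def layout(stored: str, rtl: bool) -> str:
    """Return the cells of one line in left-to-right screen order.

    No reordering algorithm is involved: an RTL line simply places the
    first stored character in the rightmost cell.
    """
    return stored[::-1] if rtl else stored


# "ab" and "cd" stand in for Arabic words; 123 is stored LSD first.
stored = "ab" + encode_number(123, lsd_first=True) + "cd"
print(layout(stored, rtl=True))  # dc123ba
```

A real implementation would have to lay out grapheme clusters rather than single code points, so that combining marks stay attached to their base characters.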
>
>>There is no necessary relationship between the IO model implemented
>>by an application and the corresponding textual representation,
>
> Exactly. Which is why Unicode put the complicated parts into the IO
> model (for human IO) with BIDI reordering, while any software module
> that doesn't have human IO can completely ignore the issue.

I don't understand what you say here. Unicode as I understand it
doesn't have anything at all to say about IO; it just defines
character semantics and syntax (accent after base char, etc.). Note
that there are no complicated parts for monodirectional text. It's the
bidi requirement itself that creates the complication.

Another clarification: I'm not arguing against bidi support where it
is truly needed, namely in mixed language texts. Nor am I arguing that
Emacs should not have bidi support - it should, obviously. I guess the
point is that we can get there in stages. First you implement RTL
layout, then shaping, then bidi. That way we have *usable* software
without having to wait for bidi support, and eventually we do have full
bidi support. Vim provides the model: you can switch RTL layout and
Arabic shaping on and off independently; hopefully someday somebody
will add bidi support too. But in the meantime it is very useful for
working with Arabic text. I'd simply like for Emacs to be as useful,
since I'm firmly in the Emacs camp when it comes to editors.

> The same goes for most software that directly implements human IO but
> uses pre-fabricated building blocks for it (using e.g. GTK or Qt).
>
> If OTOH you use visual ordering in the encoding you make life easier
> for a few primitive versions of the IO and complicated for all the
> rest of the software. Not to mention that it makes it even more
> complicated for more advanced - read: user-friendly - versions of IO.

I don't see how. Can you provide an example of how this would make
things more complicated? I mean other than with math routines. That I
admit is the big problem.
There are ways around it, but that's for another thread.

> There is a third possibility in our case, using visual order within
> Emacs and only storing the text in logical order. That is possible in
> a simple text editor (and I am sure there are some of those around).
> But Emacs does a lot more, of course. Every module in Emacs that
> needs to look at the logical order would have to make the reordering
> anyway.

I don't see why. Example?

> And as Emacs is about text processing that would probably be
> a lot of modules.
>
> That's the choice. I personally prefer the first way of doing it.

I think we're talking about two separate things. In my opinion, the
internal encoding used by Emacs is irrelevant, so long as I know what
it is. I just want RTL layout and Arabic shaping, both of which simply
operate on a string of chars/glyphs. Actually, even when Emacs has full
bidi support, I would still want a "transparent" mode that will provide
a graphical representation of the true (physical) ordering of the text.
The question of how best to represent text internally is an interesting
one, but I haven't given it much thought. I do think Emacs did the
right thing by *not* adopting Unicode as its internal representation.

>>More important is that RTL has no necessary relationship to mixed
>>content or bidi reordering. If you only ever write documents in
>>Arabic (Hebrew, Persian, Pashto, whatever) then why do you need
>>bidi?
>
> A large part (maybe still a majority) of the people that write Arabic
> and Hebrew on computers write in more than just one language. This is
> even if you discount numbers and trademarks.

Yes, I've heard this claimed many times, but I've never seen any
evidence to back it up. My personal experience is that it is simply not
true.
In the Arab world, at least, *most* people do *not* operate in multiple
languages (just like in the US), and from what I've personally seen
they get along fine using Arabic only on a computer, just as most
Americans get along fine using English only. Even scholarly articles
written in English about Arabic generally use transliteration. Things
are no different in the Arab world. When newspapers need to write "CNN"
or "FBI", they transliterate them. The need for full mixed directional
support is quite specialized, probably everywhere in the world. Add to
that the fact that multilanguage computing w/out bidi support is quite
feasible. I do it all the time using Vim and even Emacs.

>>To be clear: monolingual Arabic text is not mixed content, whether
>>it contains digit strings or not. So why should an Arabic user pay
>>the Unicode tax of bidi support?
>
> A large part of the user base right now does need mixed content.

That may be true for the *current* Emacs user base. Then again, Emacs
has no RTL user base, since Emacs doesn't support RTL. Whether or not
the potential RTL user base truly needs multilanguage (mixed
directionality) support is a matter of speculation. But we *know* that
they need RTL layout and shaping, and we also know that RTL layout and
shaping are sufficient to make software useful. Besides, to me the user
base is everybody in the world. Whoever wants to use it should be able
to use it. Lack of bidi support need not prevent the software from
being useful for people who don't need bidi support.

> So you would get the tax of supporting several versions of software,
> the software for people that don't need mixed content and another
> version for people that do. Even if the first version on its own
> might be cheaper, on the whole this will get more costly.
> Not to mention that it would end up in a system where the "natives"
> get the "stupid" mono-lingual software and the "experts" and the
> westerners can afford the "intelligent" software for the mixed
> content.

I guess I wasn't clear - see my note above. As you note, it wouldn't
make much sense to support two RTL versions of a piece of software, one
with and one without bidi support. But there would be no reason to do
so; RTL w/out bidi would just be a stage on the way to full bidi
implementation.

It's interesting that you perceive the "intelligent" software as the
stuff with bidi support. In my experience it is just the opposite:
editors with Unicode bidi support are really stupid, from the end user
point of view. They are often almost impossible to use, thanks to
bizarro cursor behaviour and the directional ambiguity Unicode
explicitly assigns to characters like punctuation, parens, etc. I find
Vim much simpler and more user-friendly.

Somewhere in the GCC list archives there's a note from RMS in response
to an issue involving support for some obscure feature of the ISO C++
standard (if I recall correctly), in which he says it all in a very few
words, something along the lines of "Standards are recommendations; we
should design to meet the needs of our community; if the Standard helps
with that, then we support it, but if not we shouldn't hesitate to
ignore it and do what is best for the community." Software support for
Unicode RTL scripts provides a classic example of getting things
backwards - designing to satisfy the standard instead of community
needs.

In summary, there's more than one way to skin a cat, as the (American)
saying goes. Emacs (and other software) can be quite useful to RTL
users without bidi support. It's better to have bidi support,
naturally, but the cost of bidi implementation need not stand in the
way of providing useful stuff, and providing useful stuff by supporting
non-bidi RTL and shaping need not inhibit implementation of bidi
support.
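
On the directional ambiguity of punctuation and parens mentioned above: the bidi class that Unicode assigns to each character can be inspected with Python's standard unicodedata module. The sketch below is only an illustration; the sample characters and the helper name are my own choice. Letters carry a strong direction (L, R or AL), while parentheses and "!" are classed as ON (other neutral), so their direction has to be resolved from the surrounding text.

```python
import unicodedata


def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Latin a, Arabic beh, "(", "!", European digit 1, Arabic-Indic digit 1
sample = "a\u0628(!1\u0661"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
# U+0061 L   (strong left-to-right)
# U+0628 AL  (strong right-to-left, Arabic letter)
# U+0028 ON  (other neutral)
# U+0021 ON  (other neutral)
# U+0031 EN  (European number, weak)
# U+0661 AN  (Arabic number, weak)
```

An editor that does only RTL layout never needs to resolve these classes, since every character on a line is simply placed in the one line direction.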

thanks,

gregg

_______________________________________________
emacs-bidi mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/emacs-bidi
