> Date: Tue, 02 Feb 2010 09:08:40 +0100
> From: martin rudalics <[email protected]>
> CC: [email protected], [email protected]
>
>  > I already implemented such a feature: a per-buffer variable that
>  > forces all paragraphs to be either L2R or R2L.  A value of `nil' means
>  > the direction of each paragraph is dynamically determined by applying
>  > the rules described in the Unicode Standard Annex 9 (UAX#9).
>
> I meant a function which does (1) set such a variable

You mean, besides "M-x set-variable RET"?

> and (2) apply it to one or all windows showing a buffer.

Currently, the variable is per-buffer, so it affects all the windows
showing that buffer.  Why would one need to do that only in some
windows showing a buffer?

> Calling this function would temporarily override any L2R/R2L
> specifications specified for a file, buffer, or paragraph.

There are no specifications for a file (unless you set the variable
I'm talking about in the file's local variables section).

As for individual paragraphs, their base direction is controlled not
by some Emacs setting, but by inserting special formatting characters
at the beginning of each paragraph.  These characters (LRM and RLM)
are supposed to be invisible by default, i.e. displayed as zero-width
space, but they have strong directionality, L for LRM and R for RLM.
Since UAX#9 says that a paragraph's base direction is determined by
its first strong directional character, each one of these two
characters sets the paragraph direction according to the
directionality of the character.

It would be easy enough to write a command that inserts LRM or RLM at
the beginning of each paragraph in a buffer or region.  But that's
application level, and I still have a lot of turf to cover before I
get to that.

> BTW, do UAX#9 paragraphs require new definitions for `paragraph-start'
> or `paragraph-separate'?

It does:

    Paragraphs are divided by the Paragraph Separator or appropriate
    Newline Function [...].  Paragraphs may also be determined by
    higher-level protocols: for example, the text in two different
    cells of a table will be in different paragraphs.

and the table of Bidirectional Character Types says that a Paragraph
Separator type is assigned to the following characters:

    Paragraph separator, appropriate Newline Functions, higher-level
    protocol paragraph determination

Accordingly, in the Unicode Database, the characters CR and LF
(a.k.a. NL) that normally separate lines have the Paragraph Separator
(B) type.

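To make the first-strong-character rule concrete, here is a minimal sketch in Python (not Emacs code, and not how the Emacs display engine implements it).  The function names `base_direction' and `force_direction' are made up for illustration.  The sketch assumes that a paragraph with no strong character defaults to L2R, and it ignores the finer points of UAX#9 such as explicit embedding or isolate controls.  It only relies on the bidirectional types recorded in the Unicode Database.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, strong type L
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, strong type R


def base_direction(paragraph, default="L2R"):
    """Return "L2R" or "R2L" according to the first strong character.

    Hypothetical helper: scans the text and stops at the first
    character whose bidirectional type is L, R, or AL.
    """
    for ch in paragraph:
        bidi_type = unicodedata.bidirectional(ch)
        if bidi_type == "L":
            return "L2R"
        if bidi_type in ("R", "AL"):
            return "R2L"
    # No strong character found: fall back to the assumed default.
    return default


def force_direction(paragraph, direction):
    """Prefix the paragraph with LRM or RLM to fix its base direction."""
    mark = RLM if direction == "R2L" else LRM
    return mark + paragraph


# CR and LF carry the Paragraph Separator (B) type in the Unicode Database.
newline_types = (unicodedata.bidirectional("\r"), unicodedata.bidirectional("\n"))
```

For example, `base_direction("hello")` gives "L2R", while `base_direction(force_direction("hello", "R2L"))` gives "R2L", because the prepended RLM is now the first strong character.
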
This could sound like a disaster (each line being a separate
paragraph), since Emacs uses hard newlines to fill paragraphs.
Fortunately, UAX#9 leaves a fire escape: it says (see above) that
paragraphs can also be determined by ``higher-level protocols''.  I
used this fire escape to preserve the normal Emacs notion of a
paragraph, including the usual sense of `paragraph-start' and
`paragraph-separate'.  For instance, the code that determines the base
direction of each paragraph looks back for a position that matches
`paragraph-start', and then finds the first strong directional
character after that.

So UAX#9 does define a default for paragraph start that is different
from Emacs, but gives us a way to preserve ours.  Which we did.

_______________________________________________
emacs-bidi mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/emacs-bidi
