Re: Matching decomposable Unicode characters

2013-05-31 Thread Bram Moolenaar
Ron Aaron wrote: > On Friday, May 31, 2013 12:27:21 PM UTC+3, Bram Moolenaar wrote: > > > I find it a bit annoying that Unicode has two forms for the same character. > > They should have made a choice to either use a base character plus composing > > characters, or the combined form. Now we ne
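Bram's complaint about "two forms for the same character" can be made concrete with a small Python sketch. Python and its standard `unicodedata` module are our choice for illustration; nothing in the thread uses them, and `canonically_equal` is a hypothetical helper name. The sketch shows that a precomposed character and its base-plus-combining-mark spelling compare unequal as raw strings and only match after normalization. It also shows a Hebrew wrinkle relevant to this thread: U+FB2E (alef with patah) is excluded from canonical composition, so NFC turns it into the two-code-point sequence rather than the other way round.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed_e = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed_e = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT

print(precomposed_e == decomposed_e)                   # False: different code points
print(canonically_equal(precomposed_e, decomposed_e))  # True: same after NFC

# U+FB2E HEBREW LETTER ALEF WITH PATAH is a composition exclusion:
# NFC decomposes it to alef (U+05D0) + patah (U+05B7) and does not recompose it.
alef_patah = "\ufb2e"
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", alef_patah)])
```

So "pick one form" is not quite what Unicode's own normalization does for Hebrew presentation forms: even the composing form (NFC) prefers the decomposed sequence there.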

Re: Matching decomposable Unicode characters

2013-05-31 Thread Bram Moolenaar
Christian Brabandt wrote: > On Fr, 31 Mai 2013, Ron Aaron wrote: > > > I think there should be an option (probably an option, not a regex > > flag) which controls whether or not the engine finds "ff" (unicode > > 0xfb00) when searching for "f", for example. It seems to me that > > most people ma

Re: Matching decomposable Unicode characters

2013-05-31 Thread Ron Aaron
On Friday, May 31, 2013 4:29:04 PM UTC+3, LCD 47 wrote: Thanks for the links. It is indeed a can of worms, but we need to live with it in some reasonable manner.

Re: Matching decomposable Unicode characters

2013-05-31 Thread Ron Aaron
On Friday, May 31, 2013 4:39:14 PM UTC+3, Mike Williams wrote: > Alas I have zero knowledge of Hebrew so will have to bow to your > superior knowledge. You will know better the use case of finding base > characters with and without combining marks. Thanks, I guess... There are no ligatures in

Re: Matching decomposable Unicode characters

2013-05-31 Thread Mike Williams
On 31/05/2013 13:31, Ron Aaron wrote: On Friday, May 31, 2013 1:56:56 PM UTC+3, Mike Williams wrote: On 31/05/2013 11:23, Ron Aaron wrote: "ff" is a ligature, not a composed character. Although it has a decomposed form it cannot be recomposed with Unicode composing rules (f is not a composing c
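Mike's distinction between a ligature and a composed character is visible directly in the Unicode data. A minimal Python sketch (our illustration; `decomposition_kind` is a hypothetical helper) reads each character's decomposition mapping: "ff" (U+FB00) has only a tagged compatibility mapping, so canonical decomposition (NFD) leaves it alone and only compatibility decomposition (NFKD) splits it into two plain "f" letters, while alef with patah (U+FB2E) has an untagged, canonical mapping.

```python
import unicodedata

def decomposition_kind(ch: str) -> str:
    """Hypothetical helper: classify a character's decomposition mapping."""
    mapping = unicodedata.decomposition(ch)  # raw field from UnicodeData.txt
    if not mapping:
        return "none"
    # Compatibility mappings start with a tag such as "<compat>" or "<font>".
    return "compatibility" if mapping.startswith("<") else "canonical"

for ch in ("\ufb00", "\ufb2e", "f"):
    nfd = unicodedata.normalize("NFD", ch)
    nfkd = unicodedata.normalize("NFKD", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {decomposition_kind(ch)}")
    print("   NFD :", " ".join(f"U+{ord(c):04X}" for c in nfd))
    print("   NFKD:", " ".join(f"U+{ord(c):04X}" for c in nfkd))
```

This is why an engine that only applies canonical equivalence will never find "f" inside "ff"; doing so requires opting into compatibility folding.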

Re: Matching decomposable Unicode characters

2013-05-31 Thread LCD 47
On 31 May 2013, Bram Moolenaar wrote: [...] > I find it a bit annoying that Unicode has two forms for the same > character. They should have made a choice to either use a base > character plus composing characters, or the combined form. Now we need > to solve this in software everywhere. [...]

Re: Matching decomposable Unicode characters

2013-05-31 Thread Matteo Cavalleri
i'm basically new to unicode, so this message will probably make me look like a total newbie (which in fact i am), but i've been totally sucked in by this thread and started searching around in google... :) anyway, unless i understood everything wrong, if we have "ffi" and decompose it, and we decom
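Matteo's snippet is cut off, so we can only assume his "ffi" is the ligature U+FB03 (LATIN SMALL LIGATURE FFI). Under that assumption, a short Python check (our illustration, not from the thread) shows that compatibility decomposition is one-way: NFKD turns one code point into three letters, and NFKC does not rebuild the ligature from them, so the original form cannot be recovered by normalizing back.

```python
import unicodedata

ffi_ligature = "\ufb03"  # assumed: LATIN SMALL LIGATURE FFI
decomposed = unicodedata.normalize("NFKD", ffi_ligature)
recomposed = unicodedata.normalize("NFKC", decomposed)

print(len(ffi_ligature), len(decomposed))        # one code point becomes three
print(recomposed == ffi_ligature)                # NFKC yields "ffi", not the ligature
print("fi" in ffi_ligature, "fi" in decomposed)  # substring only visible after NFKD
```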

Re: Matching decomposable Unicode characters

2013-05-31 Thread Ron Aaron
On Friday, May 31, 2013 2:02:27 PM UTC+3, Christian Brabandt wrote: > Wouldn't it be enough, if we enhance the equivalence classes a bit? I > have posted a patch, which enhances the equivalence classes a while ago. > It didn't include U+FB00, but I think, we could easily add the missing > char
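Christian's suggestion is to extend Vim's equivalence classes (the `[[=x=]]` bracket syntax). One way such a table could be derived automatically, sketched in Python under our own assumptions (this is not how Vim builds its classes, and `equivalence_class` is a hypothetical helper), is to collect every BMP character that reduces to the base letter once decomposed and stripped of nonspacing marks. The sketch also exposes the limit Mike points at: U+FB00 reduces to two letters, so it can never join the single-character class for "f".

```python
import unicodedata

def equivalence_class(base: str, compat: bool = False, limit: int = 0x10000) -> set[str]:
    """Hypothetical helper: BMP characters that reduce to `base` after
    NFD (or NFKD when `compat` is true) and removal of nonspacing marks (Mn)."""
    form = "NFKD" if compat else "NFD"
    members = set()
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:  # skip lone surrogates
            continue
        ch = chr(cp)
        decomposed = unicodedata.normalize(form, ch)
        stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
        if stripped == base:
            members.add(ch)
    return members

e_class = equivalence_class("e")
print(sorted(e_class)[:8])
print("\ufb2e" in equivalence_class("\u05d0"))            # alef with patah joins alef
print("\ufb00" in equivalence_class("f", compat=True))    # ligature never qualifies
```

Scanning the whole BMP is slow enough that a real implementation would precompute the table rather than build it per search.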

Re: Matching decomposable Unicode characters

2013-05-31 Thread Ron Aaron
On Friday, May 31, 2013 1:56:56 PM UTC+3, Mike Williams wrote: > On 31/05/2013 11:23, Ron Aaron wrote: > > "ff" is a ligature, not a composed character. Although it has a decomposed > form it cannot be recomposed with Unicode composing rules (f is not a > composing character) There are others inc

Re: Matching decomposable Unicode characters

2013-05-31 Thread Christian Brabandt
Hi Ron! On Fr, 31 Mai 2013, Ron Aaron wrote: > I think there should be an option (probably an option, not a regex flag) > which controls whether or not the engine finds "ff" (unicode 0xfb00) when > searching for "f", for example. It seems to me that most people may not need > it, but those of

Re: Matching decomposable Unicode characters

2013-05-31 Thread Mike Williams
On 31/05/2013 11:23, Ron Aaron wrote: I think there should be an option (probably an option, not a regex flag) which controls whether or not the engine finds "ff" (unicode 0xfb00) when searching for "f", for example. It seems to me that most people may not need it, but those of us who frequentl

Re: Matching decomposable Unicode characters

2013-05-31 Thread Ron Aaron
I think there should be an option (probably an option, not a regex flag) which controls whether or not the engine finds "ff" (unicode 0xfb00) when searching for "f", for example. It seems to me that most people may not need it, but those of us who frequently edit multilingual or other rich texts
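Ron's proposed option raises a practical question: if the text is folded before matching, where does a match land in the original buffer? A minimal Python sketch, assuming compatibility folding (NFKD) and with `fold_with_map`, `search` and `fold_compat` as hypothetical names standing in for the proposed option, normalizes one source character at a time and records which source character each folded code point came from. Per-character normalization is a simplifying assumption: it can differ from normalizing the whole string when combining marks must be reordered across characters.

```python
import unicodedata

def fold_with_map(text: str, form: str = "NFKD") -> tuple[str, list[int]]:
    """Normalize `text` character by character; for each code point of the
    result, record the index of the source character it came from."""
    pieces, origin = [], []
    for i, ch in enumerate(text):
        piece = unicodedata.normalize(form, ch)
        pieces.append(piece)
        origin.extend([i] * len(piece))
    return "".join(pieces), origin

def search(text: str, pattern: str, fold_compat: bool = False) -> list[tuple[int, int]]:
    """Return (start, end) spans in the ORIGINAL text where `pattern` matches."""
    if not pattern:
        return []
    if fold_compat:
        folded, origin = fold_with_map(text)
        needle = unicodedata.normalize("NFKD", pattern)
    else:
        folded, origin = text, list(range(len(text)))
        needle = pattern
    spans = []
    start = folded.find(needle)
    while start != -1:
        end = start + len(needle)
        # Widen to whole source characters, e.g. the entire ligature.
        spans.append((origin[start], origin[end - 1] + 1))
        start = folded.find(needle, start + 1)
    return spans

text = "e\ufb00ort"  # "effort" spelled with the U+FB00 ligature
print(search(text, "f"))                    # no match without folding
print(search(text, "f", fold_compat=True))  # both "f"s land on the ligature
print(search(text, "ff", fold_compat=True))
```

Note that a match covering only half of the ligature still highlights the whole character, so both "f" hits report the same span; how to present such partial matches is one of the design questions the option would have to answer.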

Re: Matching decomposable Unicode characters

2013-05-31 Thread Ron Aaron
On Friday, May 31, 2013 12:27:21 PM UTC+3, Bram Moolenaar wrote: > I find it a bit annoying that Unicode has two forms for the same character. > They should have made a choice to either use a base character plus composing > characters, or the combined form. Now we need to solve this in software

Re: Matching decomposable Unicode characters

2013-05-31 Thread Bram Moolenaar
Ron Aaron wrote: > I was puzzled when searching for this in some Hebrew text: > > /ארבע\Z/ > > That it did not match this: > > אַרְבָּעָה > > As it happens, the אַ is Unicode combined form of the aleph plus the > vowel patah. > > There are two issues: > > 1) First, is that the normal

Matching decomposable Unicode characters

2013-05-30 Thread Ron Aaron
I was puzzled when searching for this in some Hebrew text: /ארבע\Z/ that it did not match this: אַרְבָּעָה As it happens, the אַ is the Unicode combined form of the aleph plus the vowel patah. There are two issues: 1) The first is that the normal user would expect a match here, since the sym
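In Vim, `\Z` makes a pattern ignore composing characters, but a precomposed letter such as U+FB2E is a single base character with nothing to ignore. A Python sketch (our illustration, not Vim's regex engine; `strip_marks` is a hypothetical helper) shows why the order of operations matters: dropping nonspacing marks without decomposing first leaves U+FB2E intact, while decomposing with NFD first exposes the plain alef. The code points chosen for the word are an assumption, since the archive does not show which form each letter used beyond the alef with patah.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Canonically decompose `text`, then drop nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

# Assumed spelling of the word from the report, with explicit code points.
word = ("\ufb2e"              # alef with patah, precomposed (U+FB2E)
        "\u05e8\u05b0"        # resh + sheva
        "\u05d1\u05bc\u05b8"  # bet + dagesh + qamats
        "\u05e2\u05b8"        # ayin + qamats
        "\u05d4")             # he
query = "\u05d0\u05e8\u05d1\u05e2"  # the searched base letters: alef resh bet ayin

# Ignoring marks without decomposing first keeps U+FB2E, so the alef never matches.
naive = "".join(c for c in word if unicodedata.category(c) != "Mn")
print(query in naive)              # False
print(query in strip_marks(word))  # True
```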