Bruno Haible <br...@clisp.org> writes:

>> Once I have established a good set of self tests, I will run them both
>> against libunistring for 5.0.0 and 5.2.0 to see if I can find any string
>> that behaves differently.
>
> I've now pushed an update to Unicode 6.0.0 as well. So, by choosing the
> appropriate gnulib commit when you run ./autogen.sh in the libunistring
> checkout, you can test against any of the Unicode versions 5.1.0, 5.2.0,
> 6.0.0.

That is great.

>> > How will this work with the glibc add-on? Will it incorporate some parts
>> > of libunistring literally, or will it load libunistring dynamically?
>>
>> I have no idea yet. Right now, libidna doesn't even link to
>> libunistring dynamically because I want to make sure I get the "right"
>> libunistring implementation.
>
> You can dynamically load it (with dlopen()), then get a pointer to the
> variable '_libunistring_version', fetch its value, and compare it with
> your expectations.
>
> Or you can use parts of libunistring as gnulib modules, with module
> 'libunistring-optional', and your library will be compiled to use a
> maximum of the preinstalled library (still considering version requirements).

I'll consider this.

>> Given the complexities in IDNA2008 I am wondering whether it might not
>> make more sense to let glibc ask a system daemon to do the string
>> conversion rather than to do everything in glibc. There is still a lot
>> of work being done on various pre- and post- IDNA2008 mappings because
>> IDNA2008 by itself is neither backwards/forwards compatible or safe to
>> use. This may be something you want to configure on a per-system basis.
>
> Why would you need a system daemon when you need a configuration file?
> You need a daemon if different processes need to communicate or if
> the configuration is so huge that it should be parsed only once.
> For example, the 'pam' facility comes as a set of plugins dynamically
> loaded by glibc; they have a set of configuration files; but no daemon
> is needed.

With the caveat that my implementation isn't finished, and that I may
find better ways to do this: if the RFC 5892 tables are generated
dynamically from the Unicode properties, it will take significant time.
Deriving the property on a character-by-character basis would work, but
without caching of the results that may be too inefficient.
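
To make the character-by-character idea concrete, here is a minimal
sketch in C of a small cache in front of a per-code-point derivation.
It is not libidna code. derive_property_slow() is a hypothetical
stand-in for the real RFC 5892 derivation (which would query
libunistring for the Unicode properties) and only handles a few code
points; the names, the cache size, and the direct-mapped layout are all
assumptions made for illustration.

```c
#include <stdint.h>

/* RFC 5892 derived property values, plus a marker for empty cache slots. */
typedef enum {
    IDNA_UNKNOWN = 0,   /* empty cache slot; not an RFC 5892 value */
    IDNA_PVALID,
    IDNA_CONTEXTJ,
    IDNA_CONTEXTO,
    IDNA_DISALLOWED,
    IDNA_UNASSIGNED
} idna_property;

/* Counts how often the expensive path runs, for demonstration only. */
static unsigned long slow_calls = 0;

/* HYPOTHETICAL placeholder for the expensive derivation. It knows only
   a-z, 0-9, '-', and the two joiner characters, and reports everything
   else as DISALLOWED, which is NOT what RFC 5892 specifies in general. */
static idna_property derive_property_slow(uint32_t cp)
{
    slow_calls++;
    if ((cp >= 'a' && cp <= 'z') || (cp >= '0' && cp <= '9') || cp == '-')
        return IDNA_PVALID;
    if (cp == 0x200C || cp == 0x200D)
        return IDNA_CONTEXTJ;
    return IDNA_DISALLOWED;
}

#define CACHE_SLOTS 1024u   /* assumed size; tune for real workloads */

struct cache_entry {
    uint32_t cp;            /* code point stored in this slot */
    idna_property prop;     /* IDNA_UNKNOWN marks an empty slot */
};

/* Direct-mapped cache: zero-initialized, so every slot starts empty. */
static struct cache_entry cache[CACHE_SLOTS];

/* Return the derived property of cp, computing it at most once for as
   long as its slot is not overwritten by a colliding code point. */
idna_property idna_property_cached(uint32_t cp)
{
    struct cache_entry *e = &cache[cp % CACHE_SLOTS];

    if (e->prop != IDNA_UNKNOWN && e->cp == cp)
        return e->prop;                     /* cache hit */

    e->cp = cp;                             /* miss: compute and store */
    e->prop = derive_property_slow(cp);
    return e->prop;
}
```

A caller would use idna_property_cached() for each code point of a
label. The cache as written is per-process and not thread-safe, so it
would not by itself settle the daemon question.
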
Right now I'm hard-coding the RFC 5892 tables, because Unicode versions
change relatively infrequently, but I'm less certain that is the right
approach in the long term. A system daemon could generate the entire
table once and then convert strings quickly.

A compromise may be to generate the tables at build time and hard-code
a check against the libunistring version. If the libunistring library
is upgraded, you would have to rebuild libidna too.

/Simon