On Tue, Dec 10, 2002 at 05:18:38PM -0500, Branden Robinson wrote:
> On Thu, Dec 05, 2002 at 08:33:08PM -0600, John Hasler wrote:
> > However, if that data can only be usefully expressed in precisely that way
> > (that is, reverse-engineering those algorithms would regenerate the file)
> > then the copyright on the file is probably unenforceable.
>
> Exactly. If there is no possibility for original expression within the
> technical constraints imposed, one has no ability to generate the sort
> of work which copyright is designed to protect.

About 48 or more scripts are encoded. ASCII was frozen. That leaves 47! ways to order the remaining scripts (and they did not choose alphabetical order by English name). Latin alone has 840 "code points" (characters). Even granting that parts of this have a traditional ordering, there are other big spans that have no natural order, so a bunch more choices were made here.

Then, each character has a potential of 22 binary "properties" (not derived from UnicodeData.txt, but kept in a separate file, PropList.txt) and 14 "fields", most of which have 20 to 256 or more options.

I would venture to guess that even with a perfect oracle, it would be essentially impossible to reverse engineer the Unicode data files, much less the ancillary algorithms. That is, a 32-bit search space with at least 36 properties to be discovered per data point is whopping big.

Jim Penny
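
For a rough sense of scale, the counts above can be turned into bits of choice. This is only a back-of-the-envelope sketch: it takes the figures quoted in this message at face value (48 scripts with one fixed, 22 binary properties, 14 fields), and it assumes, purely for illustration, that every field has exactly 20 options, the low end of the range. The variable names are made up for this example.

```python
import math

# Figures taken from the message above; treated here as exact for illustration.
SCRIPTS_TOTAL = 48      # "about 48 or more scripts"
SCRIPTS_FIXED = 1       # ASCII (Latin) position was frozen
BINARY_PROPERTIES = 22  # per character, from PropList.txt
FIELDS = 14             # per character, from UnicodeData.txt
MIN_FIELD_OPTIONS = 20  # assumption: every field has only 20 options

# Number of ways to order the scripts that were not frozen: 47!
script_orderings = math.factorial(SCRIPTS_TOTAL - SCRIPTS_FIXED)
ordering_bits = math.log2(script_orderings)

# Bits of choice per character: one bit per binary property,
# plus log2(options) bits per field.
per_char_bits = BINARY_PROPERTIES + FIELDS * math.log2(MIN_FIELD_OPTIONS)

print(f"script orderings: {script_orderings:.3e}")
print(f"bits to pick one ordering: {ordering_bits:.1f}")
print(f"bits of choice per character (lower bound): {per_char_bits:.1f}")
```

Even under these deliberately low assumptions, the script ordering alone is a choice among far more than 2^32 possibilities, before any per-character data is considered.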